Visualizations and Analysis


Both keyword searching and the topic models make clear that the European political documents had little to no impact on the novels’ use of the terms bourgeois and proletariat or their variant forms. This is particularly interesting because these words are very prevalent in the works of all the European writers. Carlyle’s documents, however, do not use them either. One possible explanation is that bourgeois and proletariat were still widely used only by French writers and by Marx and Engels, who were scholars of French socialism. Instead, the novels use words such as aristocracy, worker, and laborer, which are in some ways similar to their French counterparts.

In the dynamic topic model, topics 19 and 44, whose terms relate to factories and labor, follow very similar trends: both peak around the publication of Engels’ The Condition of the Working Class in England and various other articles, then decline dramatically after 1846. After 1846, topic 12, which includes terms such as bourgeois, becomes dominant, while topic 19 shifts toward terms such as spinner, weaver, and later cotton. These terms are most commonly found in the works of Gaskell and Dickens. Topic 44 similarly shifts toward agriculture, which is significantly more prevalent in the novels.
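As a rough illustration of the kind of keyword search described above, the Python sketch below counts the variant forms of each term in a document using regular expressions. It is a sketch under assumptions rather than the search actually used in this project: the `KEYWORD_STEMS` patterns, the `count_keyword_forms` function, and the two sample sentences are hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns: each one matches a term and its variant forms,
# e.g. "bourgeois"/"bourgeoisie" or "proletariat"/"proletarian(s)".
KEYWORD_STEMS = {
    "bourgeois": r"\bbourgeois\w*",
    "proletariat": r"\bproletar\w*",
    "aristocracy": r"\baristocra\w*",
    "worker": r"\bworkers?\b",
    "laborer": r"\blabou?rers?\b",  # covers British "labourer" spellings too
}


def count_keyword_forms(text: str) -> Counter:
    """Count case-insensitive matches of each keyword pattern in one document."""
    counts = Counter()
    for label, pattern in KEYWORD_STEMS.items():
        counts[label] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical sample sentences, not quotations from the corpus.
sample_political = (
    "The bourgeoisie and the proletariat stand opposed; "
    "the proletarian has nothing to lose."
)
sample_novel = "The workers and labourers of the mill distrusted the aristocracy."

for name, text in [("political", sample_political), ("novel", sample_novel)]:
    print(name, dict(count_keyword_forms(text)))
```

Running the same counts over every political document and every novel would give a per-document table that can be set beside the topic trends. Prefix patterns such as `proletar\w*` can over- or under-match, so the matches are worth spot-checking by hand.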

Topics 0 and 45 are also surprising. Topic 0 has a major peak in the mid-1830s, coinciding with the publication of Thomas Carlyle’s The French Revolution, and peaks again later, around the publication of Dickens’ A Tale of Two Cities. This suggests the influence Carlyle had on Dickens, which makes sense given that Dickens carried Carlyle’s book with him while writing. What is surprising, however, is that topic 45 drops off completely and is not at all prevalent when Dickens’ novel was published.

For Gaskell’s works, a possible connection is harder to determine. Topic 19 peaks around the publication of Engels’ work, drops, and then regains popularity around the time North and South is published, although its terms shift to factory, cotton, and employment rather than workers, weaving, and miners. Overall, though, the connection is unclear, because many of the topics popular when Gaskell’s novels appeared are more clearly related to other works published at the same time. The most likely connection is between Engels and Gaskell: topic 19 rises in popularity with the publication of their works, and keyword searching shows that both Mary Barton and North and South make use of this vocabulary.

Overall, this research sought to explore the connection between the novels and the political publications of the mid-19th century. Although the analysis could be improved in some ways, parts of the methodology proved useful and are worth applying to similar projects. In particular, BERTopic provides an excellent means of analyzing topics and how they change over time, and it can display representative documents for each topic. Its downside is that its intertopic distance map is less interactive: unlike a pyLDAvis visualization, it does not allow clicking on a topic to view it. Using different tokenizers and preprocessing packages such as gensim, stanza, or spaCy is also more difficult with BERTopic. Lastly, BERTopic works best with shorter texts, so documents must be broken into smaller chunks, which can be difficult when the corpus is very large. Anyone doing a similar study with a larger corpus should perform this chunking with an external script.
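The chunking step described above could be handled by a short external script such as the following Python sketch. It is illustrative only: `chunk_document`, `build_chunked_corpus`, the 200-word default, and the `title`/`year`/`text` record fields are hypothetical choices, and the sentence splitter is a naive regular expression rather than a tokenizer from gensim, stanza, or spaCy. Each chunk keeps its source document’s year so that the chunks can still be placed on a timeline.

```python
import re
from typing import Iterator


def chunk_document(text: str, max_words: int = 200) -> Iterator[str]:
    """Yield chunks of whole sentences, each at most max_words words
    unless a single sentence is longer than that on its own."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    current: list[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip empty strings (e.g. from an empty document)
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            yield " ".join(current)
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        yield " ".join(current)


def build_chunked_corpus(documents: list[dict], max_words: int = 200):
    """Flatten hypothetical {title, year, text} records into parallel lists."""
    docs, timestamps, sources = [], [], []
    for doc in documents:
        for chunk in chunk_document(doc["text"], max_words):
            docs.append(chunk)
            timestamps.append(doc["year"])  # every chunk inherits its document's year
            sources.append(doc["title"])    # keep provenance for later inspection
    return docs, timestamps, sources


if __name__ == "__main__":
    sample = [{"title": "Hypothetical pamphlet", "year": 1845,
               "text": "First sentence here. Second one follows."}]
    docs, timestamps, sources = build_chunked_corpus(sample, max_words=3)
    print(docs, timestamps, sources)
```

The resulting `docs` and `timestamps` lists could then be passed to BERTopic, for example to `fit_transform` and, for a dynamic model, `topics_over_time`. The signature of `topics_over_time` has changed between BERTopic releases, so it is worth checking the documentation for the installed version.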

Along with the limitations of BERTopic, the corpus itself had a large impact on the outcome of this research. The documents chosen were all translated texts, so some words and phrases were adapted to convey the message of the original text rather than translated directly. This complicates tracing which words or phrases were taken up in the novels. Additionally, because translated texts were required, some potentially significant documents were left out. Where possible, scanning and translating the documents oneself could give more accurate results.

Another issue was digital availability. Many of these documents were found on Marxists.org, the Internet Archive, and Project Gutenberg. However, many articles by Marx and Engels that appeared in various newspapers are part of a publication of their collected works, which is no longer publicly available for download. One way to still use these articles would be to purchase the collected works and OCR-scan the documents, a practice that is legal. The same could be done with other newspapers such as La Réforme and the Northern Star, which would provide a larger corpus of articles. Lastly, the size of this corpus made it difficult to analyze the impact of individual documents. It may be beneficial to focus on fewer documents by one or two authors, for example only Marx and Engels, and only their most popular publications.