Methods


To perform this textual analysis, a variety of translated documents were chosen and run through a topic modeling script written with the Python package BERTopic. These texts were selected for their relation to working-class, revolutionary, and socialist rhetoric. For the novels, four works by Elizabeth Gaskell and Charles Dickens were chosen: North and South, Mary Barton, Barnaby Rudge, and A Tale of Two Cities. The other documents were works by Karl Marx, Friedrich Engels, and Thomas Carlyle, as well as a couple of other German and French writers. They included The Communist Manifesto, The Condition of the Working Class in England, and The French Revolution: A History, as well as a number of articles and pamphlets published in major newspapers around the 1840s. The documents were broken down by sentence, major stopwords were removed, and the sentences were added to a CSV file with the date they were published. Using BERTopic, both a static topic model and a dynamic topic model were then created and visualized. BERTopic provides multiple built-in preprocessing methods to tokenize text and reduce the impact of stopwords. The topics were then analyzed and fine-tuned using the methods BERTopic provides in order to produce more insightful topics, terms, and visualizations.

To perform this exploration, a corpus was assembled from various translated documents and run through a topic modeling script written with the Python package BERTopic. The documents chosen were both novels and non-fiction political articles and pamphlets. For the novels, the following were selected: A Tale of Two Cities and Barnaby Rudge by Charles Dickens, and North and South and Mary Barton by Elizabeth Gaskell. These works were chosen for their connection to the topics of revolution and the working class. In the two works by Dickens, revolution and rebellion are central to the plot, as is a distinct class structure. Gaskell’s works were chosen because their plots center on working-class individuals and on the divide between the upper and lower classes. In an introduction to a modern translation of The Condition of the Working Class in England by Friedrich Engels, Gaskell’s Mary Barton is said to complement what is discussed in Engels’ piece. Additionally, Gaskell begins Mary Barton with a preface in which she discusses her aim to represent the feelings of the manufacturing class of Manchester and mentions the revolutions spreading across Europe. North and South was chosen as her second novel for its descriptions of, and narrative around, the divide between the manufacturers and the upper class in Manchester. The other documents were political publications. The main documents in this part of the corpus are The Communist Manifesto by Karl Marx and Friedrich Engels, The Condition of the Working Class in England by Friedrich Engels, and The French Revolution: A History by Thomas Carlyle. Additionally, the corpus includes various pamphlets, articles, and other publications found in newspapers or letters from the period. These documents are by the following authors: Karl Marx, Friedrich Engels, Thomas Carlyle, Auguste Blanqui, Pierre-Joseph Proudhon, and Felix Pyat.
These documents and authors were chosen for their content relating to revolution and labor conditions, as well as for their availability and impact. Availability proved challenging, as some documents fall under copyright and cannot be accessed digitally, while others had too many scanning errors to be viable for the corpus. For example, many of the works of Louis Blanc, a notable socialist in France, were unavailable for download via sources like the Internet Archive, Marxist.org, or other databases. Proudhon, Blanqui, and Pyat had many available publications related to the topic and have been discussed both by Marx himself and by historians of the subject. Many of these articles were published in The Northern Star, a Chartist publication in England (specifically the pieces by Marx and Engels), and in the Rheinische Zeitung, a popular socialist newspaper in Germany. Thomas Carlyle is the only English author chosen for this section of the corpus. His work was included because of his relation to Charles Dickens, as his The French Revolution: A History provided much of the information Dickens used for A Tale of Two Cities. Additionally, his publications about the Chartist movement and his thoughts about the revolution provide a bridge from the German and French authors to the English novelists. Both the articles and the books by Marx and Engels were chosen based on their relevance to the topic. The Communist Manifesto was chosen because of its influence and its publication date of 1848. Engels’ The Condition of the Working Class in England was chosen for its focus on England, where the novels being analyzed for influence were written.

There are a few different Python libraries that handle topic modeling and model visualization, but BERTopic had features that made it the best fit for this analysis. BERTopic allows dynamic topic models to be created: given a CSV file of texts and their associated dates, it builds a model that tracks how a topic changes over time. The topics themselves stay relatively similar, but the words associated with them change. This was crucial to this research, as it allows for analysis of how different words are used at different times to describe the same topic.
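The dynamic modeling step described above can be sketched as follows. This is a minimal illustration, not the script used in this study: the column names `text` and `date`, the helper names, the bin count, and the default BERTopic settings are all assumptions made for the example.

```python
import csv
import io


def load_corpus(handle):
    """Read sentence rows into parallel lists of documents and timestamps."""
    docs, timestamps = [], []
    for row in csv.DictReader(handle):
        docs.append(row["text"])             # hypothetical column name
        timestamps.append(int(row["date"]))  # hypothetical column name, year only
    return docs, timestamps


def fit_dynamic_model(docs, timestamps, nr_bins=10):
    """Fit a static topic model, then track its topics over time."""
    # Imported here so the CSV helper works without BERTopic installed.
    from bertopic import BERTopic

    # Default settings are an assumption; the study used custom settings.
    topic_model = BERTopic()
    topics, _ = topic_model.fit_transform(docs)
    # One row per (topic, time bin), with the top words for that bin.
    over_time = topic_model.topics_over_time(docs, timestamps, nr_bins=nr_bins)
    # Returns a Plotly figure of topic frequency across the time bins.
    figure = topic_model.visualize_topics_over_time(over_time, top_n_topics=10)
    return topic_model, over_time, figure


# Tiny invented sample, used only to show the expected CSV shape.
sample_csv = io.StringIO(
    "text,date\nThe workers struck.,1848\nThe mill closed.,1854\n"
)
sample_docs, sample_timestamps = load_corpus(sample_csv)
```

In use, `fit_dynamic_model` would be called on the full sentence-level corpus, since a meaningful model needs far more documents than the two-row sample shown here.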

Some of these documents were plain text files created by performing an optical character recognition (OCR) scan on the pages of the document. These documents were reviewed, and scanning errors were removed as far as possible to keep them from affecting the topic model.
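Part of this kind of cleanup can be automated. The sketch below shows a few conservative fixes that are commonly applied to OCR output. It is an illustration only and does not reproduce the review performed for this corpus; the function name and the specific rules are assumptions.

```python
import re


def clean_ocr_text(raw):
    """Apply a few conservative fixes to OCR output."""
    # Rejoin words hyphenated across a line break: "revolu-\ntion" -> "revolution".
    # Caveat: a genuine compound split at a line end ("working-\nclass") is also joined.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain only digits (likely page numbers).
    lines = [ln for ln in text.split("\n") if not re.fullmatch(r"\s*\d+\s*", ln)]
    # Collapse remaining line breaks and runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


# Invented sample showing a broken word, extra spaces, and a stray page number.
sample = "The revolu-\ntion spread   across\n42\nEurope."
cleaned = clean_ocr_text(sample)
```

Rules like these only catch mechanical errors. Misrecognized characters still require reading the text by hand.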

Each text in the corpus was broken down into smaller sections, with each section being a single sentence. This was done because BERTopic creates topics by using sentence transformers to embed documents. A CSV file was then used to mark each sentence (treated as a document) with its publication date. The CSV file is read by a Python script, which uses custom settings and embeddings to create and fine-tune both a static and a dynamic topic model. The visualizations are then created using Plotly, a visualization package for Python.
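The sentence-splitting and CSV steps can be sketched with the Python standard library alone. The splitter below is a deliberately naive regular expression, since the splitter actually used is not described here. The column names `text` and `date` and all function names are assumptions for illustration.

```python
import csv
import io
import re

# Naive rule: break after ., !, or ? when followed by whitespace.
# Assumption: abbreviations such as "Mr." will be split incorrectly by this rule.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def build_rows(works):
    """Turn (text, publication_date) pairs into one row per sentence."""
    rows = []
    for text, date in works:
        for sentence in split_sentences(text):
            rows.append({"text": sentence, "date": date})
    return rows


def write_csv(rows, handle):
    """Write the sentence rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["text", "date"])
    writer.writeheader()
    writer.writerows(rows)


# Invented two-work corpus, used only to show the shape of the output.
sample_works = [
    ("First sentence here. Second one follows!", "1848"),
    ("A single sentence from another work.", "1854"),
]
sample_rows = build_rows(sample_works)
buffer = io.StringIO()
write_csv(sample_rows, buffer)
```

Every sentence inherits the publication date of the work it came from, which is what lets a dynamic topic model place each sentence in time.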