Using Sentiment Analysis to Study Historical Public Opinion

Understanding Sentiment Analysis in Historical Context

Sentiment analysis, often called opinion mining, is a branch of natural language processing (NLP) devoted to computationally identifying and categorizing the emotional tone of written text. At its most basic, the technique assigns passages to categories such as positive, negative, or neutral, usually accompanied by a numerical polarity score that typically ranges from -1 (strongly negative) to +1 (strongly positive). While sentiment analysis gained widespread recognition through social media monitoring and the automated evaluation of product reviews, its adaptation to historical archives marks a significant development within digital humanities research.

Contemporary sentiment analysis systems generally employ one of two primary methodological approaches. Lexicon-based methods depend upon pre-constructed dictionaries containing words with established emotional valences. For instance, the word "joyful" might carry a score of +0.8, while "despise" might register at -0.7. The algorithm processes text by scanning for these known terms, aggregating their scores, and normalizing the total to produce an overall sentiment estimate. Machine learning approaches, by contrast, train computational models on large, labeled datasets, enabling these systems to learn contextual patterns and subtle linguistic cues that simple word inventories inevitably overlook. Advanced models such as BERT and RoBERTa have demonstrated performance approaching human accuracy on standard benchmark tasks, though their effectiveness depends heavily upon the quality and domain relevance of the training data they receive. For historians considering the application of these methods, a solid grasp of these technical foundations is essential for critically assessing the reliability of results generated from sources that may be centuries old.
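The lexicon-based procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production scorer: the LEXICON dictionary and the function names lexicon_score and label are hypothetical, and apart from the two scores quoted above ("joyful" and "despise") every valence is invented for the example. Normalization here simply means averaging over the matched words, which is only one of several possible choices.

```python
import re

# Hypothetical toy lexicon. Only "joyful" and "despise" use the scores
# quoted in the text; the other entries are invented for illustration.
LEXICON = {"joyful": 0.8, "despise": -0.7, "victory": 0.6, "mourn": -0.6}


def lexicon_score(text, lexicon=LEXICON):
    """Average the valence of every lexicon word found in the text.

    Returns 0.0 when no known word is found. The result stays within
    [-1, 1] as long as the individual valences do.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    if not hits:
        return 0.0
    return sum(hits) / len(hits)


def label(score, threshold=0.05):
    """Map a polarity score to a coarse category (threshold is arbitrary)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(label(lexicon_score("A joyful victory")))
print(label(lexicon_score("They despise and mourn")))
```

Because the scorer sees only isolated words, a phrase such as "not joyful" still scores as positive, which is the kind of contextual cue that machine learning approaches are meant to capture.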

Why Historians Need Sentiment Analysis

Traditional historical methods for assessing public opinion—including close reading of editorial content, manual sampling of personal correspondence, and qualitative analysis of parliamentary speeches—remain powerful scholarly tools. However, they face inherent limitations in scale and scope. A dedicated historian might manage to read several hundred documents in the course of a year of intensive research. Yet the archival record of the nineteenth century alone encompasses millions of newspaper editions, personal diaries, and government records. Sentiment analysis offers a method for scaling up exploratory investigation while preserving interpretive depth, enabling researchers to detect broad emotional patterns across vast textual landscapes that would otherwise remain invisible to traditional approaches.

Furthermore, the combination of quantitative sentiment measurements with qualitative historical interpretation can help to offset individual researcher biases and bring unexpected shifts to light. For example, a sentiment trajectory might reveal a sudden, previously undocumented decline in public morale that conventional documentary sources do not explicitly record. Such a finding can prompt deeper investigation into local events, unprinted sources, or informal records that traditional methods might overlook. In this respect, sentiment analysis functions as a hypothesis-generating instrument, not as a replacement for established historical scholarship but as a complementary tool that can guide archival inquiry.

Historical Source Materials Suitable for Sentiment Analysis

Digital humanities initiatives have made considerable progress in digitizing extensive collections of historical texts. The following source types have proven particularly amenable to sentiment analysis:

Newspapers and periodicals: Databases such as the Library of Congress's Chronicling America program and the newspaper corpus within Google Books offer machine-readable text covering the eighteenth through early twentieth centuries, providing rich material for longitudinal sentiment studies.
Personal letters and diaries: Collections of private correspondence, including archives maintained by institutions such as the Wilson Center Digital Archive or the extensively studied diary of Samuel Pepys, offer time-stamped emotional records that capture individual perspectives with considerable authenticity.
Parliamentary and congressional records: Official transcripts of legislative proceedings, including the British Hansard (which extends from 1803 to the present), permit analysis of elite political sentiment during debates over war, reform, colonial policy, and other major historical questions.
Fiction and poetry: Literary works often reflect and shape broader societal moods. The emergence of Gothic romanticism in the 1790s, for instance, may correspond to post-revolutionary anxieties in ways that sentiment analysis can help to quantify and track.
Propaganda and government communications: Wartime posters, radio broadcasts, pamphlets, and official proclamations can be systematically coded for the emotional appeals they employ, offering insight into state-sponsored sentiment management.

Each of these source categories presents distinct advantages and methodological challenges. Newspapers, while widely available and abundant in quantity, frequently exhibit editorial bias and typically feature contributions from multiple authors whose perspectives may vary considerably. Private diaries may reflect only a single individual's worldview, but they offer emotional depth and authenticity that more formal sources lack. Researchers must carefully define what "public opinion" their chosen corpus represents—whether national sentiment, elite discourse, the voice of a specific demographic group, or some other construct—and must acknowledge these boundaries in their interpretations.

Methodological Adaptations for Historical Texts

Applying modern sentiment analysis tools to historical texts presents a series of challenges that researchers must address systematically. Language evolves continuously; words that appear emotionally neutral to contemporary readers sometimes carried powerful connotations in earlier periods. The word "awful," for example, originally meant "full of awe" and carried a positive or neutral valence, quite distinct from its modern meaning of "very bad." The term "gay" underwent a major shift over the course of the twentieth century, moving from meaning "joyful" or "carefree" to primarily denoting homosexual identity. A sentiment lexicon developed for twenty-first-century social media content will systematically misclassify such terms, producing errors that can distort research findings.
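A toy example may make the risk concrete. The sketch below scores the same phrase with a modern word list and with one in which period-appropriate valences take precedence. All numeric valences, the dictionary names MODERN and PERIOD, and the helper mean_valence are assumptions made for illustration; none of them is drawn from a published lexicon.

```python
from collections import ChainMap

# Hypothetical valences, invented for illustration only.
MODERN = {"awful": -0.8, "terrible": -0.8, "majesty": 0.5}
PERIOD = {"awful": 0.5, "gay": 0.7}  # "awe-inspiring", "joyful"

# ChainMap searches its mappings left to right, so a period entry
# shadows the modern entry for the same word.
period_lexicon = ChainMap(PERIOD, MODERN)


def mean_valence(words, lexicon):
    """Average valence of the words present in the lexicon (0.0 if none)."""
    scores = [lexicon[word] for word in words if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


phrase = ["the", "awful", "majesty", "of", "the", "gay", "procession"]

modern_score = mean_valence(phrase, MODERN)
period_score = mean_valence(phrase, period_lexicon)
print(modern_score, period_score)
```

With these invented numbers the modern list reads the phrase as mildly negative while the period-aware list reads it as clearly positive, so the sign of the result depends entirely on which lexicon is assumed.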

To address these challenges, historians and computational linguists have developed several strategies:

Period-specific lexicons: Constructing word lists based on dictionaries and literary works from the target era. The Historical Thesaurus of English and the Oxford English Dictionary provide essential resources for reconstructing the emotional valences of words as they were understood in their historical context.
Domain-adapted word embeddings: Training vector-based language models, such as Word2Vec, exclusively on historical corpora. This approach allows terms like "awful" to acquire neighbor associations characteristic of their period—words like "sublime" and "majestic" in the eighteenth and nineteenth centuries—rather than modern associations like "terrible" or "horrible."
Human validation: Having trained historians manually annotate a subset of documents and compare their assessments against machine-generated scores. Discrepancies between human and algorithmic judgments reveal where models fail and inform iterative refinement of the analysis pipeline.
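The human validation step is often summarized with an agreement statistic. The sketch below computes Cohen's kappa, a standard chance-corrected agreement measure, between historian labels and machine labels, and lists the documents on which the two disagree. The sample labels are invented and the function names are hypothetical; nothing above specifies which agreement measure a given project should use.

```python
from collections import Counter


def cohen_kappa(human, machine):
    """Chance-corrected agreement between two equally long label lists."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, machine)) / n
    human_counts, machine_counts = Counter(human), Counter(machine)
    categories = set(human) | set(machine)
    # Agreement expected if both raters labeled at random with their own
    # observed category frequencies.
    expected = sum(human_counts[c] * machine_counts[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def disagreements(human, machine):
    """Indices of documents where the historian and the model differ."""
    return [i for i, (h, m) in enumerate(zip(human, machine)) if h != m]


# Invented sample: four documents labeled by a historian and by a model.
historian = ["pos", "pos", "neg", "neg"]
model = ["pos", "neg", "neg", "neg"]

print(cohen_kappa(historian, model))
print(disagreements(historian, model))
```

The indices returned by disagreements point to the documents worth rereading first, since they show where the lexicon or model departs from expert judgment.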

A further significant technical hurdle is OCR quality. Optical character recognition applied to aged, stained, or irregularly printed pages inevitably produces errors that can distort sentiment measurements. The word "happiness" might be rendered as "happinefs" because the archaic long s ("ſ") is easily misread as "f", or as "happmess" through a combination of print degradation and recognition errors. Neither garbled form will match an entry in a sentiment lexicon, so the word silently drops out of the score. Robust preprocessing, including spell-correction systems adapted to historical orthography, is therefore essential before any sentiment analysis pipeline can be considered reliable.
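One possible shape for such a preprocessing step is sketched below. It is an assumption-laden illustration, not a description of any particular project's pipeline: VOCAB is a hypothetical stand-in for a period word list, the function names are invented, and the similarity cutoff of 0.8 is arbitrary. The fuzzy fallback uses get_close_matches from Python's standard difflib module.

```python
import difflib
from itertools import product

# Hypothetical tiny word list; a real project would load a period dictionary.
VOCAB = {"happiness", "himself", "congress", "of", "the", "for"}


def long_s_variants(token):
    """Yield every spelling obtained by reading some 'f' characters as 's'.

    The number of variants doubles with each 'f', which is acceptable for
    single words but would not scale to long strings.
    """
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        yield "".join(chars)


def normalize_token(token, vocab=VOCAB, cutoff=0.8):
    """Return a corrected spelling, or the cleaned token if none is found."""
    token = token.lower().replace("ſ", "s")  # literal long s, if OCR kept it
    if token in vocab:
        return token  # already a known word, so leave words like "of" alone
    for variant in long_s_variants(token):
        if variant in vocab:
            return variant
    # Fall back to fuzzy matching for other recognition errors.
    close = difflib.get_close_matches(token, sorted(vocab), n=1, cutoff=cutoff)
    return close[0] if close else token


for raw in ["happinefs", "happmess", "himfelf", "of"]:
    print(raw, "->", normalize_token(raw))
```

Checking the vocabulary before trying substitutions matters: it keeps genuine words containing "f" from being rewritten, at the cost of depending on how complete the word list is.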

Case Study in Depth: The American Civil War

The application of sentiment analysis to Civil War-era newspapers and personal correspondence illustrates both the potential and the limitations of these methods. A significant 2020 investigation examined over 200,000 newspaper articles from both Union and Confederate sources spanning the period from 1860 to 1865. Using a sentiment lexicon carefully calibrated to mid-nineteenth-century English usage, the researchers plotted weekly average sentiment scores across the war years. Their results confirmed that Northern morale experienced a sharp decline following the First Battle of Bull Run in 1861, but recovered steadily after the Emancipation Proclamation in 1863. Southern sentiment, by contrast, remained relatively elevated until mid-1863; after the Confederate defeats at Gettysburg and Vicksburg, it declined steeply and never recovered. This pattern aligns with traditional historical accounts, but the quantitative analysis traces its trajectory with a precision that qualitative methods alone cannot achieve.
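The weekly aggregation step mentioned above can be sketched as follows. The study's own code is not reproduced here; the function name weekly_average, the choice of ISO calendar weeks (Monday to Sunday), and the three dated scores are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average(scored_docs):
    """Average sentiment per ISO week.

    scored_docs is an iterable of (publication_date, score) pairs. The
    result maps (iso_year, iso_week) to the mean score, in date order.
    """
    buckets = defaultdict(list)
    for day, score in scored_docs:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        buckets[(iso[0], iso[1])].append(score)
    return {week: mean(scores) for week, scores in sorted(buckets.items())}


# Invented scores around the First Battle of Bull Run (21 July 1861).
articles = [
    (date(1861, 7, 15), 0.2),
    (date(1861, 7, 21), 0.4),
    (date(1861, 7, 22), -0.6),
]

for week, average in weekly_average(articles).items():
    print(week, round(average, 2))
```

Weekly bins smooth out day-to-day noise while still letting a researcher line up a shift in tone against a dated event; a coarser or finer bin would trade those two properties against each other.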

The analysis of personal letters provided a more detailed and somewhat different picture. Correspondence from Union soldiers showed consistently higher levels of negativity than contemporary newspaper reporting, especially concerning issues such as food shortages, delayed pay, and the physical hardships of military life. Letters from Confederate civilians reflected acute anxiety about the possibility of slave insurrections after 1862, a concern that was largely absent from official newspaper discourse. These divergences between the sentiment expressed in "public" newspaper sources and that found in "private" letters reveal a meaningful split between officially maintained morale and the emotional realities of individual experience. This finding, made robust through scaled quantitative analysis, offers historians a new vantage point on the psychological dimensions of the conflict.

Additional Historical Applications of Note

Beyond the study of the American Civil War, sentiment analysis has been productively applied to a range of other historical questions:

British public opinion during the First World War: Researchers at the University of Lancaster analyzed local newspaper sentiment from the period 1914 to 1918 and discovered that patriotic enthusiasm faded significantly earlier in working-class communities than in middle-class areas. This finding challenges the widespread "spirit of the trenches" narrative and suggests that class-based differences in war sentiment were more pronounced than previously recognized.
Anti-immigrant sentiment in nineteenth-century America: A study examining articles related to Irish and Chinese immigrants in major American newspapers from 1845 to 1900 found that negative sentiment intensified during economic downturns and correlated closely with the rise of nativist political movements, including the Know Nothing Party, providing quantitative confirmation of patterns previously supported only by anecdotal evidence.
Victorian attitudes toward the British Empire: Analysis of the Times of London spanning the entire nineteenth century revealed that sentiment toward India shifted from generally positive characterizations—framing the subcontinent as a land of commercial opportunity—to sharply negative portrayals following the 1857 Rebellion, a pattern that persisted well into the early twentieth century and shaped subsequent imperial policy.

These examples demonstrate that sentiment analysis can serve to test, refine, and sometimes challenge historical hypotheses that have traditionally rested on selective quotation or impressionistic evidence.

Integrating Quantitative and Qualitative Approaches

No automated algorithm can replace the interpretive expertise of the trained historian. The most powerful scholarship emerges when sentiment curves are used to guide and inform close reading rather than to stand in for it. If a sentiment visualization reveals a sudden negative spike in April of 1770, for example, the historian can focus attention on newspapers from that specific week, read the original articles in their full context, and identify the particular political scandal, natural disaster, or economic crisis that the algorithm flagged but that conventional historical accounts may have overlooked or underemphasized.

Mixed-methods research workflows typically proceed through several stages:

Corpus assembly and cleaning: Documents are digitized, processed through optical character recognition, and subjected to spelling correction adapted to historical orthographic conventions.
Exploratory sentiment analysis: Sentiment scores are computed across the corpus and visualized over time using line plots, heat maps, or other graphical representations.
Outlier detection and sampling: Anomalous points in the sentiment trajectory are identified, and the documents associated with those points are selected for close qualitative reading.
Narrative integration: Quantitative findings are woven together with archival evidence, contextualizing numerical patterns with specific voices, events, and documentary references.
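The outlier detection and sampling stage above can be approximated with a simple z-score rule. This is a sketch under stated assumptions: flag_outliers is a hypothetical helper, the threshold of 2.0 standard deviations is arbitrary, and the weekly values and their labels are invented. A real project might prefer a rolling baseline or a more robust statistic.

```python
from statistics import mean, pstdev


def flag_outliers(series, z_threshold=2.0):
    """Return the period labels whose score is unusually far from the mean.

    series maps a period label to its average sentiment score. A period
    is flagged when its absolute z-score exceeds z_threshold.
    """
    labels = list(series)
    if len(labels) < 2:
        return []
    values = [series[key] for key in labels]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the series
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [key for key in labels if abs(series[key] - mu) / sigma > z_threshold]


# Invented weekly averages with one sharp negative week.
weekly = {
    "1770-W10": 0.1, "1770-W11": 0.1, "1770-W12": 0.1, "1770-W13": 0.1,
    "1770-W14": 0.1, "1770-W15": -0.9, "1770-W16": 0.1, "1770-W17": 0.1,
}

print(flag_outliers(weekly))
```

The flagged labels are only pointers back to the archive: the documents from those periods still have to be read closely before any interpretation is offered.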

This approach respects the distinctive strengths of both computational analysis and humanistic interpretation. It also addresses a common criticism of digital humanities work: that it risks oversimplifying the messy, contradictory, and ambiguous nature of historical evidence. By maintaining qualitative engagement as an essential component of the research process, scholars can ensure that their quantitative findings remain grounded in historical context.

Critical Challenges and Limitations

Despite its considerable promise, the application of sentiment analysis to historical sources faces several serious limitations that responsible researchers must acknowledge and address.

Semantic Change and Connotational Drift

Word meanings shift over time, and figures of speech such as sarcasm and irony—frequently encountered in political cartoons, satirical essays, and oppositional literature—remain notoriously difficult for algorithms to detect and interpret correctly. An eighteenth-century writer might deploy the word "patriotic" ironically to criticize government policy, but a simple lexicon-based system would score this usage as positive, missing the intended meaning entirely. Hand-curated annotations and context-aware models can mitigate this problem but cannot eliminate it entirely, especially at scale.

Selection Bias in Surviving Sources

Historical archives do not represent the full spectrum of past populations. The literate, wealthy, and politically engaged are systematically overrepresented in the surviving record. Women, enslaved people, the poor, and other marginalized groups left fewer documentary traces, and the records they did produce have often been preserved at lower rates. Sentiment analysis that relies solely on newspapers or elite diaries will inevitably capture only the emotional climate of the powerful rather than that of the broader public. Researchers must be transparent about whose sentiment they are measuring and must resist the temptation to generalize beyond the populations their sources actually represent.

Contextual Dependence of Emotional Expression

Sentiment is frequently context-dependent in ways that algorithms struggle to capture. The statement "He died for his country" appearing in a eulogy carries positive emotional valence, while the same words appearing in a pacifist pamphlet may convey criticism or condemnation. Advanced transformer-based models handle such contextual variation reasonably well, but they require large amounts of training data—data that may not exist for historical periods with limited surviving textual records. Fine-tuning a model on a few thousand historical documents may produce results that are brittle or unreliable when applied to unfamiliar contexts.

Interdisciplinary Barriers and Tool Readiness

Many historians lack formal training in computational methods, and many computer scientists underestimate the complexity and messiness of historical data. Effective collaboration between these disciplines is essential but can be slow, challenging, and resource-intensive. Moreover, widely used off-the-shelf sentiment analysis tools such as VADER and TextBlob were not designed to handle historical English, so researchers often need to build custom processing pipelines, a significant barrier to entry for scholars who lack technical expertise.

Emerging Directions in Historical Sentiment Analysis

The coming decade promises significant advances in the tools and methods available for historical sentiment analysis. Transformer-based language models trained specifically on historical texts, such as BERT-historical, are already under development. These models capture contextual information far more effectively than static lexicons and can be fine-tuned for specialized tasks including opinion detection and fine-grained emotion classification across categories such as anger, fear, and joy. Early results indicate that these approaches outperform lexicon-based methods by margins of ten to twenty percentage points in accuracy on historical test sets.

Another promising avenue of research is multimodal sentiment analysis, in which textual content is analyzed alongside images, illustrations, typography, and page layout. In nineteenth-century newspapers, the use of bold type, italicization, and illustration placement conveyed emotional meaning that textual analysis alone cannot capture. Incorporating these visual dimensions into sentiment models offers a richer and more historically accurate picture of emotional expression.

Cross-cultural and multilingual historical sentiment analysis is also gaining momentum. Research projects are now analyzing French revolutionary pamphlets, German newspapers from the Weimar period, Japanese wartime propaganda, and other materials from diverse linguistic and cultural traditions. Standardizing sentiment measurement across languages with different grammatical structures and culturally specific emotional norms represents a major research challenge, but the potential rewards are substantial: a truly global history of public opinion that transcends the boundaries of any single national or linguistic tradition.

The Historian's Computational Ally

Sentiment analysis does not replace the painstaking interpretive work of the historian, nor does it reduce the richness of the past to a simple line graph. Instead, it offers a powerful additional lens through which to examine familiar evidence in new ways and to pose questions that traditional methods cannot easily address. By quantifying emotional trends across millions of documents, researchers can test hypotheses with greater rigor, discover turning points that conventional accounts have missed, and bring into focus the feelings of people whose voices are often absent from established narratives.

As natural language processing tools become increasingly attuned to the peculiarities of historical language and as digital archives continue to expand, sentiment analysis will likely become a standard component of the historian's methodological toolkit. The challenge lies in using these tools wisely: with methodological discipline, contextual sensitivity, and a clear-eyed understanding of their limitations. When embraced critically and applied with care, sentiment analysis can deepen not only our knowledge of what people in the past thought, but also our understanding of how they felt—and why those feelings mattered.

For a technical introduction to sentiment analysis methods, consult the Stanford Sentiment Analysis resource page. For a detailed historical case study, see "Public Opinion and the American Civil War: A Sentiment Analysis of Newspapers and Letters" (Smith & Jones, 2022). The Old Bailey Online provides a rich dataset of eighteenth- and nineteenth-century court transcripts that historians are beginning to analyze with sentiment tools. The Oxford English Dictionary remains an indispensable resource for tracking semantic change over time, and the History of Parliament Online offers contextual material for interpreting legislative sentiment.