Computational History: Modeling the Spread of Historical Epidemics

Computational history sits at the intersection of digital humanities, data science, and historical inquiry. By applying computer models and simulations to chronicles of the past, researchers can test hypotheses that were previously too complex to explore manually. One of the most urgent and illuminating applications of this approach is the modeling of historical epidemics. From the Plague of Justinian to the 1918 influenza pandemic, disease outbreaks have shaped societies in profound ways. Computational models allow historians to reconstruct transmission pathways, analyze the effectiveness of past interventions, and draw lessons that remain relevant for contemporary public health.

The Role of Computational History in Understanding Epidemics

Epidemics are not merely biological events; they are deeply embedded in economic, social, and political contexts. Traditional historical analysis often relies on written records, such as mortality counts, travel logs, and personal diaries. However, these sources are fragmented and can be biased. Computational models add a quantitative layer, enabling historians to simulate the spread of disease across time and space. By integrating data on population density, trade networks, climate patterns, and human behavior, these models can reveal patterns that are invisible in qualitative sources alone.

For example, modelers can test whether the rapid spread of the Black Death across Europe in the 14th century was primarily driven by rat fleas on merchant ships or by human-to-human transmission. Agent-based models can replicate the movements of individuals along specific trade routes, while compartmental models can estimate the proportion of the population that must have been exposed to sustain the observed mortality. Such simulations provide a formal framework for weighing competing historical narratives. They also help historians identify gaps in the archival record—places where more data would most reduce uncertainty about key parameters.

Beyond testing specific hypotheses, computational history enables counterfactual reasoning. What if the Roman Empire had invested more in quarantine infrastructure? What if the Spanish flu had emerged a decade earlier, without the dislocations of World War I? These thought experiments, grounded in explicit mathematical assumptions, let historians treat competing explanations as testable scenarios rather than purely descriptive accounts. They also highlight the contingent nature of epidemic outcomes, where small differences in timing, policy, or behavior can lead to vastly different death tolls.

Key Types of Computational Models

Historians and epidemiologists have adapted several classes of models originally developed for modern disease surveillance. Each type has distinct strengths and is suited to different historical questions. Understanding these tools is essential for evaluating the claims that emerge from computational studies.

Agent-based models

Agent-based models (ABMs) simulate the actions and interactions of autonomous agents—such as people, animals, or even pathogens—within a defined environment. Each agent follows a set of rules that govern behavior: whether they travel, how often they contact others, whether they recover or die. By running the model many times with slightly varied parameters, researchers can observe emergent patterns. ABMs are especially valuable for studying epidemics in smaller communities, such as medieval towns or colonial settlements, where individual decisions can have outsized effects. A classic example is the modeling of the 1918 flu pandemic in the city of Philadelphia, where the timing of a public parade was shown to accelerate transmission. ABMs also allow researchers to incorporate heterogeneous behaviors, such as the tendency of wealthier households to flee cities during plague outbreaks.
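
The rule-following behavior described above can be sketched in a few lines of Python. This is a toy illustration, not a reconstruction of any published model: the function name run_abm, the parameter names, and every default value are assumptions chosen only to make the mechanics visible.

```python
import random

def run_abm(n_agents=200, n_initial=2, contacts_per_day=4,
            p_transmit=0.08, p_recover=0.1, days=120, seed=0):
    """Toy agent-based epidemic. Each agent is 'S', 'I', or 'R'.
    All defaults are illustrative assumptions, not historical estimates."""
    rng = random.Random(seed)
    state = ["S"] * n_agents
    for idx in rng.sample(range(n_agents), n_initial):
        state[idx] = "I"
    history = []
    for day in range(days):
        new_state = list(state)
        for idx, status in enumerate(state):
            if status != "I":
                continue
            # An infectious agent meets a few randomly chosen agents today.
            for contact in range(contacts_per_day):
                other = rng.randrange(n_agents)
                if state[other] == "S" and rng.random() < p_transmit:
                    new_state[other] = "I"
            # The agent may then leave the infectious state (recovery or death).
            if rng.random() < p_recover:
                new_state[idx] = "R"
        state = new_state
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history

# Repeat the run with different seeds to see how much outcomes vary.
final_removed = [run_abm(seed=s)[-1][2] for s in range(20)]
print(min(final_removed), max(final_removed))
```

Running the same rules under twenty different random seeds gives a range of final outbreak sizes rather than a single number, which is the sense in which ABMs expose emergent, chance-dependent patterns. A historical study would replace the random contact rule with rules drawn from sources, for example households, markets, or flight from the city.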

Compartmental models

The most famous compartmental model is the SIR model, which divides the population into Susceptible, Infected, and Recovered (or Removed) compartments. The model uses differential equations to describe the flow of individuals between these groups based on transmission and recovery rates. Historians can calibrate SIR models to match historical mortality curves, then adjust parameters to explore counterfactuals: what if a quarantine had been imposed earlier? What if the pathogen had been less virulent? Because they are computationally efficient, compartmental models are often used for large-scale outbreaks where homogeneous mixing is a reasonable approximation, such as the spread of cholera along trade routes in 19th-century Asia. More advanced versions, like SEIR (adding an Exposed compartment) or SIRS (allowing waning immunity), can capture additional biological realism.
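
As a minimal sketch of the SIR equations, the code below integrates dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, and dR/dt = gamma*I with a simple forward-Euler step. The function name simulate_sir and all parameter values are assumptions for illustration. The "counterfactual" run simply lowers beta, which is one crude way to stand in for an earlier quarantine.

```python
def simulate_sir(beta, gamma, s0, i0, rec0=0.0, days=200, dt=0.1):
    """Forward-Euler integration of the SIR equations.
    beta: transmission rate per day, gamma: removal rate per day.
    Returns one (S, I, R) tuple per day, starting with day 0."""
    n = s0 + i0 + rec0
    s, i, r = float(s0), float(i0), float(rec0)
    steps_per_day = round(1 / dt)
    daily = [(s, i, r)]
    for day in range(days):
        for step in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        daily.append((s, i, r))
    return daily

# Illustrative values only: beta/gamma gives R0 = 2.0 versus R0 = 1.5.
baseline = simulate_sir(beta=0.4, gamma=0.2, s0=9990, i0=10)
counterfactual = simulate_sir(beta=0.3, gamma=0.2, s0=9990, i0=10)

print(round(baseline[-1][2]), round(counterfactual[-1][2]))
```

Comparing the final removed counts of the two runs shows how a modest reduction in transmission changes the cumulative toll. Calibration to a historical mortality curve would mean searching for the beta and gamma that best reproduce the recorded counts, which this sketch does not attempt.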

Network models

Network models represent individuals or locations as nodes and their interactions as edges. This approach is particularly effective for studying diseases that spread through specific social or transport networks. For example, a network model of the Roman road system can simulate how a pathogen like smallpox might have traveled from the Near East to the frontiers of the empire. By altering the network density or the speed of travel, researchers can identify which routes were most influential. Modern network analysis tools also allow historians to reconstruct contact networks from historical records, such as ship passenger manifests or parish marriage registers. Network models are uniquely capable of capturing the non-random structure of human interactions, which can produce super-spreading events and other dynamic phenomena missed by well-mixed compartmental models.
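
One simple way to ask which routes matter most is to treat places as nodes, travel times as edge weights, and compute the earliest day a pathogen could reach each node. The sketch below uses Dijkstra's shortest-path algorithm for that purpose. The place names and travel times are invented placeholders, not measurements from any historical road or sea network, and deterministic shortest paths are a strong simplification of real, stochastic spread.

```python
import heapq

def earliest_arrival(graph, source):
    """Dijkstra's algorithm: shortest travel time (days) from source to each node."""
    arrival = {source: 0}
    queue = [(0, source)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > arrival.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, days in graph.get(node, {}).items():
            candidate = t + days
            if candidate < arrival.get(neighbour, float("inf")):
                arrival[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return arrival

def undirected(edges):
    """Build an adjacency dict from (a, b, days) triples."""
    graph = {}
    for a, b, days in edges:
        graph.setdefault(a, {})[b] = days
        graph.setdefault(b, {})[a] = days
    return graph

# Hypothetical network: names and travel times are placeholders.
EDGES = [
    ("eastern_port", "coastal_hub", 6),    # sea leg
    ("eastern_port", "inland_city", 15),   # overland leg
    ("coastal_hub", "capital", 5),
    ("inland_city", "capital", 12),
    ("capital", "frontier_fort", 20),
]
SEA_LEG = ("eastern_port", "coastal_hub")

full = earliest_arrival(undirected(EDGES), "eastern_port")
no_sea = earliest_arrival(
    undirected([e for e in EDGES if e[:2] != SEA_LEG]), "eastern_port")

print(full["frontier_fort"], no_sea["frontier_fort"])  # 31 47
```

Removing the sea leg delays arrival at the frontier from day 31 to day 47 in this toy network, which is the kind of comparison used to rank routes by influence. A fuller model would attach transmission probabilities to edges instead of assuming the pathogen always takes the fastest path.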

Spatial and spatiotemporal models

In addition to the three main categories, spatial models that incorporate geographic information systems (GIS) have become increasingly important. These models assign disease risk to specific locations based on environmental variables such as altitude, temperature, and proximity to water bodies. For instance, a spatial model of the 1665 Great Plague of London might use historical maps of street layout and parish boundaries to simulate how the disease hopscotched from one neighborhood to another. Coupled with network models, spatiotemporal approaches can generate detailed reconstructions of epidemic waves, matching the granularity of registration data in well-documented cities.

Case Studies: Modeling Historical Epidemics

Several landmark studies demonstrate the power of computational history in epidemic research. Each case highlights different methodological strengths and data challenges, and together they illustrate how modeling can reshape historical understanding.

The Black Death (1346–1353)

The Black Death killed an estimated 30–60% of Europe's population. Early models focused on the primary transmission route: whether the bacterium Yersinia pestis was carried by fleas on rats or by human lice and fleas. Agent-based and network models have since shown that a combination of both routes can reproduce the observed speed of spread, but that human ectoparasites may have played a larger role than previously believed. Models incorporating climate data also suggest that changes in temperature and precipitation in Central Asia may have triggered the initial outbreak in the 1330s. Researchers at the Max Planck Institute for the Science of Human History have combined these models with genetic analyses of ancient Y. pestis DNA to trace the bacterium's evolution across continents.

Another important finding from computational studies of the Black Death concerns the role of trade networks. Network models show that the plague spread faster along maritime routes than overland, contradicting earlier assumptions that the Silk Road was the primary conduit. By simulating the closure of certain ports or the enforcement of quarantine measures, historians have estimated that cities like Venice and Milan, which implemented strict policies, experienced mortality rates significantly lower than those of cities that remained open.

The 1918 Influenza Pandemic

The Spanish flu killed an estimated 50 million people worldwide. Computational models have been used to analyze the impact of non-pharmaceutical interventions, such as school closures, bans on public gatherings, and isolation of cases. A study using a compartmental model for cities in the United States found that cities that implemented early, layered interventions had mortality rates 50% lower than those that delayed. Agent-based models further revealed that the movements of young adults during World War I created super-spreading events in military camps and transport ships. The CDC's Epidemic Intelligence Service has drawn on these historical modeling studies to inform modern pandemic preparedness plans, including those for COVID-19.

Network models of the 1918 pandemic have also shed light on the role of age structure. By reconstructing contact patterns from household surveys and school attendance records, researchers found that children were not only highly susceptible but also acted as efficient transmitters within households. This finding refines the narrative that the 1918 flu was uniquely lethal to young adults: on this reading, the high mortality in the 20–40 age group was driven by an unusually strong immune response to the virus, not by increased transmission.

Cholera in 19th-Century London

The 1854 Broad Street cholera outbreak is a classic example of epidemiological investigation, but modern computational models have added nuance. By digitizing historical maps, census data, and water pump locations, researchers have built spatial models that show the disease spread primarily through contaminated water from a single pump. However, network models also suggest that social connections—such as servants carrying water to wealthier households—played a role. These findings have implications for understanding how urban infrastructure and social inequality shape disease dynamics. Spatial-statistical models have further demonstrated that the outbreak's spatial footprint aligns closely with the walking distance from the Broad Street pump, supporting John Snow's original hypothesis while quantifying the uncertainty in his data.

The Great Plague of London (1665)

A fourth case study, the Great Plague of London, illustrates how computational models can compensate for sparse data. Parish records from 1665 provide weekly burial counts but little demographic detail. Agent-based models parameterized with household sizes, mortality rates, and movement patterns have been used to estimate the basic reproduction number (R₀) of the plague in London. These models suggest that the outbreak was sustained by a combination of rat-to-human and human-to-human transmission, and that the imposition of quarantine on infected households reduced the peak mortality by approximately 20%. Such analyses inform debates about whether historical "plagues" were always the same disease as modern Y. pestis infections.
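
A rough first estimate of R₀ can be read off the early growth of weekly burial counts, before any agent-based model is built. The sketch below fits a straight line to the logarithm of the counts to get a weekly growth rate r, then applies the approximation R₀ ≈ 1 + r·D, which holds for a simple SIR model with an exponentially distributed infectious period of mean D. The burial counts and the value of D are invented for illustration, and the SIR assumption itself is questionable for a vector-borne disease.

```python
import math

def growth_rate(counts):
    """Least-squares slope of log(count) against week index (growth per week)."""
    xs = range(len(counts))
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / len(counts)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def r0_from_growth(r_per_week, infectious_weeks):
    """R0 ~ 1 + r * D under simple SIR assumptions (an approximation)."""
    return 1 + r_per_week * infectious_weeks

# Hypothetical weekly burial counts that double every week.
weekly_burials = [10, 20, 40, 80, 160]
r = growth_rate(weekly_burials)                      # equals ln(2) for this series
r0_estimate = r0_from_growth(r, infectious_weeks=1.5)  # D = 1.5 weeks is assumed

print(round(r, 3), round(r0_estimate, 2))  # 0.693 2.04
```

Because burials lag infections, this approach only works while growth is roughly exponential, and the answer moves directly with the assumed infectious period. That dependence is one reason estimates of this kind are usually reported as ranges.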

Data Sources and Challenges

Modeling historical epidemics requires reconstructing data that are often incomplete, inconsistent, or ambiguous. Researchers draw from a variety of sources: parish burial registers, tax records, ship logs, hospital admission books, and even literary accounts. Each source has biases. For example, burial records typically undercount infant deaths, and mortality from epidemics is often conflated with other causes. Paleoclimatological proxies, such as tree rings and ice cores, can provide data on temperature and humidity that affect disease transmission. Additionally, recent advances in ancient DNA analysis allow researchers to identify the pathogen responsible for a historical outbreak, even when written records are ambiguous.

One major challenge is the lack of granular time series data. Many historical records report monthly or yearly totals, which are too coarse to model rapid transmission. To address this, computational historians use statistical imputation and data assimilation techniques borrowed from meteorology. Another challenge is the uncertainty in historical population sizes and migration patterns. Sensitivity analyses—running models with a range of plausible inputs—help quantify how robust the conclusions are to these uncertainties. For instance, a model of the 1520 smallpox epidemic in the Aztec Empire must account for the unknown pre-Columbian population of Tenochtitlan; sensitivity analysis reveals whether the key results depend heavily on that estimate.
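
A sensitivity analysis of this kind can be sketched as a sweep over plausible inputs. The example below uses the standard SIR final-size relation z = 1 - exp(-R₀·z) to get the fraction of the population infected, then multiplies by a population estimate and a fatality fraction. The population figures, R₀ values, and fatality fraction are placeholders, not estimates for Tenochtitlan or any other city.

```python
import math
from itertools import product

def final_size(r0, tol=1e-10):
    """Solve z = 1 - exp(-r0 * z) for the attack fraction z (valid for r0 > 1)
    by fixed-point iteration."""
    z = 0.5
    for iteration in range(10_000):
        z_next = 1 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Placeholder ranges: none of these are historical estimates.
populations = [100_000, 200_000, 300_000]
r0_values = [1.5, 2.0, 3.0]
fatality_fraction = 0.3

deaths = {
    (n, r0): n * final_size(r0) * fatality_fraction
    for n, r0 in product(populations, r0_values)
}

for (n, r0), d in sorted(deaths.items()):
    print(f"N={n:>7}  R0={r0:.1f}  deaths~{d:,.0f}")
```

In this simple setup the attack fraction depends only on R₀, so the uncertain population estimate scales absolute deaths linearly but leaves proportional mortality unchanged. Seeing which outputs move with which inputs is the point of the exercise.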

Despite these obstacles, the field is advancing rapidly. The digitization of historical archives and the development of text-mining tools allow researchers to extract structured data from unstructured documents. For instance, machine learning algorithms can now automatically identify mentions of disease outbreaks in centuries-old newspapers, providing new datasets for epidemic modeling. Projects like the Historical Epidemics Database aggregate data from diverse sources, making it easier for modelers to access and compare records across time periods.
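
The simplest version of this extraction step is a keyword search, sketched below. It is a plain regular-expression baseline rather than a machine learning model, and both the term list and the sample texts are invented for illustration. Real pipelines must also handle OCR errors, archaic spellings, and figurative uses of disease words.

```python
import re
from collections import Counter

# Hypothetical term list: a real project would curate period-specific vocabulary.
DISEASE_TERMS = ["cholera", "smallpox", "plague", "influenza", "typhus"]
PATTERN = re.compile(r"\b(" + "|".join(DISEASE_TERMS) + r")\b", re.IGNORECASE)

def count_mentions(documents):
    """documents: iterable of (issue_id, text) pairs.
    Returns a Counter keyed by (issue_id, lowercase term)."""
    counts = Counter()
    for issue_id, text in documents:
        for match in PATTERN.finditer(text):
            counts[(issue_id, match.group(1).lower())] += 1
    return counts

# Invented sample text, not quotations from any archive.
sample = [
    ("issue-001", "Cholera has appeared in the lower wards; two cases of cholera were fatal."),
    ("issue-002", "No fresh cases of Cholera. Smallpox is reported at the docks."),
]

mentions = count_mentions(sample)
print(mentions[("issue-001", "cholera")], mentions[("issue-002", "smallpox")])  # 2 1
```

Counts per issue form a crude time series of attention to each disease. A trained classifier becomes useful at the next step, when mentions of an actual local outbreak need to be separated from foreign news, advertisements, and metaphor.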

Interdisciplinary Approaches

Successful computational history of epidemics requires collaboration across disciplines. Historians provide contextual knowledge and source criticism; epidemiologists contribute model design and parameter estimation; computer scientists develop algorithms and visualization tools; and statisticians handle uncertainty and validation. Interdisciplinary teams have produced some of the most influential studies, such as the reconstruction of the plague's spread along the Silk Road using a combination of genomic data, historical records, and agent-based modeling.

Institutions like the Carnegie Mellon University Decision Science Lab are pioneering this approach. Open-source modeling platforms, such as GLEaM and FRED, allow researchers to share and reproduce results, accelerating the pace of discovery. The FRED platform, originally developed at the University of Pittsburgh, has been used to simulate historical outbreaks in virtual populations that mirror the age structure and household composition of 19th-century American cities.

Training the next generation of computational historians is equally important. Several universities now offer joint degrees or certificates in digital humanities and public health. Conferences like the Digital Humanities Summit and the International Conference on Computational Social Science increasingly feature sessions on historical epidemics. These venues facilitate the cross-pollination of ideas that drives methodological innovation.

Social and Economic Dimensions

Computational models are uniquely suited to examine how social and economic conditions shape epidemic outcomes. For example, during the 1918 flu, cities with higher levels of income inequality experienced greater mortality, likely because poor workers could not afford to stay home. Network models that incorporate socioeconomic status can quantify this effect by assigning different contact rates to different groups. Similarly, models of the Black Death have shown that trade restrictions and the construction of quarantine facilities (lazarettos) in Italian city-states reduced the death toll in those areas, even if the policies were applied inconsistently.
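
Assigning different contact rates to different groups can be illustrated with a two-group extension of the SIR model. In the sketch below, group 0 stands for people able to reduce their contacts and group 1 for workers who cannot. The contact matrix, transmission probability, and population sizes are assumptions chosen for illustration, and the function name two_group_sir is hypothetical.

```python
def two_group_sir(contact, p_transmit, gamma, pop, infected0, days=300, dt=0.1):
    """Two-group SIR integrated with forward Euler.
    contact[g][h]: daily contacts a member of group g has with members of group h.
    Returns the final attack fraction (share ever infected) for each group."""
    groups = range(2)
    s = [pop[g] - infected0[g] for g in groups]
    i = list(infected0)
    for step in range(round(days / dt)):
        # Force of infection on each group, computed before any state is updated.
        force = [
            p_transmit * sum(contact[g][h] * i[h] / pop[h] for h in groups)
            for g in groups
        ]
        for g in groups:
            new_infections = force[g] * s[g] * dt
            new_removals = gamma * i[g] * dt
            s[g] -= new_infections
            i[g] += new_infections - new_removals
    return [1 - s[g] / pop[g] for g in groups]

# Illustrative assumption: group 1 has far more daily contacts than group 0.
contact_matrix = [[3, 1],
                  [1, 8]]
attack_rates = two_group_sir(contact_matrix, p_transmit=0.05, gamma=0.2,
                             pop=[5000, 5000], infected0=[5, 5])

print([round(a, 2) for a in attack_rates])
```

With these inputs the high-contact group ends the outbreak with a much larger share infected than the low-contact group, even though both face the same pathogen. Fitting the contact matrix to occupational or housing records is where the historical work would lie.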

Gender also plays a role. Historical records from plague outbreaks indicate that women, who often cared for the sick, were disproportionately affected. Agent-based models that simulate caregiving roles can help estimate the excess risk. These findings demonstrate that epidemics are not random; they follow patterns of social vulnerability that computational tools can expose. In the case of smallpox in colonial America, models that incorporate racial categories show that indigenous populations suffered catastrophic mortality not only because of lack of immunity, but also because forced displacement and violence disrupted traditional disease management practices.

Computational Models in Modern Public Health

While the primary goal of historical epidemic modeling is to understand the past, the insights often have direct relevance to contemporary public health. For instance, models of the 1918 flu have informed pandemic preparedness plans for influenza and COVID-19. The finding that early, layered non-pharmaceutical interventions reduce mortality was used by many governments during the COVID-19 pandemic. Similarly, models of historical cholera outbreaks have reinforced the importance of clean water infrastructure and rapid detection systems.

The field of "phylodynamics," which combines genetic sequencing of pathogens with computational models, allows researchers to trace the evolutionary history of viruses alongside their spatial spread. This technique has been used to reconstruct the origins of HIV and the emergence of drug-resistant tuberculosis. By applying phylodynamic models to pathogen DNA recovered from historical skeletal remains, researchers can now estimate when and where ancient strains of Mycobacterium tuberculosis circulated in pre-Columbian populations. These data feed into modern public health efforts to understand the long-term evolution of pathogens and predict future drug resistance patterns.

Future Directions

The next frontier in computational history of epidemics lies in the integration of heterogeneous data sources and the application of machine learning. Deep learning models can analyze unstructured text from thousands of historical documents to extract disease timelines and symptom descriptions. Geographic information systems (GIS) are becoming more refined, allowing researchers to model landscapes at high resolution. Moreover, the growth of digital humanities projects means that more data are becoming publicly available every year.

One promising direction is the creation of "digital twins" of historical cities: virtual replicas that combine archaeological reconstructions with population data. These twins can be used to simulate the spread of a disease in far greater spatial detail than aggregate models allow. For example, the Virtual Rome project models the entire city at the time of the Antonine Plague (165–180 AD), including aqueducts, bathhouses, and housing density. Another promising direction is the use of Bayesian inference to combine multiple lines of evidence, such as genetic data from ancient DNA, written records, and climate proxies, into a single coherent model. As computing power increases and algorithms improve, these models will become more accurate and accessible.
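
The idea of combining lines of evidence with Bayesian inference can be sketched with a grid approximation. Below, two independent sources each give a noisy estimate of the same quantity (here R₀), and the posterior is proportional to the prior times both likelihoods. The means and standard deviations are invented, the labels "records" and "genetic" are placeholders, and treating the sources as independent is itself an assumption.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a simple likelihood."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Candidate R0 values from 1.00 to 4.00 in steps of 0.01.
grid = [1.0 + 0.01 * k for k in range(301)]
prior = [1.0] * len(grid)  # flat prior over the grid

# Hypothetical evidence: (mean, sd) for each source.
like_records = [normal_pdf(x, 2.4, 0.5) for x in grid]
like_genetic = [normal_pdf(x, 1.8, 0.3) for x in grid]

unnormalised = [p * a * b for p, a, b in zip(prior, like_records, like_genetic)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

posterior_mean = sum(x * w for x, w in zip(grid, posterior))
print(round(posterior_mean, 2))  # 1.96
```

The combined estimate sits between the two sources but closer to the more precise one, which is the behavior that makes Bayesian pooling attractive when written records and genetic data disagree. Real applications replace the normal likelihoods with full epidemic models fitted to each data type.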

However, ethical considerations remain. Models should not be used to retroactively blame communities for outbreaks, nor should they reinforce deterministic narratives. Historians and modelers must be transparent about assumptions and limitations. The goal is not to produce a single "true" history, but to open new questions and provide tools for exploration. In particular, careful attention must be paid to the risk of presentism—projecting modern categories of race, class, or gender onto historical populations in ways that distort analysis. Responsible computational history engages critically with its own methods.

Conclusion

Computational history has transformed how we study the spread of historical epidemics. By combining the rigor of modeling with the depth of historical analysis, researchers can uncover dynamics that were previously hidden. Agent-based, compartmental, network, and spatial models each offer unique insights, and case studies from the Black Death to the 1918 flu demonstrate their power. Data challenges persist, but interdisciplinary collaboration and technological advances continue to push the field forward. As we face new global health threats, the lessons from computational models of the past are not just academic—they are essential tools for building a healthier future.