Applying Survival Analysis to Historical Business and Industry Data
Every business, industry, and economic structure has a lifespan. While some institutions endure for centuries, the vast majority succumb to market shifts, technological obsolescence, or systemic crises within decades. The systematic study of such lifespans belongs to survival analysis, a statistical framework refined largely in actuarial science and medical research (including clinical trials) that has found a powerful application in economic history. By adapting techniques used to model patient survival to the mortality of firms and industries, researchers can move beyond anecdotal narratives and rigorously quantify the forces of resilience and decline over time.
The Foundations of Survival Analysis in Economic History
From Clinical Trials to Corporate Lifespans
The conceptual bridge between medicine and economic history is less a stretch than it initially appears. In a clinical trial, the "event" is typically death, disease recurrence, or recovery. In business history, the event is analogous: bankruptcy, dissolution, closure, or a specific corporate action such as an acquisition that terminates the firm's independent existence. The central question remains the same: what factors influence the time until the event occurs?
Traditional economic history often relies on qualitative narratives or simple descriptive statistics—counting the number of firms that failed in a given year. Survival analysis offers a deeper analytical framework. It accounts not only for whether a firm failed but when it failed and how that timing relates to covariates such as firm size, industry sector, geographic location, or macroeconomic conditions. This temporal dimension transforms historical business records from static archives into dynamic datasets capable of revealing patterns of vulnerability and endurance.
Why Historians Need Quantitative Longevity Models
Historians face a fundamental problem when studying business failure: the data is inherently incomplete and biased toward survivors. Firms that fail early often leave scant records, while successful firms generate extensive archives. Standard regression techniques that ignore the temporal ordering of failure and treat survival status as a simple binary outcome discard valuable information. They also cannot handle censored observations—firms that were still alive at the end of the study period or whose fate is unknown. Survival analysis was designed specifically to address these data structures, making it an indispensable tool for anyone conducting rigorous empirical research in business and economic history.
Core Statistical Concepts for Historical Data
Defining the Event and the Time Origin
Every survival analysis begins with a precise definition of two elements: the time origin and the failure event. In corporate history, the time origin can be the date of incorporation, the start of commercial operations, or the entry into a specific market. The failure event might be bankruptcy, liquidation, or a change in control through acquisition. The choice of these definitions depends heavily on the research question and the nature of the available historical data. For example, a study of textile mills might define time zero as the first year of production and failure as the cessation of manufacturing operations, regardless of whether the corporate entity legally dissolved. Consistency in these definitions is essential for constructing interpretable survival curves.
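To make these definitions concrete, here is a minimal Python sketch. It assumes each firm has a known start year (the time origin) and an exit year if one was recorded. The FirmRecord class, the to_survival_record function, the 1920 study end, and the mill records are hypothetical illustrations, not drawn from any real dataset. Each firm is reduced to the (duration, event) pair that survival methods expect.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FirmRecord:
    name: str
    start_year: int             # time origin, e.g. first year of production
    exit_year: Optional[int]    # year operations ceased; None if no exit is recorded


def to_survival_record(firm: FirmRecord, study_end: int) -> Tuple[int, int]:
    """Return (duration_in_years, event), where event = 1 marks an observed exit
    and event = 0 marks a firm right-censored at the end of the study window."""
    if firm.exit_year is not None and firm.exit_year < firm.start_year:
        raise ValueError(f"{firm.name}: exit precedes time origin")
    if firm.exit_year is not None and firm.exit_year <= study_end:
        return firm.exit_year - firm.start_year, 1
    # No exit observed by study_end: we only know the firm survived this long.
    return study_end - firm.start_year, 0


# Hypothetical mills, observed through 1920.
mills = [
    FirmRecord("Mill A", 1880, 1903),
    FirmRecord("Mill B", 1885, None),
    FirmRecord("Mill C", 1890, 1925),  # exits after the window, so censored
]
records = [to_survival_record(m, study_end=1920) for m in mills]
print(records)  # [(23, 1), (35, 0), (30, 0)]
```

Mill C did exit, but only after the observation window closed. Within the study it is therefore treated as censored at 30 years rather than as a failure.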
The Survival Function and the Hazard Function
Two fundamental functions characterize survival processes. The survival function, denoted S(t), gives the probability that a firm survives beyond a given time t. It is a non-increasing function that starts at 1 (all firms are alive at time zero) and approaches 0 as time progresses (eventually all firms fail or exit). The hazard function, denoted h(t), describes the instantaneous rate of failure at time t, conditional on survival up to that point. A high hazard rate in the early years of a firm's life captures the well-known phenomenon of high failure rates among startups, while a rising hazard rate later in life might indicate senescence or technological obsolescence.
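The following sketch writes these definitions out, assuming the firm lifetime T is a continuous random variable with density f(t). It shows the survival function, the hazard function, and the identity linking them.

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)},
\qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

For example, a constant hazard h(t) = λ gives S(t) = e^{-λt}. With a hypothetical λ = 0.05 failures per firm-year, S(10) = e^{-0.5} ≈ 0.61, so about 61% of a cohort would survive its first decade.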
For the economic historian, these functions provide intuitive and powerful summaries. Plotting the survival function for different cohorts of firms—such as those founded before and after a major regulatory change—visually reveals differences in longevity. The hazard function identifies critical periods of vulnerability. Industries facing rapid technological disruption often exhibit a rising hazard rate after an initial period of stability, reflecting the slow but accelerating decline of incumbents.
The Critical Issue of Censored Data
Censoring is the defining challenge of survival data, and it permeates historical archives. Right-censoring occurs when a firm is still operating at the end of the observation period: its full lifespan is unknown, but we know it survived at least to that point. Firms founded before data collection begins create a related problem. This is often described as left-censoring but is more precisely called left-truncation (delayed entry): a firm first observed at age ten contributes information only from age ten onward, and firms that failed before the records begin never appear at all, so their early struggles and failures are missing from the record. Interval-censoring occurs when we know only that a failure happened between two observation points, such as between decennial census years. Standard statistical methods that discard censored observations produce biased results. Survival analysis incorporates censored observations directly into the likelihood function, allowing researchers to extract the maximum information from incomplete historical records.
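A small example shows how censoring enters the likelihood. Assume, for simplicity, a constant hazard λ (an exponential model). The censored log-likelihood is then ℓ(λ) = d log λ − λ Σ t_i, where d is the number of observed failures and the sum runs over every firm's time at risk, censored or not. Maximizing gives λ̂ = d / Σ t_i. The sketch contrasts this with the naive approach of dropping censored firms. The function names and lifetimes are hypothetical.

```python
def exponential_hazard_mle(durations, events):
    """Constant-hazard MLE with right-censoring.

    The log-likelihood is d*log(lam) - lam*sum(t_i), so lam_hat = d / sum(t_i):
    censored firms (event = 0) add time at risk but no failure."""
    exposure = sum(durations)
    if exposure <= 0:
        raise ValueError("total time at risk must be positive")
    return sum(events) / exposure


def naive_hazard_dropping_censored(durations, events):
    """The biased alternative: discard censored firms entirely."""
    failed = [t for t, e in zip(durations, events) if e == 1]
    if not failed:
        raise ValueError("no observed failures")
    return len(failed) / sum(failed)


# Hypothetical firm lifetimes in years; the last three were still operating at study end.
durations = [5, 8, 12, 20, 20, 20]
events = [1, 1, 1, 0, 0, 0]
lam = exponential_hazard_mle(durations, events)                # 3 / 85, about 0.035 per year
lam_naive = naive_hazard_dropping_censored(durations, events)  # 3 / 25 = 0.12 per year
print(f"with censoring: {lam:.4f}  dropping censored: {lam_naive:.4f}")
```

Dropping the three still-operating firms inflates the estimated hazard from about 0.035 to 0.12 per year. That shrinks the implied mean lifetime (1/λ) from about 28 years to about 8.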
Methodologies and Historical Data Sources
Kaplan-Meier Estimator
The Kaplan-Meier estimator is the most familiar non-parametric method for estimating the survival function from observed data. It multiplies together the conditional probabilities of surviving each observed failure time, producing a step function that drops at each such time, and it uses censored firms for as long as they are observed. For the economic historian, it provides a straightforward way to compare the survival experiences of distinct groups. One might calculate and plot Kaplan-Meier curves for all automobile manufacturers founded in the United States between 1900 and 1910, comparing those located in Detroit with those elsewhere. The visual comparison suggests whether firms in the emerging industrial cluster enjoyed a survival advantage. The log-rank test then provides a formal test of whether the difference between the curves is statistically significant.
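Here is a minimal, dependency-free sketch of the Kaplan-Meier estimator and a two-group log-rank test. It is written for clarity rather than speed (each risk set is recounted from scratch). In practice one would more likely use an established implementation, such as R's survival::survfit and survdiff or lifelines' KaplanMeierFitter and logrank_test. The function names and lifetimes in the usage lines are hypothetical, not data on Detroit automakers. The p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail is erfc(√(x/2)).

```python
import math
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) at each distinct failure time.
    Firms censored at t are still counted in the risk set at t."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)
        surv *= 1 - deaths[t] / at_risk   # conditional survival through t
        curve.append((t, surv))
    return curve


def logrank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value) with 1 df."""
    failure_times = sorted({t for t, e in zip(dur_a, ev_a) if e}
                           | {t for t, e in zip(dur_b, ev_b) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in failure_times:
        n_a = sum(1 for d in dur_a if d >= t)
        n_b = sum(1 for d in dur_b if d >= t)
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if e and d == t)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n          # observed minus expected failures in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-square(1)
    return chi2, p_value


# Hypothetical lifetimes in years; event 0 = still operating (censored).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
print(curve, round(chi2, 2), round(p, 4))
```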
Cox Proportional Hazards Model
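The Cox proportional hazards model is the workhorse of modern survival analysis. It allows researchers to estimate the effect of multiple covariates on the hazard rate without assuming a specific parametric form for the baseline hazard. This requires that each covariate multiply the hazard by a factor that stays constant over time (the proportional hazards assumption). The model takes the form: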
$$h(t, X) = h_0(t)\,\exp(\beta_1 X_1 + \cdots + \beta_n X_n)$$
Here h_0(t) is the baseline hazard, left unspecified, and each exp(β_i) is the hazard ratio associated with a one-unit increase in X_i. For the economic historian, this means one can estimate the relative risk of failure associated with a specific characteristic while controlling for other factors. A hazard ratio greater than 1 indicates an increased risk of failure, while a ratio below 1 indicates a protective effect. For example, a study of 19th-century railroad companies might find that being headquartered in a state with high levels of government subsidy reduced the hazard of failure by 30% (a hazard ratio of 0.70), controlling for firm size and debt levels. The Cox model turns historical narratives into testable hypotheses and permits rigorous inference about the factors associated with business mortality.
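For a single covariate, a few lines of Newton-Raphson are enough to maximize the Cox partial likelihood. This sketch is a teaching illustration, not a substitute for an established implementation such as R's survival::coxph or lifelines' CoxPHFitter. It handles one covariate only and uses the Breslow approximation for tied failure times. It will not converge if the covariate perfectly separates early from late failures. The function name and data are hypothetical. The fitted coefficient maps to the hazard ratio as exp(β). In the railroad example, the 30% reduction corresponds to a hazard ratio of 0.70, or β = ln 0.70 ≈ −0.357.

```python
import math


def cox_single_covariate(durations, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by maximizing the Cox partial
    likelihood with Newton-Raphson; tied failure times use the Breslow approximation.
    Returns (beta, hazard_ratio)."""
    failure_times = sorted({t for t, e in zip(durations, events) if e})
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for t in failure_times:
            risk = [i for i, d in enumerate(durations) if d >= t]
            dead = [i for i in risk if durations[i] == t and events[i]]
            w = [math.exp(beta * x[i]) for i in risk]   # relative hazards in the risk set
            s0 = sum(w)
            s1 = sum(wi * x[i] for wi, i in zip(w, risk))
            s2 = sum(wi * x[i] ** 2 for wi, i in zip(w, risk))
            mean_x = s1 / s0                              # risk-weighted covariate mean
            score += sum(x[i] for i in dead) - len(dead) * mean_x
            information += len(dead) * (s2 / s0 - mean_x ** 2)
        if information <= 0:
            raise ValueError("covariate does not vary within risk sets")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


# Hypothetical data: x = 1 marks firms with some characteristic of interest.
beta, hazard_ratio = cox_single_covariate(
    durations=[1, 2, 3, 4, 5, 6], events=[1, 1, 1, 1, 1, 1], x=[1, 1, 0, 1, 0, 0])
print(f"beta = {beta:.3f}, hazard ratio = {hazard_ratio:.2f}")
```

In this toy data the firms with x = 1 tend to fail first, so the estimated hazard ratio is above 1.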
Competing Risks and Parametric Models
Historical business exits are rarely homogeneous. A firm might exit through bankruptcy (a failure of the business model) or through acquisition (often a successful outcome for founders, though the firm disappears as an independent entity). These are competing risks. Standard survival analysis can either censor acquisitions or treat them as a distinct type of failure. Competing risks models allow researchers to estimate the cause-specific hazard for each exit type, providing a richer picture of corporate fates. Additionally, parametric models based on the Weibull or log-normal distributions can be used when the shape of the hazard function is of theoretical interest. They allow researchers to test whether the risk of failure increases, decreases, or remains constant with firm age.
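This sketch covers both ideas. First, it computes an Aalen-Johansen cumulative incidence curve for one exit type in the presence of another. Second, it gives the Weibull hazard, whose shape parameter k encodes whether risk rises or falls with age. The cause codes, function names, and four-firm example are hypothetical.

```python
from collections import Counter


def cumulative_incidence(durations, causes, cause_of_interest):
    """Aalen-Johansen cumulative incidence for one exit type.

    causes: 0 = censored; 1, 2, ... = distinct exit types
    (for example 1 = bankruptcy, 2 = acquisition). Returns [(t, CIF(t))]."""
    exit_times = sorted({t for t, c in zip(durations, causes) if c != 0})
    overall_surv, cif, curve = 1.0, 0.0, []
    for t in exit_times:
        at_risk = sum(1 for d in durations if d >= t)
        exits = Counter(c for d, c in zip(durations, causes) if d == t and c != 0)
        # Probability of reaching t with no exit of any kind, times the
        # cause-specific exit probability at t.
        cif += overall_surv * exits[cause_of_interest] / at_risk
        overall_surv *= 1 - sum(exits.values()) / at_risk
        curve.append((t, cif))
    return curve


def weibull_hazard(t, shape, scale):
    """Weibull hazard (k / lam) * (t / lam) ** (k - 1): falls with age if k < 1,
    is constant if k = 1, and rises with age if k > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical firms: bankruptcy, acquisition, bankruptcy, still independent.
durations = [1, 2, 3, 4]
causes = [1, 2, 1, 0]
bankrupt = cumulative_incidence(durations, causes, 1)
acquired = cumulative_incidence(durations, causes, 2)
print(bankrupt[-1], acquired[-1])
```

In this toy example the cumulative incidence of bankruptcy is 0.5. Treating acquisitions as censored and reporting 1 minus the Kaplan-Meier estimate would give 0.625 instead. That figure overstates bankruptcy risk, because it implicitly assumes acquired firms could still have gone bankrupt later.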
Data Provenance: Archives, Financial Records, and Registries
The application of survival analysis to historical data requires locating and digitizing appropriate sources. Rich datasets exist for several historical contexts:
- Corporate Registries: The UK Companies House archives contain records of incorporations and dissolutions dating back to the mid-19th century. Similar registries exist in most industrialized nations.
- Stock Exchange Listings: Historical manuals for the New York Stock Exchange, London Stock Exchange, and other exchanges provide dates of listing and delisting, offering a continuous record of corporate existence.
- Industry-Specific Directories: Trade publications and directories (e.g., Lloyd's Register of Shipping, the Thomas Register of American Manufacturers) provide comprehensive lists of firms in specific industries over time.
- Census Data: Economic censuses, such as the United States Census of Manufactures conducted from 1820 onward, capture manufacturing establishments and their operational status at decennial intervals, providing interval-censored survival data.
These sources require careful cleaning and standardization, but they form the empirical foundation for rigorous historical survival analysis.
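Census-style records illustrate the kind of cleaning these sources require: they can be turned into interval-censored exits. This sketch assumes firms have already been linked across censuses. It also assumes that a firm absent from one census has exited for good (no re-entry). The function name and years are hypothetical.

```python
def census_exit_interval(census_years, present):
    """Return (last_seen, first_missing) bracketing a firm's exit.

    The exit falls in the interval (last_seen, first_missing]. If the firm
    appears in every census, first_missing is None (right-censored)."""
    if len(census_years) != len(present) or not present or not present[0]:
        raise ValueError("firm must be observed at the first census")
    last_seen = census_years[0]
    for year, seen in zip(census_years, present):
        if not seen:
            return last_seen, year      # assumes no re-entry after an absence
        last_seen = year
    return last_seen, None


years = [1850, 1860, 1870, 1880]
print(census_exit_interval(years, [True, True, False, False]))  # (1860, 1870)
print(census_exit_interval(years, [True, True, True, True]))    # (1880, None)
```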
Historical Case Studies
Banking Failures During the Great Depression
The banking panics of 1930-1933 represent one of the most thoroughly analyzed periods in financial history, and survival analysis has been central to modern understanding of that crisis. Researchers have constructed survival curves for banks in Federal Reserve districts, measuring how factors such as asset size, Federal Reserve membership, and exposure to agricultural loans affected the instantaneous risk of failure. The data show clearly that smaller, non-member banks in agricultural regions faced a substantially higher hazard rate than larger, urban member banks. This quantitative evidence confirmed historical narratives about the vulnerability of small rural banks and provided rigorous support for theories about the mechanics of financial contagion. The analysis also demonstrated that banks that survived the initial panics of 1930-1931 did not necessarily become safer: the hazard rate remained elevated for surviving banks through 1932 and 1933. This suggests a second wave of distress driven by falling asset prices and declining economic activity.
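The claim that survivors remained at elevated risk rests on the hazard, not on raw failure counts. Each year's failures are divided by the number of banks still operating at the start of that year. What follows is a minimal life-table sketch, assuming a closed population with no new charters and no exits other than failure. The counts are invented for illustration and are not historical bank-failure figures.

```python
def life_table(initial_count, failures_by_year):
    """Discrete-time (annual) hazards for a closed population.

    Assumes no entries and no exits other than failure during the period;
    mergers or voluntary liquidations would also have to leave the risk set."""
    at_risk, surv, rows = initial_count, 1.0, []
    for year in sorted(failures_by_year):
        failed = failures_by_year[year]
        hazard = failed / at_risk          # P(fail this year | operating at start of year)
        surv *= 1 - hazard
        rows.append((year, at_risk, failed, hazard, surv))
        at_risk -= failed
    return rows


# Purely illustrative counts, NOT historical bank-failure figures.
rows = life_table(1000, {1930: 100, 1931: 150, 1932: 120, 1933: 150})
for year, n, d, h, s in rows:
    print(f"{year}: at risk {n}, failures {d}, hazard {h:.3f}, S {s:.3f}")
```

Failures drop from 150 in 1931 to 120 in 1932, yet the hazard among surviving banks barely moves (about 0.167 versus 0.160). This is the difference between counting failures and conditioning on survival.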
Technological Displacement: Canals, Railroads, and the Transportation Revolution
The 19th-century transportation revolution offers a classic case of technological disruption amenable to survival analysis. Canal companies enjoyed high survival rates and stable revenues in the 1820s and 1830s, but the emergence of railroads dramatically increased their hazard rate. A survival analysis of canal companies operating in the northeastern United States shows that the hazard rate for failure began to rise in the 1840s and peaked in the 1850s, precisely as the railroad network expanded. Researchers can estimate a Cox model with a time-varying covariate, the distance to the nearest railroad line, updated at yearly intervals. The literature on canals and the transportation revolution indicates that control of feeder canals and integration into emerging rail networks provided survival advantages, while independent canal companies with fixed routes faced accelerating hazard rates. The Panic of 1893 was particularly devastating for the remaining canal companies, acting as a terminal shock that eliminated nearly all survivors.
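A time-varying covariate such as distance to the nearest railroad is usually handled by splitting each firm's history into short (start, stop] episodes. Each episode carries the covariate value in force during that period. This sketch assumes the distance is recorded once per calendar year and applies for the whole year. The field names, the kilometre unit, and the example canal are hypothetical. Rows in this counting-process layout are the kind accepted by, for instance, R's survival package via Surv(start, stop, event) or lifelines' CoxTimeVaryingFitter.

```python
def split_into_yearly_episodes(firm_id, start_year, stop_year, failed, distance_by_year):
    """Split one firm's history into (start, stop] rows, one per calendar year.

    The firm is observed during calendar years start_year .. stop_year - 1;
    if failed is True, the exit is recorded at the end of the final row.
    Time is measured in years since start_year."""
    if stop_year <= start_year:
        raise ValueError("stop_year must be after start_year")
    rows = []
    for year in range(start_year, stop_year):
        rows.append({
            "id": firm_id,
            "start": year - start_year,
            "stop": year - start_year + 1,
            "event": int(failed and year == stop_year - 1),
            # Covariate value assumed to hold for the whole calendar year.
            "rail_distance_km": distance_by_year[year],
        })
    return rows


# Hypothetical canal company: a railroad line approaches over three years, then the canal fails.
episodes = split_into_yearly_episodes(
    "canal_x", 1845, 1848, failed=True,
    distance_by_year={1845: 40.0, 1846: 12.0, 1847: 3.0},
)
for row in episodes:
    print(row)
```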
Post-War Industrial Policy in Japan
The survival rates of Japanese industrial firms in the post-World War II era provide a compelling case study in the effects of institutional structure on longevity. The corporate groupings known as keiretsu are thought to have provided member firms with a protective network of stable shareholders, preferential lending from group banks, and long-term business relationships. Survival analysis has been used to test this proposition formally. Studies comparing the hazard rates of keiretsu-affiliated firms with those of independent firms find that membership in a horizontal keiretsu significantly reduced the risk of financial distress and bankruptcy, particularly during economic downturns. This protective effect persisted even after controlling for firm size, age, and industry. The results suggest that the keiretsu functioned as a form of implicit insurance, lowering the hazard rate for member firms and contributing to the stability of the postwar Japanese corporate system. Research on keiretsu offers detailed survival models that attempt to isolate this network effect.
The Long Decline of New England Textile Mills
The shift of the American textile industry from New England to the South in the late 19th and early 20th centuries offers another rich terrain for survival analysis. Mills in New England faced rising labor costs, aging infrastructure, and competition from newer, more efficient Southern mills. Survival analysis shows that the hazard rate for New England mills began to increase steadily after 1900, with a sharp spike during the 1920s. Mills that modernized their equipment and shifted toward higher-quality specialty fabrics had lower hazard rates and longer survival times. The Great Depression delivered the final blow to many laggards; by 1940, a majority of the New England mills operating in 1900 had closed or relocated. The analysis identifies the timing and covariates of this decline. It shows that proximity to raw materials (cotton grown in the South), an advantage enjoyed by Southern mills, initially mattered for survival but declined in importance over time relative to labor costs and technological modernization.
Overcoming the Challenges of Historical Data
Survivorship Bias and Fragmented Records
Perhaps the greatest challenge facing historical survival analysis is the systematic bias introduced by incomplete records. Firms that failed early often left few traces in the historical record. Their entire existence may be reduced to a single line in a corporate registry or a brief mention in a newspaper archive. This creates a form of left-truncation that must be modeled explicitly. Researchers often employ inverse probability weighting or restrict their analysis to cohorts with comprehensive coverage. Acknowledging the limitations of the data and conducting sensitivity analyses to assess the potential impact of missing observations is a standard part of rigorous historical survival analysis.
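Modeling left-truncation explicitly mostly comes down to the risk set. A firm should count as at risk at age t only if it had already entered the records before t. What follows is a minimal Kaplan-Meier sketch with delayed entry. The ages and records are hypothetical, and the sketch addresses only the risk-set adjustment, not inverse probability weighting.

```python
from collections import Counter


def kaplan_meier_delayed_entry(entries, exits, events):
    """Kaplan-Meier with left-truncation (delayed entry).

    entries: firm age when it first appears in the records
    exits:   firm age at exit or censoring
    events:  1 = observed exit, 0 = censored
    A firm is in the risk set at age t only if entry < t <= exit."""
    if any(x <= a for a, x in zip(entries, exits)):
        raise ValueError("each firm must exit after it enters observation")
    deaths = Counter(x for x, e in zip(exits, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for a, x in zip(entries, exits) if a < t <= x)
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical firms: two observed from founding, two first recorded at age 5.
curve = kaplan_meier_delayed_entry(entries=[0, 0, 5, 5], exits=[2, 6, 7, 9], events=[1, 1, 1, 0])
print(curve)
```

Ignoring the entry ages (treating every firm as observed from founding) would put four firms at risk at age 2 instead of two. It would then yield survival estimates of 0.75, 0.5 and 0.25 rather than 0.5, 0.33 and 0.17, overstating survival.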
Changing Industrial Classifications
Industrial classification systems have evolved dramatically over the past century. A firm classified as a "cotton manufacturer" in 1900 might fall under several different Standard Industrial Classification (SIC) or North American Industry Classification System (NAICS) codes today. Mapping historical classifications to modern systems requires detailed knowledge of industrial processes and careful judgment. Some researchers construct custom classification schemes that remain consistent across the period of study. These schemes group firms into broad categories (e.g., "textiles," "iron and steel," "machinery") rather than relying on fine-grained modern codes that have no historical analog.
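A custom, period-consistent scheme is often just an explicit crosswalk from the labels found in directories to broad categories, with unmatched labels flagged for manual review. The labels and categories in this sketch are hypothetical examples, not a published classification.

```python
import re

# Hypothetical crosswalk: historical directory labels -> broad categories
# that stay meaningful across the whole study period.
CROSSWALK = {
    "cotton manufacturer": "textiles",
    "woolen mill": "textiles",
    "rolling mill": "iron and steel",
    "blast furnace": "iron and steel",
    "machine shop": "machinery",
}


def normalize_label(raw_label):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z\s]", " ", raw_label.lower()).split())


def classify(raw_label):
    """Map a historical label to a broad category, or flag it for manual review."""
    return CROSSWALK.get(normalize_label(raw_label), "unclassified")


print(classify("Cotton Manufacturer"))   # textiles
print(classify("  Rolling-Mill "))       # iron and steel
print(classify("Telegraph Co."))         # unclassified
```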
Linking Historical Entities to Modern Data
Tracing the survival of a specific corporate entity across decades or centuries often requires linking records from disparate sources. A manufacturing firm listed in the 1880 Census of Manufactures might appear under a slightly different name in a 1900 credit report and an entirely different name after a merger in 1920. Linking these records requires probabilistic matching algorithms and meticulous archival research. The payoff is significant: a longitudinal dataset that tracks individual firms across their entire lifespan, enabling truly dynamic survival analysis. Projects such as the Historical Corporate Database at several universities have invested heavily in this linking process, producing datasets that allow researchers to trace the life courses of thousands of firms across more than a century.
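Linkage of this kind usually begins with a normalized name comparison. This sketch scores name similarity with Python's standard difflib.SequenceMatcher after stripping common legal-form words. It is a deterministic string-similarity heuristic, not a full probabilistic (Fellegi-Sunter style) record-linkage model, which would also weigh location, founding dates, and other fields. The suffix list, the 0.85 threshold, and the firm names are hypothetical.

```python
import difflib
import re

# Hypothetical list of tokens to ignore when comparing firm names.
LEGAL_SUFFIXES = {"the", "co", "company", "inc", "corp", "corporation", "ltd", "limited"}


def normalize_name(name):
    """Lowercase, strip punctuation, and drop legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9&\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized firm names."""
    return difflib.SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest name, or (None, score) below threshold."""
    if not candidates:
        return None, 0.0
    score, match = max((name_similarity(name, c), c) for c in candidates)
    return (match, score) if score >= threshold else (None, score)


print(best_match("Granite Falls Mills Co.", ["Granite Falls Mills Company", "Granite State Bleachery"]))
```

Candidates that fall below the threshold are best routed to manual archival review rather than silently discarded.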
Practical Applications for Modern Analysts and Historians
Informing Policy Evaluation
Survival analysis provides a rigorous framework for evaluating the long-term impact of historical policies. Governments and central banks can use these methods to assess whether past interventions—such as bailouts, tariff protection, or regulatory reforms—actually improved firm survival rates. Did the Smoot-Hawley Tariff of 1930 protect domestic manufacturers from failure? Survival analysis of manufacturing firms before and after the tariff shows that while some industries experienced a temporary reduction in hazard rates, the overall effect was muted and possibly negative due to retaliatory tariffs and the deepening of the Great Depression. This historical evidence continues to inform contemporary trade policy debates.
Strategy and Competitive Dynamics
Understanding the historical survival patterns of firms in a given industry provides strategic context for modern managers. The hazard rates observed in 19th-century transportation, 20th-century textiles, and late-20th-century technology sectors reveal common patterns: firms that fail to adapt to technological change experience sharply rising hazard rates, while those that diversify or innovate can extend their lifespans dramatically. These historical regularities are not deterministic, but they provide a valuable baseline against which to assess the risks facing contemporary firms. The competitive dynamics that drove consolidation and exit in earlier eras continue to operate today, and survival analysis offers a quantitative framework for understanding those dynamics.
Conclusion: Endurance as a Metric of Success
Survival analysis does not merely count failures; it structures our understanding of how and why businesses endure. It transforms historical archives into laboratories where theories of economic resilience can be tested. By measuring the hazard rates faced by canal companies in the 1850s, banks in the 1930s, and textile mills in the 1920s, the method reveals the quantitative contours of economic change that narrative histories describe only qualitatively. Over a long enough horizon, nearly every firm and industry exits. Understanding the timing and determinants of that exit is the work of historical survival analysis. It provides a rigorous, evidence-based lens for interpreting the fragility and endurance of the economic institutions that shape our world.