AI-driven mortality models are supposed to provide accurate, data-backed insights into patient survival, disease burden, and healthcare risk assessment. These models influence epidemiological research, drug safety evaluations, and insurance risk predictions, all critical components of modern healthcare decision-making. However, their accuracy is only as good as the data feeding them, and mortality data often remains fragmented, outdated, and not AI-ready without human intervention.
In mortality analytics, zombie data refers to outdated or residual death records: cases where deceased patients continue to appear active in datasets because of delayed reporting, misclassification, or synthetic data. The result is post-mortem activity, where a patient’s data points show signs of life in data systems (like filling prescriptions) long after death. Legacy systems, fragmented sources, and delayed reporting all contribute to the problem.
For instance, one U.S. government audit found 6.5 million deceased Americans still listed as living in Social Security records, due to antiquated reporting processes.1 State-level vital record systems can be even slower; in some cases it takes up to two years for state death registries to update, meaning a patient might continue “receiving” prescriptions or insurance coverage for years after death. With no single up-to-date national death database, companies must patch together information from sources like the Social Security Death Master File (DMF), state records, obituaries, and even credit bureaus.
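To make that stitching concrete, here is a minimal sketch in Python of how several death-record feeds might be consolidated into one row per patient, with a naive source count standing in for confidence. The feed names, column names, and scoring rule are illustrative assumptions for this example, not an actual production pipeline.

```python
import pandas as pd

# Illustrative feeds; real sources would be the SSA DMF, state registries,
# obituaries, credit bureaus, etc.
dmf = pd.DataFrame({"patient_id": ["p1", "p2"],
                    "death_date": ["2023-04-02", "2024-01-15"],
                    "source": "ssa_dmf"})
state = pd.DataFrame({"patient_id": ["p1", "p3"],
                      "death_date": ["2023-04-02", "2022-11-30"],
                      "source": "state_registry"})
obits = pd.DataFrame({"patient_id": ["p1"],
                      "death_date": ["2023-04-03"],
                      "source": "obituary"})

records = pd.concat([dmf, state, obits], ignore_index=True)
records["death_date"] = pd.to_datetime(records["death_date"])

# One row per patient: earliest reported death date plus a count of
# independent sources, a crude stand-in for record confidence.
consolidated = (records.groupby("patient_id")
                       .agg(death_date=("death_date", "min"),
                            n_sources=("source", "nunique"))
                       .reset_index())
print(consolidated)
```

A patient confirmed by several independent sources (like p1 above) would carry more weight downstream than one appearing in a single feed.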
The Two Faces of Zombie Data in Mortality Analytics
Mortality data quality issues generally fall into two broad categories:
| Category | Definition | Key Data Risks |
| --- | --- | --- |
| Fact of Death (FOD) Errors | The confirmed legal record that a person has died, pulled from official registries (e.g., SSA Death Master File, state death databases). | Delayed, missing, or duplicated reporting leaves deceased patients looking active, producing post-mortem activity that distorts adherence, cost, and risk modeling. |
| Cause of Death (COD) Errors | The medical reason recorded on a death certificate (e.g., cardiac arrest, cancer, COVID-19). | Vague or misclassified causes (e.g., "cardiac arrest" instead of the underlying condition) bias disease-burden estimates and pharmaceutical safety analysis. |
These errors pollute AI training datasets, leading to inaccurate survival predictions, incorrect epidemiological modeling, and compromised healthcare strategies.
Why cause-of-death (COD) data can mislead AI mortality models
AI-driven mortality models rely heavily on cause of death (COD) data from death certificates. However, these records are sometimes delayed or inaccurate:
- Many U.S. death certificates contain vague or incorrect cause-of-death classifications or other errors, often listing "cardiac arrest" or "respiratory failure" instead of the underlying condition.2 (A simple filtering sketch for these non-specific codes follows this list.)
- ICD-10 coding lag creates reporting gaps. Early COVID-19 deaths were frequently misclassified as pneumonia.3
- AI models trained on this data give too much weight to cardiovascular and respiratory deaths while underestimating conditions like neurodegenerative diseases and opioid overdoses.
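As a rough illustration, the sketch below flags non-specific "garbage" cause-of-death codes before the records are used as training labels. The specific ICD-10 prefixes (I46 cardiac arrest, J96 respiratory failure, R99 ill-defined cause) and column names are assumptions chosen for the example, not a vetted exclusion list.

```python
import pandas as pd

# Illustrative list of non-specific ICD-10 prefixes; a real project would
# adopt a validated "garbage code" taxonomy.
NONSPECIFIC_COD_PREFIXES = ("I46", "J96", "R99")

deaths = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3"],
    "cod_icd10": ["I46.9", "C34.90", "J96.00"],  # cardiac arrest, lung cancer, respiratory failure
})

deaths["nonspecific_cod"] = deaths["cod_icd10"].str.startswith(NONSPECIFIC_COD_PREFIXES)

# Exclude (or down-weight) non-specific records so the model is not skewed
# toward cardiovascular and respiratory causes at the expense of the
# underlying disease.
training_labels = deaths[~deaths["nonspecific_cod"]]
print(training_labels)
```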
Example of training on an incorrect cause of death:
A predictive AI model for cancer mortality risk is trained on death certificate data, but because many late-stage cancer deaths are recorded simply as "cardiac arrest," the model understates the true burden of cancer mortality.
The lag between death and data:
Fact of death (FOD) records often suffer from significant reporting delays and duplicate entries.
In 2021, the Social Security Administration (SSA) Death Master File still listed 6.5 million deceased Americans as “alive.”1
Even efforts to “catch up” on backlog can introduce anomalies. In March 2025, the SSA undertook a massive data cleanup, adding about 7 million previously unrecorded death entries in one batch. However, roughly 6 million of those new records had their dates of death defaulted to March 2025, resulting in implausible data (e.g. huge spikes where one million people supposedly died on the same day, and many individuals suddenly listed as 120+ years old).4
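Screening a bulk load for implausible patterns like these is straightforward in principle. The sketch below uses a toy batch with hypothetical column names and illustrative thresholds: it flags records when too many deaths share a single date or when age at death exceeds documented human lifespans.

```python
import pandas as pd

# Toy batch standing in for a newly ingested bulk load of death records.
batch = pd.DataFrame({
    "patient_id": [f"p{i}" for i in range(1, 7)],
    "birth_date": pd.to_datetime(["1948-05-01", "1890-01-01", "1952-09-17",
                                  "1960-03-22", "1971-11-02", "1945-06-30"]),
    "death_date": pd.to_datetime(["2025-03-31", "2025-03-31", "2025-03-31",
                                  "2025-03-31", "2024-08-14", "2023-02-09"]),
})

# A large share of records on a single death date suggests dates were
# defaulted during the bulk load rather than genuinely reported.
# (The 0.5 threshold suits this toy batch; real thresholds need tuning.)
date_share = batch["death_date"].value_counts(normalize=True)
suspicious_dates = date_share[date_share > 0.5].index

# Ages at death far beyond ~115 years are almost certainly record errors.
age_years = (batch["death_date"] - batch["birth_date"]).dt.days / 365.25
implausible_age = age_years > 115

flagged = batch[batch["death_date"].isin(suspicious_dates) | implausible_age]
print(f"{len(flagged)} of {len(batch)} records flagged for manual review")
```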
Example of post-mortem activity due to delayed reporting:
An AI mortality model analyzing medication adherence sees that a certain patient has been steadily filling prescriptions each month and concludes they are following their treatment. In reality, the patient died six months ago; because of reporting delays, the death was never recorded, and pharmacy refills or claims kept coming (perhaps via an automated refill or a misattributed record). This is post-mortem activity in action.
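One way to catch this pattern is to join claims or pharmacy fills against whatever death dates are available and flag anything dated after death. A minimal sketch, assuming hypothetical fill and mortality tables:

```python
import pandas as pd

fills = pd.DataFrame({
    "patient_id": ["p1", "p1", "p2"],
    "fill_date": pd.to_datetime(["2024-02-01", "2024-08-01", "2024-03-15"]),
})
deaths = pd.DataFrame({
    "patient_id": ["p1"],
    "death_date": pd.to_datetime(["2024-03-10"]),
})

merged = fills.merge(deaths, on="patient_id", how="left")

# A fill dated after the recorded death is post-mortem activity; comparisons
# against a missing (NaT) death date evaluate to False, so patients with no
# recorded death are never flagged.
merged["post_mortem_fill"] = merged["fill_date"] > merged["death_date"]

# Adherence metrics should be computed only from the remaining fills.
clean_fills = merged[~merged["post_mortem_fill"]]
print(clean_fills)
```

In this toy example, p1’s August fill is excluded as post-mortem activity, while the February fill and p2’s record are kept.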
A better solution: Verified mortality data at your service
HealthVerity, in partnership with Veritas Data Research, offers a curated mortality data solution built to support AI accuracy and regulatory confidence. This joint approach accounts for post-mortem activity and ensures only high-confidence death records are used.
✅ How it works:
- Verified Fact of Death Index: Veritas confirms deaths across multiple reliable sources before inclusion.
- HealthVerity Mortality Masterset: We’ve built a pre-linked, research-ready Masterset using Veritas’ high-confidence mortality data, connected to HVIDs for seamless integration across real-world data sources.
- Post-Mortem Filters: Records showing suspicious signals, like activity after the date of death, are flagged and held back (a simplified sketch of this kind of flagging and de-duplication appears after this list).
- Cross-Referenced with External Sources: Veritas' review of SSA’s 2025 “data dump” revealed major errors. Instead of blindly ingesting 7 million new entries, they cross-validated each record to prevent corrupting downstream analytics.
- Data De-duplication: By removing duplicates and fragmented identities, we reduce the noise and distortion that often skews AI mortality models.
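To make the flagging and de-duplication steps concrete, here is a minimal sketch. It is not the actual HealthVerity/Veritas pipeline; the person key, confidence labels, and tie-breaking rule are illustrative assumptions.

```python
import pandas as pd

records = pd.DataFrame({
    "person_key": ["hv001", "hv001", "hv002", "hv003"],
    "death_date": pd.to_datetime(["2023-04-02", "2023-04-02", "2025-03-01", "2021-07-19"]),
    "confidence": ["high", "high", "low", "high"],  # e.g., "low" for unconfirmed bulk-load entries
})

# Keep one record per identity, preferring the higher-confidence entry
# ("high" sorts before "low" alphabetically, so it survives drop_duplicates).
deduped = (records.sort_values("confidence")
                  .drop_duplicates("person_key", keep="first"))

# Analysts decide whether low-confidence records are included downstream.
analysis_ready = deduped[deduped["confidence"] == "high"]
print(analysis_ready)
```

The key design point is that low-confidence records are carried with an explicit flag rather than silently ingested or silently dropped, which is what the quote below describes.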
“Unconfirmed records from SSA, from this release of data, are flagged as low confidence, giving customers control over inclusion.” — Veritas Data Research
The result:
A clean, AI-ready mortality dataset made up of patients confirmed to be truly deceased, with the transparency and confidence needed for real-world evidence, risk modeling, and AI deployment.
Key takeaways for AI-driven life sciences research
- Fragmented and low-quality data in mortality analytics leads to flawed AI-driven health predictions.
- Delayed death reporting creates patient after-images that distort adherence, cost, and risk modeling.
- Misclassified COD trends introduce bias in disease burden and pharmaceutical safety analysis.
- Synthetic data creep reinforces historical biases, leading AI models to predict mortality trends that do not align with real-world evidence.
- HealthVerity’s data solutions ensure AI models are trained on clean, validated, and provenance-traceable mortality insights.
Are you building AI-driven life sciences models? Ensure your data is free from residual patient distortions. Dying to know more? Let’s discuss how HealthVerity can optimize your mortality insights.
References
- Office of the Inspector General, Social Security Administration. (2023, July). Numberholders Age 100 or Older Who Did Not Have Death Information on the Numident. https://oig.ssa.gov/assets/uploads/a-06-21-51022.pdf
- Schuppener, L. M., Olson, K., & Brooks, E. G. (2020). Death certification: Errors and interventions. Clinical Medicine & Research, 18(1), 21. https://pmc.ncbi.nlm.nih.gov/articles/PMC7153801/#b1-0180021
- Rivera, R., Rosenbaum, J. E., & Quispe, W. (2020). Excess mortality in the United States during the first three months of the COVID-19 pandemic. Epidemiology and Infection, 148, e264. https://pmc.ncbi.nlm.nih.gov/articles/PMC7653492/
- Veritas Data Research. (n.d.). The Social Security Administration (SSA) recently addressed concerns about payments to potentially deceased individuals, by classifying many individuals as deceased, which generated 7 million ‘new’… [LinkedIn post]. Retrieved July 7, 2025, from https://www.linkedin.com/posts/veritas-data-research_mortalitydata-veritasdataresearch-dataquality-activity-7316513210679922689--f7g