AI-driven mortality models are supposed to provide accurate, data-backed insights into patient survival, disease burden, and healthcare risk assessment. These models influence epidemiological research, drug safety evaluations, and insurance risk predictions, all critical components of modern healthcare decision-making. However, their accuracy is only as good as the data feeding them, and mortality data often remains fragmented, outdated, and not AI-ready without human intervention.
In mortality analytics, zombie data refers to outdated or residual death records: cases where deceased patients continue to appear active in datasets due to delayed reporting, misclassification, or synthetic data. This often results in post-mortem activity, where patient data points indicate signs of life in data systems (like filled prescriptions) long after death. Legacy systems, fragmented sources, and delayed reporting all contribute to this problem.
For instance, one U.S. government audit found 6.5 million deceased Americans still listed as living in Social Security records, due to antiquated reporting processes.1 State-level vital record systems can be even slower; in some cases it takes up to two years for state death registries to update, meaning a patient might continue “receiving” prescriptions or insurance coverage for years after death. With no single up-to-date national death database, companies must patch together information from sources like the Social Security Death Master File (DMF), state records, obituaries, and even credit bureaus.
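As a rough illustration of what patching sources together can look like, the sketch below groups death reports by person and labels each one by how many independent sources agree. The source labels, the `corroborate` function, and the two-source threshold are assumptions made for this example, not a description of any vendor's actual method.

```python
from collections import defaultdict

def corroborate(reports, min_sources=2):
    """Label each person's death record by cross-source agreement.

    reports: iterable of (person_id, source, date_of_death) tuples,
             with dates as ISO strings. All field names are hypothetical.
    """
    by_person = defaultdict(lambda: {"sources": set(), "dates": set()})
    for person_id, source, dod in reports:
        by_person[person_id]["sources"].add(source)
        by_person[person_id]["dates"].add(dod)

    labels = {}
    for person_id, info in by_person.items():
        if len(info["dates"]) > 1:
            confidence = "conflict"   # sources disagree on the date
        elif len(info["sources"]) >= min_sources:
            confidence = "high"       # independent sources agree
        else:
            confidence = "low"        # single, uncorroborated report
        labels[person_id] = {
            "sources": sorted(info["sources"]),
            "dates": sorted(info["dates"]),
            "confidence": confidence,
        }
    return labels

# Illustrative, made-up reports
reports = [
    ("p1", "dmf", "2023-04-02"), ("p1", "state", "2023-04-02"),
    ("p2", "obituary", "2022-11-19"),
    ("p3", "dmf", "2021-01-05"), ("p3", "state", "2021-02-05"),
]
labels = corroborate(reports)
```

A real pipeline would also need identity resolution across sources, which this sketch sidesteps by assuming a shared `person_id`.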
Mortality data quality issues generally fall into two broad categories:
| Category | Definition | Key Data Risks |
| --- | --- | --- |
| Fact of Death (FOD) Errors | The confirmed legal record that a person has died, pulled from official registries (e.g., SSA Death Master File, state death databases). | |
| Cause of Death (COD) Errors | The medical reason recorded on a death certificate (e.g., cardiac arrest, cancer, COVID-19). | |
These errors pollute AI training datasets, leading to inaccurate survival predictions, incorrect epidemiological modeling, and compromised healthcare strategies.
AI-driven mortality models rely heavily on cause of death (COD) data from death certificates. However, these records are sometimes delayed or inaccurate:
For example, a predictive AI model for cancer mortality risk is trained on death certificate data. Because most late-stage cancer deaths in that data are recorded as "cardiac arrest," the model fails to reflect the true burden of cancer mortality.
Fact of death (FOD) records often suffer from significant delays in official reporting or duplicated reports.
In 2021, the Social Security Administration (SSA) Death Master File still listed 6.5 million deceased Americans as “alive.”1
Even efforts to “catch up” on backlog can introduce anomalies. In March 2025, the SSA undertook a massive data cleanup, adding about 7 million previously unrecorded death entries in one batch. However, roughly 6 million of those new records had their dates of death defaulted to March 2025, resulting in implausible data (e.g. huge spikes where one million people supposedly died on the same day, and many individuals suddenly listed as 120+ years old).4
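Anomalies like these can often be caught with simple sanity checks before the data reaches a model. The sketch below flags death dates whose daily count far exceeds the typical day, along with records whose implied age at death is implausible. The record fields, the `spike_factor` of 10, and the 120-year cutoff are illustrative assumptions, not thresholds taken from any audit.

```python
from collections import Counter
from datetime import date
from statistics import median

def flag_suspect_death_dates(records, spike_factor=10, max_age=120):
    """Return {record_id: [reasons]} for records that look defaulted.

    records: list of dicts with 'id', 'birth', and 'death' (datetime.date).
    """
    per_day = Counter(r["death"] for r in records)
    typical = median(per_day.values())
    # Days with far more deaths than a typical day suggest a bulk default
    spike_days = {d for d, n in per_day.items() if n > spike_factor * typical}

    flags = {}
    for r in records:
        reasons = []
        if r["death"] in spike_days:
            reasons.append("spike_date")
        age_years = (r["death"] - r["birth"]).days / 365.25
        if age_years >= max_age:
            reasons.append("implausible_age")
        if reasons:
            flags[r["id"]] = reasons
    return flags

# Made-up data: five ordinary days, then one day holding a bulk load
normal = [{"id": f"n{i}", "birth": date(1950, 1, 1),
           "death": date(2024, 1 + i, 10)} for i in range(5)]
bulk = [{"id": f"b{i}", "birth": date(1950, 1, 1),
         "death": date(2025, 3, 1)} for i in range(19)]
old = {"id": "old", "birth": date(1900, 1, 1), "death": date(2025, 3, 1)}
flags = flag_suspect_death_dates(normal + bulk + [old])
```

Flagged records are candidates for review or down-weighting rather than automatic deletion, since a genuine death can fall on a spike date.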
Consider an AI mortality model analyzing medication adherence. It sees that a certain patient has been steadily filling prescriptions each month and concludes they are following their treatment. In reality, the patient died six months ago, but reporting delays kept the death from being recorded, and pharmacy refills or claims kept coming (perhaps via an automated refill or a misattributed record). This is an instance of post-mortem activity.
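Once a death date is known, post-mortem activity of this kind is straightforward to surface. The following minimal sketch compares each event date against the recorded date of death. The data layout, the `post_mortem_events` helper, and the `grace_days` allowance (for claims that legitimately settle shortly after death) are assumptions for illustration.

```python
from datetime import date

def post_mortem_events(deaths, events, grace_days=0):
    """List events dated more than grace_days after the patient's death.

    deaths: {patient_id: date_of_death}
    events: list of (patient_id, event_date, kind) tuples
    Returns (patient_id, event_date, kind, days_after_death) tuples.
    """
    flagged = []
    for pid, when, kind in events:
        dod = deaths.get(pid)
        if dod is None:
            continue  # no death on record, nothing to compare against
        days_after = (when - dod).days
        if days_after > grace_days:
            flagged.append((pid, when, kind, days_after))
    return flagged

# Made-up example: patient "a" died on 2024-01-15
deaths = {"a": date(2024, 1, 15)}
events = [
    ("a", date(2023, 12, 20), "rx_fill"),  # before death
    ("a", date(2024, 2, 20), "rx_fill"),   # 36 days after death
    ("a", date(2024, 7, 20), "rx_fill"),   # 187 days after death
    ("b", date(2024, 7, 1), "claim"),      # no death record
]
flagged = post_mortem_events(deaths, events, grace_days=30)
```

The check only works when the death is already recorded, which is why reporting delays matter: until the record arrives, the refills above look like ordinary adherence.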
HealthVerity, in partnership with Veritas Data Research, offers a curated mortality data solution built to support AI accuracy and regulatory confidence. This joint approach accounts for post-mortem activity and ensures only high-confidence death records are used.
“Unconfirmed records from SSA, from this release of data, are flagged as low confidence, giving customers control over inclusion.” - Veritas Data Research
The result is a clean, AI-ready mortality dataset made up of patients confirmed to be truly deceased, with the transparency and confidence needed for real-world evidence, risk modeling, and AI deployment.
Are you building AI-driven life sciences models? Ensure your data is free from residual patient distortions. Dying to know more? Let’s discuss how HealthVerity can optimize your mortality insights.