Synchronizing real-world data for a quicker response to disease outbreaks

With the proliferation of data available today and advances in healthcare technology, this series discusses how government agencies can leverage real-world data (RWD) and Privacy-Preserving Record Linkage (PPRL) technology to preserve and promote public health. In this second installment of the five-part series, we delve deeper into disease surveillance for a rapid response to outbreaks.

As mentioned in our introduction to this series, government public health agencies can monitor RWD to understand disease spread, predict potential outbreaks and allocate resources when outbreaks such as COVID-19, Mpox (formerly known as Monkeypox) or respiratory syncytial virus (RSV) threaten public health. During a crisis such as the COVID-19 pandemic, agencies that must start from scratch to accumulate the data they need to monitor and manage the situation face avoidable challenges. In addition to the data sources government agencies already curate (state registries, electronic case reporting, etc.), the government needs to leverage traditionally fragmented RWD sources, such as medical and pharmacy claims, electronic medical records (EMRs), lab results and more.

The government has made significant investments in innovative surveillance programs, such as the National Syndromic Surveillance Program (NSSP), which collects data from over 6,400 healthcare facilities; the National Wastewater Surveillance System, which monitors for COVID outbreaks in major cities; and the Vaccine Adverse Event Reporting System (VAERS), a self-reporting system that monitors vaccine safety. As agencies continue to advance disease surveillance, they must weigh a number of data requirements.

Disease surveillance data requirements

There are a number of considerations when leveraging RWD for disease surveillance. For optimal disease surveillance, government agencies need:

  • Large quantities of data that represent all areas of the U.S.
  • Short data lags for near real-time insight
  • Records representing both inpatient and outpatient settings
  • Long durations of coverage for a longitudinal view of an individual’s healthcare journey
  • Coverage of vulnerable populations and social determinants of health data
  • The ability to synchronize with additional data sources or to validate self-reported data
  • A standard data model

While individual data sources have yielded some positive results, the fragmentation of healthcare data means that no single source can meet all of the requirements above. Government agencies need to draw on a variety of data sources and be able to synchronize them while managing privacy considerations.

Different data sources to meet disease surveillance requirements

Each data source has unique attributes that also need to be considered when selecting data for disease surveillance:

  • Open claims - Open claims from clearinghouses offer very short lag times, often reflecting physician or hospital visits from the prior day or week, so they can provide early feedback on a health crisis. Open claims also offer a long-term view of a patient's relationship with a particular physician or pharmacy, but only for providers covered by the available clearinghouses or pharmacy feeds.

  • Closed claims - Closed claims from payers provide an in-depth view of a patient’s healthcare journey across care settings; however, that view covers only the period in which the patient is enrolled, and on average 15% to 20% of patients change their insurance plans each year.¹ Closed claims data also has a long lag time of three to six months, which limits the quick response that disease surveillance requires.

  • EMRs - EMRs provide robust data covering both insured and uninsured patients and offer a wide range of clinical attributes and observations, although the lag time can be one to three months, depending on the source.

  • Labs - Lab results provide objective insights: they confirm diagnoses, quantify disease severity and describe genomic and pathological characteristics. The data is available with a short lag, often from the prior day. Labs, however, do not provide information on treatment.

  • Hospital chargemaster - Hospital chargemasters provide data on both insured and uninsured patients, with an itemized list of all procedures and tests performed during a patient’s hospital visit, as well as all drugs and equipment used, including specific brand names. The lag time for this data is moderate, often one to three months, and the results of the tests performed are not included.

  • Social determinants of health (SDOH) - With proper privacy-preserving techniques, the SDOH characteristics of various populations can be synchronized with healthcare data, including for people whose comorbidities put them at higher risk. This information can better equip government agencies to quickly mobilize support for these populations during outbreaks or natural disasters.
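To illustrate why enrollment churn limits the longitudinal view that closed claims can offer, consider a deliberately simplified model (an illustrative assumption, not a result of the cited study): if a fraction p of patients switch plans each year, and switching is independent from year to year, the share of patients still observable in the same plan after n years is roughly

```latex
S(n) = (1 - p)^{n}, \qquad
S(5)\big|_{p = 0.15} = 0.85^{5} \approx 0.44, \qquad
S(5)\big|_{p = 0.20} = 0.80^{5} \approx 0.33
```

Under these assumptions, only about a third to a little under half of patients would remain continuously visible in a single plan's closed claims over five years, which is why linking claims with other sources matters for a longitudinal view.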

Synchronizing sources

For optimal disease surveillance, with rapid response times and detailed insight into as many patients as possible, agencies need aspects of each data type above. Historically, however, agencies could procure only a single, siloed data source or had to analyze individual datasets separately, without the ability to quickly and reliably link patients across sources.² PPRL technology empowers government agencies to realize the true potential of RWD by synchronizing data from multiple sources while meeting privacy regulations. This lets agencies combine the strengths of each data type into the comprehensive, fast-reacting dataset that disease surveillance requires.
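For readers unfamiliar with how PPRL can link records without exchanging identities, here is a minimal sketch of one generic, widely described pattern: token-based linkage. Each data holder normalizes a patient's identifiers and hashes them with a shared secret key (HMAC-SHA256), so datasets can be joined on matching tokens rather than on names or birth dates. This is an illustration under stated assumptions, not HealthVerity's HVID method; the secret key, the identifier fields and the functions `normalize`, `make_token`, `tokenize` and `link` are all hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret, for illustration only. In a real deployment the
# key would be managed by a trusted linkage party, never hard-coded.
SECRET_KEY = b"example-shared-secret"

# Direct identifiers used to build the token (an assumed, minimal set).
IDENTIFIER_FIELDS = ("first", "last", "dob", "sex")


def normalize(value: str) -> str:
    """Lowercase, strip accents and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(record: dict) -> str:
    """Keyed hash (HMAC-SHA256) of the normalized identifier fields."""
    material = "|".join(normalize(record[f]) for f in IDENTIFIER_FIELDS)
    return hmac.new(SECRET_KEY, material.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records: list[dict]) -> list[dict]:
    """Replace direct identifiers with a token; keep the remaining fields."""
    result = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        result.append({"token": make_token(record), **kept})
    return result


def link(source_a: list[dict], source_b: list[dict]) -> list[tuple[dict, dict]]:
    """Pair records from two de-identified sources that share a token."""
    by_token: dict[str, list[dict]] = {}
    for record in source_b:
        by_token.setdefault(record["token"], []).append(record)
    return [(a, b) for a in source_a for b in by_token.get(a["token"], [])]


# Usage: a hypothetical claims record and lab record for the same person,
# entered with different capitalization and accents.
claims = tokenize([
    {"first": "José", "last": "Smith", "dob": "1980-02-01", "sex": "M", "dx": "U07.1"},
])
labs = tokenize([
    {"first": "JOSE", "last": "smith", "dob": "1980-02-01", "sex": "m",
     "test": "SARS-CoV-2 PCR", "result": "positive"},
])
pairs = link(claims, labs)
print(len(pairs))  # 1
```

Because these tokens are exact-match hashes, any difference in an input field (a typo, a nickname, a changed last name) produces a different token and a missed link, which is one reason linkage accuracy varies across PPRL approaches.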

As mentioned in the introduction to this series, not all PPRL technologies are alike, and there are certain gold standards. HealthVerity de-identifies patient records using a universal patient identifier, known as a HealthVerity ID (HVID), and matches the HVID to a continuously updated referential database of over 200 billion healthcare and consumer transactions. This approach makes our PPRL technology 10x more accurate than legacy tokenization techniques, and our data ecosystem, representing more than 75 unique data sources, is fully interoperable and HIPAA compliant. This synchronized approach also allows agencies to weave in their own data or data from other third-party sources, such as regional health systems. Our cutting-edge PPRL technology has been implemented by several government agencies, including the CDC, which used it for disease surveillance during the COVID-19 pandemic. Click here to learn more about this and other PPRL projects.

In the next installment of our five-part series, we will discuss how RWD and PPRL technology can help government agencies monitor drug safety.   

Click here to learn more about how HealthVerity can support disease surveillance or your agency's other data initiatives.


1. Fang, H., Frean, M., Sylwestrzak, MA. (2022). Trends in Disenrollment and Reenrollment Within US Commercial Health Insurance Plans, 2006-2018. JAMA Network Open, 5(2), e220320. Published February 24, 2022. https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2789399

2. Maximus. (2023). Executive survey: Agencies want data that can lead to actionable intelligence. https://maximus.com/sites/default/files/documents/Federal/data-for-actionable-intelligence-fnn-executive-survey-final.pdf
