Case study: Partnering with the National Cancer Institute to advance COVID-19 research

Background

While the COVID-19 pandemic is behind us, the virus remains a concern that needs to be studied. In the first half of 2024 alone, over 23,000 people in the United States died of the disease. The virus also continues to evolve, with the FDA recently approving updated vaccines to address the latest variants and strains. 

The rapid response to this public health emergency led to several vaccine and treatment options being granted Emergency Use Authorization, allowing us to return to normal. But there is still much to learn about COVID, including the effectiveness of vaccines and treatments and its long-term health effects. This is particularly relevant for immunocompromised populations, such as those with cancer. This concern led the National Cancer Institute (NCI) to launch the COVID-19 Real-World Data Infrastructure (CRWDi) project. CRWDi is a first-of-its-kind platform that allows researchers to access real-world data (RWD) from a variety of sources while protecting patient privacy and complying with HIPAA regulations. This enables research on important and timely questions related to the SARS-CoV-2 pandemic.

Challenge

To create this platform, NCI needed numerous data sources, including medical and pharmacy claims as well as lab and vaccine data. It was critical that all data be privacy-protected and made interoperable using Privacy-Preserving Record Linkage (PPRL) technology. The same technology also had to accurately synchronize this RWD with NCI’s Surveillance, Epidemiology, and End Results (SEER) cancer registries, which represent 48.9% of U.S. cancer data.

Additionally, to streamline analysis, NCI needed infrastructure that could house this synchronized RWD and allow vetted researchers to discover data of interest, build patient cohorts in real time, and perform analytics. NCI wanted all of this functionality in a single platform that remained HIPAA compliant.

Solution

With the nation’s largest fully interoperable, privacy-protected healthcare data ecosystem, HealthVerity already had the technology and RWD to meet NCI’s requirements. Our PPRL solution, HealthVerity Identity Manager, goes beyond legacy tokenization techniques by using a universal de-identifier known as a HealthVerity ID (HVID). HVIDs are matched against a continuously updated referential database of over 200 billion healthcare and consumer transactions, using probabilistic matching and machine learning techniques to handle the inherent noise in RWD. This creates a single source of truth for identity and enables our vast, interoperable, HIPAA-compliant healthcare data ecosystem. We used this PPRL technology to de-identify the SEER registries and data from a hospital system and synchronize them with our extensive data ecosystem, which consists of more than 75 sources, including both open and closed medical and pharmacy claims.
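
As a rough illustration of the general idea behind privacy-preserving record linkage, the sketch below shows the simpler, deterministic tokenization approach that the paragraph above contrasts with HVIDs: each data holder replaces identifying fields with a keyed hash, and records are joined on the token alone. It is not HealthVerity Identity Manager or the HVID method, which the case study describes as probabilistic and based on machine learning. The field names, the normalization rules, the shared key, and the sample records are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical shared secret; real deployments would manage keys through a
# trusted party rather than hard-coding them.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase and drop non-alphanumerics so formatting differences do not change the token."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def make_token(first_name: str, last_name: str, dob: str) -> str:
    """Derive a de-identified token from normalized identifying fields (HMAC-SHA256)."""
    message = "|".join(normalize(v) for v in (first_name, last_name, dob))
    return hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(registry, claims):
    """Inner-join two de-identified datasets on their tokens."""
    claims_by_token = {}
    for row in claims:
        claims_by_token.setdefault(row["token"], []).append(row)
    return [
        (reg_row, claim_row)
        for reg_row in registry
        for claim_row in claims_by_token.get(reg_row["token"], [])
    ]


# Each data holder tokenizes locally and shares only the token plus
# non-identifying fields. All records here are made up.
registry = [{"token": make_token("Ana", "Diaz", "1970-01-31"), "cancer_site": "breast"}]
claims = [
    {"token": make_token(" ana", "DIAZ", "1970-01-31"), "claim": "covid_vaccine"},
    {"token": make_token("Bo", "Lee", "1985-06-02"), "claim": "covid_test"},
]

linked = link_records(registry, claims)
```

Exact-match tokens like these stop matching when a name is misspelled or a field is missing, which is the kind of noise the probabilistic matching described above is meant to handle.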

This synchronized solution also allowed HealthVerity to quickly respond to the pandemic and develop the most comprehensive COVID dataset in the country. The continuously updated COVID-19 Masterset was leveraged by seven of the top eight pharmaceutical companies with vaccines and antiviral treatments, as well as leading government agencies. It covers nearly 200 million individuals who tested positive for the virus, had a confirmed diagnosis, received treatment, were vaccinated, or were tested, providing a near real-time longitudinal view of the nation’s COVID journey. By synchronizing RWD in this manner, HealthVerity was able to provide NCI with the privacy-protected and interoperable data needed for its CRWDi initiative.

HealthVerity Marketplace is an existing self-service cloud solution where researchers can easily discover and license the synchronized RWD in our data ecosystem, building custom cohorts and seeing patient counts and data provider overlaps in real time. Additionally, HealthVerity Marketplace enables clients to seamlessly incorporate their own data as a private tile that is viewable only by designated users. This technology allowed NCI to overlay the data from the SEER registries with the universe of RWD in HealthVerity Marketplace, enabling a longitudinal view of the cancer patient journey and the impact of COVID.
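
To make the idea of cohort counts and data provider overlaps concrete, here is a minimal sketch using plain Python sets of de-identified patient IDs. The source names, the IDs, and the helper functions are hypothetical and do not reflect the actual HealthVerity Marketplace interface.

```python
from itertools import combinations

# Hypothetical de-identified patient IDs held by each data source.
sources = {
    "seer_registry": {"p1", "p2", "p3", "p4"},
    "medical_claims": {"p2", "p3", "p5"},
    "covid_labs": {"p3", "p4", "p5", "p6"},
}


def cohort(required_sources):
    """Return the patients present in every one of the named sources."""
    return set.intersection(*(sources[name] for name in required_sources))


def pairwise_overlaps():
    """Return the number of shared patients for each pair of sources."""
    return {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }


# Cancer registry patients who also appear in the COVID lab data.
cancer_with_covid_labs = cohort(["seer_registry", "covid_labs"])
overlaps = pairwise_overlaps()
```

Because every source shares the same de-identified ID, a cohort definition reduces to set intersections, and overlap counts can be recomputed whenever the selection changes.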

HealthVerity uses its self-directed analytics environment from Databricks to give researchers built-in support for multiple programming languages, such as Python, R, SQL and Scala, so they don’t need their own analytics tools and infrastructure. Approved users can work in whichever language they are comfortable with and that fits their purpose, while remaining HIPAA compliant.

Results

Together, these solutions form a single RWD platform for the CRWDi project that is now live and available free of charge to non-commercial, academic research groups. Interested researchers can complete the application process through NCI to gain access to HealthVerity Marketplace, including the private tile for SEER and the COVID Masterset, and build custom cohorts of patients of interest. Once patients meeting the researcher’s criteria have been identified, the researcher completes a second form to gain access to the analytics environment, where the analytics of choice can be run in a HIPAA-compliant manner.

To learn more about CRWDi, email crwdiuseraccess@nih.gov