What is Privacy Preserving Record Linkage (PPRL)?
Written by Emery Niemiec, Director, Partners & Alliance at HealthVerity
Alongside HealthVerity Chief Data Scientist Austin Eliazar, I had the pleasure of hosting a lunch-and-learn for the Johns Hopkins Applied Physics Laboratory. We covered two topics: the power of privacy-preserving record linkage, and how HealthVerity resolves patient identity with 10x the accuracy of industry standards.
Johns Hopkins Applied Physics Laboratory (APL) is the nation’s largest university-affiliated research center. For more than 80 years, it has provided the U.S. Government with solutions that advance research across a wide range of scientific challenges, including national health. For example, the CDC has adopted APL’s Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) to track developing situations across a broad range of public health concerns.
From the opioid crisis to COVID-19 and now Monkeypox, disease surveillance depends on real-time access to patient data. Yet our fragmented healthcare delivery system makes it nearly impossible to connect every element of care (hospitalizations, testing, vaccine administration, and prescribed treatments) or to know where supply chain efforts should be targeted. Our responsibility to protect patient identities and comply with HIPAA makes this even more difficult. HealthVerity’s privacy-preserving record linkage (PPRL) technology, Identity Manager, provides a solution.
What is Privacy Preserving Record Linkage?
PPRL links individual patient records across time and data sources in de-identified form. It does so in a way that both complies with HIPAA and supports interoperability. Specifically, HealthVerity’s PPRL technology can:
- De-identify patients on premises and behind the data owners’ firewalls
- Assign a universal patient identifier across multiple datasets in place of personally identifiable information (PII)
- Flag and deduplicate records that belong to the same patient across multiple datasets
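The capabilities above can be made concrete with a minimal Python sketch. It shows one generic way to replace PII with a consistent token: a keyed hash (HMAC-SHA256) computed on premises, followed by grouping on that token to flag duplicates. This is not HealthVerity's Identity Manager. `SITE_KEY`, `PII_FIELDS`, `make_token`, and `tokenize_and_dedupe` are hypothetical names, and key management is simplified for illustration.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical secret key; every participating site would need the same key
# for tokens to match across datasets.
SITE_KEY = b"example-secret-key"
PII_FIELDS = ("first", "last", "dob")


def make_token(first: str, last: str, dob: str, key: bytes = SITE_KEY) -> str:
    """Derive a pseudonymous patient token from lightly normalized PII."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize_and_dedupe(records: List[dict]) -> Dict[str, List[dict]]:
    """Replace PII with a token and group records that share the same token."""
    by_token: Dict[str, List[dict]] = {}
    for rec in records:
        token = make_token(rec["first"], rec["last"], rec["dob"])
        deidentified = {k: v for k, v in rec.items() if k not in PII_FIELDS}
        by_token.setdefault(token, []).append(deidentified)
    return by_token


records = [
    {"first": "Ana", "last": "Lopez", "dob": "1980-02-14", "source": "clinic_a"},
    {"first": " ana", "last": "LOPEZ", "dob": "1980-02-14", "source": "pharmacy_b"},
    {"first": "Ben", "last": "Okafor", "dob": "1975-07-01", "source": "clinic_a"},
]
groups = tokenize_and_dedupe(records)
# Flag tokens that appear more than once as likely duplicates of one patient.
duplicates = {token: recs for token, recs in groups.items() if len(recs) > 1}
print(len(groups), len(duplicates))  # 2 1
```

An exact keyed hash changes completely when a single character changes. This sketch therefore only catches duplicates whose normalized PII is identical, and tolerating typos requires a fuzzier encoding.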
How is PPRL used today?
The nation’s top pharmaceutical and biotech companies, payers, and government agencies use PPRL to unify patient records and better understand their data. For example, over the past two years HealthVerity has used PPRL to support the CDC and HHS in developing an end-to-end solution. It tracks the vaccinated population in the U.S. in a fully de-identified, HIPAA-compliant manner.
Before our PPRL contract, the CDC had no way to connect vaccination status across states, reconcile the different ways PII was captured, or link that status for research. PPRL lets us connect vaccination records across data formats and across state and local governments. As a result, all of the data comes together in a single pane of glass.
Beyond connecting vaccination status, HealthVerity’s PPRL technology enabled vaccination records to be linked to external datasets. This yielded new insights into vaccination status and associated outcomes, both broadly and within specific cohorts: for example, how vaccination status affects populations with HIV, viral hepatitis, and many other conditions.
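Reconciling the different ways PII was captured usually starts with normalization, so that one person's details look identical before they are compared or hashed. The sketch below is an assumption about what such a step might look like, not a description of the CDC pipeline. `DATE_FORMATS`, `normalize_name`, and `normalize_dob` are hypothetical, and reading slash dates as month/day/year is a US-centric assumption.

```python
import re
import unicodedata
from datetime import datetime
from typing import Optional

# Hypothetical date layouts seen across source systems (US month/day order assumed).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d")


def normalize_name(raw: str) -> str:
    """Strip accents and punctuation, uppercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    letters = re.sub(r"[^A-Za-z ]", "", ascii_only)
    return " ".join(letters.upper().split())


def normalize_dob(raw: str) -> Optional[str]:
    """Parse a date in any known layout and return YYYY-MM-DD, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known layout
    return None


print(normalize_name("José  O'Neil"), normalize_name("jose oneil"))  # JOSE ONEIL JOSE ONEIL
print(normalize_dob("02/14/1980"), normalize_dob("19800214"))  # 1980-02-14 1980-02-14
```

Values that cannot be parsed come back as `None` rather than a guess. A real pipeline would need a policy for them, such as treating the field as missing.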
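Once both datasets carry the same de-identified token, linking them reduces to a join on that token. The sketch below counts vaccinated patients per cohort. It assumes two hypothetical extracts with made-up tokens: a vaccination map and an external cohort map. It illustrates the idea only and does not reflect HealthVerity's analytics. `vaccination_by_cohort` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Tuple


def vaccination_by_cohort(
    vaccinated: Dict[str, bool], cohorts: Dict[str, str]
) -> Dict[str, Tuple[int, int]]:
    """Join two de-identified datasets on their shared token.

    Returns cohort -> (vaccinated count, linked patient count). Tokens that
    appear in only one dataset cannot be linked and are skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    for token, cohort in cohorts.items():
        if token not in vaccinated:
            continue
        totals[cohort][1] += 1
        if vaccinated[token]:
            totals[cohort][0] += 1
    return {cohort: (v, n) for cohort, (v, n) in totals.items()}


# Hypothetical tokens: a vaccination extract and an external cohort dataset.
vaccinated = {"t1": True, "t2": False, "t3": True, "t4": True}
cohorts = {"t1": "hiv", "t2": "hiv", "t3": "hepatitis", "t5": "hepatitis"}
print(vaccination_by_cohort(vaccinated, cohorts))
# {'hiv': (1, 2), 'hepatitis': (1, 1)}
```

The join never touches PII. Only the token and the non-identifying attributes from each side are combined.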
How has HealthVerity’s PPRL technology been able to resolve patient identity with 10x the accuracy?
With HealthVerity’s PPRL technology, the initial de-identification of patient PII applies Bloom filter hashing to a broad range of PII fields. Bloom filters tolerate typos, shortened names, optical character recognition (OCR) errors, and similar discrepancies. Similar values map to largely overlapping bit patterns, so records can still be compared after hashing. Next, probabilistic matching uses all available hashed PII to produce a confidence score, which lets it match records reliably despite errors or missing information. Machine learning of field dependencies and value frequencies also ensures that each field is weighted by how much information it actually carries. For instance, agreement on a common name or a densely populated region counts for less than agreement on a rare one. These models continue to adapt to changing conditions, drawing on HealthVerity’s current production system and its centralized referential database of 330 million individuals.
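The following Python sketch shows the general shape of the techniques this paragraph names, under several assumptions:

- Each field is encoded as a Bloom filter of its character bigrams.
- Filters are compared with the Dice coefficient.
- Per-field agreement is weighted in a Fellegi-Sunter style, so agreeing on a rare value counts for more than agreeing on a common one.

The parameters (`FILTER_BITS`, `NUM_HASHES`, `KEY`), the 0.7 similarity threshold, the m-probability, and the value frequencies are all hypothetical. The sketch does not represent HealthVerity's production models or their learned dependencies.

```python
import hashlib
import hmac
import math
from typing import Dict, Set

FILTER_BITS = 1000  # hypothetical Bloom filter length
NUM_HASHES = 20  # hypothetical number of hash functions per bigram
KEY = b"shared-linkage-key"  # hypothetical secret shared by participating sites


def bigrams(value: str) -> Set[str]:
    """Character bigrams of the padded, lowercased value."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str) -> Set[int]:
    """Bloom filter of a value's bigrams, stored as the set of bit positions set to 1."""
    bits: Set[int] = set()
    for gram in bigrams(value):
        for i in range(NUM_HASHES):
            digest = hmac.new(KEY, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return bits


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient 2 * |A & B| / (|A| + |B|); 1.0 means identical filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def field_weight(agrees: bool, value_freq: float, m_prob: float = 0.95) -> float:
    """Fellegi-Sunter style log2 weight.

    value_freq approximates u, the chance two unrelated people share the value,
    so agreeing on a rare value yields a larger positive weight. m_prob is an
    assumed chance that two records for the same person agree on the field.
    """
    if agrees:
        return math.log2(m_prob / value_freq)
    return math.log2((1 - m_prob) / (1 - value_freq))


def match_score(
    rec_a: Dict[str, Set[int]],
    rec_b: Dict[str, Set[int]],
    value_freqs: Dict[str, float],
    threshold: float = 0.7,
) -> float:
    """Sum field weights; a field agrees when its filters' Dice score >= threshold."""
    score = 0.0
    for field, freq in value_freqs.items():
        if field not in rec_a or field not in rec_b:
            continue  # a missing field adds no evidence either way
        agrees = dice(rec_a[field], rec_b[field]) >= threshold
        score += field_weight(agrees, freq)
    return score


a = {"first": bloom_encode("Katherine"), "last": bloom_encode("Smith")}
b = {"first": bloom_encode("Katharine"), "last": bloom_encode("Smith")}
c = {"first": bloom_encode("Robert"), "last": bloom_encode("Jones")}
freqs = {"first": 0.002, "last": 0.01}  # hypothetical value frequencies
score_ab = match_score(a, b, freqs)
score_ac = match_score(a, c, freqs)
print(round(dice(a["first"], b["first"]), 2), round(score_ab, 2), round(score_ac, 2))
```

"Katherine" and "Katharine" share most of their bigrams, so their filters share most of their set bits and keep a high Dice score. An exact hash of the two spellings would share nothing.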
How does HealthVerity ensure my patients’ privacy is protected?
HealthVerity never directly receives any PII and does not require access to the data owners’ networks. Our technology is installed behind our customers’ firewalls, and all transmitted data is de-identified and encrypted first. Leading third-party privacy experts have certified, through expert determination, that this technology meets HIPAA’s de-identification requirements. The solution also combines broad-scale clinical data linked to a patient’s de-identified identity in a way that excludes or transforms any fields that could otherwise enable re-identification. One example is excluding publicly recorded events (motor vehicle accidents, incarcerations, and so on).
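As a rough illustration of "excludes or transforms", here is a minimal sketch of a record-level filter. The field names, `DROP_FIELDS`, `PUBLIC_EVENT_CODES`, and `sanitize_record` are all hypothetical, as is the choice to keep only the birth year and a three-digit ZIP prefix. This is not HealthVerity's rule set, and applying such rules alone does not amount to a HIPAA expert determination.

```python
from typing import Any, Dict

# Hypothetical direct identifiers that are always dropped.
DROP_FIELDS = {"name", "ssn", "street_address", "phone"}
# Hypothetical event codes treated as publicly recorded, hence excluded.
PUBLIC_EVENT_CODES = {"motor_vehicle_accident", "incarceration"}


def sanitize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with identifiers dropped, quasi-identifiers coarsened,
    and publicly recorded events removed. The input record is not modified."""
    clean = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if "dob" in clean:  # transform: keep only the birth year
        clean["birth_year"] = clean.pop("dob")[:4]
    if "zip" in clean:  # transform: keep only the 3-digit ZIP prefix
        clean["zip3"] = clean.pop("zip")[:3]
    if "events" in clean:  # exclude: drop publicly recorded events
        clean["events"] = [e for e in clean["events"] if e not in PUBLIC_EVENT_CODES]
    return clean


record = {
    "token": "tok_123",
    "name": "Ana Lopez",
    "dob": "1980-02-14",
    "zip": "12345",
    "events": ["flu_vaccine", "motor_vehicle_accident"],
}
print(sanitize_record(record))
# {'token': 'tok_123', 'events': ['flu_vaccine'], 'birth_year': '1980', 'zip3': '123'}
```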