Understanding re-identification risk in PPRL: key takeaways from our expert webinar

New research is challenging long-held assumptions about privacy-preserving record linkage (PPRL). While PPRL remains essential for de-identified data exchange, the study presented during our recent webinar shows that certain token-based approaches can introduce re-identification risk when used at scale.1

To help organizations understand the implications, HealthVerity brought together four leaders in privacy and data science: Andrew Kress, CEO of HealthVerity; Austin Eliazar, PhD, Chief Data Scientist at HealthVerity; Bradley Malin, PhD, of Vanderbilt University Medical Center; and Kristen Rosati, Partner at Coppersmith Brockelman. Their discussion highlighted three themes every organization working with de-identified data should consider.

 

“No one wants to interpret the maze of privacy rules alone.”

— Andrew Kress, CEO, HealthVerity

1. Large datasets can unintentionally increase re-identification risk

The study showed that repeated PPRL encodings, when combined with common demographic fields like ZIP3 or year of birth, can form distinct patterns that correlate with public reference datasets.1

In fact, this risk grows with population size. Instead of blending individuals together, larger datasets create more unique “fingerprints,” which can make re-identification possible at national scale. “This problem isn’t going away,” Brad Malin added. “It becomes more pronounced as linkage increases.” For organizations performing multi-source linkage or working with broad populations, understanding how encodings intersect with demographics is essential.
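To make that intuition concrete, here is a minimal, purely illustrative Python sketch. The field names, token values, and records below are hypothetical assumptions, not data or methods from the study; it simply shows how demographics combined with the pattern of sources in which a token appears can single individuals out, even when two people share the same ZIP3 and year of birth.

```python
from collections import Counter, defaultdict

# Hypothetical de-identified records from several linked sources. Each row
# carries an opaque PPRL token plus quasi-identifiers. All values are made up.
records = [
    {"token": "a91f", "source": "claims",   "zip3": "191", "yob": 1984},
    {"token": "a91f", "source": "labs",     "zip3": "191", "yob": 1984},
    {"token": "b2e8", "source": "claims",   "zip3": "191", "yob": 1984},
    {"token": "c07d", "source": "pharmacy", "zip3": "604", "yob": 1972},
]

# Build one profile per token: demographics plus the set of sources in which
# that token appears. The more sources are linked, the more ways two
# otherwise-similar people can end up with distinguishable profiles.
profiles = defaultdict(lambda: {"sources": set()})
for r in records:
    p = profiles[r["token"]]
    p["zip3"], p["yob"] = r["zip3"], r["yob"]
    p["sources"].add(r["source"])

# A profile shared by only one token acts like a fingerprint that could be
# correlated with a public reference dataset.
fingerprints = Counter(
    (p["zip3"], p["yob"], frozenset(p["sources"])) for p in profiles.values()
)
unique = sum(1 for count in fingerprints.values() if count == 1)
print(f"{unique} of {len(profiles)} token profiles are unique")
```

In this toy example, the two tokens that share a ZIP3 and year of birth are still distinguishable because they appear in different combinations of sources, which is the sense in which more linkage can sharpen, rather than blur, individual fingerprints.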

2. Expert determination frameworks may need a fresh look

Expert determinations tied to tokenization methods should not be treated as permanent. As new research uncovers different types of risk, organizations may need to reassess whether their current PPRL approach still meets the standard.

During the recent webinar, privacy expert Kristen Rosati noted that contractual controls help manage downstream use, but they cannot replace the methodological requirements of the expert determination process. From a regulatory standpoint, Kristen encouraged organizations to “kick the tires” on their current de-identification frameworks and pay close attention to how demographic fields included in a dataset could contribute to re-identification when combined with encoded identifiers. Legal and compliance teams should review how demographics are shared, who receives the data, and whether the current structure meaningfully reduces re-identification risk.

 

3. Centralized matching avoids the risk conditions identified in the study

The vulnerabilities outlined in the research emerge only when tokens appear alongside demographic fields in de-identified datasets.1 The centralized matching methodology that HealthVerity employs removes that condition entirely.

In our model, tokens move to a secure environment for linkage, and only a privacy-preserving identifier returns to the dataset. Austin Eliazar noted that HealthVerity also evaluates any third-party tokens entering the system, providing “belt and suspenders” protection as datasets scale.
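As a rough sketch of that separation (the function and identifiers below are hypothetical illustrations, not HealthVerity’s actual implementation), the key property is that raw tokens are consumed inside the matching environment and only a randomly generated linkage ID travels back with the de-identified record.

```python
import secrets

# Hypothetical registry that lives only inside the secure matching
# environment, mapping incoming tokens to stable linkage identifiers.
_token_to_link_id: dict[str, str] = {}

def resolve_link_id(token: str) -> str:
    """Return a privacy-preserving linkage ID for a PPRL token.

    The token never leaves this environment; callers only ever see a
    random identifier with no relationship to the token's structure.
    """
    if token not in _token_to_link_id:
        _token_to_link_id[token] = secrets.token_hex(16)
    return _token_to_link_id[token]

# Downstream, the de-identified dataset pairs demographics with the linkage
# ID, never with the token itself, so the token-plus-demographics condition
# described in the study does not arise.
outbound_record = {
    "link_id": resolve_link_id("a91f"),
    "zip3": "191",
    "yob": 1984,
}
```

The design choice this sketch highlights is that the mapping from token to linkage identifier exists only inside the secure environment, so downstream recipients never hold tokens and demographic fields together.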

Missed the webinar?

To hear the full discussion, watch the webinar replay as the experts walk through the latest research and its implications for organizations working with real-world data (RWD).

Explore Identity Manager

If you’re evaluating how to strengthen your privacy-preserving record linkage strategy, learn how Identity Manager can support secure, centralized matching across your data ecosystem. 

References

  1. Eliazar A, Brown JT, Cinamon S, Kantarcioglu M, Malin B. Re-identification risk for common privacy preserving patient matching strategies when shared with de-identified demographics. Journal of the American Medical Informatics Association. Published online October 17, 2025:ocaf183. doi:10.1093/jamia/ocaf183