HealthVerity IPGE Blog Series: Privacy

In the first installment of our HealthVerity IPGE blog series covering the four foundational elements of Identity, Privacy, Governance and Exchange, we explored the importance of accurate patient identity resolution. In our second installment, we turn our focus to the importance of Privacy and how HealthVerity is uniquely positioned to build the nation’s largest, fully interoperable healthcare and consumer data ecosystem, one that can also include your own proprietary data.

As we described in our previous blog post regarding Identity, de-identified data begins by replacing a patient’s personally identifiable information (PII) with a universal patient ID that we call a HealthVerity ID (HVID). Given the noise in healthcare data, especially spelling errors, nicknames and missing fields, it is critical that patient identity resolution software is able to accurately assign the same patient ID even when presented with slightly different views of the same PII. HealthVerity leverages a cloud-based solution that assigns HVIDs 10x more accurately than any existing solution on the market today.
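To make the idea concrete, here is a minimal sketch of deterministic tokenization: PII is normalized and then passed through a keyed hash, so that trivially different views of the same person map to the same ID. This is an illustration only, not HealthVerity's method. The nickname table, field layout and secret key are all hypothetical, and this simple approach handles casing, whitespace and known nicknames but not spelling errors or missing fields.

```python
import hashlib
import hmac

# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(first, last, dob):
    """Canonicalize PII so trivially different spellings compare equal."""
    first = first.strip().lower()
    first = NICKNAMES.get(first, first)
    last = last.strip().lower()
    return f"{first}|{last}|{dob.strip()}"


def assign_token(first, last, dob, secret):
    """Return a keyed hash of the normalized PII (a stand-in for a patient ID)."""
    message = normalize(first, last, dob).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


SECRET = b"demo-only-secret"  # assumption: a key held by the tokenizing party

# Two slightly different views of the same person, and one different person.
a = assign_token("Bob", "Smith ", "1980-02-14", SECRET)
b = assign_token("robert", "SMITH", "1980-02-14", SECRET)
c = assign_token("Roberta", "Smith", "1980-02-14", SECRET)

print(a == b)  # same person, same token
print(a == c)  # different person, different token
```

Exact-match hashing like this breaks as soon as a single character is misspelled, which is why identity resolution in noisy healthcare data needs more than a normalize-and-hash step.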

The challenges of data privacy

Many companies would have you believe that simply replacing PII with a patient ID, in many cases the wrong ID, is sufficient to achieve “de-identification” and ensure privacy. At HealthVerity, however, we know that while Identity is a foundational element, it is only the first step on the path to privacy. Disparate cohorts of real-world data (RWD) cannot simply be joined under the same patient ID schema on the assumption that privacy rules are thereby satisfied. Likewise, companies cannot join new third-party data with existing “de-identified” data without ensuring that the applicable privacy rules are met.

Safe Harbor vs. Expert Determination

The HIPAA Privacy Rule established that there are two privacy standards by which data can be deemed to be de-identified: Safe Harbor and Expert Determination.



With Safe Harbor, a data provider must remove 18 categories of identifiers from protected health information (PHI), such as names, full zip codes and exact dates of service. This technique has certain benefits for protecting data, but it also has two key shortcomings. First, it does not provide a method for linking patient identity across disparate datasets, because Safe Harbor eliminates the very attributes used to establish patient identity. Second, it does not allow for exact dates of events or services (only the year may be retained), and these dates are crucial when creating a meaningful timeline of events or establishing causality for analysis. This is why Safe Harbor is rarely used for research-quality data in the life sciences and insurance markets.
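As a rough illustration of what a Safe Harbor-style reduction does to a single record, consider the sketch below. It is deliberately incomplete: the field names are hypothetical, only a handful of the 18 identifier categories are covered, and the population check that Safe Harbor requires for three-digit zip codes is not implemented.

```python
# Hypothetical field names; only a few of the 18 identifier categories are shown.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "street_address"}


def safe_harbor_sketch(record):
    """Return a copy of record with a simplified Safe Harbor-style reduction."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        # Keep only the first three digits. Safe Harbor additionally requires
        # "000" for sparsely populated three-digit areas; not checked here.
        out["zip3"] = out.pop("zip")[:3]
    if "service_date" in out:
        # Reduce the date to its year (ISO format YYYY-MM-DD is assumed).
        out["service_year"] = out.pop("service_date")[:4]
    if "age" in out and out["age"] > 89:
        # Ages over 89 are collapsed into a single category.
        out["age"] = "90+"
    return out


claim = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "zip": "19103",
    "service_date": "2021-03-17",
    "age": 47,
    "diagnosis_code": "E11.9",
}
reduced = safe_harbor_sketch(claim)
print(reduced)
```

The reduced record keeps the diagnosis but loses the exact service date and every attribute that could be used to link this patient to another dataset, which is the trade-off described above.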

With Expert Determination, HIPAA provides that a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable can serve as an expert for purposes of a HIPAA review. The role of the expert, as defined by the Department of Health and Human Services (HHS), is to apply such principles and methods to determine that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information. Furthermore, the expert must document the methods and results that justify such a determination. A significant amount of privacy work is often required to normalize and transform native healthcare data to meet the requirements of an expert determination. The benefit is that the retained attributes enable de-identified patient linkage and the data retains the vast majority of key clinical fields. For these reasons, a HIPAA certification under Expert Determination is the gold standard for clients seeking to pursue detailed patient and provider analytics.

Privacy challenges when linking multiple datasets

While individual datasets may be deemed to be HIPAA-compliant, companies often run afoul of HIPAA when they attempt to combine disparate datasets that include different fields or varied privacy approaches within key fields. The combination of unique datasets may, in fact, violate HIPAA by exposing quasi-identifiers that a bad actor could use to re-identify patient information. This threat is exactly why joining data from many unique sources is such a difficult and time-consuming task, and why it is an often-overlooked burden when licensing a wide array of data types. It also reinforces the point that basic patient linkage is just a fraction of the effort required to enforce patient privacy under the law. More importantly, the burden of this privacy work often falls on teams that are not well equipped to manage the effort or are time-constrained in pursuing a HIPAA certification that often takes weeks or months to secure. The result can leave a team frustrated or, even worse, exposed to privacy risk.
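A toy example shows how this can happen. The sketch below measures the size of the smallest group of records that share the same quasi-identifier values, a common re-identification metric often called k-anonymity. It is used here purely for illustration and is not presented as the method an expert or HealthVerity applies. The datasets and field names are invented.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


# Two invented datasets keyed by the same de-identified patient ID.
claims = {
    "p1": {"zip3": "191", "birth_year": 1980},
    "p2": {"zip3": "191", "birth_year": 1980},
    "p3": {"zip3": "191", "birth_year": 1975},
    "p4": {"zip3": "191", "birth_year": 1975},
}
labs = {
    "p1": {"test_month": "2021-03"},
    "p2": {"test_month": "2021-04"},
    "p3": {"test_month": "2021-03"},
    "p4": {"test_month": "2021-04"},
}

# Join the two datasets on the shared patient ID.
joined = [{**claims[pid], **labs[pid]} for pid in claims]

k_claims = smallest_group(claims.values(), ["zip3", "birth_year"])
k_labs = smallest_group(labs.values(), ["test_month"])
k_joined = smallest_group(joined, ["zip3", "birth_year", "test_month"])

print(k_claims, k_labs, k_joined)
```

On its own, each dataset has at least two patients behind every combination of values. Once joined, every patient becomes unique, even though no new field was collected.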

HealthVerity IPGE platform and Privacy

Despite hosting more than 75 RWD sources in the HealthVerity IPGE platform, HealthVerity has established the gold standard in consistent data models and in data normalization and transformation protocols, ensuring that all of its partner data sources and data types are HIPAA-compliant in any and all combinations. HealthVerity achieved this platform-wide certification by working closely with Dr. Brad Malin of Vanderbilt University, one of the leading experts in the US on data privacy. As such, data licensed through HealthVerity, including any combination of medical claims, pharmacy data, lab data, EMR and beyond, falls under our standard certification and is HIPAA-compliant on day one, so that critical research and analytics can begin the same day as delivery.

More importantly, the HealthVerity IPGE platform makes our privacy technologies available to clients who are seeking to transform their own data into a HIPAA-compliant format or to combine proprietary data with third-party data from HealthVerity’s data ecosystem. In this case, HealthVerity delivers both Identity and Privacy solutions to ingest client data, conform it to HealthVerity privacy standards and return it in combination with third-party data, all completely HIPAA-compliant under our certification and ready to study as a unified cohort. This approach takes the guesswork and legwork out of an otherwise very challenging task.

Social determinants of health (SDOH) and Privacy

While healthcare data is more readily joined as described above, many clients are eager to take advantage of consumer data, also known as social determinants of health (SDOH). SDOH can be a dangerous data category because it includes important attributes such as income, marital status, education and home ownership that can leak quasi-identifying information and thereby violate HIPAA. In most cases, SDOH cannot even be physically present in the same network as de-identified healthcare data, regardless of the intent of the end user.

Leveraging our leadership in Privacy, HealthVerity has designed a patented privacy computing environment as a part of the HealthVerity IPGE platform. This environment enables complex analysis of healthcare data in virtual combination with SDOH to generate HIPAA-compliant analytics for cohorts of 10 or more patients that do not reveal patient-level outcomes. Life sciences and media companies have been able to leverage these patented techniques to explore numerous demographic and socioeconomic trends regarding key patient cohorts while remaining on the right side of HIPAA.
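The general idea of reporting only on cohorts above a minimum size can be sketched in a few lines. The example below is a generic small-cell suppression routine, not a description of HealthVerity's patented environment. The threshold of 10 comes from the text above, while the field names and data are invented.

```python
from collections import defaultdict

MIN_COHORT_SIZE = 10  # threshold stated in the text above


def cohort_rates(rows, group_key, outcome_key, min_size=MIN_COHORT_SIZE):
    """Aggregate an outcome rate per group, suppressing groups below min_size."""
    totals = defaultdict(lambda: [0, 0])  # group -> [patients, outcomes]
    for row in rows:
        totals[row[group_key]][0] += 1
        totals[row[group_key]][1] += int(row[outcome_key])
    # Groups that are too small are reported as None rather than as a rate.
    return {g: (o / n if n >= min_size else None) for g, (n, o) in totals.items()}


# Invented data: 12 patients in one income band, 4 in another.
rows = (
    [{"income_band": "low", "adherent": i % 2 == 0} for i in range(12)]
    + [{"income_band": "high", "adherent": True} for _ in range(4)]
)
rates = cohort_rates(rows, "income_band", "adherent")
print(rates)
```

Only aggregate rates leave the function, and the four-patient group is suppressed entirely, so no individual outcome can be read off the result.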

In summary, Privacy is both one of the most important and one of the most challenging aspects of working with patient-centric healthcare and consumer data. While many companies offer tokenization as a path to de-identification, HealthVerity not only starts with a 10x more accurate patient identity resolution approach, but also delivers a gold standard approach to managing RWD at scale with a HIPAA-compliant, ready-to-research methodology. Join us for our next blog, when we discuss the importance of Governance and the efficient and effective techniques for enforcing Privacy across the data ecosystem.

Partnering with HealthVerity

As the largest healthcare RWD ecosystem in the US, the HealthVerity IPGE platform offers access to more than 150 billion de-identified transactions for 330 million American patients from more than 75 data sources, linked by patient identity resolution that is 10 times more accurate than other technology solutions on the market. This RWD contributes to critical analyses that drive analytics and lead to the creation of real-world evidence (RWE) that is changing modern healthcare enterprises.

To learn more about how RWD is managed across the IPGE platform and how HealthVerity’s expertise in Privacy can benefit your life sciences organization, payer or government agency, please fill out the form below or contact us!
