Why data provenance matters for regulatory submissions and clinical trials
Regulators increasingly expect that real‑world evidence (RWE) be built on high‑quality, traceable data. When life‑science companies in the US submit evidence to agencies like the FDA or OHRP, they must be confident that the data’s origin, linkage and licensing meet strict regulatory requirements. HealthVerity delivers Verified real‑world data (RWD) with clear provenance, persistent patient linkage and usage‑governed licensing. This rigor has enabled sponsors to strengthen clinical trials and regulatory submissions, as demonstrated in the semaglutide randomized pragmatic (SEPRA) trial.1,2
Case Study: SEmaglutide randomized PRAgmatic (SEPRA) trial
In the SEPRA trial (NCT03596450), researchers used HealthVerity Identity Manager to link each trial participant to a unique HealthVerity ID (HVID) and, through it, to real-world data available in HealthVerity Marketplace. This process converted personal identifiers into unique, privacy-preserving tokens that could then be matched against a broad range of real-world data assets. Of the 1,278 participants, 87% were successfully tokenized and nearly half (49.5%) had medical or pharmacy claims data available during the licensed period.
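To make the idea of tokenization concrete, the following is a minimal sketch of a generic keyed-hash approach. It is not HealthVerity's method: the HVID matching logic is not described in this article, and exact-match hashing of this kind resembles the deterministic techniques discussed later in this article. The function names, the choice of identifier fields and the secret key are all assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical key for illustration only; a real system would keep key
# material in a managed key store, never in source code.
SECRET_KEY = b"demo-key-not-for-production"


def normalize(value):
    """Strip accents, lowercase and collapse whitespace so that trivial
    formatting differences do not change the resulting token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())


def tokenize(first_name, last_name, dob, zip_code):
    """Return a keyed SHA-256 hash of the normalized identifiers.

    The token is a 64-character hex string that does not contain the
    identifiers themselves and cannot be recomputed without the key.
    """
    message = "|".join(normalize(p) for p in (first_name, last_name, dob, zip_code))
    return hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(trial_tokens, claims_tokens):
    """Return the tokens that appear in both the trial and the claims data."""
    return set(trial_tokens) & set(claims_tokens)


# Two spellings of the same (made-up) person produce the same token.
trial_token = tokenize("José", "Smith", "1970-01-02", "19103")
claims_token = tokenize("  jose ", "SMITH", "1970-01-02", "19103")
matched = link({trial_token}, {claims_token})
```

Because only tokens are compared, the two datasets can be joined without either side exchanging names or dates of birth. The trade-off of exact matching is that any difference normalization does not absorb, such as a misspelled surname, produces a different token and a missed link.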
Further, the baseline characteristics observed in the overall trial population were similar to those found in the HealthVerity claims data (Figure 1). This suggests that the trial population closely mirrors real-world patients, thereby increasing the external validity of the clinical trial.
HealthVerity real-world data linkage allowed researchers to:
- Confirm that the trial population mirrored the real-world population
- Follow up with the study population more effectively
- Enhance the quality and richness of the trial data with real-world insights from claims data
These findings underscore the importance of verified data linkage tools for clinical trials and regulatory processes. By ensuring that participants’ data can be accurately connected to real-world evidence without compromising privacy, Identity Manager enhances the robustness and credibility of trial results. This method not only improves the quality of real-world evidence but also lays a stronger foundation for regulatory submissions and ongoing post-market monitoring.
Figure 1. Baseline characteristics of the SEPRA trial population (N=1,278) compared with participants matched to HealthVerity claims data (N=633). The chart illustrates age, gender, region, and treatment category distribution across both datasets, showing strong demographic alignment. Baseline characteristics observed in the overall trial population were similar to those found in the HealthVerity claims data. Figure created using data from clinical trial NCT03596450 and the ISPOR 2023 poster presented by Novo Nordisk Inc (Poster ID: SA16).1
Achieving unmatched precision in clinical trial data synchronization
Approximately 87% of participants were successfully synchronized with an HVID, and nearly half had associated claims data across the characteristics indicated, demonstrating superior linkage effectiveness over traditional aggregator methods.1,3 HealthVerity Identity Manager, which utilizes the HVID, has demonstrated superior accuracy in patient matching compared to legacy tokenization techniques, achieving a false positive rate of approximately 0.2% and a false negative rate of 3–5%. Traditional deterministic methods report significantly higher error rates: 1–3% false positives and 9–42% false negatives (Figure 2).4,5 This capability enables comprehensive insights into patient outcomes and adherence patterns, critical for effective real-world evidence generation.
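The error rates quoted above can be easier to interpret with a worked calculation. The sketch below assumes one common convention, which this article does not itself define: the false positive rate is the share of asserted links that join two different people, and the false negative rate is the share of true same-person pairs that the method fails to link. The validation counts are hypothetical and are not HealthVerity data; only the SEPRA participant counts (1,278 and 633) come from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class LinkageCounts:
    true_links: int    # linked pairs that really are the same person
    false_links: int   # linked pairs that are actually different people
    missed_links: int  # same-person pairs the method failed to link


def false_positive_rate(c):
    """Share of asserted links that are wrong (assumed definition)."""
    asserted = c.true_links + c.false_links
    return c.false_links / asserted if asserted else 0.0


def false_negative_rate(c):
    """Share of true same-person pairs that were missed (assumed definition)."""
    actual = c.true_links + c.missed_links
    return c.missed_links / actual if actual else 0.0


# Hypothetical validation counts, chosen only to illustrate the arithmetic.
example = LinkageCounts(true_links=9_600, false_links=20, missed_links=400)
fpr = false_positive_rate(example)  # 20 / 9,620
fnr = false_negative_rate(example)  # 400 / 10,000

# Claims coverage in SEPRA, using the participant counts from Figure 1.
TRIAL_N = 1278
CLAIMS_MATCHED_N = 633
claims_coverage = CLAIMS_MATCHED_N / TRIAL_N

print(f"false positive rate: {fpr:.2%}")
print(f"false negative rate: {fnr:.2%}")
print(f"claims coverage:     {claims_coverage:.1%}")
```

The two rates pull in different directions. A stricter matching rule reduces false links but misses more true ones, which is why both figures need to be reported together when comparing linkage methods.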
Figure 2. HealthVerity identity management technology achieves a considerable improvement over legacy de-identification methodologies, with lower false positive and false negative rates while maintaining high accuracy.
Future-proofing pharma RWD strategies
By shifting from traditional aggregators to HealthVerity Marketplace, life-science organizations gain three advantages: long-term sustainability through stable, direct contracts; enhanced data accuracy and richer longitudinal patient views through persistent linkage; and regulatory readiness through clear data provenance. In addition, HealthVerity taXonomy helps researchers identify precise patient cohorts across open and closed claims, laboratory and electronic health record sources, making it easier to build compliant RWE for regulatory submissions.
Three pillars of a Verified data ecosystem:
HealthVerity Marketplace supports a Verified data ecosystem built on three foundational pillars:
- Source-Traceable RWD: Each data set maintains clear lineage to its original source.
- Persistent, Privacy-Protected Record Linkage (HVID): The HealthVerity ID (HVID) consistently matches records over time, creating richer datasets that grow more informative and robust as additional data is linked to each patient.
- Use-Governed Licensing: Data usage aligns explicitly with contractual agreements, fostering long-term data access stability and compliance.
Building reliable real-world evidence with HealthVerity:
- Data provenance and linkage quality are critical for regulatory success. HealthVerity provides source-verified, privacy-preserved data linkage that meets high regulatory standards.
- Case study validation: The SEPRA trial demonstrated that HealthVerity data can accurately augment clinical trial populations and support comparative analyses.
- Marketplace advantage: Transparent sourcing and direct contracting minimize legal and security risks associated with resold datasets.
- Long-term sustainability: Verified, governed datasets future-proof pharma’s RWE strategies and regulatory submissions.
References:
- Zacherle E, Nordahl H, Morgan J, Liang M, Leonard S. Trial tokenization accelerating innovation in SEPRA – a pragmatic randomized trial. Poster. Presented at the ISPOR 2023 Annual Meeting; May 7–10 2023; Boston, MA, USA. https://www.ispor.org/docs/default-source/intl2023/ispor23zacherlepostersubmitted-pdf.pdf?sfvrsn=d624c228_0
- Buse, J. B., Nordahl Christensen, H., Harty, B. J., Mitchell, J., Soule, B. P., Zacherle, E., Cziraky, M., & Willey, V. J. (2023). Study design and baseline profile for adults with type 2 diabetes in the once-weekly subcutaneous SEmaglutide randomized PRAgmatic (Sepra) trial. BMJ Open Diabetes Research & Care, 11(3), e003206. https://doi.org/10.1136/bmjdrc-2022-003206
- Kranz, A. M., Dworsky, M., Ryan, J., Heins, S. E., & Bhandarkar, M. (2023). State All Payer Claims Databases: Identifying Challenges and Opportunities for Conducting Patient-Centered Outcomes Research and Multi-State Studies. RAND Corporation. https://aspe.hhs.gov/sites/default/files/documents/a2add9e2d2e196f240357fee73cf3990/APCD-PCOR-Report-2023.pdf
- Brantley, J. (2022). Overcoming Data Fragmentation is Key to Avoiding Future Health Care Crises. Medical Economics. https://www.medicaleconomics.com/view/overcoming-data-fragmentation-is-key-to-avoiding-future-health-care-crises
- Chung, S. C., Toh, S., & Wang, S. V. (2023). Leveraging national insurance claims data for insights on rare diseases: a public health approach. BMJ Public Health, 2(1), e000346. https://bmjpublichealth.bmj.com/content/2/1/e000346