In a session hosted by Indegene and Datavant at the 2025 Indegene Digital Summit, senior leaders Mayank Rizada and Melissa Wissner unpacked the practical realities of using real-world data (RWD) and tokenization to streamline clinical trial operations. With over two decades of experience each, Rizada, Senior Director at Indegene, and Wissner, Principal for Clinical Development Solutions at Datavant, emphasized that while the industry has long discussed RWD’s promise, it is now poised for tangible, scalable deployment across the clinical development lifecycle. From optimizing patient recruitment to long-term safety monitoring, their discussion highlighted a clear shift toward integrating fragmented data sources into coherent, actionable insights.

Breaking the Data Silos: The Tokenization Imperative

The conversation opened with a technical and operational overview of tokenization—defined as the de-identification of personally identifiable information (PII) to generate a unique, irreversible, encrypted key (a “token”) for each patient. Tokenization enables longitudinal data linkage without compromising privacy, offering a mechanism to unify disparate datasets, including claims, EHRs, lab results, and social determinants of health.
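Actual tokenization implementations are vendor-specific and proprietary, but the core idea described above can be sketched as a keyed one-way hash over normalized identifiers. The field choices and key handling below are illustrative assumptions, not Datavant's method:

```python
import hashlib
import hmac

# Hypothetical secret key. In practice, tokenization software holds the key
# so that raw PII never leaves the site and tokens cannot be reversed.
SECRET_KEY = b"site-held-secret"

def tokenize(first_name: str, last_name: str, dob: str, zip_code: str) -> str:
    """Derive an irreversible token from normalized PII fields.

    A keyed hash (HMAC-SHA256) is one-way: the token cannot be turned
    back into the underlying identifiers, yet the same inputs always
    yield the same token, which is what enables cross-dataset linkage.
    """
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, dob, zip_code)
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# The same patient produces the same token even when formatting differs,
# so records from different systems can be matched without exposing PII.
token_a = tokenize("Ada", "Lovelace", "1815-12-10", "10001")
token_b = tokenize(" ada ", "LOVELACE", "1815-12-10", "10001")
```

Because the hash is deterministic, two datasets that tokenize the same patient independently will produce matching tokens, which is the property that makes privacy-preserving linkage possible.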

Wissner explained that trial datasets often capture only a sliver of a patient’s clinical history, and traditional Subject IDs are not interoperable across systems. This siloing makes it difficult to assess prior diagnoses, comorbidities, or treatment regimens—all of which are critical for study design and eligibility screening. Through tokenization, however, researchers can link a patient’s full health history to the trial record, gaining a 360-degree view that supports both near-term activities (e.g., confirming inclusion/exclusion criteria) and long-term follow-up (e.g., hospitalizations, mortality, treatment adherence).

The need for such linkage is not hypothetical. A survey cited in a related session revealed that only 11% of organizations had fully implemented AI/ML strategies across clinical activities, despite significant investments in data systems. The gap between data availability and data usability remains wide, and tokenization offers a path to bridge it.

Case Study: Rare Disease Trial Validates RWD Linkage Strategy

A concrete case study illustrated how tokenization can be operationalized. A top-five pharmaceutical company implemented tokenization in a late-phase trial for a rare genetic disorder. Pediatric participants or their guardians were offered an optional informed consent addendum allowing for the generation of a token.

The company then conducted an overlap analysis across 10+ RWD sources—including claims, EHRs, lab data, and consumer information—to determine how many participants were represented in these external datasets. The results revealed significant overlap with claims data, but insufficient EHR representation for the analyses they intended to run.
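Conceptually, an overlap analysis like the one described reduces to set intersections over the shared tokens. The data below is invented for illustration; real analyses run across de-identified vendor environments:

```python
def overlap_rates(trial_tokens, external_sources):
    """Percent of enrolled tokens found in each external RWD source."""
    trial = set(trial_tokens)
    return {
        name: round(100 * len(trial & set(tokens)) / len(trial), 1)
        for name, tokens in external_sources.items()
    }

trial_tokens = ["t1", "t2", "t3", "t4"]
sources = {
    "claims": ["t1", "t2", "t3", "t9"],  # strong representation
    "ehr":    ["t2", "t8"],              # thin representation
}
rates = overlap_rates(trial_tokens, sources)
# Here claims covers 75% of participants but EHR only 25% -- the kind of
# gap that prompted the targeted record retrieval described next.
```

A result like this is what tells a sponsor, before spending on analysis, whether a given source can actually support the questions it was bought to answer.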

Instead of halting the effort, the company opted for targeted medical record retrieval using patient authorization to extract specialized clinical care information. This blended approach—tokenization for breadth, and targeted retrieval for depth—enabled the team to reconstruct missing medical histories and reduce the burden of additional data collection, while preserving patient privacy.

The trial demonstrated not only the feasibility of tokenized linkage but also the flexibility required to pivot when coverage gaps emerge. According to Wissner, such strategies can inform future study designs, trial simulations, and cohort modeling by making use of both enrolled and screen-failed patient data.

Operational Realities: Consent, Complexity, and Cost

Despite the promise of RWD linkage, the operational challenges are non-trivial. Consent remains a critical human element. Presenting tokenization and medical record retrieval consent at the initial informed consent discussion—not later—yields the highest opt-in rates. To support this, study sites must be armed with clear, accessible materials that explain tokenization in plain language and dispel the misconception that it has anything to do with blockchain or cryptocurrency.

From a sponsor perspective, cost is another friction point. Engaging multiple data vendors to source and validate external data can escalate budgets rapidly—especially if the research questions are not well-defined. Wissner cautioned that many sponsors purchase access to data sources without knowing what they will do with the data. The result: ballooning expenses and minimal return on investment.

To mitigate this, she recommended a “fit-for-purpose data assessment” early in the planning cycle. Rather than layering data for data’s sake, sponsors should start by identifying specific use cases—such as confirming prior medication use or modeling dropout risk—and then select datasets that serve those needs.

AI and the Future of Intelligent Patient Finding

The discussion closed with a forward-looking reflection on the role of artificial intelligence in transforming real-world data applications. Datavant already uses machine learning to structure unstructured clinical notes and optimize patient-to-data matching. But as Wissner emphasized, the bottleneck is not data volume—it’s data fragmentation.

AI’s next frontier, therefore, lies in targeting and surfacing the right data sources for the right questions. For example, AI tools can be applied to predict which patients are most likely to be eligible for a trial based on modeled RWD, or to simulate how design changes might affect cohort diversity or dropout rates.

Rizada echoed this sentiment, noting that AI is central to Indegene’s evolving patient recruitment strategy. Their focus is not just on identifying eligible patients faster, but on integrating AI-driven workflows that evaluate fit-for-trial status dynamically—especially for rare or “needle-in-the-haystack” populations.

Key Takeaways and Future Directions

The session reinforced a few key messages for industry stakeholders:

  • Start early. Incorporating tokenization into the IRB submission and protocol from the outset improves both feasibility and patient opt-in rates.
  • Educate everyone. Patients, sites, and study teams all require tailored education to understand the value and safety of data linkage strategies.
  • Define questions first. A targeted data strategy prevents scope creep and controls costs.
  • Combine technologies. Real-world data linkage, AI, and medical record retrieval are not mutually exclusive—they work best in combination.

With regulatory bodies such as the FDA increasingly supportive of real-world evidence and data linkage strategies, and industry partners demonstrating operational feasibility, the convergence of AI, tokenization, and patient-centric trial design appears inevitable. As stakeholders move beyond pilots and proofs-of-concept, the field must prioritize scalable, consent-driven frameworks that deliver real-world results—faster, cheaper, and with deeper insights than ever before.


Moe Alsumidaie is Chief Editor of The Clinical Trial Vanguard. Moe holds decades of experience in the clinical trials industry. Moe also serves as Head of Research at CliniBiz and Chief Data Scientist at Annex Clinical Corporation.