With advances in artificial intelligence (AI) making headlines almost daily, the life sciences industry is eagerly looking to implement this beneficial technology to advance the science. Andrew Goldberg, chief operating officer of HealthVerity, recently sat down with Naqi Khan, physician executive global healthcare gen AI & ML for Amazon Web Services (AWS), to discuss the short-term lessons and long-term impact of AI across the industry. If you missed the webinar, here’s a recap:
A number of life science and biopharma companies are choosing to build the infrastructure to support AI and machine learning (ML) capabilities. In conversations with life sciences executives, Naqi noted three key areas of interest for deploying AI capabilities:
Gilead, a top 20 pharmaceutical company and partner with AWS and HealthVerity, has been working with AWS on patient access modeling using Amazon SageMaker, an ML platform that allows you to build, train and deploy ML models for any use case with fully managed infrastructure, tools and workflows.
Gilead built a foundational common data model with AWS, creating a data lake with various inputs. Naqi noted the importance of validating, verifying or vetting in some way the inputs or data being used in AI models. This tends to be a critical friction point for most organizations.
The data lake created by Gilead feeds various AI models, including:
The output from all of this is optimized study feasibility and trial execution insights, creating an end-to-end pipeline. Using an LLM, much like having a chat feature in the workflow, allows investigators to actively query and filter clinical data to understand trial performance, such as whether a particular trial is achieving or missing diversity metrics. Using a single pane, users can easily gather insights that would otherwise have required extensive back and forth with the data sciences team. The models Gilead is developing will allow them to conduct feasibility studies and execute trials in a much more efficient and cost-effective manner.
During the webinar, Andrew and Naqi engaged in a lively discussion to answer AI-related questions:
Andrew: How long did it take Gilead to implement their AI strategy?
Naqi: Gilead has been working on this stack of infrastructure for over a year. Their commitment and development with us has been going on longer, but a lot of gen AI has only come out recently, since last November, so actual experimentation with newer user experiences is relatively recent.
Model development is not trivial and can require a lot of time. It also takes time to source and curate the right datasets for the model you’re trying to build, and the constant validation is where that time sink tends to happen. It’s not an overnight phenomenon, even though it sounds like that when you read about a new chatbot every day.
Gilead made an investment in adopting Amazon Bedrock (a platform that provides an easy way to build and scale generative AI applications with foundation models) over the course of the last year or so and that’s where they’ve been able to get to a level of success.
Andrew: When you think about pharma, payers, providers, academics and government, which segment is adopting AI the most and where do you see the most impact now versus down the road?
Naqi: I do feel like everyone wants to do something with AI right now. My role is a bit more biased towards the provider organizations and healthcare delivery systems, but I think in terms of ease or industries seeing that quick return on investment, it’s surprisingly frequently been biopharma, especially with drug development. There’s a whole unlock that’s happening with being able to run experiments that previously would have been something that you’d run on a bench. Now you can do new interesting protein synthesis or drug molecule synthesis using gen AI tech that wasn’t even available last year.
It makes me a little bit envious, being a clinician and seeing our hospitals struggle a little bit to get things going, but I do think we’ll get there. It’s the nature of the information and data that’s available. Even though healthcare data is massive on the provider front, I do think there’s a whole other level of privacy and issues that arise. Having a validated data entity do some of the work might help with that.
Andrew: To your point, Gilead hasn’t been on this journey all that long, and certain technologies have only been available for the last year or so. Still, for those who think they’re late in adopting the technology, what do you think they can do to accelerate the gains?
Naqi: There definitely is that sense that you’re missing out and just wanting to try something. Our role at AWS tends to be, “Let’s try to figure out what you’re trying to do before you just start throwing tech at it.” If you’re actually trying to keep pace with the development and emergence of these models that keep popping up on the scene, and there’s a new model almost every week, development against that kind of timeframe becomes near impossible.
What AWS has tried to do is build this abstraction layer through Bedrock, where you have a singular API and can sort of decouple from that model development. And when we have customers who have zoomed in on the use case they’re trying to solve, that’s also hugely beneficial.
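To make the decoupling idea concrete, here is a minimal sketch of a single call surface sitting in front of interchangeable models. It is not Bedrock's actual API: the names `invoke`, `summarize`, `MODEL_REGISTRY`, `model_alpha` and `model_beta` are all hypothetical, and the "models" are stand-in functions so the example runs anywhere.

```python
from typing import Callable, Dict

# A "model" here is just a function from prompt text to response text.
ModelFn = Callable[[str], str]


def model_alpha(prompt: str) -> str:
    """Hypothetical stand-in for one foundation model."""
    return f"[alpha] {prompt}"


def model_beta(prompt: str) -> str:
    """Hypothetical stand-in for a newer model released later."""
    return f"[beta] {prompt.upper()}"


# Hypothetical registry: the only place that knows which models exist.
MODEL_REGISTRY: Dict[str, ModelFn] = {
    "alpha": model_alpha,
    "beta": model_beta,
}


def invoke(model_id: str, prompt: str) -> str:
    """Single entry point: callers pass a model id, never a model object."""
    try:
        backend = MODEL_REGISTRY[model_id]
    except KeyError:
        raise ValueError(f"Unknown model id: {model_id!r}") from None
    return backend(prompt)


def summarize(text: str, model_id: str = "alpha") -> str:
    """Application code depends on invoke(), not on any specific model."""
    return invoke(model_id, f"Summarize: {text}")


print(summarize("trial enrollment notes"))
print(summarize("trial enrollment notes", model_id="beta"))
```

In this sketch, swapping in next week's model means changing a model id or adding a registry entry, while the application function stays the same.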
Andrew: Another point you made is that LLMs are getting commoditized and the value is in the data. As you think about the learning, is it just trial and error or is it a drive toward better data, better models or better business models overall?
Naqi: The data piece is so significant. I’ll give an example from the clinical side. We have many health systems that want to build chatbots or a summarization tool and think they have all this EHR data sitting around that can be used to build or refine a model. The reality quickly becomes that some of this data is quite bad or hasn’t been validated, or there’s a lot of duplicate data, and they don’t have the teams or capabilities in-house to refine that dataset.
A common issue that we see with LLMs is that they hallucinate, or if given the wrong sets of information, they tend to generate the wrong kinds of outputs. So having that data be really pristine and clean is so significant. Again, there’s a new model coming onto the scene every week that seems to beat the previous week’s leading model, so it just feels like that has already achieved commodity-level status and we’re just going to be trying to optimize against costs.
Andrew: It’s a bit of a garbage in, garbage out problem. A lot of public tools have been trained on whatever data has found its way onto the internet or various public access databases. I think the point we’re both trying to make is if you rely ultimately on clinical-level data and you don’t understand the data provenance of where that data is coming from, you don’t appreciate the accuracy of the way those patients are linked, you don’t appreciate false positive and negative rates, it’s going to drive conclusions that aren’t actually there, right?
Naqi: Exactly. You can experience hallucination and drift.
Andrew: Another point, the trial and error is not free, so how does someone rationally think about compute and the cost of all the experimentation? Does the cost go down over time?
Naqi: That is the question of the moment. AWS tends to be really focused on making sure you’re not spending a lot of money. As an example, we have a vendor partner who is using Bedrock today to run inference on large sets of clinical data. When they originally started on their journey several months ago, I think their inference costs were running almost $1.80 for every single summarization. This was because of the large amount of data they were sending us. Users were hitting this thing all the time, so costs were going crazy. We stepped in and did a bunch of optimization on how instances were set up, using Lambda and serverless, and we dropped those costs down to $0.09 per summary, a significant savings.
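For a sense of scale, the two per-summary figures quoted above work out to a reduction of about 95%. The short sketch below does that arithmetic; the monthly volume of 10,000 summaries is a purely hypothetical number chosen for illustration and was not mentioned in the webinar.

```python
def cost_reduction(before: float, after: float) -> float:
    """Return the fractional reduction when a cost drops from `before` to `after`."""
    return (before - after) / before


# Per-summary inference costs quoted in the conversation (USD).
before_per_summary = 1.80
after_per_summary = 0.09

reduction = cost_reduction(before_per_summary, after_per_summary)
print(f"Reduction per summary: {reduction:.0%}")  # prints "Reduction per summary: 95%"

# Hypothetical monthly volume, for illustration only (not from the webinar).
summaries_per_month = 10_000
monthly_savings = (before_per_summary - after_per_summary) * summaries_per_month
print(f"Savings at {summaries_per_month:,} summaries per month: ${monthly_savings:,.2f}")
```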
When you’re in the experimentation phase, our organization steps in. We can often provide some sort of innovation credit or be able to provide some sort of layer so you can get a handle on how some of these models might be working.
To help with this cost piece, if you’re coming in with this whole, huge, undifferentiated dataset, you’re going to have problems. That’s the first place where we tell you that you need to scope down. One of the mechanisms that customers are frequently trying to use is retrieval-augmented generation (RAG), which is this capability where you can point your whole gen AI setup to an existing, validated data silo. So when the LLM is providing answers, it can somewhat preferentially choose from that data silo when providing its response, so that the hallucination risk goes down a little bit. But even in that, if you have just a huge trove of information for your retrieval-augmented generation setup, you’re, again, going to be hitting those costs.
My core recommendations are to start smaller with that dataset if you can. There are ways to do that experimentation and get a good handle on whether this is going to work or not in scale. And, obviously, our teams are available to help you drive those costs down.
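The retrieval-augmented generation idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Bedrock or any AWS service implements it: the documents are invented, the function names (`tokenize`, `score`, `retrieve`, `build_prompt`) are hypothetical, retrieval is plain keyword overlap rather than the embedding search a production system would likely use, and no model is actually called.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, document: str) -> int:
    """Count how many query tokens also appear in the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Return up to top_k documents with the highest non-zero overlap score."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:top_k] if score(query, doc) > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Assemble a prompt that tells the model to answer from the passages only."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Invented stand-in for a small, validated data silo.
validated_docs = [
    "Trial A enrolled 120 patients across 8 sites.",
    "Trial B is behind its enrollment target for diversity metrics.",
    "The cafeteria menu changes every Monday.",
]

query = "Which trial is missing diversity metrics?"
passages = retrieve(query, validated_docs, top_k=1)
prompt = build_prompt(query, passages)
print(prompt)  # this text would then be sent to whichever LLM you use
```

Keeping `top_k` and the document set small is the code-level version of the advice to scope down: fewer and shorter passages mean less text sent with every query, which is where the per-query cost comes from.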
Andrew: Given that, especially on the pharma side, people’s lives are at stake, and with everyone being inundated with calls and emails about a new technology or dot ai company, are there techniques to quickly tell which ones are delivering actual value?
Naqi: This is the conundrum. From our standpoint, we certainly have tooling in SageMaker. We really emphasize the use of model cards that provide a glimpse of what this model is built off of, what dataset you can use and what the best applications are. There are other techniques in the form of evaluations or existing frameworks. There’s a really great framework called HELM for comparing one of these things to another. A lot of this we’ve tried to bake into SageMaker as much as possible for the builder mindset, but for the buyer mindset, the CFOs, CTOs, CEOs or even a line supervisor trying to make some sense of a startup that’s coming to them, admittedly, I think it’s very challenging right now to determine if it’s going to be useful for you. Most of the organizations I work with are very conservative and are looking for these startups to come to them with a case study already or some example of where it works. If they don’t, then I think they’re outright just seeking some kind of RSUs (restricted stock units) from that organization to proceed further. They want to have skin in the game, so they will outright ask to be part of that cap table even. I think that’s what we’re seeing. I don’t know if this is the best answer, but I think the “be cautious and wait” approach might be OK for some organizations.
Andrew: As Nvidia (an AI chipmaker) starts to take over the world, is there a Moore’s Law (an observation that the number of transistors in a computer chip doubles every two years or so) aspect to AI? Are the processes, language models or insights getting faster or doubling in performance every 18 months?
Naqi: I am a bit of a geeky builder myself, but I don’t know. Right now it feels like everything is sort of hockey-sticking to this crazy degree. I’ll give you one example. There’s a company called Groq that just dropped onto the scene a few days ago and they’re taking a whole different approach to how these LLMs and chatbots provide information back. Usually, if you use ChatGPT today, when you type in a question, it probably takes a few seconds for it to answer, if not more depending on what you’ve asked it. With Groq, you type a question in and you’re getting an almost instantaneous response back. They’re creating a chipset that’s a little bit different and trying to optimize the heck out of understanding how these queries will be analyzed. So I think there are almost unlimited possibilities for us right now. It feels like Moore’s Law kind of broke.
Andrew: Given your global role, is this happening outside of the U.S.?
Naqi: Oh, absolutely! It’s been really fascinating to see. Some of the best models right now are actually coming out of China. There’s a really great set of open large language models, I believe they’re called Qwen-7B. We’ve seen a massive number of models coming from the Middle East. The UAE has been churning out these models and they’re able to use data in a more interesting way than we necessarily can in the States. For some reason France has become the headquarters for some of these cooler generative AI companies, like Hugging Face or Mistral, an almost rebellious group of individuals creating models and just shipping them out. I think I just saw a headline this morning that a number of Japanese businesses are adopting very, very quickly. That’s surprising because I always feel like they tend to proceed conservatively. I feel like there’s a rush everywhere.
Andrew: As you talk about all these public models and then people think about the intellectual property that is their data, and the ability to use APIs through Bedrock to tap into some of these models, when I use a public model, does the public model also learn from my data and retain it, or am I only taking advantage of the public model and all the learning and intellectual property stays with me?
Naqi: To add another layer of distinction, there are open-source models that exist in Hugging Face, etc., and there are the closed-source, proprietary models that come from OpenAI or Anthropic. Ultimately, both of those sets will either learn or not learn based on where you’re doing that experimentation, where you’re doing that build. With Bedrock, you can only use it in what we call a virtual private cloud. So all your calls, everything you’re doing with that model, doesn’t go back up to the parent company; it doesn’t go back to Anthropic, based on how the virtual private cloud is set up. However, as a builder, you can drop into your own environment that you have configured, and if you haven’t set up that guardrail or that blockade or almost like a firewall, that information presumably can go back.
I think we were seeing a lot of this during the early days, probably just six months ago. As an example, I think it was Samsung employees who were using ChatGPT and they were just uploading and sending private information. Then all of a sudden, somebody at Nokia could find out what Samsung had been asking about because of how that model was running. To be fair, no one had said that your information would be protected in that scenario. We certainly take the opposite approach and we tend to be very enterprise centric. We want to make sure that when customers come to us and are using us, they can at least feel assured that Meta doesn’t happen to know your earnings report before you do.
There are use cases for AI throughout the pharmaceutical lifecycle, from clinical trials through commercialization. Gaining the high-quality data that’s needed is dependent on accurately resolving the patient’s identity across time and sources. HealthVerity offers an approach that allows you to establish a unique but persistent identity for an individual with 10x the accuracy of legacy technologies, enabling the synchronization of the patient’s data across unlimited sources and providing insights before, during and after your particular use case.
To ensure ongoing quality throughout the process, HealthVerity FLOW manages all aspects of identity and personally identifiable information (PII), while also governing permissions around what you’re allowed or not allowed to do with a patient’s data. Additionally, there is the option to license de-identified or identifiable data for those patients, with all of the needed protections and data governance, allowing high-quality, AI-ready data to flow where you need it, when you need it and how you need it.
For more information on HealthVerity FLOW and AI-ready data: