Can Generative AI Rebuild Health Economic Models? Insights from an Ulcerative Colitis Case Study

29 May 2025

At ISPOR 2025 in Montreal, our team at Value Analytics Labs, in collaboration with Takeda Pharmaceuticals, presented research on the use of generative AI in health economics. Our study explored a frontier application: using large language models to replicate published cost-effectiveness models.

Why This Matters

Health economic models are foundational to health technology assessments (HTAs), pricing decisions, and formulary submissions. Yet replicating or adapting existing models is often slow, opaque, and resource-intensive, largely due to inconsistent documentation and restricted access to model code.

Generative AI offers a promising solution. If it can accurately extract and reconstruct models from published materials, it could dramatically increase transparency and accelerate model development timelines.

A New Use Case for Generative AI in HEOR

In our research, titled “Evaluating Generative AI in Replicating Health Economic Models: A Case Study on Ulcerative Colitis,” we evaluated whether ValueGen.AI, a GPT-4-powered platform, could extract and rebuild an HTA-ready Markov model from text alone, without access to the original model code.

The model we focused on, originally published by Salcedo et al. [1], describes a complex treatment landscape for ulcerative colitis, incorporating multiple lines of therapy, surgical options, and a range of health outcomes.

What We Found

Our evaluation tested the AI’s performance using two types of source material: the original journal publication and a more detailed technical report. We assessed its ability across five core modeling components:

  • Health States: GPT-4 performed well at identifying clearly labeled states, particularly in the technical report. However, it sometimes hallucinated non-existent states or misclassified vague concepts.
  • Transition Probabilities: While some values were correctly extracted, the AI struggled to recognize time horizons and interpret hazard ratios, especially when transitions to death were not explicitly described.
  • Costs: Surgery costs were accurately captured from the technical report. However, drug, administration, and adverse event costs were frequently missed because they appeared in unstructured formats or were implied by logic embedded in the text.
  • Utilities: When utility values were clearly labeled, extraction was generally accurate. But the AI occasionally misapplied values to hallucinated health states.
  • Treatment Pathways: This domain posed the greatest challenge. The AI was unable to reconstruct therapy sequences or conditional transitions, primarily due to narrative formatting and the absence of explicit logic rules.
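
To make these components concrete, here is a minimal sketch of how health states, transition probabilities, costs, and utilities fit together in a Markov cohort model, including the two steps the AI found hardest: converting a probability to the model's cycle length and applying a hazard ratio. It is not the model from Salcedo et al. The three states, the 8-week cycle, and every number are hypothetical, and treatment sequences are left out entirely. The conversions assume a constant rate and proportional hazards.

```python
import math

# Hypothetical three-state model; every name and number here is illustrative
# and is NOT taken from Salcedo et al. or from the study's results.
STATES = ["remission", "active_disease", "dead"]
CYCLE_YEARS = 8 / 52  # assumed 8-week cycle length


def rescale_probability(p, from_years, to_years):
    """Convert a probability between time spans, assuming a constant rate."""
    rate = -math.log(1.0 - p) / from_years
    return 1.0 - math.exp(-rate * to_years)


def apply_hazard_ratio(p, hazard_ratio):
    """Adjust a probability by a hazard ratio under proportional hazards."""
    return 1.0 - (1.0 - p) ** hazard_ratio


def build_matrix(p_relapse, p_remit, p_death):
    """Per-cycle transition matrix; death is reachable from every living state."""
    return [
        [1.0 - p_relapse - p_death, p_relapse, p_death],
        [p_remit, 1.0 - p_remit - p_death, p_death],
        [0.0, 0.0, 1.0],
    ]


def run_cohort(matrix, start, n_cycles):
    """Return state occupancy per cycle for a cohort starting in `start`."""
    trace = [start]
    for _ in range(n_cycles):
        prev = trace[-1]
        trace.append([
            sum(prev[i] * matrix[i][j] for i in range(len(prev)))
            for j in range(len(prev))
        ])
    return trace


# Annual inputs (hypothetical) converted to the model's cycle length.
p_relapse = rescale_probability(0.30, 1.0, CYCLE_YEARS)
p_remit = rescale_probability(0.40, 1.0, CYCLE_YEARS)
p_death = rescale_probability(0.01, 1.0, CYCLE_YEARS)

# A hypothetical treatment effect: hazard ratio of 0.5 on relapse.
p_relapse_treated = apply_hazard_ratio(p_relapse, 0.5)

utilities = {"remission": 0.85, "active_disease": 0.60, "dead": 0.0}
cost_per_cycle = {"remission": 500.0, "active_disease": 2000.0, "dead": 0.0}

matrix = build_matrix(p_relapse_treated, p_remit, p_death)
trace = run_cohort(matrix, [1.0, 0.0, 0.0], n_cycles=13)  # about two years

# Undiscounted totals per patient, counting cycles 1..13.
qalys = sum(
    occ * utilities[s] * CYCLE_YEARS
    for row in trace[1:] for s, occ in zip(STATES, row)
)
costs = sum(
    occ * cost_per_cycle[s]
    for row in trace[1:] for s, occ in zip(STATES, row)
)
print(f"QALYs: {qalys:.3f}, costs: {costs:.0f}")
```

Even in this toy version, a wrong time horizon or a misread hazard ratio changes every row of the matrix, which is why extraction errors in those inputs matter so much downstream.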

What This Means for the Future of HEOR

Our findings underscore both the potential and the limitations of generative AI in health economic modeling. When documentation is structured and transparent, AI tools like ValueGen.AI can already support significant portions of model replication. But when inputs are vague or embedded in narrative descriptions, the AI’s performance drops markedly.

This highlights a critical opportunity for the field: improving the standardization of economic model documentation. If we want to unlock AI’s full potential to support reproducible, scalable modeling, we must give equal attention to how we document models—not just how we build them.
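
As one illustration of what more standardized documentation could enable, the sketch below checks a machine-readable model description for internal consistency. The field names (`states`, `transitions`, `utilities`, `costs`), the `validate_model_spec` helper, and all values are our own hypothetical choices, not an existing standard and not part of ValueGen.AI. The check flags values attached to states that were never declared, which is the kind of error we saw when the AI hallucinated health states.

```python
def validate_model_spec(spec, tol=1e-9):
    """Return a list of problems found in a structured model specification."""
    problems = []
    states = set(spec["states"])

    # Utilities and costs may only refer to declared health states.
    for field in ("utilities", "costs"):
        for state in sorted(spec[field]):
            if state not in states:
                problems.append(f"{field}: undeclared state '{state}'")
        for state in sorted(states - set(spec[field])):
            problems.append(f"{field}: no value for state '{state}'")

    # Every declared state needs an outgoing row that sums to 1.
    for origin in spec["states"]:
        row = spec["transitions"].get(origin)
        if row is None:
            problems.append(f"transitions: no row for state '{origin}'")
            continue
        for target in sorted(row):
            if target not in states:
                problems.append(
                    f"transitions: '{origin}' -> undeclared state '{target}'"
                )
        total = sum(row.values())
        if abs(total - 1.0) > tol:
            problems.append(f"transitions: row '{origin}' sums to {total:.3f}")
    return problems


# Hypothetical specification; all values are illustrative only.
spec = {
    "states": ["remission", "active_disease", "dead"],
    "cycle_length_weeks": 8,
    "transitions": {
        "remission": {"remission": 0.90, "active_disease": 0.09, "dead": 0.01},
        "active_disease": {"remission": 0.20, "active_disease": 0.79, "dead": 0.01},
        "dead": {"dead": 1.0},
    },
    "utilities": {"remission": 0.85, "active_disease": 0.60, "dead": 0.0,
                  "mild_flare": 0.70},  # state that was never declared
    "costs": {"remission": 500.0, "active_disease": 2000.0, "dead": 0.0},
}

for problem in validate_model_spec(spec):
    print(problem)
# prints: utilities: undeclared state 'mild_flare'
```

A check like this only works when states, transitions, and values are reported in explicit, labeled form, which is the documentation practice we are arguing for.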

Looking Ahead

At Value Analytics Labs, we’re committed to advancing the responsible integration of AI in health economics. This research shows that generative AI can go beyond literature review and evidence synthesis—it can support model reconstruction, with implications for HTA bodies, payers, and life sciences companies.

Interested in learning more about how ValueGen.AI can accelerate model development and replication?

Contact us to schedule a demo or discuss a potential collaboration!

1. Salcedo, Jonathan, Daniel Hill-McManus, Chloë Hardern, Oyin Opeifa, Raffaella Viti, Ludovica Siviero, Antonio Saverio Roscini, and Gennaro Di Martino. “Cost-Effectiveness of Vedolizumab as a First-Line Advanced Therapy Versus Adalimumab Treatment Sequences for Ulcerative Colitis in Italy.” PharmacoEconomics-Open 8, no. 5 (2024): 701-714.

2. Samur et al. “Evaluating Generative AI in Replicating Health Economic Models: A Case Study on Ulcerative Colitis.” https://www.ispor.org/docs/default-source/cti-meeting-21021-documents/ee147dbe-c5f0-48e1-a727-cc9392646347.pdf?sfvrsn=3c04ce2d_0. Accessed May 2025.
