Objectives
A previous study (ISPOR 2024, P48) described a method, ‘LLMAdapt’, that uses a large language model (LLM) to automatically adjust an Excel-based cost-effectiveness model (CEM) from the setting of one country to that of another. The authors reported a high level of accuracy (97%) for one test case. Assessing generalizability is an important step towards uptake and acceptance of AI-based methods by decision-makers. The objective of this study was to assess the generalizability of LLMAdapt across two distinct disease areas and countries.
Methods
LLMAdapt (powered by Generative Pre-trained Transformer 4 [GPT-4; the gpt-4-1106-preview model]) was used to automatically adjust two health technology assessment (HTA)-ready Excel CEMs from the setting of one country to that of another. To support the adaptations, GPT-4 was provided with tabular data for each target country in a format that mimicked the output of a targeted literature review. Before the study, each CEM received minor updates to improve its interpretability, such as clarifying vague descriptive text. The models covered two disease areas, muscle-invasive urothelial carcinoma (MIUC) and myelodysplastic syndrome (MDS), and were adapted to two target countries, the Czech Republic and the United States. All automated adaptations were manually checked by a health economist to assess accuracy.
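As an illustration only, the sketch below shows one way tabular target-country inputs might be serialized into plain text before being passed to an LLM. The abstract does not describe the actual format used by LLMAdapt, so the `CountryInput` fields, the `render_inputs_table` helper, and the placeholder values are all hypothetical.

```python
from typing import NamedTuple


class CountryInput(NamedTuple):
    """One row of target-country data (hypothetical structure)."""
    parameter: str
    value: str
    source: str


def render_inputs_table(country: str, rows: list[CountryInput]) -> str:
    """Render the rows as a pipe-delimited text table for use in a prompt."""
    header = f"Target country: {country}\nparameter | value | source"
    lines = [f"{r.parameter} | {r.value} | {r.source}" for r in rows]
    return "\n".join([header, *lines])


# Placeholder rows; no real parameter values are implied.
rows = [
    CountryInput("discount_rate_costs", "<value>", "<reference>"),
    CountryInput("currency", "<value>", "<reference>"),
]
table_text = render_inputs_table("<target country>", rows)
```

A delimited text table of this kind keeps each parameter on its own line with its source, which is roughly how the findings of a targeted literature review are often tabulated.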
Results
The two adaptations were performed without human intervention in 132 and 207 seconds. LLMAdapt successfully performed 101/102 and 198/199 required updates, giving accuracy scores of 99.0% and 99.5%. Two errors were identified, one in each adaptation; in both, a required parameter value change was missed.
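The accuracy scores follow directly from the reported counts, as the minimal sketch below shows. The function name `accuracy_pct` and the labels `adaptation_1` and `adaptation_2` are assumptions made for illustration; the abstract does not state which count belongs to which disease area.

```python
def accuracy_pct(successful: int, required: int) -> float:
    """Percentage of required updates that were performed successfully."""
    if required <= 0:
        raise ValueError("required must be a positive integer")
    return 100.0 * successful / required


# (successful updates, required updates) as reported in the Results.
results = {
    "adaptation_1": (101, 102),
    "adaptation_2": (198, 199),
}

scores = {name: accuracy_pct(s, r) for name, (s, r) in results.items()}

# Total number of required updates that were missed across both adaptations.
missed = sum(r - s for s, r in results.values())
```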
Conclusions
We found that the accuracy of LLMAdapt was maintained across two distinct disease areas and countries, demonstrating the generalizability of LLM-based methods to automate the adaptation of Excel-based CEMs. This is an important step towards uptake of these methods.
Cite This Article
HTA156 Assessing the Generalizability of Automating Adaptation of Excel-Based Cost-Effectiveness Models Using Generative AI. Rawlinson, W. et al. Value in Health, Volume 27, Issue 12, S384. https://doi.org/10.1016/j.jval.2024.10.1981