2025 AMA Research Challenge – Member Premier Access

October 22, 2025

Virtual only, United States

Background

Quality assessments relying solely on structured EHR data do not capture note-documented explanations for apparent non-adherence to lipid management guidelines. Large language models (LLMs) present a promising approach for extracting this information at scale.

Purpose

To evaluate the accuracy and impact of LLMs when applied to assess adherence to lipid management guidelines in CAD.

Methods

We used structured EHR data to analyze LDL-C levels and statin use among patients with ASCVD seen in cardiology clinics at a large academic medical center from 2022 to 2024. We then applied GPT-4o with zero-shot, chain-of-thought prompting to free-text cardiology clinic notes to (1) identify external LDL-C values documented in the notes of patients without a recent in-system lipid panel and (2) infer reasons for statin non-use among those with LDL-C ≥70 mg/dL. We validated GPT-4o's accuracy through manual review of 500 randomly selected charts.
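As an illustration only, zero-shot, chain-of-thought prompting for the second task might be set up as in the sketch below. The abstract does not give the study's prompt, so the wording, the function names build_prompt and parse_answer, the "ANSWER:" output convention, and the fallback to "undocumented" are all assumptions. The category labels are borrowed from the reasons reported in the Results. No model call is made here.

```python
# Category labels borrowed from the reasons reported in the Results.
REASONS = ["clinician decision", "intolerance", "patient preference", "undocumented"]


def build_prompt(note_text: str) -> str:
    """Assemble a zero-shot (no worked examples), chain-of-thought prompt.

    The wording is hypothetical and is not the prompt used in the study.
    """
    options = "\n".join(f"- {reason}" for reason in REASONS)
    return (
        "You are reviewing a cardiology clinic note.\n"
        "Task: decide why the patient is not on a high-intensity statin.\n"
        "Think step by step, quoting the relevant sentences, then finish with\n"
        "a final line of the form 'ANSWER: <category>'.\n"
        f"Categories:\n{options}\n\n"
        f"Note:\n{note_text}\n"
    )


def parse_answer(response: str) -> str:
    """Return the category on the last 'ANSWER:' line of a model response.

    Falls back to 'undocumented' when no valid category is found
    (an assumed convention, chosen for this sketch).
    """
    for line in reversed(response.strip().splitlines()):
        line = line.strip()
        if line.upper().startswith("ANSWER:"):
            label = line.split(":", 1)[1].strip().lower()
            if label in REASONS:
                return label
    return "undocumented"
```

The prompt string would be sent to the model once per note, and parse_answer applied to the returned text. Keeping the reasoning free-form and the final answer on a fixed line makes the output easy to check against manual chart review.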

Results

Among 8,595 patients (median age 71.0 years, 61.5% male, 70.8% White) initially identified, 53.6% were on high-intensity statins and 38.1% had LDL-C <70 mg/dL. GPT-4o confirmed a CAD diagnosis in 6,534 (76.0%) patients (Figure); within this group, GPT-4o identified external lipid panels (n=403, 6.2%), high-intensity statin use (n=59, 0.7%), and statin intolerance (n=804, 12.3%) that were documented in notes but not captured in structured fields (PPV 0.91-1.00). Patients with GPT-4o-confirmed CAD and updated medication/laboratory data had higher measured rates of high-intensity statin use (61.4% vs. 53.6%) and LDL-C <70 mg/dL (44.1% vs. 38.1%). Excluding statin-intolerant patients further increased the rate of high-intensity statin use to 68.4%. Among patients with LDL-C ≥70 mg/dL not on high-intensity statins (N=1,164), GPT-4o-extracted reasons for non-use were clinician decision (13.0%), intolerance (38.5%), patient preference (8.2%), and undocumented (40.4%). GPT-4o classification performance was strong (F1 > 0.71 across all categories), with the highest performance observed for statin intolerance (F1 = 0.96).
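For readers less familiar with the reported metrics, the sketch below shows one common way to compute per-category PPV (precision) and F1 from paired manual-review and model labels. The function name per_class_metrics and its input format are assumptions made for illustration; the abstract does not describe how the study computed its metrics.

```python
from collections import Counter


def per_class_metrics(manual, predicted):
    """Compute per-category PPV, recall, and F1.

    manual:    reference labels from chart review, one per chart
    predicted: model labels for the same charts, in the same order
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(manual, predicted):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # model assigned this category incorrectly
            fn[truth] += 1  # model missed the true category

    metrics = {}
    for label in sorted(set(manual) | set(predicted)):
        ppv_den = tp[label] + fp[label]
        recall_den = tp[label] + fn[label]
        ppv = tp[label] / ppv_den if ppv_den else 0.0
        recall = tp[label] / recall_den if recall_den else 0.0
        # F1 is the harmonic mean of PPV and recall.
        f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
        metrics[label] = {"ppv": ppv, "recall": recall, "f1": f1}
    return metrics
```

Because F1 combines PPV and recall, a category can have a high PPV and still a lower F1 if the model misses many true cases in that category.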

Conclusions

GPT-4o can accurately identify clinical factors not captured in structured EHR data. This approach enables more accurate guideline-based quality assessments and yields actionable insights into reasons for apparent gaps in care.
