25 Aug 2025

Abridge Outlines Approach to Eliminating AI Hallucinations in Clinical Notes

Abridge, a clinical documentation platform, has released a new white paper, “The Science of Confabulation Elimination,” outlining its approach to detecting and eliminating hallucinations - or unsupported claims - in AI-generated clinical notes before those notes reach clinicians. By focusing on transparency and trust, the company aims to set a new industry standard for the safe and reliable use of AI in healthcare.


The rapid adoption of AI in clinical settings - Abridge’s platform is now deployed across more than 150 health systems - has delivered clear benefits such as time savings and reduced clinician burnout. That growth, however, also raises the stakes for accuracy and quality. Abridge notes that documentation errors are not unique to AI: a 2020 study found that 21% of patients who read their notes perceived a mistake, and 42% of those patients rated the mistake as serious.


To address this, Abridge has created a structured framework that categorizes unsupported claims along two axes: “Support” and “Severity.” The Support axis captures whether a statement is fully supported by the transcript, contradicted by it, or not substantiated at all - for example, a “directly supported” claim matches the transcript exactly, while an “unmentioned” claim cannot be inferred from it. The Severity axis assesses the potential impact of an unsupported statement: a “major severity” claim, such as a fabricated diagnosis, could negatively affect care or cause significant harm, whereas a “minimal severity” claim, like a minor wording change, carries little clinical consequence.
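
One way to picture the taxonomy is as a pair of enumerations, one per axis, with each note statement assessed against both. The sketch below uses only the labels named in the article (the white paper may define additional intermediate levels); the ClaimAssessment wrapper and its requires_correction rule are illustrative assumptions, not Abridge’s actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Support(Enum):
    """Degree to which a note statement is grounded in the transcript."""
    DIRECTLY_SUPPORTED = "directly_supported"  # matches the transcript exactly
    UNMENTIONED = "unmentioned"                # cannot be inferred from it
    CONTRADICTED = "contradicted"              # conflicts with the transcript

class Severity(Enum):
    """Potential clinical impact of an unsupported statement."""
    MINIMAL = "minimal"  # e.g., a minor wording change
    MAJOR = "major"      # e.g., a fabricated diagnosis

@dataclass
class ClaimAssessment:
    claim: str
    support: Support
    severity: Severity

    @property
    def requires_correction(self) -> bool:
        # Hypothetical rule: anything not directly supported gets flagged;
        # a real pipeline would presumably prioritize major-severity findings.
        return self.support is not Support.DIRECTLY_SUPPORTED

# Example: a fabricated diagnosis scores worst on both axes.
finding = ClaimAssessment(
    claim="Patient has a history of atrial fibrillation.",
    support=Support.UNMENTIONED,
    severity=Severity.MAJOR,
)
assert finding.requires_correction
```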


Building on this framework, Abridge has developed “purpose-built guardrails” to improve factual accuracy. These include a proprietary AI model trained on more than 50,000 curated examples drawn from open-source and domain-specific clinical data, alongside an automated correction system. In testing against an internal benchmark of more than 10,000 clinical encounters, the Abridge system caught 97% of confabulations, while GPT-4o, a general-purpose model, identified only 82% - a miss rate of 18% versus 3%, meaning six times as many errors slipped through.
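
The “six times” figure follows from the miss rates rather than the detection rates themselves. A quick arithmetic check, assuming both percentages are measured over the same set of confabulations:

```python
# Detection rates reported against Abridge's internal benchmark.
abridge_detection = 0.97
gpt4o_detection = 0.82

abridge_miss = 1 - abridge_detection  # 3% of confabulations slip through
gpt4o_miss = 1 - gpt4o_detection      # 18% slip through

ratio = gpt4o_miss / abridge_miss
print(f"GPT-4o misses {ratio:.0f}x as many confabulations")  # -> 6x
```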


Despite these results, Abridge emphasizes that clinician oversight remains critical. Its platform includes features such as Linked Evidence, which lets clinicians trace every AI-generated summary back to the original transcript. This combination of AI guardrails and human review is designed to ensure that notes entered into the electronic health record meet the highest possible standard of accuracy.
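
Conceptually, Linked Evidence maps each note statement to the transcript span that supports it. The article does not describe the underlying mechanics, so the sketch below is a purely lexical, hypothetical illustration of that claim-to-source mapping - a crude stand-in for whatever alignment model a production system would use, not Abridge’s implementation:

```python
def link_evidence(note_sentence: str, transcript_segments: list[str]) -> tuple[int, float]:
    """Return the index of the transcript segment with the greatest word
    overlap with the note sentence, plus the overlap score (0.0 to 1.0)."""
    note_words = set(note_sentence.lower().split())
    best_idx, best_score = -1, 0.0
    for i, segment in enumerate(transcript_segments):
        seg_words = set(segment.lower().split())
        # Fraction of the note sentence's words found in this segment.
        score = len(note_words & seg_words) / max(len(note_words), 1)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

# Example: trace a summary line back to the utterance that supports it.
transcript = [
    "Doctor: Any chest pain or shortness of breath?",
    "Patient: No chest pain, but I've been short of breath on stairs.",
]
idx, score = link_evidence("Patient reports shortness of breath on exertion.", transcript)
print(idx, round(score, 2))  # -> 1 0.43 (the patient's reply, partial overlap)
```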

