Most physicians don't spend much time thinking about clinical documentation integrity. We document for care, not for billing — and that gap has always cost hospitals money. For every mildly low sodium mentioned in a note but never formally coded, revenue walks out the door. Health systems have known this for years, and they've responded with teams of CDI specialists whose entire job is to chase down those missed diagnoses after the fact.
AI is now taking over that job — and doing it at a scale no human team ever could. Ambient listening tools transcribe your conversations in real time. Coding AI scans the full chart retrospectively and surfaces every billable condition you mentioned but didn't list as a formal problem. The ROI data from early adopters is striking. But a new analysis from Blue Cross Blue Shield's research arm — covering 62 million members across three years — is telling a more complicated story about what happens when you maximize coding completeness without a corresponding commitment to clinical accuracy.
How AI Documentation Tools Are Making Upcoding Worse
I'm a physician who eventually wants to lead an ICU or department—something beyond Healthcare Huddle. So I'm developing two distinct mindsets: the physician and the leader. The physician mindset is straightforward, and it's my main focus: do no harm while improving a patient's condition through medical interventions, surgical procedures, or palliative care. The leader mindset asks: how do I generate revenue for my unit to sustain operations and deliver value?
Those two mindsets sometimes pull in opposite directions. But occasionally, they converge — and when they do, you start noticing problems that most clinicians walk right past.
Clinical documentation integrity, or CDI, is one of them.
CDI has received tremendous attention over the past decade. Every institution approaches it differently, but it typically involves a team of humans (this is changing; if you don't care for my analysis, jump to the Solution section) reviewing every note in a patient's chart to ensure no condition goes undocumented. In other words, they make sure every condition a patient presented with or developed during admission is coded and can be billed.
Since humans are prone to error, we often mention something in a note without coding for it. If a patient arrived at the hospital mildly hyponatremic, I may have ignored it—but that should be listed as a hospital problem!
Hospitals operate on thin margins—or so they say. Ensuring everything is properly coded helps the bottom line, but it requires significant human labor to review charts. This is, in my opinion, a problem.
Root Cause Analysis: 5 Whys
The 5 Whys process in root cause analysis involves repeatedly asking "Why?" five times to drill down into the root cause of a problem by exploring the cause-and-effect relationships underlying the issue.
The problem: Hospitals struggle to ensure that clinical documentation accurately and completely captures every condition a patient presents with or develops during admission, leaving revenue uncaptured, clinical records incomplete, and care continuity at risk.
- Why?: Physicians document what they're treating and focusing on, not necessarily every diagnosable condition present during the encounter (e.g., a mildly low sodium gets mentioned in a note but never formally listed as a hospital problem).
- Why?: Clinical notes are written under time pressure and most physicians aren't trained in ICD coding. They don't naturally think in terms of billable diagnoses while they're managing a sick patient.
- Why?: The gap between clinical language and billing language is wide. For example, a physician writes "mild hyponatremia, likely dehydration" and a CDI specialist codes it as a condition with a specific DRG weight. Those are two different things in the billing system, and bridging them requires expertise most physicians don't have and weren't trained to develop.
- Why?: CDI programs depend on a retrospective, manual review process. This process is labor-intensive, inconsistent across reviewers, and structurally unable to keep pace with documentation volume.
- Why (root cause)?: There is no real-time, systematic mechanism embedded in the physician's workflow to ensure every clinically significant condition is documented with the specificity required for accurate coding.
Impact Analysis
Impact analysis is the assessment of the potential consequences and effects that changes in one part of a system may have on other parts of the system or the whole.
- Patient: Incomplete documentation has direct consequences for care continuity. If a condition isn't coded, it may not appear in a patient's problem list, transition of care summary, or downstream medical record. A patient transferred to a rehab facility or seen by a new physician weeks later may arrive with an incomplete clinical picture—and decisions get made on incomplete information.
- Clinician or Provider: Physicians carry the burden of documentation without adequate tools or training to meet billing standards. CDI query workflows—where a specialist sends a question back to the ordering physician about their note—interrupt already stretched schedules and add cognitive load at exactly the wrong moment. Most of us don't know what goes unrecorded, because no one shows us in real time. We're expected to document clinically and to billing standards, and the feedback loop for when we fall short is slow and indirect.
- System: Incomplete documentation directly reduces reimbursement. Conditions that were treated but not coded don't generate the DRG weight they should, and hospitals with weak CDI programs leave meaningful revenue on the table. Beyond the financial impact, poor documentation skews quality reporting, distorts risk adjustment models, and undermines population health analytics. Systems can't improve what they can't measure, and they can't measure what was never coded.
Solution
The manual CDI model was always a workaround, and the logical evolution is AI. Two categories of tools have converged to address the problem:
- Ambient AI: integrates directly with the EHR and converts the clinician-patient conversation into a structured note in real time. As a physician, I speak with a patient, and the tool captures not just the narrative but also pulls out diagnosable conditions, assigns specificity, and flags them for coding, all without me having to think about ICD-10 at the bedside. One ambient listening vendor markets its product as generating an additional $13,000 per clinician annually in recovered revenue. The big players are Abridge, Nuance DAX Copilot, and Commure/Augmedix. They're being deployed at the enterprise level at some of the largest health systems in the country, including UPMC, Emory, Northwestern Medicine, and HCA Healthcare. Even my institution, Mount Sinai, has deployed it (although they don't let me use it… sad 😞).
- Coding AI: works retrospectively. After the note is written, these platforms scan the full medical record (labs, problem lists, clinical notes, prior admissions) and surface additional billable diagnoses or higher-severity coding combinations that influence DRG assignment. Companies like CodaMetrix, SmarterDx, and Arintra are operating at major academic medical centers and health systems. SmarterDx markets a 5:1 return on investment on day one. McLaren Health Care reported $11.3 million in additional annual revenue from its deployment. Novant Health doubled its ROI forecast.
This is genuinely useful technology. CDI was always under-resourced and structurally slow, and AI closes the gap between what physicians document and what actually gets captured and billed. From a leader's perspective, I understand the appeal.
But here's where it gets complicated…
Blue Cross Blue Shield's research arm—Blue Health Intelligence—just published an analysis of commercial inpatient claims covering approximately 62 million members, examining claims from April 2022 through March 2025. The timeline matters since it spans the period before significant AI adoption in hospitals through the period of meaningful deployment by early-adopter systems.
Some main findings that I find fascinating (keeping in mind that this was published by an organization that does NOT like paying more money!):
- Across those 62 million members, per-member inpatient costs increased by approximately 9% from 2023 to 2024. BHI estimates that roughly 20% of that increase is attributable to rising coding intensity—not to actual changes in patient acuity or care delivered.
- The top 10% of hospitals drove most of it. Those high-growth facilities saw the proportion of inpatient admissions coded as clinically complex jump 13 percentage points, from 46.8% in mid-2022 to 59.8% by early 2025. The other 90% of hospitals saw a much more gradual 4-percentage-point rise over the same period.
- Maternity admissions help explain this jump. Across all hospitals in the analysis, coding for postpartum anemia (ICD-10 D62) climbed steadily, from 6.8% of maternity admissions in mid-2022 to 9.3% by early 2025. The rate of transfusion, the standard treatment for clinically significant postpartum anemia, stayed essentially flat the entire time: 1.0% to 1.1%.
- Among the highest-growth hospitals specifically, postpartum anemia coding went from 4.0% to 12.3% of maternity admissions, a more than threefold increase. Transfusion rates did not rise to match.
One BCBS Plan went further and audited maternity cases at one of the largest outlier hospital systems in its network. It found that fewer than 20% of cases coded with postpartum anemia actually met established clinical criteria for the diagnosis.
The BHI report estimated that in maternity admissions alone, the coding intensity shift contributed approximately $22 million in additional spending over the analysis period, for a single secondary diagnosis code, with limited evidence of corresponding treatment.
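For readers who want to check the arithmetic behind these findings, here is a minimal sketch in Python. It uses only the figures quoted above. The function and variable names are my own and purely illustrative, and the last calculation assumes that "20% of a 9% increase" can be read as a simple product, which is my simplification rather than anything BHI states.

```python
# Back-of-the-envelope check of the figures quoted from the BHI analysis.
# All names here are illustrative; only the input numbers come from the text.

def pp_change(start_pct: float, end_pct: float) -> float:
    """Percentage-point change between two rates given in percent."""
    return round(end_pct - start_pct, 1)

def fold_change(start_pct: float, end_pct: float) -> float:
    """How many times larger the ending rate is than the starting rate."""
    return round(end_pct / start_pct, 1)

summary = {
    # Share of admissions coded as clinically complex, top 10% of hospitals
    "complex_coding_pp": pp_change(46.8, 59.8),
    # Postpartum anemia coding, all hospitals
    "anemia_all_pp": pp_change(6.8, 9.3),
    # Transfusion rate, all hospitals
    "transfusion_pp": pp_change(1.0, 1.1),
    # Postpartum anemia coding, highest-growth hospitals
    "anemia_top_fold": fold_change(4.0, 12.3),
    # Assumption: 20% of the ~9% cost increase, read as a simple product,
    # gives the points of cost growth attributable to coding intensity.
    "coding_intensity_pp": round(9.0 * 0.20, 1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The comparison that matters is between the anemia coding change and the transfusion change: the coding rate moves by whole percentage points while the treatment rate barely moves at all.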
BHI was careful to note that the analysis is not a determination of clinical appropriateness or provider intent, since upcoding can happen without anyone making a deliberate decision to commit fraud. When an ambient listening tool captures every passing mention in a clinical conversation and a coding AI surfaces it as a billable secondary diagnosis, the physician may never see the final coded output. The machine is optimizing for completeness and revenue—which is exactly what it was designed to do.
In Sunday's newsletter, I'll go deeper into this—because I think something significant is about to happen. Healthcare costs may actually increase. If hospitals are now coding more precisely for everything, and insurance companies are increasing reimbursement for conditions that previously went unbilled, insurers will need a way to recover those costs. They're going to raise premiums. And so it goes…
In summary, AI-powered documentation tools are improving coding completeness and capturing previously missed revenue, but they're also driving coding intensity increases that may not reflect actual changes in patient acuity or treatment. The gap between what gets coded and what gets treated, illustrated starkly in the postpartum anemia data, suggests that we (or rather, our AI tools) are documenting more without necessarily doing more.
Jared Dashevsky, MD, is an internal medicine physician and incoming pulmonary and critical care fellow at Mount Sinai, and the founder of Healthcare Huddle — a newsletter read by over 30,000 physicians and healthcare professionals. He writes at the intersection of clinical medicine, health policy, and health technology, translating complex industry dynamics into sharp, evidence-based commentary for busy clinicians. His work covers AI in practice, drug pricing, insurance dysfunction, and the business forces reshaping how medicine is delivered.