16 Jun 2026

Designing the Health Systems AI Promised

Authors:

Anca del Río, Co-Founder & Executive Partner, Acuvera
Tyson Welzel, Founder & Managing Partner, Acuvera

Every technology transition in healthcare has produced the same pattern: capability arrives before governance, deployment outpaces integration, and the accountability gap closes, eventually, through incident, litigation, or regulation. AI is running the same pattern, faster and at greater scale, in systems that were already fragile before it arrived.

The more uncomfortable question shifts the lens away from what AI is doing to health systems, toward what health systems were never built to be. The institutions now absorbing agentic AI were designed for a different era, a different set of tools, and a different distribution of risk. They were not built for this. Which means the work ahead is not fixing what exists. It is designing what does not exist yet.

That is an invitation as much as it is a diagnosis. The question is not whether we can imagine the health systems AI promised. It is whether we can carry them.



The system was never built to absorb this.

A clinician orders, documents, and signs. An agentic system initiates, sequences, and acts across functional boundaries simultaneously. It does not wait for a workflow to invite it in, nor produce a single traceable decision point. It produces distributed effects across pathways, departments, and patient records that no existing governance chain was designed to follow.

Figure 1. The governance blind spot.

Patient safety has understood for decades that error is not a character flaw. It is a system property. To Err Is Human built the modern patient safety movement on that insight: that human error is inevitable, and that clinical systems — from critical incident reporting (CIRS) to the governance frameworks of clinical risk management — must be designed to absorb, surface, and learn from it. The entire architecture of patient safety assumes one thing: that the human remains in the causal chain, visible, interruptible, and ultimately accountable.

Agentic AI inverts the problem. The question is no longer what happens when the human errs. It is what happens when the machine errs or when the interface is designed in a way that makes human error not just likely, but structurally guaranteed. The Swiss cheese model assumed fallible humans and defensive systems. Agentic AI introduces a third condition: a system that performs well enough, often enough, to erode the vigilance that safety depends on. That is not a technology problem layered onto an organisational one. It is a categorical mismatch between the architecture of the tool and the architecture of the institution deploying it. This holds across the spectrum, whether systems are fully autonomous, semi-autonomous, or bounded by explicit constraints. The governance gap is not a matter of degree. It is structural.

Procurement is where this mismatch becomes concrete. AI acquisition decisions are made through processes designed to evaluate equipment — assessed on cost, compliance, and vendor credibility — not on clinical workflow fit, accountability mapping, or governance architecture. Innovations are acquired, deployed into workflows that were never redesigned to receive them, and handed to clinicians whose job descriptions, scopes of accountability, and professional standards have not changed. The result is not transformation. It is complexity without coherence.

“When deployment outpaces integration, technology does not resolve fragmentation. It compounds it.”



The human-in-the-loop is not a safety net. 

On a Monday post-take round, an agentic medicines-optimisation system presents the covering doctor with forty-one prescribing actions across the night’s admissions: reconciliations, renal dose adjustments, and several deprescribing proposals. Each carries a green confidence marker and a single Approve control, with the underlying reasoning two clicks away, behind data she has neither the time nor the full visibility to reconstruct. She clears the queue in under four minutes, because the round is moving and the system has been right almost every time for six months. One action continues an anticoagulant at a dose the agent inferred from an outdated weight; no one re-derives it, because the interface was built to be cleared, not interrogated. The approval is recorded against her registration number.

This is the design working as intended, not an aberration. We have built systems in which a human ratifies a decision they did not make, on information they cannot fully interrogate, in a timeframe that forecloses real evaluation. That passes for oversight without being it, and it is more dangerous than full autonomy or full human control because it carries accountability’s legal form without its substance. The fault is structural. These systems run human-on-the-loop: the clinician monitors but does not gate each action. Regulatory and professional frameworks, meanwhile, still assume human-in-the-loop, placing responsibility on a named individual presumed to have authorised the act. The signature at the end of the queue closes that gap on paper and nowhere else. When the rare error surfaces, liability has already gone to the person least able to have caught it: the out-of-the-loop reviewer whom the low error rate has made least vigilant (automation bias).

Figure 2. Human-in-the-Loop ≠ Human-on-the-Loop

The dilemma extends well beyond academic debate. In May 2024 the Master of the Rolls, Sir Geoffrey Vos, framed it from the bench: a professional may be negligent for using AI and negligent for declining to, damned either way. The EU AI Act makes that bind concrete: it requires human override capability as a legal standard. But the workflow has designed that capability out. When the control cannot be exercised, the failure defaults to the clinician. Legal and clinical scholarship has a name for this: the liability sink.

The remedy is not more oversight but better-placed oversight. Adding sign-offs to a saturated clinician produces approval theatre: attention spread so thin that none is real, exposure raised without safety gained. Structured checkpoints belong at genuine decision nodes — the irreversible act, the high-consequence prescription, the boundary past which an error cannot be recovered. Everywhere else the system should run and be monitored, not gated. This redefines the clinician’s role rather than removing it: a judgement holder at defined risk boundaries, not an output validator at every step, whose trust is a design requirement to be met.
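The placement rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a clinical implementation: the names `ProposedAction`, `Route`, and `route_action`, and the two boolean risk flags, are hypothetical and do not come from any real system or standard. In practice, deciding what counts as irreversible or high-consequence is a clinical governance judgement, not a hard-coded flag.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    GATE = "gate"        # hold for explicit clinician judgement
    MONITOR = "monitor"  # let the system act; log for later audit


@dataclass(frozen=True)
class ProposedAction:
    # Hypothetical structure for one agent-proposed action.
    description: str
    irreversible: bool      # cannot be undone once executed
    high_consequence: bool  # an error here could cause serious harm


def route_action(action: ProposedAction) -> Route:
    """Gate only at genuine decision nodes; monitor everything else."""
    if action.irreversible or action.high_consequence:
        return Route.GATE
    return Route.MONITOR


# Illustrative queue: only the high-consequence item reaches the clinician.
queue = [
    ProposedAction("Reconcile home medication list", False, False),
    ProposedAction("Continue anticoagulant at inferred dose", False, True),
]
gated = [a.description for a in queue if route_action(a) is Route.GATE]
```

The point of the sketch is the shape of the queue rather than the rule itself: the clinician sees a short list of actions that warrant real evaluation instead of forty-one items built to be cleared.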



The accountability gap is not a policy failure.
 

The conversation about AI in healthcare tends to locate the problem in misaligned incentives: stakeholders pulling in different directions, short-term pressures overriding long-term value. That diagnosis is accurate as far as it goes. But it stops short of the more specific and more actionable failure underneath it: the accountability architecture was never built.

The EU AI Act is unambiguous on this. Annex III classifies AI systems used in healthcare as high-risk. Article 14 mandates that such systems be designed to enable effective human oversight, including the explicit ability for an authorised person to override or reverse the system's output. This is not a policy aspiration. It is law that entered into force in 2024. As this piece goes to press, the European Commission has opened a public consultation, available to providers, deployers, researchers, and civil society, on its draft guidelines for classifying high-risk AI systems under that same Act. Yet procurement frameworks at facility level do not require compliance with Article 14 as a condition of purchase. AI systems are acquired by multiple stakeholders (CIOs, CDOs, clinical leads), each with authority over a fragment of the problem, rarely in consultation with legal, clinical risk, information security, or patient safety functions. The regulation sets a standard; the purchase order does not enforce it. Education of staff and redesign of processes must precede deployment, and yet neither is currently a procurement condition.

Three structural failures sustain that gap. Regulation helps AI products enter health markets; it does not yet ensure they are procured on terms that make governance a requirement rather than a feature. Operational AI (scheduling, coding, administrative triage) scales faster than clinical AI because its ROI is legible and its liability is lower, which means efficiency gains accumulate before the accountability architecture problem is ever forced into the open. And when no governance layer is designed into deployment, accountability does not disappear. It concentrates: on the clinician at the checkpoint, the institution in the regulatory filing, the patient in the outcome.

Figure 3. Accountability inversion.

None of this was designed deliberately. But none of it was designed against, either. That is the distinction between a policy failure and an architecture failure, and it is the distinction that determines whether the fix requires a negotiation or a redesign. This one requires a redesign: of procurement criteria, accountability mapping, and governance architecture built into deployment from the start, not appended to it after the fact.


HLTH Europe opens today, and with it the conversation this piece is part of. We were asked to step outside: to leave the comfort zone and walk into the unknown. The unknown is where the work lies. Stepping outside does not need bolder ideas; it needs the accountability architecture that makes it survivable: to align what we deploy with who answers for it, and to build health systems that deliver what they were promised. The question is not whether we can imagine more, but whether we can carry it. Coherence is the intervention. Everything else is infrastructure.