
Within four days of each other at the start of January 2026, OpenAI and Anthropic both announced consumer-facing products that connect their chatbots directly to users' medical records. OpenAI's ChatGPT Health, launched January 7, partners with b.well to pull electronic health records from over two million U.S. providers alongside data from Apple Health, MyFitnessPal, and other wellness apps. Anthropic's Claude for Healthcare followed on January 11, using HealthEx to aggregate records from 50,000+ provider organizations. Both companies emphasize that health conversations are stored separately and excluded from model training, and that the products are designed to support rather than replace medical care.
Raising the Basement
Big Tech this, Big Tech that, I know…but let's start with a genuine case for optimism here. The current baseline for health literacy is, to put it charitably, not great. Patients leave clinic appointments, nod politely, and reach the parking lot with no clear understanding of what their doctor just said. This Post-Visit Amnesia drives confusion, non-compliance, and preventable readmissions.
If these tools can work as a plain-language interpreter—taking a dense discharge summary and explaining it in accessible terms—that's a meaningful win. The ability to ask follow-up questions, clarify terminology, and prepare for appointments could genuinely democratize access to basic medical understanding. For populations with limited healthcare access or low health literacy, raising that floor matters.
Bounded vs. Unbounded
The challenge is that both platforms are pitched as effectively unbounded. Connect your data, ask any health question, get personalized insights. The marketing implies these systems can do for health what ChatGPT does for writing assistance.
Here, we hit a snag. LLMs excel at what might be called bounded work: explaining what VO2 max means, describing how HRV works, or answering specific analytical questions like "how did my sleep patterns change after I started this medication?" These tasks play to the model's genuine strengths in language comprehension, information synthesis, and education.
Unbounded longitudinal health analysis is something else entirely. It has been reported that when one user connected a decade of Apple Watch data to ChatGPT Health and asked for a cardiac health assessment, the system produced a failing grade. His actual doctor said he was fine. When he asked the same question again, the grade jumped significantly. The system was, in effect, guessing, and doing so confidently.
The Structural Problem
This is not about the models being poorly trained, per se; it's about a fundamental mismatch between what LLMs reliably do and what longitudinal health analysis requires.
Let's consider what a single day of Apple Watch data contains: heart rate logged hundreds of times, HRV snapshots, movement tracking, sleep staging—the works. Multiply by ten years and you're looking at millions of data points. Any interface to an LLM must select, aggregate and transform this data. Those aggregation choices become the product. Summarize incorrectly and you've distorted your signal before the model gets to sink its teeth into anything.
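As a rough sketch of what that selection step can look like (the sampling pattern, thresholds, and values below are invented for illustration, not anything OpenAI or Anthropic have documented), the same day of heart-rate samples yields very different "facts" depending on how it's summarized:

```python
# Illustrative only: how two reasonable aggregation choices turn the same raw
# heart-rate samples into different summaries. No vendor pipeline is implied.
from datetime import datetime, timedelta
from statistics import mean
import random

random.seed(0)
day = datetime(2026, 1, 7)

# Fake one day of samples: low overnight values, higher daytime values,
# plus a workout spike in the evening.
samples = []
for minute in range(0, 24 * 60, 5):
    t = day + timedelta(minutes=minute)
    if t.hour < 7:                      # asleep
        bpm = random.gauss(52, 3)
    elif 18 <= t.hour < 19:             # workout
        bpm = random.gauss(150, 10)
    else:                               # awake, sedentary
        bpm = random.gauss(78, 6)
    samples.append((t, bpm))

# Aggregation choice A: "average heart rate for the day"
daily_mean = mean(bpm for _, bpm in samples)

# Aggregation choice B: "resting heart rate", estimated from overnight samples
resting = mean(bpm for t, bpm in samples if t.hour < 7)

print(f"daily mean HR: {daily_mean:.0f} bpm")   # inflated by the workout
print(f"resting HR:    {resting:.0f} bpm")      # a very different story
```

Neither summary is wrong, but whichever one the pipeline forwards becomes the signal the model reasons over, and that choice was made before the model ever saw the data.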
Worse, the kind of raw health data now being imported into LLMs often includes noisy readings that the device's own algorithms would discard. Apple's irregular rhythm notifications, for instance, are algorithmically filtered and validated signals. An LLM analyzing a raw export sees everything but lacks the engineering judgment that makes that data clinically meaningful.
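A toy illustration of that gap, using invented readings rather than Apple's actual export format or filtering logic: the same export looks alarming if motion-artifact readings are averaged in, and unremarkable once they're excluded.

```python
# Illustrative only: invented readings, not Apple's actual export format or
# filtering logic. The two spikes come from a loose strap, not the heart.
from statistics import mean, median

readings = [62, 64, 61, 188, 63, 190, 60]

# A crude stand-in for device-side artifact rejection: drop readings that sit
# implausibly far from the surrounding context (here, simply far from the median).
med = median(readings)
plausible = [r for r in readings if abs(r - med) < 40]

print(f"mean of raw export:   {mean(readings):.0f} bpm")    # ~98, looks alarming
print(f"mean after filtering: {mean(plausible):.0f} bpm")   # ~62, unremarkable
```

The real device pipeline is far more sophisticated than a median cutoff; the point is only that some such judgment has to live somewhere, and a raw export handed to a chatbot does not carry it.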
LLMs don't inherently know how to weight recent data against older readings, what's normal for you versus the general population, or whether a metric shifted because your health changed or because you treated yourself to a new device. They can mimic statistical reasoning, but reliability varies dramatically from query to query.
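To make that concrete, here is the kind of judgment that has to be encoded somewhere if it is to be applied at all: an exponentially weighted personal baseline versus a flat population range. The readings, half-life, and reference range below are illustrative assumptions, not anything either product documents.

```python
# Illustrative only: two ways to judge whether today's resting heart rate is
# notable. Neither product documents how (or whether) it does this.
import math

# Daily resting-HR readings in bpm, oldest first; the jump in the last four
# days coincides with the user switching to a new device (hypothetical data).
readings = [51, 52, 50, 53, 52, 51, 60, 61, 59, 60]

POPULATION_RANGE = (60, 100)   # textbook adult resting heart rate, in bpm
HALF_LIFE_DAYS = 3             # arbitrary choice of how fast old data fades

def personal_baseline(values, half_life=HALF_LIFE_DAYS):
    """Exponentially weighted mean: recent readings count more than old ones."""
    weights = [math.exp(-math.log(2) * (len(values) - 1 - i) / half_life)
               for i in range(len(values))]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

today = readings[-1]

# Population lens: 60 bpm sits inside the textbook range, so nothing to flag.
in_population_range = POPULATION_RANGE[0] <= today <= POPULATION_RANGE[1]

# Personal lens: 60 bpm is well above this user's own pre-switch baseline.
baseline = personal_baseline(readings[:-4])   # baseline from pre-switch days
deviation = today - baseline

print(f"within population range: {in_population_range}")
print(f"personal baseline: {baseline:.1f} bpm, today's deviation: +{deviation:.1f} bpm")
```

The two lenses disagree about whether today is notable, and neither can say whether the shift is physiology or a new sensor; supplying that judgment is exactly the work the pipeline, or a clinician, has to do.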
Healthcare is a high-reliability domain—inconsistent performance causes real harm. An AI that gives you a failing cardiac grade when you're healthy creates genuine anxiety and unnecessary medical visits. The inverse—false reassurance for someone at actual risk—is potentially worse. Choose your poison.
The Regulatory Tiptoe
Both companies are carefully walking the line between health and wellness. Their terms explicitly disclaim diagnostic or treatment functions. ChatGPT Health "is not intended for use in the diagnosis or treatment of any health condition." Anthropic directs users to healthcare professionals for personalized guidance.
But there's a gap between legal positioning and how people actually use these tools. When someone uploads their lab results and asks whether they should be worried about their liver function, they're seeking a diagnostic explanation. The response they receive, contextualized to their age, medication history, and prior results, is diagnostic work, whatever the disclaimer says.
Many tech companies tiptoe this line to avoid regulatory friction. But the distinction between health and wellness is much clearer to industry insiders than to consumers, who may rely on LLM recommendations without understanding that they could be fabricated or could overlook other aspects of their health profile.
WebMD gave everyone the same static, human-authored articles. These tools pull your actual records and engage you in conversation—which builds trust faster, even when the underlying reliability hasn't earned it.
The Jevons Paradox Question
For stretched health systems, these products are implicitly positioned as solutions: better-informed patients, more efficient consultations, reduced pressure on services. The assumption is that accessible health interpretation reduces demand.
Enter the Jevons Paradox, the observation that efficiency improvements tend to increase rather than decrease total consumption: when coal-fired engines became more efficient, Britain burned more coal, not less. By the same logic, making health information easier to interpret may not reduce health anxiety or service utilization at all.
It may make people more attuned to their health data, but also more uncertain about what it means, and more likely to want a professional to validate what the AI told them. That creates a new task for clinicians: adjudicating between the patient's understanding and the algorithm's output. Whether that nets out as time saved or time spent is an open question.
The quantified self movement offers a cautionary parallel. Self-tracking of physical and mental health data has generated genuine benefits alongside documented harms: obsessive self-monitoring, anxiety, and in some cases, worse mental health outcomes. Making all health data constantly interpretable may not empower people so much as widen the circle of what they think they should worry about.
The question of whether these tools will be used has already been answered by the hundreds of millions of health queries these platforms receive. The real question is whether we're building infrastructure that genuinely supports health understanding or something that looks like "support" while shifting risk and responsibility onto individual users navigating systems they may not fully understand.