28 Nov 2023

GPT-4 outperforms 99.98% of simulated human readers in complex clinical diagnoses

In a recent study published in NEJM AI, OpenAI's GPT-4 correctly diagnosed 57% of complex clinical cases. This outperformed medical journal readers (36%) and surpassed 99.98% of simulated human readers. Conducted by Danish researchers, the evaluation presented 38 cases to GPT-4 and compared its responses with those of 248,614 online medical journal readers.


The most common diagnoses in the case set fell under infectious disease (39.5%), endocrinology (13.1%), and rheumatology (10.5%). Patients ranged in age from newborns to 89 years, and 37% were female.


A temporal analysis of GPT-4's performance showed 52.7% accuracy for cases published up to September 2021 and a higher 75% accuracy for cases published after that date. Despite these promising results, the study noted a slight decrease in performance with the newest version of GPT-4.


While emphasising GPT-4's high reproducibility and clinical promise, the researchers urged caution, stressing the need for proper clinical trials to ensure safety and efficacy. The study also underscored the importance of ethical considerations, transparency, and regulatory adherence. The authors also raised concerns about data protection and privacy, and called for future AI models to include training data from developing countries to promote global applicability.


The study envisions a future in which AI models such as GPT-4 become a valuable tool in healthcare decision-making, complementing human oversight rather than replacing medical professionals.

