Google researchers have published a study in Nature evaluating the performance of their generative AI technology, Med-PaLM, in answering medical questions. The study found that Med-PaLM's long-form answers aligned with scientific consensus on 92.6% of questions, closely matching the 92.9% achieved by clinician-generated answers.
Med-PaLM is a generative AI technology that utilises Google's LLMs (large language models) to answer medical queries. The researchers evaluated it on MultiMedQA, a benchmark that combines six existing medical question datasets spanning research, professional medicine, and consumer queries, together with HealthSearchQA, a new dataset of commonly searched medical questions.
The MultiMedQA questions were answered by PaLM, a 540-billion-parameter LLM, and by Flan-PaLM, an instruction-tuned variant. Human raters then evaluated the generated answers for comprehension, reasoning, factuality, and potential harm and bias.
Using a range of prompting strategies, Flan-PaLM achieved strong accuracy on the MultiMedQA benchmark, reaching 67.6% on U.S. Medical Licensing Exam-style questions and surpassing the previous state of the art by over 17%. However, the researchers observed significant gaps in Flan-PaLM's answers to consumer medical questions.
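As an illustration of what such prompting strategies look like in practice, the sketch below shows how few-shot and chain-of-thought prompts are typically assembled. This is not the authors' code; the exemplar, function name, and questions are hypothetical placeholders, and a real evaluation would pair a prompt like this with an actual model API.

```python
# Hypothetical sketch of few-shot / chain-of-thought prompt construction,
# in the spirit of the strategies used to evaluate Flan-PaLM.
# The exemplar below is illustrative, not drawn from MultiMedQA.

FEW_SHOT_EXEMPLARS = [
    {
        "question": "Which vitamin deficiency causes scurvy?",
        "reasoning": "Scurvy results from impaired collagen synthesis, "
                     "which depends on vitamin C as a cofactor.",
        "answer": "Vitamin C",
    },
]

def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Prepend worked exemplars to a new question.

    With chain_of_thought=True, each exemplar includes its reasoning
    steps, nudging the model to explain before answering.
    """
    parts = []
    for ex in FEW_SHOT_EXEMPLARS:
        parts.append(f"Question: {ex['question']}")
        if chain_of_thought:
            parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}")
    # End the prompt so the model continues with reasoning (or an answer).
    parts.append("Reasoning:" if chain_of_thought else "Answer:")
    return "\n".join(parts)

prompt = build_prompt("What is the first-line treatment for anaphylaxis?")
```

The choice between the two modes matters: few-shot prompting alone shows the model the expected answer format, while the chain-of-thought variant also demonstrates intermediate reasoning, which the study's multiple-choice evaluations exploited.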
To address this, the researchers introduced instruction prompt tuning, an alignment technique that led to the creation of Med-PaLM. This improved model aligned with scientific consensus substantially more often (92.6%) than Flan-PaLM (61.9%), and its answers were far less likely to be rated as potentially leading to harmful outcomes (5.9% vs. 29.7%).
The study found that the inaccuracy rate of clinician-generated answers was similar to that of Med-PaLM, both at 5.7%. Despite these promising results, the researchers acknowledged that there are still several limitations to address before the models can be considered for clinical use. Further evaluation is necessary, particularly regarding safety, bias, and equity.
Vivek Natarajan, an AI researcher at Google and one of the researchers involved in the study, expressed hope that LLM systems like Med-PaLM, designed with safety as a top priority, will democratise access to high-quality medical information, particularly in regions with limited medical professionals. With additional development and rigorous validation of safety and efficacy, Med-PaLM could potentially be adopted widely in direct care pathways, supporting clinicians, reducing administrative burdens, aiding clinical decision-making, and ultimately making healthcare more accessible, equitable, safer, and compassionate.