Real-world medical questions stump AI chatbots

Chatbots had worse results than a Google search because of how volunteers prompted them

State-of-the-art AI chatbots didn’t perform well when real people asked for help assessing a medical problem.
Peresmeh/Creatas Video/Getty Images Plus

By Tina Hesman Saey, 5 hours ago

AI chatbots may seem medical-book smart, but their grades falter when interacting with real people.

In the lab, AI chatbots could identify medical issues with 95 percent accuracy and correctly recommend actions such as calling a doctor or going to urgent care more than 56 percent of the time. When humans conversationally presented medical scenarios to the AI chatbots, things got messier. Accuracy dropped to less than 35 percent for diagnosing the condition and about 44 percent for identifying the right action, researchers report February 9 in Nature Medicine.

The drop in chatbots’ performance between the lab and real-world conditions indicates “AI has the medical knowledge, but people struggle to get useful advice from it,” says Adam Mahdi, a mathematician who runs the University of Oxford Reasoning with Machines Lab that conducted the study.

To test the bots’ accuracy in making diagnoses in the lab, Mahdi and colleagues fed scenarios describing 10 medical conditions to the large language models (LLMs) GPT-4o, Command R+ and Llama 3. They tracked how well the chatbots diagnosed the problem and advised what to do about it. Then the team randomly assigned almost 1,300 study volunteers to feed the crafted scenarios to one of those LLMs or to use some other method to decide what to do in that situation. Volunteers were also asked why they reached their conclusion and what they thought the medical problem was. Most people who didn’t use chatbots plugged symptoms into Google or other search engines.

Participants using chatbots not only performed worse than the chatbots assessing the scenario in the lab but also worse than participants using search tools. Participants who consulted Dr. Google diagnosed the problem more than 40 percent of the time, compared with an average of 35 percent for those who used bots. That’s a statistically meaningful difference, Mahdi says.

The AI chatbots were state-of-the-art in late 2024 when the study was done, so accurate that improving their medical knowledge would be difficult. “The problem was interaction with people,” Mahdi says. In some cases, chatbots provided incorrect, incomplete or misleading information. But mostly the problem seems to be the way people engaged with the LLMs. People tend to dole out information slowly, instead of giving the whole story at once, Mahdi says. And chatbots can be easily distracted by irrelevant or partial information. Participants sometimes ignored chatbot diagnoses even when they were correct.

Small changes in the way people described the scenarios made a big difference in the chatbot’s response. For instance, two people were describing a subarachnoid hemorrhage, a type of stroke in which blood floods the space between the brain and the tissues that cover it. Both participants told GPT-4o about headaches, light sensitivity and stiff necks. One volunteer said they’d “suddenly developed the worst headache ever,” prompting GPT-4o to correctly advise seeking immediate medical attention. Another volunteer called it a “terrible headache.” GPT-4o suggested that person mig
