ChatGPT outperformed doctors in diagnostic accuracy, study reveals

Artificial intelligence continues to reshape healthcare, but a new study highlights the challenges of integrating AI tools into medical practice. ChatGPT, the OpenAI chatbot running the company’s GPT-4 model, outperformed doctors in a diagnostic accuracy study, raising questions about how effectively physicians can use such technology.

The study, published in JAMA Network Open, tested 50 doctors on six challenging medical cases. Doctors who used ChatGPT assistance scored an average of 76%, only slightly higher than the 74% scored by those without it. ChatGPT alone, however, achieved a remarkable 90% accuracy in diagnosing the conditions.
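To make the size of those gaps concrete, the three reported scores can be compared in percentage points. The snippet below is only an illustrative sketch: the dictionary keys and the gap() helper are names made up here, and the figures are the rounded values quoted in this article, not the study's raw data.

```python
# Scores as quoted in the article, in percent.
# The keys and the helper below are illustrative names, not from the study.
scores = {
    "doctors_with_chatgpt": 76,
    "doctors_without_chatgpt": 74,
    "chatgpt_alone": 90,
}


def gap(a: str, b: str) -> int:
    """Return the difference between two reported scores, in percentage points."""
    return scores[a] - scores[b]


# How much higher the assisted doctors scored than the unassisted ones.
assist_gain = gap("doctors_with_chatgpt", "doctors_without_chatgpt")

# How far ahead ChatGPT alone was of the doctors who used it.
ai_lead = gap("chatgpt_alone", "doctors_with_chatgpt")

print(f"Assisted vs. unassisted doctors: {assist_gain} points")
print(f"ChatGPT alone vs. assisted doctors: {ai_lead} points")
```

On these figures, doctors with the chatbot scored 2 points higher than those without it, while the chatbot on its own finished 14 points ahead of the doctors who used it.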

"I was shocked," Dr. Adam Rodman, an internal medicine expert at Beth Israel Deaconess Medical Center and co-author of the study, told The New York Times.

How the study worked

The researchers used real-world case histories that had never been published, so neither the participants nor the AI model could have encountered them before. The cases included complex conditions such as cholesterol embolism, a rare disorder that is often missed during diagnosis.

Doctors were graded by independent medical experts on their ability to provide potential diagnoses, rule out alternatives, and suggest next diagnostic steps. Despite having ChatGPT’s support, many doctors struggled to match the AI’s performance.

The study revealed two key issues:

  1. Doctors stuck to their initial diagnosis: Physicians often disregarded ChatGPT’s suggestions when they contradicted their own.
  2. Underutilization of AI tools: Few doctors leveraged ChatGPT’s full capabilities, such as analyzing entire case histories for a comprehensive diagnostic approach.

"Doctors treated the chatbot like a search engine, asking it narrow questions instead of feeding it the full case," Dr. Jonathan H. Chen, a physician and computer scientist at Stanford, told The New York Times.

The promise and challenges of AI in healthcare

AI-powered tools like ChatGPT are showing significant potential in diagnostic settings, where their underlying language models can offer nuanced analysis of complex cases. Unlike earlier attempts at computer-assisted diagnosis, modern AI tools don’t try to mimic human reasoning. Instead, they work by processing and predicting language patterns.

"The chat interface is the game-changer," Dr. Chen said. "Before, computers didn’t understand language the way they do now."

Still, experts warn that integrating AI into medical workflows won’t happen overnight. Common challenges include:

  • Lack of AI training: Many doctors need better education on how to use AI tools effectively.
  • Resistance to change: Physicians may distrust AI, particularly when it challenges their diagnoses.
  • Ethical and legal concerns: Questions remain about accountability when AI tools influence patient care decisions.

AI could serve as a "doctor extender," providing second opinions and improving diagnostic accuracy, but only if physicians are willing to embrace it.

Why doctors ignored ChatGPT’s insights

After analyzing chat logs from the study, researchers found that many doctors overlooked ChatGPT’s recommendations. This resistance appears to stem partly from the doctors’ overconfidence in their own expertise and partly from unfamiliarity with AI’s diagnostic capabilities.

"People are generally overconfident when they think they’re right," said Laura Zwaan, a clinical reasoning expert at Erasmus Medical Center in Rotterdam, in comments to The New York Times.

Additionally, some doctors used ChatGPT inefficiently, failing to capitalize on its ability to process full case histories.

"Only a fraction of participants realized they could copy-paste the entire case into ChatGPT for a comprehensive answer," Dr. Chen said.

What’s next for AI in healthcare?

The findings highlight the need for collaboration between AI developers and healthcare professionals to build trust and usability. AI’s role in medicine could extend beyond diagnostics to personalized treatment planning and patient management.

"AI is an extraordinary tool," Dr. Rodman said. "But there’s still a lot of work to be done in understanding how to integrate it into medical practice effectively."

The source
This article draws on reporting from The New York Times and a study published in JAMA Network Open.