Last year, headlines describing a study about artificial intelligence (AI) were eye-catching, to say the least:

  • "ChatGPT Rated as Better Than Real Doctors for Empathy, Advice"
  • "The AI will see you now: ChatGPT provides higher quality answers and is more empathetic than a real doctor, study finds"
  • "Is AI Better Than A Doctor? ChatGPT Outperforms Physicians In Compassion And Quality Of Advice"

At first glance, the idea that a chatbot using AI might be able to generate good answers to patient questions isn’t surprising. After all, ChatGPT has reportedly passed a final exam for a Wharton MBA course, drafted a book in a few hours, and composed original music.

But showing more empathy than your doctor? Ouch. Before assigning final honors on quality and empathy to either side, let’s take a second look.

What Tasks Is AI Taking On in Health Care?

Already, AI has a growing list of medical applications: drafting doctor’s notes, suggesting diagnoses, interpreting x-rays and MRIs, and monitoring real-time health data like heart rate and oxygen levels.

But the notion that AI-generated answers might be more empathetic than a physician’s struck me as both amazing and a little sad. How could a machine outperform a physician at a deeply human skill?

Can AI Deliver Good Answers to Patient Questions?

Imagine two scenarios:

  • You leave a message for your doctor and later get a callback with an answer.
  • You send the same question via email or text, and an AI sends a reply within minutes.

Which would be better, in terms of both quality and empathy?

The Study

Researchers collected 195 patient questions from an online forum where doctors voluntarily respond. They then submitted the same questions to ChatGPT, and both sets of responses were rated by a panel of three medical professionals.
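For readers who like to see the mechanics, the ChatGPT half of that setup is easy to picture. The researchers entered each question into the chatbot’s own interface, but an automated equivalent might look like the following sketch using the OpenAI Python SDK. The model name, the helper function, and the sample question are my illustrative assumptions, not details from the study:

```python
# Hypothetical sketch of the question-submission step. The study used
# the ChatGPT interface directly; the model name and sample question
# below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_patient_question(question: str) -> str:
    """Return the model's reply to a single patient question."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for 2022-era ChatGPT
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# In the study's setup, each of the 195 forum questions would be sent
# in a fresh session. This question is a made-up placeholder.
print(answer_patient_question(
    "Is it safe to take ibuprofen and acetaminophen together?"
))
```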

Rating Criteria

  • Quality: very poor → very good
  • Empathy: not empathetic → very empathetic

The Results

  • Quality: 78% of ChatGPT answers were rated “good” or “very good” vs. only 22% for physicians.
  • Empathy: 45% of ChatGPT answers rated “empathetic” or better, vs. just 4.6% for physicians.
  • Answer Length: ChatGPT averaged 211 words; physicians averaged 52.

As the headlines suggested, it wasn’t even close.
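For the numerically curious, here is how headline figures like “78% rated good or very good” are typically tallied from panel ratings. This is a toy illustration with made-up numbers; the 1-to-5 scale encoding and the average-across-evaluators rule are my assumptions about the usual approach, not the authors’ published analysis:

```python
# Illustrative arithmetic only: tallying "percent rated good or better"
# from Likert ratings. Toy data; the 1-5 encoding and averaging rule
# are assumptions, not the study's analysis code.
from statistics import mean

# Each inner list: three evaluators' quality ratings for one answer,
# on a 1 (very poor) to 5 (very good) scale.
chatgpt_ratings = [[5, 4, 4], [3, 4, 4], [2, 3, 3], [5, 5, 4]]
physician_ratings = [[3, 2, 3], [4, 4, 4], [2, 2, 3], [3, 3, 2]]

def pct_good_or_better(ratings: list[list[int]], threshold: float = 4.0) -> float:
    """Percent of answers whose mean rating meets the threshold."""
    hits = sum(mean(r) >= threshold for r in ratings)
    return 100 * hits / len(ratings)

print(f"ChatGPT:    {pct_good_or_better(chatgpt_ratings):.0f}%")    # 50% on toy data
print(f"Physicians: {pct_good_or_better(physician_ratings):.0f}%")  # 25% on toy data
```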

Important Limitations of the Study

Before declaring AI the winner, consider the study’s flaws:

  • No accuracy checks: Evaluators didn’t verify whether responses were medically correct.
  • Length bias: Longer answers may appear more empathetic due to detail, not true emotional connection.
  • Incomplete blinding: Reviewers might have guessed the answer’s origin based on tone and length.

So, the glowing headlines? Not fully justified.

Bottom Line

Could physicians learn a thing or two from AI’s tone and thoroughness? Maybe. Could AI become a useful tool that doctors supervise and refine? That’s already happening in some hospitals.

But we aren’t ready to trust AI with unsupervised medical advice just yet. And even ChatGPT agrees:

I asked it, “Can you answer medical questions better than a doctor?”
Its reply? “No.”

We’ll need more rigorous, accuracy-focused studies before we set the AI genie free in health care. But we’re getting closer.