AI in Healthcare: Addressing the Reality of Hallucinations

By Matt Phillion

The use of AI tools continues to grow as clinicians use them for everyday processes like the creation of chart notes and care plans. But what happens when AI gets the facts wrong, and how can that impact patient safety?

Even occasional users of large language models (LLMs) like ChatGPT have experienced errors, or “hallucinations,” in which the AI’s output is incorrect or misleading. The rate of these hallucinations varies: in one evaluation, Google’s Med-PaLM 2 scored 85% on medical exam questions but still produced dangerous or implausible answers in some cases, and a 2023 JAMA Network Open paper found that AI-generated discharge summaries contained incomplete or misleading information in about 18% of cases.

The problem compounds when the incorrect diagnoses or medications AI inserts into a patient’s documentation are copied and carried forward across multiple notes. As other clinicians refer to those notes, the propagated errors can lead to inappropriate clinical decisions, unnecessary interventions, omitted clinical actions, or delays in appropriate care.

While AI tools can be a game changer for healthcare providers and help minimize many administrative burdens that have long been a drag on the industry, safeguards must be in place to ensure that human oversight is maintained, that AI-generated clinical content is accurate and appropriate, and that these tools work seamlessly within the clinical workflow.

“It’s interesting. Ambient listening technology, much of which is AI-based, has lulled providers into the assumption that they can just talk to the patient and their data is automatically going to be collected, get written down, and recorded into some kind of document,” says Jay Anders, MD, chief medical officer of Medicomp Systems. “But you have to read whatever the AI gives you.”

Anders points to common errors AI models tend to make: misgendering the patient based on the recorded conversation, or attributing to the patient diagnoses that were merely discussed during the visit, such as a conversation about the patient’s father having diabetes being documented as the patient having diabetes.

“That stuff gets into someone’s medical record, and then the information that does not pertain to you is transmitted down the road to other providers or specialists,” says Anders. “Now you have a diagnosis of diabetes or some other disease you’ve never had that will appear when someone pulls up your medical record.”

This can become a huge problem if that record reaches a health or life insurance company, Anders points out.

“It’ll affect your rates, or even if you are accepted at all,” he says. “And this misinformation that gets stuck into medical records has an insidious way of following people along. It can be really hard to get rid of, and no providers want to take responsibility for something like that.”

Catching it early

There is an opportunity to correct this misinformation at the time of recording, and there are technologies that help identify when a diagnosis appears nowhere else in the medical record, so suspected errors can be sifted out and verified with the patient.
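As a rough, hypothetical sketch of that kind of cross-check (the function name, data structures, and sample values below are illustrative assumptions, not any vendor’s actual implementation), a simple routine could compare the diagnoses in an AI-drafted note against the patient’s existing problem list and flag anything new for the clinician to verify before the note is saved:

# Hypothetical sketch: flag diagnoses that appear in an AI-drafted note but not
# in the patient's existing problem list, so a clinician can confirm them with
# the patient before the note is saved. Names and sample data are illustrative.

def flag_unverified_diagnoses(draft_diagnoses, existing_problem_list):
    """Return diagnoses present in the AI draft but absent from the record."""
    known = {dx.strip().lower() for dx in existing_problem_list}
    return [dx for dx in draft_diagnoses if dx.strip().lower() not in known]

# Example: the draft mentions diabetes (discussed about a family member),
# while the patient's own problem list contains only hypertension.
draft = ["Hypertension", "Type 2 diabetes"]
problem_list = ["hypertension"]

for dx in flag_unverified_diagnoses(draft, problem_list):
    print(f"Verify with patient before saving: {dx}")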

“The problem with that is with the original mistake, it can be sent out to five or 10 different places, all of which you as a clinician don’t have in front of you. It takes a long time to fight that kind of stuff,” says Anders. “I believe it’s a complacency issue, and a new technological reliance on AI to do everything correctly 100% of the time is a mistake. Clinicians still need to look at everything.”

There are really two places to catch errors early. The first is with the clinicians themselves, Anders says: when they complete their documentation, and before they save and commit it, they need to make sure it recorded what they actually said.

“They’ve got to read it and identify things like gender mismatches or family history that doesn’t belong in the record,” says Anders.

But patients also need to be educated about the use of these systems.

“Big providers have notices all over the place that they’re using ambient listening technology,” says Anders. “Organizations using this technology need to be really up front with patients and give them the option to opt out and go back to the old way to speak it or type it out and not use AI to message all that information.”

The impact on how clinicians practice

There also need to be warnings for clinicians themselves about what the use of this technology means for how they practice medicine and visit with their patients.

“A primary care physician at a big medical group that uses ambient listening technology told me, ‘It’s changed the practice of medicine for me, but not in a good way,’” Anders says. “She said it saves her time, but she feels like she’s on the witness stand when she’s talking to a patient because the technology picks everything up. It is cutting back on the conversations that physicians have all the time with their patients: how are your parents, have you been on vacation, how’s grandma doing? That doesn’t belong in your medical record, but if you have a machine record it, you have to go back in and take it all out. And because of this, she said, it changed the way she talks with her patients, and that’s not a great thing, particularly for a primary care doctor.”

It’s those unique conversations that help clinicians get to know their patients better over time, Anders points out, and they add a lot of nuance to what is pertinent to the patient.

“It’s all important and helps build trust,” he says.

Addressing the culture shift

Anything that can get in the way of communicating with your patient can be an issue, Anders notes, and the move to AI marks a cultural shift in how clinicians do their work in this regard. Take, for instance, the volume of information the technology takes in.

“One of my relatives had a significant medical event and now has six different clinicians, and of the six, three said the same thing as the primary care physician did: ‘It’s changed the way I do things and I don’t like it,’” he says. “It produced a whole bunch of text, and they ask, ‘Why don’t I just put it in myself to begin with?’ It’s supposed to be saving time, but it’s a cultural shift to be reading what the AI puts in, because that’s where the biggest bang for the buck can occur.”

Despite the feedback, opting out is not an option for the clinicians themselves; the technology marks a move away from older techniques like manual note-taking, transcriptionists, or even voice-to-text.

“At times, it’s not being vetted enough to be what it should be, but in large organizations the clinicians have no input whatsoever,” Anders says.

The control, however, lies with the patient.

“The patient can say, ‘No, you’re not recording this. I don’t trust it or want it, and I don’t know where whatever I say goes,’” says Anders.

And that’s a bigger issue: Where does that data go when conversations are being recorded in bulk?

“The voice conversation has to go somewhere to be processed,” says Anders. “And you don’t know where. There are privacy concerns. I’m not against the technology, but it needs to be vetted.”

There’s a nuanced conversation the industry needs to consider when it comes to patient information and AI.

“A recent study found that 80% of patients are okay with having their medical records shared with other providers. I’ve not seen any type of study that says they are okay with sending it to ChatGPT directly,” says Anders. “Patients are okay with every physician they see having a complete picture of them. It makes good clinical sense. It’s not so certain or easy to understand if you send this to a company that has a large language model that is now storing that information and learning from it and training with it and doing other things not related to clinical care. No one is asking that question: Are you okay with your medical record having the same privacy as Webster’s Dictionary?”

This is a place where there’s a need for better, clearer regulations, Anders explains.

“The federal government needs to step in and say, ‘You’ve got to be transparent. We’re not going to tell you what to do, but you need to tell everyone what you’re doing,’” he says. “When that happens, we’ll see where the industry ends up. That’ll put a control rod into some of these technologies that have no control rods. Until that time, I think that patients need to make sure they’re aware and organizations need to tell them what is happening.”

For now, Anders points to the need to calibrate expectations around what the technology can and can’t do.

“The biggest issue is the expectation of what the tech can do. If you’re expecting it to do 100% of that documentation work without you having any input, that’s a false expectation. If you’re expecting that, you’ve already failed,” he says. “Do your homework before installing it, rather than expecting great things and maybe not getting them.”

And lastly, remember the accountability factor.

“AI does not have the responsibility clinicians do. It can’t be held accountable for a mistake,” says Anders. “Organizations need to do their homework or the patient will suffer. That’s who bears the brunt of medical mistakes. Stop talking about replacing a whole cadre of well-trained individuals and throwing them all away. Don’t replace them, enhance their ability to work.”

Matt Phillion is a freelance writer covering healthcare, cybersecurity, and more. He can be reached at matthew.phillion@gmail.com.