Healthcare Leaders Join Together to Craft Standard for AI Safety

By Matt Phillion

A coalition of executives from leading health systems across the U.S. has launched the first operational standard specifically designed to govern AI that communicates directly with patients or is used to shape clinical messaging.

The AI Care Standard was developed to address the rapid spread of patient-facing AI across healthcare, a spread that regulatory standards have not kept pace with. Because existing AI frameworks focus on performance rather than patient safety and clarity, patient-facing AI can confuse patients, overstep clinical boundaries, undermine trust, and introduce safety and liability risks, including recent cases in which AI chatbots were reported as contributing factors in the suicides or deaths of users who relied on the technology for guidance.

The standard was developed through a rigorous process: in-depth interviews with health system, policy, safety, patient experience, and AI innovation leaders; structured discussions and facilitated roundtables; stress-testing of real-world adoption, safety, and liability scenarios; and iterative refinement based on expert feedback.

The time to step up and provide guidance is now, explains Raj Ratwani, PhD, vice president of Scientific Affairs with MedStar Health Research Institute.

“If you look at the frequency of patients using chatbots, we’re talking about something like 20 million users every day. There’s a significant need here,” he says. “Patients want easier access to guidance, but the challenge is the information this technology is turning out to patients can be inaccurate or biased, and that can lead to patient harm. We want to enable the best of this technology while protecting patients from the worst of it.”

There are currently no safeguards at the regulatory or national policy level, Ratwani notes. That is the gap the standard aims to fill: safeguards that let patients realize the benefits of the technology while avoiding the harms that would erode their trust in it.

“We have crash safety ratings for cars, but we don’t have any such standards in healthcare for this technology,” says Aaron Patzer, CEO of Vital Software. “You have two diametrically opposed forces. On the one side you’ve got what we’d call ‘big AI’ who want to cowboy it, to do whatever they need to do and safety be damned. And on the other side, you’ve got health systems that are notoriously conservative who won’t approve its use with no regulation. They want zero legal risk involved. And then you’ve got the patients in the middle.”

Patients are using LLMs to answer questions but face a lack of guidance and guardrails. You might ask for exercises for back pain, Patzer says, but the LLM won’t take into account your age, previous injuries, or preexisting conditions that should shape its recommendations.

“Health systems don’t want patients to reflexively go to the ER or call 911, but the patients have to have answers to questions in a reasonable amount of time. It’s that middle ground where we’re trying to strike a reasonable balance,” Patzer says.

Leading with intent

Without government regulation, the industry finds itself at a stalemate. To get past this, the biggest, broadest players in the field need to get together, Patzer explains. This means prominent health systems, academic institutions, patient advocacy groups, and more.

“This is the industry saying it needs to set a standard for itself,” says Patzer. “And if enough people use it, it becomes the standard in reality.”

The speed of change in this technology is a large part of why there is no governmental guidance, Ratwani explains.

“There’s no federal policy in place because it moves very rapidly. Our regulatory processes do not move very rapidly and by the time you have a framework, it’s already outdated. It’s difficult to regulate this technology at this stage,” Ratwani says. “There’s patient benefit, but also risk, and when you have those competing factors in place you need to do what we’re doing here, bringing industry leaders together and formulating a framework rapidly that can be adapted as we need it. We can move faster than any federal regulatory structure to build an adaptable safeguard that builds trust.”

Working to the industry’s advantage is that it now has technology that can help govern technology: easy-to-use analysis tools that can regularly review the outputs of AI tools and confirm how the AI is being developed, trained, and used.

“We can ask: How often are you looking at the output of your AI? Did you look at it just once, or are you looking at it every quarter, or every year? How did you train it?” says Patzer. “Is it just data from a single health system? If it’s just looking at Colorado’s data, that state has the lowest obesity rate in the country, so if you bring that tool to Louisiana, it doesn’t work. Can the tool flag safety issues? Does it know when it’s going off the rails or identify intent for harm?”
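To make that concrete, a routine output audit can be almost trivially simple to automate. The Python sketch below is illustrative only: the `Output` type, the sample size, the review cadence, and the 2% flag-rate threshold are placeholder assumptions for this example, not figures from the AI Care Standard.

```python
import random
from dataclasses import dataclass

@dataclass
class Output:
    """One stored response from a patient-facing AI tool."""
    text: str
    flagged: bool  # marked problematic by a human or automated reviewer

def audit_sample(outputs: list[Output], sample_size: int = 200,
                 max_flag_rate: float = 0.02) -> bool:
    """Review a random sample of recent outputs on a fixed schedule.

    Returns True if the flag rate is under the threshold, False if
    the tool should be escalated before continued use. The sample
    size and 2% threshold are placeholders, not recommendations.
    """
    if not outputs:
        raise ValueError("no outputs to audit")
    sample = random.sample(outputs, min(sample_size, len(outputs)))
    flag_rate = sum(o.flagged for o in sample) / len(sample)
    print(f"Audited {len(sample)} outputs; flag rate {flag_rate:.1%}")
    return flag_rate <= max_flag_rate
```

Run every quarter rather than once, a check like this directly answers Patzer’s question of whether you looked at the output just once or on a schedule.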

Vetting the tools you need

Knowing what a tool is capable of, and how it is crafted at a granular level, lays the groundwork for identifying its risks and assessing its viability before it reaches the patient.

“You have these different chatbots out there and you may want to adopt some of this technology in your health system. Well, you should vet it, and you should vet it yourself,” says Ratwani. “But health systems don’t typically have a lot of AI experts or data scientists who can do all the heavy lifting, pick up this framework, and assess it.”

AI developers themselves are often not thinking about downstream safety implications or about flawed data sets that should not be deployed. This is where the standard’s evaluation framework for AI tools comes into play: an eight-part questionnaire that walks the user through the evaluation process and produces a green/yellow/red rating.

“It’s red—don’t do this, yellow—maybe use it but it needs improvement, or green—this looks like it’s probably okay,” says Patzer. “And when it says it needs improvement it gives specific suggestions as well.”

Examples of such improvements include a tool that works well but is English-only, or one that performs well today but doesn’t manage model drift and should be retrained every year.

“It gives you suggestions on the machine learning side and on the practical business side,” Patzer says.
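The questionnaire itself is not reproduced here, but the traffic-light logic Patzer describes maps naturally onto a simple scoring scheme. In the Python sketch below, the eight question names, the pass cutoffs, and the suggestion wording are all invented for illustration; only the green/yellow/red structure comes from the standard as described.

```python
from enum import Enum

class Rating(Enum):
    GREEN = "looks probably okay"
    YELLOW = "usable, but needs improvement"
    RED = "don't do this"

# Hypothetical stand-ins for the standard's eight questions.
QUESTIONS = [
    "outputs are reviewed on a regular schedule",
    "training data spans diverse populations and regions",
    "tool can flag safety issues in its own answers",
    "tool detects intent for harm or self-harm",
    "model drift is monitored, with retraining planned",
    "tool supports languages beyond English",
    "clinical boundaries are clearly enforced",
    "vendor discloses how the model was trained and evaluated",
]

def evaluate(answers: dict[str, bool]) -> tuple[Rating, list[str]]:
    """Score yes/no answers; return a rating plus improvement notes.

    The cutoffs (all eight pass = green, six or seven = yellow,
    fewer = red) are placeholders chosen for this example.
    """
    gaps = [q for q in QUESTIONS if not answers.get(q, False)]
    passed = len(QUESTIONS) - len(gaps)
    if passed == len(QUESTIONS):
        return Rating.GREEN, []
    if passed >= 6:
        return Rating.YELLOW, [f"Improve: {g}" for g in gaps]
    return Rating.RED, [f"Blocking gap: {g}" for g in gaps]
```

A yellow result comes back with the specific gaps attached, mirroring the “needs improvement” suggestions Patzer describes.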

A matter of liability

Healthcare’s cautious approach to new technology also creates drag on adoption here.

“Healthcare provider organizations generally want to adopt new technology, but all the liability falls on the provider organization. Any one of us would never adopt a technology if we weren’t able to fully vet it, especially if it’s about what happens to our patients.”

Meanwhile, more than 30 states have already enacted their own regulatory frameworks, creating a patchwork of requirements any developer wanting to enter this space will need to individually address.

“We need the kind of standards that can go across those states and even beyond to move the industry forward,” says Ratwani. “That’s a key barrier. We’ve got to get industry players to start using this framework and we’ve got to make systems aware it exists and how they can use it.”

One of the goals in creating this framework is to make it useful and accessible without a deep background in AI, Patzer says.

“It’s written in plain language so you don’t have to be an AI expert. It’s a simple set of questions that you or your vendor can go through,” he says. “And it’s free. Nobody’s trying to make any money off of it. It exists for the good of the industry.”

“There are several different knowledge gaps,” says Ratwani. “Initially, people have to acknowledge that there are risks to these AI products. That’s going to be a big educational movement both for the public and inside of healthcare. Secondly, we need to help them understand there are ways to assess that risk. Just because the risks exist, that doesn’t mean you shouldn’t use it at all but rather assess and mitigate it.”

Ratwani notes that there is also a need to acknowledge that humans are, in the end, not good at vigilance tasks on their own, particularly verifying information.

“Humans are not good checkers of information, so we need different safety precautions in place to meet a standard for safer development,” he says.

“One of our questions is literally: Can an AI check its own outputs?” says Patzer. “Why is it making a mistake in the first place? You can effectively use one AI to check another, and this can reduce errors by 80%. It’s not going to be perfect, but machines have a level of diligence humans don’t.”
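A minimal sketch of that AI-checks-AI pattern, assuming a generic `ask_model` placeholder for whatever LLM API is in use; the prompts and the PASS/FAIL criterion are illustrative, not the standard’s actual method:

```python
def ask_model(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API is in use."""
    raise NotImplementedError

def checked_answer(question: str) -> tuple[str, bool]:
    """Draft an answer, then have a second model pass verify it.

    The verifier sees the question and the draft and must reply
    PASS or FAIL; failed drafts are never shown to the patient.
    """
    draft = ask_model(f"Answer this patient question: {question}")
    verdict = ask_model(
        "You are a safety checker. Reply PASS if the draft answer "
        "below is safe and accurate for the question, otherwise "
        "FAIL with a one-line reason.\n\n"
        f"Question: {question}\nDraft answer: {draft}"
    )
    return draft, verdict.strip().upper().startswith("PASS")
```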

This is not to say that the goal is to remove the human element of healthcare, Patzer explains.

“There’s almost a fetish in Silicon Valley for a purely AI doctor but that’s the wrong approach,” he says. “We need to make existing doctors more productive. Only humans have empathy for other humans, and only humans can see you’re looking pale or a little green around the edges and ask probing questions to find out why. We will never try to replace doctors, but we also recognize that doctors don’t have the time to read through 10 years of medical history. These tools have the potential to do things like call up all the heart-related information from the person’s record before their appointment, though.”

With regard to cases where AI chatbot use has led to patient harm, particularly self-harm, Patzer argues that lawsuits alone won’t change the behavior of technology companies with deep pockets; those incidents need to be addressed another way.

“They have more venture capital than ever and it’d be easier for them to pay off lawsuits and move on,” he says. “The decision should be that they are required to adopt a standard to prevent this from happening in the future. That’s more important than the financial component.”

What the committee that developed this standard represents, Ratwani says, is a coming together of voices to make the change the industry needs.

“It’s rare to get this kind of committee together and to get consensus,” he says. “We’re really proud of what has been accomplished.”

On April 8, PSQH is hosting a webinar with Ratwani, Patzer, and Bridget Duffy, MD, former Chief Patient Experience Officer, Cleveland Clinic, discussing the launch of the AI Care Standard. Register now to save your spot.

Matt Phillion is a freelance writer covering healthcare, cybersecurity, and more. He can be reached at matthew.phillion@gmail.com.