MEMBER MOMENTS: MGB’s Klibanski Discusses AI at AHA National Meeting

Anne Klibanski, M.D., the president and CEO of Mass General Brigham and current chair of the MHA Board of Trustees, joined a panel at the American Hospital Association annual meeting in Washington, D.C. this week to discuss the influence and future of artificial intelligence in healthcare.
She was joined by Jim VandeHei, CEO of Axios; Marc Boom, M.D., AHA board chair and president and CEO of Houston Methodist; Jonathan Perlin, M.D., president and CEO of Joint Commission; and Ladd Wiley, senior vice president of global corporate affairs, public policy and advocacy for Epic.
While the panelists all discussed the benefits of AI in improving patient safety and reducing clinician burnout, they pointed to the need for governance by both government entities and hospitals.
“So we’ve sort of transitioned from where technologies were available but people were reluctant to use them to technologies are available, computing power is just exploding, all of these things are going to be possible with agentic AI,” Klibanski said, referring to autonomous systems that act independently to achieve specific, complex goals rather than just generating content. “So the question is, what’s going to be the role of the doctor or the clinician? That’s the really critical role.”
Because the capabilities of AI are evolving so quickly, there may be a hesitancy to trust the conclusions it generates – for instance, in assessing imaging and scans.
“As healthcare leaders we’re being conservative,” Klibanski said. “We’re being safe. We want to do pilots, we want to test it, but we do have to think about how fast this is all going and who will be the final arbiter, always remembering that our responsibility is best outcomes and safety for the patients we serve.”
A recent study (“Large Language Model Performance on Clinical Reasoning Tasks” in JAMA Network Open) by researchers at MGB found that while AI is getting better at diagnostic accuracy when presented with comprehensive clinical information, it still underperforms at differential diagnoses when information is lacking.
“In line with their previous study, the researchers found that the [large language models, or LLMs] were good at producing accurate final diagnoses,” MGB reported. “However, all of the models failed to produce an appropriate differential diagnosis more than 80% of the time. In the real world, a differential diagnosis is critical, but in this study, the models were given more information so that they could proceed to the next stage of the clinical workup even if they failed at the differential diagnosis step.”
Massachusetts Health & Hospital Association