Paging Dr. Sinclair

— When it comes to artificial intelligence in healthcare, there are still a few bugs in the system


Fred Pelzman is an associate professor of medicine at Weill Cornell and has been a practicing internist for nearly 30 years. He is medical director of Weill Cornell Internal Medicine Associates.

About a month ago, our institution announced, to quite a bit of fanfare, the selection of the artificial intelligence program that it would be using going forward.

The initial rollout for general users is limited to the full-featured version of the commercially available product, just for general searches and use across the Internet. The next iteration will involve interaction with programs in use across our institution, including databases, calendars, word processing, and more. And the last stage will be planned interactions and uses within our electronic health record, but this is seen as off in the distance, with much work to be done yet before it rolls out.

It feels like the first stage was basically to get us used to working with these tools, to find out how they might help us in our day-to-day lives. There are in fact many groups within the organization that are already helping develop artificial intelligence tools to be used within the electronic health record, interacting with protected health information and patients' charts, figuring out ways to produce first responses to patient portal messages, and assisting specialists with interpreting results, such as in Radiology.

For now, they told us to try it out, to see how it works for us and what sort of uses we might come up with.

A Routine Task

During the initial demonstration, the IT folks who were showing it to us had it create some simple coding shortcuts, format text in different ways, and collect formulas from the Internet. In the ensuing weeks, I've occasionally looked up at the icon sitting there on my desktop, waiting for me to use it, waiting for me to come up with a way to use it. Maybe I was hoping that it would make a suggestion, reach out to me and say, "Hey Fred, have you thought about using me for this?"

I've turned to it a few times, once crafting a response explaining the risks and benefits and shared medical decision-making involved in the question many of our patients have had, whether they should go ahead and get the new RSV vaccine. Another perennial question for which some simple text was easy to generate was, "Do you think I should get another COVID-19 booster?"

Hopefully, the app developers who are working on this stuff will figure out a way for the AI engine to read patients' messages, work out what they're asking, and generate the first draft of a response, which would then be presented to us for approval. Off in the future, I see a world where something like this will work not only as an assistant helping out with mundane tasks and routine replies, but also as a tool that searches the electronic health record, the totality of a patient's health interactions with our system, to spot trends, to come up with things I might not have thought of, and to help allay patients' fears and explain things to them in easy-to-understand language.

But this weekend, I had a simple task for it to do. One of the many chores involved in running our practice is the creation of multiple schedules, including our weeknight and weekend on-call schedules, coverage for our teaching conferences, vacation schedules, and a list of emergency backup providers. I decided to see if the AI could help me out.

Little Thought Involved

None of these tasks requires much thought. They are pretty rote -- figuring out what days people are available, when they can't do a particular night of call (because of childcare responsibilities or other scheduled tasks, for example), their preferred dates, what weekends they absolutely cannot be on call or absolutely want to be on call, a few simple rules, and then a year's worth of calendar slots to fill up.

So, I opened up the app, and typed in the list of all the attendings in our practice who serve as the on-call physician on weeknights. I figured I would ask the AI program to fill in these names sequentially across every weeknight for the academic year, saving me the drudgery of cutting and pasting and copying them all in. I thought one way to do this would be to alphabetize the list, and then just have the program fill in those names on every Monday through Friday, over and over again for all 52 weeks.
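The fill-in described above is simple enough to sketch in a few lines of ordinary, deterministic code. This is only an illustration under assumptions: the three names are placeholders, the July 1, 2024 start date is invented, and `fill_weeknights` is a hypothetical helper, not anything our institution or the AI product actually uses.

```python
from datetime import date, timedelta
from itertools import cycle


def fill_weeknights(names, start, weeks=52):
    """Assign alphabetized names to Monday-Friday dates in rotation."""
    # Sort case-insensitively, then repeat the sorted roster endlessly.
    roster = cycle(sorted(names, key=str.casefold))
    schedule = {}
    for offset in range(weeks * 7):
        day = start + timedelta(days=offset)
        if day.weekday() < 5:  # Monday is 0, Friday is 4; skip weekends
            schedule[day] = next(roster)
    return schedule


# Placeholder names and an invented start date, for illustration only.
attendings = ["Dr. Rivera", "Dr. Chen", "Dr. Okafor"]
schedule = fill_weeknights(attendings, date(2024, 7, 1))
```

A sketch like this can only ever hand back the names it was given, which is the property one would want from a scheduling aid. It ignores everything that makes real scheduling tedious, such as preferred dates and nights people cannot cover.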

"Please alphabetize the following list of names." The program churned for a few seconds, then spat out the list of names, rearranged alphabetically. Just as I was about to ask it to insert them into the calendar, my eyes glanced over the list and suddenly noted that the 23rd name on the list was one I didn't recognize: Dr. Sinclair.

That's strange, because we don't have a Dr. Sinclair in our practice. I looked back over the list I'd entered. Maybe I'd mistyped something; maybe the program had corrected one of the names I had put in and tried to come up with what it thought was the real name. No, it seemed to have just come up with Dr. Sinclair out of nowhere.

So, in the query box under this list, I typed something like, "It appears that you put in a name of a physician who is not on the list I provided. There is no Dr. Sinclair on the list. How did this get there?" The system seemed to think for a brief moment, then replied with something like, "Thanks for catching that mistake. I've now reordered the list without that name."
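Catching a phantom name by eye depends on luck. As a minimal sketch of the kind of check that could do it automatically, the snippet below compares what went in with what came back. The names are placeholders and `check_same_names` is a hypothetical helper; it assumes both lists are available as plain text entries.

```python
from collections import Counter


def check_same_names(original, returned):
    """Return (added, dropped) names; both empty means only the order changed."""
    before, after = Counter(original), Counter(returned)
    added = sorted((after - before).elements())    # names that appeared from nowhere
    dropped = sorted((before - after).elements())  # names that went missing
    return added, dropped


# Placeholder input and an invented "AI output" containing an extra name.
submitted = ["Dr. Rivera", "Dr. Chen", "Dr. Okafor"]
returned = ["Dr. Chen", "Dr. Okafor", "Dr. Rivera", "Dr. Sinclair"]
added, dropped = check_same_names(submitted, returned)
print("Added:", added, "Dropped:", dropped)
```

Using counts rather than plain sets means a duplicated name would also be flagged, not just an invented one.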

Is It Dependable?

This sort of floored me. If something as simple as alphabetizing a list of names leads to the creation of a ghost physician whom I might've placed on call for one night next week, how can we as physicians be expected to depend on these things to help us safely take care of patients?

We've all seen the reports in the news media about AI tools creating false data, making up whole citations from the literature, and creating biographies of fictional people. These are what are known as "hallucination citations," or, my preferred term, "hallucitations."

If the system is introducing an error at the earliest, simplest phase, simply rearranging the data it was given, then one can only imagine that the deeper we get into its usage, the more corrupt its responses might become. If the AI tool is unable to fact-check itself, will it leave out lab values or imaging findings, or decide that something I put in the chart wasn't important when it was, or vice versa? Will it decide that a trend in lab values over time isn't that relevant, when in fact that may be the whole point?

I'm sure there are lots of people testing this stuff, and before any of this comes into prime-time usage, there will be lots of beta testing and quality assurance checks to make sure that this stuff isn't potentially making things worse. If we can't trust what this AI stuff is doing, whether it's fudging the data or making up a citation or ignoring something altogether, how are we going to let it practice medicine alongside us?

True, no matter what may come, we will have to take all of this output with a healthy grain of salt, be extra cautious, read everything it gives us to make sure we're satisfied with the answers it's come up with, and always make our own independent judgments and medical decisions. But if integrating this technology into our practice creates more work at every level, is this really what we need right now?

Hopefully the people who are smarter than me, who know how this stuff works (does anyone really know how it works?), will be able to build enough guardrails and systems checks to make sure that we're not building a system with ingrained errors that only makes things worse.

So, if you know Dr. Sinclair, and you happen to speak to them today, please let them know that they're on call tonight.