Despite all their advancements, the latest medical imaging robots still require a human touch—literally. But in the middle of an operation, a surgeon taking their hands away from the patient to manually program a robot to take a new X-ray could prolong the surgery and increase the risk of complications.
That’s why researchers in Johns Hopkins’ Advanced Robotics and Computationally AugmenteD Environments (ARCADE) Lab have developed a voice-controlled AI system that can image and analyze patient X-rays in real time. The invention has the potential to reduce radiation exposure, streamline surgical procedures, and improve overall patient outcomes, according to ARCADE Lab director and John C. Malone Associate Professor of Computer Science Mathias Unberath.
Unberath and his team presented their work at the 6th International Conference on Information Processing in Computer-Assisted Interventions (IPCAI) held June 17–18 in Berlin, Germany, where it received the sole Best Paper Award.
“When equipped with AI models that can correctly interpret voice commands to show new views of a patient, any robotic C-arm can effectively become a hands-free, intelligent assistant for image-guided surgery,” explains first author Benjamin D. Killeen, Engr ’25 (PhD).
He was joined on the effort by fellow PhD students Catalina Gómez Caballero and Blanca Iñigo Romillo; computer science master's student Anushri Suresh; and Christopher R. Bailey, formerly an assistant professor of radiology and radiological science in the Johns Hopkins University School of Medicine.
The researchers’ system works by first transcribing a user’s speech into text using an advanced speech recognition model from OpenAI. The text is then fed into a large language model that translates the user’s words into recognizable robotic imaging commands. Taking advantage of the ARCADE Lab’s own “segment-anything” AI model, which can section X-ray scans into identifiable anatomical parts, the system continually updates a “digital twin” of the patient: a 3D model of the patient’s anatomy built from previously acquired images.
Rather than requiring new X-rays to be taken again and again, the digital twin allows the system to autonomously find and align to requested views, enabling complex natural-language commands like “Focus in on the lower lumbar vertebrae,” all while reducing the patient’s exposure to radiation, the researchers say.
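To make that pipeline concrete, here is a minimal sketch of the speech-to-command stage, assuming the OpenAI Python SDK is used for both transcription and command parsing. The command format and the robot-interface function `execute_carm_command` are hypothetical illustrations, not the ARCADE Lab's actual implementation.

```python
# Minimal sketch of a voice-to-imaging-command pipeline.
# Assumptions (not from the article): OpenAI's Python SDK handles transcription
# and parsing; `execute_carm_command` is a hypothetical stand-in for the
# robotic C-arm / digital-twin interface.
from openai import OpenAI

client = OpenAI()

def transcribe(audio_path: str) -> str:
    """Convert the surgeon's spoken request to text."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text

def parse_command(utterance: str) -> str:
    """Map free-form speech to a constrained imaging command."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": ("Translate the surgeon's request into one command of the "
                         "form VIEW <anatomy> <projection>, e.g. 'VIEW L4-L5 lateral'.")},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

def execute_carm_command(command: str) -> None:
    """Hypothetical placeholder: align the robotic C-arm using the digital twin."""
    print(f"Aligning C-arm: {command}")

if __name__ == "__main__":
    # e.g. "Focus in on the lower lumbar vertebrae"
    text = transcribe("surgeon_request.wav")
    execute_carm_command(parse_command(text))
```

In a design like this, constraining the language model to a small command vocabulary is what lets the downstream robot controller act on free-form speech reliably.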
“A surgeon using our system is less likely to need as many X-ray images as they would otherwise—and the images they do take can be low-radiation thanks to the system’s automated alignment,” Killeen explains. “Basically, more intelligent X-ray machines mean lower radiation doses for the patient, and less radiation means reduced risk of cancer for everyone involved.”
The researchers tested their system on a cadaver and achieved an average end-to-end success rate of 84%, from speaking aloud to the system to the robotic C-arm finding the desired view of the patient.
While the team works on increasing their system’s accuracy, they also plan to investigate a hands-free solution for muting the surgeon’s microphone and to explore other speech recognition models that are less biased toward American accents.
“Our study shows that a general-purpose, voice-controlled C-arm system is possible—but the next step is to make it more robust and reliable in a variety of scenarios,” Killeen says. “We hope that by scaling up our simulation efforts even further, we can transform these enabling devices into full-fledged intelligent surgical assistants.”