Tumor segmentation, the process of identifying and outlining the precise boundaries of a tumor in medical images, is vital for accurate cancer diagnosis, surgery, and radiotherapy planning—as well as for training AI tools to help with those tasks. But segmenting CT scans by hand is time-consuming and expensive, taking even seasoned radiologists 30 to 60 minutes per scan and adding to their already heavy workloads.

That’s why Johns Hopkins researchers have developed a new AI-driven tumor segmentation method that takes advantage of something radiologists already do every day: writing reports. The team presents its findings this week at the International Conference on Medical Image Computing and Computer Assisted Intervention in South Korea, where the work has been shortlisted for the conference’s Best Paper and Young Scientist Awards, a distinction afforded to only the top three percent of papers.

“Even though hospitals have millions of CT scans, only a few hundred have accompanying tumor segmentations. By contrast, almost every scan has an accompanying radiology report in which radiologists describe what they see, such as whether and where a tumor is present,” explains first author Pedro R.A.S. Bassi, a visiting PhD student from the University of Bologna and the Italian Institute of Technology. “Writing reports is part of a radiologist’s everyday job—and we can use that to our advantage.”

Led by Zongwei Zhou, a Malone Center member and an incoming assistant research professor in the Department of Computer Science, and Bloomberg Distinguished Professor Alan Yuille, Bassi and a team from Johns Hopkins’ Computational Cognition, Vision, and Learning group developed a new methodology to train AI models from these reports.

First, the researchers fed each report into a large language model like ChatGPT, which used the text to extract valuable information about the tumors present in the associated scan, such as location, size, and number of growths. Then, they trained a segmentation AI model to “draw” around areas in each scan matching this information.
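As a rough sketch of this first step, a script might ask the LLM to return its findings as structured JSON and then normalize them. The schema, field names, and example response below are illustrative assumptions for demonstration, not the paper’s actual prompt or output format:

```python
import json

# Illustrative sketch of the report-parsing step. The schema and example
# response are assumptions -- the paper's actual prompt and format may differ.

def parse_llm_response(response_text: str) -> list:
    """Turn the LLM's JSON answer into a list of per-tumor records."""
    tumors = json.loads(response_text)["tumors"]
    for t in tumors:
        # Reports often give diameters in cm; normalize everything to mm.
        if t.get("unit") == "cm":
            t["diameter_mm"] = t.pop("diameter") * 10.0
            t["unit"] = "mm"
    return tumors

# Example LLM output for a report line such as:
# "There is a 2.3 cm hypodense lesion in the right hepatic lobe."
response = (
    '{"tumors": [{"organ": "liver", "location": "right hepatic lobe", '
    '"diameter": 2.3, "unit": "cm", "count": 1}]}'
)
records = parse_llm_response(response)
print(records)
```

Records like these then serve as the supervision signal for the segmentation model, in place of hand-drawn masks.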

“To power this new training methodology, which we call R-Super, we created a new ‘loss function’—a mathematical equation used to correct the AI when it draws tumors with the wrong size, location, or count as compared to the report,” Bassi says. “This let us use the reports to expand our training data more than a hundredfold, without requiring any extra work from radiologists.”
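The loss itself is not spelled out in this article, but the idea can be sketched in a toy form with NumPy and SciPy: penalize the model when the number and total volume of segmented tumors disagree with what the report states. The function below is a simplified illustration under those assumptions, not the actual R-Super loss (which also accounts for tumor location, omitted here for brevity):

```python
import numpy as np
from scipy import ndimage

# Toy illustration of report-based supervision (NOT the paper's actual loss):
# the predicted mask is checked against the tumor COUNT and total VOLUME
# extracted from the radiology report, instead of a voxel-wise ground truth.

def report_loss(pred_mask, reported_count, reported_volume_mm3,
                voxel_volume_mm3=1.0):
    """pred_mask: binary 3-D array produced by the segmentation model."""
    # Count predicted tumors as connected components of the binary mask.
    _, n_components = ndimage.label(pred_mask)
    count_penalty = abs(n_components - reported_count)

    # Compare total segmented volume with the volume implied by the report.
    pred_volume = pred_mask.sum() * voxel_volume_mm3
    volume_penalty = abs(pred_volume - reported_volume_mm3) / max(reported_volume_mm3, 1.0)

    return count_penalty + volume_penalty

# Toy example: one 4x4x4-voxel predicted tumor; the report says one tumor
# of 64 mm^3, so the penalty should be zero.
pred = np.zeros((16, 16, 16), dtype=np.uint8)
pred[4:8, 4:8, 4:8] = 1
loss = report_loss(pred, reported_count=1, reported_volume_mm3=64.0)
print(loss)  # 0.0: prediction matches the report exactly
```

If the model instead drew two blobs, or a blob of the wrong size, the penalty would grow, nudging the model toward segmentations consistent with the report.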

The team compared R-Super with state-of-the-art methods on multiple tumor detection and segmentation metrics and found that it surpassed all of them. On state-of-the-art hardware, R-Super also segments all the tumors in a CT scan in under a minute—and even on older computers, such as those at low-resource hospitals or in remote regions, it takes only about two minutes, the researchers say.

“This is a huge improvement compared to the time it takes to manually segment each tumor,” Bassi says.

The ability to use radiology reports to teach AI how to segment tumors unlocks a massive amount of large-scale training data, which will significantly boost AI performance—especially for rarer, understudied tumors, the researchers say. In fact, they report that using R-Super allows existing AI detection tools to find far more tumors than if they were trained on manual segmentations alone.

“The earlier and more accurate tumor detection provided by R-Super can drastically improve patient survival, turning hundreds of thousands of routine hospital reports into invaluable AI training fuel,” Bassi says.

The team has already combined a large public dataset of about 2,000 manual segmentations with over 100,000 radiology reports, creating the largest tumor segmentation dataset to date. The researchers plan to use this dataset to create a new AI model that’s capable of detecting 16 different types of cancer.

“The best tumor detection AI tools to date have been trained with fewer than 10,000 CT scans,” Bassi says, “so we expect our new AI to be highly impactful and to strongly improve tumor detection performance, thus advancing the early detection of cancer.”

Additional authors of this work include CS PhD students Wenxuan Li and Jieneng Chen; alumnus Tianyu Lin, Engr ’25 (MSE); and researchers from the University of Bologna, the Italian Institute of Technology, the University of California, San Francisco, the University of California, Berkeley, and the École Polytechnique Fédérale de Lausanne.

This research was supported by the Lustgarten Foundation for Pancreatic Cancer Research, the Patrick J. McGovern Foundation Award, and the National Institutes of Health.