Yetisgen is the project’s lead data miner. She will design software that recognizes and gathers meaningful information from the free text of radiologists’ clinical reports and notes.
“We sampled 500 CT notes to create a schema that defines what information we’re trying to extract, say, tumor findings,” Yetisgen said. “We’ll manually label descriptors, like ‘lesion’ and ‘malignant,’ and references to sizes, like millimeters and centimeters, in these subsets of information. Then we will build a software language model that will automatically learn the lexicon associated with those labels.”
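To make the labeling step concrete, here is a minimal Python sketch of what tagging descriptors and size mentions in a note could look like. It is a rule-based stand-in for illustration only, not the team's method: the project plans to learn the lexicon from manually labeled notes rather than hard-code it. The names `DESCRIPTORS`, `label_spans`, and the `DESCRIPTOR` and `SIZE` labels are hypothetical, and the sample sentence is invented.

```python
import re
from typing import List, Tuple

# Hypothetical vocabulary: the article names only "lesion" and "malignant".
DESCRIPTORS = ("lesion", "malignant")

# Match a descriptor word, optionally pluralized, as a whole word.
DESCRIPTOR_RE = re.compile(
    r"\b(?:" + "|".join(DESCRIPTORS) + r")s?\b", re.IGNORECASE
)

# Match a number followed by a millimeter or centimeter unit.
SIZE_RE = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:mm|cm|millimeters?|centimeters?)\b", re.IGNORECASE
)


def label_spans(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, label, matched_text) tuples sorted by position."""
    spans = []
    for match in DESCRIPTOR_RE.finditer(text):
        spans.append((match.start(), match.end(), "DESCRIPTOR", match.group(0)))
    for match in SIZE_RE.finditer(text):
        spans.append((match.start(), match.end(), "SIZE", match.group(0)))
    return sorted(spans)


if __name__ == "__main__":
    # Invented example sentence, not taken from a real report.
    note = "A 12 mm lesion in the left lobe appears malignant."
    for start, end, label, value in label_spans(note):
        print(f"{label:<10} {start:>3}-{end:<3} {value}")
```

Labeled spans of this kind are what a trained model would learn to predict on its own, including terms and phrasings that a fixed word list would miss.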
The model’s accuracy is crucial, so it will need to be trained and tested multiple times until it can be validated as representative of the full 4 million records.
The task may sound daunting, but Yetisgen pointed out one upside: “Radiology reports follow a standard and usually are quite nicely structured, so they are much simpler than patient admission notes or discharge notes, which can differ widely even at the same institution,” she said.
The four-year research project is being funded with an award (R01CA248422) of more than $2 million from the National Cancer Institute, part of the National Institutes of Health.