Project: Data Mining of Medical Image Databases
Student Researchers: Sarah Dubauskas, Kenda Stewart
Advisor: Margrit Betke
Institution: Boston University





Goals and Purpose
Data mining uses database queries to search for hidden patterns in data. Little work has been done in searching medical image databases for hidden patterns [Brodley 1999]. A large number of computed tomography (CT) scans are produced regularly to follow the 8.2 million patients with a history of cancer in the US. Lung cancer screening of smokers is still controversial. If accepted, it would result in an explosion of the number of chest CT scans to be analyzed.

Preliminary computer-aided diagnosis (CAD) systems have been developed that attempt to copy the rules that radiologists use in evaluating chest CT scans and detecting pulmonary nodules. However, a "gold standard" for these rules has not been established. More advanced database and data mining systems may be able to make better use of the information and knowledge stored in CAD systems and potentially improve the diagnostic capabilities of radiologists.

We plan to design indexing and data mining algorithms for a database of chest CT scans. Database searches will be based on spatial and temporal properties of nodules, such as location, shape, and volumetric changes in consecutive CT studies. Queries such as "Where are the majority of stable nodules located?" and "Find a patient with a nodule that has a similar growth pattern" would be run on the database. These queries may reveal information about the differences between malignant and benign nodules. Our long-term goal is to discover properties and characteristics that can be used to assist physicians in interpreting diagnostic imaging studies.
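The two example queries above can be sketched on a small in-memory collection. This is only an illustration under stated assumptions: the record fields (`patient`, `lobe`, `volumes_mm3`), the toy values, the 5% stability tolerance, and the squared-difference similarity measure are all hypothetical and are not taken from the project's database design.

```python
from collections import Counter

# Hypothetical nodule records; field names and values are illustrative only.
NODULES = [
    {"patient": "P1", "lobe": "right upper", "volumes_mm3": [100.0, 102.0, 101.0]},
    {"patient": "P2", "lobe": "right upper", "volumes_mm3": [80.0, 81.0, 80.5]},
    {"patient": "P3", "lobe": "left lower", "volumes_mm3": [50.0, 75.0, 110.0]},
    {"patient": "P4", "lobe": "left upper", "volumes_mm3": [60.0, 88.0, 130.0]},
]


def growth_rates(volumes):
    """Relative volume change between consecutive CT studies."""
    return [(b - a) / a for a, b in zip(volumes, volumes[1:])]


def is_stable(nodule, tolerance=0.05):
    """Assumed rule: stable if no step changes the volume by more than `tolerance`."""
    return all(abs(r) <= tolerance for r in growth_rates(nodule["volumes_mm3"]))


def most_common_stable_location(nodules):
    """Query 1: where are the majority of stable nodules located?"""
    counts = Counter(n["lobe"] for n in nodules if is_stable(n))
    return counts.most_common(1)[0][0] if counts else None


def most_similar_growth(query, nodules):
    """Query 2: the other nodule whose growth-rate sequence is closest to the query's."""
    q = growth_rates(query["volumes_mm3"])

    def distance(n):
        r = growth_rates(n["volumes_mm3"])
        return sum((x - y) ** 2 for x, y in zip(q, r))

    candidates = [n for n in nodules if n is not query]
    return min(candidates, key=distance)


print(most_common_stable_location(NODULES))
print(most_similar_growth(NODULES[2], NODULES)["patient"])
```

A real system would run such queries against indexed spatial and temporal attributes rather than a Python list; the sketch only shows the shape of the questions being asked.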

Process
Each student focused on a different aspect of medical image analysis. Kenda worked mainly on segmenting the ribs from a series of chest CT images. Segmentation is the process of dividing an image into meaningful regions. The first objective was to isolate the rib cage from the rest of the image by determining the threshold at which only bone appears in the image. With the rib cage isolated, each rib can then be segmented and labeled individually and located in each successive slice of the chest CT scan. This supports the overall goals of the project: once each rib is labeled, the position of other objects in the scan, such as a tumor, can be described relative to a specific rib. This could help physicians pinpoint objects within the scan and localize treatment more effectively.
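The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming the slice is available as a 2D grid of Hounsfield unit values; the toy slice, the function names, and the threshold of 300 are hypothetical and are not the values determined in the project.

```python
def threshold_slice(slice_hu, threshold_hu):
    """Return a binary mask: 1 where the intensity is >= threshold_hu, else 0."""
    return [[1 if v >= threshold_hu else 0 for v in row] for row in slice_hu]


def foreground_fraction(mask):
    """Fraction of pixels kept by the threshold, useful when comparing candidates."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


# Toy 4x5 "slice": air (about -1000), soft tissue (about 40), bone-like values (>= 500).
TOY_SLICE = [
    [-1000, -1000, 40, 40, -1000],
    [40, 700, 650, 40, 40],
    [40, 40, 40, 500, 40],
    [-1000, 40, 40, 40, -1000],
]

BONE_THRESHOLD_HU = 300  # illustrative guess, not the project's threshold

mask = threshold_slice(TOY_SLICE, BONE_THRESHOLD_HU)
for row in mask:
    print(row)

# Sweeping candidate thresholds shows how much of the image each one keeps.
for t in (0, 300, 600):
    print(t, foreground_fraction(threshold_slice(TOY_SLICE, t)))
```

Too low a threshold keeps soft tissue along with bone, while too high a threshold starts to drop bone pixels, which is why the value has to be tuned on real data.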

Sarah's project involved testing and upgrading previously developed algorithms designed to segment the lungs from chest CT images. Many hours were spent testing a medical image analysis program, designed by other researchers, on extensive sets of test data. The program outlines the lungs and trachea in the 100-200 successive image slices. Other researchers continuously updated the algorithms as errors were found. Sometimes the program would become "confused", mislabeling the trachea as part of a lung or missing a lung entirely. In these cases, the Hounsfield unit values the program was using had to be changed to get an accurate segmentation. This proved to be a long process, even when the settings that had worked best for a similar earlier slice were recorded. Once an entire data file was segmented properly, the results were converted into a text file of the lung contour.

Conclusion
Medical image analysis is a rapidly growing and changing field. Our project has only scratched the surface of work that will be continued by numerous researchers across the globe for years to come.
We did, however, achieve some valuable results during the past year:
Regarding rib segmentation:
The threshold that best reveals the rib cage in a binary image of a chest CT scan was determined. Numerous data sets were collected at this threshold and can later be analyzed and used toward the rib segmentation goal. Morphological operations were applied to the binary images, which gave an understanding of how these operations can smooth an image and improve its connectivity. Finally, after the sternum and vertebrae were removed from the image, a ground truth data set was generated in which a box was placed around each rib, identifying it as an object.
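The effect of a morphological operation on connectivity, and the boxing of each object, can be sketched as below. This is an illustration under assumptions: the 3x3 structuring element, the choice of closing (dilation followed by erosion), the border handling, the 4-connectivity, and the toy mask are all hypothetical, and the project's ground truth boxes may well have been produced differently.

```python
from collections import deque


def _neighbourhood(mask, r, c):
    """Values in the 3x3 window around (r, c); pixels outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    return [mask[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, h))
            for cc in range(max(c - 1, 0), min(c + 2, w))]


def dilate(mask):
    """A pixel is set if any pixel in its 3x3 window is set."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    """A pixel survives only if every pixel in its 3x3 window is set."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def closing(mask):
    """Dilation followed by erosion: fills small gaps without growing objects."""
    return erode(dilate(mask))


def bounding_boxes(mask):
    """Boxes (min_row, min_col, max_row, max_col) of 4-connected objects, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy mask: one object broken by a one-pixel gap, and a second object further right.
TOY_MASK = [
    [0] * 10,
    [0] * 10,
    [1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
    [0] * 10,
    [0] * 10,
]

print(len(bounding_boxes(TOY_MASK)))
print(bounding_boxes(closing(TOY_MASK)))
```

In the toy mask, closing bridges the one-pixel gap so the broken object is boxed once, while the three-pixel gap is wide enough that the two objects stay separate.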

Regarding lung segmentation:
Once researchers are sure that the lungs can be properly segmented from CT images, they can apply this data to a much larger goal of the project: correct nodule segmentation. Currently, different computer vision methods of nodule segmentation are being tested and then compared to a hand segmentation done by a radiologist to determine how well they perform. It is quite hard to follow a nodule, cancerous or not, and correctly identify it through a series of images. Once this is possible, the workload of the radiologist will be greatly reduced. During this project, an important ground-truth data set for lung segmentation was established that these researchers can use.
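One common way to compare an automated segmentation against a hand segmentation is an overlap score. The sketch below uses the Dice coefficient, which is an assumption made for illustration: the source does not say which measure the researchers use, and the two toy masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2*|A and B| / (|A| + |B|) between two same-sized binary masks."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size == 0 else 2.0 * intersection / size


# Toy masks: the automated result misses one pixel marked in the hand segmentation.
AUTOMATED = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
HAND_SEGMENTED = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

print(dice_coefficient(AUTOMATED, HAND_SEGMENTED))
```

A score of 1.0 means the two masks agree on every pixel, and a score of 0.0 means they share no foreground pixels.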

The medical image analysis program itself was updated to handle errors that had not surfaced in earlier test trials.

Publications
For additional information, the students have created web pages detailing the work completed so far:
http://cs-people.bu.edu/jstewart/RibSeg.htm
http://cs-people.bu.edu/sdubs/CREW