Data mining uses database queries to search for hidden patterns in data. Little work has been done on searching medical image databases for such patterns [Brodley 1999]. A large number of computed tomography (CT) scans are produced regularly to follow the 8.2 million patients in the US with a history of cancer. Lung cancer screening of smokers is still controversial; if accepted, it would lead to an explosion in the number of chest CT scans to be analyzed.
Preliminary computer-aided diagnosis (CAD) systems have been developed that attempt to replicate the rules radiologists use to evaluate chest CT scans and detect pulmonary nodules. However, a "gold standard" for these rules has not been established. More sophisticated database and data-mining systems may be able to make better use of the information and knowledge stored in CAD systems and potentially improve the diagnostic capabilities of radiologists.
We plan to design indexing and data-mining algorithms for a database of chest CT scans. Database searches will be based on spatial and temporal properties of nodules, such as location, shape, and volumetric change across consecutive CT studies. Queries such as "Where are the majority of stable nodules located?" and "Find a patient with a nodule that has a similar growth pattern" could be run against the database. These queries may reveal information about the differences between malignant and benign nodules. Our long-term goal is to discover properties that can be used to assist physicians in interpreting diagnostic imaging studies.
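
As an illustration only, the two example queries might be sketched in Python as follows. Everything in this sketch is an assumption made for the example rather than part of the planned system: the `NoduleObservation` record layout, the lobe labels, the use of relative volume change per year as a stand-in for "growth pattern", and the 0.25 stability threshold, which is an arbitrary placeholder and not a clinical criterion. A real system would run such queries through database indexes rather than in-memory scans.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NoduleObservation:
    """One measurement of one nodule in one CT study (hypothetical schema)."""
    patient_id: str
    nodule_id: str
    lobe: str            # coarse location label, e.g. "RUL" (assumed encoding)
    study_date: date
    volume_mm3: float


def group_by_nodule(observations):
    """Collect each nodule's observations in chronological order."""
    series = defaultdict(list)
    for obs in observations:
        series[(obs.patient_id, obs.nodule_id)].append(obs)
    for obs_list in series.values():
        obs_list.sort(key=lambda o: o.study_date)
    return dict(series)


def relative_growth_per_year(obs_list):
    """Relative volume change per year between the first and last study."""
    first, last = obs_list[0], obs_list[-1]
    years = (last.study_date - first.study_date).days / 365.25
    if years <= 0 or first.volume_mm3 <= 0:
        return None  # a single study or a bad measurement gives no growth rate
    return (last.volume_mm3 - first.volume_mm3) / first.volume_mm3 / years


def stable_nodule_locations(observations, max_growth=0.25):
    """Count stable nodules per lobe (max_growth is an arbitrary placeholder)."""
    counts = Counter()
    for obs_list in group_by_nodule(observations).values():
        growth = relative_growth_per_year(obs_list)
        if growth is not None and abs(growth) <= max_growth:
            counts[obs_list[0].lobe] += 1
    return counts


def most_similar_growth(observations, query_key):
    """Return the (patient_id, nodule_id) in another patient whose growth
    rate is closest to that of the query nodule, or None if there is none."""
    series = group_by_nodule(observations)
    target = relative_growth_per_year(series[query_key])
    if target is None:
        return None
    best_key, best_gap = None, None
    for key, obs_list in series.items():
        if key[0] == query_key[0]:
            continue  # skip nodules belonging to the query patient
        growth = relative_growth_per_year(obs_list)
        if growth is None:
            continue
        gap = abs(growth - target)
        if best_gap is None or gap < best_gap:
            best_key, best_gap = key, gap
    return best_key


# Made-up observations, used only to exercise the two queries.
SAMPLE = [
    NoduleObservation("P1", "a", "RUL", date(2000, 1, 1), 100.0),
    NoduleObservation("P1", "a", "RUL", date(2001, 1, 1), 105.0),
    NoduleObservation("P2", "a", "RUL", date(2000, 1, 1), 200.0),
    NoduleObservation("P2", "a", "RUL", date(2001, 1, 1), 400.0),
    NoduleObservation("P3", "a", "LLL", date(2000, 1, 1), 50.0),
    NoduleObservation("P3", "a", "LLL", date(2001, 1, 1), 95.0),
    NoduleObservation("P4", "a", "RUL", date(2000, 1, 1), 80.0),
    NoduleObservation("P4", "a", "RUL", date(2001, 1, 1), 82.0),
]

if __name__ == "__main__":
    print(stable_nodule_locations(SAMPLE).most_common(1))
    print(most_similar_growth(SAMPLE, ("P2", "a")))
```

Reducing a growth pattern to a single rate keeps the sketch short; comparing whole volume-versus-time curves, or adding shape descriptors, would be closer to the spatial and temporal searches described above.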