Mining Patterns from Protein Structures
Wei Wang, University of North Carolina at Chapel Hill
One of the next great frontiers in molecular biology is to understand
and predict protein function. Proteins are simple linear chains of
polymerized amino acids (residues) whose biological functions are
determined by the three-dimensional shapes that they fold into. Hence,
understanding proteins requires a unique combination of chemical and
geometric analysis. A popular approach to understanding proteins is to
break them down into structural sub-components called motifs. Motifs
are recurring structural and spatial units that are frequently
correlated with specific protein functions. Traditionally, the
discovery of motifs has been a laborious task of scientific
exploration.
In this talk, I will discuss recent data-mining algorithms that we
have developed for automatically identifying potential spatial
motifs. Our methods automatically find frequently occurring
substructures within graph-based representations of proteins. We
represent each protein's structure as a graph, where vertices
correspond to residues. Two types of edges connect residues: sequence
edges connect pairs of adjacent residues in the primary sequence, and
proximity edges represent physical distances, which are indicative of
intra-molecular interactions. Such interactions are believed to be key
indicators of the protein's function.
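The graph representation described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the function name build_protein_graph, the use of one 3D coordinate per residue, the 8.0 angstrom cutoff, and the choice to skip sequence neighbors when adding proximity edges are all assumptions made for the example.

```python
import math
from itertools import combinations


def build_protein_graph(residues, coords, cutoff=8.0):
    """Build a residue graph with sequence edges and proximity edges.

    residues: list of residue labels in primary-sequence order.
    coords:   list of (x, y, z) tuples, one per residue (assumed input).
    cutoff:   distance threshold for a proximity edge (hypothetical value).
    """
    if len(residues) != len(coords):
        raise ValueError("residues and coords must have the same length")

    n = len(residues)
    # Sequence edges link residues that are adjacent in the primary sequence.
    sequence_edges = {(i, i + 1) for i in range(n - 1)}

    # Proximity edges link residues that are close in space.
    # Assumption: sequence neighbors are skipped, since they are already linked.
    proximity_edges = set()
    for i, j in combinations(range(n), 2):
        if j - i == 1:
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            proximity_edges.add((i, j))

    return {
        "labels": list(residues),
        "sequence": sequence_edges,
        "proximity": proximity_edges,
    }
```

Keeping the two edge types in separate sets lets a mining step treat them as distinct edge labels.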
This representation allows us to apply innovative graph mining
techniques to explore protein databases and associated protein
families. The complexity of protein structures and corresponding
graphs poses significant computational challenges. The kernel of our
approach is an efficient subgraph-mining algorithm that detects all
(maximal) frequent subgraphs from a graph database with a
user-specified minimal frequency. Our algorithm uses the pattern
growth paradigm with an efficient depth-first enumeration scheme,
searching through the graph space for frequent subgraphs. Our most
recent algorithms incorporate several improvements that take into
account specific properties of protein structures.
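To give a feel for the pattern growth paradigm with depth-first enumeration, here is a deliberately simplified sketch. It is not the algorithm presented in the talk: it mines only frequent label paths rather than general subgraphs, ignores edge types, and applies no maximality filter. The function name mine_frequent_paths, the (labels, adjacency) input format, and the max_len bound are assumptions made for the example.

```python
def mine_frequent_paths(graphs, min_support, max_len=4):
    """Depth-first pattern growth over label paths (a simplified stand-in
    for general subgraph mining).

    graphs:      list of (labels, adjacency) pairs; labels is a list of
                 vertex labels, adjacency maps a vertex index to a set of
                 neighbor indices (undirected).
    min_support: minimum number of graphs that must contain a pattern.
    Returns a dict mapping each frequent path (tuple of labels, in a
    canonical direction) to its support.
    """
    results = {}

    def grow(pattern, embeddings):
        # embeddings: graph id -> list of vertex tuples matching the pattern.
        # A path and its reverse are the same pattern; store one direction.
        results[min(pattern, pattern[::-1])] = len(embeddings)
        if len(pattern) >= max_len:
            return
        # Extend every embedding by one unused neighbor of its last vertex,
        # grouping the extensions by the label of the new vertex.
        extensions = {}
        for gid, paths in embeddings.items():
            labels, adjacency = graphs[gid]
            for path in paths:
                for nxt in adjacency.get(path[-1], ()):
                    if nxt not in path:
                        by_graph = extensions.setdefault(labels[nxt], {})
                        by_graph.setdefault(gid, []).append(path + (nxt,))
        # Recurse only into frequent extensions: a pattern cannot be
        # frequent unless the shorter pattern it grew from is frequent.
        for label, emb in sorted(extensions.items()):
            if len(emb) >= min_support:
                grow(pattern + (label,), emb)

    # Seed the search with frequent single-vertex patterns.
    seeds = {}
    for gid, (labels, _) in enumerate(graphs):
        for v, label in enumerate(labels):
            seeds.setdefault(label, {}).setdefault(gid, []).append((v,))
    for label, emb in sorted(seeds.items()):
        if len(emb) >= min_support:
            grow((label,), emb)
    return results
```

Support is counted per graph, not per embedding, so a pattern that occurs many times in a single protein still counts once. A full subgraph miner would additionally need a canonical form for general graphs to avoid enumerating the same pattern repeatedly.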
Bio:
Wei Wang is an associate professor in the Department of Computer
Science and a member of the Carolina Center for Genomic Sciences at
the University of North Carolina at Chapel Hill. She received an MS
degree from the State University of New York at Binghamton in 1995 and
a PhD degree in Computer Science from the University of California at
Los Angeles in 1999. She was a research staff member at the IBM
T. J. Watson Research Center between 1999 and 2002. Dr. Wang's
research interests include data mining, bioinformatics, and
databases. She has filed seven patents, and has published one
monograph and more than 90 research papers in international journals
and major peer-reviewed conference proceedings. Dr. Wang received
IBM Invention Achievement Awards in 2000 and 2001. She was the
recipient of a UNC Junior Faculty Development Award in 2003 and an NSF
Faculty Early Career Development (CAREER) Award in 2005. She was named
a Microsoft Research New Faculty Fellow in 2005. Dr. Wang is an
associate editor of the IEEE Transactions on Knowledge and Data
Engineering and ACM Transactions on Knowledge Discovery from Data, and
an editorial board member of the International Journal of Data Mining
and Bioinformatics. She serves on the program committees of
prestigious international conferences such as ACM SIGMOD, ACM SIGKDD,
VLDB, ICDE, EDBT, ACM CIKM, SDM, IEEE ICDM, and SSDBM.