ISSN: 0974-276X
Jing Ding, Daniel Berleant, Jun Xu, Kenton Juhlin, Eve Wurtele and Andy Fulmer
The rapid development of microarray and other genomic technologies now enables biologists to monitor the expression of hundreds, even thousands, of genes in a single experiment. Interpreting the biological meaning of the expression patterns still relies largely on biologists' domain knowledge, as well as on information collected from the literature and various public databases. Yet individual experts' domain knowledge is insufficient for large data sets, and collecting and analyzing this information manually from the literature and/or public databases is tedious and time-consuming. Computer-aided functional analysis tools are therefore highly desirable.
We describe the architecture of GeneNarrator, a text mining system for functional analysis of microarray data. This system's primary purpose is to test the feasibility of a more general system architecture based on a two-stage clustering strategy that is explained in detail. Given a list of genes, GeneNarrator collects abstracts about them from PubMed, then clusters the abstracts into functional topics in a first clustering stage. In the second clustering stage, the genes are clustered into groups based on similarities in their distributions of occurrence across topics. This novel two-stage architecture, the primary contribution of this project, has benefits not easily provided by one-stage clustering.
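The two-stage strategy described above can be illustrated with a minimal sketch. It is not GeneNarrator's implementation: the clustering method (a simple threshold-based "leader" clustering over cosine similarity), the threshold values, the function names, and the toy abstracts are all assumptions chosen for illustration, and the abstracts are invented placeholders rather than text retrieved from PubMed.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(vectors, threshold):
    """Assign each vector to the first cluster whose leader is at least
    `threshold` similar; otherwise the vector starts a new cluster.
    (A stand-in for whatever clustering algorithm a real system uses.)"""
    leaders, labels = [], []
    for vec in vectors:
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(idx)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


def two_stage(abstracts, topic_threshold=0.3, gene_threshold=0.8):
    """abstracts: list of (gene, text) pairs. Thresholds are hypothetical."""
    # Stage 1: cluster abstracts into topics using bag-of-words vectors.
    bags = [Counter(text.lower().split()) for _, text in abstracts]
    topic_of = leader_cluster(bags, topic_threshold)

    # Stage 2: describe each gene by how its abstracts are distributed
    # across topics, then cluster genes on those distributions.
    genes = sorted({gene for gene, _ in abstracts})
    profiles = {gene: Counter() for gene in genes}
    for (gene, _), topic in zip(abstracts, topic_of):
        profiles[gene][topic] += 1
    gene_labels = leader_cluster([profiles[g] for g in genes], gene_threshold)
    return topic_of, dict(zip(genes, gene_labels))


# Invented toy data: (gene, abstract text). Not real PubMed abstracts.
TOY_ABSTRACTS = [
    ("geneA", "heat shock stress response protein folding"),
    ("geneB", "heat shock stress chaperone protein folding"),
    ("geneC", "cell cycle division mitosis checkpoint"),
    ("geneD", "cell cycle division mitosis spindle"),
    ("geneA", "stress response heat shock chaperone"),
]

if __name__ == "__main__":
    topics, gene_groups = two_stage(TOY_ABSTRACTS)
    print("topic per abstract:", topics)
    print("group per gene:", gene_groups)
```

In this sketch the genes are never compared on raw text. They are compared only on their topic profiles, so two genes whose abstracts share few words can still be grouped together when their abstracts fall into the same topics, which is the kind of indirection the two-stage design adds over clustering in a single stage.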