Bioinformatics projects: Genetic networks

Global discovery of conserved genetic modules (joint research project with Stuart Kim, Developmental Biology; Art Owen, Statistics; and Josh Stuart, Bioengineering, UCSC)

DNA microarrays provide a first step toward uncovering gene function on a global scale. Because genes that participate in the same pathway are often co-regulated, functionally related genes often exhibit expression patterns that are correlated across a large number of diverse conditions in DNA microarray experiments. Furthermore, gene interactions that are physiologically significant should be conserved through evolution, so orthologous pairs of genes should show similar expression correlations in DNA microarray data from diverse organisms.
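A minimal sketch of what "correlated expression" means in practice: the Pearson correlation between two genes' expression profiles measured across the same set of arrays. The gene names and log-ratio values below are invented for illustration, and a real analysis would first normalize the arrays and handle missing values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)

# Hypothetical log-ratio profiles over five arrays (invented values).
profiles = {
    "geneA": [1.2, -0.5, 0.8, 2.1, -1.0],
    "geneB": [1.0, -0.4, 0.9, 1.8, -1.2],   # rises and falls with geneA
    "geneC": [-1.1, 0.6, -0.7, -2.0, 0.9],  # mirror image of geneA
}

r_ab = pearson(profiles["geneA"], profiles["geneB"])
r_ac = pearson(profiles["geneA"], profiles["geneC"])
print(f"r(A,B) = {r_ab:.2f}, r(A,C) = {r_ac:.2f}")
```

A value near +1 indicates co-expression, while a value near -1 indicates anti-correlated expression; both can be informative, but a co-expression network is usually built from the positive end.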

We are assembling all available DNA microarray data from several key organisms (human, mouse, fly, worm and yeast), and finding sets of orthologs that are co-expressed in multiple organisms. This conservation implies that the co-expression of these gene pairs confers a selective advantage and therefore that these genes are functionally related. Many of these relationships provide strong evidence for the involvement of new genes in core biological functions such as the cell cycle, secretion, and protein expression.
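The idea of orthologs that are "co-expressed in multiple organisms" can be illustrated with a deliberately simplified rule: treat each ortholog group as a single hypothetical ID shared across species, compute its pairwise correlations separately within each organism's data, and keep a pair only if the correlation clears a threshold in at least a minimum number of organisms. The thresholds (`min_r`, `min_organisms`), group IDs, and profiles are all assumptions for illustration; this is not necessarily how the published network was scored.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # treat constant profiles as uncorrelated
    return cov / (sx * sy)

def conserved_pairs(datasets, min_r=0.7, min_organisms=2):
    """Return ortholog-group pairs co-expressed in >= min_organisms organisms.

    datasets maps organism -> {ortholog_group_id: expression profile}.
    Each (hypothetical) group ID stands for one gene per organism.
    """
    groups = set()
    for profiles in datasets.values():
        groups.update(profiles)
    hits = {}
    for g1, g2 in combinations(sorted(groups), 2):
        # Organisms in which both groups are measured and correlated.
        support = [org for org, profiles in datasets.items()
                   if g1 in profiles and g2 in profiles
                   and pearson(profiles[g1], profiles[g2]) >= min_r]
        if len(support) >= min_organisms:
            hits[(g1, g2)] = sorted(support)
    return hits

# Invented data: OG1/OG2 co-vary in both organisms, OG3 only in worm.
datasets = {
    "worm":  {"OG1": [1, 2, 3, 4], "OG2": [2, 4, 6, 8], "OG3": [1, 2, 3, 5]},
    "yeast": {"OG1": [3, 1, 4, 2], "OG2": [6, 2, 8, 4], "OG3": [2, 4, 1, 3]},
}

hits = conserved_pairs(datasets)
print(hits)
```

Requiring support in several organisms filters out pairs that are correlated in one dataset by chance, which is the intuition behind using evolutionary conservation as evidence of a functional relationship.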

In addition to learning about the function of individual genes, we can use the network to analyze entire sets of genes and understand the system as a whole. For example, gene interactions in some pathways are evolving rapidly, whereas those in others are stable. We have characterized the connectivity properties of the gene co-expression network as a whole, and have found that some genetic pathways are designed to be large and others are engineered to be small.
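Two basic connectivity measures of a network like this are the degree distribution (how many co-expression partners each gene has) and the sizes of connected groups of genes. The sketch below computes both from an undirected edge list; the edges are invented, and connected-component size is used here only as a crude, assumed proxy for how large a group of interacting genes is, not as the module definition used in the project.

```python
from collections import Counter, defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets; self-loops and duplicate edges are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_distribution(adj):
    """Map each degree k to the number of genes with exactly k partners."""
    return Counter(len(nbrs) for nbrs in adj.values())

def component_sizes(adj):
    """Sizes of connected components, largest first (breadth-first search)."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Invented co-expression edges between hypothetical genes A..G.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("D", "E"), ("F", "G")]
adj = build_graph(edges)
print(dict(degree_distribution(adj)), component_sizes(adj))
```

On a genome-scale network the same functions apply unchanged; the degree distribution is typically plotted on log-log axes to see whether a few highly connected genes dominate.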

Stuart, J. M.*, Segal, E.*, Koller, D.#, and Kim, S. K.#. A gene co-expression network for global discovery of conserved genetic modules. Science, 302, 249-255, 2003. (*co-first authors; #co-last authors)

Genome-wide discovery of DNA regulatory motifs in C. elegans (joint research project with Stuart Kim, Developmental Biology, and Serafim Batzoglou, Computer Science)

Dissecting the DNA regulatory motifs that control gene expression is one of the great challenges of functional genomics. Until recently, regulatory motifs were detected one gene at a time through mutagenesis studies. To identify C. elegans regulatory motifs on a genome scale, we first compiled many sets of co-regulated genes from individual microarray experiments as well as from the C. elegans topomap. These sets span many growth conditions, developmental stages, and mutants. We also developed computational tools to predict regulatory motifs from the promoter sequences of co-regulated genes and to evaluate their statistical significance. Our motif-finding program, CompareProspector, takes advantage of C. elegans/C. briggsae sequence comparison to predict putative motifs. The statistical significance of each predicted motif was evaluated using criteria such as motif enrichment.

From 44 sets of co-expressed genes encompassing 9,507 genes, or roughly half of the C. elegans genome, we identified many significant regulatory motifs, and these motifs offer new biological insight. For example, a motif with the consensus TGATAA was identified in several aging-related datasets. This consensus matches that of known binding sites for GATA transcription factors, suggesting that GATA factors may be involved in worm aging. We also validated some of the motifs and their individual binding sites using mutagenesis studies.
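Motif enrichment can be illustrated with a hypergeometric test: given how many promoters in the whole genome contain a site, how surprising is the number of sites in one co-expressed gene set? The sketch below scans for an exact TGATAA match on either strand (its reverse complement is TTATCA) and computes the upper-tail probability. The promoter sequences and gene set are invented, exact-string matching is a simplification, and neither this scan nor the choice of the hypergeometric test is claimed to be what CompareProspector or the project's significance criteria actually use.

```python
from math import comb

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_site(promoter, consensus="TGATAA"):
    """True if the consensus occurs on either strand of the promoter."""
    promoter = promoter.upper()
    return consensus in promoter or revcomp(consensus) in promoter

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

def enrichment(promoters, gene_set, consensus="TGATAA"):
    """Return (sites in set, enrichment p-value) for one gene set."""
    genome = set(promoters)
    with_site = {g for g, seq in promoters.items() if has_site(seq, consensus)}
    query = set(gene_set) & genome
    k = len(query & with_site)
    return k, hypergeom_sf(k, len(genome), len(with_site), len(query))

# Invented promoter fragments for a toy eight-gene "genome".
promoters = {
    "g1": "ccgTGATAAgct", "g2": "aaTTATCAggc",   # g2: reverse-strand site
    "g3": "gcgcgcatat",   "g4": "TGATAAtttt",
    "g5": "acgtacgtac",   "g6": "ggggcccc",
    "g7": "atatatat",     "g8": "cagtcagt",
}

k, p = enrichment(promoters, ["g1", "g2", "g4", "g5"])
print(k, round(p, 4))
```

With a real genome, each motif is tested against many gene sets, so the resulting p-values would also need a multiple-testing correction before a motif is called significant.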