Genome-wide Discovery of DNA Regulatory Motifs in C. elegans
Yueyi Liu¹, Serafim Batzoglou², Stuart K. Kim³,⁴
¹Stanford Medical Informatics, Stanford University, Stanford, CA 94305
²Department of Computer Science, Stanford University, Stanford, CA 94305
³Departments of Developmental Biology and Genetics, Stanford University, Stanford, CA 94305
⁴Corresponding author
Phone: 650-725-7671
Fax: 650-725-7739
Email: kim@cmgm.stanford.edu
To identify cis-regulatory elements that control gene expression in C. elegans, we searched for DNA sequence motifs in 44 large sets of genes that are co-expressed either in specific DNA microarray experiments or across a compendium of 553 DNA microarray experiments. We used CompareProspector, a Gibbs sampling-based motif-finding program that biases the search toward regions conserved between C. elegans and C. briggsae. CompareProspector found 173 motif groups in the promoter regions of these co-expressed genes. For 8 of the 10 genes whose regulatory sites had been studied previously, the motifs found by CompareProspector matched the known sites. Furthermore, using site-directed mutagenesis and GFP reporters, we showed that two of the new sites predicted by CompareProspector are important for gene expression. Overall, we found DNA motifs in the promoter regions of 7,498 genes, identifying putative cis-regulatory controls that guide expression in 43 gene sets. Our study is a first step toward building a genome-wide regulatory network for C. elegans.
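CompareProspector's internal model is not described on this page. As a rough illustration of the general idea only, the following is a minimal site-sampling Gibbs sampler in Python in which each candidate window's sampling weight is its motif likelihood ratio multiplied by a conservation term. The per-base conservation scores (`cons`), the uniform background, the `0.1 +` floor, and the function names are hypothetical choices for this sketch, not CompareProspector's actual method; sequences are assumed to contain only A, C, G, and T.

```python
import random

ALPHABET = "ACGT"


def build_pwm(seqs, starts, width, skip=-1, pseudo=0.5):
    """Position weight matrix (rows = motif positions, columns = A,C,G,T)
    built from the current sites, leaving out sequence index `skip`."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (s, st) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][ALPHABET.index(s[st + j])] += 1
    return [[c / sum(row) for c in row] for row in counts]


def gibbs_motif(seqs, cons, width, iters=500, seed=0):
    """Site-sampling Gibbs sampler. `cons[i][p]` is a hypothetical conservation
    score in [0, 1] for base p of sequence i; conserved windows get more weight."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        pwm = build_pwm(seqs, starts, width, skip=i)  # model from the other sequences
        s = seqs[i]
        weights = []
        for st in range(len(s) - width + 1):
            # Likelihood ratio of this window under the motif vs. a uniform background
            lr = 1.0
            for j in range(width):
                lr *= pwm[j][ALPHABET.index(s[st + j])] / 0.25
            # Conservation bias: scale by the window's mean conservation (floored at 0.1)
            c = sum(cons[i][st:st + width]) / width
            weights.append(lr * (0.1 + c))
        # Resample the site in sequence i in proportion to the weights
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, build_pwm(seqs, starts, width)
```

With `cons` set to all zeros, every weight is scaled by the same factor of 0.1, so the sketch reduces to an ordinary, unbiased site sampler.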
Supplementary Data
Supplementary Table 1. Genes in the 44 gene sets.
There are 173 motif groups that are enriched in these 44 sets of co-expressed genes. For 69 of the 173 motif groups, the genes containing the motif are significantly more co-expressed than genes randomly selected from the same gene set. For the remaining 104 motif groups, the genes containing the motif are not significantly more co-expressed.
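The page does not state which co-expression statistic or significance test was used. One common way to ask whether motif-containing genes are more co-expressed than random genes from the same set is a permutation test; the Python sketch below uses mean pairwise Pearson correlation as the statistic. The `expr` dictionary (gene name to expression profile across experiments), the function names, and the number of permutations are all hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_pairwise_corr(genes, expr):
    """Average correlation over all pairs of genes in the list."""
    return mean(pearson(expr[a], expr[b]) for a, b in combinations(genes, 2))


def coexpression_pvalue(motif_genes, gene_set, expr, n_perm=1000, seed=0):
    """Empirical p-value: how often is a random same-size subset of the gene set
    at least as co-expressed as the genes that carry the motif?"""
    rng = random.Random(seed)
    observed = mean_pairwise_corr(motif_genes, expr)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(gene_set, len(motif_genes))  # gene_set must be a list
        if mean_pairwise_corr(sample, expr) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```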
Supplementary Table 2. 173 motif groups that are significantly enriched.
Supplementary Table 3. 7,498 genes with one or more of the 173 motifs.
Supplementary Table 4. Percentage of genes inside and outside each gene set that contain a given motif.
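The quantities in Supplementary Table 4 can be computed directly from gene lists. The Python sketch below reports the inside/outside percentages and, as one common (assumed, not stated on this page) way to judge enrichment, a hypergeometric tail probability. The function names and arguments are hypothetical.

```python
from math import comb


def motif_percentages(genes_with_motif, gene_set, all_genes):
    """Percent of genes inside and outside a gene set that carry a motif."""
    inside = set(gene_set)
    outside = set(all_genes) - inside
    hits = set(genes_with_motif)
    pct_in = 100.0 * len(hits & inside) / len(inside)
    pct_out = 100.0 * len(hits & outside) / len(outside)
    return pct_in, pct_out


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance that a random set of n genes, drawn from N genes of
    which K carry the motif, contains k or more motif-carrying genes."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)
```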
Programs Used
CompareProspector: de novo motif search in the promoter sequences of co-regulated genes.
MatrixScan: given a motif matrix, searches a set of promoter sequences for occurrences of the motif. This program was developed by Xiaole Shirley Liu, who has generously agreed to make it public. You can download the Solaris executable and a brief README. Please note that the program is a research tool still under development and that it is supplied "as is", without any accompanying services or improvements from the developer.
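MatrixScan's scoring scheme is not documented here. To illustrate what scanning sequences with a motif matrix involves, the following Python sketch scores every window on both strands with log-odds against a uniform background and reports windows at or above a threshold. The background, threshold, and function names are assumptions for illustration; matrix entries must be nonzero (for example, after adding pseudocounts).

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Convert a probability matrix (rows = positions, columns = A,C,G,T) to log2 odds."""
    return [[math.log2(p / q) for p, q in zip(row, background)] for row in pwm]


def score(window, lo):
    """Sum of per-position log-odds for one window."""
    return sum(row[BASES.index(b)] for row, b in zip(lo, window))


def scan(seq, pwm, threshold):
    """Return (position, strand, score) for each window scoring >= threshold.
    Positions are 0-based starts on the forward strand for both strands."""
    lo = log_odds(pwm)
    w = len(lo)
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        fwd = score(window, lo)
        rev = score(window.translate(COMPLEMENT)[::-1], lo)  # reverse complement
        for strand, s in (("+", fwd), ("-", rev)):
            if s >= threshold:
                hits.append((i, strand, s))
    return hits
```

Scanning both strands matters because a transcription factor binding site can lie on either strand of a promoter.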