Table 1. Families of orthologous genes regulated by transcription attenuation¹
| Gene family | Representative COGs² (p-value³) | M⁴ |
|---|---|---|
| Alanyl-tRNA synthetase | COG0013Fi alaS (9e-5) | 3 |
| Amino acid transport | COG0833 lysP (5e-5) | |
| Arginyl-tRNA synthetase | COG0018 ArgS (9e-5) | 3 |
| Cobalt transport | | |
| 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase | | |
| Dissimilatory sulfite reductase | | |
| DNA methylase | COG0116Fi ypsC (2e-6) | |
| Exoribonuclease R | | |
| Fe-S-cluster-containing hydrogenase components 2 | | |
| Intracellular trafficking | | |
| Isoleucyl-tRNA synthetase | COG0060Fi ileS (3e-8) | 3 |
| Leucyl-tRNA synthetase | COG0495Fi leuS (9e-5) | 3 |
| Membrane protein | COG3601Fi ypaA (1e-6) | 4 |
| Metal ion transport system | COG1135Fi yusC (9e-8), COG2011Fi yusB (5e-7), COG1464Fi yusA (2e-6) | 4 |
| Nucleoid DNA-binding protein | COG0776Pr hupB (4e-9) | |
| Oxidation of intracellular sulfur | COG1553Pr Uncharacterized (2e-6), COG2923Pr Uncharacterized (2e-7), COG2168Pr Uncharacterized (2e-7) | |
| Phenylalanyl-tRNA synthetase | COG0016 pheS (8e-15), COG0072 pheT (4e-8), COG0073 pheT (8e-9) | 3 |
| Polyribonucleotide nucleotidyltransferase | COG1185 pnpA (3e-11) | |
| Predicted GTPase | COG0536 yhbz (1e-9) | |
| Pyrimidine biosynthesis | COG2065Fi pyrR (5e-10), COG2233Fi pyrP (8e-7), COG0540 pyrB (1e-8), COG0044Fi pyrC (4e-5), pyrDII (1e-5), COG0284Fi pyrF (1e-5) | 2 |
| Pyrimidine biosynthesis | COG0504Fi pyrG (2e-5) | |
| Regulator of length of lipopolysaccharide chains | | |
| Riboflavin biosynthesis | COG1985Fi ribD (5e-5), COG0307Fi ribB (5e-5), COG0117Fi ribD (4e-5) | 4 |
| Ribose transport | | |
| Ribosomal protein L36 | | 2 |
| Ribosomal structure | COG0228 rpsP (1e-5) | 2 |
| Ribosomal structure and RNA polymerase | COG0093Pr rplN (6e-5), COG0199Pr rpsN (9e-5), COG0096Pr rpsH (4e-7), COG0097Pr rplF (1e-5), COG0256Pr rplR (6e-5), COG0098Pr rpsE (1e-5), COG1841Pr rpmD (1e-6), COG0200Pr rpl10 (1e-5), COG0201Pr secY (1e-5) | 2 |
| RNA-binding protein | COG3688Fi yacP (2e-6) | |
| SAM-dependent methyltransferase | COG2384Fi ykfn (7e-6) | 4 |
| Seryl-tRNA synthetase | COG0172Fi serS (1e-5) | 3 |
| Subunits of ubiquinone oxidoreductase | | |
| Thiamine biosynthesis | COG0422Fi thiC (3e-5) | 4 |
| Transcription antiterminator and ribosomal protein L10 | | |
| Transcription elongation factor | COG0782 greA (2e-10) | |
| Transcriptional regulator LysR family | COG0583 lysR oxyR lysR metR ampR nhaR ptxR rbcR gltR mleR (9e-15) | |
| Translation and ribosomal structure, Undecaprenyl pyrophosphate synthase, CDP-diglyceride synthetase | COG0264 tsf (7e-8), COG0020 uppS (2e-6), COG0575 cdsA (8e-6) | |
| Translation factors and ribosomal structure | COG0779 Uncharacterized (7e-23), COG0195 nusA (1e-12), COG0858 rbfA (8e-6) | |
| Threonyl-tRNA synthetase, translation and ribosomal structure | COG0441 thrS (1e-7), COG0290 infC (3e-10), COG0291 rpmI (1e-6) | 3 |
| Tryptophan biosynthesis | COG0147 trpE (2e-9), COG0512 trpG (1e-8), COG0547 trpD (4e-9), COG0134 trpC (4e-8), COG0135 trpF (4e-8), COG0133 trpB (4e-8), COG0159 trpA (4e-8) | 1, 2, 3 |
| Tyrosyl-tRNA synthetase | COG0162 tyrS (2e-5) | 3 |
| Uncharacterized protein | | |
| Valyl-tRNA synthetase | COG0525Fi valS (2e-6) | 3 |
¹ On the basis of existing genome annotation, the 400 nt immediately upstream of every gene in the 180 fully sequenced bacterial genomes available in GenBank (ftp://ftp.ncbi.nih.gov/genbank/) were identified and examined. We selected only those genes predicted to be immediately preceded by a site of intrinsic transcription termination. These sequences were considered the analysis window in our compilation. In this window, we searched for a "run of T's" (rts) that would correspond to a run of U's in RNA, an essential characteristic of an intrinsic terminator. The length of the rts selected was 6 nt, at least five of which must be T ("ts"). The arbitrarily selected maximum acceptable spacing between the rts and the first base of the downstream gene or coding region was 110 nt. For every rts, our program searched for the most stable secondary structure that could be formed within the preceding 60 nt. Initially, the secondary structure that could be formed from these 60 nt was predicted using the RNAfold program. The only secondary structure that was considered to be part of a terminator (T) was a stem-and-loop structure (SLS). If the sequence analyzed contained more than one SLS, only the one closest to the rts was considered to be the terminator. In the event that folding predictions yielded structures different from an SLS (e.g., cloverleaf-like), the first nt of the 60-nt sequence was removed and the search for the SLS was repeated recursively until the program located a sequence that, when folded, corresponded to the most stable SLS. This SLS was considered part of a T only if the base at the bottom of the stem was no more than 4 nt from the rts, the maximum number of loops in the structure was 3, and the energy of the structure was at least -0.2 kcal/mol per nucleotide. For every T found, the program then searched for the presence of an associated anti-terminator (AT).
To do this, the program initially scanned a 60-nt sequence immediately upstream of the middle of the T loop. The program then located the most stable SLS using the procedure described for the T analysis. An SLS was considered an AT on the basis of the following somewhat arbitrary criteria: at least 3 bases of its stem overlapped with the T sequence, the maximum number of loops in the structure was 4, and its free energy was at least one fourth that of its corresponding T. The same analysis was then repeated with each AT, searching for its corresponding anti-anti-terminator structure (AAT). For the AAT and AT to be considered mutually exclusive structures, they had to share at least 3 bases. The free energy of the AAT had to be at least one fourth that of its associated AT. The predicted terminator/antiterminator attenuators, as well as those attenuators that do not depend on an antiterminator structure, can be found on our web page (http://cmgm.stanford.edu/~merino/).
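The selection rules in footnote 1 can be illustrated with a minimal Python sketch. It is not the authors' program: the names `find_rts`, `Hairpin`, `is_terminator`, and `is_antiterminator` are hypothetical, folding itself (done with RNAfold in the original analysis) is left out, and each hairpin is assumed to arrive already summarized by its coordinates, loop count, and free energy. Three readings of the footnote are assumptions: the spacing to the gene is counted as the bases between the end of the rts and the gene start, "at least -0.2 kcal/mol per nucleotide" is taken to mean at least that stable (energy per nucleotide of -0.2 or lower), and overlap is measured between the full hairpin spans rather than the stems alone.

```python
from dataclasses import dataclass

RTS_LEN = 6         # length of the run-of-T's window
MIN_T = 5           # at least five of the six bases must be T
MAX_GENE_GAP = 110  # maximum spacing between the rts and the gene start


def find_rts(upstream: str) -> list[int]:
    """Return start indices of 6-nt windows holding >= 5 T's.

    `upstream` is the sequence immediately 5' of the gene, so the first
    base of the gene would sit at index len(upstream).
    """
    seq = upstream.upper()
    hits = []
    for i in range(len(seq) - RTS_LEN + 1):
        window = seq[i:i + RTS_LEN]
        # Assumption: spacing = bases between the rts end and the gene start.
        gap = len(seq) - (i + RTS_LEN)
        if window.count("T") >= MIN_T and gap <= MAX_GENE_GAP:
            hits.append(i)
    return hits


@dataclass
class Hairpin:
    """Hypothetical summary of one predicted stem-and-loop structure."""
    start: int     # index of the 5'-most base of the structure
    end: int       # index of the 3'-most base (bottom of the stem), inclusive
    loops: int     # number of loops in the structure
    energy: float  # free energy in kcal/mol (more negative = more stable)


def is_terminator(h: Hairpin, rts_start: int) -> bool:
    """Apply the three terminator criteria to a hairpin upstream of an rts."""
    length = h.end - h.start + 1
    gap = rts_start - h.end - 1  # bases between the stem bottom and the rts
    # Assumption: "at least -0.2" is read as "at least that stable".
    return 0 <= gap <= 4 and h.loops <= 3 and h.energy / length <= -0.2


def is_antiterminator(at: Hairpin, t: Hairpin) -> bool:
    """Apply the three antiterminator criteria relative to terminator t."""
    # Assumption: overlap is measured on whole spans, not stems only.
    overlap = min(at.end, t.end) - max(at.start, t.start) + 1
    # Energies are negative, so "at least one fourth" means <= t.energy / 4.
    return overlap >= 3 and at.loops <= 4 and at.energy <= t.energy / 4
```

For example, `find_rts("ACGTTTTTTACG")` reports the three overlapping windows that start at indices 2, 3, and 4. A fuller pipeline would fold the 60 nt preceding each hit and pass the resulting structures to the two predicates.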
² According to the Clusters of Orthologous Groups of proteins (COG) database. Where there is a predominant phylum regulated by transcription attenuation within a COG, this is indicated according to the GenBank listing (http://www.ncbi.nih.gov/Genbank/). Fi stands for Firmicutes and Pr stands for Proteobacteria. COGs associated with riboswitches or conserved motif sequences are indicated in boldface.
³ p-value for the over-representation of genes regulated by transcription attenuation in a specific COG, assuming a hypergeometric distribution. COGs with p-values lower than 1 × 10⁻⁴, or relevant COGs previously reported to have a clear tendency to be regulated by transcription attenuation, are shown in the table. Only non-redundant strains were considered in our statistical analyses. We considered a strain redundant if more than 85% of its genes were orthologous to genes of another organism previously included in this analysis.
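As an illustration of the over-representation test named in footnote 3, the sketch below computes an upper-tail hypergeometric p-value in Python. The function name and the parameter names are hypothetical, and the footnote does not state exactly how the authors defined the population or the tail, so this shows the standard form of the test rather than their exact calculation.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, attenuated_genes: int,
                           cog_size: int, attenuated_in_cog: int) -> float:
    """Return P(X >= attenuated_in_cog) under a hypergeometric model.

    X counts the attenuation-regulated genes that land in a COG of
    `cog_size` genes drawn without replacement from `total_genes`, of
    which `attenuated_genes` are attenuation-regulated overall.
    """
    upper = min(attenuated_genes, cog_size)
    denominator = comb(total_genes, cog_size)
    # Sum the probability mass from the observed count up to the maximum.
    numerator = sum(
        comb(attenuated_genes, i)
        * comb(total_genes - attenuated_genes, cog_size - i)
        for i in range(attenuated_in_cog, upper + 1)
    )
    return numerator / denominator
```

With 10 genes in total, 4 of them attenuation-regulated, and a COG of 3 genes containing 2 regulated members, the tail is (36 + 4) / 120, so `hypergeom_enrichment_p(10, 4, 3, 2)` evaluates to one third. A small p-value means the COG holds more attenuation-regulated genes than random assignment would predict.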
⁴ Indicates the type of transcription attenuation mechanism described in Box 1 that is used to regulate the members of this COG.