Table 1. Families of orthologous genes regulated by transcription attenuation¹


| General function of the transcription unit | Representative COGs² (p-value³) | M⁴ |
|---|---|---|
| Alanyl-tRNA synthetase | COG0013Fi alaS (9e-5) | 3 |
| Amino acid transport | COG0833 lysP (5e-5) | |
| Arginyl-tRNA synthetase | COG0018 argS (9e-5) | 3 |
| Cobalt transport | COG4721Fi ykoE (2e-5) | |
| 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase | COG0722Pr aroH (6e-5) | |
| Dissimilatory sulfite reductase | COG2920Pr yccK (1e-5) | |
| DNA methylase | COG0116Fi ypsC (2e-6) | |
| Exoribonuclease R | COG0557Fi rnr (2e-5) | |
| Fe-S-cluster-containing hydrogenase components 2 | COG1142Pr hycB (5e-7) | |
| Intracellular trafficking | COG0481 lepA (6e-5), COG0681 lepB (9e-5) | |
| Isoleucyl-tRNA synthetase | COG0060Fi ileS (3e-8) | 3 |
| Leucyl-tRNA synthetase | COG0495Fi leuS (9e-5) | 3 |
| Membrane protein | COG3601Fi ypaA (1e-6) | 4 |
| Metal ion transport system | COG1135Fi yusC (9e-8), COG2011Fi yusB (5e-7), COG1464Fi yusA (2e-6) | 4 |
| Nucleoid DNA-binding protein | COG0776Pr hupB (4e-9) | |
| Oxidation of intracellular sulfur | COG1553Pr Uncharacterized (2e-6), COG2923Pr Uncharacterized (2e-7), COG2168Pr Uncharacterized (2e-7) | |
| Phenylalanyl-tRNA synthetase | COG0016 pheS (8e-15), COG0072 pheT (4e-8), COG0073 pheT (8e-9) | 3 |
| Polyribonucleotide nucleotidyltransferase | COG1185 pnpA (3e-11) | |
| Predicted GTPase | COG0536 yhbZ (1e-9) | |
| Pyrimidine biosynthesis | COG2065Fi pyrR (5e-10), COG2233Fi pyrP (8e-7), COG0540 pyrB (1e-8), COG0044Fi pyrC (4e-5), pyrDII (1e-5), COG0284Fi pyrF (1e-5) | 2 |
| Pyrimidine biosynthesis | COG0504Fi pyrG (2e-5) | |
| Regulator of length of lipopolysaccharide chains | COG3765Pr wzzE (5e-5) | |
| Riboflavin biosynthesis | COG1985Fi ribD (5e-5), COG0307Fi ribB (5e-5), COG0117Fi ribD (4e-5) | 4 |
| Ribose transport | COG1869 rbsD (2e-6) | |
| Ribosomal protein L36 | COG0257Pr rpmJ (1e-7) | 2 |
| Ribosomal structure | COG0228 rpsP (1e-5) | 2 |
| Ribosomal structure and RNA polymerase | COG0093Pr rplN (6e-5), COG0199Pr rpsN (9e-5), COG0096Pr rpsH (4e-7), COG0097Pr rplF (1e-5), COG0256Pr rplR (6e-5), COG0098Pr rpsE (1e-5), COG1841Pr rpmD (1e-6), COG0200Pr rplO (1e-5), COG0201Pr secY (1e-5) | 2 |
| RNA-binding protein | COG3688Fi yacP (2e-6) | |
| SAM-dependent methyltransferase | COG2384Fi ykfn (7e-6) | 4 |
| Seryl-tRNA synthetase | COG0172Fi serS (1e-5) | 3 |
| Subunits of ubiquinone oxidoreductase | COG4659Pr rnfG (4e-5), COG4660Pr rnfE (8e-5) | |
| Thiamine biosynthesis | COG0422Fi thiC (3e-5) | 4 |
| Transcription antiterminator and ribosomal protein L10 | COG0250 nusG (1e-7), COG0244 rplJ (4e-7) | |
| Transcription elongation factor | COG0782 greA (2e-10) | |
| Transcriptional regulator LysR family | COG0583 lysR oxyR lysR metR ampR nhaR ptxR rbcR gltR mleR (9e-15) | |
| Translation and ribosomal structure, Undecaprenyl pyrophosphate synthase, CDP-diglyceride synthetase | COG0264 tsf (7e-8), COG0020 uppS (2e-6), COG0575 cdsA (8e-6) | |
| Translation factors and ribosomal structure | COG0779 Uncharacterized (7e-23), COG0195 nusA (1e-12), COG0858 rbfA (8e-6) | |
| Threonyl-tRNA synthetase, translation and ribosomal structure | COG0441 thrS (1e-7), COG0290 infC (3e-10), COG0291 rpmI (1e-6) | 3 |
| Tryptophan biosynthesis | COG0147 trpE (2e-9), COG0512 trpG (1e-8), COG0547 trpD (4e-9), COG0134 trpC (4e-8), COG0135 trpF (4e-8), COG0133 trpB (4e-8), COG0159 trpA (4e-8) | 1, 2, 3 |
| Tyrosyl-tRNA synthetase | COG0162 tyrS (2e-5) | 3 |
| Uncharacterized protein | COG3501Pr Uncharacterized (2e-6) | |
| Valyl-tRNA synthetase | COG0525Fi valS (2e-6) | 3 |

¹On the basis of existing genome annotation, the 400 nt immediately upstream of every gene in the 180 fully sequenced bacterial genomes available in GenBank (ftp://ftp.ncbi.nih.gov/genbank/) were identified and examined. We selected only those genes predicted to be immediately preceded by a site of intrinsic transcription termination. These sequences were considered the analysis window in our compilation. In this window, we searched for a “run of T's” (rts) that would correspond to a run of U's in RNA, an essential characteristic of an intrinsic terminator. The rts was defined as a 6-nt stretch in which at least five positions must be T (“ts”). The arbitrarily selected maximum acceptable spacing between the rts and the first base of the downstream gene or coding region was 110 nt. For every rts, our program searched for the most stable secondary structure that could be formed within the preceding 60 nt. Initially, the secondary structures that could be formed from these 60 nt were predicted using the RNAfold program. The only secondary structure considered to be part of a terminator (T) was a stem-and-loop structure (SLS). If the sequence analyzed contained more than one SLS, only the one closest to the rts was considered to be the terminator. If the folding predictions yielded a structure other than an SLS (e.g., cloverleaf-like), the first nt of the 60-nt sequence was removed and the search for an SLS was repeated recursively until the program located a sequence that, when folded, corresponded to the most stable SLS. This SLS was considered part of a T only if the base at the bottom of the stem was no more than 4 nt from the rts, the structure contained at most 3 loops, and the energy of the structure was at least -0.2 kcal/mol per nucleotide. For every T found, the program then searched for an associated antiterminator (AT).
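The rts search described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' program: the function name `find_rts`, its parameter names, and the convention that the upstream sequence ends immediately before the first base of the gene are assumptions made for the example. The spacing is taken here to be the number of nucleotides between the end of the rts and the gene start, which is one possible reading of the text.

```python
def find_rts(upstream, window=6, min_t=5, max_gap=110):
    """Return (start, end) index pairs of candidate runs of T's (rts).

    `upstream` is the DNA immediately 5' of a gene, written 5'->3', so its
    last base is assumed to sit directly before the first base of the gene.
    Indices are 0-based and `end` is exclusive.
    """
    seq = upstream.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        end = start + window
        # Assumed definition of spacing: bases between the rts and the gene.
        gap = len(seq) - end
        if gap > max_gap:
            continue
        # A window qualifies if at least `min_t` of its positions are T.
        if seq[start:end].count("T") >= min_t:
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    example = "GCGC" + "TTTTTT" + "A" * 10
    print(find_rts(example))
```

Overlapping windows are reported separately; a fuller implementation would probably merge them before folding the preceding 60 nt.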
To find the AT associated with a T, the program initially scanned the 60-nt sequence immediately upstream of the middle of the T loop and located the most stable SLS using the procedure described for the T analysis. An SLS was considered an AT on the basis of the following somewhat arbitrary criteria: at least 3 bases of its stem overlapped with the T sequence, the structure contained at most 4 loops, and its free energy was at least one fourth that of its corresponding T. The same analysis was then repeated with each AT to search for its corresponding anti-antiterminator structure (AAT). For the AAT and AT to be considered mutually exclusive structures, they must share at least 3 bases, and the free energy of the AAT must be at least one fourth that of its associated AT. The predicted terminator/antiterminator attenuators, as well as those attenuators that do not depend on an antiterminator structure, can be found on our web page (http://cmgm.stanford.edu/~merino/).
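The acceptance criteria for T, AT and AAT structures can be written as simple predicates. The sketch below is hypothetical: the `StemLoop` record, the function names, and the use of interval overlap to count shared bases are assumptions, and the folding step itself (RNAfold in the original analysis) is not reproduced. The energy thresholds are interpreted as stability thresholds, that is, a T must be at least as stable as -0.2 kcal/mol per nucleotide, and an alternative structure must have at least one fourth of the absolute free energy of its partner.

```python
from dataclasses import dataclass


@dataclass
class StemLoop:
    start: int     # index of the first base of the structure (inclusive)
    end: int       # index one past the last base of the structure
    loops: int     # number of loops in the predicted structure
    energy: float  # predicted free energy in kcal/mol (negative = stable)


def is_terminator(sls, rts_start, max_dist=4, max_loops=3, min_stability=0.2):
    """Check the T criteria: close to the rts, few loops, stable enough."""
    length = sls.end - sls.start
    dist = rts_start - sls.end  # bases between the stem bottom and the rts
    return (0 <= dist <= max_dist
            and sls.loops <= max_loops
            and -sls.energy / length >= min_stability)


def shared_bases(a, b):
    """Number of positions covered by both structures (assumed overlap rule)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_alternative(candidate, partner, min_overlap=3, max_loops=None,
                   min_ratio=0.25):
    """Check the AT-vs-T (max_loops=4) or AAT-vs-AT (no loop limit) criteria."""
    if max_loops is not None and candidate.loops > max_loops:
        return False
    return (shared_bases(candidate, partner) >= min_overlap
            and abs(candidate.energy) >= min_ratio * abs(partner.energy))


if __name__ == "__main__":
    t = StemLoop(start=50, end=80, loops=1, energy=-12.0)
    at = StemLoop(start=20, end=55, loops=2, energy=-4.0)
    print(is_terminator(t, rts_start=82), is_alternative(at, t, max_loops=4))
```

Using one predicate for both the AT and the AAT reflects that the text gives the same overlap and energy rules for both, with a loop limit stated only for the AT.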


²According to the Clusters of Orthologous Groups of proteins (COG) database. Where a predominant phylum is regulated by transcription attenuation within a COG, it is indicated according to the GenBank listing (http://www.ncbi.nih.gov/Genbank/): Fi stands for Firmicutes and Pr stands for Proteobacteria. COGs associated with riboswitches or conserved motif sequences are indicated in boldface.


³p-value of the over-representation of genes regulated by transcription attenuation in a specific COG, assuming a hypergeometric distribution. The table shows COGs with p-values lower than 1 × 10⁻⁴, as well as relevant COGs previously reported to have a clear tendency to be regulated by transcription attenuation. Only non-redundant strains were considered in our statistical analyses. We considered a strain redundant if it shared more than 85% of its genes as orthologs with another organism previously included in the analysis.
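An over-representation p-value under a hypergeometric distribution can be computed as an upper-tail probability. The sketch below is an assumption about how such a test might be set up, not the authors' code: the function name, the argument names, and the choice of a one-sided upper tail are all illustrative. It asks how likely it is to see at least the observed number of attenuation-regulated genes in a COG if the COG's members were drawn at random from all genes analyzed.

```python
from math import comb


def enrichment_pvalue(attenuated_in_cog, cog_size, attenuated_total,
                      genes_total):
    """Upper-tail hypergeometric probability P(X >= attenuated_in_cog).

    X counts attenuation-regulated genes among `cog_size` genes drawn
    without replacement from `genes_total` genes, of which
    `attenuated_total` are attenuation-regulated.
    """
    upper = min(cog_size, attenuated_total)
    not_attenuated = genes_total - attenuated_total
    favourable = sum(
        comb(attenuated_total, i) * comb(not_attenuated, cog_size - i)
        for i in range(attenuated_in_cog, upper + 1)
    )
    return favourable / comb(genes_total, cog_size)


if __name__ == "__main__":
    # Toy numbers only: 10 genes, 3 attenuated, COG of 2 genes, 1 attenuated.
    print(enrichment_pvalue(1, 2, 3, 10))
```

Exact integer arithmetic with `math.comb` avoids rounding problems for the very small p-values reported in the table, at the cost of speed on large gene sets.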


⁴Indicates the type of transcription attenuation mechanism described in Box 1 that is used to regulate the members of this COG.