We considered four methods to generate sets of orthologous genes: (1) clusters of orthologous groups (COGs) defined by NCBI, (2) eukaryotic gene orthologs (EGO) defined by TIGR, (3) reciprocal best BLAST hits, and (4) transitive reciprocal best BLAST hits.
Transitivity requires that, if human gene A is a reciprocal best BLAST hit of both worm gene B and fly gene C, then worm gene B and fly gene C must also be reciprocal best BLAST hits of each other in order for genes A, B and C to be grouped as orthologs.
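The reciprocal and transitive criteria can be illustrated with a minimal sketch. It assumes that the best BLAST hit of each gene in each other species has already been computed and stored in a lookup table; the table layout, the `species:gene` naming and the function names are hypothetical and serve only as an illustration, not as the code used in this work.

```python
from itertools import combinations

def species_of(gene):
    """Return the species prefix of a 'species:gene' identifier (hypothetical naming)."""
    return gene.split(":", 1)[0]

def is_reciprocal(best_hit, a, b):
    """True if a and b are each other's best hit between their two species."""
    return (best_hit.get((a, species_of(b))) == b
            and best_hit.get((b, species_of(a))) == a)

def is_transitive(best_hit, genes):
    """True if every pair of genes in the group is a reciprocal best hit."""
    return all(is_reciprocal(best_hit, a, b) for a, b in combinations(genes, 2))

# Toy lookup table (made-up data): (query gene, target species) -> best hit.
best_hit = {
    ("human:A", "worm"): "worm:B",
    ("worm:B", "human"): "human:A",
    ("human:A", "fly"): "fly:C",
    ("fly:C", "human"): "human:A",
    ("worm:B", "fly"): "fly:C",
    ("fly:C", "worm"): "worm:B2",  # fly gene C prefers a different worm gene
}

group = ["human:A", "worm:B", "fly:C"]
print(is_reciprocal(best_hit, "human:A", "worm:B"))  # human-worm link
print(is_reciprocal(best_hit, "worm:B", "fly:C"))    # worm-fly link
print(is_transitive(best_hit, group))                # whole group
```

In this toy table, genes A, B and C would be grouped together under the reciprocal best hit criterion through their links to human gene A, but the group fails the transitive criterion because the worm and fly genes are not reciprocal best hits of each other.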
We could not use the definition of orthologs found in the COG database because some sets of orthologs contained a large number of genes from a single organism. For example, in some cases over 100 genes from C. elegans were grouped together. Having a large number of genes from a single organism would complicate the gene correlations: a single human gene would have Pearson correlations to each of the 100 worm genes in the same orthologous group.
We could not use the EGO database because the same gene was sometimes assigned to separate orthologous groups. For example, tentative orthologs 336024, 350993 and 402694 each contain the same yeast gene encoding nuclear transport factor 2. Having multiple orthologous groups for one gene would complicate the gene correlations, since a gene from one organism would have a separate set of Pearson correlations for each group.
We used reciprocal best BLAST hits to define orthologous genes (Fig. S4 A), and then compared the results of this approach to those of the other three approaches. First, we found that most meta-genes contained a single gene from each organism (Fig. S4 B), so this approach avoids the problem found with the COG database. Second, we found that over 78% of the meta-genes exhibited transitive relationships. This result indicates that this method and the approach requiring transitivity would generate similar sets of orthologous genes, although the transitive method would be somewhat more restrictive and hence generate a smaller number of meta-genes. Third, we compared the meta-genes to the sets of orthologs defined by the EGO database. We examined a random set of 47 meta-genes, which contain a total of 201 pairs of orthologous genes. Of these 201 ortholog pairs, 184 (91.5%) were also linked together in the EGO database. Hence, the method used in this paper and the method used by the EGO database generate the same orthology relationship in most cases. Fourth, it could be that some of the links do not identify true orthologs but rather close homologs; this might be the case for the roughly 22% of meta-genes that do not exhibit transitivity. In order to determine whether using a close homolog rather than an ortholog would significantly affect the network relationships, we calculated the Pearson correlations between close homologs and found these correlations to be significantly high. This result indicates that the meta-gene network would yield similar results using close homologs rather than true orthologs.
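The comparison with the EGO database described above amounts to enumerating every pair of genes within each sampled meta-gene and counting how many pairs also fall within a single reference group. The following is a minimal sketch of that count, assuming that meta-genes and reference groups are available as plain collections of gene identifiers; the function names and the toy data are hypothetical, and only the final figure (184 of 201 pairs) comes from the analysis itself.

```python
from itertools import combinations

def ortholog_pairs(meta_genes):
    """Return the set of unordered gene pairs found within the meta-genes."""
    pairs = set()
    for genes in meta_genes:
        pairs.update(frozenset(p) for p in combinations(genes, 2))
    return pairs

def count_linked(meta_genes, reference_groups):
    """Count the pairs that also occur together in some reference group."""
    pairs = ortholog_pairs(meta_genes)
    linked = sum(1 for pair in pairs
                 if any(pair <= group for group in reference_groups))
    return linked, len(pairs)

# Toy example (made-up identifiers): two meta-genes, four pairs in total.
meta_genes = [("human:A", "worm:B", "fly:C"), ("human:D", "worm:E")]
reference_groups = [{"human:A", "worm:B", "fly:C"}, {"human:D"}]
linked, total = count_linked(meta_genes, reference_groups)
print(linked, total)

# Percentage reported in the text: 184 of 201 pairs.
percent_shared = round(100 * 184 / 201, 1)
print(percent_shared)
```

Counting unordered pairs rather than whole groups means that a meta-gene that agrees with the reference database for only some of its members still contributes its matching pairs to the total.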