Construction of the multiple species co-expression network



We identified conserved expression interactions using a probabilistic method. For every pair of meta-genes, we used order statistics to compute the probability of observing their gene-gene correlations by chance. Specifically, we computed a P-value for every directed pair of meta-genes $(m, m')$ based on the correlations of their genes in $n$ species. Let $g_{ms}$ be a gene belonging to meta-gene $m$ in species $s$. We ranked all of the other genes relative to $g_{ms}$ by Pearson correlation and then divided the rank of $g_{m's}$ by the total number of genes with meta-genes (and with data) in organism $s$, yielding $n$ rank ratios for the $(m, m')$ pair, $r_1, r_2, \ldots, r_n$. To assess how significant the gene correlations of the pair are, we computed the probability of obtaining the observed rank ratios by chance, irrespective of the order of the species. As more organisms are added, the chance occurrence of high rank-ratio combinations increases dramatically (it grows as $n!$, where $n$ is the number of organisms). If we assume the $r_s$ are drawn independently and uniformly, then we can compute the P-value from the joint cumulative distribution of an n-dimensional order statistic:
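
With the rank ratios sorted so that $r_1 \le r_2 \le \cdots \le r_n$, the joint cumulative distribution of $n$ independent uniform order statistics takes the following standard form. It is written here from the definitions above, and $s_1, \ldots, s_n$ are dummy integration variables introduced for illustration:

\[
P(r_1, r_2, \ldots, r_n) = n! \int_{0}^{r_1} \int_{s_1}^{r_2} \cdots \int_{s_{n-1}}^{r_n} ds_n \cdots ds_2 \, ds_1
\]

For $n = 1$ this reduces to $P(r_1) = r_1$, and for $n = 2$ it gives $2 r_1 r_2 - r_1^2$.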

See for a good description of the joint distribution of n order statistics. We can efficiently compute the above with the recursive formula:
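
A recursion consistent with the description that follows can be written as below. The base case $P() = 1$ for an empty argument list is an assumption, since it is not stated in the text:

\[
P(r_1, r_2, \ldots, r_n) = \sum_{i=1}^{n} \left( r_{n-i+1} - r_{n-i} \right) P(r_1, \ldots, r_{n-i}, r_{n-i+2}, \ldots, r_n)
\]

As a check, for $n = 2$ the sum is $(r_2 - r_1)\,r_1 + r_1 r_2 = 2 r_1 r_2 - r_1^2$, which agrees with direct integration of the joint distribution of two uniform order statistics.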

where $r_0 = 0$ and the recursive call to $P$ supplies all of the original arguments except the $(n-i+1)$th. Because the analysis included four species, we used $n = 4$.
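
The recursion described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `order_stat_pvalue` and `_joint_cdf` are hypothetical, the rank ratios are assumed to be sorted in ascending order before evaluation, and the empty call is assumed to return 1.

```python
from functools import lru_cache


def order_stat_pvalue(rank_ratios):
    """Joint CDF of n uniform order statistics at the given rank ratios.

    Assumption: the ratios are sorted ascending first, so the species
    order does not matter.
    """
    return _joint_cdf(tuple(sorted(rank_ratios)))


@lru_cache(maxsize=None)
def _joint_cdf(r):
    n = len(r)
    if n == 0:
        return 1.0  # assumed base case: P() = 1
    total = 0.0
    for i in range(1, n + 1):
        k = n - i + 1                           # 1-based index of the dropped argument
        r_k = r[k - 1]
        r_prev = r[k - 2] if k >= 2 else 0.0    # r_0 = 0
        rest = r[:k - 1] + r[k:]                # all arguments except the k-th
        total += (r_k - r_prev) * _joint_cdf(rest)
    return total


if __name__ == "__main__":
    # Four species, so n = 4; the rank ratios here are made-up inputs.
    print(order_stat_pvalue([0.01, 0.002, 0.03, 0.005]))
```

Memoizing on the argument tuple keeps the number of distinct sub-calls at most 2^n, which is negligible for n = 4. When all rank ratios equal the same value a, the result is a^n, the probability that n independent uniform draws all fall at or below a.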


We then connected any two meta-genes whose interaction P-value was significantly low. To correct for the multiple tests performed, we used an adjusted P-value cutoff. Specifically, for a significance level of α = 0.05, we included any meta-gene interaction with a P-value less than α/N, where N was the total number of meta-genes containing data in at least two organisms. In our case, N = 4725, giving a P-value cutoff of 1.05 × 10^-5. Using this cutoff, we expected approximately 236 false positives (the cutoff multiplied by the number of directed meta-gene pairs tested, which is about N^2).
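
The arithmetic behind the cutoff and the expected number of false positives can be reproduced with a short sketch. The function names are hypothetical, and the count of tests is assumed here to be the N(N-1) directed pairs of distinct meta-genes, which is one reading of the text.

```python
def adjusted_cutoff(alpha, n_meta_genes):
    """P-value cutoff alpha / N, as described in the text."""
    return alpha / n_meta_genes


def expected_false_positives(cutoff, n_meta_genes):
    """Cutoff times the number of directed pairs of distinct meta-genes.

    Assumption: every ordered pair (m, m') with m != m' counts as one test.
    """
    n_directed_pairs = n_meta_genes * (n_meta_genes - 1)
    return cutoff * n_directed_pairs


if __name__ == "__main__":
    cutoff = adjusted_cutoff(0.05, 4725)
    print(f"cutoff = {cutoff:.3e}")
    print(f"expected false positives = {expected_false_positives(cutoff, 4725):.1f}")
```

With α = 0.05 and N = 4725, α/N is about 1.058 × 10^-5, which the text reports as 1.05 × 10^-5, and the expected count comes to about 236.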