S-Table 1. The master data file.  This table contains all relevant raw data, processed data, and annotation for all the genes considered in the analysis.


An internal reference number is given (column A). The microarray spots considered valid in this analysis are numbered from 1 to 17703 (column B).  The gene names assigned to the DNA fragments used on our microarrays have changed between the time the microarrays were designed and now (November 2001).  Thus, a given segment of DNA on the microarray may have had multiple gene name assignments over time.  The gene name used throughout our analysis is shown (column C) along with the current gene name prediction (column D). The gene name assigned to a particular segment of DNA in the genome has changed because the predicted gene start and end points have changed, and it will continue to change as predictions improve and more empirical evidence of gene structure becomes available.  We therefore internally refer to each segment of DNA on the microarray by an immutable Stanford University Identification number (SUID, column E). While gene assignments will change in the future, the SUID will not, and it is therefore the best reference number for future considerations of this data.
The chromosome number (column F), the predicted initiator codon (column G), and the end base pair number (column H) are given.  The description of each gene obtained from Proteome™ is given in column I.  The paralog family number of each gene is in column J (see experimental procedures for details). The operon family assignments (courtesy of Tom Blumenthal, manuscript submitted) are in column K.
The raw data for the L1 muscle experiments are given in columns L-Q (ratios) and columns R-W (converted percentile ranks); the average ranks and associated P values (see experimental procedures) are in columns X and Y, respectively.  If the gene is considered an L1 muscle gene (p < 0.001 and mean percentile rank > global mean), then “YES” is placed in column Z.  If an L1 muscle gene lies within 10 kb of another muscle gene and is therefore considered clustered, then “YES” is placed in column AA.
Similar data is given for the mock mRNA-tagging N2 microarray experiments in columns AB-AL.  The 650 sperm genes are indicated in column AM, while the clustered sperm genes are indicated in column AN. Similar data is given for the gene lists presented in the paper in columns AO-AV.  If no data is available for a given entry, NaN is assigned.
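
As an illustration of how the flags in columns Z and AA relate to columns F-H, X, and Y, the following is a minimal Python sketch; it is not the procedure used to build the table. The field names (suid, chrom, start, end, mean_rank, p), the example records, and the global mean rank of 50 are hypothetical. The legend does not state how the 10 kb distance is measured, so the sketch assumes it is the gap between two gene spans on the same chromosome, with overlapping genes counted as 0 bp apart.

```python
import math

P_CUTOFF = 0.001      # threshold stated in the legend (p < 0.001)
MAX_GAP_BP = 10_000   # "within 10 kb"


def is_enriched(mean_rank, p_value, global_mean_rank):
    """Legend rule: p < 0.001 and mean percentile rank > global mean."""
    if math.isnan(mean_rank) or math.isnan(p_value):
        return False  # NaN marks missing data, so no call is made
    return p_value < P_CUTOFF and mean_rank > global_mean_rank


def gap_bp(a, b):
    """Gap between two gene spans; 0 if they overlap (assumed definition)."""
    a_lo, a_hi = sorted((a["start"], a["end"]))  # start may exceed end
    b_lo, b_hi = sorted((b["start"], b["end"]))
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi))


def annotate(genes, global_mean_rank):
    """Return {suid: (enriched, clustered)} for a list of gene records."""
    enriched = [g for g in genes
                if is_enriched(g["mean_rank"], g["p"], global_mean_rank)]
    result = {g["suid"]: (False, False) for g in genes}
    for g in enriched:
        clustered = any(
            other is not g
            and other["chrom"] == g["chrom"]
            and gap_bp(g, other) <= MAX_GAP_BP
            for other in enriched
        )
        result[g["suid"]] = (True, clustered)
    return result


# Hypothetical records; none of these values come from the table.
nan = float("nan")
records = [
    {"suid": "A", "chrom": "I", "start": 1000, "end": 2000, "mean_rank": 90.0, "p": 0.0001},
    {"suid": "B", "chrom": "I", "start": 9000, "end": 8000, "mean_rank": 90.0, "p": 0.0001},
    {"suid": "C", "chrom": "I", "start": 50000, "end": 51000, "mean_rank": 95.0, "p": 0.0001},
    {"suid": "D", "chrom": "II", "start": 1500, "end": 2500, "mean_rank": 90.0, "p": 0.0001},
    {"suid": "E", "chrom": "I", "start": 3000, "end": 4000, "mean_rank": nan, "p": nan},
    {"suid": "F", "chrom": "I", "start": 2500, "end": 2900, "mean_rank": 40.0, "p": 0.0001},
]
flags = annotate(records, global_mean_rank=50.0)
```

In this example, genes A and B are both enriched and lie 6 kb apart on chromosome I, so both are flagged as clustered; C and D are enriched but have no enriched neighbor within 10 kb on the same chromosome; E has no data (NaN) and F falls below the global mean rank, so neither is flagged.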