S-Figure 4. The order of operations used in the chromosomal cluster analysis. The challenge of chromosomal cluster analysis is to randomly sample the same number of genes as in the query list (experimentally derived or otherwise) from the same pool of genes in the genome to which the query list was restricted. Only in this way can statistical comparisons be made between the number of clustered genes in the query list and the average number of clustered genes from the 10,000 random samples. One hypothetical example of how this is done is shown here. A. Assessing the effective number of genes in the query list and scoring the number of genes clustered. The positions of all genes in the genome considered in the analysis are known (i), so the positions of the genes in the query list (ii) can be plotted. The genes within operons (blue arrows) in the query list are then unified (iii) so that only the starting position of the most 5′ gene in the operon is considered in the cluster analysis. Next, the relationships between two directly neighboring paralogs (red arrows) are ignored, so that such a pair is never considered clustered on its own (iv). The number of effective genes remaining is counted (v). In this example, there are 10 effective genes that may be considered in the cluster analysis. Six of these genes are within 10 kb of another gene in the query list. B. Assessing the number of clustered genes from randomly sampled lists. First, only those genes in the genome to which the query list was restricted are used in the pool of genes from which the random samples are taken (i). For example, although the C. elegans genome is made up of 19,733 genes (Stein et al., 2001), our microarrays contain only 17,223 of these genes. The genes identified through our microarray experiments are therefore limited to these 17,223 genes, which is also the same pool of genes from which random samples are drawn for the chromosomal cluster analysis.
Next, the genome is preprocessed for operons and paralogs (ii), where genes within the same operon are unified and paralogous neighbors are ignored for the cluster analysis (purple diagonal lines). From this preprocessed genome, the same number of effective genes as in the query list is chosen (iii). The number of genes that are within 10 kb of another randomly chosen gene is then counted (iv). Steps B(iii) and B(iv) are repeated 9,999 more times, so that the average number of clustered genes from the random samples can be directly compared to the number of clustered genes from the query list and significance can be assessed.
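The counting and resampling in steps A(v) and B(iii)-(iv) can be illustrated with a minimal Python sketch. It is not the implementation used for the analysis, and it rests on several assumptions that the caption does not state: each effective gene is represented by a hypothetical (chromosome, start position) pair, distance is measured between start positions, "within 10 kb" includes exactly 10,000 bp, and the pool passed in has already been unified for operons. The special handling of neighboring paralogs is not modeled. The function names and the empirical p-value are also illustrative, since the caption says only that significance can be assessed.

```python
import random

WINDOW_BP = 10_000  # the 10 kb clustering distance given in the caption


def count_clustered(genes, window=WINDOW_BP):
    """Count genes lying within `window` bp of another gene in the list.

    `genes` is a list of (chromosome, start) tuples (assumed representation).
    Only genes on the same chromosome are compared.
    """
    by_chrom = {}
    for chrom, start in genes:
        by_chrom.setdefault(chrom, []).append(start)

    clustered = 0
    for starts in by_chrom.values():
        starts.sort()
        for i, pos in enumerate(starts):
            # After sorting, the nearest gene is always an adjacent entry.
            near_prev = i > 0 and pos - starts[i - 1] <= window
            near_next = i + 1 < len(starts) and starts[i + 1] - pos <= window
            if near_prev or near_next:
                clustered += 1
    return clustered


def random_cluster_counts(pool, n_genes, n_samples=10_000, seed=0):
    """Repeat steps B(iii) and B(iv): draw `n_genes` from the preprocessed
    pool without replacement and count the clustered genes in each sample."""
    rng = random.Random(seed)
    return [count_clustered(rng.sample(pool, n_genes)) for _ in range(n_samples)]


def empirical_p_value(observed, null_counts):
    """One-sided empirical p-value (an assumed choice of test): the fraction
    of random samples with at least as many clustered genes as observed,
    with the usual +1 correction."""
    hits = sum(1 for count in null_counts if count >= observed)
    return (hits + 1) / (len(null_counts) + 1)
```

Under these assumptions, the observed count from `count_clustered(query_genes)` would be compared with the mean of `random_cluster_counts(pool, len(query_genes))`, where `query_genes` and `pool` are hypothetical lists of effective genes built as described in panels A and B.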