The promoter::reporter fusions were built in pJIM20, which contains a his-24::mCherry reporter and an unc-119 selection marker (Murray et al., 2008). Expression of the reporter is driven by the promoter of the gene of interest. The upstream regions were inserted into pJIM20 either by Gateway recombination using promoter constructs from the promoterome (Dupuy et al., 2004) or as DNA fragments generated by PCR from genomic DNA. Promoters from the promoterome include the region upstream of the ATG start codon extending either to the next gene or to 2 kb, whichever comes first. Those generated by PCR from genomic DNA include the 2-5 kb region of DNA upstream of the start codon.
The DNA constructs were introduced into unc-119(e3) worms by microparticle bombardment (Praitis et al., 2001), and transgenic worms containing integrated copies were obtained by screening for stable lines that did not segregate Unc progeny. The integrated transgenic strains were crossed with PD4251, which is a strain containing a myo-3::GFP reporter that is expressed in body muscle nuclei (Fire et al., 1998). Animals that were homozygous for the mCherry reporter and the myo-3::GFP reporter insertions were selected, and used for image analysis. Detailed information on promoters and strains is in Supplemental Table 1.
To obtain worms early in the L1 stage, eggs were isolated and those that hatched within a 3 hour time window were used for image analysis. Although many genes show stable expression during this 3 hour window, some genes (such as sod-3) show dynamic expression at this time, in which case variability in expression could be caused by differences in developmental stage. L1 larvae were frozen in liquid nitrogen, thawed in -20°C acetone and then fixed in 200 ml fresh 5% formaldehyde in Ruvkun's Witches Brew (80 mM KCl, 20 mM NaCl, 10 mM Na2EGTA, 5 mM Spermidine HCl, 15 mM PIPES pH 7.4) for 1 hr at room temperature (Ruvkun and Giusto, 1989). The fixed specimens were washed with 200 ml TTB (100 mM Tris-HCl pH 7.4, 1% Triton X-100, 1 mM EDTA) and then stained in 200 ml 0.5 μg/ml DAPI for 1 hr. The worms were washed with TTB and then mounted in 60% glycerol.
Images were obtained using a Leica SP2 AOBS confocal microscope. Each focal plane was scanned sequentially with a 63X lens with 1 Airy pinhole size. The pixel size was 0.116 μm in the x-y plane and 0.122 μm in the z direction, and a scan speed of 200 Hz was used.
Automatic cell lineage analysis
The 3D image stacks of worms were straightened computationally along the anterior-posterior axis (Peng et al., 2008). The cell nuclei in each image stack were segmented automatically (Long et al., 2008), and then manually edited using the WANO interactive interface (Long et al., 2008; Peng et al., 2009). The identities of 357 nuclei were assigned automatically, with 86% accuracy, using the 82 GFP-labeled nuclei as landmarks. Six additional nuclei were manually identified.
Manual editing of nuclei:
The named nuclei were manually corrected using WANO, and an additional six nuclei were manually annotated according to Sulston et al., 1977 and www.wormatlas.org. However, three neighboring nuclei in hyp 7 (ABpraapppp, ABarppaapa and ABarpaappp) have variable locations relative to each other, and could not be reliably identified. Furthermore, some pairs of nuclei in the midline have ambiguous cell lineage identities. For example, for hyp3 nuclei, one cell nucleus (AB.plaapaaaa) migrates into the midline from the left and another cell nucleus (AB.praapaaaa) migrates into the midline from the right. These two nuclei form an A-P pair, but the nucleus originating from the right side can be either the anterior or posterior nucleus in the midline. For convenience, we represent the anterior nucleus of a pair by (lr), while the posterior one by (rl). We did this similarly for other pairs of nuclei with ambiguous lineage identities. For the pharyngeal muscles, the anterior nucleus is denoted (ap) and the posterior one is denoted (pa).
hyp3: anterior = ABp(lr)aapaaaa; posterior = ABp(rl)aapaaaa
hyp4: anterior = ABp(lr)aappaa; posterior = ABp(rl)aappaa
hyp6: anterior = ABp(lr)aappap; posterior = ABp(rl)aappap
hyp7: anterior = ABp(lr)aapppa; posterior = ABp(rl)aapppa
hyp7: anterior = ABp(lr)appppa; posterior = ABp(rl)appppa
hyp10: anterior = ABp(lr)ppppppp; posterior = ABp(rl)ppppppp
DB1/DB3: anterior = ABp(lr)paaaapp; posterior = ABp(rl)paaaapp
Pharyngeal muscle 2, dorsal pair: anterior = ABaraap(ap)apa; posterior = ABaraap(pa)apa
Pharyngeal muscle 2, left pair: anterior = ABalpaaa(ap)a(pa); posterior = ABalpaaa(pa)a(ap)
Pharyngeal muscle 2, right pair: anterior = ABarapaa(ap)a(pa); posterior = ABarapaa(pa)a(ap)
Gene expression measurement
For every cell nucleus, the automatic cell lineage annotator measures the total volume of the nucleus, the total mCherry intensity summed over every voxel within the nucleus, and the total DAPI intensity summed over every voxel in the nucleus. The raw mCherry values were adjusted to account for background fluorescence and for loss of intensity due to distance of the focal plane from the objective.
To measure background mCherry fluorescence, 10 pseudo-nuclei of equal size were drawn in the digestive tract of each worm. The background mCherry was measured in each false nucleus, and then the average fluorescence in the mCherry channel of all ten false nuclei was calculated. The density of the background mCherry is the average background fluorescence in the mCherry channel divided by the average size of the pseudo-nuclei. To find out the amount of background fluorescence for each nucleus, the background mCherry density is multiplied by the size of the cell nucleus. To find the adjusted level of mCherry for each nucleus, the background mCherry level was subtracted from the raw mCherry level. A similar approach was used to calculate the adjusted DAPI levels.
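The background correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the annotator's actual code: the function names and the numbers are hypothetical, and volumes are assumed to be counted in voxels.

```python
from statistics import mean

def background_density(pseudo_intensities, pseudo_volumes):
    """Average background signal divided by average pseudo-nucleus volume."""
    return mean(pseudo_intensities) / mean(pseudo_volumes)

def adjusted_intensity(raw_total, nucleus_volume, density):
    """Subtract the background expected for a nucleus of this volume."""
    return raw_total - density * nucleus_volume

# Hypothetical values, for illustration only.
pseudo_intensities = [1000.0] * 10  # total mCherry signal in each pseudo-nucleus
pseudo_volumes = [500.0] * 10       # voxels per pseudo-nucleus (equal sizes)

density = background_density(pseudo_intensities, pseudo_volumes)
adjusted = adjusted_intensity(raw_total=9000.0, nucleus_volume=800.0, density=density)
```

The same two functions would apply unchanged to the DAPI channel, since the text describes a similar correction there.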
To account for effects on mCherry fluorescence caused by different depths in the confocal image stack, DAPI fluorescence was used as a normalization control because all nuclei in the newly-hatched worm have the same DNA content. We calculated a normalized DAPI fluorescence level for each nucleus by dividing the adjusted DAPI fluorescence level of each nucleus by the median DAPI fluorescence level of all nuclei in the worm. We then calculated the normalized mCherry level for each nucleus by dividing its adjusted mCherry level by its normalized DAPI level. The normalized mCherry level is the level of fluorescence in a nucleus after background fluorescence has been subtracted and after correcting for variable distances on the z-axis. If the normalized level is negative, we used 1 instead.
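As an illustration of this normalization, the following Python sketch divides each adjusted mCherry value by the nucleus's DAPI level relative to the worm-wide median. The function name, dictionary layout and numbers are hypothetical, and the sketch assumes every adjusted DAPI value is positive.

```python
from statistics import median

def normalize_mcherry(adj_mcherry, adj_dapi):
    """Depth-correct mCherry per nucleus.

    adj_mcherry, adj_dapi: dicts mapping nucleus name to its
    background-adjusted intensity (assumed layout).
    """
    dapi_median = median(adj_dapi.values())
    normalized = {}
    for name, mcherry in adj_mcherry.items():
        norm_dapi = adj_dapi[name] / dapi_median  # 1.0 for a typical nucleus
        value = mcherry / norm_dapi
        # Negative values are replaced by 1, as stated in the text.
        normalized[name] = 1.0 if value < 0 else value
    return normalized

# Hypothetical values: nucleus "a" lies deep in the stack (dim DAPI).
adj_dapi = {"a": 100.0, "b": 200.0, "c": 400.0}
adj_mcherry = {"a": 50.0, "b": 300.0, "c": -40.0}
result = normalize_mcherry(adj_mcherry, adj_dapi)
```

Replacing negative values with 1 rather than 0 keeps every value usable on a logarithmic scale later on.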
Multiple worms were imaged for each mCherry reporter, and twelve reporters were used to generate multiple transgenic lines. To show the average level of gene expression in each nucleus, we used the median normalized mCherry expression value from all images.
Many mCherry reporters are expressed in a small fraction of nuclei. For 72 images, we identified only those nuclei that show mCherry expression, to expedite the annotation process. Prior to annotation, the mCherry channel was inspected to identify those nuclei with expression above a threshold value. The threshold was three times the standard deviation of the background fluorescence from the ten pseudo-nuclei, or the mCherry fluorescence level in the pseudo-nuclei, whichever was lower. Nuclei that expressed mCherry below this threshold level were not annotated. However, for each reporter, at least one image was fully annotated for all 363 nuclei.
dsRNA of C08B11.3 was induced in E. coli with 100 μl of 0.1 M IPTG. Worms at the L4 larval stage were added to the plates and incubated for 2 days, and L1 progeny were scored. ajm-1::GFP is described in Mohler et al. (1998).
Gene expression terrain map
The transcription profile of the 363 nuclei was analyzed using Genesis (Sturn et al., 2002) to cluster these nuclei in a two-dimensional plane according to the similarity of their gene expression. The log2(gene expression) was used to calculate correlations between every pairwise combination of nuclei. For each nucleus, its similarities to the 20 nuclei with the strongest correlations were used as attractive forces toward these 20 nuclei. A constant force repels each nucleus from groups of other nuclei. The Genesis algorithm positions nuclei relative to each other in an x-y plane under the influence of these attractive and repulsive forces. In this way, nuclei with high correlation are placed near each other.
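The correlation step that feeds the layout can be sketched as follows. This covers only the selection of each nucleus's most strongly correlated neighbors, not the Genesis force-directed placement itself. Pearson correlation is an assumption (the text says only "correlations"), the names and data are hypothetical, and the sketch requires positive expression values with non-constant profiles.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def top_neighbors(profiles, k=20):
    """For each nucleus, list the k nuclei with the highest correlation.

    profiles: dict mapping nucleus name to a list of expression values,
    one per reporter, in the same order for every nucleus.
    """
    logs = {name: [math.log2(v) for v in values] for name, values in profiles.items()}
    neighbors = {}
    for a in logs:
        scored = sorted(
            ((pearson(logs[a], logs[b]), b) for b in logs if b != a),
            reverse=True,
        )
        neighbors[a] = [name for _, name in scored[:k]]
    return neighbors

# Hypothetical profiles for four nuclei and four reporters.
profiles = {
    "n1": [1, 2, 4, 8],
    "n2": [2, 4, 8, 16],
    "n3": [8, 4, 2, 1],
    "n4": [1, 4, 2, 8],
}
nearest = top_neighbors(profiles, k=2)
```

In a layout of this kind, each retained correlation would then act as the strength of the attraction between the two nuclei.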
The cell lineage was used to construct a graph of nodes and directed edges, where nodes represent cells that are connected by directed edges from parent, $p$, to daughter cell, $c$. Since we are working with a partially annotated worm, we chose to disregard all un-annotated cells by removing the corresponding leaf nodes in the lineage. In addition, we excluded from the graph all internal nodes whose subtrees contained only un-annotated leaves. Finally, the tree was transformed into a bifurcating tree. All internal nodes that do not have two included children are also removed, and a direct path is created from the most recent included ancestor to the first included descendant. The latter meets one of two conditions: (1) it has two included daughters, or (2) it is an annotated leaf node. Therefore, the tree consists only of annotated cells and their common ancestral cells. Figure 5 shows the complete cell lineage, where the solid lines represent the modified lineage, and dotted lines show the excluded portions of the original lineage.
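The pruning procedure can be illustrated with a short recursive sketch. The tree representation (a dictionary from parent to list of daughters), the function name and the toy lineage are assumptions made for the example.

```python
def prune_lineage(tree, root, annotated):
    """Keep annotated leaves and the branch points that join them.

    tree: dict mapping each internal node to its list of daughters.
    annotated: set of leaf names with measured expression.
    Returns (new_root, new_tree); new_root is None if nothing is annotated.
    """
    new_tree = {}

    def visit(node):
        daughters = tree.get(node, [])
        if not daughters:  # leaf: keep only if annotated
            return node if node in annotated else None
        kept = [k for k in (visit(d) for d in daughters) if k is not None]
        if not kept:       # subtree holds no annotated cell
            return None
        if len(kept) == 1:  # fewer than two included daughters: bypass this node
            return kept[0]
        new_tree[node] = kept
        return node

    return visit(root), new_tree

# Toy lineage: E and P2 are un-annotated, so EMS and P1 are bypassed.
tree = {
    "P0": ["AB", "P1"],
    "AB": ["ABa", "ABp"],
    "P1": ["EMS", "P2"],
    "EMS": ["MS", "E"],
}
new_root, new_tree = prune_lineage(tree, "P0", {"ABa", "ABp", "MS"})
```

Bypassing a node simply hands its single surviving descendant to the nearest included ancestor, which creates the direct path described in the text.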
Using this fixed graph, we build a set of equations for a linear program that assigns expression values to every remaining node. For every edge consisting of the source parent node, $p$, and the target child node, $c$, we define the constraint

$$x_c - x_p = u_{pc} - d_{pc}, \qquad u_{pc} \ge 0, \quad d_{pc} \ge 0,$$

where $x_p$ and $x_c$ represent the expression value at parent and child respectively, $u_{pc}$ is the increase in expression between parent and child, and $d_{pc}$ the decrease. At any leaf node, we set

$$x_l = o_l,$$

where $o_l$ is the log of the observed expression at the cell represented by leaf node $l$.
Our goal is to solve the resulting system of linear equations, minimizing the amount of change between parent and daughter cells by minimizing the total sum of the values $u_{pc}$ and $d_{pc}$. To allow flexibility in scoring penalties for increases and decreases in gene expression, we create constants $\alpha$ and $\beta$ respectively, which weight the penalties for each type of change. Therefore, we wish to minimize the expression

$$\sum_{(p,c)} \left( \alpha\, u_{pc} + \beta\, d_{pc} \right).$$

Since these constraints can yield multiple optimal solutions, we use the L2-norm to find the unique solution, which is the sample mean of all solutions. Therefore, we use a quadratic program solver and solve the minimization problem

$$\min \; \sum_{(p,c)} \left( \alpha\, u_{pc} + \beta\, d_{pc} \right) + \epsilon \sum_{(p,c)} \left( u_{pc}^2 + d_{pc}^2 \right),$$

where $\epsilon$ is a small constant. This constant is set to be arbitrarily small to ensure that the scoring function still primarily minimizes the linear sum of changes. It is included to add an additional, minimal penalty that selects the sample mean. If the constant is set too high, this term could dominate the score. We also chose to set the constants $\alpha$ and $\beta$ to 1. That is, we set the penalties for increases and decreases in gene commitment to be equal. To determine whether our results would be affected if we used another scoring function, we analyzed the data using sum of squares and obtained the same general results, indicating that the analysis is robust to the type of scoring system used (Supplemental Figure 6). To determine whether expression in the un-annotated cells has a strong effect, we analyzed the data by assigning either a minimum or a maximum expression level to each un-annotated cell. We re-derived the molecular differentiation map and found relatively little effect (Supplemental Figure 6).
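To make the linear objective concrete, the sketch below assigns values to internal nodes of a small tree so that the weighted sum of increases and decreases along edges is minimal. It is not the solver used here: instead of a linear or quadratic program it uses a dynamic program over candidate values, relying on the assumption that for this piecewise-linear cost an optimal assignment exists in which every node takes one of the leaf values. It returns one optimal assignment and does not reproduce the quadratic tie-breaking term. The names `fit_tree`, `alpha` and `beta` are chosen for this example.

```python
import math

def fit_tree(tree, root, leaf_values, alpha=1.0, beta=1.0):
    """Minimize the sum over edges of alpha*increase + beta*decrease.

    tree: dict mapping each internal node to its list of daughters.
    leaf_values: dict mapping each leaf to its log expression.
    Returns (total_cost, values) for one optimal assignment.
    """
    candidates = sorted(set(leaf_values.values()))

    def edge_cost(parent_x, child_x):
        delta = child_x - parent_x
        return alpha * delta if delta >= 0 else -beta * delta

    cost = {}    # cost[node][x]: best cost below node if node takes value x
    choice = {}  # choice[(child, x)]: best child value when the parent takes x

    def up(node):
        if node in leaf_values:  # leaves are fixed to their observed value
            cost[node] = {x: 0.0 if x == leaf_values[node] else math.inf
                          for x in candidates}
            return
        cost[node] = {x: 0.0 for x in candidates}
        for child in tree[node]:
            up(child)
            for x in candidates:
                best = min(candidates,
                           key=lambda y: cost[child][y] + edge_cost(x, y))
                choice[(child, x)] = best
                cost[node][x] += cost[child][best] + edge_cost(x, best)

    def down(node, x, values):
        values[node] = x
        for child in tree.get(node, []):
            down(child, choice[(child, x)], values)

    up(root)
    root_x = min(candidates, key=lambda x: cost[root][x])
    values = {}
    down(root, root_x, values)
    return cost[root][root_x], values

# Toy tree: two leaves at 2.0 under node "a", one leaf "b" at 5.0.
tree = {"r": ["a", "b"], "a": ["l1", "l2"]}
leaves = {"l1": 2.0, "l2": 2.0, "b": 5.0}
total, values = fit_tree(tree, "r", leaves)
```

In this toy tree the root can take either 2.0 or 5.0 at the same total cost, which illustrates why a tie-breaking term is needed to obtain a unique solution.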
Molecular differentiation map:
The results from the commitment algorithm, computed from the expression values of every gene at every cell division, were summed to determine the total amount of asymmetry at each cell division. For every non-leaf node $n$ included in the graph described above that has two daughter cells $c_1$ and $c_2$, the asymmetry, $a_{n,g}$, is defined as

$$a_{n,g} = \left| x_{c_1,g} - x_{c_2,g} \right|$$

for a given gene, $g$, where $x_{c,g}$ is the commitment value of gene $g$ at cell $c$. To determine the amount of activity at every bifurcation in the graph, we adjusted the asymmetry values to reduce the noise originating from the measured gene expression values at the leaf nodes. We established noise to be at 500 units, and determined the amount of activity at every location to be the sum of the asymmetry of each gene, normalized by the average commitment values at the two daughters and a factor of 500. Therefore, the total asymmetry, $A_n$, is then set to

$$A_n = \sum_{g} \frac{a_{n,g}}{\bar{x}_{n,g} + 500}$$

for the cell division at a given node, $n$, where $\bar{x}_{n,g}$ is the average of the commitment (calculated as described above) for the two daughters.
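A minimal sketch of the total asymmetry at one cell division follows. The exact form of the normalization is an assumption of this example: each gene's absolute difference between the two daughters is divided by the sum of the daughters' average value and the 500-unit noise term. The function name and numbers are hypothetical.

```python
NOISE = 500.0  # noise level stated in the text

def total_asymmetry(daughter1, daughter2, noise=NOISE):
    """Sum of per-gene asymmetries between two daughters.

    daughter1, daughter2: dicts mapping gene name to commitment value.
    Assumed normalization: |x1 - x2| / (mean(x1, x2) + noise).
    """
    total = 0.0
    for gene in daughter1:
        x1, x2 = daughter1[gene], daughter2[gene]
        total += abs(x1 - x2) / ((x1 + x2) / 2.0 + noise)
    return total

# Hypothetical values: g1 differs between the daughters, g2 does not.
d1 = {"g1": 1500.0, "g2": 500.0}
d2 = {"g1": 500.0, "g2": 500.0}
score = total_asymmetry(d1, d2)
```

Under this assumed form, the noise term in the denominator keeps small differences between weakly expressed genes from contributing large ratios.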
Molecular Signature Heat Map:
Pairwise asymmetry was computed for molecular signatures of all annotated L1 cells using the molecular differentiation map metric described above. That is, the total asymmetry is computed between every pair of cells in the L1. For every cell pair $(i, j)$, the absolute difference between the observed expression values for those cells is computed

$$d_{ij,g} = \left| e_{i,g} - e_{j,g} \right|,$$

where $e_{i,g}$ is the observed expression value for gene $g$ in cell $i$. As with the molecular differentiation map, we divide this value by the average observed expression value of this pair of cells, $\bar{e}_{ij,g}$, and the baseline noise established at 500 units. By summing over all genes, we get the resulting asymmetry for this pair of L1 cells

$$A_{ij} = \sum_{g} \frac{d_{ij,g}}{\bar{e}_{ij,g} + 500}.$$
The maximal asymmetry among all pairs was identified, and a linear color scale was created.
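The pairwise computation and the linear scale can be sketched as follows. As before, the normalization (absolute difference divided by the pair's average expression plus 500) is an assumed reading of the text, and the mapping of scores onto the interval 0 to 1 is a stand-in for whatever color map is applied afterwards. All names and values are hypothetical.

```python
from itertools import combinations

def pairwise_asymmetry(expr, noise=500.0):
    """Asymmetry score for every unordered pair of cells.

    expr: dict mapping cell name to a dict of gene -> observed expression.
    """
    scores = {}
    for a, b in combinations(sorted(expr), 2):
        scores[(a, b)] = sum(
            abs(expr[a][g] - expr[b][g]) / ((expr[a][g] + expr[b][g]) / 2.0 + noise)
            for g in expr[a]
        )
    return scores

def linear_scale(scores):
    """Map scores linearly onto [0, 1], with the maximal asymmetry at 1."""
    top = max(scores.values())
    return {pair: (s / top if top > 0 else 0.0) for pair, s in scores.items()}

# Hypothetical single-gene example with three cells.
expr = {"c1": {"g": 500.0}, "c2": {"g": 1500.0}, "c3": {"g": 500.0}}
scores = pairwise_asymmetry(expr)
scaled = linear_scale(scores)
```

Each scaled value would then index into the heat map's color gradient.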