VxInsight Validation: correlation with biological functions 

biogroups vs. mountains

biogroups vs. random mountains


We created 56 lists of genes with similar biological function (biogroups), such as genes involved in meiosis, mitosis, translation, or DNA synthesis. For each biogroup-mountain pair, we counted the genes shared by the biogroup and the gene expression mountain, then calculated the probability of seeing the observed number of overlaps or more by chance (the p-value), assuming a hypergeometric distribution.

The figure on the left shows the overlap p-values for each biogroup with each mountain. The figure on the right shows the overlap p-values for each biogroup compared to a randomly constructed mountain of the same size as the original mountain. Random mountains were constructed by drawing from the entire set of genes without replacement to fill lists equal in size to the original mountains. The scale shows the log10(p-value).

Here is a list of both the biogroups and the mountains, given in the same order in which they appear in the figures above. The biogroups are ordered so that neighbors have similar mountain profiles, based on the Pearson correlation of their -log10 p-values. The mountains are likewise ordered so that neighbors have similar biogroup profiles, using the same measure.
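The overlap p-value and the random-mountain construction described above can be sketched in a few lines of Python. This is an illustration only, not the code used for the figures: the function names, the parameter names, and the choice to draw all random mountains from one shared pool (so that they do not share genes) are assumptions made for the example.

```python
import math
import random


def overlap_pvalue(total_genes, biogroup_size, mountain_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of biogroup genes found in a mountain of
    `mountain_size` genes drawn without replacement from `total_genes`
    genes, of which `biogroup_size` belong to the biogroup.
    """
    max_overlap = min(biogroup_size, mountain_size)
    # Number of ways to pick the mountain from all genes.
    denominator = math.comb(total_genes, mountain_size)
    # Sum the ways to pick exactly k biogroup genes, for k >= overlap.
    tail = sum(
        math.comb(biogroup_size, k)
        * math.comb(total_genes - biogroup_size, mountain_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


def random_mountains(all_genes, mountain_sizes, rng):
    """Build random mountains with the same sizes as the real ones.

    Assumption: genes are drawn without replacement from one shared
    pool, so the random mountains are mutually disjoint.
    """
    pool = rng.sample(list(all_genes), sum(mountain_sizes))
    mountains = []
    start = 0
    for size in mountain_sizes:
        mountains.append(set(pool[start:start + size]))
        start += size
    return mountains
```

Given a biogroup and a mountain as Python sets, the observed overlap is `len(biogroup & mountain)`; passing that count to `overlap_pvalue` and taking `math.log10` of the result gives a value on the scale used in the figures. Running the same calculation against the output of `random_mountains` gives the corresponding entry for the random comparison.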

The results show a significantly higher overlap in the real solution (left) than in a randomly generated solution (right). The comparison demonstrates that clustering based on expression data correlates strongly with grouping based on biological function. These results confirm that the expression data contain strong biological patterns, and that the clustering algorithm used by VxInsight can successfully identify at least some of them.