**VxInsight Validation: correlation with biological functions**

Figure (left): biogroups vs. mountains
Figure (right): biogroups vs. random mountains

We created 56 lists of genes with similar
biological function (biogroups), such as genes involved in meiosis,
mitosis, translation, or DNA synthesis. For each biogroup and each gene
expression mountain, we counted the genes that the two share. For each
biogroup-mountain pair we then calculated the probability of seeing the
observed number of shared genes or more by chance (p-value),
assuming a hypergeometric distribution. The
figure on the left shows the overlap p-values for each biogroup with
each mountain. The figure on the right shows the overlap p-values for
each biogroup compared to a randomly constructed mountain of the same
size as the original mountain. Random mountains were constructed by
drawing from the entire set of genes without replacement to fill
lists whose sizes equal those of the original mountains. The scale
shows log_{10}(p-value). Here is a list of both the biogroups and the
mountains. They are listed in the same order as they appear in the
figures above.
The biogroups are ordered so that neighbors have similar mountain
profiles (based on their Pearson correlation computed from their
-log_{10} p-values). The mountains are also ordered so that
neighbors have similar biogroup profiles (based on their Pearson
correlation computed from their -log_{10} p-values).
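
The overlap test and the random control described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function names `overlap_pvalue` and `random_mountains` and the toy gene identifiers are hypothetical. It also assumes that "without replacement" applies across all random mountains at once, so that the random mountains do not share genes.

```python
import math
import random


def overlap_pvalue(total_genes, biogroup_size, mountain_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of biogroup genes found in a mountain of
    `mountain_size` genes drawn without replacement from `total_genes`
    genes, of which `biogroup_size` belong to the biogroup.
    """
    largest_possible = min(biogroup_size, mountain_size)
    all_draws = math.comb(total_genes, mountain_size)
    tail = sum(
        math.comb(biogroup_size, k)
        * math.comb(total_genes - biogroup_size, mountain_size - k)
        for k in range(overlap, largest_possible + 1)
    )
    return tail / all_draws


def random_mountains(all_genes, mountain_sizes, seed=None):
    """Build random gene lists with the same sizes as the real mountains.

    Assumption: genes are drawn without replacement across all lists,
    so the random mountains are disjoint.
    """
    rng = random.Random(seed)
    pool = rng.sample(list(all_genes), sum(mountain_sizes))
    mountains, start = [], 0
    for size in mountain_sizes:
        mountains.append(set(pool[start:start + size]))
        start += size
    return mountains


# Toy data for illustration only; these are not genes from the study.
genes = [f"gene{i}" for i in range(100)]
biogroup = set(genes[:10])
real_mountain = set(genes[5:25])
shared = len(biogroup & real_mountain)
p_real = overlap_pvalue(len(genes), len(biogroup), len(real_mountain), shared)

control = random_mountains(genes, [len(real_mountain)], seed=0)[0]
p_random = overlap_pvalue(
    len(genes), len(biogroup), len(control), len(biogroup & control)
)
```

Taking log_{10} of `p_real` and `p_random` for every biogroup-mountain pair would give the values plotted in the left and right figures, respectively.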

The results show that the overlap is significantly higher in the real solution (left) than in a randomly generated solution (right). The figures demonstrate that clustering on expression data correlates strongly with grouping by biological function. These results confirm that there are strong biological patterns in the expression data, and that the clustering algorithm used by VxInsight can successfully identify at least some of these patterns.