EConsense computes consensus trees by the majority-rule consensus tree method, which also allows one to easily find the strict consensus tree. This program can be used as the final step in doing bootstrap analyses for many of the methods in the package.
EConsense is a modified version of CONSENSE, written by Joseph Felsenstein for PHYLIP version 3.572c, with command-line control added.
EConsense reads a file of computer-readable trees (in standard nested-parenthesis notation) and computes the consensus tree by the majority-rule consensus tree method (Margush and McMorris, 1981). The input file is produced by many of the tree estimation programs (EProtPars, EDnaPars, EDnaML, EDnaMLK, ENeighbor, EFitch and EKitsch) when the user selects the "multiple data sets" option. In essence, the consensus tree consists of the monophyletic groups that occur most often in the input trees. Any group that occurs in more than 50% of the input trees will appear in the consensus tree. This program can be used as the final step in doing bootstrap analyses (or jackknife, or permutation test).
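To make the majority-rule idea concrete, here is a minimal Python sketch (not EConsense's own code). It assumes each input tree has already been reduced to the set of groups (clades) it contains, treats the trees as rooted, and counts each group by the summed weight of the trees that contain it. Two groups that each occur in more than half of the (weighted) trees must occur together in at least one tree, so they never conflict. That is why every such group can be placed in the consensus tree. The function name majority_rule_groups is hypothetical.

```python
from collections import Counter


def majority_rule_groups(trees):
    """Return the groups whose summed tree weight exceeds half the total weight.

    trees: list of (groups, weight) pairs, where groups is a set of
    frozensets of species names, one frozenset per group in that tree.
    """
    total = sum(weight for _, weight in trees)
    support = Counter()
    for groups, weight in trees:
        for group in groups:
            support[group] += weight  # a group counts once per tree, by weight
    return {group: s for group, s in support.items() if s > total / 2}


AB, CD, ABC, AC = (frozenset(x) for x in ("AB", "CD", "ABC", "AC"))
trees = [
    ({AB, CD}, 1.0),   # ((A,B),(C,D))
    ({AB, ABC}, 1.0),  # (((A,B),C),D)
    ({AC, ABC}, 1.0),  # (((A,C),B),D)
]
print(majority_rule_groups(trees))
```

Here {A,B} and {A,B,C} each occur in two of the three trees and are kept; {C,D} and {A,C} occur only once and are dropped.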
This program was originally written by Joe Felsenstein (E-mail: firstname.lastname@example.org; Post: Department of Genetics, University of Washington, Box 357360, Seattle, Washington 98195-7360, U.S.A.).
This version was modified for inclusion in EGCG by Maria Jesus Martin (E-mail: email@example.com; Post: EMBL Outstation Hinxton, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SQ or E-mail: firstname.lastname@example.org; Post: Tecnologia para Diagnostico e Investigacion, Condes de Torreanaz 5, 28028 Madrid).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (email@example.com).
Here is a session with EConsense:

% econsense
ECONSENSE of what tree file ? fos.trees
What should I call the output file (* fos.econsense *) ?
Treat trees as rooted (* No *) ?
OutGroup root (* No *) ?
Print out the sets of species (* Yes *) ?
Print out tree (* Yes *) ?
Write out trees onto tree file (* Yes *) ?
Output written into fos.econsense
Tree also written into fos.treefile
%
The input file contains a series of trees in standard nested-parenthesis notation, as produced by many of the tree estimation programs. Each tree starts on a new line and may carry a weight: a real number enclosed in square brackets ("[" and "]") immediately before the final ";" that ends the description of the tree.
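A minimal sketch of reading such a file: split it into individual trees and extract the optional bracketed weight. The default weight of 1.0 for trees without a bracket is an assumption, though it agrees with the example, where two trees of weight 0.5 and four unweighted trees add up to 5.00. The function name read_trees is hypothetical, and the sketch assumes ";", "[" and "]" never occur inside species names.

```python
import re


def read_trees(data):
    """Split a tree file into (tree_text, weight) pairs.

    Trees without a bracketed weight get weight 1.0 (an assumption).
    """
    trees = []
    for chunk in data.split(";"):  # each tree description ends with ";"
        chunk = chunk.strip()
        if not chunk:
            continue
        weight = 1.0
        m = re.search(r"\[([^\]]*)\]$", chunk)  # "[w]" just before the ";"
        if m:
            weight = float(m.group(1))
            chunk = chunk[: m.start()].rstrip()
        trees.append((chunk + ";", weight))
    return trees


print(read_trees("((A,B),C)[0.5000];\n((A,C),B);\n"))
```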
Here is the input file for the example session.
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),((FOSHUMAN,FOSRAT),(FOSMSVFB,FOSMOUSE))),FOSCHICK),FOSAVINK)[0.5000];
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK)[0.5000];
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK);
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK);
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK);
(((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),((FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE))),FOSCHICK)),FOSAVINK);
A complete output file consists of a list of the species, a list of the subsets that appear in the consensus tree, a list of those subsets that appeared in one or more of the individual trees but did not occur frequently enough to get into the consensus tree, and finally a diagram showing the consensus tree. Each subset is shown as a row of symbols, each either "." or "*"; the species that are in the set are marked by "*". After every ten species there is a blank, to help you keep track of the alignment of columns. The order of symbols corresponds to the order of species in the species list.
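The row notation can be sketched as follows. set_row is a hypothetical helper that prints "*" for members and "." for non-members in species-list order, inserting a blank after every ten symbols.

```python
def set_row(members, species):
    """Render a subset as '.'/'*' symbols in species order, with a blank after every ten."""
    symbols = ["*" if sp in members else "." for sp in species]
    blocks = ["".join(symbols[i:i + 10]) for i in range(0, len(symbols), 10)]
    return " ".join(blocks)


species = ["FOSB_MOUSE", "FOSB_STAEP", "FOSX_MSVFR", "FOS_HUMAN", "FOS_RAT",
           "FOS_MSVFB", "FOS_MOUSE", "FOS_CHICK", "FOS_AVINK"]
print(set_row({"FOS_HUMAN", "FOS_RAT", "FOS_MSVFB", "FOS_MOUSE"}, species))
```

With only nine species, as in the example, no blank appears; with twelve, the eleventh and twelfth symbols follow a blank.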
Here is the output file from the example session.
EConsense of pileup.trees.  September 16, 1996 15:21

Species in order:

  FOSB_MOUSE
  FOSB_STAEP
  FOSX_MSVFR
  FOS_HUMAN
  FOS_RAT
  FOS_MSVFB
  FOS_MOUSE
  FOS_CHICK
  FOS_AVINK

Sets included in the consensus tree

Set (species in order)     How many times out of    5.00

...****..                  5.00
.**......                  5.00
...******                  5.00
.....**..                  5.00
....***..                  4.50
.......**                  4.00

Sets NOT included in consensus tree:

Set (species in order)     How many times out of    5.00

...*****.                  1.00
...**....                  0.50

CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of    5.00 trees

                           +----FOS_MOUSE
                      +--5.0
                 +--4.5    +----FOS_MSVFB
                 !    !
            +--5.0    +---------FOS_RAT
            !    !
       +--5.0    +--------------FOS_HUMAN
       !    !
       !    !              +----FOS_CHICK
  +--5.0    +------------4.0
  !    !                   +----FOS_AVINK
  !    !
  !    !                   +----FOSX_MSVFR
  !    +-----------------5.0
  !                        +----FOSB_STAEP
  !
  +-----------------------------FOSB_MOUSE

  remember: this is an unrooted tree!
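As a cross-check of where the set counts come from, the following sketch re-derives both set tables from the example input file. It is an illustration, not EConsense's code, and rests on these assumptions: trees carry no branch lengths, a missing weight means 1.0, trivial splits (one species or all but one) are not listed, and each unrooted split is stored as the side that omits the first species, which matches every set in the output above. The input file spells the species without underscores, so the sketch uses the names as they appear there. All function names are hypothetical.

```python
import re

# The six trees of the example input file.
EXAMPLE = """\
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),((FOSHUMAN,FOSRAT),(FOSMSVFB,FOSMOUSE))),FOSCHICK),FOSAVINK)[0.5000];
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK)[0.5000];
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK);
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK);
((((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),(FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE)))),FOSCHICK),FOSAVINK);
(((FOSBMOUSE,(FOSBSTAEP,FOSXMSVFR)),((FOSHUMAN,(FOSRAT,(FOSMSVFB,FOSMOUSE))),FOSCHICK)),FOSAVINK);
"""


def parse_tree(text):
    """Parse '((A,B),C)[w]' (no ';', no branch lengths) into nested tuples and a weight."""
    weight = 1.0
    m = re.search(r"\[([^\]]*)\]\s*$", text)
    if m:
        weight = float(m.group(1))
        text = text[:m.start()]
    text = text.strip()
    pos = 0

    def node():
        nonlocal pos
        if text[pos] != "(":  # a species name runs up to the next , ( or )
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return text[start:pos].strip()
        pos += 1  # skip "("
        children = [node()]
        while text[pos] == ",":
            pos += 1
            children.append(node())
        if text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
        return tuple(children)

    return node(), weight


def leaf_order(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaf_order(child)]


def clades(tree, found):
    """Append the leaf set below every internal node to found; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, found) for child in tree))
    found.append(leaves)
    return leaves


def splits(tree, first):
    """Non-trivial splits, each stored as the side that omits the species `first`."""
    found = []
    everything = clades(tree, found)
    result = set()  # a set, so complementary clades at the root count once
    for group in found:
        side = everything - group if first in group else group
        if 2 <= len(side) <= len(everything) - 2:
            result.add(side)
    return result


def count_sets(data):
    trees = [parse_tree(t) for t in data.split(";") if t.strip()]
    species = leaf_order(trees[0][0])  # species order = order in the first tree
    counts = {}
    for tree, weight in trees:
        for side in splits(tree, species[0]):
            counts[side] = counts.get(side, 0.0) + weight
    return species, counts


def row(group, species):
    return "".join("*" if sp in group else "." for sp in species)


species, counts = count_sets(EXAMPLE)
for group, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(row(group, species), f"{n:.2f}")
```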
PileUp creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. LineUp creates and edits multiple sequence alignments. Pretty displays multiple sequence alignments. Distances creates a table of the pairwise distances within a group of aligned sequences. GrowTree creates a phylogenetic tree from a distance matrix created by Distances using either the UPGMA or neighbor-joining method. You can create a text or graphics output file.
Phylip2Tree displays trees computed with one of the PHYLIP programs or with EProtPars, EDnaPars, EDnaML, EDnaMLK, ENeighbor, EFitch and EKitsch, in GCG style. ESeqBoot produces multiple data sets from a molecular sequence data set by bootstrap, jackknife, or permutation resampling. EDnaDist computes a distance matrix from nucleic acid sequences, under four different models of nucleotide substitution (Jukes and Cantor (1969), Kimura (1980), Jin and Nei (1990) and a model of maximum likelihood (Felsenstein, 1981)). EProtDist computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 approximation to it, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. ENeighbor estimates phylogenies from distance matrix data using the Neighbor-Joining method or the UPGMA method of clustering. EFitch estimates phylogenies from distance matrix data under the "additive tree model", according to which the distances are expected to equal the sums of branch lengths between the species. EKitsch estimates phylogenies from distance matrix data under the "ultrametric" model, which is the same as the additive tree model except that an evolutionary clock is assumed. EDnaPars estimates phylogenies from nucleic acid sequences using the parsimony method. EProtPars estimates phylogenies from amino acid sequences using the parsimony method. EDnaML estimates phylogenies from nucleotide sequences by maximum likelihood. EDnaMLK does the same as EDnaML but assumes a molecular clock.
EConsense carries out a family of consensus tree methods called the Ml (M-sub-L) methods (Margush and McMorris, 1981). These include strict consensus and majority-rule consensus.
The tree printed out has at each fork a number indicating how many times the group consisting of the species to the right of (descended from) that fork occurred. Thus if we read in 15 trees and find that a fork has the number 15, that group occurred in all of the trees. The strict consensus tree consists of all groups that occurred 100% of the time, the rest of the resolution being ignored. The tree printed out here includes all groups that occur in more than 50% of the trees and, below that level, keeps adding groups until the tree is fully resolved.
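Choosing a level can be sketched as a filter over the fork counts. In this hypothetical helper the level is an integer percentage above 50 (100 gives the strict consensus), and the required count is computed with integer ceiling arithmetic so that, for example, 100% of 15 trees requires exactly 15 occurrences. Reading a level as "at least this many occurrences" is an assumption about how to round non-integer counts; the fork counts below are made up.

```python
def groups_at_level(support, n_trees, percent):
    """Return the groups whose count reaches percent% of n_trees.

    support: mapping from group to the number of trees containing it.
    percent: integer level above 50; 100 yields the strict consensus.
    """
    if not 50 < percent <= 100:
        raise ValueError("use a level above 50% and at most 100%")
    needed = -(-percent * n_trees // 100)  # ceil(percent * n_trees / 100), no float rounding
    return {group for group, count in support.items() if count >= needed}


# Hypothetical fork counts from 15 input trees.
support = {"AB": 15, "ABC": 12, "DE": 8}
print(groups_at_level(support, 15, 100))  # strict consensus
print(groups_at_level(support, 15, 75))   # needs at least 12 of 15
```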
The majority-rule consensus tree consists of all groups that occur more than 50% of the time. Any other percentage level between 50% and 100% can also be used, which is why the program in effect carries out a family of methods. You have to decide on the percentage level, work out for yourself what number of occurrences that corresponds to (e.g. 15 in the above case for 100%), and resolutely ignore any group below that number. Do not use levels at or below 50%: the collection of all groups that occur (say) 35% or more of the time may include two groups that contradict each other and cannot appear in the same tree, so some groups occurring 35% of the time will not be shown on the tree. EConsense does include groups that occur less than 50% of the time, working downwards in their frequency of occurrence, as long as they continue to resolve the tree and do not contradict more frequent groups. In this respect the method is similar to the Nelson consensus method (Nelson, 1979) as explicated by Page (1989), although it is not identical to it.
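The extension below 50% can be sketched as a greedy pass over the groups in decreasing order of support, keeping a group only if it is compatible with every group already kept. This is an illustration, not EConsense's actual code. Tie-breaking between equally frequent groups is an assumption (input order is kept). The groups are assumed to omit the same reference species, as every set in the example output omits the first species, so that compatibility reduces to "disjoint or nested". Applied to the set counts of the example output, it keeps exactly the six sets listed as included and rejects the two listed as not included. All names are hypothetical.

```python
def from_row(row):
    """Turn a '.'/'*' row from the output file into a set of species indices."""
    symbols = row.replace(" ", "")  # drop the blank inserted after every ten species
    return frozenset(i for i, c in enumerate(symbols) if c == "*")


def compatible(a, b):
    # Assumes both groups omit the same reference species, so two groups can
    # sit in one tree exactly when they are disjoint or one contains the other.
    return not (a & b) or a <= b or b <= a


def extend_consensus(support):
    """Accept groups in decreasing support while they fit the groups already accepted."""
    accepted = []
    # sorted() is stable, so equally supported groups keep their input order.
    for group, _count in sorted(support.items(), key=lambda kv: kv[1], reverse=True):
        if all(compatible(group, kept) for kept in accepted):
            accepted.append(group)
    return accepted


# Set counts copied from the example output file.
table = {"...****..": 5.0, ".**......": 5.0, "...******": 5.0, ".....**..": 5.0,
         "....***..": 4.5, ".......**": 4.0, "...*****.": 1.0, "...**....": 0.5}
support = {from_row(r): n for r, n in table.items()}
kept = extend_consensus(support)
rows = ["".join("*" if i in g else "." for i in range(9)) for g in kept]
print(rows)
```

Groups above 50% always pass the compatibility test, so the greedy pass only changes anything once it reaches the less frequent groups.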
EConsense doesn't carry out any other consensus tree method, such as Adams consensus (Adams, 1972, 1986) or methods based on quadruples of species (Estabrook, McMorris, and Meacham, 1985).
EConsense can be used as the final step in doing bootstrap analyses (or jackknife, or permutation test). See the ESeqBoot documentation.
If the user answers 'yes' to the 'Write out trees onto tree file?' question or uses the -TREEFile command-line option, the final tree will be written out onto a file in computer-readable format. The number of times that each group appeared in the input trees will be written after each group. This number is the sum of the weights of the trees in which it appeared, so that if there are eleven trees, ten of them having weight 0.1 and one having weight 1.0, a group that appeared in the last tree and in 6 others would be shown as appearing 1.6 times.
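The weighted count in this example can be checked with a short sketch. group_support and the group label "G" are hypothetical, and math.fsum is used only to keep the floating-point sum of the 0.1 weights accurate.

```python
import math


def group_support(trees, group):
    """Sum the weights of the input trees in which the group appears.

    trees: list of (groups, weight) pairs; groups is any container of group labels.
    """
    return math.fsum(weight for groups, weight in trees if group in groups)


# Ten trees of weight 0.1 (group "G" appears in six of them)
# plus one tree of weight 1.0 that also contains "G".
trees = [({"G"} if i < 6 else set(), 0.1) for i in range(10)] + [({"G"}, 1.0)]
print(f"{group_support(trees, 'G'):.2f}")  # 1.60
```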
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum Syntax: % econsense [-INfile=]file.trees -default

Prompted Parameters:

[-OUTfile=]file.econsense  output file
-ROOTed                    treats the trees as rooted
-OUTGroup=1                species used to root the tree (not available with the ROOTed parameter)
-SHOWSets                  prints the sets of species in the output file
-SHOWTree                  prints the consensus tree in the output file
-TREEFile                  prints trees in nested-parenthesis notation in the ".econsense_trees" file
Adams, E. N. 1972. Consensus techniques and the comparison of taxonomic trees. Systematic Zoology 21: 390-397.
Adams, E. N. 1986. N-trees as nestings: complexity, similarity, and consensus. Journal of Classification 3: 299-317.
Margush, T. and F. R. McMorris. 1981. Consensus n-trees. Bulletin of Mathematical Biology 43: 239-244.
Nelson, G. 1979. Cladistic analysis and synthesis: principles and definitions, with a historical note on Adanson's Familles des Plantes (1763-1764). Systematic Zoology 28: 1-21.
Page, R. D. M. 1989. Comments on component-compatibility in historical biogeography. Cladistics 5: 167-182.
For further information please refer to the "distance.doc" and "consense.doc" files from the PHYLIP (Phylogeny Inference Package) distribution Version 3.57c by Joseph Felsenstein (available by anonymous FTP at evolution.genetics.washington.edu in directory pub/phylip).
Printed: November 15, 1996 11:45 (1162)