EDiverge is a version of Diverge with command line control. Diverge measures the percent divergence of two protein coding sequences using the method of Perler and Efstratiadis.
EDiverge makes a codon-by-codon comparison of two aligned protein coding sequences using the method of Perler and Efstratiadis (Cell 20; 555-566 (1980) (methods: pp. 564-5)). For each nucleotide difference between sequence one and sequence two, EDiverge scores whether it is a type 1, 2, or 3 silent or replacement change (see Perler and Efstratiadis). The number of actual changes in each category is divided by the number of possible silent or replacement changes in that category, giving six percent divergence figures. All of the data (possible, actual, and percent values) is reported so that the values can be assembled into a weighted average.
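The arithmetic behind the six figures can be sketched in a few lines of Python. This is a minimal illustration, not EDiverge's source: it assumes each percent figure is 100 × actual ÷ possible for its category, and that a pooled value weights each category's percent by its number of possible changes. The function names are hypothetical; the numbers are the replacement columns from the G/A gamma example output.

```python
def percent_divergence(actual, possible):
    """Percent divergence for one category: 100 * actual / possible."""
    if possible == 0:
        return 0.0  # no possible changes of this kind: report zero
    return 100.0 * actual / possible


def pooled_percent(actuals, possibles):
    """Weighted average of per-category percents, weighted by possible changes.

    Weighting each percent by its 'possible' count reduces to
    100 * (total actual) / (total possible).
    """
    total_possible = sum(possibles)
    if total_possible == 0:
        return 0.0
    return 100.0 * sum(actuals) / total_possible


# Replacement columns (types 1, 2, 3) from the G/A gamma example output.
possible_repl = [4.0, 82.0, 285.0]
actual_repl = [0.0, 0.0, 1.0]

per_type = [percent_divergence(a, p) for a, p in zip(actual_repl, possible_repl)]
print([round(x, 1) for x in per_type])                       # [0.0, 0.0, 0.4]
print(round(pooled_percent(actual_repl, possible_repl), 2))  # 0.27
```

The 0.4 for type 3 matches the "Percent Replacement" value printed by EDiverge; the pooled 0.27 is only one way of combining the categories.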
This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: firstname.lastname@example.org; Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (email@example.com).
Below is a session using EDiverge to measure the percent divergence between the human fetal G gamma and A gamma globin coding sequences. SeqEd and Assemble were used to create files of the coding sequences with the intervening sequences removed. If the coding sequences had not been perfectly aligned, Gap with the command line option -OUT would have been used to align them first.
% ediverge

EDIVERGE uses nucleotide sequence data

EDIVERGE of what sequence ? agammacod.seq

  Start (* 1 *) ?
  End (* 444 *) ?
  Reverse (* No *) ?

What sequence (* agammacod.seq *) ? ggammacod.seq

  Start (* 1 *) ?
  End (* 444 *) ?
  Reverse (* No *) ?

What should I call the output file (* agammacod.diverge *) ?

%
Here is the complete output file:
DIVERGE between: agammacod.seq  check: 2862  from: 1  to: 444

ASSEMBLE  July 27, 1994 11:40

Symbols:   1 to:  92 from: gamma.seq  ck: 6474, 7114 to: 7205
Symbols:  93 to: 315 from: gamma.seq  ck: 6474, 7328 to: 7550
Symbols: 316 to: 444 from: gamma.seq  ck: 6474, 8417 to: 8545

Human fetal beta globins G and A gamma from Shen, Slightom and
Smithies, Cell 26; 191-203.
. . .

and: ggammacod.seq  check: 2906  from: 1  to: 444

ASSEMBLE  July 27, 1994 11:40

Symbols:   1 to:  92 from: gamma.seq  ck: 6474, 2179 to: 2270
Symbols:  93 to: 315 from: gamma.seq  ck: 6474, 2393 to: 2615
Symbols: 316 to: 444 from: gamma.seq  ck: 6474, 3502 to: 3630

Human fetal beta globins G and A gamma from Shen, Slightom and
Smithies, Cell 26; 191-203.
. . .

July 27, 1994 11:44  ..

  Possible Silent      Actual Silent     Percent Silent
   1     2     3        1    2    3       1    2    3
  82.0   4.0  73.0     0.0  0.0  0.0     0.0  0.0  0.0
                              corrected: 0.0  0.0  0.0

Possible Replacement  Actual Replacement  Percent Replacement
   1     2     3        1    2    3       1    2    3
   4.0  82.0 285.0     0.0  0.0  1.0     0.0  0.0  0.4
                              corrected: 0.0  0.0  0.4
SeqEd and Assemble create new sequence files from ranges within existing sequence files. When run with the command line option -OUT, Gap creates aligned sequences in files. LineUp allows you to edit multiple sequence alignments. Distances makes a table of the pair-wise distances between the sequences in a multiple sequence alignment.
Sequences one and two must be aligned codon by codon. Gap may place gaps that cross codon boundaries, which breaks the codon-by-codon correspondence. You may want to align the sequences at the peptide level to check that the nucleic acid alignment makes sense. LineUp lets you adjust alignments manually.
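A quick way to catch alignments that break codon frame before running EDiverge is sketched below. It is a hypothetical helper, not part of EGCG. It assumes gaps are written as '.' or '-', and it treats an alignment as usable only when both aligned rows have equal length, that length is a multiple of three, and every codon slot in each row is either gap-free or a whole-codon gap.

```python
GAP_CHARS = ".-"  # assumption: '.' or '-' marks a gap in the aligned files


def codon_frame_ok(aligned1, aligned2, gap_chars=GAP_CHARS):
    """Return True if two aligned coding sequences can be compared codon by codon.

    Hypothetical pre-check: both rows must have the same length, that length
    must be a multiple of 3, and each 3-column codon slot in each row must be
    either gap-free or entirely gap.
    """
    if len(aligned1) != len(aligned2) or len(aligned1) % 3 != 0:
        return False
    for row in (aligned1, aligned2):
        for i in range(0, len(row), 3):
            n_gaps = sum(1 for ch in row[i:i + 3] if ch in gap_chars)
            if n_gaps not in (0, 3):
                return False  # a gap splits this codon
    return True


print(codon_frame_ok("ATGAAACCC", "ATG...CCC"))  # True: whole-codon gap
print(codon_frame_ok("ATGAAACCC", "AT...GCCC"))  # False: gap crosses codons
```

A False result is a signal to inspect the alignment at the peptide level or adjust it in LineUp.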
EDiverge calculates the percent divergence and the corrected percent divergence for each category of silent or replacement change in coding sequences exactly as described by Perler and Efstratiadis (Cell 20; 555-566 (1980) (methods: pp. 564-5)).
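The corrected figures adjust the raw percent divergence for multiple substitutions at the same site. The sketch below assumes the Jukes-Cantor form d = -3/4 ln(1 - 4p/3), where p is the observed fraction of changed sites; consult the methods section of Perler et al. for the exact formula EDiverge applies. With the replacement type 3 value from the example output (1 change in 285 possible), it gives the same rounded corrected figure of 0.4. The function name is hypothetical.

```python
import math


def corrected_percent(percent):
    """Correct a percent divergence for multiple hits (assumed Jukes-Cantor form).

    p is the observed fraction of changed sites; the corrected value is
    d = -3/4 * ln(1 - 4p/3), returned as a percentage.
    """
    p = percent / 100.0
    if p >= 0.75:
        raise ValueError("observed divergence too high to correct (p >= 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0) * 100.0


raw = 100.0 * 1 / 285                                   # replacement type 3 in the example
print(round(raw, 1), round(corrected_percent(raw), 1))  # 0.4 0.4
print(round(corrected_percent(20.0), 2))                # 23.26
```

At low divergence the correction is negligible, which is why the raw and corrected rows in the example output agree; it grows quickly as p approaches 0.75.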
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum syntax: % ediverge [-INfile1=]agammacod.seq -Default

Prompted Parameters:

-BEGin1=1 -END1=576            Range of interest
[-INfile2=]ggammacod.seq       Sequence file
-BEGin2=1 -END2=576            Range of interest
-NOREV1 -NOREV2                Strand of each sequence
[-OUTfile=]agammacod.diverge   Output file

Optional Parameters:

None
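When EDiverge is driven from a script, the qualifiers in the summary can be assembled into an argument list. This is a hypothetical helper that uses only the qualifier spellings shown above; values left as None are omitted so the program's own defaults apply, and -Default is appended so that remaining prompted parameters presumably take their defaults instead of prompting. The strand qualifiers (-NOREV1, -NOREV2) are not handled.

```python
import shlex


def ediverge_command(infile1, infile2, begin1=None, end1=None,
                     begin2=None, end2=None, outfile=None, defaults=True):
    """Build an ediverge argument list from the documented qualifiers."""
    cmd = ["ediverge"]
    # (qualifier, value) pairs in the order of the parameter summary
    pairs = [("-INfile1", infile1), ("-BEGin1", begin1), ("-END1", end1),
             ("-INfile2", infile2), ("-BEGin2", begin2), ("-END2", end2),
             ("-OUTfile", outfile)]
    for name, value in pairs:
        if value is not None:
            cmd.append(f"{name}={value}")
    if defaults:
        cmd.append("-Default")  # accept defaults rather than prompting
    return cmd


cmd = ediverge_command("agammacod.seq", "ggammacod.seq", end1=444, end2=444)
print(shlex.join(cmd))
# ediverge -INfile1=agammacod.seq -END1=444 -INfile2=ggammacod.seq -END2=444 -Default
```

On a system with EGCG installed, the list could be passed to Python's subprocess.run(cmd, check=True); building a list rather than a single string avoids shell quoting problems with unusual file names.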
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
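The lookup order described above can be sketched as a small resolver. This is a hypothetical illustration: the function name and the public directory path are placeholders, and it assumes a file named on the command line takes precedence over a local copy, which the text does not state explicitly.

```python
import os
import tempfile


def resolve_data_file(name, public_dir, override=None, cwd="."):
    """Choose which copy of a data file to read (hypothetical sketch).

    Assumed precedence: a file named on the command line (such as
    -DATa1=myfile.dat) wins; otherwise a file with exactly the same name
    in the working directory; otherwise the copy in the public directory.
    """
    if override is not None:
        return override
    local = os.path.join(cwd, name)
    if os.path.isfile(local):
        return local
    return os.path.join(public_dir, name)


# Demonstration in a scratch directory; PUBLIC is a placeholder path.
PUBLIC = "/path/to/public/data"
with tempfile.TemporaryDirectory() as work:
    print(resolve_data_file("translate.txt", PUBLIC, cwd=work))  # public copy
    open(os.path.join(work, "translate.txt"), "w").close()       # add a local copy
    print(resolve_data_file("translate.txt", PUBLIC, cwd=work))  # local copy
    print(resolve_data_file("translate.txt", PUBLIC, override="mycode.txt", cwd=work))
```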
The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in the Data Files manual.
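The translation table presumably determines whether a codon difference counts as silent or replacement. To illustrate, the sketch below builds the standard genetic code as a dictionary and compares the amino acids encoded by two codons. It is a simplification, not EDiverge's scoring: it does not assign the type 1, 2, or 3 categories, and the names are hypothetical. Passing a different dictionary as `code` plays the role that a modified translate.txt plays for the program.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa
                 for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}


def classify_change(codon1, codon2, code=STANDARD_CODE):
    """Return 'silent' if both codons encode the same amino acid, else 'replacement'."""
    aa1 = code[codon1.upper().replace("U", "T")]
    aa2 = code[codon2.upper().replace("U", "T")]
    return "silent" if aa1 == aa2 else "replacement"


print(classify_change("CTT", "CTC"))  # silent (Leu -> Leu)
print(classify_change("GAA", "GAT"))  # replacement (Glu -> Asp)
```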
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-TRANSlate=mycode.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This option allows you to use a translation table in a different file. (See the Data Files manual for information about translation tables.)
Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, J. (1980). The Evolution of Genes: The Chicken Preproinsulin Gene. Cell 20, 555-566.
Printed: April 22, 1996 15:52 (1162)