


EComposition determines the composition of sequence(s). For nucleotide sequence(s), EComposition also determines dinucleotide and trinucleotide content.
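To make "dinucleotide and trinucleotide content" concrete, here is a minimal Python sketch that counts overlapping windows of two and three bases. It is not EComposition's code. The function name `kmer_counts` is hypothetical, and the choice to use overlapping windows that advance one base at a time is our assumption.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in seq (k=2: dinucleotides, k=3: trinucleotides).

    Assumption: the window advances one base at a time, so a sequence of
    length n contributes n - k + 1 k-mers (none if n < k).
    """
    seq = seq.upper()  # fold lowercase into uppercase before counting
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    print(kmer_counts("ACGTACG", 2))  # dinucleotide content
    print(kmer_counts("acgtacg", 3))  # trinucleotide content
```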


EComposition measures the composition of one or a group of sequences. If you specify only one sequence, you can choose a range within the sequence. Lowercase letters are converted to uppercase and counted with their uppercase equivalents. If you specify a group of sequences, EComposition displays the name of each sequence as it finishes the measurement for that sequence.
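The range and case rules above can be sketched as follows. `composition` is a hypothetical helper, not part of GCG or EGCG. It assumes 1-based, inclusive Begin and End positions, as in the Start and End prompts of the example session, and it counts every symbol it sees.

```python
from collections import Counter
from typing import Optional


def composition(seq: str, begin: int = 1, end: Optional[int] = None) -> Counter:
    """Count residues from position begin to end (1-based, inclusive).

    Lowercase letters are folded to uppercase so that 'a' and 'A' are
    counted together.
    """
    if end is None:
        end = len(seq)
    if not 1 <= begin <= end <= len(seq):
        raise ValueError("need 1 <= begin <= end <= len(seq)")
    return Counter(seq[begin - 1:end].upper())


if __name__ == "__main__":
    # Hypothetical group of named sequences; each name is printed once
    # its counts are finished, similar to the per-sequence monitor.
    group = {"seq1": "acgtACGT", "seq2": "GGGCCa"}
    for name, s in group.items():
        counts = composition(s)
        print(name, dict(counts))
```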


This GCG program was modified by David Mathog (E-mail: Post: Sequence Analysis Facility, Biology Division, Caltech), and modified for EGCG by Peter Rice (E-mail: Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (


Here is a session using EComposition to calculate the molecular weight of sequence gamma.seq.

  % ecomposition -mw
    ECOMPOSITION uses nucleotide sequences
    ECOMPOSITION of what sequence(s) ?  gamma.seq
                Start (* 1 *) ?
                End (* 11375 *) ?
    What should I call the output file (* gamma.composition *) ?
    ECOMPOSITION complete.
       Sequences: 1
    Total Length: 11,375
        CPU time: 00.18
     Output file: gamma.composition


Here is part of the output file:

   ECOMPOSITION of: gamma.seq  Check: 6474  from: 1  to: 11,375
   March 19, 1996 15:07
  A: 3,374        C: 2,209        G: 2,496        T: 3,296
  Molecular weight:   3455812.25
                       Other: 0
                       Total: 11,375
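The molecular weight line can be approximated by summing a per-residue weight for each base. The sketch below uses commonly quoted average residue weights for single-stranded DNA (about 313.2 for A, 289.2 for C, 329.2 for G and 304.2 for T). These constants are an assumption. EComposition's own weight table and end corrections are not given here, and this sketch does not reproduce the 3455812.25 shown above. `single_strand_mw` is a hypothetical name.

```python
from collections import Counter

# Assumed average residue weights (daltons) for single-stranded DNA.
# Illustrative only; not EComposition's internal values.
DNA_WEIGHTS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def single_strand_mw(counts: Counter) -> float:
    """Sum count * residue weight; symbols not in the table are ignored."""
    return sum(DNA_WEIGHTS.get(base, 0.0) * n for base, n in counts.items())


if __name__ == "__main__":
    print(round(single_strand_mw(Counter("ACGT")), 2))
```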




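You can infer the composition of the bottom strand of a nucleic acid sequence from the composition of the top strand: each A on the top strand pairs with a T on the bottom strand, and each G with a C. The -BOTHstrands option measures both strands together, but some information is lost, because in the combined counts A always equals T, G always equals C, and each dinucleotide or trinucleotide count equals that of its reverse complement (for example, GA and TC).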
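A minimal sketch of that inference, using the top-strand counts from the gamma.seq output above. The helper names `bottom_strand` and `both_strands` are hypothetical, and carrying non-ACGT symbols over unchanged is an assumption made only for this sketch.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bottom_strand(top: Counter) -> Counter:
    """Infer bottom-strand base counts from top-strand counts.

    Symbols outside A/C/G/T are carried over unchanged (sketch assumption).
    """
    return Counter({COMPLEMENT.get(base, base): n for base, n in top.items()})


def both_strands(top: Counter) -> Counter:
    """Combined counts for both strands: A == T and G == C by construction."""
    return top + bottom_strand(top)


if __name__ == "__main__":
    # Top-strand counts taken from the example output for gamma.seq.
    top = Counter({"A": 3374, "C": 2209, "G": 2496, "T": 3296})
    print(bottom_strand(top))
    print(both_strands(top))
```

In the combined counts, A and T are both 6,670 and G and C are both 4,705, so the separate top-strand values can no longer be read off.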


CodonFrequency tabulates codon frequencies for any range of a sequence in a particular reading frame, as opposed to counting all trinucleotides.
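The difference between codons read in one frame and all trinucleotides can be sketched in Python. This only illustrates the distinction and is not how CodonFrequency or EComposition is implemented. The names `codon_counts` and `trinucleotide_counts` are hypothetical, and frames are assumed to be numbered 1 to 3 from the first base.

```python
from collections import Counter


def codon_counts(seq: str, frame: int = 1) -> Counter:
    """Non-overlapping codons in reading frame 1, 2 or 3 (assumed numbering)."""
    seq = seq.upper()
    start = frame - 1
    # Step by 3 and stop before an incomplete final codon.
    return Counter(seq[i:i + 3] for i in range(start, len(seq) - 2, 3))


def trinucleotide_counts(seq: str) -> Counter:
    """Every overlapping trinucleotide, regardless of frame."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


if __name__ == "__main__":
    s = "ATGAAATGA"
    print(codon_counts(s, 1))        # 3 codons
    print(trinucleotide_counts(s))   # 7 trinucleotides
```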


If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C.


You can run this program in the batch queue using a script that we supply. Use Fetch with a file name that starts with this program's name. Modify the file with any text editor so that it specifies the experiment you want to do and queue the script.


See the sections on specifying sequences in Chapter 2, Using Sequences of the User's Guide.


All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  Minimal Syntax: % ecomposition [-INfile=]Primate:* -Default
  Prompted Parameters:
  -BEGin=1 -END=1000              range (for single sequences only)
  [-OUTfile=]primate.composition  output file name
  Local Data Files: None
  Optional Parameters:
  -BOTHstrands  determines composition of both strands of nucleic acids
  -NOCOMmas     removes the commas from the numbers in the output
  -NOMONitor    suppresses the screen monitor showing each sequence
  -NOSUMmary    suppresses the screen summary at the end of the program
  -MW           calculate molecular weight instead
  -RNA          Use U instead of T in calculations
  -DEPhosphorylation  calculates without the 5' phosphate for nucleic acid
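The rule that the capitalized letters of a qualifier name are the ones you must type can be sketched as a small matcher. `matches_qualifier` is hypothetical. It models only the rule stated in the text (required capitals, optional continuation, case-insensitive input, `=value` ignored), not how GCG resolves any other details.

```python
import string


def matches_qualifier(typed: str, spec: str) -> bool:
    """Return True if `typed` is an acceptable abbreviation of `spec`.

    Assumed rule: the leading capital letters of `spec` are required;
    further letters are optional but must continue the qualifier name.
    Matching ignores case, and a trailing "=value" is ignored.
    """
    spec_body = spec.lstrip("-")
    typed_body = typed.lstrip("-").split("=", 1)[0]
    # Required letters = length of the leading uppercase run in spec.
    required = len(spec_body) - len(spec_body.lstrip(string.ascii_uppercase))
    return (len(typed_body) >= required
            and spec_body.lower().startswith(typed_body.lower()))


if __name__ == "__main__":
    for typed in ["-both", "-bo", "-bothstr", "-bothx"]:
        print(typed, matches_qualifier(typed, "-BOTHstrands"))
```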




The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.


-BOTHstrands

measures the composition of both strands of a nucleic acid sequence. Also calculates the molecular weight of double-stranded nucleic acid if the -MW option is used.


-NOCOMmas

EComposition normally displays numbers greater than 999 with commas to make them easier to read; for example, the number 1234567 would look like 1,234,567. Many programs cannot read numbers that contain commas, so if you are going to use the output file from this program as input to another program, suppress the commas with this option.
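A short Python sketch of the comma formatting described above and why it trips up other programs. `format_count` and `parse_count` are hypothetical helpers; `commas=False` stands in for the effect of -NOCOMmas.

```python
def format_count(n: int, commas: bool = True) -> str:
    """Format n with thousands separators (1234567 -> '1,234,567').

    commas=False mimics the effect of -NOCOMmas.
    """
    return f"{n:,}" if commas else str(n)


def parse_count(text: str) -> int:
    """Read a count back, stripping any thousands separators first."""
    return int(text.replace(",", ""))


if __name__ == "__main__":
    print(format_count(1234567))
    print(format_count(1234567, commas=False))
    try:
        int("1,234,567")  # a naive reader rejects the comma form
    except ValueError as err:
        print("naive parse failed:", err)
```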


-MW

calculates the molecular weight and suppresses the other forms of output. Add -BOTHstrands to calculate the molecular weight of double-stranded DNA.
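One plausible way to get a double-stranded value is to add the weight of the inferred complementary strand to that of the top strand. The sketch below does that with the same assumed residue weights as a single-strand estimate (about 313.2 for A, 289.2 for C, 329.2 for G and 304.2 for T). The weights, the summing approach, and the name `double_strand_mw` are assumptions, not EComposition's documented method.

```python
from collections import Counter

# Assumed average residue weights (daltons); illustrative only.
DNA_WEIGHTS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def single_strand_mw(counts: Counter) -> float:
    """Sum count * residue weight; symbols not in the table are ignored."""
    return sum(DNA_WEIGHTS.get(base, 0.0) * n for base, n in counts.items())


def double_strand_mw(top: Counter) -> float:
    """Top strand plus its complement, inferred from base pairing."""
    bottom = Counter({COMPLEMENT.get(b, b): n for b, n in top.items()})
    return single_strand_mw(top) + single_strand_mw(bottom)


if __name__ == "__main__":
    print(round(double_strand_mw(Counter("ACGT")), 2))
```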


-RNA

uses the RNA base U instead of T in the molecular weight calculation.
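A sketch of an RNA molecular-weight estimate in which T is read as U. The option description only says U replaces T; this sketch additionally assumes ribonucleotide residue weights (about 329.2 for A, 305.2 for C, 345.2 for G and 306.2 for U), which are commonly quoted values and not EComposition's documented table. `rna_mw` is a hypothetical name.

```python
# Assumed average residue weights (daltons) for single-stranded RNA;
# illustrative only, not EComposition's internal table.
RNA_WEIGHTS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}


def rna_mw(seq: str) -> float:
    """Approximate RNA molecular weight, reading T as U.

    Symbols outside A/C/G/U are ignored in this sketch.
    """
    seq = seq.upper().replace("T", "U")
    return sum(RNA_WEIGHTS.get(base, 0.0) for base in seq)


if __name__ == "__main__":
    print(round(rna_mw("ACGT"), 2))
```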


-DEPhosphorylation

subtracts the weight of the 5' phosphate from calculated molecular weight values.
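A sketch of the dephosphorylation adjustment for a single strand. Removing a 5' phosphate turns R-O-PO3H2 into R-OH, a net loss of HPO3 (about 79.98 Da). That constant, the residue weights, and the name `dna_mw` are assumptions for illustration; the text does not say which values EComposition subtracts.

```python
# Assumed values, for illustration only.
DNA_WEIGHTS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
PHOSPHATE_5P = 79.98  # assumed weight removed with the 5' phosphate (HPO3)


def dna_mw(seq: str, dephosphorylated: bool = False) -> float:
    """Approximate single-strand DNA molecular weight.

    With dephosphorylated=True, one 5' phosphate is subtracted.
    """
    mw = sum(DNA_WEIGHTS.get(base, 0.0) for base in seq.upper())
    if dephosphorylated:
        mw -= PHOSPHATE_5P
    return mw


if __name__ == "__main__":
    print(round(dna_mw("ACGT"), 2), round(dna_mw("ACGT", True), 2))
```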


-MONitor

This program normally monitors its progress on your screen. However, when you use the -Default option to suppress all program interaction, you also suppress the monitor. You can turn it back on with this option. If your program is running in batch, the monitor will appear in the log file. If the monitor is slowing the program down, suppress it with -NOMONitor.


-SUMmary

writes a summary of the program's work to the screen when you've used the -Default qualifier to suppress all program interaction. A summary is normally displayed at the end of an interactive run; you can suppress it with -NOSUMmary.

Use this qualifier also to include a summary of the program's work in the log file for a program run in batch.

Printed: April 22, 1996 15:52 (1162)