


PepStats gives a short statistical summary of the composition of a protein sequence, together with its molecular weight and isoelectric point.


PepStats is a smaller and less powerful version of PeptideSort. Its main function is to provide details of protein properties that are only obtainable from PeptideSort by carefully specifying "no enzyme". It can also provide statistics for the longest open reading frame(s) from a file written by Extract.


This program was written by Rodrigo Lopez S. (Post: Biotechnology Centre of Oslo, PO Box 1125 Blindern, N-0317 Oslo 3, Norway).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail.


Here is a session with PepStats.

  % pepstats
   PEPSTATS uses protein (with stop codons) sequence data
   PEPSTATS of what sequence ?  Sw:Gshr_Human
              Start (* 1 *) ?
              End (* 478 *) ?
   Length = 478
   Peptide Sw:Gshr_Human has 1 peptide(s)
   and the last stop is at position: 479
    What should I call the output file (* gshr_human.stats *) ?


Here is the output file:

   PEPSTATS of: Gshr_Human  check: 4050  from: 1  to: 478
  ID   GSHR_HUMAN     STANDARD;      PRT;   478 AA.
  AC   P00390;
  DT   21-JUL-1986  (REL. 01, CREATED)
  DT   01-NOV-1990  (REL. 16, LAST ANNOTATION UPDATE) . . .
   Continuous From: 1 To: 478   Length: 478
  Summary for whole sequence:
  Molecular weight =   51569.02     Residues =    478
  Average Residue Weight = 107.885     Charge =   1
  Isoelectric point =  7.67
  Residue           Number      Mole Percent
  A = Ala              42             8.787
  B = Asx               0             0.000
  C = Cys              10             2.092
  D = Asp              21             4.393
  E = Glu              29             6.067
  F = Phe              14             2.929
  G = Gly              43             8.996
  H = His              16             3.347
  I = Ile              29             6.067
  K = Lys              34             7.113
  L = Leu              34             7.113
  M = Met              15             3.138
  N = Asn              17             3.556
  P = Pro              24             5.021
  Q = Gln              11             2.301
  R = Arg              17             3.556
  S = Ser              31             6.485
  T = Thr              31             6.485
  V = Val              44             9.205
  W = Trp               3             0.628
  Y = Tyr              13             2.720
  Z = Glx               0             0.000
  Small       (A+G)             85      17.782
  Hydroxyl    (S+T)             62      12.971
  Acidic      (D+E)             50      10.460
  Acid/Amide  (D+E+N+Q)         78      16.318
  Basic       (H+K+R)           67      14.017
  Charged     (D+E+H+K+R)      117      24.477
  Small hphob (I+L+M+V)        122      25.523
  Aromatic    (F+W+Y)           30       6.276
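
The figures in the listing above can be reproduced with simple arithmetic: mole percent is the residue count divided by the total number of residues, and each class line is the sum of its member residues. The following Python sketch illustrates this under stated assumptions. It is not the PepStats source; the function names are hypothetical, and the choice to skip non-letter characters such as stop codons is an assumption. Only the class definitions are taken from the listing.

```python
from collections import Counter

# Residue classes copied from the PepStats output listing.
GROUPS = {
    "Small": "AG",
    "Hydroxyl": "ST",
    "Acidic": "DE",
    "Acid/Amide": "DENQ",
    "Basic": "HKR",
    "Charged": "DEHKR",
    "Small hphob": "ILMV",
    "Aromatic": "FWY",
}


def count_residues(sequence):
    """Count residues, ignoring anything that is not a letter (assumption)."""
    counts = Counter(c for c in sequence.upper() if c.isalpha())
    if not counts:
        raise ValueError("sequence contains no residues")
    return counts


def composition(sequence):
    """Return {residue: (count, mole_percent)}."""
    counts = count_residues(sequence)
    total = sum(counts.values())
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def group_totals(sequence):
    """Return {class name: (count, mole_percent)} for the classes above."""
    counts = count_residues(sequence)
    total = sum(counts.values())
    result = {}
    for name, members in GROUPS.items():
        n = sum(counts[aa] for aa in members)  # Counter gives 0 if absent
        result[name] = (n, 100.0 * n / total)
    return result


def average_residue_weight(molecular_weight, residues):
    """Molecular weight divided by the number of residues."""
    return molecular_weight / residues
```

For the example output, 42 alanines in 478 residues give 100 * 42 / 478 = 8.787 mole percent, and 51569.02 / 478 = 107.885 matches the reported average residue weight.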


PeptideSort shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein. PepStats is based on PeptideSort.


None known


The input file of PepStats is a protein sequence file.


All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  Minimum syntax: % pepstats [-INfile=]Sw:Gshr_Human -Default
  Prompted parameters:
  -BEGin=1 -END=478            range of interest
  [-OUTfile=]gshr_human.stats  output file name
  Local Data Files:
  [-DATa1=]aminoacid.dat       contains amino acid data
  Optional parameters:
  -MINLen=0                    minimum peptide length
  -NONTERM                     first residue is not the N-terminus
  -NOCTERM                     last residue is not the C-terminus
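
The abbreviation rule described above (the capitalized letters are the minimum you must type) can be sketched in Python as follows. This is an illustration only, not GCG's parser: the function names are hypothetical, and two points are assumptions, namely that matching ignores case and that any extra letters typed must continue the full qualifier name.

```python
def min_abbreviation(qualifier):
    """Return the leading run of capitals, e.g. '-MINLen' -> 'MINL'."""
    name = qualifier.lstrip("-")
    prefix = ""
    for ch in name:
        if not ch.isupper():
            break
        prefix += ch
    return prefix


def matches(typed, qualifier):
    """True if what the user typed is an acceptable form of the qualifier.

    Assumptions: comparison is case-insensitive, and a value after '='
    is not part of the name.
    """
    typed_name = typed.lstrip("-").split("=", 1)[0].upper()
    full_name = qualifier.lstrip("-").upper()
    required = min_abbreviation(qualifier)
    return typed_name.startswith(required) and full_name.startswith(typed_name)
```

Under these assumptions, `matches("-minl=5", "-MINLen")` is true, while `matches("-min=5", "-MINLen")` is false because the required letters MINL are incomplete.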


The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

File AminoAcid.Dat contains values for calculation of properties. You can Fetch this file and edit the values. The file is supplied by GCG for use by the program PeptideSort.
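
The lookup order for data files described above can be summarized in a short Python sketch. The function name and directory arguments are hypothetical, and giving the command-line file priority over a file in the working directory is an assumption; the text states only that either one replaces the public copy.

```python
from pathlib import Path


def resolve_data_file(name, public_dir, cwd=".", cli_override=None):
    """Pick the data file a program would read.

    name         -- default file name, e.g. 'aminoacid.dat'
    public_dir   -- public data directory (hypothetical path)
    cwd          -- current working directory
    cli_override -- file named on the command line, e.g. via -DATa1=...
    """
    # A file named on the command line wins (assumed priority).
    if cli_override is not None:
        return Path(cli_override)
    # Next, a file with exactly the same name in the working directory.
    local = Path(cwd) / name
    if local.is_file():
        return local
    # Otherwise fall back to the public data directory.
    return Path(public_dir) / name
```

Fetching AminoAcid.Dat into the working directory and editing it corresponds to the second case: the edited local copy is found before the public one.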


The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.


-MINLen=0 specifies a minimum length for peptides to be considered, for example when using the output of EExtractPeptide as an input file.


-NONTERM specifies that the first residue in the input file is not the N-terminus of the protein.


-NOCTERM specifies that the last residue in the input file is not the C-terminus of the protein.

Printed: April 22, 1996 15:54 (1162)