CheckLenComp compares two sorted CheckLen output files, and produces a list of entries from the first file which are not found in the second.


CheckLenComp is one of the programs used to generate the PirOnly database by comparison of SwissProt and Pir entries.

The program does not prompt for values, so everything must be specified on the command line. See the PirOnly documentation for more details of the procedure.


This program was written by Peter Rice (E-mail: Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (


Here is a sample session with CheckLenComp:

  % checklencomp piro.sorted sw.sorted sw-piro.comp pironlyrest.dat


The main output from CheckLenComp is the second output file, which lists the sequences in the first input file that have no identical match in the second input file. This file can then be used as input to the DataSet program to build the new subset database.

The first output file lists the identical sequences matched between the two input files.

The example below shows part of the output from a run on the Pir and SwissProt databases.

   PIR entries not in SwissProt January 28, 1993  23:49 ..

       PIR3:A34516 =        SW:KPBA_MOUSE
       PIR2:S02185 =        SW:HEMX_ECOLI
       PIR3:A29501 =        SW:FIBA_MACFU
        PIR1:SMHU2 =         SW:MT2_HUMAN
       PIR2:JQ0234 =        SW:YCR3_ORYSA
       PIR1:GGICE6 =        SW:GLB6_CHITH
       PIR1:XNECGM =        SW:GLMS_ECOLI
       PIR3:JQ0835 =        SW:BXB5_BOMMO
       PIR1:Q1BP87 =          SW:Y18_BPT7
       PIR2:JN0016 =          SW:PERI_RAT
       PIR2:S19418 =        SW:YCZ6_YEAST
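
For downstream scripting, lines of this form can be split on the "=" sign. The Python sketch below is an illustration only and is not part of CheckLenComp: the helper name parse_match_line is hypothetical, and it assumes every match line follows the "entry = entry" layout shown in the example above, with the header and blank lines carrying no "=".

```python
def parse_match_line(line):
    """Return (first_entry, second_entry) for an 'A = B' line, or None otherwise."""
    left, sep, right = line.partition("=")
    left, right = left.strip(), right.strip()
    # Header and blank lines either lack "=" or lack an entry on one side.
    if not sep or not left or not right:
        return None
    return left, right


# Lines copied from the example output above.
sample = [
    "   PIR entries not in SwissProt January 28, 1993  23:49 ..",
    "",
    "       PIR3:A34516 =        SW:KPBA_MOUSE",
    "        PIR1:SMHU2 =         SW:MT2_HUMAN",
]
pairs = [p for p in map(parse_match_line, sample) if p is not None]
```

Here pairs holds one (first-file entry, second-file entry) tuple per matched line, and the header and blank lines are skipped.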


The input for CheckLenComp is two sorted output files from CheckLen. The first file contains the entries in the database to be used, the second contains the comparison entries for exclusion of duplicates.
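
Because both inputs are sorted, the comparison can be pictured as a single merge pass over the two files. The Python sketch below is illustrative only and is not CheckLenComp's actual source. It assumes each CheckLen record can be reduced to a (key, name) pair, where the key stands for whatever CheckLen uses to recognise identical sequences (the real record layout is not described here), and that both lists are sorted by that key. The function name compare_sorted and the sample entries are invented for the illustration.

```python
def compare_sorted(first, second):
    """Split `first` into entries matched in `second` and entries unique to `first`.

    Both arguments are lists of (key, name) tuples sorted by key.
    The (key, name) layout is an assumption, not the CheckLen file format.
    """
    matched = []  # (name_in_first, name_in_second) pairs with equal keys
    unique = []   # names from `first` whose key never occurs in `second`
    j = 0
    for key, name in first:
        # Advance through `second` until its key is no longer smaller.
        while j < len(second) and second[j][0] < key:
            j += 1
        if j < len(second) and second[j][0] == key:
            matched.append((name, second[j][1]))
        else:
            unique.append(name)
    return matched, unique


# Invented sample data: keys and entry names are placeholders.
pir = [("k1", "PIR1:AAAA"), ("k3", "PIR2:BBBB"), ("k4", "PIR3:CCCC")]
sw = [("k1", "SW:XXXX_HUMAN"), ("k2", "SW:YYYY_MOUSE"), ("k4", "SW:ZZZZ_RAT")]
matched, unique = compare_sorted(pir, sw)
```

In this sketch, matched plays the role of the first output file (identical entries) and unique plays the role of the second (entries to pass on to DataSet). Each input is read once, which is why sorted input is required.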


All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  Minimum syntax: % checklencomp -Default
  Prompted Parameters:
  [-INfile=]piro.checklen        Sorted CheckLen file of original database
  [-INfile2=]sw.checklen         Sorted CheckLen file of comparison database
  [-OUTfile=]sw-piro.comp        List of matched identical entries
  [-RESTfile=]pironlyrest.dat    List of unique entries for use as DataSet input

Printed: April 22, 1996 15:52 (1162)