FilterOverlap reads the output file from EOverlap and keeps only those overlaps whose alignments, once built, meet the specified values. Output from GCG's Overlap program may also be used, but only if it was generated from a self-comparison of a single database.
FilterOverlap processes the output from the EGCG program EOverlap (a modified version of GCG's Overlap) and extracts only those candidate overlaps which meet specified values for their actual alignment scores.
This program was written by Peter Rice (E-mail: firstname.lastname@example.org Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (email@example.com).
Here is a sample session with FilterOverlap:

  % filteroverlap

   FILTEROVERLAP of what file ?  overlap.dat

   What should I call the output file (* overlap.filter *) ?

   What gap weight (* 0.0 *) ?

   What gap weight (* 1.0 *) ?

   What stringency (* 0.80 *) ?

   Aligning ...........-..

   Accepted match "mu:MU10" "mu:MU5", 230 203.3 0.8839

  %
The output from FilterOverlap is a revised version of the original input file, with only the accepted overlaps remaining.
  OVERLAP of: mu:*  to: mu:*

  Min overlap fraction: 0.80  Min overlap length: 10  Integral width: 3

  December 12, 1995 14:02

  Filter with Stringency: 0.80  MinOverlap: 10  Integrate; 3

  Sequence1  Strand  Pos  Sequence2  Strand  Pos  Length  Matches  Ratio  Len1  Len2  ..

  MU10       +         2  MU5        -         1     230      203   0.88   361   230
  //////////////////////////////////////////
  MU32       +         6  MU9        -         1      35       35   1.00    40    39
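The data rows of this file are simple enough to read back in a script. The following is a minimal sketch in Python, assuming the columns are whitespace-separated and appear in the order given by the header line. The names `OverlapRow`, `parse_row`, and `passes` are illustrative and not part of EGCG, and the acceptance test shown (matches divided by length compared with the stringency) is an assumption based on the sample values, not a statement of how FilterOverlap computes its alignment scores.

```python
from typing import NamedTuple


class OverlapRow(NamedTuple):
    """One data row of the overlap table (hypothetical field names)."""
    seq1: str
    strand1: str
    pos1: int
    seq2: str
    strand2: str
    pos2: int
    length: int
    matches: int
    ratio: float
    len1: int
    len2: int


def parse_row(line: str) -> OverlapRow:
    """Split one whitespace-separated data row into typed fields."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, found {len(f)}: {line!r}")
    return OverlapRow(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]),
                      int(f[6]), int(f[7]), float(f[8]), int(f[9]), int(f[10]))


def passes(row: OverlapRow, stringency: float = 0.80) -> bool:
    """Assumed criterion: fraction of matching positions >= stringency."""
    return row.matches / row.length >= stringency


if __name__ == "__main__":
    row = parse_row("MU10 + 2 MU5 - 1 230 203 0.88 361 230")
    print(row.seq1, row.seq2, row.length, row.matches, passes(row))
```

Header lines and the `//////` elision marker are not data rows, so a caller would need to skip them before calling `parse_row`.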
The input file for FilterOverlap is an output file from EOverlap, although an output file from GCG's Overlap for a self-comparison of a GCG database is also suitable.
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
Minimum Syntax: % filteroverlap [-INfile=]test.overlap -Default

Prompted Parameters:

[-OUTfile=]test.filter       Output file for accepted overlaps

Local Data Files:

-DATa=overdna.cmp            Comparison matrix for overlap testing

Optional Parameters:

-ALIGNfile=test.align        File to contain accepted alignments
-REJALIGNfile=test.rejalign  File to contain rejected alignments
-REJECTfile=test.reject      File to contain rejected overlaps
-ADDINTegrate=0              Show all hits longer than 6 residues
-MONitor                     Show results of each test
-SUMmary                     Show summary statistics
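When FilterOverlap is driven from a script, the command line can be assembled from the qualifiers listed in the summary. The sketch below only builds and prints the command and does not run it. The helper name `build_command` and its keyword arguments are hypothetical, and it assumes that the qualifiers are accepted exactly as spelled in the summary.

```python
import shlex
from typing import List, Optional


def build_command(infile: str, outfile: str, *,
                  alignfile: Optional[str] = None,
                  rejectfile: Optional[str] = None,
                  rejalignfile: Optional[str] = None,
                  monitor: bool = False,
                  summary: bool = False) -> List[str]:
    """Assemble a filteroverlap argument list from documented qualifiers."""
    args = ["filteroverlap", f"-INfile={infile}", f"-OUTfile={outfile}", "-Default"]
    if alignfile:
        args.append(f"-ALIGNfile={alignfile}")        # accepted alignments
    if rejectfile:
        args.append(f"-REJECTfile={rejectfile}")      # rejected overlaps
    if rejalignfile:
        args.append(f"-REJALIGNfile={rejalignfile}")  # rejected alignments
    if monitor:
        args.append("-MONitor")
    if summary:
        args.append("-SUMmary")
    return args


if __name__ == "__main__":
    cmd = build_command("test.overlap", "test.filter",
                        rejectfile="test.reject", summary=True)
    print(shlex.join(cmd))
```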
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
This program uses the local data file overdna.cmp as the comparison matrix. This matrix scores 1.0 for a match, -5.0 for a mismatch, and 0.1 for any match to "N" or "X" (a low value so that these show up with ":" in the display of sequence alignments).
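The scoring scheme described above can be illustrated with a short Python sketch. It does not read overdna.cmp. It hard-codes the three values quoted in the text and assumes that "any match to N or X" means any pairing in which either symbol is "N" or "X". The function names are illustrative, and the ungapped sum ignores gap weights entirely.

```python
MATCH = 1.0        # identical symbols
MISMATCH = -5.0    # different symbols
AMBIGUOUS = 0.1    # pairing that involves "N" or "X" (assumed interpretation)
AMBIGUITY_CODES = {"N", "X"}


def score_pair(a: str, b: str) -> float:
    """Score one aligned pair of symbols using the values quoted in the text."""
    a, b = a.upper(), b.upper()
    if a in AMBIGUITY_CODES or b in AMBIGUITY_CODES:
        return AMBIGUOUS
    return MATCH if a == b else MISMATCH


def score_ungapped(s1: str, s2: str) -> float:
    """Sum pair scores over two equal-length, already aligned strings."""
    if len(s1) != len(s2):
        raise ValueError("aligned strings must have the same length")
    return sum(score_pair(a, b) for a, b in zip(s1, s2))


if __name__ == "__main__":
    print(score_pair("A", "A"), score_pair("A", "C"), score_pair("N", "G"))
    print(score_ungapped("ACGT", "ACNT"))
```

With a mismatch penalty five times the match score, even a few mismatches pull the total down sharply, while ambiguous positions contribute almost nothing either way.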
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-ADDINTegrate=2
includes a further 2 diagonals in the alignment, in addition to the number summed together by EOverlap.

-ALIGNfile=test.align
specifies a file to contain the actual alignments used in calculating the scores for accepted overlaps.

-REJECTfile=test.reject
specifies a file to contain a list of candidate overlaps rejected by the specified criteria.

-REJALIGNfile=test.rejalign
specifies a file to contain the actual alignments used in calculating the scores for rejected overlaps.

-MONitor
shows results of each test on the screen.

-SUMmary
shows run statistics (overlaps tested, overlaps accepted) on the screen.
Printed: April 22, 1996 15:53 (1162)