Database Searching

Version 8.1-UNIX

Printed: April 22, 1996 15:51

Output from (T)Fasta can be screened for significance. TWordSearch can compare a protein sequence with the nucleotide databases. EQuickSearch can run far faster with far smaller memory requirements, and its output can be screened for the best hits using QuickMatch.


NewFetch copies GCG sequences, fragments, or data files from the GCG database into your directory or displays them on your terminal screen. You can specify a sequence range.


FastaCheck selects significant alignments from a (T)Fasta output file.

TWordSearch (+)

TWordSearch identifies DNA sequences similar to a protein query sequence using a six-frame translation of the database and a Wilbur and Lipman-style search. The output is a list of significant diagonals whose alignments can be displayed with TSegments.
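The idea of a six-frame translation followed by a word search along diagonals can be sketched as follows. This is a minimal illustration, not the TWordSearch implementation: the function names, the word size of 2, and the plain count of matching words per diagonal are assumptions made for the example, and the standard genetic code is assumed throughout.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons not in the table become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map frame number (1, 2, 3, -1, -2, -3) to its translation."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rev[offset:])
    return frames


def diagonal_hits(query, target, word_size=2):
    """Count identical words shared by query and target on each diagonal."""
    positions = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        positions[query[i:i + word_size]].append(i)
    diagonals = Counter()
    for j in range(len(target) - word_size + 1):
        for i in positions.get(target[j:j + word_size], ()):
            diagonals[j - i] += 1  # diagonal = target offset - query offset
    return diagonals


def best_diagonal(protein, dna, word_size=2):
    """Return (frame, diagonal, word_count) for the best-scoring diagonal."""
    best = None
    for frame, peptide in six_frames(dna).items():
        for diag, count in diagonal_hits(protein, peptide, word_size).items():
            if best is None or count > best[2]:
                best = (frame, diag, count)
    return best
```

Calling best_diagonal with a protein query and a DNA sequence returns the frame and diagonal that share the most words, or None if no words match. A real search would also weight the words and judge the significance of each diagonal, which this sketch does not attempt.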


TSegments aligns and displays the segments of similarity found by TWordSearch.


EQuickSearch rapidly identifies places where query sequence(s) occur in a nucleotide sequence database. The output is a file of overlaps that can be displayed with QuickMatch or EQuickShow. You can make up your own sequence database or use GenEMBL, which consists of GenBank and those sequences in EMBL that are not represented in GenBank (or the other way around at some sites).

QuickMatch (+)

QuickMatch displays the overlaps found by EQuickSearch with either optimal alignments or dot-plots. The alignments can be selected by quality to discard poor matches. The dot-plots can be reviewed rapidly with a graphic screen.


EQuickIndex builds hash tables from sequence(s) in data libraries, and stores them as map sections. These tables make up the database that is searched by EQuickSearch.
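The relationship between an index of words and a fast search can be sketched with an in-memory table. This is only an illustration under stated assumptions: the function names, the word size of 4, and the Python dictionary standing in for the hash tables are hypothetical, and nothing here reflects the on-disk format that EQuickIndex writes.

```python
from collections import Counter, defaultdict


def build_index(library, word_size=4):
    """Map every word in the library to the (name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in library.items():
        seq = seq.upper()
        for pos in range(len(seq) - word_size + 1):
            index[seq[pos:pos + word_size]].append((name, pos))
    return index


def search(index, query, word_size=4):
    """Count query words found in the index, grouped by sequence and offset."""
    hits = Counter()
    query = query.upper()
    for qpos in range(len(query) - word_size + 1):
        # index.get avoids adding empty entries for words that are absent.
        for name, pos in index.get(query[qpos:qpos + word_size], ()):
            hits[(name, pos - qpos)] += 1
    return hits.most_common()


library = {"seqA": "TTTTACGTACGGTTTT", "seqB": "CCCCCCCC"}
index = build_index(library)
results = search(index, "ACGTACGG")
```

Because the table is built once and each query word is a single lookup, the cost of a search depends on the length of the query and the number of hits rather than on the size of the database. The word_size passed to search must match the one used to build the index.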


StsSearch looks for primer pairs in a set of sequences.
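One simple way to look for a primer pair is sketched below. It is not the StsSearch algorithm: exact matching, a search of the top strand only, and the max_size limit on the product are all assumptions made for this example, and the function names are hypothetical. The forward primer is matched as written and the reverse primer is matched as its reverse complement downstream of it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_all(seq, word):
    """Yield every start position of word in seq, including overlapping ones."""
    start = seq.find(word)
    while start != -1:
        yield start
        start = seq.find(word, start + 1)


def find_products(seq, forward, reverse, max_size=1000):
    """Return (start, end, size) for each region bounded by the primer pair."""
    seq = seq.upper()
    forward = forward.upper()
    # The reverse primer anneals to the top strand as its reverse complement.
    reverse_site = reverse_complement(reverse.upper())
    products = []
    for start in find_all(seq, forward):
        for rstart in find_all(seq, reverse_site):
            end = rstart + len(reverse_site)
            size = end - start
            if rstart >= start and size <= max_size:
                products.append((start, end, size))
    return products


template = "AAACGTTGTTTTTCTATGGAA"
products = find_products(template, "ACGTTG", "CCATAG")
```

To search a set of sequences, call find_products once per sequence. Running it again on the reverse complement of each sequence would cover primer pairs in the opposite orientation.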


RFindPatterns identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or type them in at the terminal. The output is a series of files called r1.rfind, r2.rfind, and so on, each containing a single extracted sequence. These can be fed through PileUp or manipulated in other ways.
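Matching an ambiguous pattern with mismatches can be sketched as a sliding-window comparison. This is an illustration only, not the RFindPatterns implementation: the function name and the max_mismatches parameter are hypothetical, and the table uses the standard IUPAC nucleotide ambiguity codes (for example, Y is C or T and R is A or G).

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_pattern(seq, pattern, max_mismatches=0):
    """Return (start, matched_text, mismatches) for every acceptable window."""
    seq = seq.upper()
    allowed = [IUPAC[symbol] for symbol in pattern.upper()]
    hits = []
    for start in range(len(seq) - len(allowed) + 1):
        window = seq[start:start + len(allowed)]
        # A position mismatches when the base is not in the allowed set.
        mismatches = sum(base not in ok for base, ok in zip(window, allowed))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


exact = find_pattern("TTGAATTCTT", "GAATTC")
fuzzy = find_pattern("TTGACTTCTT", "GAATTC", max_mismatches=1)
ambiguous = find_pattern("GGTATGCACAGG", "YRYRYRYR")
```

Only the sequence is treated as unambiguous here; a pattern symbol outside the table raises a KeyError. Writing each matching sequence to its own file, as the program does, is left out of the sketch.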

PatternPlot (+)

PatternPlot produces a graphical representation of the results of GCG's FindPatterns program.