Introduction Program Manual



The EGCG Program Manual contains an entry for each EGCG program and describes that entry in detail. The entries are arranged into sections according to function. This chapter describes the standard format for a program entry. Each entry consists of topic headings like the ones shown in the sample entry below; for example, the FUNCTION and DESCRIPTION topics. The function of each main topic is described in this example, although there may be additional topics for a particular entry in the EGCG Program Manual.


(*) means the program is new; (+) means the program has graphics output


Information under this topic heading consists of one or two sentences that describe what the program does. These are the same sentences that appear when you run the program interactively.


All of the major features of the program are described under this topic heading.


Most EGCG programs are written by one or more experienced GCG users with programming experience. This section acknowledges the author, and gives an electronic mail address for support questions and suggestions for further development.


A screen trace for an interactive session with each program is shown under this topic heading. You run the program by typing its name in response to the UNIX % prompt. The bold parts of the screen trace are the responses typed by you. If only part of the response is bold, the bold indicates the fewest characters you can type in to respond correctly. All other non-bold portions of the screen trace are prompts from the program. Any default answers are displayed between asterisks within parentheses; for instance, (* gamma.txt *) in the example below. Most EGCG and GCG programs ask the same questions in the same order.

Here is a typical screen trace where the first 1,000 bases of each strand of the sequence in the file gamma.seq are compared to each other. For all programs, you only need to type in responses when you do not want to use the default answer; you accept the default answer by pressing the Return key.

  % typicalprogram
    TYPICALPROGRAM between what sequence 1 ?  gamma.seq
              Start (* 1 *) ?
            End (* 11375 *) ?  1000
           Reverse (* No *) ?  Yes
    and what sequence 2 (* gamma.seq *) ?
              Start (* 1 *) ?
            End (* 11375 *) ?  1000
           Reverse (* No *) ?
    What should I call the output file (* gamma.txt *) ?

Note: The examples shown in the Program Manual may vary slightly from what you see on your terminal screen.
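
The default-answer convention in the screen trace can be illustrated with a short Python sketch. It is not taken from the EGCG or GCG source code: the function names format_prompt and resolve_answer are hypothetical, and the sketch only mimics the prompt layout shown in the trace, assuming an empty response means "accept the default".

```python
def format_prompt(question, default=None):
    """Build a prompt, showing any default between asterisks in parentheses."""
    if default is None:
        return f"{question} ?  "
    return f"{question} (* {default} *) ?  "


def resolve_answer(typed, default=None):
    """Return the typed response, or the default when nothing was typed."""
    typed = typed.strip()
    if typed:
        return typed
    if default is None:
        # Hypothetical behavior: a prompt without a default needs an answer.
        raise ValueError("a response is required because there is no default")
    return default
```

For instance, resolve_answer("", "gamma.txt") gives back the default gamma.txt, while resolve_answer("1000", "11375") keeps the typed value.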


Usually, EGCG and GCG programs read one or more text files for input and write their output into a file. You can send the output to your terminal screen, instead of a file, if you answer the output file prompt with Term. If you want to run the program as shown in the example session, you can copy the input file for each sample session into your own directory with the Fetch program.

The contents of the output file from the sample session are shown immediately below the screen trace; the output shown is the actual output from the session. Because the output file is often too long to be completely included in our documentation, we include partial output files. A line of slashes (-//////////////////////////) marks each place where a portion of the output file has been omitted.

Note: The output shown in the Program Manual may vary slightly from what you see on your terminal screen.


If there are other programs in the EGCG Package or the GCG Package of similar or related function, they are mentioned under this topic heading. Often other programs are useful for preparing your data beforehand or displaying it after a program is run.


All EGCG programs, like all GCG programs, have a maximum sequence size of 350,000. If additional restrictions are known, they are mentioned under this topic heading.


If the algorithm is not obvious, it is described under this topic heading. If a statistic or metric is being determined or maximized, it is also mentioned here.


Often a program is not the best tool for the job and better tools exist. The best wisdom we have about a program's strengths and weaknesses is described under this topic heading.


Parameter values that give reasonable results are provided under this topic heading to help you get started.


If there are special device requirements other than a VT100 terminal attached to a port, they are described under this topic heading. For example, some programs require a graphics display device.


If the input file is not a sequence file, the file used for the example is shown under this topic heading. The Program Manual entry for the Reformat program describes sequence files; sequence specification is described in the Specifying Sequences section of the User's Guide.


All of the EGCG programs allow complete command line control. Programs allowing complete command line control have a topic heading called COMMAND LINE SUMMARY in their documentation, which is located above the LOCAL DATA FILES topic heading. Command line control is described in detail in the Command Line Control section of the User's Guide.


Most EGCG programs are based on the work of others. Information under this topic heading states where the ideas and the code for each program came from. If there are publications on which the program is based, they usually appear here unless they have already been mentioned under previous topic headings. If the program came from GCG but the code did not originate at GCG, we try to cite the original code's author here. If you feel your work has not been acknowledged correctly, please complain.


Some programs require nonsequence data. This data is normally read into the program automatically from GCG files that are kept in a public data directory with the logical name "GenRunData". For instance, the mapping programs require the file enzyme.dat, which associates restriction enzyme names with their corresponding recognition sites.

Normally these files are read from the public data directory and you need not provide them or even know about them. But if, for any reason, you wish to use your own data, you can use the Fetch program to copy the public version and use an editor to modify it to suit your needs. Programs read your version of a data file in preference to the public one.

Finding Local Data Files

To find required data files, programs first check the command line for a file specification after the qualifier -DATa=. In cases where more than one data file is required, the files may be named after the qualifiers -DATa1= and -DATa2=, and so on.

If the data file is not specified on the command line, the program checks your current directory. If the program does not find the data file there, it checks a directory with the logical name "MyData", if such a directory exists. The presence of a file with a particular name (enzyme.dat, in this example) in either of these two directories implies that you want the program to use your data instead of the public data. If a program still cannot find the data file after all this checking, the program then reads data from the public data file provided with the GCG Package. (Note that all the GCG data files are described in the Data Files volume of the Data Reference Set.)
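
The search order described above can be sketched in Python. This is only an illustration of the lookup sequence, not the actual GCG implementation: the function find_data_file and its parameters are hypothetical, and the logical names "MyData" and "GenRunData" are represented here as ordinary directory paths passed in by the caller.

```python
import os


def find_data_file(name, public_dir, cwd=".", my_data=None, command_line_path=None):
    """Return the path of the data file a program would read.

    Order assumed from the text: command line qualifier, current
    directory, the MyData directory (if any), then the public directory.
    """
    # 1. A file named on the command line (after -DATa=) always wins.
    if command_line_path:
        return command_line_path
    # 2 and 3. A local copy in the current directory or in MyData.
    for directory in (cwd, my_data):
        if directory:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    # 4. Fall back to the public data file.
    return os.path.join(public_dir, name)
```

With no local copy of enzyme.dat anywhere, the sketch returns the public path; placing a copy in the current directory makes that copy take precedence.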

When programs look for data files in the manner described above, the files are referred to as local data files. Local versions of local data files are always optional; you are never required to have one unless the public version is, for some reason, not suitable for your specific needs.

Fetching Local Data Files

If a program reads a local data file, that file is named and described in the second-to-last topic of the program's entry in the Program Manual. When you identify a file that could be provided locally, use the Fetch program to copy it. For this example, the command % fetch enzyme.dat copies the public enzyme data file into your current directory, where you can modify it to suit your needs. The file enzyme.dat itself contains information about its format and how the data should be organized.


Parameters that can only be set from the command line are always described in the last topic of each Program Manual entry. In addition to the options described for each program, most programs accept the general command line options described below.

Many of the options described below can be permanently set with global switches, which are described in the "Global Switches" chapter in the Short Descriptions section of the User's Guide. Note that command line settings override global settings.
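
The precedence rule stated above (command line settings override global settings) can be pictured with a small Python sketch. The function effective_options and the dictionary keys are hypothetical and serve only to show the layering; the sketch further assumes that a program's built-in defaults sit below both.

```python
def effective_options(program_defaults, global_switches, command_line):
    """Merge option layers so that later layers override earlier ones."""
    merged = dict(program_defaults)   # lowest priority: built-in defaults
    merged.update(global_switches)    # global switches override defaults
    merged.update(command_line)       # the command line overrides everything
    return merged
```

For example, with a default of six documentation lines, a global setting of 10, and a command line value of 2, the merged value is 2.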


is used to tell the program to suppress all interaction and to use the program's default values for every parameter that is not explicitly defined on the command line.


names a file the program should use as a command line initializing file.

If a command line initializing file is not specified on the command line, the corresponding EGCG program assumes that a file with the same name as the program and the file extension .init is the command line initializing file.


sets a program to copy any number of file documentation lines you choose. Usually, EGCG and GCG programs copy the first six non-blank lines of input file documentation into output files. The global switch % doclines sets your process to act as if this optional parameter were always on the command line.
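
The line-copying behavior described above can be sketched as follows. This is a minimal illustration, assuming the documentation is already available as a list of text lines; the function name copy_doc_lines is hypothetical and the sketch does not reproduce how GCG actually locates the documentation within a file.

```python
def copy_doc_lines(doc_lines, limit=6):
    """Return the first `limit` non-blank lines, skipping blank ones."""
    kept = []
    for line in doc_lines:
        if len(kept) >= limit:
            break
        if line.strip():  # blank or whitespace-only lines are not copied
            kept.append(line)
    return kept
```

Calling copy_doc_lines(lines) keeps at most six non-blank lines, matching the usual behavior; passing a different limit mimics choosing another number on the command line.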


sets a program to copy all of the input file's documentary heading into output files -- including blank lines.


sets an EGCG or GCG program not to ring the terminal bell, even if an error occurs.


sets a program to accept sequences in Staden format. The global switch % seqformat Staden sets your process to act as if this switch were always on the command line.


prints a summary of the available command line parameters and prompts you for any additions you might wish to make to the command line. The global switch % comcheck sets your process to always show this summary and display this prompt.


suppresses the short banner that introduces each program. The global switch % nodocumentation sets your process to act as if this switch were always on the command line.

EGCG graphics programs work in exactly the same way as those in the GCG Package. If you have not used GCG graphics programs before, you should look at the Graphics section of the GCG User's Guide before continuing with this section.

The options described below apply to all GCG graphics programs. Most plotting programs stop plotting if you use Ctrl-C at the terminal keyboard. Some of the switches described below do not work with every graphics device. A platen unit (pu) in the descriptions below is one percent of the length of the vertical axis. The GCG platen always has at least 150 horizontal (X) platen units and 100 vertical (Y) platen units.
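
Because a platen unit is defined as one percent of the length of the vertical axis, converting platen units to a physical length is simple arithmetic. The Python sketch below shows the conversion; the function name and the 20 cm axis length used in the example are assumptions chosen for illustration, not values from the GCG Package.

```python
def platen_units_to_length(platen_units, vertical_axis_length):
    """Convert platen units to the same length unit as the vertical axis.

    One platen unit is one percent of the vertical axis length.
    """
    return platen_units / 100.0 * vertical_axis_length
```

On a device whose vertical axis is 20 cm long, 100 platen units span the full 20 cm and 150 horizontal platen units span 30 cm.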


writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of drawing the plot on your plotter. The plotting instructions in the text output file can be customized to suit your needs and be plotted out at any time on any graphics device. (See the Figure program in the Program Manual for a description of the plotting instructions in the output file.)

You can set the name of the file on the command line; otherwise, the program makes up a name for it using the name of the program for the file name and .figure for the file name extension.

-NOTEXt or -FASt

suppresses all of the text on the plot. This option can sometimes make plotting faster on devices where character plotting is slow.


draws all text characters on the plot using font 1 (see Appendix I of the Program Manual).


draws the entire plot with the black pen (the pen in stall 1).


sets the line thickness for all of the lines on the plot to 0.5 platen unit. A platen unit is one percent of the vertical height of the platen. Many devices do not support this option.


makes three copies of each page on some laser printers.


draws a grid showing the platen units behind the graph. The first optional parameter sets the grid interval in platen units. If the first optional parameter is negative, the numbering along the bottom axis is suppressed. The second optional parameter sets the grid color.


draws a box or frame on the plot. The first four optional parameters set the position of the box. The fifth optional parameter sets the color. The sixth optional parameter sets the distance between the inner and outer frames. The seventh optional parameter sets the line thickness of the outer frame (on some devices).


If the data points on a line fall outside of the window in which the data are supposed to be represented, most programs will clip the graph at the edge of the window. This switch disables that clipping.

makes a file of PostScript instructions that can be included within a formatted Red document. The Wisconsin Package(TM) must be set to use the PostScript graphics device driver. You can set the name of the file on the command line; otherwise, the program makes up a name for it using the name of the program for the file name and .ps for the file name extension.


GCG graphics programs direct their output to a port or queue to which the logical name "PlotPort" has been assigned. This option lets you direct graphics output to a different port or to a disk file.


lets you choose a pen speed between 1.0 and 10.0 to achieve higher quality plots, for those x-y plotters that allow pen speed selection, so that you can trade speed for quality. Note that 1.0 is the slowest pen speed available and 10.0 is the fastest; the default is usually 10.0.


advances to the second and all subsequent pages of the plot automatically, on plotters equipped with automatic paper feed. The first page must be loaded in the usual manner. Plotters equipped with automatic page feeding must usually be set up locally to enable this feature. For example, the HP7550 must have the auto-feed button pushed and must have paper in the feed tray.

If your plotter is queued or if you are writing the plotting instructions into a file, then the -AUTOFeed option is automatically in effect -- you do not need to use this option.


A few x-y plotters (like the HP7550) let you draw a second picture over the top of an existing plot even if the plotter would normally unload the paper automatically after plotting each page. This option directs such plotters to keep the existing page on the platen so that you can draw the output from a second session on top of the plot from this session. Queued devices and laser printers do not support this option. In addition, this option doesn't work with the X Windows driver.


sends the plot to a graphics output device attached to the terminal's printer port. Many devices (for instance, printers without keyboards) are always configured this way if you have them attached to the terminal, and this switch is unnecessary.

There are a few plotters, however, that can be attached to your terminal either between the computer and terminal or behind the terminal on the terminal's pass-through printer port. For such plotters attached behind the terminal, this switch ensures that your printer port is turned on before any instructions are sent to the plotter.

With the options described below you can expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).


allows you to make the plot larger or smaller by setting a scaling factor that is normally 1.0 to some number larger or smaller than 1.0. A scaling factor of 0.5 for each axis causes the outside dimensions of the plot to be half as long, thus making the plot one-fourth as large.
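
The arithmetic behind the "one-fourth as large" statement can be checked with a short Python sketch. The function names are hypothetical; the sketch simply multiplies each dimension by its scaling factor and the area by the product of the two factors.

```python
def scaled_size(width, height, x_scale=1.0, y_scale=1.0):
    """Return the plot dimensions after applying the scaling factors."""
    return (width * x_scale, height * y_scale)


def area_factor(x_scale=1.0, y_scale=1.0):
    """Return the ratio of the scaled plot area to the original area."""
    return x_scale * y_scale
```

A plot of 150 by 100 platen units scaled by 0.5 on each axis becomes 75 by 50 platen units, and area_factor(0.5, 0.5) is 0.25, that is, one-fourth of the original area.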

-XSCAle=0.7 -YSCAle=0.8

lets you set the scaling factor independently for each axis if you want to change the aspect ratio of the plot. The parameters above compress the horizontal dimension by 30 percent and the vertical dimension by 20 percent.


rotates the plot 90 degrees on the page. (GCG plots are reduced or enlarged automatically to fit on the page.)


moves the plot to the right or left. The parameter 30.0 would move the plot to the right by 30 platen units.


moves the plot up or down. The parameter 30.0 would move the plot up by 30 platen units.

Printed: April 23, 1996 14:05 (1162)