General purpose UNIX software
Phylogenetic Analysis
Other UNIX software
Free Macintosh Software
Commercial Macintosh & PC Software
(which is available through the Bioinformatics Resource)
WWW versions of Programs
The following databases are available locally
For more information on the GCG versions of these databases, click here.
Here are some Databases on the Net.
There are several different methods you can use to search the databases
| GCG PROGRAM | WHAT IT DOES | WHEN TO USE IT |
| | Searches the local databases for keywords, author names, sequence names | When you want to obtain a specific sequence |
| | Retrieves a sequence from the database | When you have an accession or locus number |
| | Searches the local databases for similar sequences | When you have a DNA or protein sequence |
| | Searches the local or the remote database at NCBI for similar sequences | When you have a DNA or protein sequence |
| | Searches for protein motifs | When you want to know the function of a protein |
| | Searches for sequence patterns or searches the TFD | When you want to look for a specific sequence motif in your sequence |
| | Searches the databases for a pattern created by profilemake | When you want to find sequences that may be similar to your sequence |
In GCG, the strategy is
You can also search by using the NCBI web site or any of the other database web sites.
| Search Program | What it does |
| | Combination of GenBank and the Molecular Biology subset of Medline |
| | Search the sequence databases |
The local Decypher search program
| Advantages | Disadvantages |
| Very fast. Uses the Smith-Waterman algorithm, which inserts gaps to optimize the sequence alignment. Most thorough search. | May not have the most recent databases |
You can use the program DBWatcher on PMGM to schedule automatic searches of the database once a week or so.
If you want to send multiple sequences to a server for searching, visit the BLASTALL web page for instructions on how to do this and interpret the results.
There are several different methods to send a sequence to the National Center for Biotechnology Information. You do not also need to send the sequence to EMBL, because GenBank and EMBL exchange information.
For more information about submitting sequences to GenBank, visit this site.
Sequence analysis programs and databases each use their own data format. For example, in order to use any sequence file in GCG, it must first be converted into GCG format.
If you use Microsoft Word to edit your sequence file, you should save that file as "TEXT ONLY". Do not save the file in Microsoft Word or Normal format. The sequence analysis programs cannot understand this format.
The easiest way to move a sequence from a Macintosh or a PC into your PMGM account is to use an FTP program. With a Macintosh, you would use the program called Fetch.
For more detailed information about exchanging sequence information, click here.
The best way to create a multiple sequence alignment is to use the PileUp or ClustalW program in GCG. For more information on Clustal, and its applicability and limitations, see TIBS Oct98, page 403.
Once the alignment has been created by the computer, it is possible to manually edit this alignment using the SeqLab editor or by using LineUp.
This multiple alignment file (msf file) can then be sent to phylogenetic analysis programs, or the alignment can be sent to programs which will let you create a nicer looking display of your multiple alignment.
For example, to display your alignment file nicely:
1. run Pretty on PMGM, or
2. run prettybox on PMGM (here is a detailed example), or
3. Convert a GCG MSF file to Excel format
4. The SeqWeb PileUp alignment program gives a nice colored output.
5. MacBox
On PMGM, there is also the X-Clustal program. There is also a Macintosh version of ClustalX available on MacArchives. These programs expect sequences that have been gathered together into one big "fasta" formatted file; you cannot input sequences one at a time. Other Mac and PC alignment programs are also available. These include MacVector and DNAstar (Multalign).
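As a rough illustration of what "one big fasta formatted file" means, here is a minimal Python sketch that gathers several sequences into a single multi-record FASTA text. The function name `write_fasta`, the record names, and the 60-character line width are assumptions made for this example; they are not part of ClustalX or GCG.

```python
import io

def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs to handle as one multi-record FASTA file."""
    for name, seq in records:
        # Keep letters only: this drops the spaces and position numbers
        # that often come along when a sequence is pasted from a report.
        cleaned = "".join(ch for ch in seq if ch.isalpha()).upper()
        handle.write(f">{name}\n")
        # Wrap the sequence at a fixed width (60 is a common convention).
        for start in range(0, len(cleaned), width):
            handle.write(cleaned[start:start + width] + "\n")

# Hypothetical example records, one of them pasted with spaces and numbers.
records = [
    ("seq1", "atggcc attgta 12 atgggc"),
    ("seq2", "ATGGCCATTGTT"),
]

buffer = io.StringIO()
write_fasta(records, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a handle opened with `open("all_sequences.fasta", "w")`. Because the sketch keeps letters only, it is meant for unaligned sequences; gap characters would be removed.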
In GCG, there are two main programs for comparing two sequences:
GAP - This program does a "global" alignment, and tries to insert gaps to align one sequence completely on top of the other.
BestFit - Does a local alignment, and will find the region of the two sequences that has the best alignment.
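The difference between a global and a local alignment can be sketched with a small dynamic-programming example. This is a simplified illustration, not the GCG implementation: it uses a single linear gap penalty and made-up scores (`match=1`, `mismatch=-1`, `gap=-2`), whereas real programs use scoring matrices and separate gap creation and extension penalties. It returns only the scores, not the alignments themselves.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]   # global (end-to-end) table
    loc = [[0] * cols for _ in range(rows)]    # local (best region) table

    # Global alignment must pay for gaps at the ends of the sequences.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # Local alignment may restart at zero instead of going negative.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    # Global score is the last cell; local score is the best cell anywhere.
    return glob[-1][-1], best_local

print(alignment_scores("ACGT", "ACGT"))
print(alignment_scores("TTTTACGTTTTT", "GGACGTGG"))
```

In the second call the two sequences share only a short common region. The local score rewards that region alone, while the global score is pulled down by the gaps and mismatches needed to line up the sequences end to end.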
Programs like fasta can be used for sequence comparison, but they are best used for searching entire databases.
We also have programs like MUMmer that can compare entire genomes. Here's a link to the Web version of MUMmer
GCG Evolutionary Analysis Programs
1. Perform the multiple alignment using PileUp or Clustal
2. Manually adjust the alignment if necessary. It is very important to start with a correct alignment. You can do this by using the GCG program LineUp, or better yet, move the "msf" file over to the SeqLab Editor. Sometimes you have biological knowledge that you need to incorporate into the alignment. For example, you might not want to put a gap in the middle of an alpha helix, but instead, move the gap into a loop or a turn in the protein sequence.
3. Calculate the evolutionary distances using the Distances program.
4. Pass the results of the Distances program over to the GrowTree program.
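To give a feel for what "evolutionary distance" means in step 3, here is a minimal Python sketch that computes the uncorrected distance (the fraction of differing sites) between aligned nucleotide sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). It is an illustration only: the function names and the example alignment are assumptions, and the sketch does not reproduce the output or options of the Distances program.

```python
import math

GAP_CHARS = {"-", ".", "~"}

def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap character are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a nucleotide p-distance."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical aligned sequences (all the same length).
alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "AC-TACCA",
}

names = sorted(alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        p = p_distance(alignment[first], alignment[second])
        print(first, second, round(p, 4), round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the uncorrected one, because it estimates substitutions hidden by multiple changes at the same site. A tree-building program then works from the full table of pairwise distances.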
You can also use parsimony methods using the PAUP programs within GCG
GDE within GCG
You can switch over to the SeqLab editor, and access the Phylip phylogenetic software. You can find all this software under the GDE menu while in the SeqLab Editor.
You can also run the Phylip software separately, but it's a bit more difficult.
Here is a discussion of some potential strategies for doing phylogenetic analysis.
There are several different programs you can use to predict the function of an unknown protein or gene. I suggest using them all.
1. Search the sequence databases
|
Program |
Advantages/Disadvantages |
|
Fast, Most recent database - May not find all sequences of interest |
|
More thorough / Databases may be 2 wks old/ No EST databases |
|
|
Fast - Most thorough search / Database may be 2 months old/ No EST Databases |
2. Search for Motifs within your unknown protein
3. Search for Blocks within your unknown protein
4. Identify - Search for sequence patterns within your unknown protein
The best strategy is to use the programs together, especially BLAST, Decypher and Identify. Comparing the results from the different programs will give you the best insight into the function of your unknown protein.
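Motif and pattern searches of this kind boil down to scanning a sequence for a short pattern. The sketch below shows the idea using a PROSITE-style pattern converted to a Python regular expression. It is an assumption-laden illustration, not how any of the programs above are implemented: the helper names are made up, only a simple subset of the PROSITE syntax is handled, and the protein sequence is invented. The example pattern, N-{P}-[ST]-{P}, is the PROSITE N-glycosylation site pattern.

```python
import re

# One pattern element: a residue, [allowed set] or {forbidden set},
# optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a regular expression."""
    parts = []
    for element in pattern.strip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":                 # x means any residue
            core = "."
        elif residue.startswith("{"):      # {P} means anything except P
            core = "[^" + residue[1:-1] + "]"
        else:                              # a literal residue or [ST] set
            core = residue
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

def find_motifs(sequence, pattern):
    """Return (1-based position, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]

# Hypothetical protein fragment.
print(find_motifs("MKNGSAANPTL", "N-{P}-[ST]-{P}"))
```

A hit from a short pattern like this is only a hint. Short motifs occur by chance in many proteins, which is one reason to compare the results from several programs.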
Given a genomic sequence, there are several different computer programs that you can use to predict
These programs are not 100% accurate, but they can predict some of the coding regions within a genomic sequence.
Programs are available locally, such as GCG's Frames and GrailPro (which works in SeqWeb), as well as on the web. Try a couple of different programs; Grail EXP and GenScan seem to work the best.
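The simplest form of coding-region prediction is to look for open reading frames in all six translation frames. The Python sketch below shows that idea only. It is a deliberately naive example with assumed names and an assumed minimum length, and it knows nothing about introns, splice sites, or codon usage, which is why dedicated gene-finding programs do much better on genomic sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Find ATG...stop open reading frames in all six frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based
    on the strand that was scanned, and end is just past the stop codon.
    ORFs that run off the end of the sequence without a stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the frame
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3))
                    start = None                   # look for the next ATG
    return orfs

# Hypothetical short sequence with one ORF: ATG AAA TTT GGG TAA.
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

In practice `min_codons` would be set much higher (for example 100) to filter out the many short reading frames that occur by chance.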
On PMGM, from an X-Window terminal, you can run the program X-Grail.
The recommended method is to use MacX, but the alternative methods can be seen on these web pages
Currently we have the following software
Online analysis of expression arrays using Significance Analysis
Don't forget to check out the Stanford Microarray Database
There are several other classes about sequence analysis at Stanford
and also out on The Net.
Also, check out the BioCompanion