Each sequence analysis program has its own format for storing sequence data.
Popular formats include Genbank
So, if you download a sequence from Genbank, a program like GCG can't automatically read the sequence and analyze it. You first have to convert the sequence into a format that your sequence analysis software can understand. The GCG SeqLab Sequence Editor can import a variety of formats as can the SeqWeb Interface to GCG.
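To make the idea of a conversion concrete, here is a minimal Python sketch that turns one Genbank flat-file record into FASTA. It is not how GCG or any other package does it; the function name `genbank_to_fasta` and the example record are made up for illustration, and the sketch assumes a single record with a LOCUS line and an ORIGIN section.

```python
# Hypothetical example record, trimmed to the lines the sketch needs.
EXAMPLE_RECORD = """LOCUS       TEST0001                  20 bp    DNA
DEFINITION  Made-up record for illustration only.
ORIGIN
        1 gatcctccat atacaacggt
//
"""


def genbank_to_fasta(genbank_text, line_width=60):
    """Convert one Genbank flat-file record to a FASTA string."""
    name = "unknown"
    residues = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]          # second field is the locus name
        elif line.startswith("ORIGIN"):
            in_origin = True              # sequence lines follow
        elif line.startswith("//"):
            break                         # end of record
        elif in_origin:
            # Drop the leading position number and the spacing blanks.
            residues.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(residues).upper()
    wrapped = [sequence[i:i + line_width]
               for i in range(0, len(sequence), line_width)]
    return ">" + name + "\n" + "\n".join(wrapped) + "\n"
```

Calling `genbank_to_fasta(EXAMPLE_RECORD)` gives a two-line FASTA entry named after the LOCUS field. Note that everything in the header and feature table is thrown away, which is why a dedicated converter is usually the better choice.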
Most sequence analysis software has programs that can interconvert sequence data.
One recent program for Windows is SeqVerter from GeneStudio.
Another popular program is readseq. Not only can it convert from one database format to another, but it can do document and feature table parsing, which is becoming essential in sequence manipulation.
readseq can convert data from a variety of formats, such as:
1. IG/Stanford
2. GenBank/GB
3. NBRF
4. EMBL
5. GCG
6. DNAStrider
7. Fitch
8. Pearson/Fasta
9. Zuker (in-only)
10. Olsen (in-only)
11. Phylip3.2
12. Phylip
13. Plain/Raw
14. PIR/CODATA
15. MSF
16. ASN.1
17. PAUP/NEXUS
18. Pretty (out-only)
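If you are not sure which of these formats a file is in, the first non-blank line is often a good hint. The Python sketch below guesses among a few of the formats from that line alone. It is a rough heuristic written for illustration, not the detection logic readseq uses, and the function name `guess_sequence_format` is hypothetical.

```python
def guess_sequence_format(text):
    """Guess a sequence file format from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip leading blank lines
        # NBRF entries start with ">", a two-letter code, then ";".
        if line.startswith(">") and len(line) > 3 and line[3] == ";":
            return "NBRF"
        if line.startswith(">"):
            return "Pearson/Fasta"
        if line.startswith("LOCUS"):
            return "GenBank/GB"
        if line.startswith("ID   "):
            return "EMBL"
        if line.upper().startswith("#NEXUS"):
            return "PAUP/NEXUS"
        return "unknown"                  # only the first real line is checked
    return "unknown"
```

A FASTA header that happens to have a semicolon as its fourth character would be misread as NBRF, so treat the answer as a hint and look at the file yourself when it matters.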
To start readseq from PMGM or CMGM, just type
readseq
This runs the older version. To run the newer Java version, type
java -cp readseq.jar run [options] input-file(s)
For more details: java -cp readseq.jar help more
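If you need to run the Java version on many files from a script, the command line above can be assembled in Python. This is only a sketch: the helper names and the `jar_path` parameter are made up for illustration, and any options are passed through exactly as you supply them, so check the readseq help for the flags your version accepts.

```python
import subprocess


def build_readseq_command(input_files, options=(), jar_path="readseq.jar"):
    """Assemble 'java -cp readseq.jar run [options] input-file(s)' as a list."""
    return ["java", "-cp", jar_path, "run", *options, *input_files]


def run_readseq(input_files, options=(), jar_path="readseq.jar"):
    """Run readseq and return the completed process (raises on failure)."""
    command = build_readseq_command(input_files, options, jar_path)
    # Assumes 'java' is on the PATH and jar_path points at the readseq jar.
    return subprocess.run(command, capture_output=True, text=True, check=True)
```

Passing the command as a list rather than one long string means file names with spaces do not need any special quoting.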
For more info on Readseq, visit the online help pages: Basic Readseq Help and Advanced Readseq Help.
Part of readseq is built into the GCG SeqLab editor. While in the "Editor" you can import sequences in fasta, GenBank and GDE format, and they will automatically be converted using readseq.
You can also use the Web Interface to Readseq
If you are going to send a sequence to someone and you don't know what format they can read, it is best to convert it to Genbank format. Almost all programs can understand or convert this format.
Other popular GCG commands include
| What to Type | What it Does |
|---|---|
|  | Change from Genbank to GCG format |
|  | Change from Intelligenetics to GCG format |
|  | Convert a file with only sequence info to GCG format |
|  | Change from PIR protein format to GCG |
|  | Change GCG to PIR protein format |
|  | Convert GCG to FASTA format |
| fromfasta | Convert FASTA to GCG format |
If you have a TEXT file, and you want to convert to GCG format,
For more help, check the GCG Help Page by typing
genmanual
after logging into GCG, or search for the command you want.
If you don't see the conversion routine you want, check out the EMBOSS software.
There are a lot of specialized programs for extracting information from databases.
Just as sequence analysis software has several different file formats, molecular modeling software also has a variety of formats. The most common is PDB (Protein Data Bank format), which is the format you get when you download a structure from the Brookhaven Database. However, this format contains only molecular coordinates and some temperature information; it holds none of the information about how the molecule is displayed, such as color, style, orientation, or zoom. Modeling software therefore needs its own data format to store this information. Many modeling programs can read other file formats, and we also have the "babel" software on pmgm, which can read and write a variety of molecular modeling file formats.
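To show what "coordinates and some temperature info" looks like in practice, here is a small Python sketch that pulls those fields out of the ATOM and HETATM lines of a PDB file. PDB is a fixed-column format, so the fields are read by position. The function name `parse_pdb_atoms` and the example line are made up for illustration, and the sketch ignores everything else in the file.

```python
# Hypothetical ATOM line, built piece by piece to show the column layout.
EXAMPLE_LINE = (
    "ATOM  "      # columns 1-6: record type
    "    1 "      # columns 7-12: atom serial number, then a blank
    " N   "       # columns 13-17: atom name, then alternate-location flag
    "MET "        # columns 18-21: residue name, then a blank
    "A"           # column 22: chain identifier
    "   1    "    # columns 23-30: residue number, insertion code, blanks
    "  27.340"    # columns 31-38: x coordinate
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: temperature factor
)


def parse_pdb_atoms(pdb_text):
    """Return a list of dicts, one per ATOM or HETATM line."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "residue": line[17:20].strip(),
                "chain": line[21],
                "x": float(line[30:38]),
                "y": float(line[38:46]),
                "z": float(line[46:54]),
                "temp_factor": float(line[60:66]),
            })
    return atoms
```

Nothing in these columns says how the atom should be drawn, which is why each modeling program keeps display settings in a format of its own.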
Babel will read the following file types :
Babel will write the following file types :
Always check the results of a sequence conversion to make sure it did what you wanted. Sometimes the header information is accidentally included with the sequence information.
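One quick way to spot stray header text is to look for characters that cannot be part of a sequence. The Python sketch below flags such lines for a nucleotide sequence. It is a simple heuristic written for illustration: the name `find_suspect_lines` is hypothetical, the alphabet is the IUPAC nucleotide codes, and a header word made up entirely of those letters would slip through.

```python
# IUPAC nucleotide codes, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTUNRYSWKMBDHV")


def find_suspect_lines(sequence_lines, alphabet=IUPAC_NUCLEOTIDES):
    """Return (line_number, text) pairs for lines with non-sequence characters."""
    suspects = []
    for number, line in enumerate(sequence_lines, start=1):
        residues = line.replace(" ", "").upper()
        if any(ch not in alphabet for ch in residues):
            suspects.append((number, line))
    return suspects
```

For example, `find_suspect_lines(["ACGTACGT", "DEFINITION test", "NNRY acgt"])` reports only the second line. For protein sequences you would pass a different `alphabet`.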
For more help with GCG, visit our local GCG help page.
For help with the Intelligenetics file conversion program, check out this page.