Exchanging Sequence Data


Each sequence analysis program has its own format for storing sequence data.

Popular formats include GenBank.

So, if you download a sequence from GenBank, a program like GCG can't automatically read the sequence and analyze it. You first have to convert the sequence into a format that your sequence analysis software can understand. The GCG SeqLab Sequence Editor can import a variety of formats, as can the SeqWeb interface to GCG.
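To give a feel for what such a conversion involves, here is a minimal Python sketch that turns one GenBank flat-file record into FASTA. It is only an illustration, not how GCG or readseq do it: the function name genbank_to_fasta and the DEMO1 record are made up for this example, and it assumes a single record with a LOCUS line, an ORIGIN line, and a closing "//".

```python
def genbank_to_fasta(text, width=60):
    """Convert one GenBank flat-file record (given as a string) to FASTA text."""
    name = "unknown"
    residues = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The second field of the LOCUS line is the entry name.
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
        elif line.startswith("ORIGIN"):
            # Everything after ORIGIN, up to "//", is sequence.
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Sequence lines look like "        1 gatcctccat atacaacggt";
            # keep the letters, drop the position numbers and spaces.
            residues.append("".join(ch for ch in line if ch.isalpha()))
    seq = "".join(residues)
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"


# Made-up record, for illustration only.
example = """LOCUS       DEMO1                 24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Made-up record for illustration only.
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""
print(genbank_to_fasta(example), end="")
```

A real converter also has to deal with multiple records per file, feature tables, and protein entries, which is why a dedicated conversion program is the better choice for everyday work.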

Most sequence analysis packages include programs that convert sequence data from one format to another.

One recent program for Windows is SeqVerter from GeneStudio.

Another popular program is readseq. Not only can it convert from one database format to another, but it can do document and feature table parsing, which is becoming essential in sequence manipulation.

readseq can convert data between a variety of formats, such as:

         1. IG/Stanford           10. Olsen (in-only)
         2. GenBank/GB            11. Phylip3.2
         3. NBRF                  12. Phylip
         4. EMBL                  13. Plain/Raw
         5. GCG                   14. PIR/CODATA
         6. DNAStrider            15. MSF
         7. Fitch                 16. ASN.1
         8. Pearson/Fasta         17. PAUP/NEXUS
         9. Zuker (in-only)       18. Pretty (out-only)
 

To start readseq from PMGM or CMGM, just type

readseq

This runs the old version. To run the newer Java version, type

java -cp readseq.jar run [options] input-file(s)

For more details: java -cp readseq.jar help more

 

For more info on Readseq, visit the online help pages: Basic Readseq Help and Advanced Readseq Help.

Part of readseq is built into the GCG SeqLab editor. While in the "Editor" you can import sequences in FASTA, GenBank and GDE format, and they will be converted automatically using readseq.

You can also use the Web Interface to Readseq.

If you are going to send a sequence to someone and you don't know what format they can read, it is best to convert it to GenBank format. Almost all programs can read or convert this format.

Other popular GCG commands include

If you have a TEXT file, and you want to convert to GCG format,

For more help, check the GCG Help Page by typing

genmanual

after logging into GCG, or search for the command you want.

If you don't see the conversion routine you want, check out the EMBOSS software.

There are a lot of specialized programs for extracting information from databases.

 


Molecular Modeling File Formats

Just as sequence analysis software has several different file formats, molecular modeling software also has a variety of formats. The common format is PDB (Protein Data Bank format), and this is the format you get when you download a structure from the Brookhaven Database. However, this format contains only molecular coordinates and some temperature information. It holds none of the information about how the molecule is displayed, such as color, style, orientation, or zoom. Thus, modeling software needs its own data format to store this information. Many of the modeling programs can read other file formats, but we also have the "babel" software on pmgm, which can read and write a variety of molecular modeling file formats.
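To show what a PDB coordinate record actually holds, here is a minimal Python sketch that pulls the fixed-column fields out of one ATOM line. The function name parse_atom_line and the coordinate values are made up for this example. The column positions follow the standard PDB ATOM record layout, and the sample line stops at column 66, after the temperature factor.

```python
def parse_atom_line(line):
    """Extract the fixed-column fields of one PDB ATOM/HETATM record."""
    return {
        "record": line[0:6].strip(),         # columns 1-6
        "serial": int(line[6:11]),           # columns 7-11
        "atom": line[12:16].strip(),         # columns 13-16
        "residue": line[17:20].strip(),      # columns 18-20
        "chain": line[21],                   # column 22
        "res_seq": int(line[22:26]),         # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
        "occupancy": float(line[54:60]),     # columns 55-60
        "temp_factor": float(line[60:66]),   # columns 61-66
    }


# Made-up ATOM record, assembled field by field so the widths are visible.
line = (
    "ATOM  "      # record name
    "    1"       # atom serial number
    " "
    " N  "        # atom name
    " "           # alternate location indicator
    "MET"         # residue name
    " "
    "A"           # chain identifier
    "   1"        # residue sequence number
    "    "        # insertion code plus three blank columns
    "  27.340"    # x coordinate
    "  24.430"    # y coordinate
    "   2.614"    # z coordinate
    "  1.00"      # occupancy
    "  9.67"      # temperature factor
)

atom = parse_atom_line(line)
print(atom["atom"], atom["residue"], atom["x"], atom["y"], atom["z"], atom["temp_factor"])
```

Note that nothing in the record says how the atom should be drawn. Color, style, and orientation have to come from the modeling program's own format.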

 

Babel will read the following file types:

 

Babel will write the following file types:


Always check the results of a sequence conversion to see if it has done what you want it to do. Sometimes, the header information is accidentally included in with the sequence information.
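One quick way to do this check is sketched below in Python. It scans a FASTA file's sequence lines and reports any that contain characters outside the IUPAC nucleotide codes, which is what leaked header text usually looks like. The function name suspicious_lines and the sample data are made up for this example, and the check assumes an ungapped nucleotide sequence. A protein or gapped alignment file would need a different allowed set.

```python
# IUPAC nucleotide codes (including U for RNA and the ambiguity codes).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def suspicious_lines(fasta_text, allowed=IUPAC_NUCLEOTIDES):
    """Return (line_number, line) pairs for sequence lines with unexpected characters."""
    problems = []
    for number, line in enumerate(fasta_text.splitlines(), start=1):
        # Skip FASTA title lines and blank lines.
        if line.startswith(">") or not line.strip():
            continue
        if any(ch.upper() not in allowed for ch in line.strip()):
            problems.append((number, line))
    return problems


# Made-up conversion result in which a header line leaked into the sequence.
converted = ">DEMO1\nLOCUS DEMO1 24 bp\ngatcctccatatacaacggtatct\n"
print(suspicious_lines(converted))
```

An empty list means no line looked out of place. It does not prove the conversion is correct, so it is still worth looking at the first and last few lines of the output yourself.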

For more help with GCG, visit our local GCG help page.

For help with the Intelligenetics file conversion program, check out this page.

