fasta format


The fasta format is a very common plain-text format for nucleotide and protein sequence files.

Here is an example of a fasta-formatted file:

>VECTOR32    Synthetic vector sequence #32
ATGAGCGGCGGCCCCATGGGCGGCAGGCCCGGCGGCAGGGGCGCCCCCGCCGTGCAGCAG
AACATCCCCAGCACCCTGCTGCAGGACCACGAGAACCAGAGGCTGTTCGAGATGCTGGGC
AGGAAGTGCCTGACCCTGGCCACCGCCGTGGTGCAGCTGTACCTGGCCCTGCCCCCCGGC
GCCGAGCACTGGACCAAGGAGCACTGCGGCGCCGTGTGCTTCGTGAAGGACAACCCCCAG

 

The main parts of the file are:

  1. The first line starts with a > character.
  2. The next 10 characters after the > are the file name (VECTOR32 in the example, padded with spaces).
  3. The rest of the first line is descriptive text; there is only one line of it.
  4. Everything after the first line is sequence.
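
The layout listed above can be sketched in Python as follows. This is only an illustration, not part of any of the programs mentioned on this page: the function name parse_single_fasta and the name_width parameter are hypothetical, and the fixed 10-character name field is taken from the description above (many other tools instead treat everything up to the first space as the name).

```python
def parse_single_fasta(text, name_width=10):
    """Split one fasta record into (name, description, sequence).

    Assumption: the name occupies the first `name_width` characters
    after '>', as described above; the rest of that line is description.
    """
    lines = [line.rstrip() for line in text.strip().splitlines()]
    header = lines[0]
    if not header.startswith(">"):
        raise ValueError("a fasta record must start with '>'")

    body = header[1:]                      # drop the leading '>'
    name = body[:name_width].strip()       # fixed-width name field
    description = body[name_width:].strip()

    # Everything after the first line is sequence; join the wrapped lines.
    sequence = "".join(line.strip() for line in lines[1:] if line.strip())
    return name, description, sequence


record = """>VECTOR32    Synthetic vector sequence #32
ATGAGCGGCGGC
CCCATGGGCGGC
"""
name, description, sequence = parse_single_fasta(record)
print(name)
print(description)
print(len(sequence))
```

The sequence lines are joined into one string because the line breaks in the file are only there for readability.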

You can also have multiple sequences in one file. If you are using the Clustal multiple sequence alignment program, it expects its input sequences in a single fasta-format file like this:

>VECTOR32    Synthetic vector sequence #32
ATGAGCGGCGGCCCCATGGGCGGCAGGCCCGGCGGCAGGGGCGCCCCCGCCGTGCAGCAG
AACATCCCCAGCACCCTGCTGCAGGACCACGAGAACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR33    Synthetic vector sequence #33
ACGAGCGGCGGTCCCATGGGCGCCAGGCCCGGCGGCAGGGGCGCTGCCGCCGTGCAGCAC
ATCATCCCCAGCACCCTGCAGCAGGACCACGAGTACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR34    Synthetic vector sequence #34
GTGAGCGGCGGCTACTTGGGCGGCAGGCCCGGCGGCAGGGGCGCCCACGCCGTGCAGCAG
CACATCCCCAGCACCCTGCCTCAGGACCACGAGAACCATTTGCTGTTCGAGATGCTGGGT
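
A multi-sequence file like the one above can be read by starting a new record at every line that begins with >. The following is a minimal Python sketch under that assumption; the function name read_fasta_records is hypothetical and is not how Clustal or any other program mentioned here reads its input.

```python
def read_fasta_records(text):
    """Return a list of (header, sequence) tuples from fasta text.

    Each '>' line starts a new record; the lines that follow, up to
    the next '>' line, are joined into that record's sequence.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]              # header text without the '>'
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:                 # flush the final record
        records.append((header, "".join(chunks)))
    return records


example = """>VECTOR32    Synthetic vector sequence #32
ATGAGCGGCGGC
AACATCCCCAGC
>VECTOR33    Synthetic vector sequence #33
ACGAGCGGCGGT
"""
for header, sequence in read_fasta_records(example):
    print(header, len(sequence))
```

Lines that appear before the first > are skipped in this sketch; a stricter reader might treat them as an error instead.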

You can use programs such as tofasta to convert a GCG-formatted file to fasta format, or fromfasta to convert a fasta-formatted file to GCG format (if the fasta file has multiple sequences, they are saved as separate GCG files). Readseq can also perform these conversions. There is more information about converting sequences from one format to another.