The Transcription Factor Database


What is it?

How do I search it?

What kind of output do I get?

Are there any other Web sites?


The Transcription Factor Database (TFD) is a database of the DNA recognition sequences for eukaryotic and prokaryotic sequence-specific transcription factors.

Here are some example entries from the TFD:

Name             Sequence             ! Comments
UAS(G)-pMH100    CGGAGTACTGTCCTCCG    ! J Mol Biol 209: 423-32 (1989)
TFIIIC-Xls-50    TGGATGGGAG           ! EMBO J 6: 3057-63 (1987)
HSE_CS_inver0    CTNGAANNTTCNAG       ! Cell 30: 517-28 (1982)
ZDNA_CS 0        GCGTGTGCA            ! Nature 303: 674-9 (1983)
GCN4-his3-180    ATGACTCAT            ! Science 234: 451-7 (1986)


This is the format GCG uses for the data files read by its pattern-matching programs; the restriction enzyme and Prosite files, for example, are in this format. As a result, you can search this database with programs such as Map, Motifs and FindPatterns.
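
To make this layout concrete, here is a minimal Python sketch (not GCG code) that splits one entry line into its name, its recognition pattern and the comment after "!". It assumes whitespace-separated fields as in the examples above and skips purely numeric columns such as the "0" in the ZDNA_CS entry. The names TfdEntry and parse_entry are hypothetical, and the sketch ignores other parts of a GCG data file, such as the header text that comes before the entries.

```python
from typing import NamedTuple, Optional

# Letters allowed in a recognition pattern (IUPAC nucleotide codes).
IUPAC_LETTERS = set("ACGTURYKMSWBDHVN")


class TfdEntry(NamedTuple):
    name: str
    pattern: str
    comment: str


def parse_entry(line: str) -> Optional[TfdEntry]:
    """Split one data line into name, pattern and the comment after '!'.

    Returns None for lines that do not look like an entry (e.g. blank lines).
    """
    data, _, comment = line.partition("!")
    fields = data.split()
    if len(fields) < 2:
        return None
    name = fields[0]
    # Skip numeric columns such as the "0" in the ZDNA_CS entry and take the
    # first field made only of IUPAC letters as the recognition pattern.
    for field in fields[1:]:
        if set(field.upper()) <= IUPAC_LETTERS:
            return TfdEntry(name, field.upper(), comment.strip())
    return None


entry = parse_entry("GCN4-his3-180 ATGACTCAT ! Science 234: 451-7 (1986)")
print(entry.name, entry.pattern, entry.comment)
# prints: GCN4-his3-180 ATGACTCAT Science 234: 451-7 (1986)
```

Taking the first all-IUPAC field as the pattern is a heuristic chosen only to cope with the optional numeric columns seen in these examples.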

There is not much annotation in this database. To find out more about a site, you have to look up the original literature reference. You can also search for the name of the factor on the main TFD gopher site, which gives a little more information.


Searching the TFD

You can use GCG programs such as Map, MapSort, Motifs and FindPatterns to search this database.

The TFD is stored in the genrundata subdirectory on PMGM. To search it, start the program with the command shown below; the -data option tells the program to use the TFD file instead of its default data file.

Program        What to type
Map            map -data=genrundata:tfd.dat
Motifs         motifs -data=genrundata:tfd.dat
FindPatterns   findpatterns -data=genrundata:tfd.dat

You can also copy this file to your own directory with Fetch and modify it to make your own version of the database. FindPatterns can also be used to search for one specific sequence pattern.
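
FindPatterns and Map do the matching for you, but the idea behind such a search can be sketched briefly. The Python below is an illustration, not GCG's implementation: it turns a TFD pattern written in IUPAC ambiguity codes (N for any base, K for G or T, Y for C or T, and so on) into a regular expression and reports matches on both strands, since a site can lie on either strand of the DNA. The function names pattern_to_regex and find_sites are hypothetical.

```python
import re
from typing import List

# IUPAC nucleotide ambiguity codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pattern_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern such as TKNNGYAAK into a regex."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_sites(sequence: str, pattern: str) -> List[int]:
    """Return sorted 1-based positions of matches on either strand.

    Each position is the leftmost base of the match on the top strand.
    """
    seq = sequence.upper()
    # A zero-width lookahead lets overlapping matches be reported.
    regex = re.compile("(?=" + pattern_to_regex(pattern) + ")")
    hits = {m.start() + 1 for m in regex.finditer(seq)}
    # Bottom strand: search the reverse complement, then map each hit
    # back to top-strand coordinates.
    revcomp = seq.translate(COMPLEMENT)[::-1]
    n, k = len(seq), len(pattern)
    hits |= {n - (m.start() + k) + 1 for m in regex.finditer(revcomp)}
    return sorted(hits)


seq = "ATTACCCCAGAGATTCACCAGAGATTCCAGATACCAGAGACTACCCATTTACCCGAGGGG"
print(find_sites(seq, "TKNNGYAAK"))  # prints [1]
```

The sequence used here is the first 60 bases of the example in the Map output further down this page. In it, the C/EBP_CS1 pattern TKNNGYAAK matches only on the bottom strand, over bases 1 to 9. The position this sketch reports is the leftmost base of the match, which need not be the position Map marks in its display.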


Output

Here is an example of the output you get from the Map program:

 (Linear) MAP of: test.seq  check: 2851  from: 1  to: 117
 
 REFORMAT of: test.seq  check: 2851  from: 1  to: 117  April 3, 1997 11:06
 (No documentation)
 
 Using Enzyme data from: genrundata:tfd.dat  FileCheck: 2301
 
This file is a composite from the following datasets:
TFD (release 7.6) SITES dataset file, 2/97
Transfac (release 3.1) SITES dataset selected entries, 3/97
References:     Nucleic Acids Res 21, 3117-8 (1993).
                Nucleic Acids Res 24, 238-41 (1996).
                In Transcription Factors:  Essential Data (Chichester UK:  J Wiley and Sons),
 
 
 With 5092 enzymes: *
 
                             December 23, 1997 22:04  ..
 
                                                    LyF/Ikaros_site
         C/EBP_CS1         GATA-1_CS2   REB1-consensus            |
                 |                  |                |            |
         ATTACCCCAGAGATTCACCAGAGATTCCAGATACCAGAGACTACCCATTTACCCGAGGGG
       1 ---------+---------+---------+---------+---------+---------+ 60
         TAATGGGGTCTCTAAGTGGTCTCTAAGGTCTATGGTCTCTGATGGGTAAATGGGCTCCCC
 
                                             STE6.2
                                          UBP1_RS |
                                    GAGA-en     | |
                        GAGA_box/CT_element     | |
                      NIT2-niaD-niiA_(1)  |     | |
                            Knirps_site|  |     | |
   AP-2_CS4              GATA-1_CS2   ||  |     | |
          |                       |   ||  |     | |
         GAAAAAAAAATTAGACCCCAGGATTTAGATACCCAGAGAGAGATTTACACCATATTA
      61 ---------+---------+---------+---------+---------+------- 117
         CTTTTTTTTTAATCTGGGGTCCTAAATCTATGGGTCTCTCTCTAAATGTGGTATAAT
 
 Enzymes that do cut:
 
 C/EBP_CS1  UBP1_RS AP-2_CS4   STE6.2  GAGA-en GATA-1_CS2 NIT2-niaD-niiA_(1)
 REB1-consensus GAGA_box/CT_element Knirps_site LyF/Ikaros_site


To find out more about a particular transcription factor site, you can look up its entry in the database file. For example, to get information about the "C/EBP_CS1" site, type the commands that follow the prompts in the session below:

pmgm:~ 58% to genrundata
/gcgv9/gcgcore/data/rundata
pmgm:/gcgv9/gcgcore/data/rundata 59 % grep C/EBP_CS1 tfd.dat
C/EBP_CS1                      0 TKNNGYAAK                                     0
 ! C/EBP                     Genes Dev 1: 133-46 (1987)

The first command (to genrundata) moves you to the directory that holds the TFD file, tfd.dat. The second uses grep to search the file for the name "C/EBP_CS1". grep prints the matching entry: the site name, its recognition sequence (TKNNGYAAK, written with IUPAC ambiguity codes) and the literature reference that describes the site. For anything more, you now need to visit the library.
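
If you would rather script this lookup, the plain substring search that grep performs here can be sketched in a few lines of Python. The function name find_entries is hypothetical, and the sketch assumes tfd.dat is a plain text file with each entry on a single line, so it returns whole matching lines just as grep does.

```python
from typing import List


def find_entries(path: str, name: str) -> List[str]:
    """Return every line of a data file that contains `name`.

    Like a plain grep, this is a substring search, so "C/EBP_CS1" would
    also match a longer name that begins with it.
    """
    # latin-1 decodes any byte, so an unusual character in an old data
    # file cannot stop the search.
    with open(path, encoding="latin-1") as handle:
        return [line.rstrip("\n") for line in handle if name in line]
```

For example, calling find_entries("tfd.dat", "C/EBP_CS1") from the genrundata directory should return the same C/EBP_CS1 entry that grep printed above.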



Other Sites with TFD information

Searching for Regulatory Elements with GCG

GCG's TFD Site

TRANSFAC Database

Pattern Searching Techniques