BLAST/PSI-BLAST/HMM Protein Family Homework
Doug Brutlag
For this homework assignment, take between 10 and 30 protein sequences from a single, highly homogeneous protein family and:
1) Make an HMM with them using the Decypher computer. The easiest way to do this is to use a single sequence as a query in a Smith-Waterman or BLAST database search of UniProt/Swiss-Prot, and then to choose 10-30 sequences from the list of similar sequences that are at least 30% identical. The proteins should also be nearly the same length (within 10%). If the proteins are not nearly the same length, they might differ in the number of domains or be highly variable at the N- or C-terminal ends. In either case, it is better to find a family whose members are nearly the same length as well as at least 30% identical.
To connect to the Decypher computer, you must first set up a VPN connection to the Stanford VPN server. The Decypher computer is behind a firewall, and only VPN connections are permitted. Please go to http://vpn.stanford.edu/ to find out how to set up a VPN link for your computer (Windows, Mac or Linux). Always establish the VPN connection before you try to connect to Decypher.
You must also use the Chrome or Firefox browser (Safari and Internet Explorer will not work) and choose the Web Graphic for Iterated Search output option on the Decypher computer to permit an HMM to be made. Click the ALGN selection buttons (not the HIT buttons) to select your protein family. This will limit the multiple alignment and the HMM to just the homologous regions in your family. Try to stay within the pink/magenta-colored sequences and do not include any blue-colored sequences.
2) Use the HMM to search the UniProt/Swiss-Prot database for additional members of the family (i.e., Swiss-Prot only, not the combined Swiss-Prot/TrEMBL database).
3) Take one member of your protein family and perform a two- or three-iteration PSI-BLAST search of Swiss-Prot. You may do this search either on the NCBI BLAST Web Page or on the Decypher Tera-Blast, again with Web Graphic for Iterated Search.
4) Take the same member of your protein family and perform a standard BLAST search of Swiss-Prot.
5) Answer the following questions:
(A) How many proteins are in the statistically significant range in each search?
(B) Are there more statistically significant sequences in the BLAST search, in the PSI-BLAST search, or in the HMM search?
(C) How homogeneous are the results from each search?
(D) Which search includes multiple protein families?
(E) Which search is best for distinguishing a single highly homogeneous protein family?
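Questions (A) and (B) amount to counting the hits below a significance cutoff in each search and comparing the counts. A minimal Python sketch of that bookkeeping follows. It assumes you have already copied the E-values out of each search report by hand. The cutoff of 0.001, the function names, and the E-values in the usage example are illustrative assumptions, not values prescribed by this assignment or produced by any real search.

```python
def count_significant(evalues, threshold=0.001):
    """Count hits whose E-value is at or below the cutoff.

    The default threshold is an assumption for illustration only.
    """
    return sum(1 for e in evalues if e <= threshold)


def compare_searches(results, threshold=0.001):
    """Return per-search counts and the name of the search with the most hits.

    `results` maps a search name to the list of E-values taken from its report.
    """
    counts = {name: count_significant(ev, threshold) for name, ev in results.items()}
    most_hits = max(counts, key=counts.get)
    return counts, most_hits


# Placeholder E-values, made up purely to show the call shape.
example_results = {
    "BLAST": [1e-50, 2e-10, 0.5],
    "PSI-BLAST": [1e-80, 1e-40, 1e-5, 3.0],
    "HMM": [1e-60, 0.02],
}
counts, most_hits = compare_searches(example_results)
```

Counting alone does not answer (C) through (E): a search can report more significant hits simply because it pulls in members of other families, so the hit descriptions still need to be inspected.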
Since the Decypher computer has been retired, this homework assignment can no longer be performed.
Please include sufficient output from your analyses (copy and paste or screen dumps) to support each of your answers/conclusions.
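The selection criteria in step 1 (at least 30% identity and a length within 10%) can be expressed as a simple filter over a hit list. The sketch below is only an illustration. It assumes the 10% length window is measured against the query sequence, and the `Hit` record, the `select_family` function, and the sequence names and numbers in the example are hypothetical and do not come from Decypher or any real search output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str                # placeholder label, not a real accession
    percent_identity: float  # percent identity reported by the search
    length: int              # sequence length in residues


def select_family(query_length, hits, min_identity=30.0,
                  length_tolerance=0.10, max_members=30):
    """Keep hits that meet the identity and length criteria from step 1."""
    low = query_length * (1 - length_tolerance)
    high = query_length * (1 + length_tolerance)
    kept = [h for h in hits
            if h.percent_identity >= min_identity and low <= h.length <= high]
    # Prefer the most similar sequences if more than max_members qualify.
    kept.sort(key=lambda h: h.percent_identity, reverse=True)
    return kept[:max_members]


# Made-up hits for a hypothetical 300-residue query.
hits = [
    Hit("seqA", 85.0, 305),
    Hit("seqB", 28.0, 298),  # rejected: below 30% identity
    Hit("seqC", 45.0, 410),  # rejected: length differs by more than 10%
    Hit("seqD", 62.0, 280),
]
selected = select_family(300, hits)
print([h.name for h in selected])
```

If fewer than 10 sequences survive the filter, the family is probably too small or too variable for this exercise, and a different query sequence may be a better starting point.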