Version created by Lars Arvestad 2015-11-30 19:02
Remote Blast
In this assignment, you will use BioPython to run a Blast job at NCBI.
Data
Download the human CST3 protein sequence.
Preliminaries
- Figure out what Blast does and how it works.
- Go to the NCBI Blast web page (find it yourself...) and start a Blast comparison with CST3 against the so-called non-redundant protein database: nr. This means that your comparison should use the "blastp" subprogram (protein query against protein DB).
- Find the highest-scoring hit in mouse.
- What is the E-value, and what does that mean?
- Look at the actual alignment. How alike are the sequences?
Programming
Write a Python program that conducts a Blast search of a given protein sequence against the nr database at NCBI. There is good support for this in BioPython.
Requirements
- Your program is an executable Python file taking one input: a file containing a protein sequence.
- Output is the Blast report in XML format, written to stdout.
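As a starting point, the requirements above could be met along the following lines. This is a minimal sketch, not a reference solution: it assumes Biopython is installed and uses its `Bio.Blast.NCBIWWW.qblast` function, which accepts the FASTA text directly as the query. The names `remote_blast_xml` and the injectable `qblast` parameter are made up for this illustration, and newer Biopython releases may prefer a different entry point for the same service.

```python
#!/usr/bin/env python3
"""Sketch of remoteblast: send a protein sequence to NCBI Blast, print the XML."""
import sys


def remote_blast_xml(fasta_path, qblast=None):
    """Return the Blast XML report for the sequence stored in fasta_path.

    qblast is a hypothetical hook so the function can be tried without
    network access; by default Biopython's NCBIWWW.qblast is used.
    """
    if qblast is None:
        # Imported lazily so the file still loads where Biopython is missing.
        from Bio.Blast import NCBIWWW
        qblast = NCBIWWW.qblast

    with open(fasta_path) as fh:
        fasta_text = fh.read()

    # blastp = protein query against a protein database; nr = non-redundant.
    handle = qblast("blastp", "nr", fasta_text, format_type="XML")
    try:
        return handle.read()
    finally:
        handle.close()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.stdout.write(remote_blast_xml(sys.argv[1]))
    else:
        sys.stderr.write("usage: remoteblast <protein.fa>\n")
```

Saved as `remoteblast` and made executable (for example with `chmod +x remoteblast`), it would be called as `./remoteblast cst3.fa`. A remote search can take a while, since the job is queued at NCBI.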
To present:
- You should be able to give a brief explanation of Blast.
- What is the E-value, and what does that mean?
- You should understand the online output from a Blast run.
- Your code for the remoteblast program.
- Demonstrate a successful run of remoteblast.
Example session
Usage of your program should be something like this:
orange-01> ./remoteblast cst3.fa
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_version>blastx 2.2.6 [Apr-09-2003]</BlastOutput_version>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., et al (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
  ...
Output trimmed for brevity! Note that "orange-01>" is the command-line prompt.
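To get a feel for what the XML report contains, it can help to pull out the hit descriptions and E-values. The sketch below uses only Python's standard `xml.etree.ElementTree` module and assumes the report follows the NCBI BlastOutput layout, with `Hit`, `Hit_def` and `Hsp_evalue` elements. The function name `list_hits` and its `organism` filter are made up for this illustration, and matching on the definition line is a simplification.

```python
import xml.etree.ElementTree as ET


def list_hits(xml_text, organism=None):
    """Return (hit definition, smallest E-value) pairs from a Blast XML report.

    If organism is given, keep only hits whose definition line mentions it.
    Hits are returned in the order they appear in the report.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        definition = hit.findtext("Hit_def", default="")
        if organism and organism not in definition:
            continue
        # A hit may contain several alignments (HSPs); keep the best E-value.
        evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
        if evalues:
            hits.append((definition, min(evalues)))
    return hits
```

For example, `list_hits(report, "Mus musculus")` would list the hits whose description mentions mouse, where `report` is the XML text printed by remoteblast.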