
Changes between two versions

Showing changes in "Remote Blast" between 2015-11-02 11:18 by Lars Arvestad and 2015-11-30 19:02 by Lars Arvestad.

Remote Blast

In this assignment, you will use BioPython to run a Blast job at NCBI.

Data
Download the human CST3 protein sequence.

Preliminaries
* Figure out what Blast does and how it works.
* Go to the NCBI Blast web page (find it yourself...) and start a Blast comparison of CST3 against the so-called non-redundant protein database, nr. This means that your comparison should use the "blastp" subprogram (protein query against a protein database).
* Find the highest-scoring hit in mouse.
* What is the E-value, and what does that mean?
* Look at the actual alignment. How alike are the sequences?
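
As a rough aid for the E-value question, the sketch below uses the standard bit-score relation E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score. The function name and all numbers are hypothetical and chosen only for illustration; real Blast reports use effective lengths, so the values will not match an actual run exactly.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance alignments scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical sizes, not taken from a real nr search.
query_length = 150           # residues in the query
database_length = 10 ** 9    # total residues in the database

for bit_score in (30, 50, 100):
    e_value = expected_hits(bit_score, query_length, database_length)
    print(f"bit score {bit_score:>3}: E = {e_value:.3g}")
```

Each additional bit of score halves the E-value, while doubling the database size doubles it. This is why the same alignment can look less significant against a larger database.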

Programming
Write a Python program that conducts a Blast search of a given protein sequence against the nr database at NCBI. There is good support for this in BioPython.

Requirements
* Your program is an executable Python file taking one input: a file containing a protein sequence.
* The output is the Blast report in XML format, written to stdout.
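
One possible shape for such a program is sketched below. It is a minimal outline rather than a reference solution. It assumes that the installed BioPython version provides `Bio.Blast.NCBIWWW.qblast` and that the input file holds a single FASTA-formatted protein sequence. The helper names `read_query` and `remote_blastp` are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch of a remoteblast-style script (assumes BioPython is installed)."""
import sys


def read_query(path):
    """Return the contents of the sequence file as one string."""
    with open(path) as handle:
        return handle.read()


def remote_blastp(query, database="nr"):
    """Submit a blastp search to NCBI and return the XML report as text."""
    # Imported here so that read_query works even without BioPython.
    from Bio.Blast import NCBIWWW

    result_handle = NCBIWWW.qblast("blastp", database, query, format_type="XML")
    try:
        return result_handle.read()
    finally:
        result_handle.close()


def main(path):
    """Run the search for the sequence in path and print the report."""
    sys.stdout.write(remote_blastp(read_query(path)))
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    sys.stderr.write("usage: remoteblast <protein-sequence-file>\n")
```

A remote search can take several minutes, and the script stays silent until the full report has been returned. Redirecting stdout to a file makes the report easier to inspect afterwards.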

To present:
* You should be able to give a brief explanation of Blast.
* What is the E-value, and what does that mean?
* You should understand the online output from a Blast run.
* Your code for the remoteblast program.
* Demonstrate a successful run of remoteblast.

Example session
Usage of your program should be something like this:

orange-01> ./remoteblast cst3.fa
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_version>blastx 2.2.6 [Apr-09-2003]</BlastOutput_version>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., et al (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
...

Output trimmed for brevity! Note that "orange-01>" is the command-line prompt.
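
To get a feel for the structure of the XML report, the header fields from the example session can be read with Python's standard `xml.etree.ElementTree` module. The sketch below works on a hand-completed fragment: the closing `</BlastOutput>` tag was added so that the sample is well-formed, and `SAMPLE_REPORT` and `report_header` are names invented for this illustration. A real report also contains the hits and alignments, which are not covered here.

```python
import xml.etree.ElementTree as ET

# Abbreviated header in the style of the example session, closed by hand.
SAMPLE_REPORT = """<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
</BlastOutput>
"""


def report_header(xml_text):
    """Return selected BlastOutput_* header fields as a dictionary."""
    root = ET.fromstring(xml_text)
    fields = ("program", "db", "query-ID", "query-def")
    return {name: root.findtext(f"BlastOutput_{name}") for name in fields}


for name, value in report_header(SAMPLE_REPORT).items():
    print(f"{name}: {value}")
```

Checking the program and database fields this way is a quick sanity test that a run used the intended subprogram and database.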