Version created by Lars Arvestad 2015-11-30 19:02

Remote Blast

In this assignment, you will use BioPython to run a Blast job at NCBI.

Data

Download the human CST3 protein sequence.

Preliminaries

  • Figure out what Blast does and how it works.
  • Go to the NCBI Blast web page (find it yourself...) and start a Blast comparison of CST3 against the so-called non-redundant protein database, nr. This means that your comparison should use the "blastp" subprogram (protein query against a protein database).
  • Find the highest-scoring hit in mouse:
    1. What is the E-value, and what does that mean?
    2. Look at the actual alignment. How alike are the sequences?

Programming

Write a Python program that conducts a Blast search of a given protein sequence against the nr database at NCBI. There is good support for this in BioPython.

Requirements

  • Your program is an executable Python file taking one input: a file containing a protein sequence.
  • The output is the Blast report in XML format, written to stdout.
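
As a starting point, the requirements above could be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes that the installed BioPython provides Bio.Blast.NCBIWWW.qblast and that the machine has network access to NCBI. The function names read_query, remote_blast_xml and main are made up for illustration.

```python
"""remoteblast: sketch of a remote blastp search against nr at NCBI."""
import sys


def read_query(path):
    """Return the text of the sequence file; FASTA text is passed on as-is."""
    with open(path) as handle:
        return handle.read()


def remote_blast_xml(query, program="blastp", database="nr"):
    """Submit the query to NCBI and return the XML report as a string.

    Assumption: Bio.Blast.NCBIWWW.qblast is available in the installed
    BioPython. The call blocks until NCBI has finished the search.
    """
    # Imported here so the file can be loaded even without BioPython.
    from Bio.Blast import NCBIWWW

    result = NCBIWWW.qblast(program, database, query, format_type="XML")
    try:
        return result.read()
    finally:
        result.close()


def main(argv):
    """Run the search for the file named in argv[1]; return an exit status."""
    if len(argv) != 2:
        sys.stderr.write("usage: remoteblast <sequence-file>\n")
        return 2
    sys.stdout.write(remote_blast_xml(read_query(argv[1])))
    return 0
```

To turn the sketch into an executable, add a "#!/usr/bin/env python3" line at the top, call sys.exit(main(sys.argv)) under the usual __name__ == "__main__" guard at the bottom, and make the file executable with chmod +x. A remote search can take a while, so expect the program to wait before any output appears.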

To present:

  1. You should be able to give a brief explanation of Blast.
  2. What is the E-value, and what does that mean?
  3. You should understand the online output from a Blast run.
  4. Your code for the remoteblast program.
  5. Demonstrate a successful run of remoteblast.

Example session

Usage of your program should be something like this:


orange-01> ./remoteblast cst3.fa
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_version>blastx 2.2.6 [Apr-09-2003]</BlastOutput_version>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., et al (1997), "Gapped 
BLAST and PSI-BLAST: a new generation of protein database search~programs",  
Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
...

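Output trimmed for brevity! Note that "orange-01>" is the command-line prompt. The sample shows blastx and sprot; for this assignment, the report should instead show blastp and nr.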
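
One way to check which program and database a report names is to read its header fields with Python's standard xml.etree.ElementTree module. The sketch below uses a shortened, closed-off copy of the sample output; the element names are taken from that sample, while SAMPLE and report_header are names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample report, closed so that it is well-formed XML.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
</BlastOutput>
"""


def report_header(xml_text):
    """Return the program, database and query name found in a Blast XML report."""
    root = ET.fromstring(xml_text)
    return {
        "program": root.findtext("BlastOutput_program"),
        "database": root.findtext("BlastOutput_db"),
        "query": root.findtext("BlastOutput_query-def"),
    }


header = report_header(SAMPLE)
print(header["program"], header["database"], header["query"])
```

Running the same function on the output of your own remoteblast lets you confirm that the search really used the intended program and database.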