
Remote Blast

In this assignment, you will use BioPython to run a Blast job at NCBI.

Data

Download the human CST3 protein sequence.

Preliminaries

  • Figure out what Blast does and how it works.
  • Go to the NCBI Blast web page (find it yourself...) and start a Blast comparison of CST3 against the so-called non-redundant protein database, nr. This means that your comparison should use the "blastp" subprogram (protein query against protein DB).
  • Find the highest-scoring hit in mouse.
    1. What is the E-value, and what does that mean?
    2. Look at the actual alignment. How alike are the sequences?
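
As a starting point for the E-value question, the sketch below assumes the commonly quoted relation between a bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function name and the numbers are hypothetical and not taken from a real Blast run; real reports use corrected ("effective") lengths, so reported values will differ somewhat.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value for a bit score, assuming E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Made-up lengths, only to show how the E-value shrinks as the score grows.
    for score in (30.0, 40.0, 50.0):
        print(score, expected_hits(score, query_len=150, db_len=10**9))
```

Under this relation, each additional bit of score halves the number of equally good hits expected by chance, and a larger database raises it proportionally.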

Programming

Write a Python program that conducts a Blast search of a given protein sequence against the nr database at NCBI. There is good support for this in BioPython.

Requirements

  • Your program is an executable Python file taking one input: a file containing a protein sequence.
  • The output is the Blast report in XML format, written to stdout.
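
One possible shape for such a program is sketched below. It assumes BioPython's `Bio.Blast.NCBIWWW.qblast` function is available and that it returns a file-like handle. The helper names (`read_sequence_text`, `run_blast`) and the `blast_fn` parameter are illustrative choices, not part of the assignment. Treat it as a starting point rather than a reference solution.

```python
#!/usr/bin/env python3
"""Sketch of a remoteblast script: blastp of one protein sequence against nr."""
import sys


def read_sequence_text(path):
    """Return the raw contents of the sequence file (e.g. a FASTA record)."""
    with open(path) as handle:
        return handle.read()


def run_blast(sequence_text, blast_fn=None):
    """Submit the sequence with blastp against nr and return the XML report."""
    if blast_fn is None:
        # Imported lazily so the helpers can be tried without BioPython.
        from Bio.Blast import NCBIWWW
        blast_fn = NCBIWWW.qblast
    # Arguments are program, database, sequence; XML is requested explicitly.
    result_handle = blast_fn("blastp", "nr", sequence_text, format_type="XML")
    try:
        report = result_handle.read()
    finally:
        result_handle.close()
    # Assumption: some versions may hand back bytes instead of text.
    if isinstance(report, bytes):
        report = report.decode("utf-8")
    return report


def main(argv):
    if len(argv) != 2:
        sys.stderr.write("usage: remoteblast <protein.fa>\n")
        return 1
    sys.stdout.write(run_blast(read_sequence_text(argv[1])))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Passing the search function in as `blast_fn` makes it possible to try the script with a stand-in function, without contacting NCBI on every run.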

To present:

  1. You should be able to give a brief explanation of Blast.
  2. What is the E-value, and what does that mean?
  3. You should understand the online output from a Blast run.
  4. Your code for the remoteblast program.
  5. Demonstrate a successful run of remoteblast.

Example session

Usage of your program should be something like this:


orange-01> ./remoteblast cst3.fa
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_version>blastx 2.2.6 [Apr-09-2003]</BlastOutput_version>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., et al (1997), "Gapped 
BLAST and PSI-BLAST: a new generation of protein database search~programs",  
Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
...

Output trimmed for brevity! Note that "orange-01>" is the command-line prompt.
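
To get a feel for the structure of such a report, the header fields can be read with Python's standard `xml.etree.ElementTree` module. The sketch below uses a shortened copy of the example output above. The function name `report_header` is hypothetical, and a full report contains many more elements than are handled here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example report, closed so that it is well-formed.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastx</BlastOutput_program>
  <BlastOutput_db>sprot</BlastOutput_db>
  <BlastOutput_query-def>CST3</BlastOutput_query-def>
</BlastOutput>
"""


def report_header(xml_text):
    """Map the simple top-level BlastOutput_* fields to their text values."""
    prefix = "BlastOutput_"
    root = ET.fromstring(xml_text)
    header = {}
    for child in root:
        # Skip elements that contain nested elements instead of plain text.
        if child.tag.startswith(prefix) and len(child) == 0:
            header[child.tag[len(prefix):]] = (child.text or "").strip()
    return header


if __name__ == "__main__":
    for key, value in report_header(SAMPLE).items():
        print(key, "=", value)
```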

Lars Arvestad created the page on 27 October 2016

commented on 23 November 2016

We get a 403 error when trying to access the database by importing qblast on the school computers, but not on our personal ones. However, everything works when we copy the NCBIWWW source code into our script.

Has anyone else had the 403 error and managed to solve it?

Teacher commented on 25 November 2016

It might be that the installed version of BioPython is too old. NCBI may have changed the underlying protocol. Has anyone successfully managed to run this on a school computer?
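
One way to check which BioPython version a given machine has is to print the package's `__version__` attribute, as in this minimal sketch. The function name `biopython_version` is made up for illustration, and the check only reports the version; it does not confirm that an old version is what causes the 403 error.

```python
def biopython_version():
    """Return the installed BioPython version string, or None if it is missing."""
    try:
        import Bio
    except ImportError:
        return None
    return getattr(Bio, "__version__", "unknown")


if __name__ == "__main__":
    print("BioPython version:", biopython_version() or "not installed")
```

Running this on both a school computer and a personal one shows whether the two installations differ.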
