
Blast parsing

Use Biopython to filter the results of a Blast run. The program takes a substring to match against each hit's accession, and it always removes hits with an E-value larger than 1e-20. The input is a Blast report in XML format.
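
The filtering rule can be sketched as a small predicate. This is only a minimal sketch, not the required solution: the names `keep_hit` and `MAX_EVALUE` are made up for illustration, and case-sensitive matching is an assumption. The text itself only fixes the 1e-20 cutoff.

```python
MAX_EVALUE = 1e-20  # cutoff stated in the assignment text


def keep_hit(fields, evalue, pattern, max_evalue=MAX_EVALUE):
    """Return True if a hit should be kept.

    fields  -- strings to search, e.g. the hit's id, definition and accession
    evalue  -- E-value of the hit
    pattern -- substring given by the user (matched case-sensitively here,
               which is an assumption, not something the text specifies)
    """
    if evalue > max_evalue:  # only values larger than the cutoff are removed
        return False
    return any(pattern in field for field in fields)
```

For instance, `keep_hit(["CYTC_MOUSE"], 3.5e-47, "MOUSE")` is true, while the same hit with an E-value of 1e-5 would be dropped.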

Data

Sample test data is provided. Your program should also be able to handle the output of the previous assignment.

Please note: Browsers get confused if you try to look at these XML files in the browser. Right click and download the files directly!

Requirements

Write a Python program that fulfills the following.

  • Your program is an executable Python file taking two inputs:
    1. A command line argument with a string that should match the "Hit_id", "Hit_def", or "Hit_accession" field in the Blast hit list, e.g., "mouse".
    2. A file to read input from. The file contains Blast XML data.
    An extra smile from the tutor if you can read from stdin if the filename is a dash ("-")!  :-)
  • Output goes to stdout and is a table with four columns,
    1. Query accession: "BlastOutput_query-def"
    2. Target accession: Hit_accession
    3. Score
    4. E-value
    in which only sequences with an id, definition ("description"), or accession matching the input substring are listed. There might be several HSPs from the same target accession, but only the one with lowest E-value should be shown.
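
Two of the requirements above, reading from stdin when the filename is a dash and showing only the lowest E-value per target, might be sketched as follows. The helper names `open_input` and `best_per_target` are hypothetical, and the hits are assumed to have already been reduced to plain `(target, score, evalue)` tuples, one per HSP.

```python
import sys


def open_input(filename):
    """Return a readable file object; "-" means standard input."""
    if filename == "-":
        return sys.stdin
    return open(filename)


def best_per_target(hits):
    """Keep only the lowest-E-value entry for each target.

    hits -- iterable of (target, score, evalue) tuples, one per HSP
    Returns a dict mapping target -> (score, evalue), in first-seen order.
    """
    best = {}
    for target, score, evalue in hits:
        # Replace the stored entry only when this HSP has a lower E-value.
        if target not in best or evalue < best[target][1]:
            best[target] = (score, evalue)
    return best
```

If a report contains several queries, the same idea would apply with `(query, target)` as the dictionary key.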

Example session

In this example, the first column has been simplified. The actual accession can be more complicated!


spel-01> ./blastparse MOUSE test1.xml
Warning: No hits
spel-01> ./blastparse MOUSE test2.xml
CST3    CYTC_MOUSE   185.3    3.5e-47
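
The output in the session above could be produced along these lines. This is a sketch under assumptions: the function names are hypothetical, tab-separated columns and the number formats are guessed from the example, and sending the warning to stderr is a choice the text does not specify.

```python
import sys


def format_row(query, target, score, evalue):
    """Format one table line: query, target, score, E-value."""
    return f"{query}\t{target}\t{score:.1f}\t{evalue:.1e}"


def report(rows, out=sys.stdout, err=sys.stderr):
    """Print one line per row, or a warning if there is nothing to print."""
    if not rows:
        # Assumption: the warning is kept off stdout so the table stays clean.
        print("Warning: No hits", file=err)
        return
    for row in rows:
        print(format_row(*row), file=out)
```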

To present:

  1. Your code.
  2. A demonstration of your code running with the output from the last assignment (a search with CST3). What is the best MOUSE hit you find?

Lars Arvestad created the page on 27 October 2016

commented 2 December 2016

Even though I can access alignment.hit_def and alignment.hit_id, I'm having trouble finding the Hit_accession. Is it the same as alignment.accession? It does not seem so, as it is a number, while the example output from test2.xml is CYTC_MOUSE. Is the example output correct?   

Teacher commented 2 December 2016

That number is also an accession, but use the definition line instead.
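
To make this exchange concrete, here is a hedged sketch of how the fields might be read from an alignment object such as those produced by Biopython's `Bio.Blast.NCBIXML` parser. The attributes `hit_def`, `accession`, `hsps`, `expect` and `bits` are Biopython's. Taking the first word of the definition line as the target name, and using the bit score as "Score", are assumptions based on the example output. The function name `summarize_alignment` is hypothetical.

```python
def summarize_alignment(alignment):
    """Return (target, score, evalue) for the best HSP of one alignment.

    `alignment` is expected to look like a Biopython NCBIXML alignment:
    it has .hit_def (str), .accession (str) and .hsps, a list of objects
    with .expect (E-value) and .bits (bit score).
    """
    # Assumption: the first word of the definition line is the readable
    # name (e.g. "CYTC_MOUSE"); alignment.accession may be just a number.
    words = alignment.hit_def.split()
    target = words[0] if words else alignment.accession
    # The HSP with the lowest E-value represents the alignment.
    best = min(alignment.hsps, key=lambda hsp: hsp.expect)
    return target, best.bits, best.expect
```

With Biopython installed, the alignments would typically come from iterating over `NCBIXML.parse(handle)` and then over each record's `alignments` list.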