
Changes between two versions

This page shows the changes to "Blast parsing" between 2015-11-02 11:18 by Lars Arvestad and 2015-11-30 19:02 by Lars Arvestad.

Blast parsing

Use Biopython to filter the results from a Blast run. You should be able to specify a substring that will match the accession, and you can assume that you always want to remove any hits with an E-value larger than 1e-20. The input is a Blast report in XML format.
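The E-value cutoff could be sketched as follows. This is only an illustration: it assumes HSP objects that expose an `expect` attribute (as the HSPs in records yielded by Biopython's `Bio.Blast.NCBIXML.parse` do), the names `E_VALUE_CUTOFF` and `passes_cutoff` are made up here, and `SimpleNamespace` objects stand in for real HSPs so that the snippet runs without a Blast report.

```python
from types import SimpleNamespace

E_VALUE_CUTOFF = 1e-20  # hits with a larger E-value are removed


def passes_cutoff(hsp, cutoff=E_VALUE_CUTOFF):
    """Return True if the HSP's E-value is at most the cutoff."""
    return hsp.expect <= cutoff


# Stand-ins for HSP objects; the second E-value is made up for illustration.
hsps = [SimpleNamespace(expect=3.5e-47), SimpleNamespace(expect=2e-5)]
kept = [hsp for hsp in hsps if passes_cutoff(hsp)]
```

Here `kept` holds only the first stand-in, since 2e-5 is larger than the cutoff.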

Data

Here is some sample test data. Your program should, however, also handle the output of the previous assignment.

Please note: Browsers get confused if you try to view these XML files directly. Right-click and download the files instead!


* test1.xml contains 3 hits and none is from mouse.
* test2.xml contains 11 hits, of which one is from mouse.
* Blast result file is in XML format.
Requirements

Write a Python program that fulfills the following.


Your program is an executable Python file taking two inputs:

* A command line argument with a string that should match the "Hit_id", "Hit_def", or "Hit_accession" field in the Blast hit list, e.g., "mouse".
* A file to read input from. The file contains Blast XML data.

An extra smile from the tutor if you can read from stdin when the filename is a dash ("-")! :-)
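The two inputs could be handled along these lines. This is a sketch, not a reference solution: `open_input` and `hit_matches` are hypothetical helper names, hits are assumed to expose `hit_id`, `hit_def`, and `accession` attributes (as Biopython's NCBIXML alignment objects do), and matching is made case-insensitive only as an assumption, since the assignment does not say whether "mouse" and "MOUSE" should be treated alike.

```python
import sys
from types import SimpleNamespace


def open_input(filename):
    """Return a readable handle; a dash means standard input."""
    if filename == "-":
        return sys.stdin
    return open(filename)


def hit_matches(hit, needle):
    """True if needle occurs in the hit's id, definition, or accession."""
    needle = needle.lower()  # assumption: case-insensitive matching
    fields = (hit.hit_id, hit.hit_def, hit.accession)
    return any(needle in field.lower() for field in fields)


# Stand-in hit with made-up field values, for illustration only.
example_hit = SimpleNamespace(
    hit_id="sp|EXAMPLE|CYTC_MOUSE",
    hit_def="Example definition line",
    accession="EXAMPLE",
)
found = hit_matches(example_hit, "mouse")
```

With a real report, the handle returned by `open_input` would be passed to the Blast XML parser, and `hit_matches` would be applied to each hit in turn.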
Output goes to stdout and is a table with four columns:

* Query accession: "BlastOutput_query-def"
* Target accession: "Hit_accession"
* Score
* E-value

Only sequences with an id, definition ("description"), or accession matching the input substring are listed. There might be several HSPs from the same target accession, but only the one with the lowest E-value should be shown.

Example session

In this example, the first column has been simplified. The actual accession can be more complicated!

spel-01> ./blastparse MOUSE test1.xml
Warning: No hits
spel-01> ./blastparse MOUSE test2.xml
CST3 CYTC_MOUSE 185.3 3.5e-47

To present:


* Your code.
* A demonstration of your code running with the output from the last assignment (a search with CST3). What is the best MOUSE hit you find?
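
The output rule in the requirements (one row per target accession, showing only the HSP with the lowest E-value) could be sketched as follows. The helper names `best_hsp_rows` and `format_row` are hypothetical, hits are assumed to expose `accession` and `hsps`, HSPs are assumed to expose `score` and `expect`, and the tab separator and E-value formatting are guesses, since the assignment does not fix an exact layout.

```python
from types import SimpleNamespace


def best_hsp_rows(query, hits):
    """Build one (query, accession, score, E-value) row per target accession."""
    best = {}
    for hit in hits:
        for hsp in hit.hsps:
            current = best.get(hit.accession)
            # Keep only the HSP with the lowest E-value for each accession.
            if current is None or hsp.expect < current.expect:
                best[hit.accession] = hsp
    return [(query, acc, hsp.score, hsp.expect) for acc, hsp in best.items()]


def format_row(row):
    """Format a row as four tab-separated columns (separator is an assumption)."""
    query, accession, score, evalue = row
    return f"{query}\t{accession}\t{score}\t{evalue:.2g}"


# Stand-in hit with two HSPs; the weaker HSP's values are made up.
hits = [
    SimpleNamespace(
        accession="CYTC_MOUSE",
        hsps=[
            SimpleNamespace(score=50.0, expect=1e-25),
            SimpleNamespace(score=185.3, expect=3.5e-47),
        ],
    )
]
rows = best_hsp_rows("CST3", hits)
lines = [format_row(row) for row in rows]
```

If `rows` comes back empty, the program would print a warning such as the "Warning: No hits" line in the example session instead of a table.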