Version created by Lars Arvestad, 2015-11-02 11:18
Blast parsing
Use BioPython to filter the results from a Blast run. You want to be able to specify a substring that will match the accession, and you can assume that you always want to remove any hits with an E-value larger than 1e-20. The input is a Blast report in XML format.
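The filtering rule described above can be sketched as follows. This is only an illustration under stated assumptions, not a reference solution: the function names (`hit_matches`, `filter_hits`, `read_blast_xml`) are hypothetical, matching is assumed to be case-insensitive (the assignment does not say), and the bit score is assumed to be the "score" to report. The attribute names `query`, `alignments`, `hit_id`, `hit_def`, `accession`, `hsps`, `bits`, and `expect` are those of the record objects produced by Biopython's `Bio.Blast.NCBIXML` parser; the demo at the end uses made-up stand-in objects so that the filter can be tried without an XML file.

```python
from types import SimpleNamespace

E_VALUE_CUTOFF = 1e-20  # hits with a larger E-value are removed


def hit_matches(alignment, pattern):
    """True if pattern is a substring of hit_id, hit_def, or accession.

    Case-insensitive matching is an assumption of this sketch.
    """
    needle = pattern.lower()
    fields = (alignment.hit_id, alignment.hit_def, alignment.accession)
    return any(needle in (field or "").lower() for field in fields)


def filter_hits(records, pattern, cutoff=E_VALUE_CUTOFF):
    """Yield (query, accession, bit score, E-value) for every kept HSP."""
    for record in records:
        for alignment in record.alignments:
            if not hit_matches(alignment, pattern):
                continue
            for hsp in alignment.hsps:
                if hsp.expect <= cutoff:
                    yield record.query, alignment.accession, hsp.bits, hsp.expect


def read_blast_xml(path):
    """Yield Blast records from an XML report (requires Biopython)."""
    from Bio.Blast import NCBIXML  # imported lazily; not needed for the demo
    with open(path) as handle:
        yield from NCBIXML.parse(handle)


# Made-up stand-in records that mimic the parser's attributes.
demo_records = [SimpleNamespace(query="CST3", alignments=[
    SimpleNamespace(hit_id="id1", hit_def="made-up hit", accession="CYTC_MOUSE",
                    hsps=[SimpleNamespace(bits=185.3, expect=3.5e-47)]),
    SimpleNamespace(hit_id="id2", hit_def="made-up hit", accession="CYTC_HUMAN",
                    hsps=[SimpleNamespace(bits=300.0, expect=1e-80)]),
    SimpleNamespace(hit_id="id3", hit_def="made-up hit", accession="WEAK_MOUSE",
                    hsps=[SimpleNamespace(bits=40.0, expect=1e-5)]),
])]
demo_hits = list(filter_hits(demo_records, "mouse"))
```

With a real report, `filter_hits(read_blast_xml("test2.xml"), "mouse")` would be the corresponding call, assuming Biopython is installed.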
Data
Here is some sample test data. However, your program should also handle the output of the previous assignment.
Please note: Browsers get confused if you try to look at these XML files in the browser. Right click and download the files directly!
- test1.xml contains 3 hits and none is from mouse.
- test2.xml contains 11 hits, of which one is from mouse.
- Blast result file is in XML format.
Requirements
Write a Python program that fulfills the following.
- Your program is an executable Python file taking two inputs:
  - A command line argument with a string that should match the "Hit_id", "Hit_def", or "Hit_accession" field in the Blast hit list, e.g., "mouse".
  - A file to read input from. The file contains Blast XML data.
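One possible way to read the two inputs is sketched below. It assumes both are given as positional command line arguments, as in the example session; the helper name `parse_args` and the usage message are hypothetical. To make the file executable, it would typically start with a line such as `#!/usr/bin/env python3` and be marked executable with `chmod +x`.

```python
import sys


def parse_args(argv):
    """Return (pattern, xml_path) taken from argv, or exit with a usage message."""
    if len(argv) != 3:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Usage: {argv[0]} PATTERN BLAST_XML_FILE")
    return argv[1], argv[2]
```

In the program itself this would be called as `pattern, xml_path = parse_args(sys.argv)`.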
Output goes to stdout and is a table with four columns:
- Query accession: "BlastOutput_query-def"
- Target accession: Hit_accession
- Score
- E-value
Example session
In this example, the first column has been simplified. The actual accession can be more complicated!
spel-01> ./blastparse MOUSE test1.xml
Warning: No hits
spel-01> ./blastparse MOUSE test2.xml
CST3 CYTC_MOUSE 185.3 3.5e-47
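The output seen in the session above could be produced along the following lines. This is a sketch under assumptions the assignment does not settle: columns are separated by tabs, the score is printed with one decimal, the E-value in scientific notation, and the warning is written to stderr so that stdout contains only the table. The names `format_row` and `print_table` are hypothetical.

```python
import sys


def format_row(query, accession, score, evalue):
    """Format one table row; tab separation is an assumption of this sketch."""
    return f"{query}\t{accession}\t{score:.1f}\t{evalue:.1e}"


def print_table(rows, out=sys.stdout, err=sys.stderr):
    """Print each (query, accession, score, evalue) row; warn if there are none.

    Returns the number of rows printed.
    """
    count = 0
    for row in rows:
        print(format_row(*row), file=out)
        count += 1
    if count == 0:
        # Sending the warning to stderr is an assumption, not a stated requirement.
        print("Warning: No hits", file=err)
    return count
```

For instance, `print_table([("CST3", "CYTC_MOUSE", 185.3, 3.5e-47)])` prints a single row with the four values from the example.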
To present:
- Your code.
- A demonstration of your code running with the output from the last assignment (a search with CST3). What is the best MOUSE hit you find?