Version created by Lars Arvestad 2015-11-02 11:18

Advanced assignment: Controlling Phylip programs

Completing this assignment will raise your grade one step.

The purpose of this assignment is to practice using a couple of advanced Python modules: tempfile together with subprocess (both in the standard library) or the third-party pexpect. With such modules, you can use Python as a glue language.

Write a Python program that reads a protein alignment in FASTA format and runs a Phylip bootstrap analysis. The filename and the number of bootstrap replicates should be given as command-line arguments, and the output should be a Newick tree written to stdout.
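As a starting point, the command-line handling could be sketched as below. This is only one possible layout: the function names parse_args and main are hypothetical, and all error messages except the one shown in the example session under Requirements are made-up wording.

```python
import sys

USAGE = "Usage:\n   bootstrap <filename> <number of bootstraps>\n"


def parse_args(argv):
    """Return (filename, replicates) from argv, or raise ValueError."""
    if len(argv) < 2:
        # Hypothetical wording, not taken from the assignment text.
        raise ValueError("You have to specify an alignment file.")
    if len(argv) < 3:
        raise ValueError("You have to specify the number of bootstraps.")
    try:
        replicates = int(argv[2])
    except ValueError:
        # Hypothetical wording.
        raise ValueError("The number of bootstraps must be an integer.")
    if replicates < 1:
        # Hypothetical wording.
        raise ValueError("The number of bootstraps must be positive.")
    return argv[1], replicates


def main(argv):
    """Parse the arguments; return a process exit code."""
    try:
        filename, replicates = parse_args(argv)
    except ValueError as err:
        sys.stderr.write("Error: {}\n\n{}".format(err, USAGE))
        return 1
    # The bootstrap analysis itself would start here.
    return 0
```

A script would typically end with sys.exit(main(sys.argv)) so that a usage error gives a non-zero exit status.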

You can read about the necessary Phylip programs on the web. Note, however, that on Ubuntu computers, you start neighbor with the command


> phylip neighbor

instead of plain "neighbor".
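To keep a program portable between the two ways of starting Phylip programs, the choice can be made at run time. The helper below is a minimal sketch; the name phylip_command is hypothetical, and it assumes that the presence of a "phylip" wrapper on the PATH means programs should be started as "phylip <program>".

```python
import shutil


def phylip_command(program, which=shutil.which):
    """Return the argument list for starting a Phylip program.

    Assumption: if a `phylip` wrapper is found on the PATH (as on
    Ubuntu), the program is started as `phylip <program>`; otherwise
    it is started by its plain name.
    """
    if which("phylip") is not None:
        return ["phylip", program]
    return [program]
```

The returned list can be passed directly as the command to the subprocess module.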

Requirements

  • Phylip does not allow accessions longer than 10 characters. This is a hard limit and Phylip programs misbehave if it is violated. Your program must rename sequences internally so that Phylip works well. The output, however, must use the original names.
  • Your program should not leave any temporary files lying around! In particular, there should be no file named "infile", "outfile", or similar in the directory where your program was called. Solve this using the tempfile module.
  • A session should look something like:
    
    
> bootstrap small.fa 100
((((horse:100.0,(dog1:100.0,dog2:100.0):100.0):60.0,rat:100.0):100.0,
orang:100.0):100.0,human:100.0);
> bootstrap small.fa
Error: You have to specify the number of bootstraps.

Usage: 
   bootstrap <filename> <number of bootstraps>
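The renaming requirement can be met by giving every sequence a short placeholder name before calling Phylip and substituting the original names back into the final tree. The following is a minimal sketch; the function names and the "seq0001"-style placeholders are hypothetical choices, and it assumes that the original names need no quoting in Newick (no spaces, parentheses, colons, commas or semicolons).

```python
import re


def make_short_names(names):
    """Map each original name to a unique placeholder such as 'seq0003'.

    The placeholders stay within Phylip's 10-character limit for up to
    9,999,999 sequences.
    """
    return {name: "seq{:04d}".format(i) for i, name in enumerate(names)}


def restore_names(newick, short_names):
    """Replace placeholders in a Newick string with the original names."""
    original = {short: name for name, short in short_names.items()}
    # Single pass over the tree; unknown placeholders are left untouched.
    return re.sub(
        r"seq\d{4,}",
        lambda match: original.get(match.group(0), match.group(0)),
        newick,
    )
```

Because the placeholders are generated rather than truncated from the originals, two long accessions that share their first 10 characters cannot collide.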

Hints

  • Use BioPython for reading and writing alignments.
  • All Phylip programs read stdin for instructions. You will therefore have to create input to them dynamically.
  • Use a Python module for controlling subprocesses:
    • We have used the subprocess module to run Phylip programs in the past, but apparently there are some issues with this module (at least when using the wait function, which is a good idea to use).
    • An alternative to subprocess is the pexpect module. Try it if you don't like subprocess!
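The hints about stdin and subprocess, together with the tempfile requirement, can be combined roughly as follows. This is a sketch under stated assumptions, not a complete solution: the function name run_in_tempdir is hypothetical, the program is assumed to read a file named "infile" in its working directory, and the menu answers that each Phylip program expects on stdin are left to the caller.

```python
import os
import subprocess
import tempfile


def run_in_tempdir(command, answers, infile_text, output_name="outfile"):
    """Run `command` in a fresh temporary directory and return its output file.

    `infile_text` is written to a file named "infile", `answers` is sent
    to the program's stdin, and the contents of `output_name` are returned.
    The directory and everything in it are removed afterwards.
    """
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "infile"), "w") as handle:
            handle.write(infile_text)
        # subprocess.run waits for the program to finish before returning.
        result = subprocess.run(
            command,
            input=answers,
            cwd=workdir,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.returncode != 0:
            raise RuntimeError(
                "{} failed: {}".format(command[0], result.stderr)
            )
        with open(os.path.join(workdir, output_name)) as handle:
            return handle.read()
```

Since the program runs with the temporary directory as its working directory, no "infile" or "outfile" ever appears in the directory where the script was called, and capturing stdout keeps the menu text of the program away from the final Newick output.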

Test data

To present:

  1. Your Python program, code and test runs
  2. How have you dealt with temporary files?
  3. How have you worked with the subprocess module?