News feed

In the news feed you will find updates to pages and schedules, as well as posts from teachers (when they also need to reach previously registered students).

January 2016

Lars Arvestad created the page on 2 November 2015

Teacher Lars Arvestad changed the permissions on 30 November 2015

The page can thus be read by everyone and edited by teachers.

commented 6 January 2016

We are only using the symmetric difference (treecompare.symmetric_difference()) to compare the trees. Is that enough?

Teacher commented 7 January 2016

Yes. I don't know what is "only" about that. :-) Distance 0 means the trees are identical.
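DendroPy's treecompare.symmetric_difference() gives the unweighted Robinson-Foulds distance: the number of bipartitions (splits) found in one tree but not in the other. The from-scratch sketch below illustrates that idea only; it is not DendroPy's code, and writing trees as nested tuples of taxon names is an assumption made for the example. It treats trees as unrooted, so re-rooting leaves the distance at 0, while swapping two taxa between clades does not.

```python
# Minimal Robinson-Foulds (symmetric difference) sketch for unrooted trees.
# Trees are nested tuples with taxon names as strings; this illustrates the
# idea and is not DendroPy's implementation.


def _clusters(tree, found):
    """Return the leaves below `tree`, recording each internal node's leaf set in `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(_clusters(child, found) for child in tree))
    found.append(below)
    return below


def taxa_and_splits(tree):
    """Return (taxa, set of non-trivial bipartitions) for an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so a split has the same representation however the tree is rooted.
    """
    found = []
    taxa = _clusters(tree, found)
    ref = min(taxa)
    splits = set()
    for cluster in found:
        side = cluster if ref not in cluster else taxa - cluster
        # Skip trivial splits: one leaf versus the rest, or the empty root side.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
    return taxa, splits


def symmetric_difference(t1, t2):
    """Number of splits found in exactly one of the two trees (0 = same topology)."""
    taxa1, s1 = taxa_and_splits(t1)
    taxa2, s2 = taxa_and_splits(t2)
    if taxa1 != taxa2:
        raise ValueError("trees must be over the same taxa")
    return len(s1 ^ s2)


t1 = ((("A", "B"), "C"), ("D", "E"))
t1_rerooted = (("A", "B"), ("C", ("D", "E")))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(t1, t1_rerooted))  # 0
print(symmetric_difference(t1, t2))  # 2
```

Branch lengths play no part here, which matches the teacher's point: distance 0 means the topologies are identical.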

commented 15 January 2016

Would a multiple-alignment column of AACD mean that two letters are unique (C and D), or that there are three unique letters (A, C and D)? This is regarding the noisy-column requirement: "at least 50% of amino acids are unique".

Teacher commented 16 January 2016

The former: C and D are called unique here.
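Following the teacher's answer (in AACD only C and D count as unique), a column's noisiness can be checked by counting the residues whose letter occurs exactly once in that column. A sketch; excluding gap characters ('-') and reading "at least 50%" as a fraction >= 0.5 are assumptions, not part of the stated requirement.

```python
from collections import Counter


def unique_fraction(column):
    """Fraction of residues in an alignment column whose letter occurs exactly once.

    Assumption: gap characters ('-') are ignored.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n_unique = sum(1 for c in residues if counts[c] == 1)
    return n_unique / len(residues)


def is_noisy(column, threshold=0.5):
    """True if at least `threshold` of the residues are unique."""
    return unique_fraction(column) >= threshold


print(unique_fraction("AACD"))  # 0.5 (C and D are unique)
print(is_noisy("AACD"))  # True
print(is_noisy("AAAC"))  # False (only C is unique: 0.25)
```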

 
in HT 2015 appbio15

commented 7 January 2016

Hi all! If some other desperate person is starting the project just now and needs somebody to work with, PM me.

 
December 2015
in HT 2015 appbio15

commented 18 December 2015

We were wondering about a few things regarding this project.

After analysing our data we get a really low bit score (~0.5) for the logo sequences we want to find (e.g. "GT"). After investigating, we found that on the positive strand we get all of our target sequences with a bit score of 2, whilst the negative strand seems random, which indicates faulty retrieval of the sequences on the negative strand.

We have now been stuck for a really long time trying to isolate the site sequences on the negative strand, without success. Our main idea so far has been to get the coordinates by taking the 3' UTR end position minus the first exon's chromosome start position. It seems we don't fully understand how sequences and sequence positions are provided by Ensembl. Could you give us a hint, or point us to where we can look it up?

Furthermore, is it necessary to use the negative strand as well? We can see no reason why investigating only the positive strand should introduce a bias in the results. On the other hand, we guess it is bad practice to exclude data if it is available.

commented 19 December 2015

Never mind! We finally solved it!

Teacher commented 21 December 2015

Good!
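The thread doesn't say how the problem was solved, but negative-strand sequences that look random are a typical symptom of reading the forward strand at minus-strand coordinates without reverse-complementing. A minimal sketch, assuming Ensembl's convention of 1-based, inclusive coordinates that are always given on the forward strand (start <= end) together with a strand of 1 or -1; the toy sequence is made up.

```python
# Complement table for DNA; N stays N. Lowercase (soft-masked) bases keep their case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(chrom_seq, start, end, strand):
    """Sequence of [start, end] read on the given strand.

    Assumes Ensembl-style coordinates: 1-based, inclusive, always given on the
    forward strand (start <= end), with strand = 1 or -1.
    """
    region = chrom_seq[start - 1:end]  # convert to Python's 0-based, half-open slice
    return region if strand == 1 else reverse_complement(region)


chrom = "AAACTTT"  # toy forward-strand sequence
print(extract(chrom, 3, 4, 1))   # AC
print(extract(chrom, 3, 4, -1))  # GT
```

Note also that for a minus-strand gene the 5' end lies at the larger coordinate (end), so any "upstream" or "downstream" offsets have to be applied from that end before extracting.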

 
in HT 2015 appbio15

commented 21 December 2015

In this project, are you looking for a "yes/no" classification, or are you looking for results that predict exactly where the n/h/c regions as well as the cleavage site (C) are?

Teacher commented 21 December 2015

It is primarily the yes/no classification that I am looking for. The "real tools" are also interested in the cleavage site, while other details are of little interest. 
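The thread only says that the yes/no call is what matters; it doesn't say how results should be reported. One common way to summarise a binary predictor is a confusion matrix with sensitivity, specificity and the Matthews correlation coefficient (MCC). A sketch, with made-up labels where True means "has a signal peptide":

```python
import math


def binary_metrics(y_true, y_pred):
    """Confusion-matrix summary for a yes/no predictor."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


truth = [True, True, False, False]       # hypothetical annotations
predicted = [True, False, False, False]  # hypothetical predictions
print(binary_metrics(truth, predicted))
```

MCC is often preferred over plain accuracy when the positive and negative sets differ a lot in size.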

 
in HT 2015 appbio15

commented 17 December 2015

I keep getting the error "sqlite3.DatabaseError: file is encrypted or is not a database" when I try to query the protdb.sqlite3 file with the sqlite3 module. This happens on my computer as well as on the CSC computers. Has anyone encountered this and knows what's going on?

Teacher commented 17 December 2015

When you test on the school computers, have you tried to access the data file directly, from /info/appbio10/data/protdb.sqlite3?

Teacher commented 17 December 2015

I am getting the same error when I open the database in sqlite3, but it works fine if I 'read' it (with the command ".read" in sqlite3).

commented 17 December 2015

I managed to do 'Your own database' with sqlite3 just fine without any error, but running this Python code on the same database file:

#!/usr/bin/python

import sqlite3 as lite

con = lite.connect('protdb.sqlite3')  # local copy on my own computer
cur = con.cursor()

for row in cur.execute('SELECT * FROM species;'):
    print(row)  # same output on Python 2 and 3

con.close()

raises the same error. On the school computers I always connect to the file directly.

Teacher commented 17 December 2015

I have set up a database which should work: /info/DD2404/appbio15/data/protdb2.sqlite3

commented 17 December 2015

Great, do you have an online version?

Teacher commented 17 December 2015

This link will be valid for 5 days: https://transfer.sh/11dlbr/p.db

commented 17 December 2015

This seems to work, thanks!

 
in HT 2015 appbio15

commented 11 December 2015

There is no table called gene_stable_id; the stable ID is instead a column in the gene table, gene.stable_id.

 
in HT 2015 appbio15

commented 7 December 2015

test1.xml and test2.xml are not valid XML files, as they contain multiple XML declarations. See http://stackoverflow.com/a/20251895

Teacher commented 7 December 2015

Not my fault. Welcome to the world of Bioinformatics. 

commented 7 December 2015

Right. But Python's ElementTree won't deal with it, so it's not really "standard Python". Should we just hack together a solution?

Teacher commented 8 December 2015

No, you should use the BioPython module for parsing Blast output.

commented 8 December 2015

Yes, of course. How could I forget that BioPython has a module for that... :-)

commented 8 December 2015

BioPython (version 1.63 on Python 2.7.6) chokes on all three example files as well when using its SearchIO module. The errors are along the lines of "cElementTree.ParseError: junk after document element".

SearchIO works perfectly with the (presumably valid) XML which I myself gathered from the remote NCBI BLAST service.

Using the older Bio.Blast.NCBIXML module seems to work with the example files, however, so I'll have to use that. I'm still curious about the origin of the broken XML files, because I haven't encountered that issue with either online or local BLAST.

commented 8 December 2015

It seems the example output doesn't match the requirements. Column 3 is specified to be the hit accession, which in this example would be 24130, i.e.

<Hit_accession>24130</Hit_accession>

But the output shows it as "CYTC_MOUSE", which is part of Hit_def, and Hit_def is not in the required output specification.
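The teacher's advice is to use BioPython, and the thread reports that Bio.Blast.NCBIXML copes with these files. Purely to illustrate why ElementTree rejects them, and where Hit_accession and Hit_def live in BLAST XML, here is a standard-library sketch on a hand-made toy input (not the course's test files). The splitting trick assumes every concatenated document starts with its own `<?xml` declaration.

```python
import xml.etree.ElementTree as ET


def split_xml_documents(text):
    """Split text holding several concatenated XML documents at each '<?xml' declaration."""
    parts = text.split("<?xml")
    return ["<?xml" + part for part in parts if part.strip()]


def hit_fields(text):
    """Return (Hit_accession, Hit_def) pairs for every Hit in every document."""
    rows = []
    for doc in split_xml_documents(text):
        root = ET.fromstring(doc)
        for hit in root.iter("Hit"):
            rows.append((hit.findtext("Hit_accession"), hit.findtext("Hit_def")))
    return rows


# Hand-made toy input: two documents back to back, as in the broken files.
TOY = """<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_accession>24130</Hit_accession><Hit_def>CYTC_MOUSE toy definition</Hit_def></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>
<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_accession>99999</Hit_accession><Hit_def>second toy hit</Hit_def></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>
"""

# Parsing the whole file as one document fails, as reported in the thread.
try:
    ET.fromstring(TOY)
    whole_file_parses = True
except ET.ParseError:
    whole_file_parses = False

print(whole_file_parses)  # False
for accession, definition in hit_fields(TOY):
    print(accession, definition)
```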

 
in HT 2015 appbio15

commented 8 December 2015

There are some problems with BioMart downtime right now. Try one of the mirrors:

http://www.ensembl.org/info/about/mirrors.html

 
in HT 2015 appbio15

commented 23 November 2015

How large should the project groups be? 2-4 people?

Teacher commented 23 November 2015

At most 3, preferably 2.

commented 5 December 2015

I'm looking for a partner for the bioinformatics project. My background is in mathematics and computer science. Send me a message if interested.

 
November 2015
in HT 2015 appbio15

commented 30 November 2015

What do you mean by 'create a histogram of all scores for the first Blast result in the file'? The first Blast result is a single result, so doesn't it have only one score?

Teacher commented 30 November 2015

What I mean is "for the first query". If the Blast output contains results from several queries, you only need to produce a histogram for the first one (which then includes all the suboptimal hits). Does this make sense?

commented 30 November 2015

Yeah, I guess so. But there is only one query in the cst3 file, so do you want us to handle multiple queries anyway?

Teacher commented 30 November 2015

No, I am saying your code does not need to handle multiple queries. I am making this assignment easier than it perhaps should be. :-)

commented 30 November 2015

OK, but now it does handle multiple queries. I think the assignment says it should handle multiple queries, but maybe I'm mistaken.

Teacher commented 30 November 2015

I added a parenthesis which hopefully clarifies the assignment for future generations. But I won't fail you for writing a more general program which solves the assignment.
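Since only the first query is required, the histogram step itself is small. This sketch assumes the scores of all hits for that query (suboptimal ones included) have already been collected into a list, for example from the first record yielded by BioPython's Bio.Blast.NCBIXML.parse(); that parsing step is left out, and the score values and bin width below are made up.

```python
import math
from collections import Counter


def score_histogram(scores, bin_width):
    """Count scores per bin [k*w, (k+1)*w), keeping empty bins in between."""
    if not scores:
        return {}
    counts = Counter(math.floor(s / bin_width) for s in scores)
    lo, hi = min(counts), max(counts)
    return {k * bin_width: counts[k] for k in range(lo, hi + 1)}


def print_histogram(hist):
    """Print one text row of '#' characters per bin."""
    for start, count in hist.items():
        print(f"{start:>6} | {'#' * count}")


scores = [12, 15, 22, 48, 51, 300]  # made-up scores for one query's hits
print_histogram(score_histogram(scores, 50))
```

Keeping the empty bins makes gaps between the bulk of weak hits and a few strong ones visible, which is usually the interesting part of such a histogram.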