News feed

Do you have questions about the course?

If you are registered for a current course offering, see the course room in Canvas. You will find the right course room under "Courses" in the personal menu.

If you are not registered, see the course memo (Kurs-PM) for DD2404 or contact your student office, study counsellor, or education office.

In the news feed you will find updates to pages, the schedule, and posts from teachers (when they also need to reach previously registered students).

January 2016

Lára Kristín Stefánsdóttir commented 6 January 2016

We are only using symmetric distance (treecompare.symmetric_difference()) to compare the trees, is that enough?

Teacher

Lars Arvestad commented 7 January 2016

Yes. I don't know what is "only" about that. :-) Distance 0 means the trees are identical.
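
For readers unfamiliar with the measure, here is a minimal plain-Python sketch of the symmetric difference between two trees. It does not use the treecompare.symmetric_difference() call mentioned above. The nested-tuple tree representation and all function names are assumptions made for illustration. The idea is that each internal edge splits the leaves into two groups, and the distance counts the splits found in exactly one of the trees.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) of the leaf set."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fixed leaf used to normalise each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits that separate fewer than two leaves.
        if len(below) >= 2 and len(other) >= 2:
            # Store the side that does not contain the reference leaf,
            # so the same split always gets the same representation.
            found.add(other if reference in below else below)
        return below

    visit(tree)
    return found


def symmetric_difference(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = (("A", "B"), ("C", ("D", "E")))   # same unrooted topology as t1
t3 = ((("A", "C"), "B"), ("D", "E"))   # B and C swapped

print(symmetric_difference(t1, t2))
print(symmetric_difference(t1, t3))
```

With these example trees the first comparison gives 0 (identical unrooted topologies, only drawn with a different root) and the second gives 2.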

Radoslaw Sawicki commented 15 January 2016

Regarding the noisy column requirement "at least 50% of amino acids are unique": would a multialignment column of AACD mean that two letters are unique (C and D), or that there are three unique letters (A, C and D)?

Teacher

Lars Arvestad commented 16 January 2016

The former: C and D are called unique here.
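
As a small illustration of that reading, the sketch below counts a residue as unique when it occurs exactly once in the column. It covers only this one criterion. The function names are made up, and how gap characters should be treated is not addressed here.

```python
from collections import Counter


def unique_residues(column):
    """Residues that occur exactly once in an alignment column."""
    counts = Counter(column)
    return [residue for residue in column if counts[residue] == 1]


def is_noisy_by_uniqueness(column, threshold=0.5):
    """True if at least `threshold` of the residues in the column are unique."""
    if not column:
        return False
    return len(unique_residues(column)) / len(column) >= threshold


print(unique_residues("AACD"))
print(is_noisy_by_uniqueness("AACD"))
```

For the column AACD this reports C and D as unique, which is 2 of 4 residues, so the column meets the 50% threshold.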

Radoslaw Sawicki commented 7 January 2016

Hi all! If some other desperate person is starting the project just now and needs somebody to work with, PM me.

December 2015

Tobias Frick commented 18 December 2015

We have some questions regarding this project.

When we analyse our data we get a really low bit-score (~0.5) for the logo sequences we want to find (e.g. "GT"). After investigating, we found that on the positive strand we get all of our target sequences with a bit-score of 2, whilst the negative strand seems to be random, which indicates faulty retrieval of the sequences on the negative strand.

We have been stuck for a really long time trying to isolate the site sequences for the negative strand, but it is not working. Our main idea so far has been to isolate coordinates by taking the 3' UTR end site position minus the first exon chromosome start position. It seems we don't have a full understanding of how the sequences and sequence positions are provided when using Ensembl. Could you give us some indication, or point us to somewhere we can look it up?

Furthermore, is it necessary to use the negative strand as well? We can see no reason why investigating only the positive strand should introduce a bias in the results. On the other hand, we guess it is bad practice to exclude data if it is available.

Tobias Frick commented 19 December 2015

Never mind! We finally solved it!

Teacher

Lars Arvestad commented 21 December 2015

Good!
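
For readers who hit the same symptom: the thread does not say what the fix was. One common cause (an assumption here, not confirmed above) is comparing negative-strand sites without reverse-complementing them, so that a site reads "AC" instead of "GT". The sketch below shows a reverse complement together with the per-column information content of a DNA logo, taken as 2 bits minus the Shannon entropy with no small-sample correction. The function names are made up for illustration.

```python
import math
from collections import Counter

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence as read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def column_information(column):
    """Information content in bits of one DNA logo column (maximum 2)."""
    if not column:
        raise ValueError("empty column")
    total = len(column)
    counts = Counter(column.upper())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


print(reverse_complement("AC"))
print(column_information("GGGG"))
print(column_information("ACGT"))
```

A perfectly conserved column gives 2 bits and a column with all four bases equally frequent gives 0 bits, which matches the pattern described above: clean sites on one strand mixed with effectively random ones from the other pull the score down.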

Teacher

Lars Arvestad commented 21 December 2015

It is primarily the yes/no classification that I am looking for. The "real tools" are also interested in the cleavage site, while other details are of little interest.

Lára Kristín Stefánsdóttir commented 17 December 2015

I keep getting the error: sqlite3.DatabaseError: file is encrypted or is not a database, when I try to query the protdb.sqlite3 file with the sqlite3 module. This happens on my computer as well as the CSC computers. Has anyone encountered this and knows what's up?

Teacher

Lars Arvestad commented 17 December 2015

When you test on the school computers, have you tried to access the data file directly, from /info/appbio10/data/protdb.sqlite3 ?

Teacher

Lars Arvestad commented 17 December 2015

I get the same error when I open the database in sqlite3, but it works fine if I 'read' it (with the ".read" command in sqlite3).

Lára Kristín Stefánsdóttir commented 17 December 2015

I managed to do 'Your own database' with sqlite3 just fine without any error, but using this code in Python for the same database file:

#!/usr/bin/python

import sqlite3 as lite

# Connect to the database file (here a copy on my own computer).
con = lite.connect('protdb.sqlite3')
cur = con.cursor()

# Print every row of the species table.
for row in cur.execute('SELECT * FROM species;'):
    print(row)

returns this error, and I always connect to the file directly on the school computers.

Teacher

Lars Arvestad commented 17 December 2015

I have set up a database which should work: /info/DD2404/appbio15/data/protdb2.sqlite3

Lára Kristín Stefánsdóttir commented 17 December 2015

Great, do you have an online version?

Teacher

Lars Arvestad commented 17 December 2015

This link will live for 5 days: https://transfer.sh/11dlbr/p.db

Lára Kristín Stefánsdóttir commented 17 December 2015

This seems to work, thanks!
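
A note for anyone who meets the same error later. The observation earlier in this thread that ".read" works suggests, though nobody confirms it, that the original file was a text file of SQL statements rather than a binary database. A real SQLite 3 database file starts with the 16 bytes "SQLite format 3" followed by a zero byte, so this can be checked before connecting. The sketch below is one possible workaround under that assumption; the function names are made up.

```python
import sqlite3

# Every SQLite 3 database file begins with this 16-byte header.
SQLITE3_MAGIC = b"SQLite format 3\x00"


def is_sqlite3_file(path):
    """True if the file starts with the SQLite 3 header bytes."""
    with open(path, "rb") as handle:
        return handle.read(16) == SQLITE3_MAGIC


def open_database(path):
    """Open a binary database directly, or load an SQL script into memory."""
    if is_sqlite3_file(path):
        return sqlite3.connect(path)
    # Assumption: anything else is a plain-text script of SQL statements.
    con = sqlite3.connect(":memory:")
    with open(path, encoding="utf-8") as handle:
        con.executescript(handle.read())
    return con
```

Calling open_database('protdb.sqlite3') would then return a connection that can be queried as in the code above, whichever of the two kinds of file it turns out to be.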

Yrin Eldfjell commented 11 December 2015

There is no table called gene_stable_id, but a column called gene.stable_id.

Yrin Eldfjell commented 7 December 2015

test1.xml and test2.xml are not valid xml files as they have multiple xml declarations. See http://stackoverflow.com/a/20251895

Teacher

Lars Arvestad commented 7 December 2015

Not my fault. Welcome to the world of Bioinformatics.

Yrin Eldfjell commented 7 December 2015

Right. But the Python ElementTree won't deal with it, so it's not really "standard Python". Should we just hack together a solution?

Teacher

Lars Arvestad commented 8 December 2015

No, you should use the BioPython module for parsing Blast output.

Yrin Eldfjell commented 8 December 2015

Yes, of course. How could I forget that BioPython has a module for that... :-)

Yrin Eldfjell commented 8 December 2015

BioPython (ver 1.63 on Python 2.7.6) also chokes on all three example files when using its SearchIO module. Errors are along the lines of "cElementTree.ParseError: junk after document element: "

SearchIO works perfectly with the (presumably valid) XML which I myself gathered from the remote NCBI BLAST service.

Using the older Bio.Blast.NCBIXML module seems to work with the example files however, so I'll have to use that. I'm still curious as to the origins of the broken XML files, because I haven't encountered that issue with either online or local BLAST.
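
To make the problem concrete: a file with several XML declarations is really several XML documents glued together, and a standard parser stops at the second one. The sketch below only illustrates that point and is not the recommended route for the assignment, where the advice above is to use BioPython. It cuts the text at each declaration and parses the pieces one by one with the standard library. The function names are made up.

```python
import re
import xml.etree.ElementTree as ET


def split_xml_documents(text):
    """Cut concatenated XML into one string per '<?xml' declaration."""
    starts = [match.start() for match in re.finditer(r"<\?xml\b", text)]
    if not starts:
        # No declaration at all: treat the text as a single document.
        return [text] if text.strip() else []
    # Anything before the first declaration is dropped.
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def parse_concatenated(text):
    """Parse each embedded document and return the root elements."""
    return [ET.fromstring(doc) for doc in split_xml_documents(text)]


sample = (
    '<?xml version="1.0"?>\n<result><id>1</id></result>\n'
    '<?xml version="1.0"?>\n<result><id>2</id></result>\n'
)
for root in parse_concatenated(sample):
    print(root.findtext("id"))
```

The sample string here is invented. Passing it straight to ET.fromstring raises a ParseError, while the split version yields two separate roots.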

Yrin Eldfjell commented 8 December 2015

It seems the example output doesn't match the requirements. Col 3 is specified to be the hit accession, which in this example would be 24130, i.e.

<Hit_accession>24130</Hit_accession>

But the output shows it as "CYTC_MOUSE", which is a part of the Hit_def, which is not in the required output spec.
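
To show the difference between the two fields, here is a small sketch that reads both from a single hit element. The sample element is invented for illustration: only the Hit_accession value and the name CYTC_MOUSE come from the comment above, the rest of the Hit_def text is a placeholder, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Invented sample; a real Blast XML hit contains more fields than this.
SAMPLE_HIT = """<Hit>
  <Hit_def>CYTC_MOUSE placeholder description text</Hit_def>
  <Hit_accession>24130</Hit_accession>
</Hit>"""


def hit_fields(hit):
    """Return (accession, first word of the definition line) for one hit."""
    accession = hit.findtext("Hit_accession")
    words = hit.findtext("Hit_def", default="").split()
    short_name = words[0] if words else ""
    return accession, short_name


accession, short_name = hit_fields(ET.fromstring(SAMPLE_HIT))
print(accession)
print(short_name)
```

The first value is what the specification asks for in column 3, the second is what the example output appears to show.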

Hugi Aegisberg commented 8 December 2015

Some problems with Biomart downtime right now. Try one of the mirrors:

http://www.ensembl.org/info/about/mirrors.html

Kristófer Hannesson commented 23 November 2015

How large should the project groups be? 2-4 people?

Teacher

Lars Arvestad commented 23 November 2015

At most 3, preferably 2.

Hlynur Davíð Hlynsson commented 5 December 2015

I'm looking for a partner for the bioinformatics project. My background is in mathematics and computer science. Send me a message if interested.

November 2015

Patrick Bryant commented 30 November 2015

What do you mean by 'create a histogram of all scores for the first Blast result in the file.'? The first Blast result (one result) has only one score?

Teacher

Lars Arvestad commented 30 November 2015

What I mean is "for the first query". If the Blast contains results from several queries, you only need to produce a histogram for the first one (which then includes all the suboptimal hits). Does this make sense?

Patrick Bryant commented 30 November 2015

Yeah, I guess so. But there is only one query in the cst3 file? But you want us to handle multiple queries anyways?

Teacher

Lars Arvestad commented 30 November 2015

No, I am saying your code does not need to handle multiple queries. I am making this assignment easier than it perhaps should be. :-)

Patrick Bryant commented 30 November 2015

Ok, but now it does handle multiple queries. I think it says that it should handle multiple, but maybe I'm mistaken.

Teacher

Lars Arvestad commented 30 November 2015

I added a parenthesis which hopefully clarifies the assignment for future generations. But I won't fail you for writing a more general program which solves the assignment.
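
As a rough illustration of what was asked for in this thread, the sketch below bins a list of hit scores from one query and draws a text histogram. The scores are invented, the bin width is an arbitrary choice, and the function names are made up. Getting the scores out of the Blast file is left out.

```python
from collections import Counter


def histogram(scores, bin_width=10):
    """Count scores per bin, keyed by the lower edge of each bin."""
    bins = Counter(int(score // bin_width) * bin_width for score in scores)
    return dict(sorted(bins.items()))


def render(bins, bin_width=10):
    """Draw one line of stars per bin."""
    lines = []
    for start, count in bins.items():
        lines.append("%5d-%-5d %s" % (start, start + bin_width - 1, "*" * count))
    return "\n".join(lines)


# Invented scores for all hits (including suboptimal ones) of a first query.
example_scores = [12, 15, 27, 101.5]
print(render(histogram(example_scores)))
```

With the invented scores above, the bins starting at 10, 20 and 100 hold 2, 1 and 1 hits respectively.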
