Unravel motifs in UTRs and introns

The image shows a sequence logo (a visualization of a motif) for the region around the translation start site of genes in E. coli. As the logo suggests, the positions upstream of the start codon have an interesting bias towards G and A. In this project, you will study what this pattern looks like in the human genome.
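The height of each column in a sequence logo is usually its information content, R = log2(4) - H, where H is the Shannon entropy of the bases in that column, and each letter gets a share of R proportional to its frequency. The sketch below computes these heights for a DNA alignment so you can check what a logo tool draws. It is a simplification: it skips the small-sample correction that logo tools may apply, and it ignores anything other than upper-case A, C, G and T.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def logo_heights(aligned):
    """Letter heights (in bits) for each column of an upper-case DNA alignment.

    Uses R = log2(4) - H, where H is the Shannon entropy of the column, and
    gives each base the height f(base) * R. No small-sample correction.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned (all the same length)")
    heights = []
    for i in range(width):
        # Count only unambiguous bases; N, gaps and lower-case are ignored.
        counts = Counter(s[i] for s in aligned if s[i] in ALPHABET)
        total = sum(counts.values())
        if total == 0:
            heights.append({b: 0.0 for b in ALPHABET})
            continue
        freqs = {b: counts[b] / total for b in ALPHABET}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = math.log2(len(ALPHABET)) - entropy
        heights.append({b: freqs[b] * info for b in ALPHABET})
    return heights
```

For the four sequences AGGA, AGGT, AGGC and AGGG, the first three columns are fully conserved and each gets one letter of height 2 bits, while the last column has all four bases equally often and gets height 0.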

Some students have found this project harder than it sounds because there is a lot of data to work with.

Questions

What are the sequence logos for the regions before and after

  • the translation start site?
  • the beginning and end of the first intron (for those genes that have at least one intron)?
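Both questions come down to cutting out a fixed number of bases on each side of a site. Here is a minimal sketch, assuming that for each transcript you already have the spliced sequence with the 0-based offset of the A in ATG, and the unspliced sequence with exon coordinates converted to the transcript's own strand (0-based, half-open, 5' to 3'). The function names and the default flank of 10 bases are illustrative choices, not part of the assignment.

```python
def window(seq, pos, upstream, downstream):
    """Return seq[pos - upstream : pos + downstream], or None if it runs off an end."""
    start, end = pos - upstream, pos + downstream
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def start_codon_window(cdna, cds_start, flank=10):
    """Window around the translation start site in a spliced transcript.

    `cds_start` is the 0-based offset of the A in ATG; the window holds
    `flank` bases upstream followed by `flank` bases starting at the A.
    """
    return window(cdna, cds_start, flank, flank)


def first_intron_windows(pre_mrna, exons, flank=10):
    """Windows around the 5' (donor) and 3' (acceptor) ends of the first intron.

    `pre_mrna` is the unspliced transcript on its own strand, 5' -> 3', and
    `exons` lists (start, end) pairs, 0-based and half-open, in transcript
    order and in `pre_mrna` coordinates. Returns None when there is no intron.
    """
    if len(exons) < 2:
        return None
    intron_start = exons[0][1]  # first intron base (usually the G of GT)
    intron_end = exons[1][0]    # first base of the second exon
    return (window(pre_mrna, intron_start, flank, flank),
            window(pre_mrna, intron_end, flank, flank))
```

Using the spliced sequence for the start codon avoids windows that run into an intron located inside the 5' UTR; windows that would run off the end of a sequence come back as None so they can be filtered out later.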

Note that this project includes gathering and preparing the data; here that means (at least) aligning the sequences before you try to make a sequence logo.
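One possible way to do the alignment step, sketched here as an assumption rather than the required method: if every window is cut at the same offsets around a shared landmark (the A of ATG, or the first or last base of an intron), windows of equal length already line up column by column, and "aligning" reduces to filtering out windows that do not fit. The function name and its parameters are hypothetical.

```python
def anchored_alignment(windows, width, landmark_offset, landmark):
    """Keep windows that form a gap-free alignment around a shared landmark.

    All windows are assumed to be cut at the same offsets around the
    landmark, so equal-length windows already line up column by column.
    A window is dropped if it is missing (None), has the wrong length,
    contains anything other than A/C/G/T, or lacks `landmark` at
    `landmark_offset`.
    """
    aligned = []
    for w in windows:
        if w is None:
            continue
        w = w.upper()
        if len(w) != width or set(w) - set("ACGT"):
            continue
        if w[landmark_offset:landmark_offset + len(landmark)] != landmark:
            continue
        aligned.append(w)
    return aligned
```

The landmark check is a trade-off: requiring ATG discards transcripts with non-ATG start codons, and requiring GT at the donor discards the minority of introns that start with GC or AT. You may also want to keep one transcript per gene, since several transcripts of the same gene often share a site and would otherwise be counted more than once.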

Data

You will have to extract the data you need from a source such as Ensembl's BioMart.
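Sequence exports from a service like BioMart can be downloaded as FASTA. A minimal reader, assuming that the header line holds the attributes you selected separated by `|` (check this against your own export), could look like the sketch below; entries whose "sequence" is not made of nucleotide letters, such as placeholder text for a missing sequence, are skipped.

```python
def read_fasta(lines):
    """Yield (header_fields, sequence) pairs from FASTA-formatted lines.

    The header is split on '|', which assumes pipe-separated attributes;
    adjust this if your export looks different.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].split("|"), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def nucleotide_records(lines):
    """Like read_fasta, but upper-cases sequences and skips non-nucleotide entries."""
    for fields, seq in read_fasta(lines):
        seq = seq.upper()
        if seq and not set(seq) - set("ACGTN"):
            yield fields, seq
```

Because both functions consume lines lazily, you can pass an open file handle directly, for example `nucleotide_records(open("mart_export.txt"))` with whatever file name your download has, without loading the whole export into memory.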

Tools

I suggest you create the sequence logos using the online WebLogo system, but there might be other easy ways of making them.
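WebLogo accepts aligned sequences in FASTA format, among others, so the last preparation step can be as simple as writing the aligned windows out as FASTA. The sketch below does that; the default `seq1`, `seq2`, ... identifiers are arbitrary placeholders, since the logo only depends on the sequences.

```python
def to_fasta(aligned, prefix="seq", line_width=60):
    """Format equal-length aligned sequences as FASTA text, e.g. for a logo tool."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("all sequences must have the same length")
    lines = []
    for i, seq in enumerate(aligned, start=1):
        lines.append(f">{prefix}{i}")
        # Wrap long sequences; short windows end up on a single line.
        for j in range(0, len(seq), line_width):
            lines.append(seq[j:j + line_width])
    return "".join(line + "\n" for line in lines)
```

For example, `to_fasta(["ACGT", "TTGA"])` gives `">seq1\nACGT\n>seq2\nTTGA\n"`, which you can save to a file and upload.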