Till KTH:s startsida Till KTH:s startsida

Your own database

My First DB

In this exercise you will create and manipulate your own database running under a SQLite3 system running at CSC. SQLite3 is a lightweight database system running without a server. This can be very convenient for single user usage or in read-only setups. A strength with SQLite3 is that it is very easy to set up. It is also quite easy to access an SQLite3 database from within Python and other programming languages, but we will not try that in this assignment.

Read up on SQL on the web, and in particular use the SQL description for SQLite.

Starting up

Start by running the sqlite3 command:


orange-01> sqlite3 db1
SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

You can now access a newly created database under the name "db1", which is also the name of the file in which your database, including data, is located.

Note 1: You can choose whatever filename you choose! "db1" is just there to get you started on something.

Note 2: To get more information on how to start/run SQLite3, try using the command man sqlite3.

Note 3: If you log in to one of our sparc computers (e.g. "my"), then you will not find a working SQLite installation. However, you can easily install SQLite on your own computer and work there.

Loading data

There is a prepared dataset available to you in the file /info/appbio10/data/protdb.sqlite3 (online here if you want to work on your own computer.) Take a peak at the file! What do you see?

To load this data into sqlite3, run the command:


sqlite> .read /afs/nada.kth.se/info/appbio10/data/protdb.sqlite3

Note: If you do this command repeatedly, you will get warnings!

To get help, you use the special command ".help". How does the .schema command work?

Database schema

  1. Figure out what the schema of the loaded database is!

Querying the database

Complete the following assignment by querying the loaded database using SQL.

  1. What species are in the database?
  2. Add another species to the database: Sus scrofa!
  3. What proteins are longer than 1000 aa?
  4. What species are present in family NHR3? Give a full list with full species names using one SQL statement.
  5. How many proteins from each species are present in the database?
  6. How do you change the schema to add information about a protein's structure? You want to store the name for the structure, resolution, and method. Remember that a protein may have several structures published. Make your suggested changes and add some example data that you make up!