
Research datasets

NOTE: The links below need updating. If you would like the data, get in touch!

The datasets used in my publications are generally provided to interested researchers on request. The following datasets have been compiled by me and my colleagues.

Sousta dataset

For the 2016 ISMIR paper with Emmanouil Benetos, we used a set of existing manual transcriptions of performances of Cretan dance tunes, and conducted a note-to-note alignment of 20 recordings. These aligned notations are shared along with the performance recordings on request. For more information on this paper click here.

Microtonal Transcription

This dataset contains a set of microtonal transcriptions of Turkish makam music and was used in our recent JASA journal article. So far, annotations are available for instrumental performances and for vocal performances. Further examples of our proposed transcription algorithm will be made available along with the publication.

Singer Recognition

The dataset used in our paper on singer recognition contains a collection of 21 different singers. All recordings are available on commercial CDs. The music in this collection can be considered very similar in style, as it all belongs to the genre of Greek rebetiko music. Many of the recordings are old and were digitized from gramophone records. You can listen here to a sample of a female singer and of a male singer. Our annotations specify the regions of singing voice activity in 82 songs (4 from each singer).

Rhythm Similarity

The dataset used in our paper on measuring rhythmic similarity contains a collection of 6 different dance styles. These dances are traditional forms of music commonly encountered on the island of Crete. For each dance, 30 song excerpts have been collected. The instruments are usually the string instruments Cretan Leira and Cretan Laouto. Listen to examples of the dances Maleviziotis and Pentozalis.

Beat Tracking

  • The first dataset was used in our paper on beat tracking using phase-slope-based onset detection. The music in the dataset is of the same style as the music in the rhythm similarity dataset. The dataset was extended for my PhD and contains 41 excerpts, each thirty seconds long, along with text files that give the beat times in seconds. I annotated these beat times myself. Listen to an example of a Cretan Syrtos dance with beat annotation: Syrtos.

  • Together with my former colleagues at INESC TEC in Porto, I compiled a corpus of music samples with beat annotations, with the goal of including samples that are challenging for current beat tracking technology. We presented this dataset in this paper, along with an approach to automatically determine the difficulty of an unknown sample. Please refer to the website of the SMC group at INESC to get the data.
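The beat annotations of the first dataset are described above as text files that give the beat times in seconds. As a rough illustration of how such a file might be read, here is a minimal Python sketch. It assumes one beat per line with the time as the first whitespace-separated field; the actual file layout is not specified here, and the function names and the file name `syrtos_beats.txt` are hypothetical.

```python
from pathlib import Path
from statistics import median


def read_beat_times(path):
    """Read beat times in seconds from a plain text file.

    Assumption: one beat per line, with the time as the first
    whitespace-separated field. Blank lines are skipped.
    """
    beats = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            beats.append(float(fields[0]))
    return beats


def median_tempo_bpm(beats):
    """Estimate the tempo in beats per minute from the median inter-beat interval.

    Needs at least two beat times; otherwise statistics.median raises an error.
    """
    intervals = [later - earlier for earlier, later in zip(beats, beats[1:])]
    return 60.0 / median(intervals)
```

With a file in that layout, `median_tempo_bpm(read_beat_times("syrtos_beats.txt"))` would give a single tempo estimate for the excerpt. The median is used here only so that a few irregular intervals do not dominate the estimate.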

Onset Detection

A dataset for onset detection was compiled for our paper on features for note onset detection. It contains 78 short samples from a variety of instruments, from both Turkish and Eurogenetic contexts. The labels were recently revised by Sebastian Böck (thanks!).

A symbolic dataset of Turkish makam music phrases

This dataset was presented at FMA 2014. For more details and to get the data, follow this link.


Profile picture of André Holzapfel
