
Our music tastes are based on some shared perceptions

Published Nov 18, 2014

Metalheads, jazz purists and folkies may have more in common musically than you thought. A new study at KTH Royal Institute of Technology sheds light on the shared ways in which humans perceive music.

What do we hear when we listen to music? A team of researchers has narrowed the answer down to nine basic features.

What do we really hear when we listen to music? Researchers from Sweden’s KTH Royal Institute of Technology have attempted to close in on the answer by boiling our perception of music down to nine basic elements – or what they call “perceptual features”.

Their findings could help improve computational models that the music industry uses for predicting the individual tastes of listeners.

The nine perceptual features of music

In order to predict what we will like, music information retrieval programs need to first understand what we agree on. These nine so-called "perceptual features" of music were proposed by KTH researchers as the basic elements of what we hear when we listen to music.

Speed – The overall speed of the music.

Rhythmic clarity – The pulse of the music. Is it flowing, as in an aria? Or is it firm, as in a dance mix?

Rhythmic complexity – Similar to rhythmic clarity, but relates more to the differences in rhythmic patterns.

Articulation – The duration of tones, ranging from staccato to legato.

Dynamics – Related to the estimated effort of the player, ranging from soft to loud.

Modality – Whether the music is in a major or minor key.

Overall pitch – The overall pitch height of the music, ranging from low to high.

Harmonic complexity – Simple or complex harmonic progressions.

Brightness – The overall brightness of the sound, ranging from dark to light.
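
For concreteness, the list above could be represented in code roughly as follows. This is an illustrative sketch only, not the study's software; all names and scale endpoints are our shorthand for the descriptions given in the list:

    # Illustrative sketch: the nine perceptual features and the endpoints
    # of the bipolar rating scales described in the list above.
    PERCEPTUAL_FEATURES = {
        "speed":               ("slow", "fast"),
        "rhythmic_clarity":    ("flowing", "firm"),
        "rhythmic_complexity": ("simple", "complex"),
        "articulation":        ("legato", "staccato"),
        "dynamics":            ("soft", "loud"),
        "modality":            ("minor", "major"),
        "overall_pitch":       ("low", "high"),
        "harmonic_complexity": ("simple", "complex"),
        "brightness":          ("dark", "bright"),
    }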

So-called music information retrieval (MIR) models combine audio signal processing measurements with analysis of musical elements, which are usually drawn from concepts of music theory and music perception, such as beat strength, rhythmic regularity, meter and mode. The models also include analysis of musical genre (for example, punk, dance, experimental), emotion (sad, happy, tender) and other contextual qualities.
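
As a rough illustration of the signal-processing side – not the KTH team's own software – the open-source Python library librosa can compute the kinds of low-level measurements such models start from, for example a global tempo estimate and the spectral centroid, a common acoustic correlate of brightness:

    # Sketch using the open-source librosa library (assumed installed);
    # "clip.wav" is a placeholder file name.
    import numpy as np
    import librosa

    y, sr = librosa.load("clip.wav")  # audio samples and sample rate

    # Global tempo estimate in beats per minute - a crude acoustic
    # correlate of perceived speed.
    tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)

    # Spectral centroid in Hz - often used as an acoustic correlate of
    # perceived brightness (a higher centroid means a brighter sound).
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)

    print("tempo estimate (BPM):", tempo)
    print("mean spectral centroid (Hz):", np.mean(centroid))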

But a major limitation is how inconsistently music is perceived by listeners with different backgrounds and varying familiarity with music, not to mention their individual biases and cultural references.

This reliance on music theory could be one reason MIR programs struggle to capture commonalities in perception. Not everyone understands music theory, yet most people know what they like. So on what basis are these preferences formed?

The researchers at KTH – Anders Friberg, Anton Hedblad, Marco Fabiani and Anders Elowsson – argued that when people listen to music, their brains may rely on an “intermediate analysis layer” where more basic features of the music are naturally perceived.

Getting a group of people to agree on anything having to do with music is – as any DJ can attest – not easily done. But by focusing on nine key features of music, the team was able to find some commonalities in perception that could prove useful.

They conducted an experiment in which 20 people listened to 100 ringtones and 110 snippets of film music and then rated what they heard in terms of each of the nine so-called "perceptual features": speed, rhythmic clarity, rhythmic complexity, articulation, dynamics, modality, overall pitch, harmonic complexity and brightness.

Friberg says the nine perceptual features reflect how non-musicians try to understand what they are hearing. For example, instead of rating tempo – a music-theory term for the number of beats in a given span of time – they chose the less complicated concept of speed, which non-musicians associate with movement.

The idea was to see whether the ratings they got from the subjects matched up in any significant way.

The subjects used a Likert scale to evaluate each feature, scoring the music, for example, somewhere between slow and fast, or soft and loud.

“If there was a high agreement among any of them, this would then indicate that a given feature corresponds to something relating more closely to the real music perception going on in our brains,” he says.
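
The article does not name the statistic used, but a standard way to quantify such inter-rater agreement is Cronbach's alpha computed over a listeners-by-excerpts rating matrix. Here is a minimal sketch with invented numbers, not the study's data:

    import numpy as np

    def cronbach_alpha(ratings):
        """Cronbach's alpha for a (listeners x excerpts) rating matrix.

        Treats each listener as one 'item'; values near 1 indicate that
        listeners rate the excerpts consistently with one another.
        """
        ratings = np.asarray(ratings, dtype=float)
        k = ratings.shape[0]                          # number of listeners
        item_vars = ratings.var(axis=1, ddof=1)       # variance per listener
        total_var = ratings.sum(axis=0).var(ddof=1)   # variance of summed scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Illustrative only: 3 listeners rating 5 excerpts on a 1-9 scale.
    demo = [[2, 7, 5, 8, 3],
            [3, 6, 5, 9, 2],
            [2, 8, 4, 8, 3]]
    print(f"alpha = {cronbach_alpha(demo):.2f}")  # prints alpha = 0.97

High values across a feature would correspond to the "high agreement" Friberg describes; low values would suggest the feature is perceived too idiosyncratically to be useful.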

Friberg says that for the most part, the nine features did generate common agreement among the participants.

“The nine perceptual features work, and the test subjects’ own references and cultural differences do not matter.

“We could take 20 new volunteers, and the result would be the same,” he says.

While Friberg says the study is by no means the final word on music information retrieval, it does offer a path toward a better understanding of human perception, which could result in better and simpler computational models.

“A common description in terms of different features would then be able to describe some aspects of the music that most listeners have in common,” he says. “Thus, we avoid the huge individual differences in preferred music, what is considered good or bad.”

David Callahan/Peter Larsson

For more information, contact Anders Friberg at +46 8 - 790 75 76, +46 70-774 62 87, or afriberg@kth.se.

Read the paper