0:36

So we'll first present the idea of the music information plane.

And then we will distinguish between sounds and music with the aim of

developing methodologies that are either relevant to the more generic concept of

sound, or are more specific for the characteristics of music.

So we'll first talk about sounds and sound recordings and

then collections of sound recordings.

And then we'll continue with the idea of music recordings and

collections of music recordings.

1:24

So we can define different abstraction levels, and

the left column lists these different abstraction levels.

And we can go from the physical level,

which is basically the lowest level that we're dealing with.

And we go up to the cognitive level, okay.

So that would be the highest level that we can see there and some steps in-between.

So at the physical level, when we talk about sounds and music,

we can talk about concepts like the frequency or

the duration of the sound, or the spectrum and

some clear characteristics of the spectrum like the centroid.

And also we can talk about intensity of the sound.

If we go a level higher, to the sensorial level,

then instead of frequency we can talk about the pitch of the sound.

And then instead of duration, we can talk about time,

the sensorial experience of duration.

And then instead of talking about spectrum we can talk about timbre

which is a sensorial concept.

And finally, instead of

intensity, now we talk about loudness, which again is a sensorial concept.

And we can go a level higher and talk about

perceptual topics, or perceptual concepts that are more musical,

that are related to musical concepts.

So here, we talk about successive and

simultaneous intervals of pitches, what we would call notes.

And then, we talk about time.

We talk about structuring of time, and we talk about things like the beat.

And then for timbre, we talk about aspects of the timbre that we can identify and

characterize with some aspect of a musical sound.

And for example, the spectral envelope would be that.

And then finally instead of loudness,

when we talk about musical loudness we normally refer to dynamics.

And we have vocabulary that talks about the dynamics of musical sounds.

3:47

And we can still go a level higher and go towards the more

formalized way of talking about musical concepts.

And therefore, when we talk about pitch-related concepts,

we talk about things like melody or key or tonality.

When we talk about timing-related concepts,

we talk about rhythmic patterns, we talk about tempo, we talk about meter.

And when we talk about spectral timbre characteristics, we identify musical

instruments or the voices, entities that have a characteristic timbre.

And then finally, when we talk about dynamics or loudness, we are interested

in the articulations of the sounds and how sounds change from one to another.

And finally we can reach the highest level, the cognitive level,

the level that relates to us as humans in a very subjective way.

And how we listen to music and

4:53

what issues are relevant for us in the interaction with music.

So at this level the columns are no longer valid.

There is interaction between all these different concepts.

And then we can talk about the emotion or the musical style or

semantic concepts that clearly integrate all these other levels of

descriptions to obtain these concepts.

These ways of describing music that are clearly more generic and

definitely are subjective or cultural.

And that would be a level that definitely would be hard to reach and

in this class we are definitely not going to talk much about that.

5:55

If we want to describe sounds in a generic way,

sounds like the ones we find in Freesound, we can group audio features.

These are the audio features that we talked about in the last lecture, and

we can group them into different categories.

So we can talk about the timbre related features and

we mentioned quite a few of them, like the spectral centroid,

or the MFCCs, or the high-frequency content, etc.

Then we can talk about another group of features that relate to dynamics.

And that's basically the loudness and the level of a particular recording,

and then we can talk about the pitch related features.

6:43

And here is where we can talk about the pitch or the pitch salience,

and finally we have to describe also the time varying aspects of it.

Aspects of a sound that relate to the evolution of the sound, to the texture

of the sound, and these we can group under the term morphological features.

And here we can talk about things like the envelope of a sound or the onset rate or

many other types of descriptions that we could include under this term.
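The four feature groups just mentioned can be sketched as a simple mapping from category to descriptor names; the names below are illustrative labels, not tied to any particular analysis library.

```python
# Illustrative grouping of generic sound descriptors into the four
# categories discussed above (names are hypothetical labels).
sound_features = {
    "timbre": ["spectral_centroid", "mfcc", "high_frequency_content"],
    "dynamics": ["loudness", "level"],
    "pitch": ["pitch", "pitch_salience"],
    "morphological": ["envelope", "onset_rate"],
}

# Flatten into the full list of descriptors for one sound.
all_descriptors = [d for group in sound_features.values() for d in group]
print(len(all_descriptors))  # → 9
```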

7:18

We have already seen quite a few of these descriptors, so

what is interesting now is that from these descriptors, from these features

that we can analyze, we can now talk about collections of sounds.

So let's talk about how to describe collections of sounds, and clearly

there are many ways that we can analyze a collection of sounds and describe it.

And we'll focus on three basic concepts.

The first one, and the most important concept that we need to develop, is

the idea of similarity. If we want to talk about collections of sounds,

we have to talk about the similarity between these sounds so

we can form the idea of collections and group them.

Once we can talk about similarity then we can cluster sounds.

We can group sounds according to some criteria.

And finally, if we know some classes, some existing labels

that we use to describe a particular group of sounds, then we can classify sounds.

We can assign classes to particular sounds.

8:40

In order to properly describe a sound, we have to use many features.

But for simplicity, we will be taking only, in this case, two features.

So if we consider a sound as represented by two features,

we can display the sound as a point in a two-dimensional space.

And that's what we're seeing here.

Every feature is one dimension.

So here, we're showing two audio features.

The horizontal axis is the mean of the spectral centroid.

So we have analyzed notes of three instruments: a violin, a flute, and a trumpet.

We have computed the spectral centroid and we have taken the mean of it.

So this is a multi-frame feature.

And also we have done the mean of one of the MFCC coefficients,

the second coefficient.

9:37

So that's the mean of the second coefficient on the vertical axis.

And we can see that the violin has quite a high value for

this coefficient, for the MFCC value, and

its centroid covers quite a bit of space.

The trumpet has a much lower value for this MFCC coefficient.

So these blue dots are more in the lower side.

And the flute sound is kind of in between and

also the MFCCs are in between.

So we can kind of see that these types of sounds

are distinct according to these two features.
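As a sketch, such a two-feature representation stores each sound as a point in a 2-D space; the numbers below are made up for illustration, not actual analysis results.

```python
# Each sound is a point: (mean spectral centroid in Hz, mean 2nd MFCC).
# Values are invented for illustration only, not real measurements.
sounds = {
    "violin_note":  (2100.0, 45.0),
    "flute_note":   (1500.0, 20.0),
    "trumpet_note": (1800.0, -10.0),
}

for name, (centroid, mfcc2) in sounds.items():
    print(f"{name}: centroid={centroid} Hz, mean MFCC2={mfcc2}")
```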

10:31

Now, in order to play around with the space, the most fundamental

thing is to measure the distance between sounds, between points.

So we have to find a way, in a multidimensional space,

not just in this simple 2D space, to compare two sounds.

How do we find the similarity between the two?

So the Euclidean distance is one of the simplest ways to measure

the distance between two points in a multidimensional space.

So in this case, p would correspond to one sound,

the collection of features of one sound.

And q would correspond to another sound,

the collection of feature values of the other sound.

And then, for every dimension i, we just take the distance between

those two values on that particular feature.

Then we square it and we sum over all

the features, all the dimensions, and then we take the square root.

And that's the Euclidean distance.

In the case of 2D space of just 2 features, that becomes much simpler.

So in this case, the red and the blue are two sounds with two features.

And we can just measure this Euclidean distance, and it's basically the line

that connects these two points, the length of this line.
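The steps just described can be written directly in a few lines of Python; the two feature vectors below are placeholder values, not real sound features.

```python
import math

def euclidean_distance(p, q):
    """Distance between two feature vectors: difference per dimension,
    squared, summed over all dimensions, then the square root."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Two sounds described by two features each (made-up values).
red, blue = (1.0, 2.0), (4.0, 6.0)
print(euclidean_distance(red, blue))  # → 5.0
```

The same function works unchanged for any number of dimensions, since it just iterates over the paired feature values.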

12:12

Now that we know how to measure distance,

we can cluster sounds.

K-means is a clustering algorithm.

If we give the algorithm the desired number of clusters, it will create

the clusters and it will return the mean value of each of these clusters.

K-means clustering aims to partition n observations, so

the observations would correspond to the sounds,

into k clusters, so into k categories or groups of sounds.

13:24

So this equation expresses the minimization process that we

have to go through in the K-means algorithm.

So the goal is to find the mean mu for every cluster, so

we have these k clusters, and we minimize this overall sum,

so we have to do it sort of holistically,

attaining this overall minimization result.
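Assuming the usual formulation, the k-means objective being referred to can be written as:

```latex
\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\lVert \mathbf{x} - \boldsymbol{\mu}_i \right\rVert^2
```

where S = {S_1, ..., S_k} are the k clusters of observations and mu_i is the mean of the points in cluster S_i.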

Here in the plot we see the three steps

in this process of obtaining these clusters.

On the left one, we start from a collection of points.

In fact, these are not sound features.

These are just random points in space.

And the goal is to cluster them into two clusters.

So we're going to find two clusters.

14:18

So we initialize the algorithm by putting

two points that will be used as the initial means of two clusters.

So the middle diagram, the red and the blue are the two initial means.

And with these two initial means, this collection of samples, of

sounds, gets clustered in the way that we see here, with the red cluster and

the cyan cluster.

And now, with K-means, we iterate over this

minimization, this equation that we have here.

And after a certain number of iterations, it converges,

and it converges to the clustering that we have on the right.

So it has clustered the red dots in the lower left corner and

the cyan dots in the upper right corner.

And clearly, this is a much better clustering than the initial

random clustering that the algorithm started with.

So now, with that, we can take collections of sounds and

automatically find classes that group sounds

that might have similar audio features.
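The assignment and update steps described above can be sketched in plain Python; this is a minimal illustration on made-up 2-D points, not an industrial-strength implementation (real code would also check for convergence and try several initializations).

```python
import math
import random

def kmeans(points, k, iters=20, seed=1):
    """Minimal k-means sketch: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # initialize means with k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean from the points assigned to it.
        for i, c in enumerate(clusters):
            if c:
                means[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return means, clusters

# Two obvious groups of 2-D points (random-looking data, not sound features).
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
means, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```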

The last thing that we talked about for

describing sound collections is the classification of sound.

And that means that we know some classes.

We have identified certain categories of sounds.

And what we want to do is, given a new sound,

classify it into one of these known classes.

So the K nearest neighbors classifier, KNN,

is an algorithm used for this type of classification.

And the rule that we implement with KNN

classifies a sound by assigning to it the class

that is most frequent among its neighbors.

So we find K neighbors, and whatever is the majority vote of those neighbors

then becomes the class of this query or of this new sound.

So this block diagram exemplifies this process,

this set of rules that is implemented in the KNN algorithm.

So we start from a query, okay, so that would correspond to a new sound, and

we start with target examples.

So we start with a collection of samples, of sounds, that have a label.

For example, in the diagram below, we have two such collections,

labeled collections, the blue and the red ones.

And the cyan dots are our query.

So we have to label or assign these query samples to one of these two collections.

So what we do is we measure the distance with the Euclidean distance.

We measure from every query sample to all the neighbors, okay?

And we take the K top results.

So we only look at the K nearest neighbors.

17:51

And from those what we do is we take a majority vote

based on the classes they belong to.

So the last box is basically we know the classes that the neighbors belong to.

And we take the majority vote and we assign the class that is in the majority.

So, on the right diagram, we see the result.

So the cyan dots have been assigned a color.

So some have been assigned the blue class,

and the rest have been assigned the red class.

So this is a very simple but quite efficient way

to classify sounds or, of course, any other type of data into classes.
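The majority-vote rule just described can be sketched in a few lines; the labeled examples below are invented 2-D points standing in for the blue and red collections in the diagram, not real sound features.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Assign `query` the most frequent class among its k nearest examples.
    `examples` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(examples, key=lambda e: math.dist(query, e[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two labeled collections (blue and red) in a made-up 2-D feature space.
examples = [((0, 0), "blue"), ((0, 1), "blue"), ((1, 0), "blue"),
            ((9, 9), "red"), ((9, 10), "red"), ((10, 9), "red")]
print(knn_classify((1, 1), examples))  # → blue
print(knn_classify((8, 9), examples))  # → red
```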

If we now go to musical sounds, recordings of pieces of music,

the features to be analyzed should be more specific and

more related to musically meaningful concepts.

So let's start by defining some categories of features, or

descriptors, that are musically relevant.

So we can talk about timbre related descriptors.

And things that we mentioned like instrument characterization or

instrumentation characterization, or even the remixing of

musical recordings, which is also an important feature of music.

19:21

Then another category would be related to melody and harmony.

And that includes things like the phrase, the motive, or

the tonic of the piece of music. And even if we talk about non-Western music

traditions, like the Indian music tradition, we talk about raga, or,

as in the Turkish music tradition, we talk about makam.

So, these are melodic concepts that can be described and

that are important to characterize a particular piece of music.

Then we can talk about rhythm.

And then again we talk about patterns.

Or we can talk about tempo.

Or we can talk about beat.

20:24

These descriptions cannot be obtained by just performing audio analysis.

We normally start from audio features, but then we have to develop models

from a combination of features that can capture the essence of each concept,

and clearly this is beyond the aim of this class.

And this is very much an open research area, very active, and

one that hopefully will keep evolving through the years so that

we'll eventually be able to do things like this.

21:37

And then, the concepts that we discussed for sounds also apply, but

they have to be adapted here. So similarity is a fundamental concept.

But then we have to divide it, or we can find different facets of the similarity, and

we can talk about rhythmic similarity,

we can talk about similarity of the instrumentation,

of the melodic aspects or the harmonic aspects, or structural similarity.

And then, of course, we could combine them in order to find similar songs.

And these types of similarity are clearly not Euclidean distances.

We have to develop similarities that are much more sophisticated.
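One simple way to combine facets, assuming we already had per-facet distances, would be a weighted sum; both the facet values and the weights below are hypothetical placeholders, and as noted above, real facet similarities need much more sophisticated models than a plain Euclidean-style combination.

```python
# Hypothetical per-facet distances between two pieces of music,
# combined with hand-picked weights (all values are placeholders).
def combined_distance(facet_distances, weights):
    """Weighted sum of the per-facet distances."""
    return sum(weights[f] * d for f, d in facet_distances.items())

facets = {"rhythm": 0.2, "melody": 0.5, "instrumentation": 0.1}
weights = {"rhythm": 1.0, "melody": 2.0, "instrumentation": 0.5}
print(round(combined_distance(facets, weights), 2))  # → 1.25
```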

And then we can classify and

cluster these pieces of music according to different criteria.

The classification, for example, can be done according to genre,

or style, or artist, or the school or music tradition the piece comes from.

Again, this is much beyond what we can cover in this class.

It is a fascinating topic that is a natural continuation of

the kinds of things we talked about.

23:17

And then, for the more specific things that we have talked about,

you can look at the specific Wikipedia entries: the Euclidean distance and

K-means clustering both have good entries, as does

the concept of classification based on K-nearest neighbors.

Of course, these are just two examples of different clustering and

classification strategies.

There are a lot of different strategies coming from the field of machine learning

that have brought many new possibilities for these types of tasks.

And that's all. So in this lecture, we have opened the door into

a huge research field that aims at automatically describing and

organizing large collections of sounds and music recordings.

We just introduced some of the basic concepts and

specific methodologies that can be used to start working on this topic.

In the programming lectures,

we will show some examples of how to actually do some of this.

Clearly, we cannot do justice to this field of research here.

However, I hope you got a taste of it.

And I will see you next class.

We will present some more demonstrations and practical examples of all this.

See you next time, bye bye.