Welcome back to the course on Audio Signal Processing for Music Applications. In the first part of this lecture, we started talking about the Short-Time Fourier Transform. Now, let's continue. We will cover quite a few topics, but each one is quite short. We will first review the concept of the STFT and the analysis window, then talk about the window size, then the FFT size, then the hop size, and then what we call the time-frequency compromise. Once we have covered all that, we can do the inverse of this analysis process, so we'll do the inverse of the Short-Time Fourier Transform. And finally, we can put it all together into an analysis-synthesis system that can be used to analyze and synthesize any sound.
So we saw this equation in the first part of the lecture. We now know that the Short-Time Fourier Transform is the time-varying version of the DFT, and that windowing is a key concept for understanding the STFT. For example, in these plots, if we start from a violin sound that we can listen to, [SOUND] then the magnitude spectrum of one fragment, of one of the frames, is quite different depending on the window we use. If we use a rectangular window, we see this first magnitude spectrum. If we use the Blackman window, we see this second one, okay? So now, let's try to understand a few more things about the Short-Time Fourier Transform and the windowing process.
So we talked about different types of windows in the last class. But these windows can be of any size, and what is the effect of this size? Here we see an example of the same sound analyzed with the same window type but with different sizes. One has size 128 and the other has size 1024. The windowed fragments are clearly very different, and the resulting spectra are strikingly different. The magnitude spectrum of the small window has very little information because it is computed from very few samples, basically 128, while with the larger window we see a lot more detail of the sound, of what's going on.
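A minimal sketch of this effect, assuming NumPy, a Hamming window, a 44.1 kHz sampling rate and a 2 kHz test sinusoid (none of these values come from the lecture, and `lobe_width_hz` is a hypothetical helper). It measures how wide the spectral peak of a single sinusoid looks with the two window sizes:

```python
import numpy as np

fs = 44100    # sampling rate in Hz (assumed, not given in the lecture)
f0 = 2000.0   # frequency of the test sinusoid in Hz (assumed)
N = 8192      # zero-padded FFT size, used only to sample the spectrum finely

def lobe_width_hz(M, drop_db=20.0):
    """Width in Hz of the spectral peak of a sinusoid seen through an M-sample window.

    Hypothetical helper: counts the FFT bins that lie within drop_db of the maximum.
    """
    n = np.arange(M)
    x = np.cos(2 * np.pi * f0 * n / fs)        # fragment of a stationary sinusoid
    w = np.hamming(M)                          # analysis window of size M
    mX = np.abs(np.fft.rfft(x * w, N))         # magnitude spectrum, zero-padded to N
    mX_db = 20 * np.log10(mX / mX.max() + 1e-12)
    return np.count_nonzero(mX_db > -drop_db) * fs / N

width_small = lobe_width_hz(128)
width_large = lobe_width_hz(1024)
print(f"M = 128:  peak is about {width_small:.0f} Hz wide")
print(f"M = 1024: peak is about {width_large:.0f} Hz wide")
```

The wider the peak, the harder it is to tell nearby components apart, which is why the spectrum from the 128-sample window shows so little detail.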
So that is going to be a very important compromise to make when we choose the window size. The phase spectrum, of course, is also different, but a lot of these issues are much clearer in the magnitude spectrum, so we will focus on discussing the effect on the magnitude spectrum.
Okay, another important issue that has a significant effect, even though it's a tiny difference, is the choice between an even-sized and an odd-sized window. We can take the same window with size 32, which is an even size, or with size 31, which is one sample less and an odd size. If we look at the magnitude spectra, they basically look the same; the difference is not significant. But the phase spectra are quite different. In the even-size case, there is a slope. These windows are, by the way, centered around zero. If the size is even, the window cannot be perfectly centered, because we have to have a sample at zero, and then there will be a different number of samples on the right-hand side and on the left-hand side. On the other hand, if we have an odd-size window, we can have a sample right in the middle and the same number of samples on both sides. Therefore, it can be perfectly symmetric around zero and we get the zero-phase behavior that we talked about before. So, whenever we can, we're going to be using odd-size windows.
The window size is chosen according to some resolution criterion, the number of periods or the frequency resolution that we need to have, and we'll be talking more about that. But now we have to choose the FFT size, which can be chosen independently of the window size, as long as it is at least as large, because we have the opportunity to do zero-padding. In this example, we start from a given window size, 512. If we choose the FFT size equal to the window size, we get this first magnitude plot, and we see a pretty decent magnitude spectrum with all these peaks that correspond to the harmonics of this oboe sound.
But if we use a larger FFT size, in this case 2,048 samples, so we zero-pad quite a bit and the FFT size is four times larger than the window size, we see a spectrum which is much smoother, much nicer, and this is something that is very desirable. What we will be looking for are spectra that are quite smooth, so that we can identify things much better, okay?
Then the final step of the Short-Time Fourier Transform analysis loop is the advancement of the window from frame to frame, what we call the hopping, the hop size. The window advances H samples after each DFT. And with this equation, we see the effect of this hopping. By just taking the window function and hopping by a given H, we get a function A, the sum of all the shifted windows, that depends on this hopping factor and, of course, on the window size. Here we have two examples with two different hopping factors. In the first one, with a window of 201 samples, we are hopping 100 samples, about half of the window size. The red function is this A function, the sum of these windows. Clearly, it's not a very smooth result. It has this oscillation, basically what we call an amplitude modulation, that would in fact be heard in the resynthesis that we will do with the Short-Time Fourier Transform. In the second plot, we see a hop of 50, so basically one fourth of the window size. Apart from the boundaries, where we clearly see a kind of distortion, all the middle is very much a straight line. And this is the kind of thing that we would like to have. We would like the hopping to have no effect, so that it maintains an identity, and for that this A function should be a straight line.
Now we can put all these concepts together to analyze a real and complex signal, like a piano sound for instance. So first, let's listen to the input sound. [MUSIC] Okay, so choosing all the parameters of the STFT has a number of effects on the output.
The most important one is what we call the time-frequency compromise. In the top plot we see the magnitude spectrogram of the whole STFT analysis, so basically a sequence of DFTs. The vertical axis is frequency, the horizontal axis is time, and the color intensity is the magnitude value in the magnitude spectrum. Here we see very clearly the attacks of the piano as vertical lines, and the horizontal lines correspond to the harmonics of the piano. In the second plot, the window and FFT size are four times bigger, so the window has 1024 samples. There we see that the horizontal lines are much better defined, in exchange for vertical lines, the attacks, that are not as crisp, not as clear. So this is what we call the time-frequency compromise. If the window size is small, we get a good time resolution in exchange for a poor frequency resolution. If we take a bigger window size, we get a better frequency resolution, which is why we see better horizontal lines, in exchange for a worse time resolution, so less crisp attacks, as we see in this second plot.
And then, of course, typically we only show the magnitude spectrum, but the phase also carries valuable information. Here we see both the magnitude spectrogram and the phase spectrogram. In fact, what we are seeing is the derivative of the phase, and taking the derivative shows us where the phase is flat and where it changes a lot. So here we can also see the horizontal structures that relate to the stable peaks, the ones that correspond to the harmonics. Whenever the color is lighter, the phase is more stable. And where we see the darker color, there is a bigger slope, so a bigger phase change.
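A minimal sketch of why the phase is flat around a stable peak, for a single frame. It assumes NumPy, a Blackman window, an odd window size with zero-phase buffering, and a derivative taken along the frequency axis; the lecture does not say which axis the plot uses, and all numeric values here are made up for illustration:

```python
import numpy as np

fs = 44100          # sampling rate in Hz (assumed)
f0 = 1000.0         # sinusoid frequency in Hz (assumed)
phi = 0.7           # phase of the sinusoid at the center of the frame (assumed)
M = 511             # odd window size, so there is a sample exactly in the middle
N = 2048            # FFT size, roughly four times the window size
hM = M // 2         # number of samples on each side of the center

n = np.arange(-hM, hM + 1)                     # time axis centered around zero
xw = np.cos(2 * np.pi * f0 * n / fs + phi) * np.blackman(M)

# Zero-phase windowing: put the center of the frame at index 0 of the FFT buffer.
buf = np.zeros(N)
buf[:hM + 1] = xw[hM:]                         # center sample and right half first
buf[N - hM:] = xw[:hM]                         # left half wrapped around to the end

X = np.fft.rfft(buf)
mX = np.abs(X)                                 # magnitude spectrum
pX = np.unwrap(np.angle(X))                    # unwrapped phase spectrum
dpX = np.diff(pX)                              # phase derivative along frequency

peak = int(np.argmax(mX))                      # bin of the spectral peak
lobe = slice(peak - 4, peak + 4)               # a few bins around the peak
phase_at_peak = float(np.angle(X[peak]))
max_step = float(np.abs(dpX[lobe]).max())
print("phase at the peak:", phase_at_peak)
print("largest phase step around the peak:", max_step)
```

Under these assumptions the phase around the peak stays close to the phase of the sinusoid at the center of the frame, so its derivative is close to zero there, which is the flat region described above.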
So these phase spectrograms are a little bit harder to read and visualize, but they contain some very interesting information that we're going to be using.
And like the DFT, the Short-Time Fourier Transform has an inverse transform. If we do things right, we can recover the original sound from the magnitude and phase spectrograms. How do we do that? What we're basically doing is what we call overlap-add of the different output frames. We have computed the individual spectrum of every frame, X sub l, and we take the inverse DFT of every X sub l, as we show here. Then there is this shifting and summing: every output frame is shifted by its hop and added on top of the previous frames, in such a way that we compensate for all this windowing and recover the output signal. So the output of one single frame is going to be this yw sub l, which is one fragment of the input sound multiplied by the window. And then the overall output sound is the sum of these windowed output fragments. In this equation we can identify the effect of the window. We have this sum over all the shifted windows, and this is where a possible distortion or artifact might be introduced by this process. If the sum of the windows is a constant, and that constant is one, the output will be equal to the input. But if the sum of the windows is not constant, then we are definitely going to have some distortion, a modulation, that will be heard in the output sound. Graphically, we are doing exactly the same as in the analysis, but it's the inverse process: we overlap and add every single fragment to recover the whole signal.
And now that we know how to analyze and synthesize a sound with the STFT, we can build a system. We will show that, and we'll implement it in the programming lectures.
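The overlap-add idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation from the programming lectures: it uses NumPy, a Blackman window of 201 samples (the lecture gives the size and the hops of 100 and 50 but does not name the window), a 440 Hz test sinusoid, frames that are not zero-phase buffered, and hypothetical function names. The output is divided by the window sum so that the sum effectively becomes one:

```python
import numpy as np

def stft_analysis(x, w, N, H):
    """One zero-padded DFT per frame; the window advances H samples per frame."""
    M = w.size
    starts = range(0, x.size - M + 1, H)
    return np.array([np.fft.rfft(x[s:s + M] * w, N) for s in starts])

def stft_synthesis(X, M, N, H):
    """Inverse DFT of every frame, then shift by l*H and add (overlap-add)."""
    y = np.zeros((X.shape[0] - 1) * H + M)
    for l, Xl in enumerate(X):
        y[l * H:l * H + M] += np.fft.irfft(Xl, N)[:M]
    return y

def window_sum(w, H, n_frames):
    """Sum of n_frames copies of the window w, each shifted by H samples."""
    a = np.zeros((n_frames - 1) * H + w.size)
    for l in range(n_frames):
        a[l * H:l * H + w.size] += w
    return a

fs = 44100                                     # sampling rate in Hz (assumed)
M, N = 201, 512                                # window size and FFT size
w = np.blackman(M)                             # window type is an assumption
x = np.cos(2 * np.pi * 440 * np.arange(4410) / fs)   # test signal (assumed)

# Sum of shifted windows, looking only at the middle, away from the boundaries.
a_half = window_sum(w, 100, 40)[M:-M]          # hop of about half the window
a_quarter = window_sum(w, 50, 80)[M:-M]        # hop of about a quarter
ripple_half = a_half.max() - a_half.min()
ripple_quarter = a_quarter.max() - a_quarter.min()
print("H = 100, variation of the window sum:", ripple_half)
print("H = 50,  variation of the window sum:", ripple_quarter)

# Analysis followed by synthesis with H = 50, normalized by the constant sum.
H = 50
X = stft_analysis(x, w, N, H)
y = stft_synthesis(X, M, N, H) / a_quarter.mean()
err = np.max(np.abs(y[M:-M] - x[:y.size][M:-M]))
print("largest difference between input and output:", err)
```

With the hop of 50 the window sum is flat in the middle, so after normalization the output matches the input there. With the hop of 100 the sum oscillates, and that oscillation would multiply the output signal.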
And this is the diagram of a complete analysis-synthesis system using the Short-Time Fourier Transform. We have the input sound x[n], then we have the window that we use to select portions of the sound, and from each windowed fragment we compute the FFT, which results in the magnitude and phase spectra. Then, at every frame, we do the inverse of that and recover an output frame, which is a windowed signal because it has the window in it. Then we overlap-add these frames, and if the window and the hop size are chosen correctly, we generate an output signal that should be identical to the input signal.
So this would be an example of a complete analysis and synthesis of a time-varying sound. We start with the piano sound that we already heard. We see the magnitude spectrogram, computed with a given window size, FFT size, and hop size, chosen so that the overlap is correct. We see the derivative of the phase spectrum, and then we see y, the output sound, which is the inverse of these spectra, and we can listen to that. [MUSIC] And clearly, in this case, it's identical to the original because we did the hopping and the windowing correctly.
So again, quite a few references are available on this topic, both on Wikipedia and on Julius' website. And that's all for this class. With this second part of the lecture, we have completed the topic of the Short-Time Fourier Transform. We now know how to analyze and synthesize any sound. And, of course, in the demonstration classes and in the programming classes of this week, we're going to put that into practice. But even so, this is just the beginning of the interesting stuff. We will now be able to do really powerful things building on top of the Short-Time Fourier Transform. So stay with me and you will see. See you next time.