Welcome to Lesson 3 of Introduction to Data, Signal, and Image Analysis with MATLAB. This is the lesson on signal analysis in MATLAB. In this lesson, we'll be looking at how we can use MATLAB for signal analysis tasks. In our first section, we will introduce signals as a time-dependent form of data. Before we get into any analysis algorithms, we first need to understand what a signal is. Signals are functions that vary over time. Real-valued signals, as opposed to those composed of complex numbers, are functions that evaluate to real numbers as a function of time. Traditionally, the field of signal analysis includes both continuous time and discrete time signals. A continuous time signal is also known as an analog signal. An example would be a function describing the strength of an electric field in an electrical circuit as it changes over time. Another example would be a function describing the sound pressure level created by your audio speakers right now, which are creating a continuously changing, vibrating pressure wave that travels through the air to reach and vibrate your eardrum, inducing the sensation of sound in your ear. Sounds can be represented with well-known trigonometric functions like the cosine function. When a cosine with a frequency of 440 hertz, that is, a cosine wave that oscillates at a rate of 440 times per second, is played on a speaker to create a sound pressure wave at that frequency, it creates what we call a "pure tone". A pure tone is a sound that contains only a single frequency. Hertz is a unit equal to one divided by seconds, used to indicate the frequency of a periodic process. In this case, 440 hertz is a wave that oscillates 440 times per second, and this creates the sound of the musical note A. It turns out that every note on the musical scale corresponds to a sound pressure wave oscillation frequency. We will see numerous examples of this in the following lessons, as sound signals often illustrate critical signal analysis properties.
We also have discrete time signals, in which our notion of time is not continuous, but rather is something that occurs at discrete time points spaced at regular intervals. Thus, our discrete time signals are signals that have a value only at a set of discrete time points. Discrete time signals will be our main focus in this lesson because we're representing signals in a discrete digital representation on a computer. Further, the vast majority of modern signal processing applications that involve analyzing continuous time signals will involve a process called sampling, or analog-to-digital conversion, in which we sample a continuous time signal to create a discrete representation that we're able to further analyze, process, or transmit on digital computers. An example is the microphone I'm speaking into now. This microphone is sampling the continuous sound pressure level of the air in the room in which I stand at discrete time intervals, and creating a digital representation of this continuous signal. That digital representation gets stored on a hard drive. Ultimately, that data is transferred as another signal through the Internet to your computer, where your audio card converts it back from a digital to an analog signal that can be amplified and output on a speaker to reproduce the original sound pressure wave for you to hear. Going forward, when we talk about signals, we will generally be referring to discrete time signals unless otherwise noted. An example of a discrete signal would be the value of the Dow Jones Industrial Average at opening bell every day. In this case, our time interval is one day and our signal is the numeric stock value sampled at that time point. Over the timescale of years, our signal contains thousands of samples. You may have noticed by now that signals, at a fundamental level, are simply a set of numbers. In this sense, signals are a sub-type of data, like we saw in the previous lesson.
Like data, signals can be thought of as a set of numbers. Like data, signals can be one, two, or in general n-dimensional. For example, instead of just the Dow Jones, we could be looking at n different stock indices sampled at opening bell. This would result in an n-dimensional signal. Because signals are a sub-type of data, the data analysis methods we explored in the last section are applicable. However, with signals, there is a notion of ordering or time to the data. This is different from the data sets we inspected previously, where we were analyzing a set of samples of a process, and those samples were considered independent measurements of the process. Since the samples were independent, the order of the samples was meaningless. Discrete time signals, on the other hand, are samples of a process that changes over time. In this case, the data are only valid if they are properly ordered. Thus, the most fundamental signal analysis methods analyze how the signal changes over time, making methods in the signal analysis field distinct from those that are used in the broader data analysis field. Not sure you understand what a signal is yet? That's okay. We've been talking about them at a very abstract level. Let's consider a couple of concrete examples in MATLAB that will make signals more understandable. The first thing we'll look at is an audio signal. It's actually quite straightforward to create any audio pure tone we like in MATLAB. We can use MATLAB's built-in trigonometric functions to define waves. Recall from trigonometry that the cosine function is a wave function that is periodic with period two Pi. That is, the cosine function repeats every time the argument increases by two Pi radians. We can show this in a plot. We define a radians variable, rads, to range from 0 to 4 Pi in steps of 4 Pi over 100. In a subplot in the figure, we can plot the cosine function with rads as the argument. We will plot the cosine with o's.
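The radians plot just described can be sketched roughly as follows. This is a minimal sketch, not the exact code from the lesson: the variable name `rads` follows the narration, while the subplot layout and the axis label text are assumptions.

```matlab
% Radians from 0 to 4*pi in steps of 4*pi/100 (two full periods of cosine)
rads = 0:4*pi/100:4*pi;
y = cos(rads);

figure;
subplot(2, 2, 1);                 % assumed layout: first panel of a 2-by-2 grid
plot(rads/pi, y, 'o');            % 'o' draws each sample as a circle
xlabel('radians (multiples of pi)');
ylabel('cos(rads)');
```

Dividing `rads` by `pi` on the x-axis is one way to get tick values in multiples of Pi, so the repeats at 2 and 4 are easy to spot.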
On the x-axis, we have radians in multiples of Pi. On the y-axis, we have our cosine function. As we can see in the plot, cosine of 0 is equal to 1, and it then decreases to minus 1 as the argument approaches Pi. It then increases back to one when approaching two Pi, after which the function repeats. This repeating behavior is called periodicity. We say the cosine signal is a periodic signal with period two Pi, because when the argument of cosine is increased by two Pi, it produces the same value. How do we use the cosine function to create pure tones? We want to create a time-varying signal, so to create a pure tone, we are going to define a cosine that is a function of time. Because signals in MATLAB are discrete and finite in length, we first have to choose the duration we want our signal to have. Let's select big T, the duration of our pure tone, to be 100 seconds. Then we can define a time variable, little t, to range from 0 to 100 seconds. Why do we do this? Our signal will be a function that has values at discrete time points in the duration from 0 to 100 seconds. In this case, we define t so that the time interval between time points is one second. In our vector t, we have values of 0, 1, 2, all the way to 100, to represent the values of our time variable at these discrete intervals. This makes it easy to define our signal as a cosine wave as a function of time. To do this, we can define a new vector y that is a cosine dependent on t, where t is scaled by some constant a. What do we want a to be? We define a to control the frequency of the wave. Since the cosine function is periodic with period two Pi radians, we simply need to scale time t, which has units of seconds, with the appropriate a in order to convert our time in seconds to radians while scaling it with the frequency at which we want our cosine to oscillate relative to time. As a simple first example, let's say we would like to have one full oscillation of our function over our interval, big T, of 100 seconds.
That is a frequency of 1 divided by big T, which is equal to 1 divided by 100 hertz. We define f, which stands for frequency, to be 1 over 100 hertz. Then we can define a to be f, which is in hertz, multiplied by 2 Pi radians. Thus, a is equal to 2 Pi f, with units of radians times hertz, or radians per second. Why? Well, if we want our cosine wave to oscillate one period from time 0 to 100, then at time 100, that is, where t is equal to 100 seconds, we want the argument for the cosine to equal 2 Pi radians. So we need to scale time t in seconds by 2 Pi divided by 100, or 2 Pi f radians per second. Thus, when t is equal to 100 seconds, the result is 2 times Pi radians times 1 over 100 hertz times 100 seconds, which equals 2 Pi radians. Now we can evaluate y to be cosine of a times t, and plot the results. On the y-axis, we have our y of t. On the x-axis, we have t in seconds. As intended, we can see this results in a single period of the cosine distributed over 100 seconds of time. What if we want two oscillations over our 100 second interval? This would be a frequency f of 2 over 100 hertz. Now a is again equal to 2 Pi f. When t is 100, we have a times t equal to 2 Pi times 2 over 100 times 100, which equals 4 Pi, so we would expect our cosine to have oscillated two periods. When t is equal to 50 seconds, a times t is equal to 2 Pi radians times 2 over 100 hertz times 50 seconds, which equals 2 Pi radians. Our cosine will have oscillated once by time t equal to 50. Let's take a look. We again evaluate y equal to cosine of a times t, and then, in a new subplot, we can plot our new signal. As expected, now we have two oscillations. Note that we only needed to change our frequency variable f, and the rest of the code is identical. This was no accident. We simply choose the value of our frequency f to control the number of oscillations per second produced by the cosine function as a function of time. Also note that the result looks identical to our original plot of the cosine function that we plotted with respect to radians.
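To make the conversion from seconds to radians concrete, here is a minimal sketch of the two signals just described. The names `T`, `t`, `f`, `a`, and `y` follow the narration; the numbered suffixes and the subplot positions are assumptions made so that both cases fit in one script.

```matlab
T = 100;                 % duration of the signal in seconds ("big T")
t = 0:T;                 % time points 0, 1, 2, ..., 100 (one second apart)

% One oscillation over the whole interval: f = 1/T hertz
f1 = 1/100;
a1 = 2*pi*f1;            % radians per second
y1 = cos(a1*t);

% Two oscillations over the interval: only f changes
f2 = 2/100;
a2 = 2*pi*f2;
y2 = cos(a2*t);

figure;
subplot(2, 1, 1);        % assumed layout
plot(t, y1, 'o');
xlabel('t (s)'); ylabel('y(t)'); title('f = 1/100 Hz');
subplot(2, 1, 2);
plot(t, y2, 'o');
xlabel('t (s)'); ylabel('y(t)'); title('f = 2/100 Hz');
```

In the second signal, the sample at t = 50 seconds sits at an argument of 2 Pi radians, so it is back at the value 1, just as the arithmetic in the lesson predicts.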
Compared with that original radians plot, the new plot is exactly identical. The only difference is that for the first plot, we started with radians as the independent variable, that is, the variable that the cosine depends on. Whereas in the last plot, we started with time as the independent variable and converted time into radians to achieve the same two oscillations of the cosine as a function of the passage of 100 seconds of time rather than radians. What happens when, instead of cosine of a times t, we do cosine of a times t plus some number? If we add a constant to our cosine argument, this phase shifts the cosine. Essentially, it moves the cosine function to the left or to the right. By default, at time zero, the cosine has a value of one and decreases from there before returning to one at the end of the first period. If we have a phase shift that is not a multiple of 2 Pi, then our cosine will not start at one at time zero, but at some other value between one and minus one, depending on the phase. Phase shifts that are multiples of two Pi radians have no effect at all, as they equate to shifting our function by an integer number of oscillations. Let's show what this looks like in an animated plot that we can create with a for loop. We'll use 50 iterations, and define our phase fraction to be i divided by 50. Then y is cosine of a times t plus 2 times Pi times the phase fraction. Once we hit iteration 50, our cosine will be phase shifted by two Pi, which should be one full oscillation of the cosine. We plot it as o's. Our x-axis is still time in seconds. On our y-axis, we will label it to note the phase fraction. We are showing cosine of a times t plus 2 Pi times the phase fraction, which we can print to a string using the sprintf function. Here, I'm just going to display two decimal places. We use the drawnow function to tell MATLAB to stop and draw the figure before moving to the next iteration.
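The animation loop just described might look something like this. It is a sketch under a few assumptions: the frequency is taken to be the 2 over 100 hertz case from earlier, and the names `n_iter` and `phase_frac` and the exact label text are illustrative rather than taken from the lesson.

```matlab
T = 100;
t = 0:T;
f = 2/100;                        % assumed: reuse the two-oscillation signal
a = 2*pi*f;

n_iter = 50;                      % number of animation frames
figure;
for i = 1:n_iter
    phase_frac = i / n_iter;      % fraction of a full 2*pi phase shift
    y = cos(a*t + 2*pi*phase_frac);

    plot(t, y, 'o');
    xlabel('t (s)');
    % sprintf builds the label; %.2f shows two decimal places
    ylabel(sprintf('cos(a*t + 2*pi*%.2f)', phase_frac));
    drawnow;                      % draw this frame before the next iteration
end
```

After the last iteration the phase fraction is 1, so the shift is a full 2 Pi and `y` matches the unshifted `cos(a*t)` up to rounding.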
Running it, we can see that as we increase our phase fraction, our cosine function appears to move to the left. Ultimately, this continues until our phase reaches two Pi, where we end up exactly back where we started. If we kept increasing the phase, we would just do the same thing all over again due to the periodic nature of the cosine function. So that's the effect of phase. Going back to frequency, we can see what happens when we keep increasing our frequency. If we want five oscillations in 100 seconds, we can simply update our frequency variable and repeat the same code. Again, a is equal to 2 Pi f, and y is equal to cosine of a times t. We'll do one last subplot and plot our signal. We'll have time on the x-axis and our signal y of t on the y-axis. As expected, this results in five total oscillations over our 100 second interval. However, this plot has exposed a few new questions. We can see that as our frequency increases, it becomes more difficult to actually see the shape of the cosine function in the plot. This is because as the cosine oscillates more rapidly, the space between the plot circles increases relative to the width of the oscillations. We are beginning to expose one of the fundamental problems in digital signal analysis. Spoiler alert: we'll come back to this later, but it turns out that there is a maximum oscillation frequency that you can represent in a digital signal. This maximum frequency affects not only visualization, but also our fundamental analysis of the signal. We'll have more on this later. What's important to note now is that our frequency limit and our ability to visualize this signal can always be increased by decreasing the time interval between samples. In our example so far, we have defined our independent time variable such that the interval between time points is one second. If we halve this interval, we double the number of samples we acquire of our cosine function.
We can plot this new signal with the original samples superimposed. First, we plot the new signal with red circles, and then we superimpose the original signal in blue circles. We can see how halving the time interval between samples doubles the number of samples we have over the time duration of our signal, big T, and having twice as many samples makes the shape of the signal much more visually clear. As mentioned a few moments ago, we can always improve our ability to visualize the signal by decreasing the time interval between samples. This time interval is referred to as "the sampling period", which has units of time. We will always use seconds as our time unit. The inverse of the sampling period is called the sampling frequency, often denoted fs. The sampling frequency of a signal indicates the number of samples per second that are acquired. As this is a frequency quantity, it has units of hertz. Sampling frequency is equivalent to the inverse of the sampling period, but traditionally, the sampling frequency is the metric that is used to quantify the time interval between samples. In this case, instead of saying we have a sampling period of 0.5 seconds, we would say we have a sampling frequency of two hertz. We can repeat the same plot using fs instead of a sampling period. We define fs to be equal to two hertz. Then we define our independent time variable; again it will range from 0 to big T, but the step size in this range is going to be a function of the sampling frequency. Specifically, it will be the inverse of the sampling frequency, or one divided by fs. After this, we define our cosine signal similarly. Then we can see we arrive at exactly the same signal. Only this time, we defined it using the sampling frequency rather than the sampling period. This leads us to another question. Are circles, or any other individual graphical symbol available in MATLAB, the best choice for visualizing signals?
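Here is a minimal sketch of the comparison between the one-second and half-second sampling periods, and of defining the same time vector from the sampling frequency instead. The five-oscillation frequency is carried over from the previous plot; the variable names with numeric suffixes and `Ts` are assumptions for illustration.

```matlab
T = 100;                  % signal duration in seconds
f = 5/100;                % five oscillations over the interval
a = 2*pi*f;

% Original sampling: one sample per second
t1 = 0:1:T;
y1 = cos(a*t1);

% Halved sampling period: one sample every 0.5 seconds
Ts = 0.5;                 % sampling period in seconds (hypothetical name)
t2 = 0:Ts:T;
y2 = cos(a*t2);

% The same time vector, defined through the sampling frequency instead
fs = 2;                   % sampling frequency in hertz
t3 = 0:1/fs:T;            % step size is the inverse of fs
y3 = cos(a*t3);

figure;
plot(t2, y2, 'ro');       % new, denser signal in red circles
hold on;
plot(t1, y1, 'bo');       % original samples superimposed in blue circles
hold off;
xlabel('t (s)'); ylabel('y(t)');
```

Every second sample of the denser signal lands exactly on one of the original samples, which is why the blue circles sit on top of half of the red ones.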
So, are circles the best choice? The visualization mode may seem trivial at first, but the choice of how to visualize the signal is often quite important. The use of a circle or other discrete symbol is the more genuine way to plot the data, but typically not the most visually pleasant. Here are the fundamental issues at play. Our y vector is a set of discrete data points. These are individual samples of a process, taken at discrete time points. Thus, using discrete symbols for these discrete time points is logical. However, sometimes in MATLAB, we like to approximately represent continuous signals. We can approximate a continuous signal over a limited interval with discrete samples. The higher our sampling rate, the denser our samples will be, and the better our approximation will be. If the thing we are really trying to represent is a continuous process, we can do what is called linear interpolation between samples. With linear interpolation, we are saying that if we do not know the exact value of our function for values of time that occur between our known samples, we can approximate those values by connecting our known values with straight lines. If our sampling frequency is high relative to the maximum frequency of our signal, our signal does not change in value much between samples, and thus our linear interpolation lines between samples will be very close to the truth. We can illustrate this by varying the sampling rate of our cosine function and displaying it with linear interpolation. First, let's take a look at the effect of changing the sampling frequency for a specific signal. Our signal will be a cosine with a frequency of 2 over 100 hertz, over a time interval of 100 seconds. We will try four different sampling frequencies. We'll define our vector fs to be the set of sampling frequencies we will try, which will include 1, 0.25, 0.1, and 0.04 hertz. Then we can plot each result using a for loop, looping over the four sampling frequencies.
We will create two rows of four subplots, and we'll just fill in the first row here. First, we define our independent time variable, t, just as we did before, but this time as a function of the ith entry in fs. Then we plot the results identically to before. We'll have time in seconds on the x-axis, and the value of our signal will be displayed on the y-axis. We can give it a title that tells us the sampling frequency. It'll also be helpful to set the axes to be a bit wider than the data, to best see the data points and save some room for some other stuff later. Since our time variable goes from 0 to 100, an x-axis range from minus 5 to 105 will add a little bit of padding to the visualization on the x-axis. Similarly, our cosine ranges from minus 1 to 1, so a y-axis range from minus 1.5 to 2 will add nice padding in the y-direction so there's some white space around the data. After we run this, we see that at a sampling frequency of one hertz, we can easily see our familiar cosine function. As we decrease our sampling frequency but hold the frequency of the cosine constant, we see that the number of oscillations does not change, yet the signal looks quite different. The signal is sampled at a much sparser set of data points. The values of the function at those sparser time points are unchanged, yet the underlying cosine function being sampled becomes unrecognizable as such from its samples. When fs is equal to 0.04 hertz, our samples are only at the positive and negative peaks of the cosine function. In this case, the vectors y and t are of length only five, so they are easy to display on the screen and inspect. y is equal to plus 1, minus 1, plus 1, minus 1, plus 1, and only has a value once every 25 seconds. Let's add a densely sampled cosine to these plots, plotted with linear interpolation, to help see what's happening. We choose our dense sampling rate to be 100 times the cosine frequency.
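As a quick numerical check of the sparsest case above, the five samples at a sampling frequency of 0.04 hertz can be computed and inspected directly. This is a small sketch using the same names as the narration; rounding the values before display is an added convenience to hide tiny floating-point errors.

```matlab
T = 100;                 % signal duration in seconds
f = 2/100;               % cosine frequency in hertz
a = 2*pi*f;

fs = 0.04;               % sparsest sampling frequency in hertz
t = 0:1/fs:T;            % one sample every 25 seconds
y = cos(a*t);

disp(t);                 % time points of the samples
disp(round(y));          % sample values, rounded for display
```

With one sample every 25 seconds and a cosine period of 50 seconds, each sample falls on a peak or a trough, so the values alternate between plus 1 and minus 1.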
With the dense sampling rate chosen, we can define our dense time and dense cosine signal, yds, as before. Finally, we loop through our subplots to add the new signal to each one. We can also add a legend to each plot, which will state the sampling frequency of the two signals being plotted. When we plot the cosine function with very dense sampling and display it using lines connecting the data points, which is the default plot behavior when no symbol is specified, we can see what is happening. In every case, the samples that we choose fall exactly on the underlying cosine function we are sampling. However, when the sampling becomes very sparse, it becomes more and more difficult to understand how the function behaves between the samples. On the other hand, when we have lots of samples, the shape of the underlying function is obvious. Now, in the bottom row of our figure, let's try the linear interpolation plotting method for the same sampling rates. Here we repeat the plots of exactly the same data as above, but instead of plotting the sparsely sampled data using circles, we'll use the linear interpolation method by removing the o symbol. Everything else is identical. By simply connecting the sequence of data points together with lines, the sparsely sampled data appear to be much closer to the underlying function than the sparse points alone. Again, the data has not changed at all, only how we visualize it. Even the sparse set of samples with fs equal to 0.04 hertz, where we only have five samples of the signal, looks pretty close. At fs equal to 0.1 hertz, it's pretty clear from the red curve that we have a cosine function. Once we reach a sampling rate of one hertz, the difference between the one hertz and two hertz sampling rates becomes so small that when the two plots are displayed on the screen, they exactly overlap and appear identical, which is why you can see no red on the plot on the left.
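Putting the last few steps together, the two-row figure might be built along these lines. This is a sketch, not the lesson's exact code: the names `fs_dense`, `t_ds`, and `y_ds`, the choice of blue for the sparse signal and red for the dense one, and the drawing order are assumptions inferred from the narration.

```matlab
T = 100;                          % signal duration in seconds
f = 2/100;                        % cosine frequency in hertz
a = 2*pi*f;
fs = [1, 0.25, 0.1, 0.04];        % sampling frequencies to compare, in hertz

% Densely sampled reference: 100 times the cosine frequency (2 hertz)
fs_dense = 100*f;
t_ds = 0:1/fs_dense:T;
y_ds = cos(a*t_ds);

figure;
for i = 1:numel(fs)
    t = 0:1/fs(i):T;              % time vector for the ith sampling frequency
    y = cos(a*t);
    names = {sprintf('fs = %g Hz', fs_dense), sprintf('fs = %g Hz', fs(i))};

    % Top row: sparse samples as circles on top of the dense curve
    subplot(2, 4, i);
    plot(t_ds, y_ds, 'r-'); hold on;
    plot(t, y, 'bo'); hold off;
    axis([-5 105 -1.5 2]);        % padding around the data, room for legend
    xlabel('t (s)'); ylabel('y(t)');
    title(names{2}); legend(names{:});

    % Bottom row: same data, sparse samples joined by straight lines
    subplot(2, 4, 4 + i);
    plot(t_ds, y_ds, 'r-'); hold on;
    plot(t, y, 'b-'); hold off;   % no symbol: linear interpolation
    axis([-5 105 -1.5 2]);
    xlabel('t (s)'); ylabel('y(t)');
    title(names{2}); legend(names{:});
end
```

Drawing the sparse signal last means that, in the one hertz panel of the bottom row, the blue line covers the red reference almost completely.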
In fact, both of these results look so perfectly cosine-like that we can easily be fooled into thinking we are looking at a continuous function rather than a discrete one. Even at fs equal to 0.25 hertz, one has to look closely to see the sharp transitions between line segments that indicate that this is a piecewise linear function rather than a smooth continuous cosine. This is both the blessing and the curse of linear interpolation plotting. It makes the shape of the function easier to visually understand; however, it can also make us forget that we are looking at discrete data. If we are not careful, we may forget that we only know the value of the signal at the discrete time points where the samples exist, and that the linearly interpolated regions between samples are mere guesses. If, however, we know that our signal is a cosine, or more generally, a smoothly varying function of time, and we can guarantee that our sampling frequency is much higher than the highest frequency that exists in the signal, then linear interpolation is a very good approximation of the underlying continuous function. A good rule of thumb for visualization is choosing the sampling frequency to be 100 times the frequency of the signal. Here, that is fs equal to 2 hertz. Going forward, when we display signals, we will use the linear interpolation visualization method. But do your best not to forget that these are discrete samples rather than a continuous function. We will see in later lessons that the sampling rate plays a critical role in digital signal analysis. As a last exercise, let's put what we have learned about creating a signal with a specified frequency, choosing a sampling rate, and visualization into practice with an example problem. Your task is to create a two-second long, 440 hertz pure tone sound signal to be used as a tuning standard to tune musical instruments. Remember how, in music, an A note can be created using a 440 hertz wave. That is a wave that oscillates 440 times per second.
Where do we start? Well, we know our frequency f will be 440 hertz. We are also given our time duration, big T, to be two seconds. We will create the signal using the cosine function, and the sampling rate is an open quantity that we can choose. We just saw that, at least for visualization purposes, 100 times f is a good place to start, assuming this does not make our vectors too large. Let's try fs equal to 100 times f for our sampling frequency, which is 44,000 hertz. Then we are ready to define our independent time variable t to range from zero to big T in steps of 1 over fs. As always, our a is 2 Pi f. Finally, our y is cosine of a times t. We can see that t and y are vectors roughly 88,000 elements long, since we have 44,000 samples per second for two seconds. This is a perfectly reasonable length for MATLAB to handle, so our choice of fs is fine. If the vectors were to get very long, to the point where we are using too much memory on our computer, we may try to reduce fs or reduce big T to work with shorter vectors. Let's look at the result. When we plot it, what's happening? Well, with a frequency of 440 hertz, our wave is oscillating 440 times per second, or 880 times in this two-second plot. That's a lot of oscillations. So many that the plot of the whole signal looks like a blur. However, if we zoom in, we can start to see a wave. Using the axis function, we can look at the first four oscillations of the wave. That is from time 0 to time 4 divided by 440 on the x-axis, since the frequency is 440 and we want four oscillations. By zooming in on this very short time frame, we can see the familiar cosine function. By this point, you are probably asking: seeing the wave on the screen is great and all, but I wanted to create a pure tone sound, so how can I hear it? MATLAB lets you play signals over your computer loudspeaker using the soundsc function. You should always specify at least two arguments. The first is the signal, in this case, y. The second is the sampling frequency.
Without this, soundsc has to assume a sampling frequency for your signal, and that could change how it sounds. By itself, y is just a vector, so there's no notion of the amount of time that passes between samples without knowing the sampling frequency. The sc in soundsc stands for scaling. It will take the input signal and scale it so that the maximum magnitude in the signal corresponds to the loudest part of the sound to be played on the loudspeaker. Whether the input is 0.1 times y, or y, or 10 times y, the sound played has the same loudness because the signal is always rescaled. Two words of warning before executing the function: check that the volume on your speakers is reasonable to make sure the sound will not be too loud, and understand that your entire signal will play with little way to stop it short of killing the MATLAB window. The duration of the signal in seconds is the length of y divided by fs. So here we can see it's just two seconds. But if fs was smaller, or the length of y longer, you could be in for several minutes or longer of listening pleasure. Let's hear the pure tone. If you've followed along, congratulations. You've created a sound signal from scratch in MATLAB, and you have now both visualized and listened to it.
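For reference, the whole exercise can be sketched in a few lines. The variable names follow the narration. The `play_sound` flag is a hypothetical guard added here so that nothing plays until you have checked your speaker volume, and the y-axis limits in the zoomed plot are an assumption.

```matlab
f = 440;                  % tone frequency in hertz (the note A)
T = 2;                    % duration in seconds
fs = 100*f;               % sampling frequency: 44,000 hertz

t = 0:1/fs:T;             % time points from 0 to T in steps of 1/fs
a = 2*pi*f;               % radians per second
y = cos(a*t);             % the pure tone

n_samples = numel(y);     % roughly 88,000 samples
duration = n_samples/fs;  % playback length in seconds, about 2

% Zoom in on the first four oscillations: t from 0 to 4/f seconds
figure;
plot(t, y);
axis([0 4/f -1.5 1.5]);   % assumed y-limits, with a little padding
xlabel('t (s)'); ylabel('y(t)');

% Hypothetical guard: set to true only after checking your volume
play_sound = false;
if play_sound
    soundsc(y, fs);       % always pass the sampling frequency
end
```

Computing `duration` before calling soundsc is a cheap way to confirm how long the playback will last, which matters because the sound is awkward to interrupt once it starts.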