AI today is being successfully applied to image and video data, to language data, to speech data, and to many other areas. In this video, you'll see a survey of AI applied to these different application areas, and I hope that this may spark some ideas of how you might be able to use these techniques someday for your own projects as well. Let's take a look.

One of the major successes of deep learning has been computer vision. Let's take a look at some examples of computer vision applications. Image classification and object recognition refer to taking as input a picture like that and telling us what is in this picture. In this case, it'd be a cat. Rather than just recognizing cats, I've seen AI algorithms able to recognize specific types of flowers and specific types of food. This ability to take as input a picture and classify it into what type of object it shows is being used in all sorts of applications.

One specific type of image classification that has had a lot of traction is face recognition. This is how face recognition systems today work. A user might register one or more pictures of their face to show the AI what they look like. Given a new image, the AI system can then say: is this the same person? Is this you? Or is this a different person? It can then make a decision, such as unlocking the door, unlocking the cell phone, unlocking the laptop, or something else, based on the identity of the person. Of course, I hope face recognition will only be used in ways that respect individuals' privacy. We'll talk more about AI in society next week as well.

A different type of computer vision algorithm is called object detection. So, rather than just trying to classify or recognize an object, you're trying to detect if the object appears. For example, in building a self-driving car, we've seen how an AI system can take as input a picture like this and not just tell us yes or no, is there a car?
Yes or no, is there a pedestrian? It actually tells us the positions of the cars as well as the positions of the pedestrians in this image. An object detection algorithm can also take as input a picture like that and just say, no, I'm not finding any cars or any pedestrians in that image. So rather than taking a picture and labeling the whole image, which is image classification, an object detection algorithm will take as input an image and tell us where in the picture different objects are, as well as what the types of those objects are.

Image segmentation takes this one step further. Given an image like this, an image segmentation algorithm will produce an output that tells us not just where the cars and pedestrians are, but for every single pixel, is this pixel part of this car or is this pixel part of a pedestrian? So it doesn't just draw rectangles around the objects it detects. Instead, it draws very precise boundaries around the objects that it finds. So, in reading x-rays for example, it would be an image segmentation algorithm that could look at an x-ray scan or some other image of a human body and carefully segment out where the liver is, or where the heart is, or where the bone is in this image.

Computer vision can also deal with video, and one application of that is tracking. In this example, rather than just detecting the runners in this video, it is also tracking where the runners are moving over time. So, those little tails below the red boxes show how the algorithm is tracking different people running across several seconds in the video. So, the ability to track people and cars and maybe other moving objects in a video helps a computer figure out where things are going. If you're using a video camera to track wildlife, for example, say birds flying around, a tracking algorithm will also help you track individual birds flying across the frames of your video.
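To make the difference between image classification, object detection, and image segmentation a little more concrete, here is a minimal Python sketch of what each kind of output might look like. The tiny mask, the label numbers, and the function names are all made up for illustration and are not taken from any particular system. A real algorithm would predict these outputs from an image, whereas this sketch just derives the coarser answers from a hand-written mask.

```python
# Hypothetical segmentation output for a tiny 4x6 image:
# one label per pixel (0 = background, 1 = car). Values are made up.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def contains(mask, label):
    """Classification-style answer: does the label appear anywhere in the image?"""
    return any(label in row for row in mask)

def bounding_box(mask, label):
    """Detection-style answer: tightest (top, left, bottom, right) box, or None."""
    rows = [r for r, row in enumerate(mask) if label in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == label]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))

def pixel_count(mask, label):
    """Segmentation-style detail: how many pixels belong to the label."""
    return sum(row.count(label) for row in mask)

print(contains(mask, 1))      # is there a car at all?
print(bounding_box(mask, 1))  # rectangle around the car
print(pixel_count(mask, 1))   # exact pixels that are part of the car
```

The rectangle here covers eight pixels while only six of them are actually part of the car, which is the sense in which segmentation draws more precise boundaries than detection.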
These are some of the major areas of computer vision, and perhaps some of them will be useful for your projects.

AI, and deep learning specifically, is also making a lot of progress in natural language processing. Natural language processing, or NLP, refers to AI understanding natural language, meaning the language that you and I might use to communicate with each other. One example is text classification, where the job of the AI is to input a piece of text such as an email and tell us what is the class or category of this email, such as spam or non-spam. There are also websites that would input a product description. For example, you might write, "I have a secondhand cellphone for sale," and the website automatically figures out the product category in which to list this product. So, that would go under cellphones or electronics. Or if you write, "I have a new t-shirt to sell," then it would list it automatically under clothing.

One type of text classification that has had a lot of attention is sentiment recognition. For example, a sentiment recognition algorithm can take as input a review of a restaurant like this, "The food was good," and automatically try to tell us how many stars this review might get. "The food was good" is a pretty good review; maybe that's a four out of five-star review. Whereas if someone writes, "Service was horrible," then the sentiment recognition algorithm should be able to tell us that this corresponds maybe to a one-star review.

A second type of NLP, or natural language processing, is information retrieval. Web search is perhaps the best-known example of information retrieval, where you type in a text query and you want the AI to help you find relevant documents. Many corporations will also have internal information retrieval systems, where you might have an interface to help you search just within your company's set of documents for something relevant to a query that you might enter.
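As a rough illustration of the information retrieval idea, here is a minimal Python sketch that ranks a handful of documents by how many words they share with a query. This word-overlap scoring is a deliberately simple stand-in chosen for illustration, not how web search engines actually rank results, and the example documents and the `search` function name are hypothetical.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())

def search(query, documents):
    """Return documents sharing at least one word with the query, best match first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # sorted() is stable, so documents with equal scores keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]

# Hypothetical internal company documents, made up for this example.
documents = [
    "vacation policy for full time employees",
    "how to file an expense report",
    "expense policy for travel",
    "office floor plan",
]

for doc in search("expense policy", documents):
    print(doc)
```

The document that mentions both query words comes first, and the one that shares no words with the query is left out entirely.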
Named entity recognition is another natural language processing technology. Let's illustrate it with an example. Say you have this sentence and you want to find all the people's names in the sentence. So, Queen Elizabeth II is a person, and Sir Paul McCartney is a person. So, given the sentence "Queen Elizabeth II knighted Paul McCartney for his services to music at Buckingham Palace," it would be a named entity recognition system that can find all the people's names in a sentence like this. If you want to find all the location names, all the place names in a sentence like that, a named entity recognition system can also do so. Named entity recognition systems can also automatically extract names of companies, phone numbers, names of countries, and so on. So, if you have a large document collection and you want to automatically find all the company names, or all the company names that occur together, or all the people's names, then a named entity recognition system would be the tool you could use to do that.

Another major AI application area is machine translation. So, for example, if you see this sentence in Japanese, AI [inaudible]. Then hopefully a machine translation system can input that and output the translation, "AI is the new electricity."

The four items on this slide, text classification, information retrieval, named entity recognition, and machine translation, are four major categories of useful NLP applications. If you work with an NLP team, you may also hear them talk about parsing and part-of-speech tagging technologies. Let me tell you what these are. Let's take the example sentence, "The cat on the mat." A part-of-speech tagging algorithm will go through all the words and tell you which of these words are nouns, which of these words are verbs, and so on. For example, in the English language, "cat" and "mat" in the sentence are nouns. So, the part-of-speech tagger will label these two words as nouns.
According to the theory of the English language, the word "the" is a determiner. Don't worry if you've never heard of a determiner before; this is a word from the theory of the English language. And the word "on" is a preposition. So, a part-of-speech tagger will label these words like that.

Well, why do you care? If you're building a sentiment classifier for restaurant reviews, then a part-of-speech tagging algorithm would be able to tell you which are the nouns, which are the verbs, which are the adjectives, which are the adverbs, and so on, and therefore help your AI system figure out which of the words to pay more attention to. For example, you should probably pay more attention to the nouns, since those seem like important words. Maybe the verbs. Certainly the adjectives: words like "good," "bad," and "delicious" are adjectives. And your AI system may learn to ignore the determiners, words like "the," which maybe matter less in terms of how a user is actually feeling about the restaurant.

A part-of-speech tagging system is usually not a final application. You hardly ever wake up in the morning and think, "Boy, I wish I could get all the words in my sentence tagged." It's often an important pre-processing step, an important intermediate step in a longer AI pipeline, where the first step is part-of-speech tagging, or parsing, which we'll talk about in a second, and then the later steps are an application like sentiment classification, or machine translation, or web search.

Now, what is a parser? Given these five words, a parser helps group the words together into phrases. For example, "the cat" is a phrase, and "the mat" is a phrase. So, a parser will draw these lines on top of the words to say, those words go together. "On the mat" is another phrase. Finally, the two phrases, "the cat" as well as "on the mat," are then combined to form the overall sentence.
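To make the tagging and parsing of "The cat on the mat" concrete, here is a minimal Python sketch that writes both outputs down by hand as plain data structures. Nothing here is produced by a real tagger or parser: the tag names (`DET`, `NOUN`, `PREP`), the `PHRASE` label, and the helper functions are assumptions made for illustration only.

```python
# Hand-written part-of-speech tags for the example sentence, following the
# description in the text: "the" is a determiner, "cat" and "mat" are nouns,
# and "on" is a preposition. The tag names themselves are hypothetical.
tagged = [
    ("The", "DET"),
    ("cat", "NOUN"),
    ("on", "PREP"),
    ("the", "DET"),
    ("mat", "NOUN"),
]

# Hand-written parse as nested tuples of the form (label, child, child, ...).
# "the mat" sits inside "on the mat", which combines with "The cat".
tree = (
    "PHRASE",
    ("PHRASE", "The", "cat"),
    ("PHRASE", "on", ("PHRASE", "the", "mat")),
)

def words_to_focus_on(tagged, keep=("NOUN", "VERB", "ADJ")):
    """Keep only the words whose tags a later step might pay more attention to."""
    return [word for word, tag in tagged if tag in keep]

def leaves(node):
    """Read the words back out of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the label, the rest are children
        words.extend(leaves(child))
    return words

print(words_to_focus_on(tagged))
print(leaves(tree))
```

The first function shows how a later step in a pipeline might use the tags to skip determiners, and the second shows that the tree only groups the original five words without changing their order.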
So, this thing that I drew on top of the sentence tells you what words go with what words, and how the different words relate to each other. While a parsing algorithm is also not a final end-user product, it's a commonly used step to help other AI algorithms, such as those that classify text, translate, and so on.

Modern AI, specifically deep learning, has also completely transformed how software processes audio data such as speech. How is speech represented in a computer? This is an audio waveform of one of my friends saying the phrase "machine learning." The x-axis here is time, and the vertical axis is what the microphone is recording. What the microphone is recording is little variations, very rapid variations in air pressure, which your ear and your brain then interpret as sound. This plot shows, as a function of time on the horizontal axis, how the air pressure is changing very rapidly in response to someone saying the words "machine learning."

The problem of speech recognition, also known as speech-to-text, is the problem of taking as input a plot like this and figuring out what words someone said. A lot of speech recognition's recent progress has been due to deep learning. One particular type of speech recognition is trigger word detection, or wake word detection. You saw this in the earlier video, with an AI system detecting a trigger word or wake word such as "Alexa," or "Hey Google," or "Hey Device."

Speaker ID is a specialized speech problem where the task is to listen to someone speak and figure out the identity of the speaker. Just as face recognition helps verify your identity by taking a picture, speaker ID can also help verify your identity by listening to you speak.

Finally, speech synthesis, also called text-to-speech or TTS, is also getting a lot of traction. Text-to-speech is the problem of inputting a sentence written in text and turning that into an audio file.
Interestingly, whereas text-to-speech is often abbreviated TTS, I don't often see speech-to-text abbreviated STT. One quick example. Let's take the sentence, "The quick brown fox jumps over the lazy dog." This is a fun sentence that you often see NLP people use, because it contains every single letter from A to Z. So, that's A, B, C, all the way up to X, Y, and Z. You can check that all 26 letters appear in this sentence. Some letters appear more than once. If you pass this sentence into a TTS system, then you might get an audio output like this: "The quick brown fox jumps over the lazy dog." Modern TTS systems are sounding more and more natural and human-like.

AI is also applied to many applications in robotics, and you've already seen one example in the self-driving car. In robotics, the term perception means figuring out what's in the world around you based on the senses you have, be it cameras, or radar, or lidar. Shown on the right is the 3D laser scan, or the lidar scan, of a self-driving car, as well as the vehicles that this self-driving car in the middle has detected in the vicinity of your car. Motion planning refers to finding a path for your robot to follow. So, if your car wants to make a left turn, the motion planner might plan a path as well as a speed for the car to make a left turn that way. Finally, control refers to sending commands to the motors, such as your steering wheel motor as well as your gas pedal and brake motors, in order to make the car smoothly follow the path that you want.

On this slide, I'll focus on the software and the AI aspects of robotics. Of course, there's also a lot of important work being done to build hardware for robotics as well. But a lot of the work in AI on perception, motion planning, and control has focused on the software rather than the hardware of robotics.

In addition to these major application areas, machine learning is also very broadly used.
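Going back for a moment to the "quick brown fox" sentence from the text-to-speech example, the claim that it contains all 26 letters, some of them more than once, is easy to check for yourself. Here is a minimal Python sketch that does so; the function names `is_pangram` and `repeated_letters` are made up for this illustration.

```python
import string
from collections import Counter

sentence = "The quick brown fox jumps over the lazy dog."

def is_pangram(text):
    """True if every letter from a to z appears at least once in the text."""
    return set(string.ascii_lowercase) <= set(text.lower())

def repeated_letters(text):
    """Sorted list of the letters that appear more than once in the text."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return sorted(letter for letter, n in counts.items() if n > 1)

print(is_pangram(sentence))        # does every letter appear?
print(repeated_letters(sentence))  # which letters appear more than once?
```

Removing a word such as "fox" from the sentence makes the first check fail, since that is the only place the letters F and X appear.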
The examples you've seen in this video relate mainly to unstructured data such as images, audio, and text. Machine learning is applied at least as much to structured data, and that means these tables of data, some of which you saw in the earlier videos. But because unstructured data such as images is so easy for humans to understand, there's something very universal, very easy for any person to understand and empathize with, when we talk about an AI system that recognizes a cat. So, the popular press tends to cover AI progress on unstructured data much more than it does AI on structured data. Structured data also tends to be more specific to a single company, so it's harder for people to write about or understand. But AI on structured data, or machine learning on structured data, is creating tremendous economic value today, just as AI on unstructured data is.

I hope this survey of AI application areas gives you a sense of the wide range of data that AI is successfully applied to today, and maybe this will even inspire you to think of how some of these application areas may be useful for your own projects.

Now, so far the one AI technique we've spent the most time talking about is supervised learning. That means learning input-to-output, or A to B, mappings from labeled data, where you give the AI system both A and B. But that's not the only AI technique out there. In fact, the term supervised learning almost invites the question of what is unsupervised learning, or you might also have heard from media articles, from the news, about reinforcement learning. So, what are all these other techniques? In the next video, the final optional video for this week, we'll do a survey of AI techniques, and I hope that through that maybe you'll see if some of these other AI techniques besides supervised learning could be useful for your projects as well. Let's go on to the final optional video for the week.