In this module, we'll do two things. First, we'll introduce you to the world of machine learning. Second, we'll examine a classic classification algorithm, logistic regression. More and more managers find themselves drowning in data. It's not just the everyday transaction and sales data that stalks them. It's the shipping and logistics data, customer loyalty data, scanner and IoT data (that's machines talking to other machines), supplier data, and employee and HR data. The volume, velocity, and variety of data are only increasing, if the use of the term big data is any indication. It's not hyperbole to say that we have more data available to analyze now than we have ever had before. Commentators state that over 2.5 quintillion bytes of data are created every single day, and that more than a megabyte of data is created every second for every person on Earth. That means that in the time you've been sitting here listening to this video so far, more data has been created just from you than most personal computers could handle in the year 2000. Now think of your own employer. How much data is available to analyze? How quickly does that data get created? Just keeping up with this data is a process in and of itself. But you are tasked not only with understanding these mountains of data, but also with extracting actionable business insights from them. Luckily, that's what this course is about: helping you develop the ability to use tools to extract business insights from your data. Clearly, for most of us, there's too much data to analyze by hand. Rather, we need help. That's where machines come in. This course deals with data modeling and using machines to gather insights from data. After your data is acquired, cleansed, and ready for use, and you have explored and understood it at a general level, it's time to use statistics to draw actionable insight. 
This module will discuss that general process, the menu of options available for analyzing data, and the general similarities that models share. Machine learning has many definitions, but it generally refers to solving a problem by gathering data and then using a machine to follow an algorithm that builds a statistical model from that data, in order to gain actionable information. To put it another way, a machine learning algorithm is like a program with instructions. When you apply the algorithm to data, it creates a model. The model then holds what was learned from that data, along with instructions for how to make predictions when it is given new data. The learning part of machine learning refers to the fact that the algorithm can take the data, use the statistics and parameters that you give it, and arrive at information without being explicitly programmed to arrive at that information every step of the way. Thus, the computer takes the general directions you give it and finds information that you did not explicitly tell it to find. Before we move forward, it's useful to discuss some additional terminology. Let's first look at data science versus data analytics versus business analytics. Unfortunately, those of you who like clear definitions and distinct differentiation between terms will be disappointed to learn that these terms are often used interchangeably. Most would consider data science the broadest term, and both data analytics and business analytics to be subsets of data science. Data science comprises methods and procedures to extract knowledge and insights from data with the aid of computers. Data analytics is similar, but it is a subset of data science that is more focused on describing and visualizing data for specific objectives. Business analytics is a subset of data analytics that focuses on data and questions related to business and involves relatively few statistics. 
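To make the earlier distinction between an algorithm and a model a little more concrete, here is a minimal sketch in Python. It is not the course's own implementation. The function names, the learning rate, and the toy order-value data are all assumptions made up for illustration. The algorithm is a generic set of instructions (plain gradient descent for a one-variable logistic regression), and the model is simply what comes out when those instructions are applied to data: a weight and a bias that can then score new observations.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: maps any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """The 'algorithm': general instructions that know nothing about the data yet.

    Hypothetical helper for illustration; uses batch gradient descent on the
    average log loss of a one-feature logistic regression.
    """
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    # The 'model': parameters learned from the data, not typed in by hand.
    return {"weight": weight, "bias": bias}


def predict_probability(model, x):
    # Apply the learned model to a new observation.
    return sigmoid(model["weight"] * x + model["bias"])


# Made-up toy data (an assumption, not real results):
# order value in hundreds of dollars, and whether the order was fraudulent (1) or not (0).
order_values = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(order_values, is_fraud)

low = predict_probability(model, 1.0)   # a small order
high = predict_probability(model, 4.0)  # a large order
print("learned model:", model)
print("P(fraud | small order) =", round(low, 3))
print("P(fraud | large order) =", round(high, 3))
```

Notice that nowhere in the code do we tell the computer where the cutoff between small and large orders sits. We only give it general directions, and it finds that boundary from the data, which is the learning part described above.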
Next, let's define and distinguish machine learning, artificial intelligence or AI, data mining, deep learning, and so on. What does each of these terms mean, and how are they different? Well, all of these are disciplines in data science. Artificial intelligence is the broadest term. It refers to the ability of machines to perform intelligent and cognitive tasks. Thus, AI simulates thinking. Machine learning, defined earlier, and data mining are both subsets of AI. Data mining is a cousin, if you will, to machine learning. It analyzes inputs to detect outputs, but relies on direct human intuition rather than self-learning. Finally, deep learning is a direct subset of machine learning that is characterized by using progressive layers of learning. That is, while many basic or shallow machine learning models derive output directly from their input variables, deep learning produces output based upon prior layers of the model. Next, it's useful to understand where these relatively new fields, data analytics and machine learning, came from. Data analytics is a confluence of many different disciplines, including computer science, mathematics, statistics, information technology, operations management, and business analytics. Thus, the data scientist or the data-oriented business analyst is a relatively new creation. But perhaps at this point, it might be time to step back a little bit and ask why we care about machine learning at all. How does machine learning help us in business? One way to think about machine learning is to relate it to the ways machines have helped businesses throughout history. Certainly, no one would doubt the usefulness to business of the steam engine or the telegraph. Like these early machines, computers and machine learning process data at phenomenal rates of speed and draw connections, correlations, and insights that the analyst could get no other way. 
Of course, don't be misled by the many movies predicting that machines are taking over. Machine learning helps the analyst derive useful and actionable insights from the data, but it is still up to the analyst to act on these insights, at least initially. Thus, machine learning is like a great bloodhound. It helps you gain insight, but it must be directed cautiously, or else it will run off chasing a squirrel. Finally, let's conclude by listing some questions that machine learning can answer. Here are some categories of problems that machine learning can help with, and specific examples of problems from each category. First, numerical prediction, as seen in regression. What do we expect, for example, costs to be next quarter in a segment? Next, cause and effect relationships between variables. What factors matter? Will this work? For example, which factors have the biggest effect on shipping delay? Or does a particular change in our website improve click-through traffic? Or does this layout of our store improve sales? Next would be classification. What category does this observation belong in? For example, is this transaction fraudulent? Or what factors increase the likelihood of a purchase? Another category would be grouping things together. For example, which group does this belong to? Which customers should receive which type of targeted promotions? There are many others.