Now it's time for the exciting topic: how to do machine learning in BigQuery. Before we go into the syntax of model-building with SQL, let's discuss very quickly how BigQuery ML came about. As you saw from the earlier ML timeline, machine learning has been around for a while, but the typical barriers have been, number one, doing ML on small datasets in Excel or Sheets and iterating back and forth between new BigQuery exports, or, if you're fortunate enough to have a data science team at your organization, number two, building time-intensive TensorFlow or scikit-learn models using experts' time. Even then, you're often just using a sample of the data so the data scientists can train and evaluate the model locally on their machines if they're not using the cloud. Google saw these two critical barriers, getting time from data scientists and moving data in and out of BigQuery, as an opportunity to bring machine learning right into the hands of analysts like you who are already really familiar with manipulating and preprocessing data, or soon will be by the end of this specialization. Here we go. Let's talk about how you can now do machine learning inside of BigQuery using just SQL. With BigQuery ML, you can use SQL for machine learning. And to repeat that point: SQL. No Java or Python code needed, just basic SQL to invoke powerful ML models right where your data already lives, inside of BigQuery. Lastly, the BigQuery team has hidden a lot of the model knobs, like hyperparameter tuning, and common ML practitioner tasks, like manual one-hot encoding of categorical features, from you. Those options are there if you want to look under the hood, but for simplicity, the models run just fine with minimal SQL code. Here's an example that you'll become very familiar with in your next lab. Do you notice anything strange about the number of GCP products used to do ML here? You got it: it's all done right within BigQuery.
Data ingestion, preprocessing with SQL, model training, model evaluation, the predictions from your model, and the output into reporting tables for visualization. As I mentioned before, BigQuery ML was designed with simplicity in mind. But if you already know a bit about ML, you can tune and adjust your model's hyperparameters, like regularization, the dataset splitting method, and even the learning rate, through the model's options. We'll take a look at how to do that in just a minute. What do you get out of the box? First, BigQuery ML runs on standard SQL, and you can use normal SQL syntax like UDFs, subqueries, and joins to create your training datasets. For model types, currently you can choose from either a linear regression model for forecasting or a binary logistic regression model for classification. As part of your model evaluation, you'll get access to fields like the ROC curve, as well as accuracy, precision, and recall, that you can simply SELECT from after your model is trained. If you'd like, you can actually inspect the weights of the model and perform feature distribution analysis. And much like normal visualizations using BigQuery tables and views, you can also connect your favorite BI platform, like Data Studio or Looker, and visualize your model's performance and its predictions. Now, the entire process is going to look like this. First and foremost, we need to bring our data into BigQuery if it isn't there already. That's the ETL. Here again, you can enrich your existing data warehouse with other data sources that you ingest and join together using simple SQL joins. Next is the feature selection and preprocessing step, which is very similar to what you've been exploring so far as part of this specialization. Here's where you get to put all of your good SQL skills to the test in creating a great training dataset for your model to learn from. After that, here it is. This is the actual SQL syntax for creating a model inside of BigQuery.
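As a concrete illustration, here's a minimal sketch of what such a CREATE MODEL statement can look like. The model name `bqml_tutorial.sample_model`, the label definition, and the Google Analytics public sample table are illustrative assumptions for this sketch, not necessarily the exact code from the lab.

```sql
-- Sketch: train a binary classifier predicting whether a web session
-- results in a transaction. Names and source table are assumptions.
CREATE OR REPLACE MODEL `bqml_tutorial.sample_model`
OPTIONS(model_type = 'logistic_reg') AS  -- mandatory option: the model type
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,  -- the column named "label" is what the model learns to predict
  IFNULL(device.operatingSystem, '') AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, '') AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';
```

Note that the categorical columns like `os` and `country` are passed in as plain strings; BigQuery ML handles the one-hot encoding for you, as mentioned earlier.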
It's short enough that it all fits within this one box of code. You simply say CREATE MODEL, give it a name, specify mandatory options for the model like the model_type, pass in your SQL query with the training dataset, hit "Run Query", and watch your model run. After your model is trained, you'll see it appear as a new object inside of your BigQuery dataset. It'll look like a table, but it'll behave a little differently, because you can do cool things like execute an ML.EVALUATE query, reserved syntax that allows you to evaluate the performance of your trained model against your evaluation dataset. Remember, you want to train on a different dataset than the one you evaluate on. Here you can analyze the loss metrics given to you, like root mean squared error for forecasting models, and area under the curve, accuracy, precision, and recall for classification models like the one that you see here. Once you're happy with your model's performance, and, again, you can iterate and train multiple models and see which one performs best, you can then predict with it using the even shorter query that you see here. Just invoke ML.PREDICT on your newly trained model, and that command will give you back predictions as well as the model's confidence in those predictions, which is super useful for classification. You'll notice a new field in the results when you run this query: your label field with the word "predicted" prefixed to the field name, which is simply your model's prediction for that label. It's that easy. But before we dive into your first lab, now that you've seen how easy it is to create a model with just the few lines of code you see here, remember that easy doesn't mean great. A model is only as good as the data that you feed into it for it to learn the relationship between your features and the label.
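To make those two steps concrete, here's a hedged sketch of ML.EVALUATE and ML.PREDICT queries. The model name `bqml_tutorial.sample_model`, the feature columns, and the evaluation date range are hypothetical, chosen only for illustration.

```sql
-- Sketch: evaluate a hypothetical trained model. With no table argument,
-- BigQuery ML evaluates against the automatically held-out evaluation split.
-- Returns fields like precision, recall, accuracy, and roc_auc.
SELECT *
FROM ML.EVALUATE(MODEL `bqml_tutorial.sample_model`);

-- Sketch: predict with the same hypothetical model on newer data.
-- The output adds a predicted_label field (your label name, prefixed).
SELECT
  predicted_label,
  predicted_label_probs  -- the model's confidence in each class
FROM ML.PREDICT(
  MODEL `bqml_tutorial.sample_model`,
  (SELECT
     IFNULL(device.operatingSystem, '') AS os,
     device.isMobile AS is_mobile,
     IFNULL(geoNetwork.country, '') AS country,
     IFNULL(totals.pageviews, 0) AS pageviews
   FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
   WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
```

Notice that the prediction query passes in only the feature columns, not the label: the model fills in its best guess for the label, along with its confidence.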
That's why you're going to spend most of your time exploring, selecting, and engineering good features, so that we can give our model the best possible dataset to work with and learn from.