So now one of the key pitfalls that you could run into, and this is why we talk a lot about eliminating those duplicative records, is doing what we call a many-to-many JOIN. For example, if we had not gone through the exercise for our stations and weather temperature readings, we could have joined on a key that wasn't distinct or unique. So you have five records on the right-hand side and five records on the left-hand side, and they all match each other. How many output rows would actually be returned? Would it be zero, would it be five, or could it potentially be way more than five? The answer is way more than five: you'd actually get what's called the cartesian product, or 5 times 5. So you'd get 25 output rows, even though your input tables only had 5 rows each. And if you can imagine this at scale, if you don't get that joining key correct, that's when you'll really blow out all of the resources that you're using. A million-record table joined against even a 10,000-record table could explode by many, many factors of 10, and you'd get a dataset outputting way, way more rows than what you're expecting. So that end result is a situation you want to avoid. And the way you can avoid it is the third bullet point here: know the relationship between the tables in your data before you do that JOIN. So let's walk through an example. This is an IRS example where, instead of joining on the unique identifier, which is the EIN, we joined on the tax period. And there are many common tax period values that could be shared across many of those EINs. So let's work through what that actually looks like conceptually. Here we have our two different tables. On the left-hand side, we have the 2015 filings, and on the right-hand side is the organization or charity details table, which just has a tax period and when they last filed. 
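To make that 5-by-5 example concrete, here's a minimal sketch using Python's built-in sqlite3 rather than BigQuery; the table and column names are hypothetical, but the mechanics are the same: when every row on both sides shares the same joining key, the JOIN returns the cartesian product.

```python
import sqlite3

def cartesian_join_row_count(n_left, n_right):
    """Join two tables on a key shared by EVERY row and count the output rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE left_t (join_key TEXT, val INTEGER)")
    cur.execute("CREATE TABLE right_t (join_key TEXT, val INTEGER)")
    # All rows in both tables carry the same join_key value 'k'.
    cur.executemany("INSERT INTO left_t VALUES (?, ?)",
                    [("k", i) for i in range(n_left)])
    cur.executemany("INSERT INTO right_t VALUES (?, ?)",
                    [("k", i) for i in range(n_right)])
    cur.execute("SELECT COUNT(*) FROM left_t "
                "JOIN right_t ON left_t.join_key = right_t.join_key")
    return cur.fetchone()[0]

print(cartesian_join_row_count(5, 5))  # 25 -- every left row matches every right row
```

With five rows on each side, you get 25 output rows instead of 5; the same arithmetic is what turns a million-row table into billions of output rows.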
So instead of joining on EIN, you would normally expect your EINs to match up one for one: one EIN in organizational details per one filing. But if you incorrectly put the tax period as your joining key, what you've done here is basically said, all right, this tax period in the organizational details table on the right matches five records there on the left, so that's great. What we're going to be returning here is just this EIN, 345352, as an additional column for all of those five. So it's not that bad, and we just have one record here, even though this query doesn't make much sense joining on tax period. But what happens if we join on another EIN that also has the same tax period of December 2014? Now you see where it's starting to multiply, in a very bad way, the amount of output rows that we have. This one also matches those five records. And what this looks like in your resulting dataset is again the cross-product, or what's called an unintentional CROSS JOIN, of your data. And again, this is a very small example where you just have two rows cross-joining against five, and you get the result there. But you can imagine if you have this on a million-row dataset, you could potentially have even a billion or a trillion rows accidentally outputted. And this is generally where BigQuery will run for more than 60 seconds or a couple of minutes, and then you realize, wait a minute, there's something going terribly wrong with our queries. Again, the way to cure this is to really understand what the unique identifying fields in your dataset are, and what the relationship is between all of your different data tables. You could have 1:1. For example, you could have one charity that corresponds to one tax year of filing for 2015. Or you could have many to one, that's what the N represents. 
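Here's a hedged sketch of that IRS scenario, again in sqlite3 with a made-up schema and made-up filing EINs (only the 345352 value comes from the example above). It contrasts joining on tax_period, which two organizations share, with joining on ein, which identifies exactly one organization:

```python
import sqlite3

def join_row_counts():
    """Return (rows when joining on tax_period, rows when joining on ein)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE filings (ein TEXT, tax_period TEXT)")
    cur.execute("CREATE TABLE org_details (ein TEXT, tax_period TEXT)")
    # Five 2015 filings, all reporting the same December 2014 tax period.
    cur.executemany("INSERT INTO filings VALUES (?, ?)",
                    [(e, "201412") for e in ("111", "222", "333", "444", "555")])
    # Two organizations whose latest tax period is also 201412.
    cur.executemany("INSERT INTO org_details VALUES (?, ?)",
                    [("345352", "201412"), ("111", "201412")])
    # Wrong key: tax_period is shared, so every filing matches every org.
    bad = cur.execute(
        "SELECT COUNT(*) FROM filings f "
        "JOIN org_details o ON f.tax_period = o.tax_period").fetchone()[0]
    # Right key: ein identifies one organization, at most one match per filing.
    good = cur.execute(
        "SELECT COUNT(*) FROM filings f "
        "JOIN org_details o ON f.ein = o.ein").fetchone()[0]
    return bad, good

print(join_row_counts())  # (10, 1) -- 5 filings x 2 orgs vs. one real match
```

Two organizations sharing one tax period already doubles the output; every additional organization with that tax period multiplies it again.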
Many to one could represent, if you had multiple years of tax filings, many tax filings per one organization, or you could have the opposite, one to many. Or you could have that many-to-many scenario, where you might need to create a unique identifier key through concatenation or another method. And this is why the very first thing that we taught in the first course here is practicing counts and count distincts, to see which fields in your dataset are the actual unique identifiers for those rows. Understanding when and how to use JOINs and UNIONs in SQL is a concept that's easy to pick up, but honestly takes a while to truly master. The best advice I can give you when starting out is to really understand how your data tables are supposed to be related to each other, like customers to orders or suppliers to inventory, but also to be able to back that up by verifying those relationships through SQL. Remember, all data is dirty data. [LAUGH] And it's your job to investigate it and interrogate it before it potentially pollutes your larger dataset through JOINs and UNIONs. Once you understand the relationships between your tables, use UNIONs to append records into a consolidated table and JOINs to enrich your data with other data sources. Let's practice these concepts and pitfalls in our next lab.
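That count versus count-distinct check can be sketched like this, once more with sqlite3 and hypothetical data: if COUNT(column) equals COUNT(DISTINCT column), the column uniquely identifies the rows and is a safe joining key.

```python
import sqlite3

def is_unique_key(conn, table, column):
    """True if COUNT(column) == COUNT(DISTINCT column), i.e. no duplicates."""
    total, distinct = conn.execute(
        f"SELECT COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return total == distinct

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org_details (ein TEXT, tax_period TEXT)")
conn.executemany("INSERT INTO org_details VALUES (?, ?)",
                 [("111", "201412"), ("222", "201412"), ("333", "201506")])

print(is_unique_key(conn, "org_details", "ein"))         # True  -> safe join key
print(is_unique_key(conn, "org_details", "tax_period"))  # False -> many-to-many risk
```

Note that COUNT(column) skips NULLs, so on real data you would also want to check the column for NULLs before trusting it as a key.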