These ordered pairs look more like a cloud than a line. A linear model fitted to them would produce useless forecasts of individual occupancy rates for each nightly rent. It can be shown that this linear model is not significantly better (it is less than 2% more accurate) than "forecasting", so-called forecasting, an occupancy rate of 45.6% for every single one of the 244 short-term rental properties. I chose 45.6% because it is just the mean, or average, occupancy rate: the base-rate occupancy for the 244 properties. Whenever your individual forecasts are no better than forecasting the base rate, you have zero information gain. Your model is not reducing uncertainty at all. So this model is useless for figuring out which nightly rents to charge to maximize revenues.
One potential solution is called normalization. Normalization in general means changing the x-axis coordinates, the y-axis coordinates, or both in a scatter plot in a manner that reveals the underlying pattern in otherwise chaotic data. Normalization is often necessary in modeling. We will show you a way to change the x-axis values of these data that preserves and enhances their information content. The dollars-to-percentile spreadsheet and accompanying handout explain the arithmetic behind the normalization process we will use here.
But I would first like to explain the logic behind this particular normalization. In hindsight, it is really not surprising that using the raw nightly rents in dollars didn't work well as the basis for generating a linear model of occupancy rates. As we saw in course one, if a property is expensive relative to its competition, it will tend to have a lower occupancy rate. If it is less expensive, it will tend to have a higher occupancy rate. But in the raw data for the 244 properties, the non-normalized x-axis does not yet correspond to a measure of relative price when compared to comparable or competitive properties.
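As an aside on the base-rate comparison described above, here is a minimal Python sketch of how such a check might be run. It is not the course's method: the lecture does not say which accuracy measure lies behind the "less than 2%" figure, so using the reduction in root-mean-square error is an assumption, the function names are hypothetical, and the sample numbers are made up for illustration rather than taken from the 244 properties.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


def gain_over_base_rate(xs, ys):
    """Fractional reduction in RMSE of the fitted line versus
    forecasting the mean of ys (the base rate) for every point.
    ASSUMPTION: RMSE reduction is only one possible accuracy measure."""
    intercept, slope = fit_line(xs, ys)
    model_forecasts = [intercept + slope * x for x in xs]
    base_rate_forecasts = [mean(ys)] * len(ys)
    return 1 - rmse(ys, model_forecasts) / rmse(ys, base_rate_forecasts)


# Made-up illustrative values, NOT the course data.
rents = [100, 150, 200, 250, 300, 350]
occupancy = [0.50, 0.30, 0.62, 0.41, 0.55, 0.36]
print(f"Improvement over base rate: {gain_over_base_rate(rents, occupancy):.1%}")
```

A result near zero means the line is doing little better than quoting the average occupancy for every property, which is the situation described for the raw dollar rents.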
For example, at $200 per night, a one-bedroom apartment in Greensboro, North Carolina might be relatively expensive for its property type and location, while a two-bedroom apartment in Manhattan in New York City at $200 per night might be relatively inexpensive. But both would appear at the same x-axis point. So what is the correct y-axis occupancy rate to associate with $200? A low occupancy rate, due to the high-rent Greensboro property, or a high occupancy rate, due to the low-rent New York City property? So in this case it is worth trying to normalize the nightly dollar amounts by expressing them not in raw dollar terms, but in terms of how inexpensive or expensive each is in relation to other rents for comparable properties, meaning the same type of property in the same zip code. We will normalize in terms of percentile rent: where does each property fall on a scale from a very low rent, the 10th percentile, to a very high rent, the 90th percentile? This is why we now need the 10th percentile and 90th percentile rent fields for each of the 244 property type and zip code combinations from the database.
Of course, normalized data are not guaranteed to produce any better linear association than non-normalized data. But we would like to explore the normalization method and see what happens. The spreadsheet and accompanying handout show the arithmetic steps involved. You should practice with the practice spreadsheet. Make sure you understand the arithmetic in the handout, and then program your own Excel spreadsheet to efficiently convert all 244 nightly rental dollar values to relative percentile values.
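To make the idea of a dollars-to-percentile conversion concrete, here is a minimal Python sketch. The handout's exact arithmetic is not reproduced in this lecture, so the sketch assumes straight-line interpolation between the 10th and 90th percentile rents; treat that formula, the function name rent_to_percentile, and the Greensboro and Manhattan percentile rents below as illustrative assumptions, not as the handout's method or real data.

```python
def rent_to_percentile(rent, p10_rent, p90_rent):
    """Convert a nightly rent in dollars to a relative percentile.

    ASSUMPTION: linear interpolation, so the 10th percentile rent maps
    to 0.10 and the 90th percentile rent maps to 0.90. Rents outside
    that range are extrapolated and can fall below 0 or above 1.
    """
    if p90_rent == p10_rent:
        raise ValueError("10th and 90th percentile rents must differ")
    return 0.10 + 0.80 * (rent - p10_rent) / (p90_rent - p10_rent)


# Hypothetical percentile rents for two property type / zip code groups.
greensboro = rent_to_percentile(200, p10_rent=80, p90_rent=220)
manhattan = rent_to_percentile(200, p10_rent=180, p90_rent=500)

print(f"Greensboro one-bedroom at $200: {greensboro:.2f}")
print(f"Manhattan two-bedroom at $200:  {manhattan:.2f}")
```

Under these made-up inputs, the same $200 rent lands high on the scale for the Greensboro property and low on the scale for the Manhattan one, so the two properties no longer share an x-axis point. The same calculation can be written as a single Excel formula filled down all 244 rows.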