The stem-and-leaf plot takes a slightly different approach
and shows more information about the dataset.
The procedure for building one is as follows.
For the dataset given on the left,
let's split the numbers into stems and leaves.
For example, the first number, 70, is split
into a seven for the stem and a zero for the leaf.
So to build the table on the right,
take the first digit of each number and list it in the stems column with no duplicates.
Then go back to the dataset, take the second digit of each number,
and place it under the leaves column. Duplicates are welcome here.
The next step is to arrange the leaves in each row from smallest to largest, left to right.
This will reveal even more information.
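As a rough sketch, the steps just described could be written in Python like this. The sample values are made up for illustration and are not the dataset on the slide, and the function name stem_and_leaf is just a name chosen for this example. It assumes non-negative whole numbers whose last digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by stem (all digits but the last) with sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # 70 -> stem 7, leaf 0
        plot[stem].append(leaf)      # duplicates are kept in the leaves
    # Stems appear once each, in order; leaves are sorted left to right.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

# Hypothetical sample, NOT the dataset from the slide.
sample = [70, 51, 64, 99, 55, 70, 51, 83, 64, 99, 55, 70, 62]

for stem, leaves in stem_and_leaf(sample).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one stem followed by its ordered leaves, which is the same layout as the table in the lecture.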
If you recall from an earlier session,
we needed to indicate the mode of the dataset.
The mode is the value that occurs most often in the dataset.
We see several candidates such as 51,
55, 64, and 99 occurring twice.
However, 70 occurs three times, making it the mode of the dataset.
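One possible way to check the mode by counting is sketched below. The sample list is again made up for illustration; it only mirrors the counts mentioned here (51, 55, 64, and 99 twice, 70 three times) and is not the slide's dataset.

```python
from collections import Counter

# Hypothetical sample, NOT the dataset from the slide.
sample = [70, 51, 64, 99, 55, 70, 51, 83, 64, 99, 55, 70, 62]

counts = Counter(sample)                     # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]

# Values that occur exactly twice are the runner-up candidates.
runners_up = sorted(v for v, c in counts.items() if c == 2)

print(mode_value, mode_count)
print(runners_up)
```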
Here you have the completed stem-and-leaf plot.
The box-and-whisker plot, commonly called the boxplot,
is simply a box on a number line with
vertical lines indicating certain types of information about the dataset.
In the next several slides,
we will calculate those vertical lines.
We have some work to do.
We need to calculate a few numbers before we can produce the boxplot.
The dataset on the left contains 20 numbers ranging from 5 to 66.
We will use these numbers to get the minimum,
the first, second, and third quartiles,
the inter-quartile range or IQR,
the lower limit LL,
the upper limit UL, and the maximum.
Let's begin by arranging the numbers from lowest to highest.
For this example, this is already done.
We will start with Q2 because Q2 is
actually the median, which we learned about in an earlier course.
The median is the exact center of the ordered data.
Here, since we have an even number of data points, 20,
the exact center is between the 10th and the 11th numbers.
So we must split the difference.
The two numbers are 30 and 31.
So we take the difference between them,
divide it by two,
and then add it back to the lower number.
So 31 minus 30 is one, and one divided by two is one half.
Then we add that to 30, making the median 30.5.
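The "split the difference" step can be sketched in Python as follows. The transcript does not list the 20 numbers, so the list named ordered is a hypothetical dataset chosen only to match the values quoted in this lecture (minimum 5, 10th and 11th values 30 and 31, maximum 66). The function name median_of_sorted is also just an illustrative choice.

```python
def median_of_sorted(data):
    """Median of an already-sorted list, splitting the difference when n is even."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd count: the single middle value
    lower, upper = data[mid - 1], data[mid]   # even count: the two middle values
    return lower + (upper - lower) / 2        # half the difference, added to the lower

# Hypothetical ordered data, consistent with the numbers quoted in the lecture.
ordered = [5, 10, 14, 18, 21, 25, 26, 28, 29, 30,
           31, 32, 33, 34, 35, 38, 40, 41, 43, 66]

q2 = median_of_sorted(ordered)
print(q2)  # 30.5
```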
The first quartile, or Q1,
is the exact middle of the first half of the data, which lies between 21 and 25.
So we take half of the difference, which is 2, and add it back to the lower number.
So we get 23 as our Q1.
Q3 is the exact middle of the second half of the data, which lies between 35 and 38.
So we take the difference, divide it by two, and then add it back to the lower number.
So we get 36.5.
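Here is a minimal sketch of the same "median of each half" method in Python. The list named ordered is a hypothetical stand-in for the slide's data, built only to agree with the values quoted in the lecture. Note that this is one convention among several; other tools may interpolate quartiles differently and return slightly different numbers.

```python
from statistics import median

# Hypothetical ordered data, consistent with the numbers quoted in the lecture.
ordered = [5, 10, 14, 18, 21, 25, 26, 28, 29, 30,
           31, 32, 33, 34, 35, 38, 40, 41, 43, 66]

half = len(ordered) // 2
lower_half = ordered[:half]    # first 10 values
upper_half = ordered[-half:]   # last 10 values

q1 = median(lower_half)        # middle of the first half
q2 = median(ordered)           # middle of all the data
q3 = median(upper_half)        # middle of the second half

print(q1, q2, q3)  # 23.0 30.5 36.5
```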
Now that we have Q1, Q2,
and Q3, let's move on.
We calculate IQR, the lower limit,
and the upper limit to identify any outliers in our data.
Outliers can often distort our analysis. An outlier could be a special case of
data collection that does not normally occur, or perhaps the data was recorded in error.
Calculating the limits helps us
identify any outlier data so we can decide how to handle it.
The IQR, or inter-quartile range, is
calculated simply as Q3 minus Q1, or 36.5 minus 23,
in this case 13.5.
The formulas for the lower limit and upper limit are
similar except for the anchor point and the sign.
For the lower limit, we take the Q1 value and subtract 1.5 times the IQR to get 2.75.
Similarly, for the upper limit we take Q3 and
add 1.5 times the IQR to get 56.75.
We then compare these two limits to our dataset to see if any data falls
outside of the range between 2.75 and 56.75.
We find that 66 does fall outside of that range and it is labeled as a potential outlier.
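The arithmetic above can be checked with a few lines of Python. The quartile values and the two test values, 5 and 66, come from this lecture; the variable names and the helper is_potential_outlier are just illustrative choices.

```python
# Quartiles as calculated in the lecture.
q1, q3 = 23.0, 36.5

iqr = q3 - q1                      # 13.5
lower_limit = q1 - 1.5 * iqr       # 2.75
upper_limit = q3 + 1.5 * iqr       # 56.75

def is_potential_outlier(x):
    """True when x falls outside the range from lower_limit to upper_limit."""
    return x < lower_limit or x > upper_limit

print(iqr, lower_limit, upper_limit)
print(is_potential_outlier(5))     # False: 5 is above the lower limit
print(is_potential_outlier(66))    # True: 66 is above the upper limit
```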
The only two numbers left are the min and the max.
These numbers could change depending on our lower and upper limit calculations.
The 2.75 lower limit is below the minimum of
five in the dataset, so five is our minimum number.
The 56.75 upper limit calculated in the last slide is below our max of 66.
So we take the next-highest number, 43, and that becomes our max.
The 66 now becomes our suspected outlier and is labeled in the graph accordingly.
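To tie this last step together, here is a short sketch that picks the plotted minimum and maximum as the most extreme values still inside the limits. The list named ordered is a hypothetical dataset, since the transcript does not list the slide's 20 numbers; it is built only to agree with the values quoted in the lecture (5, 43, and 66 in particular).

```python
# Limits as calculated in the lecture.
lower_limit, upper_limit = 2.75, 56.75

# Hypothetical ordered data, consistent with the numbers quoted in the lecture.
ordered = [5, 10, 14, 18, 21, 25, 26, 28, 29, 30,
           31, 32, 33, 34, 35, 38, 40, 41, 43, 66]

# Values inside the limits decide where the min and max lines go.
inside = [x for x in ordered if lower_limit <= x <= upper_limit]
plot_min, plot_max = min(inside), max(inside)

# Anything outside the limits is flagged separately as a potential outlier.
outliers = [x for x in ordered if x < lower_limit or x > upper_limit]

print(plot_min, plot_max, outliers)  # 5 43 [66]
```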