When collecting data, we must first identify the variables we will use in

properly quantifying the parameters we wish to monitor.

Within the realm of variables, we can have quantitative or qualitative types.

The main difference is that quantitative variables are numerical, while qualitative variables are not.

Within the family of quantitative variables, we can have discrete and

continuous types.

Discrete variables can only take on a countable set of distinct values.

Continuous variables can assume any value, to any precision, within

a specified range.
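To make the distinction concrete, here is a minimal Python sketch using hypothetical process data; the variable names and values are illustrative, not from the lecture.

```python
# Hypothetical process data, for illustration only.
defects_per_unit = [0, 2, 1, 0, 3]          # discrete: countable whole numbers
shaft_diameter_mm = [9.98, 10.02, 10.01]    # continuous: any value in a range

# Discrete values come from a countable set of distinct values;
# continuous values can fall anywhere, to any precision, within their range.
print(sorted(set(defects_per_unit)))            # a small, countable value set
print(min(shaft_diameter_mm), max(shaft_diameter_mm))
```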

A population is the collection of all items under study.

A sample is a subset of the population.

From a cost, practicality, resource, and

time perspective, collecting a sample is far more advantageous

and effective than measuring the entire population,

as long as the sample is random and representative of the population.
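The point above can be sketched in Python. The population below is hypothetical (simulated cycle times), and the example assumes a simple random sample drawn without replacement.

```python
import random

# Hypothetical population: cycle times (minutes) for 1,000 processed orders.
random.seed(7)
population = [round(random.gauss(12.0, 2.0), 2) for _ in range(1000)]

# Drawing a simple random sample is far cheaper than measuring every item.
sample = random.sample(population, k=50)  # without replacement

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

# A random, representative sample should estimate the population mean closely.
print(f"population mean: {pop_mean:.2f}, sample mean: {sample_mean:.2f}")
```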

Descriptive statistics consist of methods for organizing and

summarizing information.

Often, we use descriptive statistics as a means of characterizing the data.

Any graphs, calculations, or measures we report are for

informational purposes only.
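Descriptive summaries of this kind can be produced with Python's standard library; the defect counts below are hypothetical, used only to illustrate the idea of organizing and summarizing data without drawing conclusions from it.

```python
import statistics

# Hypothetical defect counts from ten production lots.
data = [4, 7, 2, 9, 4, 6, 3, 5, 4, 6]

# Descriptive statistics organize and summarize the data; they draw
# no conclusions beyond the data themselves.
summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "stdev": statistics.stdev(data),
    "range": max(data) - min(data),
}
print(summary)
```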

Inferential statistics calls for us to use the data and the descriptive

statistics, combined with other techniques, to draw conclusions about the population.

In research circles there are many types of studies available to us.

Within Six Sigma circles, the majority of our forays into the understanding and

analysis of the process will either be observational, or a designed experiment.

Clearly, a designed experiment yields far more information,

value, and useful data than a purely observational study.

Some points on measurement.

Just because we can measure something,

does not mean that the measurement is taken from a stable process.

Measurements should also be relevant to customer needs.

We measure for many reasons: early warnings, quantifying performance,

prioritizing improvements, and setting goals.

A data collection plan ensures measurement of critical-to-satisfaction

metrics, identifies the right mechanism to perform the data collection,

outlines how to analyze the data collected,

and defines how the data will be collected and who is responsible for collecting it.

We should operationally define our measurements.

This will include a criterion, a test, and a decision.

An example of a non-operationally defined measurement would be:

"Go out and count the number of red cars you see in the next hour."

Operational goals should be legitimate, having an official status;

customer focused, whether the customer is external or internal;

measurable; understandable; aligned and

integrated with higher levels; and equitable.

To encourage continual improvement,

some goals should be stretch goals, but also be achievable.

Effective quality measures can be tied back to the needs of the customer.

They also facilitate participation at all levels of the organization.

Certain measurements can provide leading and lagging indicators.

This can help us anticipate problems and

confirm that corrective actions are effective.

We seek to simplify data collection as much as possible.

We also should seek to reevaluate the accuracy, validity,

and usefulness of our data continuously.

Never be afraid to ask how and why, even when collecting data.

In order to quantify a measurement we must create a system of measurement

consisting of a unit of measure and an instrument.

Units of measure for product features are more difficult to create;

we may have to invent them.

Deming pointed out that it is a costly myth to assume that because you

cannot measure something, you cannot manage or improve it.

Analysis of the data is equally as important as the means we will use to

collect it.

All methods must be properly grounded in the type and

behavior of the data we collect.

A series of assumptions are typically required for

many of the common tests we use.

When sampling, consider the following.

How is the situation defined?

Who should be surveyed?

How many people will be contacted?

How will the sample be taken?

How will questions be worded?

Is there any ambiguity?

What is the frequency of the sampling?

Does the sampling represent all possible conditions?

Will it be representative?

And finally, what will be the sample size?

Representative samples mimic the behavior of the population.

How we sample can introduce bias into our process and

impact the significance of the result.

Random sampling is the desired type of sampling, but typically,

the hardest to obtain in real world applications.

The two most common types of random sampling are with and without replacement.
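In Python's standard library, the two approaches correspond to `random.sample` (without replacement) and `random.choices` (with replacement); the serialized lot below is hypothetical.

```python
import random

random.seed(1)
lot = list(range(1, 11))  # ten serialized parts, a hypothetical lot

# Without replacement: each part can be drawn at most once.
without = random.sample(lot, k=5)

# With replacement: the same part can be drawn more than once.
with_repl = random.choices(lot, k=5)

print("without replacement:", without)
print("with replacement:   ", with_repl)
```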

There are five common types of sampling.

This list could hardly be considered exhaustive, but it does highlight

many of the primary types of sampling seen within industry, marketing, and

organizations tasked with the collection, analysis, and dissemination of data.

In the absence of random sampling, stratified

sampling is considered our least biased alternative sampling approach.

Systematic sampling is heavily used in manufacturing settings.

Cluster sampling is typically found in public opinion surveys, or

other political action inquiries.
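Systematic sampling, the kind most common in manufacturing, can be sketched as "pick a random start, then take every k-th unit." The production run and helper function below are hypothetical illustrations, not part of the lecture.

```python
import random

# Hypothetical production run of 200 serialized units.
units = [f"unit-{i:03d}" for i in range(1, 201)]

def systematic_sample(items, n):
    """Systematic sampling: random starting offset, then every k-th item."""
    k = len(items) // n          # sampling interval
    start = random.randrange(k)  # random starting offset within one interval
    return items[start::k][:n]

random.seed(3)
sample = systematic_sample(units, n=20)
print(sample[:3])  # the first few selected units
```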

Determination of the sample size is based on the level of confidence we

seek to maintain.

Alpha is our acceptable risk of being wrong.

The confidence level is one minus the Alpha value.

Confidence intervals are typically two-tailed scenarios, so

we must divide the alpha equally between the two tails of the distribution.

Note that a measure of dispersion (the variance) and

a margin of error are required to make this determination.

The formula above is applicable when the estimator is a mean.

We have an analogous formula when the estimator is a proportion.
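The lecture's formula appears on a slide that is not reproduced here; the standard sample-size formulas are n = (z·sigma/E)^2 for a mean and n = p(1-p)·(z/E)^2 for a proportion, where z is the two-tailed critical value for the chosen alpha. A minimal Python sketch, assuming those are the formulas intended (the function names are my own):

```python
from math import ceil
from statistics import NormalDist

def n_for_mean(sigma, margin, alpha=0.05):
    """Sample size to estimate a mean: n = (z * sigma / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # split alpha across both tails
    return ceil((z * sigma / margin) ** 2)

def n_for_proportion(p, margin, alpha=0.05):
    """Sample size to estimate a proportion: n = p(1-p) * (z / E)^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(p * (1 - p) * (z / margin) ** 2)

# 95% confidence, sigma = 5, margin of error = 1.
print(n_for_mean(sigma=5, margin=1))            # 97
# 95% confidence, worst-case p = 0.5, margin = 0.03.
print(n_for_proportion(p=0.5, margin=0.03))     # 1068
```

Note how the confidence level enters only through z: alpha is split equally between the two tails, so a 95% level uses the 97.5th percentile of the normal distribution.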

Sampling plans also require a clear set of instructions.

An SOP, or work instruction, is most beneficial.

Within this document we should identify who is collecting the data, where,

when, and how, as well as anything else that, if missing,

might contribute to bias and variation in the assessment method.