Over the last several videos, we've looked at how to write great survey questions. What if I told you that you didn't actually have to write your own survey questions? It's really common practice in survey research to use validated, reliable questions that other people have created and used in other circumstances. So, in this video, we're going to talk about some of the common survey questions that you see in user experience research, and about the importance of validity, reliability, and scales in creating banks of questions for your surveys. This is probably an image you've seen back when you were studying science. Validity and reliability are two really important concepts in research broadly and in user experience research specifically. If we imagine the center of this target to be the thing we're interested in, the concept we're trying to measure in our survey, then how we actually measure that concept can be valid, reliable, both, or neither. Validity refers to how accurately we're hitting that target, how close our measurement comes to what the concept actually is. Reliability refers to how consistently we record that concept across people and across time. We've talked a lot in this module about how we want consistent survey questions that anybody can answer; that's an important goal for survey research questions, and validity and reliability are the ways we talk about having questions of that type. The problem is that, for any given individual, a single question asked once is not really very valid or reliable. In academic survey research, we would never ask just one question as an outcome variable.
We would ask a series of questions, and that's partially because we've seen in the past that people differ in how they answer a question from time one to time two. It could depend on their mood; they might answer differently after lunch than before lunch. All of this creates error that we want to reduce in our survey measurements. Scales are a set of questions that all try to measure the same concept, asked in slightly different ways. What we tend to do is ask that series of questions and then, during analysis, collapse them into one much more reliable measurement of the concept we're interested in. So, here's an example from work that we've done. As a quick aside, you'll see that the format for this looks very different from some of the other questions we've seen. This format is actually structured for phone interviews: it's the set of instructions that a survey researcher conducting a phone interview would use to ask these questions consistently of the people we called up to participate in our survey. This particular set of questions is about how expert people are in using the Internet. That seems like a pretty easy question on the face of it: how good are you at the Internet? It's a really hard question, however, to ask in a way that people will understand in a systematic fashion. We've talked about how people have a hard time all perceiving a question in the same way, and this is a great example of that. If I ask you how good you are at the Internet, your conception of what that means could be very different from that of somebody else I might ask. So, what Dr. Eszter Hargittai came up with was a really nicely validated scale that gets at that. This is a scale that's been validated and found reliable. What do we mean by that? We mean that Dr.
Hargittai went through a ton of work looking at different terms and comparing the responses she got with other measurements of Internet expertise. She found that these terms, Advanced Search, PDF, et cetera, were the most reliable and valid terms to assess people's familiarity with, and that this concept of familiarity correlated really nicely with the other Internet measurements being taken. So, this scale has been used many, many times by academic researchers and is a great, valid, reliable way of asking people how good they are at the Internet. Here's another example of a scale that we've used, again from the same survey, so it still has these interviewer instructions attached to it. This is from research I've done with Dr. Nicole Ellison looking at how people actually interact with each other on social media sites. You can see here that we asked five questions. For each of those five questions we would take a Likert-scale measurement, an ordinal closed-ended question, and after we had collected all of our responses, we would collapse them into one measurement. What this does is take any error that's introduced by individuals, or by people understanding the question in different ways, and collapse it into one measurement. The error is distributed in a much smoother way, and we get more stable, better responses to our survey questions. Now, why doesn't everybody use scales? Why wouldn't we use scales for everything? As is probably fairly obvious here, you have to ask five questions instead of just one. When cost is related to the total number of questions you can ask, which it almost always is, the cost of asking the extra questions a scale requires can outweigh its benefits.
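The collapsing step described above usually amounts to averaging each respondent's items, and the standard check that the items really hang together as one scale is an internal-consistency statistic such as Cronbach's alpha. Here's a minimal sketch in Python, with entirely made-up responses to a five-item Likert scale (1 = strongly disagree through 5 = strongly agree):

```python
# Hypothetical data: each row is one respondent's answers to the five
# Likert items of the scale (1 = strongly disagree ... 5 = strongly agree).
responses = [
    [4, 5, 4, 4, 5],
    [2, 1, 2, 3, 2],
    [5, 5, 4, 5, 5],
]

def composite_score(items):
    """Collapse one respondent's items into a single mean score."""
    return sum(items) / len(items)

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(data):
    """Internal consistency of the scale: values closer to 1 suggest
    the items are measuring the same underlying concept."""
    k = len(data[0])  # number of items in the scale
    item_vars = [variance([row[i] for row in data]) for i in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

scores = [composite_score(r) for r in responses]  # one number per person
```

Averaging isn't the only option (summing the items is common too), but the point is the same: the analysis runs on one collapsed number per respondent rather than on five separate items, which is what smooths out the item-level error.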
In this case, for instance, because we were under such a budget constraint and every question had to be read aloud, this was one of the very few scales we could include in the survey, and we had to depend more than we normally would on single-item measurements. But in general, scales are a great way not only to get really reliable measurements; they're also wonderful because somebody else already came up with them. Because scales have to be found valid and reliable, a lot of work often goes into them: validating them, making sure that they actually measure what we want to measure, making sure that they measure the same thing across time and with different populations. So there are a lot of great scales out there. The good news for you is that instead of having to come up with your own items for everything you want to measure, you can borrow scales that others have validated and found reliable. That's a wonderful way to write survey questions that actually get at the things you're interested in. Now, in user experience research, there are a ton of scales, and sometimes single-item measures, that are commonly used. These are ways you can borrow what others have done and find questions that have been time-tested to measure concepts that are important to UX researchers. A good example, and this is a single-item measure, not an entire scale, is the Net Promoter Score. We saw this earlier when we looked at the different types of questions available in a lot of the survey software packages that we use. The Net Promoter Score is very commonly used, and it basically uses a Likert scale, but in this case an 11-point scale from zero to 10: on a scale from zero to 10, how likely are you to recommend this website to a friend or colleague?
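By standard convention, 9s and 10s count as promoters, 7s and 8s as passives, and 0 through 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch in Python, using made-up ratings:

```python
def net_promoter_score(ratings):
    """NPS: percent promoters (9-10) minus percent detractors (0-6).
    Passives (7-8) count toward the total but toward neither bucket."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical ratings from five respondents:
print(net_promoter_score([10, 9, 9, 7, 5]))  # 3 promoters, 1 detractor -> 40.0
```

Note that the result ranges from -100 (everyone a detractor) to +100 (everyone a promoter), which is why an NPS is not a percentage of anything on its own.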
The way you analyze the Net Promoter Score is that you take those 11 points and break people into three categories: detractors, passives, and promoters. This is appealing due to its simplicity. Everybody at this point understands the Net Promoter Score; it's been asked of respondents so commonly that respondents understand it pretty well, and people who use the Net Promoter Score in their own research understand it too. It's a really great single-item measure, partly because it's comparable across platforms. You could compare a Net Promoter Score from one product to another, which would usually be hard to do with different questions, and you'd get relatively the same set of responses, which allows for comparability. There are, however, issues with the Net Promoter Score that you should be aware of. You don't learn much about the why behind the answers when you ask it. You have to unpack it quite a bit to really get into why people are promoters and why they are detractors, so it's a nice thing to connect to other UX measurements when you're doing surveys. An older measurement that's still very commonly used is the System Usability Scale, or SUS. Now, I think I've made clear in other videos how I feel about matrix questions, and this combines not only a matrix question but also a set of strongly agree to strongly disagree ordinal response categories. So, there are some issues with this scale. But it's so time-tested, it has been used for such a long time, since before the web for instance, that it's been considered very valid and very reliable over those many years. The items read like: "I think that I would like to use this system frequently." "I found the system unnecessarily complex." You can use these measurements to get a set of impressions of a system in a way that's been tested over multiple types of systems, which is great. Now, it's not diagnostic, as you can tell when you take a closer look over these questions.
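As an aside, the SUS has a fixed scoring recipe that collapses the ten agree/disagree items (each answered 1 to 5) into a single number from 0 to 100: odd-numbered items are positively worded and score as the response minus 1, even-numbered items are negatively worded and score as 5 minus the response, and the sum is multiplied by 2.5. A sketch:

```python
def sus_score(responses):
    """Standard SUS scoring for the ten items (each answered 1-5).
    Odd items are positively worded, even items negatively worded."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100

# All answers neutral (3) lands exactly in the middle of the range:
print(sus_score([3] * 10))  # 50.0
```

The 2.5 multiplier just rescales the raw sum to the familiar 0-100 range; a SUS score is not a percentage of anything.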
You can see that it is a set of impressions or opinions about the consistency, effectiveness, and efficiency of a system, and it's not going to tell you why anybody feels that way. It is, however, valid and reliable, and a great way to use a larger scale if you're able to ask that many questions. If you're not able to ask a lot of questions, another really common UX question is the Single Ease Question, or SEQ. This is: "Overall, how difficult or easy did you find this task?" with a range from very difficult to very easy. Now, this performs as well as many more complicated measures of difficulty. While the SUS offers a lot of stability, it's a lot of questions, and the SEQ isn't actually that much worse than the SUS in terms of capturing true behaviors. Research shows that it has about a 50 percent correlation with behavioral measures of ease of use. Now, that has pros and cons, right? Fifty percent is actually a pretty good correlation, but it does mean that you're missing a whole bunch of stuff, so you really want to use the SEQ in combination with other types of measurements. The SEQ is also most productive if it immediately follows a task; you don't want to let a lot of time pass between a respondent completing a task and the SEQ popping up to ask them how it went. Another common set of questions that makes up a scale comes from the Technology Acceptance Model, or TAM. As you can see, this is an older model, developed in the 1980s, that comes out of the business school tradition. It uses seven-point Likert-scale responses, agree to disagree again; I told you that was a very common response set. It asks, basically, how much would you like a technology if it were presented to you? This is a measurement developed by people who were interested in how people would respond to a technology if it were put in place.
So you can imagine an item about a new scheduling tool: "Using Cliff's new scheduling software in my job would enable me to accomplish tasks more quickly." That's really valuable information to know, and that's great. TAM has been found to be relatively valid and incredibly reliable. The problem is that people are super bad at guessing their future actions and preferences. A lot happens between "oh yeah, I would like Cliff's scheduling software" and when it finally shows up; we find that in fact people don't adopt it, for many varied reasons. So this disconnect between what people think they want to do at time one and what they actually do at time two can be a killer. The way to use TAM most effectively for UX research is to think of it as a benchmark. You can take a TAM measurement at time one, take it at time two, take it at time three, and as you redesign and iterate, see how people's TAM scores change; that dynamism is a pretty good set of information to have. One of my favorite newer scales from the user experience world is the Standardized User Experience Percentile Rank Questionnaire, which is thankfully abbreviated as SUPR-Q. This was developed by Jeff Sauro. Jeff is wonderful: he's a statistician who works in user experience research, has done a lot of investigation of different scales and survey questions, and is a really good person to follow if you're interested in learning more about surveys and user experience. He did a great deal of work validating the SUPR-Q scale, and his intention was to create a really easy to use, really parsimonious one-stop shop for constructs that we often care about in user experience: in particular, usability, trust, loyalty, and appearance. So I've put up a link to the journal article where he reports how he validated this scale.
You can get access to this article, and it's a great way to see how this work happens, and also to use the SUPR-Q in your own work. If you go to Jeff's site, you can actually pay him to get access to his database of SUPR-Q scores, so you can, for instance, take a SUPR-Q score for your product and compare it with multiple other products or websites, and that gives you a really nice cross-sectional benchmark for the types of things you're looking at. So these are the questions that exist in the SUPR-Q. You can see that the dimensions he has measure the different things we care about in user experience: "The website is easy to use" refers to usability; "I find the website to be attractive" refers to appearance. These are all five-point Likert agree-disagree scales, like we've seen so many times before, except for question five here: "How likely are you to recommend the website to a friend or colleague?" That looks familiar, right? It's really an adaptation of the Net Promoter Score, so he uses it as an 11-point scale, slightly different from the other questions. So that's just a few of the many scales that exist out in the world. There are scales for almost everything you can imagine if you start doing some research; those are the most common ones used in user experience research. Scales are great because they help us reduce the burden and risk of writing original survey questions. If you're worried about writing good survey questions, you can depend on questions somebody else already wrote; there are lots of them out there. There are a bunch of validated, UX-related survey scales and single-item measurements that have been tested over time and shown to hit that bullseye just like we want. We've looked at some common ones, and there are many more out there being developed, so keep in touch with the literature.
Try to take a look at what people are doing as they develop new scales and how they're measuring some of these trickier concepts that we care about. In the next series of videos, we're going to talk more about the design of questions: how do you actually use visual design to create good questions in written survey questionnaires?