[MUSIC] Welcome to week five of this course on assessment. This week we're focusing on human judgement as the basis of scoring student work. Before we look at the details of how we go about that, I would first like us to look at the issues around making errors when we judge things. Humans are notoriously bad judges, and that's what this week will try to raise your awareness of, so that you can be a better judge.

In terms of our curriculum map, we're looking this week at informal, student-centered approaches to assessment that require human judgements. We use these types of assessments, instead of just objectively scored assessments, because sometimes we want students to construct, or write, or create, or perform their own answers. This is an important curriculum objective. For every response that a student carries out, there need to be rules for how it should be scored, interpreted, or evaluated. We call those rules, in general, "rubrics", and next week's session is going to focus especially on rubrics.

And we need to carefully consider, in the process of making decisions about the quality of work, whether another competent judge would make a similar evaluation to our own. If another teacher walks by, sees the same work, and comes to a similar judgement, then we have reliable scoring. That's really important when there are consequences attached to student work. And we have to be sure that the rules we use to judge student work are appropriate to the task being evaluated. So, in terms of judging, we have to be worried about reliability and validity.

Judgement is the process in which experts, that's you, the teacher, decide "How good is this work? Where does it lie on a progression from total beginner to advanced expert?" And progress is always from none, or very little, up to very high levels. In New Zealand, for our high school qualifications, we use words like "not achieved", "achieved", "merit", and "excellence". We could use norms for children of a certain age group: "below average", "at the average", "above average". We could describe it like the New Zealand curriculum, in terms of below, at, or above Level 4 or 5, and so on. Even university grades are a kind of progression: the scale from an F, a fail, up to an A+ or A* represents a progression in quality.

And progress in good assessment is indexed to curriculum objectives and their order of difficulty. A curriculum statement tells us these are the things that we teach early, because they're necessary and perhaps easier, and these are the things we teach later. So, in terms of establishing our judgements, it's most powerful if those judgements are indexed to curriculum difficulty and curriculum order.

So, we need tools that help us make those kinds of judgements. And the reason we need tools is, fundamentally, that humans are not very accurate judges. Even experts in a field disagree with each other. Certainly the research evidence shows us that the scoring of essays or performances by university lecturers is notoriously inconsistent. In fact, studies have shown that if you give the same set of essays to different lecturers, very few of the essays will receive the same grade, or even grades close to each other. And markers, that's you and me, have a huge impact on the accuracy of scores, and that has serious implications for any decisions we might make.

Take, for example, Olympic scoring. There are sports like diving and figure skating in the Olympics where panels of judges make very rapid evaluations of the quality of a performance. On the screen, you can see a table of scores for two athletes and six judges. What you can notice is that the scores are pretty close to each other. They range between 8.8 and 9.3; out of a ten-point scale, they're really close. But actually, very few judges give the same score to the same athlete. In fact, the two scores highlighted in green are the same score, but they are for different athletes. The two in red are low scores, but they are for different athletes too. So, none of the judges agreed, but they were close to each other. And this judging was done by elite markers, on elite performances, and it happened very quickly. This is hard work, so when we come to judging student work, it's going to be hard for us to agree with each other, but it is an important ambition to aim towards.
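To make that "close but rarely identical" pattern concrete, here is a minimal sketch in Python. The scores are made up for illustration (the actual table is only shown on screen, so these numbers are hypothetical): it computes each athlete's mean, the spread between the highest and lowest judge, and how many pairs of judges agreed exactly.

```python
# A minimal sketch with made-up scores, illustrating "close but rarely
# identical" panel judging. These numbers are hypothetical, not the
# table shown on screen.

scores = {
    "Athlete A": [9.0, 8.9, 9.3, 8.8, 9.1, 9.0],  # six hypothetical judges
    "Athlete B": [8.9, 9.0, 8.8, 9.1, 9.0, 9.2],
}

for athlete, marks in scores.items():
    mean = sum(marks) / len(marks)
    spread = max(marks) - min(marks)
    # Count pairs of judges who gave exactly the same score.
    same = sum(
        1
        for i in range(len(marks))
        for j in range(i + 1, len(marks))
        if marks[i] == marks[j]
    )
    print(f"{athlete}: mean {mean:.2f}, spread {spread:.1f}, "
          f"{same} of 15 judge pairs agree exactly")
```

On data like this, every judge lands within half a point of the others, yet only one of the fifteen judge pairs for each athlete gives an identical score, which is exactly the pattern in the table described above.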
Consider the kind of questions that we might ask. A classic question is the one on screen: "The President of the U.S. during the missile crisis was:", and the student is expected to fill in the blank. These run-on sentence completions are easy to create, but they leave some ambiguity as to what the correct answer is. You and I as teachers know that the answer is supposed to be President Kennedy, during the Cuban Missile Crisis in 1962. But if the student writes "The President of the U.S. was worried", is that answer wrong? If they say "busy playing golf", how is that answer wrong? We need rules that say which answers are valid and appropriate and should be awarded marks, and which answers should be ignored or treated as wrong. Those rules are a rubric. So even a simple, open-ended question needs a set of rules. On screen are some rules that I've made up for how this question could be scored. Unfortunately, I think it would be better if this were a question, not a run-on sentence. With those rules, we have a basis to avoid being arbitrary or capricious, which of course is a serious concern from a student's point of view: "My teacher doesn't like me, therefore they marked me wrong."

So, how can we reduce error in our judgement of student work? To create high-quality judgements, we need rules, and that's what the next session will be about: how to create rules. We need, generally, two or more markers who work independently, but who compare and discuss any differences in their scoring; we'll talk more about moderation later this week. We have to carry out our marking and our judgement in an environment that's suitable. If we're angry, if we're hungry, if we're tired, or if we're distracted, we shouldn't be marking student work, because it will come out in the scores we give.

We should deal with all performances on the same topic in one sitting; we should only compare apples with apples. We should try to finish the scoring of all of those apples in one sitting, and if we can't, because there are too many or it's too late in the day, then we should mark some without writing any marks on the paper, put them aside, and come back to them tomorrow. If we can't mark them the same way we did yesterday, then we know there's a problem in our scoring. We have to check our own consistency.
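Since this moderation and consistency checking is essentially a measurable step, here is a minimal sketch of what such a check might look like, assuming a hypothetical 0-5 rubric and made-up scores from two independent markers: it reports exact and within-one-point agreement, and flags every disagreement for the face-to-face discussion recommended above.

```python
# A minimal sketch of a moderation check, assuming two markers have
# independently scored the same ten scripts on a hypothetical 0-5 rubric.

marker_1 = [4, 3, 5, 2, 4, 1, 3, 5, 2, 4]
marker_2 = [4, 2, 5, 2, 3, 1, 4, 5, 3, 4]

pairs = list(zip(marker_1, marker_2))
n = len(pairs)
exact = sum(a == b for a, b in pairs)
adjacent = sum(abs(a - b) <= 1 for a, b in pairs)

print(f"Exact agreement:  {exact}/{n} ({100 * exact / n:.0f}%)")
print(f"Within one point: {adjacent}/{n} ({100 * adjacent / n:.0f}%)")

# Flag every disagreement for the markers to discuss before any
# scores are reported back to students.
to_discuss = [i for i, (a, b) in enumerate(pairs) if a != b]
print("Scripts to moderate:", to_discuss)
```

The same script could be pointed at your own marking from two different sittings, which is one way to run the "come back to them tomorrow" consistency check described above.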
Human judgement is necessary in being a teacher. We have to do it; it's expected. Not everything that we value can be assessed objectively with multiple-choice, or sequencing and sorting items, and so on. We have to use our expertise to judge the quality of student work. But we have to be humble about this. We have to know that we're going to make mistakes. We're going to disagree with other judges. And we have to accept that our goal is not to be error-free, but to admit our error, talk about it with other judges, and overcome and reduce that error before we give any marks back to our students, or before we use those marks in reporting. Later this week, we're going to get more advice on how to guide judgements. Our next session is going to be on rubrics. [MUSIC]