On Your Mark: Challenging the Conventions of Grading and Reporting by Thomas R. Guskey

A 100-point grade scale gives a false impression of accuracy. In most cases, it offers 65 levels of failure, almost twice the number of passing levels (35). The scale's popularity stems partly from the fact that the technical people who design grading software like it. On a 20-item test, for example, the standard error of measurement is about two items in each direction. Because each item is worth five points, that is a range of 20 points: a student who scores 85 could have a real grade anywhere between 75 and 95, a spread of two letter grades. The more levels a scale has, the more students are likely to be misclassified. A single very low score can also have an outsized impact because of the common practice of averaging grades. Richard suggests a four- or five-level scale such as below basic, basic, proficient, and advanced.
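The arithmetic above can be checked with a short sketch. It assumes, as the text does, that each of the 20 items is worth 5 points and that the error band is simply the observed score plus or minus two items. The function names and the grade book of three 90s and one zero are hypothetical, chosen only to illustrate the measurement-error band and the effect of averaging in a single very low score.

```python
def score_band(observed_pct, items=20, sem_items=2):
    """Return the (low, high) percentage band implied by a standard error of
    measurement expressed in items, clipped to the 0-100 scale."""
    points_per_item = 100 / items            # 5 points per item on a 20-item test
    margin = sem_items * points_per_item     # two items either way = 10 points
    return max(0.0, observed_pct - margin), min(100.0, observed_pct + margin)


def mean(scores):
    """Plain arithmetic average, as used in typical grade averaging."""
    return sum(scores) / len(scores)


# Hypothetical student who scores 85 on the 20-item test
print(score_band(85))             # (75.0, 95.0)

# Hypothetical grade book: three strong scores and one zero
print(mean([90, 90, 90, 0]))      # 67.5
```

In this made-up grade book, one zero pulls an otherwise consistent 90 average down to 67.5, which on a typical 100-point scale falls in the failing range.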

Here Guskey takes on grading systems that add plus and minus modifiers to letter grades. Doing so takes you from a system with five levels to one with twelve, assuming you don't use A+. He does a fine job of explaining why, as the number of grade categories increases, the chance that two equally competent judges assign the same grade to the same sample of student work diminishes significantly. A pass-fail system gives the best odds of a correct assessment: of the four possible outcomes, only two are incorrect (giving a P to an F student or vice versa). With five categories, there are five ways to get it right and 20 ways to get it wrong. The leap to twelve categories leaves only 12 chances out of 144 to get it right.
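The counting behind these figures can be sketched as follows. With k grade categories, two judgments (for example, the grade assigned and the grade deserved) can pair up in k × k ways, and only the k pairs where both match are agreements. The function name is hypothetical, and the counts describe how many ways each outcome can occur; they translate directly into probabilities only under the simplifying assumption that every pair is equally likely.

```python
def agreement_outcomes(categories):
    """Count the possible (judgment 1, judgment 2) grade pairs on a scale with
    the given number of categories, plus how many agree and how many disagree."""
    total = categories * categories   # every grade can pair with every grade
    agree = categories                # only the matching pairs agree
    return total, agree, total - agree


for k, label in [(2, "pass/fail"), (5, "A-F"), (12, "A-F with +/-")]:
    total, agree, disagree = agreement_outcomes(k)
    print(f"{label}: {agree} of {total} pairs agree, {disagree} disagree")
```

Running it reproduces the figures in the text: 2 of 4 pairs agree for pass/fail, 5 of 25 for five letter grades, and 12 of 144 for twelve plus/minus grades.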
If you ask someone to describe what an A, B, C, D, and F thinker looks like in their course, they can probably do it. If you ask them to describe twelve levels of performance, they probably can't. This is why written rubrics seldom contain more than four levels of performance. The more cutoff points a scale has, the more likely it is that a given piece of work falls near a cutoff and risks being misclassified. The irony is that people view systems with more possible grades as more accurate when in fact they are less accurate.

Many schools limit the number of top grades teachers can give, in essence forcing them to grade on a curve. The teacher's job then becomes sorting students rather than getting all students to meet the course standards. The bell-shaped curve, or normal distribution, is often used to assign grades. This is a mistake: the normal distribution is what you get when things happen randomly in nature. It's not what you expect from any targeted intervention, such as adding fertilizer to your fields. Teaching is an intervention that should produce anything but a normal distribution, and if it does produce one, the teaching wasn't very effective. Grading on the curve and limiting top grades are often defended as ways to prevent grade inflation, but these practices also discourage collaboration and promote competition. At worst, they lead students to actively sabotage their classmates' efforts. Rather than serve as a sorting mechanism (normative), grades should reflect the degree to which students have reached the course standards (criterion). Teachers, then, should clarify their standards and base their grading criteria on those standards.