The Myths of Standardized Tests: Why They Don't Tell You What You Think They Do - Page 2 of 4

The information the public needs about its schools is far broader and more complex than any set of test scores. Policies are about evaluation, not about accountability. This book is devoted to demonstrating the shortcomings of standardized testing as a measuring stick. For a test to provide accurate evidence about the caliber of instruction, it must be able to distinguish between students who have been well taught and students who have not. In other words, suitable tests must be instructionally sensitive. The tests now in use are not. They rely on items closely linked to socioeconomic status and inherited aptitudes to spread out the scores. As a result, they tend to measure what students bring to school rather than what they are taught once they get there.

Standardized tests depend on the idea that a small sample of a student’s behavior fairly represents the whole range of possible behavior. To support valid inferences, a test must cover more than a small part of the content domain. With a large number of students, overall outcomes on different versions of a test should even out, but any one student can go from proficient on one version to needing remediation on another. Items are chosen to spread out the scores of the test takers, and time limits spread the scores out even more. Life isn’t like a quiz show. Because students can only sit and focus for so long, the tests can deal with only a small fraction of the domain, so some topics have to be skipped. Timed tests also produce test anxiety. Beware of timed tests.
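The sampling argument can be illustrated with a small simulation. This is a minimal sketch under assumed numbers, none of which come from the book: a hypothetical student knows 70% of a 500-item domain, each test form draws 40 items at random, and the proficiency cutoff is 70% correct. The names `form_score` and `simulate_forms` are made up for illustration.

```python
import random


def form_score(known, form):
    """Fraction of the items on one test form that the student knows."""
    return sum(1 for item in form if item in known) / len(form)


def simulate_forms(domain_size=500, known_fraction=0.7,
                   form_length=40, n_forms=1000, seed=0):
    """Score one hypothetical student on many randomly assembled forms.

    All default values are illustrative assumptions, not real test data.
    """
    rng = random.Random(seed)
    domain = range(domain_size)
    # The fixed set of domain items this student actually knows.
    known = set(rng.sample(domain, round(domain_size * known_fraction)))
    # Each form is a fresh random sample of items from the same domain.
    return [form_score(known, rng.sample(domain, form_length))
            for _ in range(n_forms)]


scores = simulate_forms()
cutoff = 0.7  # assumed proficiency cutoff
below = sum(1 for s in scores if s < cutoff) / len(scores)

print(f"mean score across forms: {sum(scores) / len(scores):.3f}")
print(f"lowest and highest form: {min(scores):.3f}, {max(scores):.3f}")
print(f"share of forms below the cutoff: {below:.1%}")
```

In this setup the average across many forms should land close to the student’s true 70%, while individual forms place the same student on either side of the cutoff. That is the pattern described above: group results even out, but one student’s classification depends on which version he or she happens to take.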

Measuring student achievement is too complex for current social science methods, yet we still use standardized tests to make decisions that affect students and schools. The tests don’t measure goals that schools pursue, such as creativity, critical thinking, motivation, persistence, empathy, leadership, courage, compassion, honesty, and curiosity. Students with high scores may be shallow thinkers. If the only tool you have is a hammer, everything looks like a nail. The tests were created to measure the performance of populations, not schools, teachers, or individuals. Important items that all students should know are left out in favor of items that half of the students will miss. There are too many standards for criterion-referenced tests to be practical.

By “objective” we mean based on observable phenomena and not influenced by emotion or personal prejudice. A look at how the tests are made, however, reveals a good deal of subjective human judgement by the people who write, edit, and assemble the test items. Setting the standards also requires judgement, and that judgement occurs in a political and social context. Achievement levels are, first and foremost, policy statements. The only truly objective part is the scoring. Test scores are not clean, crisp numbers but fuzzy ranges that extend above and below the reported score, and 5% of the time the true score isn’t even within that range. Because numbers confer a sense of fairness and impartiality, they are used to justify decisions that affect students’ lives.
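The “fuzzy range” can be made concrete with the standard error of measurement from classical test theory. The text does not name a formula, so what follows is a sketch under that assumption: the band is the observed score plus or minus 1.96 standard errors, which is the usual 95% interval and matches the 5% miss rate mentioned above. The function names and the example numbers (a scale with a standard deviation of 100 and a reliability of 0.91) are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Range around an observed score; z=1.96 gives a 95% band."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical example values, not taken from any real test.
low, high = score_band(observed=500, sd=100, reliability=0.91)
print(f"reported score 500, 95% band: {low:.1f} to {high:.1f}")
```

With these assumed values the standard error is about 30 points, so a reported 500 stands for a range of roughly 441 to 559. A cutoff anywhere inside that range cannot cleanly separate students on either side of it.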