Education policies that affect millions of students have long been tied to test scores, but a new paper suggests those scores are regularly misinterpreted.
According to the new research out of Mathematica, a statistical research group, the comparisons sometimes used to judge school performance are more indicative of demographic change than actual learning.
For example: Last week's release of National Assessment of Educational Progress scores led to much finger-pointing about what's working and what isn't in education reform. But according to Mathematica, policy assessments based on raw test data are extremely misleading -- especially because year-to-year comparisons measure different groups of students.
"Every time the NAEP results come out, you see a whole slew of headlines that make you slap your forehead," said Steven Glazerman, an author of the paper and a senior fellow at Mathematica. "You draw all the wrong conclusions over whether some school or district was effective or ineffective based on comparisons that can't be indicators of those changes."
"We had a lot of big changes in DC in 2007," Glazerman continued. "People are trying to render judgments of Michelle Rhee based on the NAEP. That's comparing people who are in the eighth grade in 2010 vs. kids who were in the eighth grade a few years ago. The argument is that this tells you nothing about whether the DC Public Schools were more or less effective. It tells you about the demographic."
Those faulty comparisons, Glazerman said, were obvious to him back in 2001, when he originally wrote the paper. But Glazerman shelved it then because he thought the upcoming implementation of the federal No Child Left Behind Act would make it obsolete.
That expectation turned out to be wrong. NCLB, the country's sweeping education law, which has been up for reauthorization since 2007, mandated regular standardized testing in reading and math and punished schools based on those scores. As Glazerman and his coauthor Liz Potamites wrote, measures of student performance that contain severe and correctable errors are often used to make critical education policy decisions associated with the law.
"It made me realize somebody still needs to make these arguments against successive cohort indicators," Glazerman said, referring to the measurement of growth derived from changes in score averages or proficiency rates in the same grade over time. "That's what brought this about." So he picked up the paper again.
NCLB requires states to report on school status through a method known as "Adequate Yearly Progress." It is widely acknowledged that AYP is so ill-defined that it has depicted an overly broad swath of schools as "failing," making it difficult for states to distinguish truly underperforming schools. Glazerman's paper argues NCLB's methods for targeting failing schools are prone to error.
"Don't compare this year's fifth graders with last year's," Glazerman said. "Don't use the NAEP to measure short-term impacts of policies or schools."
The errors primarily stem from comparing the percentage of students proficient in a given subject from one year to the next. That comparison measures different groups of students each year, leading to false impressions of growth or loss.
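To see how that can happen, consider a minimal sketch in Python. Everything in it is invented for illustration: the scores, the cut score of 250, and the function names are assumptions, not figures or methods from the Mathematica paper. Each student in both hypothetical classes gains exactly 10 points over the year, so the school is equally effective both years, yet the proficiency rate jumps because the second class started out ahead.

```python
# Hypothetical cut score for "proficient" (an assumption for this sketch).
PROFICIENCY_CUTOFF = 250

# Invented data: (score entering the grade, score at the end of the year).
# Every student gains exactly 10 points in both years.
cohort_2009 = [(220, 230), (235, 245), (240, 250), (255, 265)]
cohort_2010 = [(245, 255), (250, 260), (238, 248), (260, 270)]


def proficiency_rate(cohort, cutoff=PROFICIENCY_CUTOFF):
    """Share of students whose end-of-year score meets the cutoff."""
    end_scores = [end for _, end in cohort]
    return sum(score >= cutoff for score in end_scores) / len(end_scores)


def successive_cohort_change(earlier, later):
    """Change in proficiency rate between two different groups of students."""
    return proficiency_rate(later) - proficiency_rate(earlier)


def same_student_gain(cohort):
    """Average gain for the same students followed over one year."""
    gains = [end - start for start, end in cohort]
    return sum(gains) / len(gains)


print(successive_cohort_change(cohort_2009, cohort_2010))  # 0.25
print(same_student_gain(cohort_2009))  # 10.0
print(same_student_gain(cohort_2010))  # 10.0
```

The year-to-year comparison shows proficiency rising 25 percentage points, while following the same students shows identical gains in both years. The same-student gain here is only one illustrative alternative; the article does not say which measure the paper recommends.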
Hat tip to the Commish.