Monday, April 25, 2016

“With a carefully selected data set, you can do amazing things with statistics.”


William Mathis was a school superintendent in Vermont. Since retiring, he has become managing director of the National Education Policy Center and a member of the Vermont Board of Education.

In this post, which he wrote for this blog, he deconstructs a recent study by prominent economists about school reform. The idea of projecting how many trillions might be saved if the schools adopted certain test-based reforms rang a bell.

I checked my copy of Reign of Error and found that Eric Hanushek had predicted in 2011 that if the U.S. replaced the lowest-performing teachers with average teachers, we would match the test scores of Canada and Finland and generate an additional $112 trillion in economic output over our lifetimes. (Eric A. Hanushek, “Valuing Teachers: How Much is a Good Teacher Worth?” Education Next, Summer 2011.)

The article under review says the gains produced by raising NAEP scores would generate “only” $76 trillion in new economic output. I am not sure why the projected gains dropped from $112 trillion to $76 trillion. The article can be found online at educationnext.org and will appear in the summer 2016 issue of Education Next (http://educationnext.org/pays-improve-school-quality-student-achievement-economic-gain/).

The Cargo Cult Educational and Economic Reform Theory
By William J. Mathis

As U.S. forces island-hopped across the Pacific during World War II, Melanesians noticed that the Yankees would land, immediately bulldoze huge landing strips, put up rows of lights and build a control tower. Great metal birds would then be attracted, land, and off-load tons of valuable cargo.

Being quick learners and believing that if they built it, manna would come, the islanders dug landing strips out of the jungle, placed torches along the sides and built a bamboo tower to attract these birds. Thus was born a new version of the economic theory of the “cargo cult.”

With rigorous application of just such impeccable reasoning, Eric Hanushek, Jens Ruhose and Ludger Woessmann have published their latest rewrite, It Pays to Improve School Quality, as the feature story in the summer 2016 issue of Education Next.

The argument has been retreaded several times since 2007, and its basic rationale rests on the correlation between economic wealth and test scores. The authors conclude that if we invest in increasing eighth-grade NAEP math scores, then $76 trillion in new money will descend from heaven, thereby quadrupling GDP and bestowing great blessings on society.

With a carefully selected data set, you can do amazing things with statistics.

Since the common school movement of the mid-nineteenth century, we have known that investments in education provide great returns to society and the economy. Contemporary funding reformers have thus called for equality in investments in education. The inconsistency in this case is that a veteran opponent of adequate school funding is the lead proponent of the cargo cult education myth.

As attractive as the myth is, it has four major faults: (1) the authors oversimplify and misread the economic development literature, (2) they wrongly argue causation from correlation, (3) they incorporate fatal statistical errors in their analysis, and (4) they frustrate the reader with unexplained mystery methods.

Economic Development – It is puzzling to see economists interpret the economic development literature so narrowly. For instance, of the World Economic Forum’s twelve pillars of growth, education figures in only three: early education, training, and post-graduate research.

Unfortunately, NAEP math scores measure none of these relevant education pillars particularly well. Transportation, infrastructure, macroeconomic support, and other vital necessities for economic development are not even part of the equation. Presumably, the invisible hand of middle school NAEP math scores will provide the missing meta-flux which will parachute trillions of dollars onto a needy society.

It would be good to see the economic development theory that supports this overly constricted model, but the discerning reader will be disappointed. There is no review of the literature, even though an entire discipline is devoted to these questions. But the reader does not have to rely simply on the World Economic Forum.

The United States, despite modest performance on international math tests (PISA), was ranked number one in the 2014 World Global Entrepreneurship and Development Index. And on another OECD-designed assessment, the national Innovation Index, the U.S. was essentially tied for fourth place.

A more realistic and comprehensive model would surely include other relevant factors. Variables like the decline in carbon based extractive industries, the graying of the population, the effect of health care costs, and the reported oversupply of STEM-qualified job seekers might have greater economic relevance than how kids performed on an eighth-grade math test some years earlier. But broader issues, such as these, are not addressed.

Correlation and Causation – The authors strongly contend that the relationship between math scores and a stronger economy is causal (pp. 21-22). That is, high NAEP math test scores cause economic growth. This may seem a bold overreach, but the authors flatly state that “extensive analysis of the cross-country evidence has shown that a causal interpretation of the relationships is credible.”

They garnish this statement with phrases such as the “strong relationship between test scores and economic growth” and advise that “Any state political leader of vision would do well to make school quality a high priority.”

States, they argue, should make a “sustained commitment,” and then “large economic benefits should accrue.” What this plainly requires is an (unspecified but major) investment in education, and particularly in math education.

Strangely, they then justify making this commitment by citing “the very weak correlation between increased spending on schools and higher levels of student achievement.” The reader is then faced with making sense of the authors’ urging to invest more in test scores while simultaneously saying their overall investment mechanism doesn’t work. This self-contradiction seems lost on the authors.

Perhaps they would repurpose some unspecified amount of money from some unknown source. But this is never addressed and is left to conjecture.

The Fatal Flaw – With correlation, it matters a great deal what variables are entered in the analysis and how they are measured. For instance, if the number of predictors is limited to a small set of highly interrelated measures, then the importance of these variables will be inflated.

This leaves the findings open to a long list of omitted “third factor” explanations, which is particularly problematic in this paper. Likewise, when states are used as the units of analysis, the variance in both test scores and spending is collapsed.

Elementary statistics shows that averaging away this individual variation jacks up the correlation and exaggerates the resulting findings. This is how you generate trillions and trillions of theoretical dollars.

This leads to the question of whether you can claim causality when third or lurking factors are not examined and when the ecological correlation fallacy is in play. The obvious answer, of course, is that this causal claim cannot be supported.

Mystery Methods – The methods used to support their argument remain a mystery. While the central rationale hinges on the “strong correlation” between test scores and economic growth, the reader will search in vain for the simple yet key statistic: the correlation between wealth and math scores. In reviewing earlier work, David Berliner noted this same omission in the 2014 version, but it has not been fixed.

The authors do say test scores account for 20-35% of the growth, but again, this is not explained, nor are other factors (except migration) considered. The reader is simply told to behold the “strong relationship” as illustrated by a scatterplot (Table 4), which is about as symmetrical as a shotgun blast.

It must be noted that earlier work by these authors (2012, for example) was often characterized by far greater methodological detail and lengthier discussion of omitted variables, poor measures, units of analysis, etc. The latest version provides none of this detail, and the differences in findings between the reports are simply unexplained.

The results are extrapolated over a lifetime of earnings (defined as up to age 80). Considering the volatility of the economy over long periods of time, and the notoriously weak track record of economic predictions, the arithmetic that gets us to $76 trillion puts a lot of weight on those middle school math scores. One gets the feeling of listening to a stockbroker speculating on pork belly futures rather than to policy analysts.

******

Ultimately, the reader is left to puzzle over the purpose of the paper. If it is to encourage investments in education, the authors are confronted by their own contradictions.

If it is to encourage the use of test-based reforms, they would have to overlook three decades of test-based reform that have produced no convincing narrowing of the achievement gap.

If they wish to demonstrate that money matters, they inadvertently succeed. Yet they don’t explain how this mechanism would work. When this paper is considered as part of the ten-year series, the reader is left with the impression of statistical exotica run amok, displayed more for the appearance of intellectual elegance, or as a numerological fantasy, than as a disciplined or useful exploration of either education or economics.

Finally, assuming we built the runway and lit the torches, would the $76 trillion of manna descend from the heavens? Not likely. The underlying economic theory is not developed, complete or costed. The curricular improvements are not defined, costed or planned. The fiscal gains are speculative.

For a nation that has yet to restore education expenditures to pre-2008 levels, the promise is a chimera. There is little in the current political landscape that points to sufficient investments. A more realistic scenario is that ESSA will bring greater disparities between low- and high-spending states. The probability that this proposal’s strengthened investment in middle school math will be enacted is further diminished by the primary author’s tour of the nation’s courthouses.