Saturday, January 28, 2012

Don't Believe the "Defenders" of Teachers: Teachers Do Matter

You often see education commentators trying to suggest that bad school performance is almost entirely the fault of poverty and other external factors, not the fault of poor teaching. In making this claim, commentators often point to the level of variation in student test scores that is allegedly "explained" by teachers. For example, Anthony Cody says, "Even Eric Hanushek, the economist who has done more to advance these evaluation systems than anyone, admits that teachers only account for around ten percent of the variability in student test scores."

Family and income are surely important, but the "10% of variance" argument is wrong for at least two reasons:

First, in statistical terms, saying that teachers account for 10% of the variance in student test scores does NOT mean that teachers are unimportant. Wrong, wrong, wrong. (At the end of the blog post, I say more about what explaining variance means.)

The eminent Harvard professors Rosenthal and Rubin explained this in a 1982 article, "A Simple, General Purpose Display of Magnitude of Experimental Effect," Journal of Educational Psychology 74 no. 2: 166-69 (that article isn't available online, but is described here).

As luck would have it, Rosenthal and Rubin address the precise example of a case wherein 10% of the variance was explained:
We found experienced behavioral researchers and experienced statisticians quite surprised when we showed them that the Pearson r of .32 associated with a coefficient of determination (r²) of only .10 was the correlational equivalent of increasing a success rate from 34% to 66% by means of an experimental treatment procedure; for example, these values could mean that a death rate under the control condition is 66% but is only 34% under the experimental condition. We believe . . . that there may be a widespread tendency to underestimate the importance of the effects of behavioral (and biomedical) interventions . . . simply because they are often associated with what are thought to be low values of r².

By analogy, saying that teacher quality explains 10% of the variance would be equivalent to saying that teachers can raise the passing rate from 34% to 66%. That's nothing to sneeze at, and it certainly isn't a reason for teachers to throw up their hands in dismay at the hopelessness of their task.
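
The arithmetic behind that display is short enough to check. Here is a minimal sketch in Python, assuming the binomial effect size display's convention that the two success rates sit at 0.50 minus and plus half the correlation (the function name `besd_rates` is my own label, not Rosenthal and Rubin's):

```python
import math

def besd_rates(r_squared):
    """Return (control_rate, treatment_rate) for a binomial effect size display.

    Assumes the BESD convention: the rates are 0.50 - r/2 and 0.50 + r/2,
    where r is the Pearson correlation (square root of variance explained).
    """
    r = math.sqrt(r_squared)
    return 0.5 - r / 2, 0.5 + r / 2

control, treatment = besd_rates(0.10)
print(f"r              = {math.sqrt(0.10):.2f}")  # 0.32
print(f"control rate   = {control:.0%}")          # 34%
print(f"treatment rate = {treatment:.0%}")        # 66%
```

So "only" 10% of variance corresponds to a 32-point gap between the two rates.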

Second, the fact that teachers account for 10% of variance NOW, given a particular set of data points, tells us little or nothing about the true causal importance of teachers. 10% isn't a Platonic ceiling on what teachers can accomplish, and the proportion of variance explained tells us very little about how much impact teachers really do have.

A simple hypothetical example makes this clear: Imagine that all teachers in a school were of equal quality. Given equal teachers, any variation in student test scores would automatically have to arise from something other than differing quality of teaching. So a regression equation in that context might tell us that demographics explain a huge amount of the variation in test scores, while teaching quality explains nothing. But it would be completely wrong to conclude that demographics are inherently more important than teaching quality, or even that teaching quality doesn't matter. The exact opposite might be the case, for all that such a regression could tell us.

Moreover, if all teachers became twice as effective as they are now, there would still be variance among teachers and variance among student test scores, and teachers collectively might still "account" for a "small" amount of variance, but student performance might be much higher.  The fact that teachers account for 10% of variance today (as large as that actually is) simply does not give us any sort of limit on how much student achievement could rise if the mean teacher effectiveness shifted sharply to the right.
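
Both hypotheticals can be checked with toy numbers. Here is a minimal sketch in Python, assuming made-up teacher effects and made-up outside factors that are paired in every combination so that the two are uncorrelated (all of the values and the `teacher_share` helper are hypothetical, picked so that the baseline share comes out to 10%):

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical stand-in for everything that is not the teacher.
other_factors = [-3, -3, 0, 0, 3, 3]

def teacher_share(teacher_effects):
    """Return (share of score variance tied to teachers, mean score).

    Every teacher effect is paired with every outside factor, so the two
    are uncorrelated and their variances simply add up.
    """
    scores = [t + o for t, o in product(teacher_effects, other_factors)]
    share = pvariance(teacher_effects) / pvariance(scores)
    return share, mean(scores)

today = teacher_share([-1, 0, 1])      # teachers differ somewhat
identical = teacher_share([0, 0, 0])   # all teachers of equal quality
improved = teacher_share([4, 5, 6])    # every teacher 5 points better

for label, (share, avg) in [("today", today),
                            ("identical", identical),
                            ("improved", improved)]:
    print(f"{label:>9}: teachers 'explain' {share:.0%} of variance, "
          f"mean score {avg:+.1f}")
```

With these numbers, teachers "explain" 10% of the variance today, 0% when they are all identical, and 10% again after every teacher improves by 5 points, even though the average score in that last case is 5 points higher.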

So the would-be defenders of teachers can breathe a sigh of relief: value-added modeling might still be a shaky idea for several other reasons, but there's no need to denigrate the potential of teachers.


* * *

A more detailed statistical explanation:

What the proportion of variance explained means is this: if you take the Pearson product-moment correlation and square it, you end up (after some algebra) with the following:

$$ r^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2} $$

 What does this mean? The denominator is calculated by taking all the individual Y's (in the education context, all of the student test scores that you're trying to explain), subtracting the average Y value, squaring all of the differences, and adding up all of the squared values. In the context of the following graph, the denominator gives us a measure of the total squared distance (in the vertical direction) that all of the red dots deviate from the average Y value.

 

The numerator tells us how far the regression line deviates from the average Y value.  The regression line predicts that the Y values will be along the line itself, which obviously isn't exactly true. So the predicted Y values (that's what the little ^ sign over the Y means) have the average Y value subtracted, the difference is squared, and then all the squared differences are added up.

All in all, the "proportion of variance explained" figure is just a way to represent how close a regression line based on X will come to the actual red dots in the graph, compared to how close a line based on just the average red dot will come.
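
The ratio described above can be computed directly. Here is a minimal sketch in Python, using a handful of made-up (X, Y) points (the data are hypothetical) and an ordinary least-squares line, to show that the numerator-over-denominator ratio matches the squared Pearson correlation:

```python
# Hypothetical data: X is some predictor, Y the test scores to explain.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Building blocks for the least-squares line and the correlation.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Ordinary least-squares line: y_hat = intercept + slope * x
slope = sxy / sxx
intercept = y_bar - slope * x_bar
y_hats = [intercept + slope * x for x in xs]

# Numerator: squared distance of the predicted Y's from the average Y.
explained = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)
# Denominator: squared distance of the actual Y's from the average Y.
total = syy

r = sxy / (sxx * syy) ** 0.5
print(f"explained / total = {explained / total:.4f}")
print(f"r squared         = {r ** 2:.4f}")
```

The two printed values agree, which is all that "proportion of variance explained" amounts to.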

For the same reason that correlation is not causation, accounting for variance does not provide an upper limit for the true causal importance of a variable. As noted above, the level of variance "explained" is a bad way to determine how important X actually is. See D'Andrade and Hart, for example.

Wednesday, January 25, 2012

Jeff Buckley's "Corpus Christi Carol"

Just lovely:

Also this:

Tuesday, January 17, 2012

The New Groupthink

From a New York Times article titled "The New Groupthink":

Our schools have also been transformed by the New Groupthink. Today, elementary school classrooms are commonly arranged in pods of desks, the better to foster group learning. Even subjects like math and creative writing are often taught as committee projects. In one fourth-grade classroom I visited in New York City, students engaged in group work were forbidden to ask a question unless every member of the group had the very same question.
I'm not sure why group seating in classrooms seems to have caught on so strongly. As a parent, I know that children are better behaved (if only by necessity) when they're not sitting close enough to bother someone else, mark on the other child's paper, etc.  

Saturday, January 07, 2012

A Case Study in Bias

Two studies came out comparing the performance of schools or teachers. In the first case, Raj Chetty, John Friedman, and Jonah Rockoff came up with just about the most extensive and sophisticated study of teachers' value-added that I've ever seen. As highlighted in the New York Times, the study includes estimates for how much high-quality teachers improve their students' income years later, and also (see pp. 29 ff.) includes a new way to check for bias by looking at how cohorts of students change performance when a high or low value-added teacher arrives from somewhere else. Very cool.

But such a study, implying that some teachers are better than others, and that teacher quality can be revealed by how well their students do on tests (conditioning on prior achievement and student demographics), is disturbing to some people. Diane Ravitch tweeted at least 67 times the day the study came out, trying to undermine the study by questioning its lack of peer review (so far), the way in which it was conducted, and the very project of looking at test scores in the first place.

In the second case, there's a group called Educate Now in Louisiana that released a PDF chart (available here) that merely lists the schools in New Orleans, identified by whether they are Recovery School District schools or voucher-accepting private schools, and then lists what percentage of students score above basic on English and Math in grades 3-5. That's all. No attempt to control for the individual students' prior achievement, no attempt to control for any student demographic variables such as poverty, no attempt to control for the fact that students are eligible for vouchers only if they had been attending a failing public school, no statistical analysis whatsoever.

This is as primitive as it gets, and is a horrible way to judge the merit of voucher schools (as I explained here).

 Did Diane Ravitch tweet 67 times criticizing this purported attempt to compare voucher schools to public schools? No: right in the midst of her incessant criticism of an immeasurably superior study, she sent out one tweet that said, "How did voucher schools in New Orleans do?" followed by a link.

Ravitch here displays the worst sort of intellectual bias: when what looks like one of the best studies out there doesn't fit her ideology, she acts as if it is far more questionable than the baloney that she otherwise is happy to plug. To be sure, it's OK to ask questions about the new value-added study, what it means, how it was done, and whether it was oversold in the media. But it's not OK to pass along a worthless analysis of the merits of vouchers.

Anti-reformers need to think a bit more carefully about whether they want someone as their standard-bearer who doesn't know the difference between good and bad research (or, worse, who doesn't care).

Sunday, December 18, 2011

"Starch Consumption Raises Risk Of Breast Cancer Coming Back"

So read the headline on a news story recently. A new study found as follows:
Breast cancer survivors whose starch intake is above average have a greater risk of cancer recurrence compared to other breast cancer survivors, researchers from the University of California, San Diego explained at the 2011 CTRC-AACR San Antonio Breast Cancer Symposium, Dec. 6-10, 2011. The researchers added that it is in particular starch that raises the risk, and not just overall carbohydrates.
Plausible. But it turned out that all the researchers did was interview the women once a year about their diet, and women whose breast cancer came back were eating 2.3 GRAMS of carbohydrates more than the average per day, only half of which was starch. As far as I can tell, that's about as much as is in 1 tablespoon of cooked rice. So the headline was based on women claiming to have eaten an additional 1 tablespoon of cooked rice (or some equivalent) per day.

This does not strike me as a useful finding. There is no way that a once-a-year interview can pinpoint women’s carbohydrate consumption down to the tablespoon, and such a minuscule amount of starch surely can't be making a clinical difference anyway.

Thursday, November 24, 2011

Charter Schools and Segregation

Charter schools are often accused of "segregation" merely for serving too many black kids. One recent example of this criticism comes from Zoe Burkholder of Montclair State University in New Jersey, who has an article in Teachers College Record lamenting the fact that DC Prep Charter School is 98% black.

She does concede that black parents have a good reason to choose DC Prep: "parents in D.C. can choose between a traditional public school racked with violence and high dropout rates, or a charter school that is safe and promises to teach at least two of the '3 Rs.'" She even admits that "maybe anyone would prefer a charter school like DC Prep under these conditions."

But she immediately backs away from agreeing that black parents ought to have the option of choosing such a school:
But that doesn’t make it okay, and here is why. When you step back from DC Prep, and successful charter schools like it, what you see is a public school that is racially and socio-economically segregated and inherently very different from the form and function of the majority of public schools in America. . . . Since Horace Mann first rode horseback through New England to sell the idea of tax-supported “common schools” for all children, Americans have dared to dream that public education will instill in our citizenry the many capacities necessary for self-government: critical thinking, civic engagement, tolerance for diversity, an appreciation for the arts and sciences, a knowledge of global affairs, a critical understanding of American history, and the capacity for civil debate.
I've said this about Diane Ravitch before: If you're going to oppose the so-called "segregation" of charter schools, even though it arises from the completely voluntary choices of black parents, you should think twice before waxing so eloquent about Horace Mann's day, when it was often illegal for black people to attend school anywhere. Nor is it historically correct that "Americans" wanted "tolerance for diversity" in public schools during the 100+ years of officially mandated segregation.

In any event, Burkholder makes the same mistake that the highly publicized Civil Rights Project (headed by Gary Orfield) made: she compares DC Prep Charter School to "the majority of public schools in America."

That comparison is completely meaningless. We know that charter schools are much more likely to be located in inner-city neighborhoods where the demographics are much different from the national average. Indeed, if an inner-city DC or Atlanta charter school had demographics that resembled the broader United States, that school would instantly be accused of promoting segregation by gathering too many white students in one place.

What Burkholder should have done is compare DC Prep to nearby traditional public schools. On that ground, it turns out that a 98% black charter school in a heavily black area of northeastern D.C. isn't that unusual. The closest traditional public school to DC Prep is Noyes Elementary, which is 96% black, 3% Hispanic, and all of zero percent white.

Yes, racial imbalance still exists. But attacking charter schools does nothing to get rid of it.

Sunday, November 20, 2011

Momentum in Sports

The Freakonomics blog makes a point that I think is wrong:
The best place to start is with a famous (for academia) paper from several years ago, called “The Hot Hand in Basketball: On the Misperception of Random Sequences.” As you can glean from that snazzy subtitle, the authors come down against momentum, arguing that a “hot streak” is really just a random sequence that we misperceive to be more meaningful than it is.

Ever try flipping a coin 100 times? You’ll be surprised at how many long, unbroken sequences of heads or tails you get. It’s easy to mistake that for a pattern, suggesting some kind of meaning or momentum, but it’s really just a pure illustration of randomness itself. The fact is that if you get 10 heads in a row, the next flip is no more likely to be heads (or tails, for that matter).

And so it is, for the most part, with hot hands and hot streaks and hot quarterbacks. In our Momentum video, you’ll hear Toby Moscowitz, the academic co-author of Scorecasting, discuss how pretty much everyone in football believes in momentum. But, having looked at a lot of NFL data, Moscowitz reaches a sobering conclusion: “There is a much stronger belief in momentum than is warranted by what we see in the data.”
* * *

Consider one example in our video, the Buffalo Bills’ redonkulous 32-point comeback against the Houston Oilers in 1993. As Chris “Mad Dog” Russo puts it: “You’re gonna tell me momentum had nothing to do with that game?!”

Okay, Chris, I’ll take a shot at telling you exactly that. You know why we’re still talking about that game? Because it was a massive anomaly – the kind of comeback that almost never happens. It was so rare that our brains have an easy time recalling it. (We do this with all anomalies – dramatic plane crashes, mass murders, and so on.) And when we recall something so easily, we tend to believe it’s far more common than it actually is.

The truth is that you’re bound to get a wild 32-point-comeback once in a while, just as you’re bound to get a streak of 10 or 12 heads too.
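
The coin-flip part of that passage, at least, is easy to check. Here is a minimal sketch in Python, assuming a fair coin and a helper I've named `longest_run` (the seed and the number of trials are arbitrary choices of mine):

```python
import random

def longest_run(flips):
    """Length of the longest unbroken streak of identical outcomes."""
    best = current = 0
    previous = None
    for flip in flips:
        # Extend the streak if this flip matches the last one, else restart.
        current = current + 1 if flip == previous else 1
        best = max(best, current)
        previous = flip
    return best

rng = random.Random(2011)  # arbitrary seed so the run is repeatable

# Flip a fair coin 100 times, record the longest streak, repeat 10,000 times.
trials = [longest_run([rng.choice("HT") for _ in range(100)])
          for _ in range(10_000)]
average = sum(trials) / len(trials)
print(f"average longest streak in 100 flips: {average:.1f}")
```

The average should land at roughly seven in a row, which is the kind of streak the passage says we mistake for a pattern.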

Here's the thing: a 32-point comeback might indeed be so rare that it sits several standard deviations above the mean of a normal distribution. This does NOT mean, however, that a 32-point comeback was itself a matter of random chance -- the 32-point comeback happened because of how a bunch of human beings performed on a given day, and their performance was not random at all. Their performance was affected non-randomly by their preparation and skills, their coaching, their choices of plays, and their confidence level (the last of which would be dramatically affected if either team started to think that the "momentum" was heading in a particular direction).

Try an analogy: A 7-foot-tall man is a rarity, and if human height falls into a normal distribution, someone might make the following claim, akin to the dismissal of sports momentum: "This 7-foot-tall man's height might seem to have sprung from some genetic factor, but in fact, you find 7-foot-tall men in nature only as often as would be expected by chance. Therefore his height is just a matter of random chance, not genetics."

Well, the fact that this particular guy got the genes to be 7 feet tall might be random chance from the point of view of a statistician looking at all of humanity, but that in no way proves that his height was unrelated to genes. Similarly, the fact that one particular sports team had a huge amount of momentum on a particular day might be described as random chance, but that doesn't disprove the claim that it did have momentum then.

Wednesday, November 09, 2011

Songs I Like

UPDATE: Here's a Spotify list of most of the songs below.

Deas Vail, “Desire.”


The National, “Exile, Vilify.”


Eisley, "One Day I Slowly Floated Away."


Eisley, "Memories."


Deas Vail, "Shoreline."


Jeff Buckley, "Corpus Christi Carol."


Copeland, “Should You Return.”


With Lions, “Our Great Rise.”


The Future of Forestry, “If You Find Her.”


Jonsi, “Tornado.”


Gotye with Kimbra, “Somebody That I Used to Know.”


The Honey Trees, “To Be With You.”


Eric Whitacre, “Lux Aurumque.”


Espen Lind, “Scared of Heights.”


Kimbra, “Settle Down.”


The Reign of Kindo, “The Moments In Between.”


Peter Groenwald, “Wreckage.”


Yael Naim, “New Soul.”


Bat for Lashes, “Sleep Alone.”


Civil Twilight, “Human.”


Digital Daggers, “Surrender.”


Shiny Toy Guns, "You Are the One."


Dubstar, "Stars."


Muse, "Undisclosed Desires."

Erin McCarley, "Pitter-Pat."


Deadmau5 and Kaskade, “I Remember.”


Marc Martel, "Somebody to Love."


Dredg, "Information."


Eden's Edge, "Blue Moon of Kentucky."


Lo-Pro, "Reach."

George Michael, "I Can't Make You Love Me."


And just for fun: George Michael, “1, 2, 3.”

Friday, November 04, 2011

What Causes Student Achievement?

Dana Goldstein addresses that question:
As you can see, by estimating teacher effects at 20 percent, I've interpreted the research consensus quite generously. Matthew DiCarlo, a sociologist with the Shanker Institute, has looked at this same body of research and concluded that another 20 percent of the causes of student achievement gaps are "unobservable" (ex; differences in innate intelligence, statistical error, other mystery causes); and that the rest, about 60 percent, can likely be explained by all the myriad factors associated with socioeconomic status.
The parsing out of what causes student achievement seems very dubious. What if part of the way that socioeconomic status leads to higher achievement is that parents use it to buy houses in school zones with . . . better teachers? Seems very likely, but there's no way to tell with the usual models.

One way we could figure out how to divvy up responsibility would be to get 500 rich kids and randomly assign half to attend a school with teachers identified as horrible (but otherwise keeping everything else about the school the same, such as peers or school spending), and then compare them to the other rich kids who got to attend their regular school. Then you'd really be able to see how much rich kids were benefiting from being able to buy access to good teachers.

But you'd never be able to do such a study -- no one would sign up.


Wednesday, October 19, 2011

Memory and the Law

All of the studies on the fallibility of memory should make us question the weight given to eyewitness testimony.