Friday, March 07, 2014

Ravitch's math

In a recent post, Diane Ravitch decries the fact that Chicago charters expel a higher percentage of kids than do the other public schools:
The data reveal that during the last school year, 307 students were kicked out of charter schools, which have a total enrollment of about 50,000. In district-run schools, there were 182 kids expelled out of a student body of more than 353,000. That means charters expelled 61 of every 10,000 students while the district-run schools expelled just 5 of every 10,000 students.
She credits this pattern of expulsions with helping the charter schools have higher test scores:
It makes perfect sense. If a school can kick out the kids with low scores, the school will have higher scores and the public school that gets the low-scoring kids will have lower scores. How simple!
But do the math. Suppose you give all the Chicago kids a test on which traditional public school students score an average of 70 and charter school kids score an average of 75. Now take 307 charter kids who score 50 (well below either average) and move them to the traditional public schools instead. What happens to the averages? The public schools will now have an average of 69.98 instead of 70, and the charter schools will now have an average of 75.154 instead of 75.

This is a somewhat artificial example, of course, but the point remains that even if every single kid expelled from charter schools had substantially lower test scores than everyone else, charter expulsions probably don't explain very much about the overall test score patterns.
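For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The enrollment and expulsion counts come from the quoted passage (treating "more than 353,000" as exactly 353,000, which is an assumption); the test scores of 70, 75, and 50 are the made-up numbers from the example, not real data.

```python
# Enrollment and expulsion counts from the quoted article.
charter_n = 50_000
district_n = 353_000      # "more than 353,000", rounded down (assumption)
charter_expelled = 307
district_expelled = 182

# Expulsions per 10,000 students.
charter_rate = charter_expelled / charter_n * 10_000
district_rate = district_expelled / district_n * 10_000

# Hypothetical test scores used in the example.
charter_mean = 75.0
district_mean = 70.0
moved_mean = 50.0         # score of every expelled charter student

# Remove the 307 low scorers from the charter pool...
new_charter_mean = (
    charter_n * charter_mean - charter_expelled * moved_mean
) / (charter_n - charter_expelled)

# ...and add them to the district pool.
new_district_mean = (
    district_n * district_mean + charter_expelled * moved_mean
) / (district_n + charter_expelled)

print(round(charter_rate), round(district_rate))   # 61 5
print(round(new_charter_mean, 3))                  # 75.154
print(round(new_district_mean, 2))                 # 69.98
```

Even with every expelled student scoring 25 points below the charter average, the two averages move by about 0.15 and 0.02 points respectively.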

Meat = Smoking?

Another day, another over-hyped scaremongering study in the news. The latest is this:
Eating a diet heavy on meat and cheese may be as harmful to you as smoking a cigarette, researchers claim. A new study, published in Cell Metabolism on March 4, shows that middle-aged people who eat a diet high in animal proteins from milk, meat and cheese are more likely to die of cancer than someone who eats a low-protein diet. The research also showed the people who ate lots of meat and dairy were more likely to die at an earlier age.
The actual study is here. Among the most obvious problems:

1. No good information on what people actually ate. 

The 6,381 people in the study were given a survey ONCE asking them about what they ate in the previous 24 hours. The authors then matched these people up with a database of death records 18 years later.

So you have to assume that:

 (a) these people told the truth about what they ate in that 24-hour period, even though they claimed to have eaten an average of only 1,823 calories per day (which is completely implausible), and,

 (b) whatever they ate in that 24-hour period is the same thing that they ate for the next 18 years.

The authors themselves say in the “limitations” section: “First, the use of a single 24 hr dietary recall followed by up to 18 years of mortality assessment has the potential of misclassifying dietary practice.” No kidding.

2. The authors seem to have been cherry-picking amongst ways of subdividing the people in the study.

The overall conclusion, for all ages, is that “high and moderate protein consumption” were “not associated with all-cause, CVD, or cancer mortality when subjects at all the ages above 50 were considered.” The authors then subdivide people into ages 50-65 at baseline versus ages 66 and up at baseline.

When you split the data that way (and why did they pick 65 as the dividing line? who knows?), it turns out that high protein consumption appears to be correlated with more cancer deaths for the younger group but REDUCED cancer deaths for the 66+ group. (Obviously this has to be the case – if there is no overall increased risk of cancer, then if you cleverly chop the sample such that one group has an increased risk, the other group has to have a decreased risk.) It is not clear why meat protein would “cause” a 4-fold increase in cancer at one age but a reduction in cancer at another age.
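To see why a freely chosen dividing line is worrying, here is a rough simulation sketch. It is not the study's data or the authors' method: apart from the sample size of 6,381, everything in it (the age range, the 50/50 protein split, the 10% death rate) is a made-up assumption. Protein has no effect on death by construction, yet the estimated risk ratio for the "younger" group still shifts depending on where the age cutoff is drawn.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N = 6381            # sample size reported for the study
DEATH_RATE = 0.10   # hypothetical, identical for everyone: no protein effect

# Each person: (age at baseline, high_protein flag, died flag), all independent.
people = [
    (random.uniform(50, 85), random.random() < 0.5, random.random() < DEATH_RATE)
    for _ in range(N)
]

def risk(group):
    """Share of the group that died."""
    return sum(died for _, _, died in group) / len(group)

def risk_ratio(group):
    """Risk among high-protein eaters divided by risk among the rest."""
    high = [p for p in group if p[1]]
    low = [p for p in group if not p[1]]
    return risk(high) / risk(low)

overall_rr = risk_ratio(people)

# Try every whole-year cutoff from 55 to 80 and look at the "younger" group.
ratios = {}
for cutoff in range(55, 81):
    younger = [p for p in people if p[0] < cutoff]
    ratios[cutoff] = risk_ratio(younger)

most_extreme = max(ratios, key=lambda c: abs(ratios[c] - 1.0))
print("overall risk ratio:", round(overall_rr, 2))
print("range across cutoffs:", round(min(ratios.values()), 2),
      "to", round(max(ratios.values()), 2))
print("most extreme cutoff:", most_extreme)
```

An analyst who is free to report whichever cutoff looks most striking will tend to overstate the effect, which is why a dividing line should be justified in advance.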

3. It’s just plain wrong to divide continuous variables up into simplistic categories such as high vs. low protein or over vs. under 65 years old. 

Age is a fairly continuous variable. So is protein consumption. Even if you have good information on protein consumption (which we don’t, see point 1), the correct way to model the risk of death at various ages versus protein consumption would be to use all the information at hand – people’s exact ages (not just whether they are over or under 65), and people’s exact protein consumption (not just whether they ate less than 10%, between 10% and 19%, or 20%+ of calories from protein).

Dividing these variables into large buckets like the authors did can cause completely spurious relationships to arise. There are innumerable articles warning NOT to do this. See here or here or here or here or here or here.
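Here is a small simulated illustration of one way this happens. All the numbers are invented: "protein" and "risk" are hypothetical scores that both drift upward with age, and protein has no effect on risk at all. Controlling for age only as over vs. under 65 leaves a clear protein-risk correlation inside each bucket, while controlling for exact age (via a partial correlation) makes it disappear.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

N = 20_000
age = [random.uniform(50, 80) for _ in range(N)]
# Hypothetical scores: each depends on age plus noise, and on nothing else.
protein = [a + random.gauss(0, 5) for a in age]
risk = [a + random.gauss(0, 5) for a in age]

# "Control" for age with a single over/under-65 split.
under = [i for i in range(N) if age[i] < 65]
over = [i for i in range(N) if age[i] >= 65]
r_under = pearson([protein[i] for i in under], [risk[i] for i in under])
r_over = pearson([protein[i] for i in over], [risk[i] for i in over])

# Control for exact age with a partial correlation.
r_pr = pearson(protein, risk)
r_pa = pearson(protein, age)
r_ra = pearson(risk, age)
partial = (r_pr - r_pa * r_ra) / math.sqrt((1 - r_pa**2) * (1 - r_ra**2))

print("within under-65 bucket:", round(r_under, 2))
print("within 65+ bucket:     ", round(r_over, 2))
print("controlling exact age: ", round(partial, 2))
```

The within-bucket correlations are spurious: age still varies inside each bucket, and that leftover variation drives both scores.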

4. Correlation vs. causation.

Even if we had good information on what people ate over the 18-year period, a correlation between meat and cancer deaths is just correlation unless we know for sure that the people who ate more meat were identical in every other way to people who ate less meat (or at least that the dataset measures everything that is different about them, such that we can control for it). But we don’t know this at all. Indeed, the authors did not control for some factors that might affect mortality, such as exercise levels, geography, or income.

To take one of many possibilities, perhaps poorer people eat more meat (fewer salads and the like) and also have less access to good cancer screening/treatment. This could cause a spurious correlation between meat and cancer to appear in the data (although even then, it’s not clear why the correlation would go in opposite directions depending on whether the person was over 65).
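The income scenario can be made concrete with a toy table. The counts below are entirely hypothetical, chosen so that within each income group the cancer death rate is identical for heavy and light meat eaters. Lumping the income groups together still makes meat look dangerous.

```python
# Hypothetical cohort: (income, diet) -> (people, cancer deaths).
# Within each income group both diets have the same death rate,
# so meat has no effect by construction.
cohort = {
    ("low income", "high meat"): (700, 70),
    ("low income", "low meat"): (300, 30),
    ("high income", "high meat"): (300, 15),
    ("high income", "low meat"): (700, 35),
}

def risk(diet, income=None):
    """Death rate for a diet, optionally restricted to one income group."""
    people = deaths = 0
    for (inc, d), (n, k) in cohort.items():
        if d == diet and income in (None, inc):
            people += n
            deaths += k
    return deaths / people

# Ignoring income: compare all heavy meat eaters with all light meat eaters.
crude_rr = risk("high meat") / risk("low meat")

# Holding income fixed: compare diets within each income group.
stratified_rr = {
    inc: risk("high meat", inc) / risk("low meat", inc)
    for inc in ("low income", "high income")
}

print(round(crude_rr, 2))   # 1.31
print(stratified_rr)        # both ratios are 1.0
```

The crude comparison shows about 31% more cancer deaths among heavy meat eaters, even though diet changes nothing once income is held fixed. A dataset that never measured income could not tell these two stories apart.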