Scott Alexander on Wednesday:
I’ve spent fifteen years not responding to [Hanson’s medicine] argument, because I worry it would be harsh and annoying to use my platform to beat up on one contrarian who nobody else listens to. But I recently learned Bryan Caplan also takes this seriously. Beating up on two contrarians who nobody else listens to is a great use of a platform!
What claim of mine does he dispute?
Robin Hanson of Overcoming Bias more or less believes medicine doesn’t work. This is a strong claim. It would be easy to round Hanson’s position off to something weaker, like “extra health care isn’t valuable on the margin”. This is how most people interpret the studies he cites. Still, I think his current, actual position is that medicine doesn’t work.
Scott then quotes 500 words from a 2022 post of mine, none of which have me saying all medicine is useless on all margins. Even so, he repeats this claim several times in his post. Now I agree that the analogies I chose in the two places he quotes me, to casinos and to old medicine, were not to cases where there’s clearly much difference between marginal and average effects. But analogies don’t have to be the same in all features to be useful.
Those analogy choices may have misled Scott. If so, I’m sorry. But I think I’ve been clear elsewhere. Such as in our 2016 book The Elephant in the Brain:
Our ancestors had reasons to value medicine apart from its therapeutic benefits. But medicine today is different in one crucial regard: it’s often very effective. Vaccines prevent dozens of deadly diseases. Emergency medicine routinely saves people from situations that would have killed them in the past. Obstetricians and advanced neonatal care save countless infants and mothers from the otherwise dangerous activity of childbirth. The list goes on. …
We will now look to see if people today consume too much medicine. … we’re going to step back and examine the aggregate relationship between medicine and health. … We’re also going to restrict our investigation to marginal medical spending. It’s not a question of whether some medicine is better than no medicine—it almost certainly is—but whether, say, $7,000 per year of medicine is better for our health than $5,000 per year, given the treatment options available to us in developed countries.…
[Re] the medicine consumed in high-spending regions but not consumed in low-spending regions, … the research is fairly consistent in showing that the extra medicine doesn’t help. … Still, these are just correlational studies, leaving open the possibility that some hidden factors are influencing the outcomes. … To really make a strong case, then, we need to turn to the scientific gold standard: the randomized controlled study.
There’s also my 2007 article Cut Medicine in Half where I say:
In the aggregate, variations in medical spending usually show no statistically significant medical effect on health. … the tiny effect of medicine found in large studies is in striking contrast to the large apparent effects we find even in small studies of other influences.
Obviously, if I thought medicine was useless at all margins, I’d have said to cut it all, not just cut it in half.
Scott’s refutation of me focuses on (A) specific evidence suggesting specific treatments help, and (B) aggregate randomized experiments.
On (A) specific treatments, he considers cancer and heart attacks.
We can more clearly distinguish the effects of medicine by looking at … for example, what percent of cancer patients die in five years? … People with cancer are more likely to survive than fifty years ago. … Some of these changes (especially prostate) are a result of earlier diagnosis. … here’s a graph showing similar survival improvements among childhood cancers in particular, where we wouldn’t expect this to be a problem
Odds of death within 30 days of a heart attack have fallen from 20% in 1995 to 12.4% in 2015 … can we dismiss this because maybe heart attack victims are younger? The study this particular graph comes from says their patients were on average 2.7 years older at the end than the beginning.
First, my claim of a near-zero average health gain from marginal medicine is consistent with some particular kinds of medicine having positive marginal gains. We name some plausible candidates in our book. Cancer and heart attack treatments could also be among them. Or maybe just childhood cancer.
Second, to be relevant to my claim these treatments need to be of the sort that many people get but many others do not. I’m willing to presume that cancer and heart attack treatment fall into this category, but Scott doesn’t show this.
Third, Scott is well aware that many others attribute much of these changes to the population getting generally healthier over time, and thus better able at each age to deal with all disease, and also to earlier screening, which catches more cases that would never get very bad. He judges:
Although some of this is confounded by improved screening, this is unlikely to explain more than about 20-50% of the effect. The remainder is probably a real improvement in treatment.
But he seems well aware that many other specialists judge differently here.
On (B) randomized experiments, Scott considers the four that I had mentioned:
In a late 1970s RAND Health Insurance Experiment on 7700 people over 3-5 years, those who randomly consumed 30-40% more medicine were no healthier.
In the 2008 Oregon Health Insurance Experiment, 8700 poor folk out of 35,200 eligible were randomly given Medicaid for two years. This raised self-reported health, but mostly before they got any treatment. It also cut medical debt.
In a 2021 Karnataka Hospital Insurance Experiment, half of 52,300 people were randomly given free (or cheap access) hospital insurance for 3.5 years, which increased their hospital insurance take-up from 60% to 79%. Again, no significant health difference.
In a 2019 US Taxpayer Experiment, 0.6 million of 4.5 million taxpayers were randomly not sent a letter saying they faced a tax penalty for lacking insurance. Over the next two years, those sent a letter got 0.23 more months of insurance. The headline result is that the 45-64 year olds among them also had 0.06% lower mortality, at 1% significance.
Scott sees the first three as too underpowered to find interesting results. He found the results of RAND “moderately surprising”, but thinks “it’s a stretch to attribute [p = 0.03 blood pressure result] to random noise”, even though it’s the only result out of 30 at p&lt;0.05.
Scott calls Karnataka a “study where the intervention didn’t affect the amount of medical care people got very much”, since “they were unable to find a direct effect of giving people free insurance on those people using insurance, at all, in the 3.5 year study period!” But I see the study as reporting big utilization effects:
The average annual insurance utilization rate at 18 months (3.5 years) is 13.46% (2.56%) in the free-insurance arms versus 7.72% (0.64%) in the control arm. On average this effect amounts to a 74.35% (400%) increase in insurance utilization at 18 months (3.5 years).
And this seems to me a non-trivial constraint on medical effectiveness:
We cannot rule out clinically-significant health effects, on average equal to 11% (8.8%) of the standard deviation for each health outcome in ITT (CATE) analyses.
(They can rule out larger effects.)
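The quoted utilization numbers are easy to check directly. A minimal sketch of that arithmetic (my own check, not the paper's code; note that the quoted "400%" appears to be the ratio of levels, i.e. a 4x level, rather than a percentage increase):

```python
# Annual insurance utilization rates (%) from the Karnataka study, as quoted above.
free_18m, control_18m = 13.46, 7.72   # at 18 months
free_42m, control_42m = 2.56, 0.64    # at 3.5 years

# Percentage increase at 18 months: matches the quoted 74.35%.
increase_18m = 100 * (free_18m - control_18m) / control_18m
print(round(increase_18m, 2))  # 74.35

# At 3.5 years the free arms show 4x the control level, which seems
# to be what the paper reports as a "400%" increase.
ratio_42m = free_42m / control_42m
print(ratio_42m)  # 4.0
```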
Regarding US Taxpayer, until now I’ve written only one short paragraph on it, which ended by calling 1% significance “marginal. So there’s a decent chance this study is just noise.” Scott was irate at that:
Come on! Thousands of clinical RCTs show that medicine has an effect. Robin wants to ignore these in favor of insurance experiments that are underpowered to find effects even when they’re there. Then when someone finally does an insurance experiment big and powerful enough to find effects, and it finds the same thing as all the thousands of clinical RCTs, p = 0.01, Robin says maybe we should dismiss it, because p = 0.01 findings are sometimes just “noise”. Aaargh! Here are some other quasi-experimental studies …
As Scott knows, we have a huge problem of selective publication and specification search (“p-hacking”), especially in medicine. That is why I’m suspicious of the few “quasi-experimental studies” that find big health gains from medicine. I also know that typical regressions of health on medicine find no effect, and that medical errors and prescription drugs cause huge numbers of deaths. Thus I focus on our few best studies: randomized experiments.
Over the last day, I’ve gone over that US Taxpayer experiment more carefully. The authors claim that their headline result of 0.06% lower mortality among 45-64 year olds, at 1% significance, is robust to the choice of that age range:
Online Appendix Table A.XV shows that the presence (but not the magnitude) of the mortality effect is reasonably robust to adopting alternative age cutoffs for defining the sample.
But that table seems to me to show no such thing. Compared to the 0.010 significance level for their preferred age range, the five other age ranges tested there have significance levels of 0.044, 0.052, 0.069, 0.252, and 0.319. Clearly I was right to suspect their 1% result to be marginal.
Projecting this 0.06% lower mortality, given 0.23 more months of insurance, to all 24 months, we’d get a 30% mortality cut for the fully insured compared to the fully uninsured in this age range, with other ages getting no cut. That seems a high estimate to me. But while that is an ordinary least squares (OLS) estimate, the paper prefers a 7x larger instrumental variables (IV) estimate. IV estimates are often problematic in economics, due to specification searches and false specification assumptions. Their IV estimate is:
each month of [insurance] coverage induced by the intervention during the [2yr] outcome period reduced mortality by approximately 10.1%.
Which is crazy huge! As (1-10.1%)^24 = 1/12.9, the same size effect for each of the 24 months in this period implies 13x lower mortality for the fully insured, over fully uninsured. Which is surely an effect we’d have noticed in health on medicine regressions. They agree this is implausibly large, and are embarrassed that their 95% confidence OLS and IV intervals don’t even overlap. But they note that the low end of their 95% confidence range is only 2.2%, which only implies the fully insured have 41% lower mortality than the fully uninsured, for this age range.
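The compounding arithmetic here is easy to verify. A minimal sketch of these calculations, assuming the ~30% figure above comes from compounding a per-month OLS effect of roughly one seventh of the 10.1% IV estimate:

```python
def mortality_cut(per_month, months=24):
    """Relative mortality cut from compounding a per-month relative reduction."""
    return 1 - (1 - per_month) ** months

iv = 0.101        # IV estimate: 10.1% lower mortality per month of coverage
ols = iv / 7      # OLS estimate is ~7x smaller, per the paper
lo95 = 0.022      # low end of the IV 95% confidence range

# Fully insured vs fully uninsured over the 24-month outcome period:
print(round(1 / (1 - mortality_cut(iv)), 1))  # 12.9  (the "13x lower mortality")
print(round(mortality_cut(ols), 2))           # 0.29  (roughly the ~30% cut)
print(round(mortality_cut(lo95), 2))          # 0.41  (the 41% lower mortality)
```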
Now while their OLS estimate of the effect of treatment on mortality is only significant at the 1% level (and even that is exaggerated by selection bias), their OLS estimate of the effect of more insurance on mortality looks much stronger. At least if we could believe their Table IV, which gives an estimate there of -0.026 with a standard error of 0.001, for a crazy huge ratio of 26! But as they never even discuss this crazy huge significance in the text, I have to suspect that this is just a table typo.
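To see just how crazy huge a ratio of 26 is, here is a minimal sketch computing the implied z-statistic and a two-sided normal-approximation p-value (my own check, not the paper's):

```python
import math

estimate, se = -0.026, 0.001      # Table IV entry as I read it
z = abs(estimate / se)            # ratio of about 26
p = math.erfc(z / math.sqrt(2))   # two-sided p-value, normal approximation

print(round(z))     # 26
print(p < 1e-100)   # True: far beyond any plausible social-science result
```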
Btw, the Karnataka paper mentions other randomized trials that I haven’t looked at, but hope to soon:
This specific result is consistent with results from all other RCTs of health insurance conducted in lower-income countries (Haushofer et al., 2020; King et al., 2009; Levine, Polimeni and Ramage, 2016; Thornton et al., 2010).
To summarize, contra Scott, I only claim that the aggregate health effect of marginal medicine is typically small, and of low economic value relative to costs, but not zero. While many studies claim to show otherwise for specific treatments, those tend to be quite biased, pushing me to focus on our least bias-able studies: randomized trials of aggregate medicine. I say that they still consistently fail to find clear effects.
Yes, this is in part because these studies aren’t larger. But the first experiment, RAND, was funded in part because many expected it to show that more medicine caused much more health, which would help them argue for giving everyone free medicine. That is why the main book on it was called Free for All?. Not seeing results was in fact surprising to its sponsors, and these null results continue to be surprising to most everyone who learns of them.
Added 3p: OK, I looked at those 4 further randomized experiments.
In Kenya, 789 workers were split into 3 groups: given free insurance, paid cash equal to the premium, or given no help. The free-insurance folks had lower cortisol and stress levels.
In Nicaragua, 2608 people were randomly assigned different health insurance prices; 20% took insurance, and a year later no change in medical utilization was seen.
In Cambodia, 5000 households were randomly assigned different health insurance prices. Those given lower prices were less likely to take on new debt due to a bad health shock. No change was seen in how much medical care people sought.
In Mexico, 534,500 people living in 148 geographic areas were randomly assigned in matched pairs: one area of each pair was given more resources to upgrade facilities, and its people were encouraged to enroll in insurance. After ten months, there were barely-significant (5%) effects on catastrophic spending, both overall and for the poor, and on health spending of the poor. There were no effects on overall health spending, health outcomes, or medical utilization.
So, since they saw no effects on medical utilization, these studies can’t speak to the effect of medicine on health.
Added 7a: Scott says he’ll reply:
6: Robin Hanson wrote a response to my piece arguing against his healthcare views. I’ll probably have a response up sometime in the next week or two if I don’t get distracted.
Added 1June: On May 10, Scott replied:
I basically agree with this, and apologize to Robin for being suspicious of his position. I think this is a pretty reasonable position, not too far away from mine (although I still disagree on the insurance studies).