Epidemiology Doubts

Sunday’s New York Times Magazine:

In January 2001, the British epidemiologists … Davey Smith and … Ebrahim … noted that those few times that a randomized trial had been financed to test a hypothesis supported by results from these large observational studies, the hypothesis either failed the test or, at the very least, the test failed to confirm the hypothesis: antioxidants like vitamins E and C and beta carotene did not prevent heart disease, nor did eating copious fiber protect against colon cancer.   

The Nurses’ Health Study is the most influential of these cohort studies, and in the six years since the Davey Smith and Ebrahim editorial, a series of new trials have chipped away at its credibility. … The implication of this track record seems hard to avoid. "Even the Nurses’ Health Study, one of the biggest and best of these studies, cannot be used to reliably test small-to-moderate risks or benefits," says Charles Hennekens, a principal investigator with the Nurses’ study from 1976 to 2001. "None of them can."  …

But clinical trials also have limitations beyond their exorbitant costs and the years or decades it takes them to provide meaningful results. They can rarely be used, for instance, to study suspected harmful effects. Randomly subjecting thousands of individuals to secondhand tobacco smoke, pollutants or potentially noxious trans fats presents obvious ethical dilemmas … randomized trials "are very good for showing that a drug does what the pharmaceutical company says it does … but not very good for telling you how big the benefit really is and what are the harms in typical people. Because they don’t enroll typical people." …

The effect of healthy-user bias has the potential for "big mischief" throughout these large epidemiologic studies. … At its simplest, the problem is that people who faithfully engage in activities that are good for them – taking a drug as prescribed, for instance, or eating what they believe is a healthy diet – are fundamentally different from those who don’t. … wealth associates with less heart disease and better health, at least in developed countries. …

[There is also] the compliance or adherer effect. Quite simply, people who comply with their doctors’ orders when given a prescription are different and healthier than people who don’t. … the prescriber effect. The reasons a physician will prescribe one medication to one patient and another or none at all to a different patient are complex and subtle. … "A physician is not going to take somebody either dying of metastatic cancer or in a persistent vegetative state or with end-stage neurologic disease and say, `Let’s get that cholesterol down, Mrs. Jones.’ …
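To see how much mischief healthy-user bias alone can cause, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the hidden "health-conscious" trait, the risk levels, and the pill-taking rates come from no study. The pill has no effect at all, yet the observational comparison makes it look protective, while a coin-flip assignment does not.

```python
import random

def simulate_healthy_user_bias(n=200_000, seed=0):
    """Compare observational and randomized estimates for a pill with NO real effect.

    All probabilities below are hypothetical, chosen only to illustrate the bias.
    """
    rng = random.Random(seed)
    obs = {True: [0, 0], False: [0, 0]}  # chose pill? -> [people, disease cases]
    rct = {True: [0, 0], False: [0, 0]}  # assigned pill? -> [people, disease cases]
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # Disease risk depends only on the hidden trait, never on the pill.
        disease = rng.random() < (0.10 if health_conscious else 0.20)
        # Observational world: health-conscious people choose the pill more often.
        chose_pill = rng.random() < (0.8 if health_conscious else 0.2)
        obs[chose_pill][0] += 1
        obs[chose_pill][1] += disease
        # Randomized world: a coin flip decides, independent of the trait.
        assigned_pill = rng.random() < 0.5
        rct[assigned_pill][0] += 1
        rct[assigned_pill][1] += disease

    def risk_difference(groups):
        # Disease rate among pill takers minus disease rate among non-takers.
        return groups[True][1] / groups[True][0] - groups[False][1] / groups[False][0]

    return risk_difference(obs), risk_difference(rct)

if __name__ == "__main__":
    obs_diff, rct_diff = simulate_healthy_user_bias()
    print(f"observational risk difference: {obs_diff:+.3f}")
    print(f"randomized risk difference:    {rct_diff:+.3f}")
```

Under these made-up numbers the observational gap is several percentage points in the pill's favor, and the randomized gap is close to zero.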

We can fall back on several guiding principles, these skeptical epidemiologists say. One is to assume that the first report of an association is incorrect or meaningless, no matter how big that association might be.  … If the association appears consistently in study after study, population after population, but is small – in the range of tens of percent – then doubt it. … If the association involves some aspect of human behavior … then question its validity. … it’s never a bad idea to remain skeptical until somebody spends the time and the money to do a randomized trial and, contrary to much of the history of the endeavor to date, fails to refute it.

For the record, I’m all in favor of randomly subjecting people to harm, as long as they have been paid enough for participation.  We should just be doing lots more randomized trials of important influences.


  • Is the problem as simple as bad maths? I’m currently struggling with Judea Pearl’s book Causality: Models, Reasoning And Inference. The impression I’ve got so far, partly from reading Clark Glymour’s website, is that you can get causation from correlation, but the mathematical techniques in current use are not up to the job; they just don’t do it.

    Why do researchers use invalid techniques and churn out wrong answers? One reason that I’m reading Pearl is that I have a hunch that I want to follow up. It is that the valid techniques have a huge appetite for data. On ordinary sized data sets they mostly show that you cannot actually tell. So the choice between using valid techniques and invalid techniques is dominated by the consideration that the invalid techniques lead to statistically significant results. The answers are gibberish, but at least they are publishable gibberish.

  • “For the record, I’m all in favor of randomly subjecting people to harm, as long as they have been paid enough for participation. We should just be doing lots more randomized trials of important influences.”

    I agree, and interesting post.

  • J Thomas

    Small effects just aren’t worth much in epidemiology. They are in process design. You can run a chemical factory and make small changes each run and see which changes result in small improvements. Then make more small changes. The results are consistent and predictable enough that it works, and the small variations in results don’t matter. But not in epidemiology.

    Big changes matter. Smoking tobacco to excess affects heart disease, and there aren’t a lot of people who can smoke without smoking to excess. It’s possible to get people to give up smoking, sometimes.

    Obesity matters to heart disease. Maybe sometimes people can reduce their obesity.

    Vitamin C might palliate the effect of smoking some, but is it better to get smokers to take enough ascorbic acid or is it better to get them to stop smoking? Does it really matter whether the small effect is really there?

    The big effects are the important ones. If you get a small effect and it’s statistically significant, that’s at least a sort of consolation prize. You can say that there’s something real there, even if it isn’t big enough to act on.

  • Alan,

    You write, “you can get causation from correlation, but the mathematical techniques in current use are not up to the job.”

    With certain types of data collection (for example, randomized experiments), you can infer causation from correlation. But no, you can’t do it from observational data in general.

  • With certain types of data collection (for example, randomized experiments), you can infer causation from correlation. But no, you can’t do it from observational data in general.

    A common misconception, Andrew. There are ways. See Judea Pearl’s Causality.

  • Forrest Bennett

    A very good article overall, but I do have two criticisms.

    1) He implies or states that there are few or no proper randomized, double-blind, placebo-controlled studies in this area from which valid conclusions can be drawn. This is false.

    Just off the top of my head, the most compelling recent population-based, double-blind, randomized placebo-controlled trial showing a statistically significant effect was for using vitamin D to prevent cancer. The effect in this 4 year study was so large and statistically significant that researchers say that _all_ previous studies of cancer will now have to be revised to control for the effect of latitude (because latitude affects vitamin D levels).

    Also, just looking at one hit from a single quick google search brought up randomized, double blind, placebo controlled studies for omega-3 fatty acids and the following health effects: joint function, cognitive/emotional health, respiratory function, and gastrointestinal health. These results are further supported by our detailed mechanistic understanding of the function of specific types of fats in membrane functioning, and as precursors to important hormones in specific synthesis pathways, etc. It can be shown in atomic detail exactly why a membrane doesn’t function as well when constructed with the wrong fats. We also know that certain of these required fats can not be synthesized by the human body, and therefore must be consumed in the diet.

    2) I agree with what he says about HRT, but he fails to mention that HRT was a pretty stupid idea a priori for two reasons. A) They use horse hormones for HRT, which are not even the same molecules as are found in human females. This may or may not be the problem, but we now know that one of these horse estrogens causes DNA single-strand breaks and oxidation of DNA bases in vitro. B) The ratio of the hormones used in HRT doesn’t match the ratios found in human females – so it doesn’t even look like a promising therapy on paper. So none of the HRT studies he cites have any bearing on the therapeutic value of using real human molecules in biologically realistic ratios. I’m not talking about natural vs synthetic, I’m talking about using the right atoms connected in the right topology.

    The healthy user effects, compliance effects, and prescriber/eager patient effects described in the article are factored out in the randomized, double blind, placebo controlled studies I refer to above.

    I agree with the author that a strong dose of skepticism is in order when evaluating public health recommendations, but to say that we don’t know anything beyond smoking causes lung cancer (and maybe three other things) is overstating his case.

  • Jor

    Robin, doesn’t this mean that we shouldn’t take too seriously all the observational studies you use to show medicine has no net effect?

    Andrew & Alan, there are certain types of causal structures that can be inferred from observational data alone, but only a few types. For example, in a D.A.G, if you have a Y structure, I believe you can prove causality.

  • Is the original article http://www.bmj.com/cgi/content/full/325/7378/1437

    I’m puzzled by

    > Furthermore, it is seldom recognised how poorly the standard statistical techniques “control” for confounding,…

    My understanding was that the standard statistical techniques of controlling for confounding variables do not support causal inferences at all, not even poorly. Once you have found which variables are independent and which are independent when conditioned on others there is a second step required, of filtering all the possible causal graphs against the measured dependence relations. Maybe you get lucky and some connections are present and go the same way in all the remaining graphs permitting an inference about causality.

    Smith and Ebrahim don’t mention this, so perhaps they were unaware of the difficulty in 2002.

  • Gray Area

    It’s not in general possible to infer causation from correlation by itself. Methods in Pearl’s book assume consensus on the broad causal structure of the domain (in the form of a causal diagram). This consensus is rare in practice, and when it isn’t available you have to resort to experimentation to establish what the graph is, so you are back where you started.

    There is some ongoing work striving to reduce how many causal assumptions (graphical or otherwise) are needed to draw useful conclusions.
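A small sketch of the point about assumed causal structure. It assumes, purely for illustration, a diagram in which one measured trait Z influences both the treatment X and the outcome Y, and X has a true effect of -0.05 on disease risk. Given that assumed diagram, stratifying on Z (a back-door adjustment) recovers the effect from observational data; the naive comparison does not. If the diagram is wrong, for example because there is an unmeasured confounder, the adjusted number is no more trustworthy than the naive one. All names and numbers here are hypothetical.

```python
import random
from collections import defaultdict

def backdoor_demo(n=200_000, seed=1):
    """Naive vs. Z-adjusted estimate of the effect of X on Y.

    Assumed (hypothetical) causal diagram: Z -> X, Z -> Y, X -> Y.
    True effect of X on disease risk is -0.05 by construction.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # (z, x) -> [people, disease cases]
    for _ in range(n):
        z = rng.random() < 0.5                    # measured confounder
        x = rng.random() < (0.8 if z else 0.2)    # treatment choice depends on z
        y = rng.random() < 0.20 - 0.10 * z - 0.05 * x
        counts[(z, x)][0] += 1
        counts[(z, x)][1] += y

    def rate(z, x):
        people, cases = counts[(z, x)]
        return cases / people

    def pooled_rate(x):
        people = sum(counts[(z, x)][0] for z in (False, True))
        cases = sum(counts[(z, x)][1] for z in (False, True))
        return cases / people

    # Naive: ignore z entirely.
    naive = pooled_rate(True) - pooled_rate(False)
    # Back-door adjustment: average the within-stratum differences, weighted by P(z).
    adjusted = sum(
        (counts[(z, True)][0] + counts[(z, False)][0]) / n
        * (rate(z, True) - rate(z, False))
        for z in (False, True)
    )
    return naive, adjusted

if __name__ == "__main__":
    naive, adjusted = backdoor_demo()
    print(f"naive estimate:    {naive:+.3f}")
    print(f"adjusted estimate: {adjusted:+.3f}  (true effect is -0.050)")
```

The adjustment works here only because the simulation was built to match the assumed diagram, which is exactly the kind of consensus that is rare in practice.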

  • g

    Should this blog be renamed to “Overcoming Bias and Contrarian Medicine”? Robin’s articles on medicine are interesting and possibly important, but they don’t generally seem to me to have much to do with bias unless “bias” is being used to mean simply “error”. Which is odd in view of his (perfectly reasonable) willingness to discourage discussion that he considers “political” by commenters.

    Yes, sure, this post uses the word “bias” (“healthy-user bias”), but that’s not a *cognitive* bias. (Perhaps we should expect posts about how to make more uniform billiard tables.)

  • g

    Actually, “Contrarian Medicine” would be a dumb term because it would be taken to mean things like acupuncture. “Contrarian Health Policy” or something, perhaps.

  • Eliezer and Gray Area,

    I wouldn’t say I have a misconception, but I was being sloppy and I appreciate the correction. I actually infer causality from correlation all the time. It’s just that, when doing this from observational data, assumptions need to be made (as we discuss in Chapters 9 and 10 of our new book, and I’m sure Pearl discusses in different terminology in his book). Regarding Alan’s original comment, I think the limiting factor is not so much “mathematical techniques” but rather the substantive assumptions required to make the causal inference convincing in political science, public health, economics, or whatever field is being studied.

  • Konrad

    Forrest Bennett wrote: ” I agree with what he says about HRT, but he fails to mention that HRT was a pretty stupid idea a-priori for two reasons. ”

    They had no other choice at the time. This stuff’s been around since WW II, and it didn’t get the huge randomized trials until much later.

  • Gray Area


    I agree that assumptions are the key, the ‘techniques’ are mostly developed.

    I wonder how often the process which starts with some randomized or observational study (subject to a myriad of simplifying assumptions), and ends with a press soundbite along the lines of “vitamin B helps prevent cancer” or “talking to your kids about smoking makes them six times less likely to smoke” actually leads to correct causal claims. Personally, I view most causal claims involving populations with extreme scepticism.

  • Laura

    The NYT article is all about bias, including cognitive bias, namely the bias to believe established, albeit incorrect, observational data without considering the actual causes of the correlations. Highly intelligent people do it too, because it takes a long time to read the studies and see the actual basis of the headlines and health advisories, and even to make sure our background presuppositions are correct to begin with. The lesson is not to take current medical opinion at face value, even if it is well established. Perhaps you do not have this bias, but nearly all the rest of society does.

  • Richard Hollerith

    The quality of the comments here is outstanding.

  • joe

    “Robin, doesn’t this mean that we shouldn’t take too seriously all the observational studies you use to show medicine has no net effect?”

    Jor, I have tried and tried to get through to Robin all of the problems with the conclusions he attempts to make from these studies.

    Robin seems to have his views about medicine and doesn’t entertain any thoughts which could undermine his views. I don’t think he even thinks for two seconds about some of the criticisms I have made on this blog concerning his conclusions.

    I’ll say the most basic point, AGAIN, about Robin’s FAVORITE piece of evidence. In the Rand study, insurance level was randomized, healthcare received was not. Thus you have to be very cautious about any CAUSAL conclusions drawn from the amount of healthcare actually received.

    Just think about it.
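The distinction can be made concrete with a toy simulation. This is a sketch under invented assumptions, not a model of the actual RAND data: the plan is assigned by coin flip, the number of visits is chosen by the patient (sick people go more often, free care adds one visit), and each visit truly improves health by one point. Comparing by randomized plan gives a clean estimate of the effect of the plan. Comparing by care actually received gets the sign wrong, because heavy users are mostly the sick.

```python
import random

def simulate_insurance_experiment(n=100_000, seed=2, benefit_per_visit=1.0):
    """Toy experiment: plan is randomized, care received is not.

    All parameters are hypothetical and chosen only for illustration.
    Returns (health gap by plan, health gap by heavy vs. light use).
    """
    rng = random.Random(seed)
    by_plan = {True: [0, 0.0], False: [0, 0.0]}  # free plan? -> [people, total health]
    by_use = {True: [0, 0.0], False: [0, 0.0]}   # heavy user? -> [people, total health]
    for _ in range(n):
        sick = rng.random() < 0.3
        free_plan = rng.random() < 0.5            # randomized by the experimenter
        visits = 2 + 4 * sick + 1 * free_plan     # chosen by the patient
        health = 70 - 30 * sick + benefit_per_visit * visits + rng.gauss(0, 5)
        heavy_user = visits >= 4
        for table, key in ((by_plan, free_plan), (by_use, heavy_user)):
            table[key][0] += 1
            table[key][1] += health

    def mean_gap(table):
        # Mean health in the True group minus mean health in the False group.
        return table[True][1] / table[True][0] - table[False][1] / table[False][0]

    return mean_gap(by_plan), mean_gap(by_use)

if __name__ == "__main__":
    plan_gap, use_gap = simulate_insurance_experiment()
    print(f"free plan vs. cost sharing (randomized): {plan_gap:+.2f}")
    print(f"heavy users vs. light users (not randomized): {use_gap:+.2f}")
```

In this toy world care helps everyone, yet people who receive more of it look far less healthy. Only the comparison across randomized arms supports a causal reading, and what it measures is the effect of the plan, not the effect of a unit of care.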

  • BillK

    I agree with Jor. A study that produces a net result of ‘no effect’ doesn’t mean that everyone in the study had ‘no effect’. It means that some had 100% effect and some had 0% effect, with all the ranges in between.

    It indicates that more study is required to improve the individual cases where below 50% effect was observed.

  • Jor and Joe, I rely most heavily on the RAND aggregate experiment; what would you have me rely on?

  • g

    Laura, believing “established observational data” surely isn’t a “cognitive bias” in any useful sense. It’s usually the right thing to do. Gotchas are not biases. As has been pointed out elsewhere on this blog, even very noisy data are better than no data.

    And if it turns out that in fact epidemiological studies are so noisy and biased that there’s no useful information to be extracted from them? Why, then believing epidemiological studies is a mistake, just as believing horoscopes is a mistake. But there’s no “astrology bias”, although there may be biases (e.g., confirmation bias) that make it easier for astrology to get believers; and while telling us that epidemiological studies are useless is valuable (provided it’s true) it’s not clear that it offers much in the way of useful general cognitive lessons.

    Of course, Robin is in overall charge of this place, and even if he wants to use it to post pictures of kittens or descriptions of his favourite movies then I’ve got no grounds for complaint. It just seems to me that there’s some divergence between the stated mission of “Overcoming Bias” and some of what it’s used for.

  • g, if the other editors, Nick and Eliezer, told me that they thought I was drifting off topic, I would listen carefully. We get complaints about being too specific as well as about being too general. I take “bias” to be “avoidable error” and it seems to me beliefs about medicine are especially prone to avoidable error.

  • J Thomas

    In the Rand study, insurance level was randomized, healthcare received was not. Thus you have to be very cautious about any CAUSAL conclusions drawn from the amount of healthcare actually received.

    But the people who had no copayments received approximately 50% more healthcare. They asked for 50% more and got it. And their health was not particularly improved according to these various measures, most of which look worthless to me. (11 scales, but 3 of them were for mental health where nobody particularly expects a few years of psychotherapy to do much, and some of them were about the people’s view of how healthy they were or their view of how good their healthcare was etc. And how well the extra 50% of doctor’s visits or hospital stays helped them quit smoking or lose weight.)

    The obvious implication to me is that patients who have a moderate co-pay will see doctors when they really need them, and patients who have free co-pay will see doctors more often than they need them.

    This is probably a valid result whether the study actually shows it or not.

  • joe

    J Thomas, you were so close to realizing one of the points I have been trying to make. I agree that patients with no co-pay will see the doctor more often than those who have to pay, but you have to be careful about conclusions drawn about the actual amount of medicine received. If patients with free co-pay see doctors more often than they really need them, does this mean more medicine has no net effect… or does it imply that medicine is mainly reactive and if you go to the doctor when you don’t really need to, medicine shouldn’t have an extra benefit…. if there’s nothing really wrong with you, then why should medicine be able to improve your health? Also, we have to remember that you don’t actually get treated with medicine every time you go to the doctor, especially if there’s nothing really wrong with you. Thus, doctor’s visits don’t even directly equate to an increase in medicine received.

  • joe


    If the Rand study design is inadequate to address your question of interest, then be careful about the conclusions you draw from it and don’t rely on it anyway.

    I am sure there are some people on here who are intelligent enough to design an ethical study for your question of interest.

  • J Thomas

    Joe, I thought I was making precisely the point you elaborated.

    The study appeared to show that 50% extra doctor’s visits and 50% extra hospitalizations, at the patients’ initiative, did not improve their health.

    Robin wants to interpret this as saying that the first 100% of doctor’s visits and hospitalizations also failed on average to improve patients’ health.

    What I saw the RAND study showing was that the cost of the co-pays in their samples was not large enough to keep patients from getting medical assistance when they needed it. Patients might put off getting eye exams and new glasses when the expense was high, while they didn’t put it off when it was free. So their vision was slightly worse. But for most things, when their health was in serious danger they were willing to pay their co-pay sums and get their treatment, whether it actually helped them or not.

    It implies that the extra third of medical care that people got when it was free, was probably unneeded. It says nothing about how useful the first two thirds were.

    To tell whether the first 2/3 of the medical treatments were useful it would work better to withhold all medical care from one randomly chosen group and let the other group have medical care. Then you’d see whether medical care has on average a beneficial effect.

    Or perhaps limit the members of one group to a number of doctor’s visits and hospitalizations that’s half the average for the area the study is performed over, and let the second group have as many of both as they’re willing to co-pay for. Then see whether the second group is healthier on average.

    To do the study correctly it would be necessary to keep the patients from paying for private medical care themselves, and keep them from getting medical care from foreign nations. They must not be given illicit medical care; if they get it sneakily they compromise the experiment. I doubt this project is politically feasible. But it could be done with volunteers, who might be subtly different from the rest of the population.
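The "extra third" and "first two thirds" figures in this comment follow from simple arithmetic, sketched below. It takes the comment's reading of the summary (free-care patients received about 50% more care) as given rather than checking it against the report; the function name is made up for illustration.

```python
from fractions import Fraction

def extra_share(increase):
    """Share of the larger total that is made up by the increase.

    If one group receives `increase` more care than a baseline of 1,
    its total is 1 + increase, of which `increase` is the extra part.
    """
    return increase / (1 + increase)

# 50% more care means the extra care is one third of the free group's total,
# so the baseline amount is the remaining two thirds.
print(extra_share(Fraction(1, 2)))       # 1/3
print(1 - extra_share(Fraction(1, 2)))   # 2/3
```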

  • J Thomas

    does it imply that medicine is mainly reactive and if you go to the doctor when you don’t really need to, medicine shouldn’t have an extra benefit…. if there’s nothing really wrong with you, then why should medicine be able to improve your health?

    Yes, but the summary claimed that not only doctor’s visits but also hospitalization increased by 50%. When there’s nothing wrong with you, doctors ought to tell you there’s nothing wrong with you and not send you to the hospital.

    But while I looked at the details for various other sections of the report I didn’t look at that. If you have a complaint and the doctor needs tests done at a hospital, maybe that gets counted as hospitalization. It doesn’t have to imply anything is wrong with the medical system, although at first sight it would tend to imply that.

  • joe

    J Thomas,
    Thanks for the clarification. I wasn’t quite sure what conclusions you were going to draw about the effect of medicine since you said “But the people who had no copayments received approximately 50% more healthcare,” which I took as arguing with my previous statement about the ability to make causal conclusions regarding the effect of increased medical care. I’m glad we cleared that up.

    In regards to hospitalizations,
    “Averaged across all levels of coinsurance, participants
    (including both adults and children) with cost sharing
    made one to two fewer physician visits annually and had
    20 percent fewer hospitalizations than those with free

    We would have to make some assumptions regarding the mechanism resulting in a hospitalization. Doctors’ offices are generally not open on the weekend, which causes some people to go to the ER; inability to schedule an appointment with your primary care physician could also lead someone to choose to go to the hospital. In regards to admissions, I would be curious to find out how many were kept for observation but not really treated for anything serious. Without the barrier of cost, many probably go to the hospital just to be safe, and when a hospital knows that your insurance is going to cover the whole visit, why not admit the patient… you would be stupid not to.

  • Jor

    Robin, as I’ve mentioned repeatedly, and as I think some commenters at CATO also stated, the RAND study is so old as to be useless. Medicine, especially the kind being assessed in the RAND study, has almost completely changed since then. There are just too many new therapeutics (drugs and interventions) that have each individually been shown to improve mortality and reduce morbidity (many in multiple RCTs).

    At the turn of the century, Osler (considered by many to be the father of American medicine) thought that there were only 5 or 6 interventions in all of medicine that physicians did that were useful. In terms of the medicine measured in the RAND study, that was probably still the case in the 70’s.

    If you look at the top 100 mortality and morbidity reducing interventions today,(in the non-acute setting) and see what was available in the 70’s — I’d be surprised if more than 10 of those 100 were available or known in the 70’s. Hell, I’d be curious as to how many of the wide-spread interventions in the 70’s went on to have rigorous support behind them later on — probably not many.

  • J Thomas

    Jor, you have pointed out a serious problem. If medicine has almost completely changed in the last 25 years, so that the 1980’s studies are obsolete, what would happen if we did a new study that took 7 years? Would the results from the beginning of the study be approaching obsolescence before the study ended?

    It could be argued that if medicine is progressing too fast to do statistics on the results, it’s progressing too fast.

    I’m old enough to remember the 1980’s. Back then we were saying that the medicine of the 1950’s was not very good, it probably did almost as much harm as it did good, but since then we’d improved tremendously. If we’re saying the same thing now about then, it leaves me with a certain nameless doubt….

    And with no way to dispel that doubt. If new inadequately-tested methods replace old ones faster than we can test the old ones, how can we ever tell how well we’re doing?

  • Konrad

    Giving anesthesia before surgery may not improve mortality rates, but it certainly improves quality of life. However, mortality is far easier to measure, so that’s what gets studied.

    If you start with the assumption that medicine is only (or mostly) about saving lives, you may come to agree with Robin Hanson. But would you try to measure the quality of policing by tracking Bad Guys Shot vs. Dollars Expended on Cops? TV dramas focus on police shoot-outs and ER docs because they’re dramatic, not because they’re representative.

  • Konrad, studies that look at non-mortality outcomes give similar results.

    Jor and joe, neither of you answered my question. You can find flaws with any study, but finding a flaw with every study you see does not justify your believing anything you like. You must choose some basis for your beliefs.

  • J Thomas

    Robin, I figure that if none of the existing studies answer my question, then I should accept that I still don’t know the answer.

    If we accept that we don’t know, then we can decide what to do about not-knowing.

  • joe


    A poor basis for your beliefs can be worse than admitting that you do not have the proper evidence to make an informed decision.

    I guess the real question is what question do you truly want to answer? If you want to know whether giving free healthcare to people (aside from the very poor) results in overall health increases, the answer according to the Rand study is that it didn’t make much of a difference. I do not know if the findings would be able to be replicated today given the advances in medicine, but it would be interesting to find out.

    But instead, if you want to know what it means that their health didn’t improve and the implications for the benefits of marginal increases in medicine, then GOOD LUCK dealing with all of the important confounders, since study participants were allowed to choose their marginal health increases. This is NOT about study flaws, this is about what questions you can and cannot answer. The Rand study addressed a specific question and to that end, I do NOT believe the study was flawed.

    Honestly, the Rand results don’t particularly surprise me, but perhaps for very different reasons than you may have.

    Before you read the main idea of this paragraph, let me preface it by saying that the study population was fine to address the proposed main objective of the study, which was the effect of varying levels of insurance. From a public policy perspective, you want a study population that is typical of average Americans. But if you want to measure the net benefits of medicine, let’s not forget that study participants were average, healthy people and the mean age for study participants was early 30’s. Even if there is a net benefit of medicine, medicine is not designed to improve the overall health of a healthy person. You see the results and say ah ha, this proves that medicine doesn’t work and must hurt as many average people as it helps. I see the results and say, yeah, what did you expect was going to happen.