Medical Quality Bias

Here is an old but puzzling phenomenon: people seem remarkably intolerant of allowing people to act on noisy measures of medical quality. If a measure of medical quality does not perfectly correlate with quality, that seems to many a sufficient reason to prevent people from seeing or acting on the measure.

Now any quality measure, even one that is very noisy, should allow you to increase the accuracy of your overall estimate, if you combine it with your other measures.  The only requirement is that you have some idea of the relative noisiness of your various measures.  When other quality measures are banned, the public must rely solely on noisy government quality measures, such as professional licensure and FDA approval.   
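
As a rough illustration of the point above, here is a minimal Python sketch of inverse-variance weighting, one standard way to combine several measures. It assumes the measures are unbiased and have independent errors with known variances; the function name `combine` and all the numbers are hypothetical, chosen only for illustration.

```python
def combine(estimates, variances):
    """Inverse-variance weighted average of independent, unbiased estimates.

    Returns the combined estimate and its variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total
    combined_variance = 1.0 / total
    return combined, combined_variance


# Hypothetical numbers: a fairly precise measure (variance 1) and a very
# noisy one (variance 25) of the same underlying quality score.
estimates = [70.0, 90.0]
variances = [1.0, 25.0]

combined, combined_variance = combine(estimates, variances)
print(combined, combined_variance)
```

Under these assumptions the combined variance is smaller than that of the best single measure, however noisy the added measure is; the noisy measure simply gets a small weight. The sketch says nothing about measures that are systematically biased.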

Apparently, many feel that many others are biased to drastically underestimate the noisiness of medical quality signals.  For example, we prevent hospitals from publishing mortality statistics, because such stats may sometimes be "misinterpreted."   A recent Washington Post article gives another example:

In the quest to control spiraling costs, insurance companies and employers are looking more closely than ever at how physicians perform, using computers, mountains of health claims and billing data and sophisticated software. Such data-driven surveillance … raises questions about the line between responsible oversight and outright meddling in the relationship between caregivers and their patients. And it shows how people such as Berkenwald are at risk of losing control of their reputations as corporations and other organizations mine electronic data to draw conclusions about them and post them online. …

Physicians who have been profiled, including those with top ratings, say that the data often contain errors and that doctors often lack the ability to correct them. The effort is more about cutting costs than raising quality, some say, adding that doctors could begin to "cherry pick" healthier patients whose problems are less costly to treat. Such systems fail to capture the intangibles of quality, such as a doctor who visits a dying patient at home, critics say. …

Last fall, Schiesser joined five other doctors and the Washington State Medical Association in suing Regence BlueShield, claiming defamation and deceptive business practices after the health plan told participating members that they no longer had access to about 500 doctors because the doctors did not meet the insurer’s quality and efficiency standards.  Regence spokesman Charlie Fleet said that because of the lawsuit, the company could not comment on the data issue. He did say, however, that the data were "provided from the physicians themselves."  In December, Regence abandoned its plan.

Doctors critical of ratings systems say they are held accountable for whether patients exercise, take their medications or follow their prescribed regimens.

I’m afraid this doesn’t bode well for Google’s and Microsoft’s health ambitions.

  • The complaints in the quoted example seemed to be pointing out that there wasn’t simply noise in the data. Instead, it had systematic biases as it ignored various aspects of medical care quality. These biases may not be political biases or even shortsightedness. There may be an honest attempt to measure quality, but even then, if they measure certain aspects and not others, they form a systematically biased estimate of quality. Thus its use is feared to create damaging incentives in the medical profession (cherry picking patients, not doing work on the unmeasured aspects etc). These seem like entirely rational complaints to me. Depending on the measures proposed, I can easily see regular publication of such data doing more harm than good or vice versa. Restricting the data to the government or supervisory bodies that understand its weaknesses may be the best solution.

    Even if it were simply noisy rather than biased in what it measures, it may still be worthwhile preventing the publication of the data. If it is very noisy, then its publication will likely cause more harm than good. For instance, people will drive longer distances to go to different hospitals based on nothing more than noise. As noise increases, these little costs will balance the benefit of the information. At some level, it will simply be wasteful to publish it (at least to allow its publication to hit the media etc). Perhaps this would not be a problem if people were all completely rational and understood issues of noise, but the very existence of this weblog shows that we do not believe this is true and we shouldn’t ignore the (sad) facts here. One approach would be to wait until there were enough data sets to bring the aggregate noise down to a low enough level and then publish them all simultaneously.

  • Stuart Armstrong

    There are a lot of bad reasons to try to control measures of medical quality. There are also a few potentially good ones, mainly focused around the fact that patient care is only partially measurable. Hence any noisy measure creates perverse incentives: to focus only on those aspects that can be measured, and ignore those that can’t.

    Interestingly, what can’t be measured are often those things the patients themselves prefer: considerate attitude, availability, willingness to listen and to take on patients with medical or financial difficulties.

    So, the market would tend to fill in the gaps left by the measurements. But this depends on the structure of the market. If people approach health care with the firm intention of finding the best doctor they can, noisy measurements are fine and helpful. If instead they seek to find the first available tolerable doctor, then noisy measures can be a severe problem; the reward to an adequate doctor of increasing his position in the noisy rankings far outweighs the drawback of reducing unmeasurable care. This probably would cause a net loss (depending on the noisiness of the measures and the real importance of unmeasurable care).

    If the physician market is indeed structured in this way (I feel it is, but haven’t seen any statistics) then I would advocate putting the noisy rankings in a “Would have banned” shop and requiring those who want to use them to prove they are dedicated to the quest of finding the best doctor for them, rather than just stopping at adequate.

    If the physician market is not structured this way, then noisy rankings are fine and people should stop arguing against them.

    Ideas possibly also applicable to school and university rankings.

  • Stuart Armstrong

    Here is an old but puzzling phenomenon: people seem remarkably intolerant of allowing people to act on noisy measures of medical quality.

    More details on who these “people” are (both sets)? Government, industry, doctors, etc…? (you use the example of doctors, but you also seem to have governments in mind).

  • Maybe the quality-info-restriction people think that people suffer from such strong biases that they will seriously misinterpret the data? But as Toby pointed out, the concern appears to be more about biased quality info.

    When the British system was introduced, people worried that the best surgeons would look bad because they take on the hardest cases and hence have a lower survival rate, which would make surgeons risk-averse. A later study showed this did not happen: treatment of high-risk and elderly patients actually rose slightly rather than fell, and (adjusting for the more risky procedures) mortality actually fell from 2.4% to 1.8%. It seems unlikely that patients were turned away because they were hard cases; rather it may be that the clinicians optimized treatment for a particular patient to improve the results, a desirable outcome. The study authors were very positive: “Given that the downside of disclosure is small and the upside is big, the results of the study should encourage other clinical groups to take this forward, rather than being driven by politicians or the media.”

    Note that this study did not check whether the patients got much use from the information or whether it was biased. But even the perception of openness may be important in a care setting to establish trust, which in turn is important for outcomes.

  • Toby, every quality signal will contain some systematic bias. And every systematic bias creates incentives that reward behavior exploiting that bias. So it only makes sense to restrict a quality signal if that signal produces many more such incentives than the other signals. Also, you seem to endorse the claim that people are biased to overweight noisy signals; I don’t yet see much evidence for this claim.

    Stuart, I don’t understand why you think making available additional noisy measures hurts when people “seek to find the first available tolerable doctor.”

    Anders, an interesting data point.

  • Toby Ord

    Robin, I agree that signals will all contain some bias. My point is that given this it could easily be predictably bad to give out certain figures. It will not always be so, but any attack on a group for not allowing access to ‘noisy’ data will come down to questions about (a) such perverse incentives and (b) potential harms from poor public interpretation. Whether we should allow these things is thus an open question. I’m not saying that we shouldn’t, just that you seemed to be attributing bias to those who restrict such figures (‘remarkably intolerant’). I would think that the burden of proof would lie on those who suggest bias if there is a perfectly good explanation of how they could have come up with their conclusions in a perfectly rational manner.

    Anders, that result is somewhat surprising to me (I’ve heard lots of anecdotal things pointing in the other direction), and is potentially great news. I’d like to see similar studies on other released figures, as well as longer term studies on this one. It could take quite a while for the perverse incentives to be felt, and it may also be that things work fine unless the media jump on the results, so I’d like to see studies where that has happened. I see such ratings as potentially beneficial and I would be happy to see my skepticism proved wrong.

  • Toby, in many years of following such arguments I have virtually never seen someone offer a concrete argument why additional private quality signals would produce more perverse incentives than the existing government quality signals. People always seem content to point out the mere possibility of such perverse incentives. When A suggests that B is biased, and then C suggests A is biased to think so, I don’t see how a relative burden of proof can go onto either A or C based on the fact of claiming bias – both sides are claiming bias.

  • michael vassar

    This is probably just another endowment effect or status-quo bias issue, justified by the fear of base-rate neglect and failure to integrate multiple pieces of information.

  • Scott Clark

    I think it is interesting to point out that the first example of a “quality” measure left out of the ratings was a pure “showing that you care” example. The complaint is that these measures “fail to capture the intangibles of quality, such as a doctor who visits a dying patient at home”. Seeing a dying patient at home is exactly the type of thing that a pure cost-benefit minded party (from the standpoint of the insurer and the insurer’s customer, generally the employer) would be trying to cut out of practice.

    Opinion should move toward accepting the validity of the “showing that you care” theory.

  • Often it boils down to this:

    If you are dying, do you prefer to have Dr. House or a charming quack?

    I’ll take Dr House, anytime.

  • Jor

    I have to look at the paper Anders cites also, but I would also say that the systematic bias in quality measures, and the ease with which those measures will be “gamed”, is a bigger problem than their random “noisiness”.

    Perverse incentives are a big problem in medicine, and I don’t know if we need another source for metrics that might be of questionable utility and easily distorted. If you need an example of perverse incentives, just look at some of the financial incentives in medicine, and you can easily see the over-proceduralization and excess diagnostic studies performed due in part to the absurd reimbursement system we have in place.

  • Luis, the question is what clues can you use to determine whether the doctor you are considering is a quack.

    All, every product where we get quality clues can suffer the same problems. Yet we usually allow people access to many quality clues. So:

    Michael, why would base-rate neglect be a worse problem in medicine?

    Jor, why would perverse incentives from quality measures be a bigger problem in medicine? (And as I asked before, why do added clues have worse problems than the basic clues?)

  • anon

    “For example, we prevent hospitals from publishing mortality statistics, because such stats may sometimes be “misinterpreted.” ”

    I am assuming that since you put “misinterpreted” in quotes, you don’t see this as a valid argument against publishing such stats.

    The problem with a single summary statistic such as death rate is that it is not very meaningful without more information concerning potential confounders such as percentage of high-risk patients. Of course, I am sure that you wouldn’t make such an error, but what about the general public, who aren’t very well-versed in statistics? I know that the general public thinks that they want info and stats about everything, but there are some statistics which can be EXTREMELY misleading to someone who doesn’t know any better… and they wouldn’t have a clue.
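
    To make the confounding worry concrete, here is a small Python sketch with entirely hypothetical patient counts (the hospitals, strata, and numbers are invented for illustration and come from no real data). It shows how a hospital can have the lower death rate within each risk group and still have the higher raw death rate, simply because it treats more high-risk patients.

```python
# Hypothetical counts per hospital and risk group: (patients, deaths).
hospitals = {
    "A": {"low_risk": (100, 1), "high_risk": (400, 40)},
    "B": {"low_risk": (400, 8), "high_risk": (100, 15)},
}


def stratum_rate(hospital, stratum):
    """Death rate within one risk group of one hospital."""
    patients, deaths = hospitals[hospital][stratum]
    return deaths / patients


def raw_rate(hospital):
    """Overall death rate, ignoring case mix."""
    patients = sum(p for p, _ in hospitals[hospital].values())
    deaths = sum(d for _, d in hospitals[hospital].values())
    return deaths / patients


for name in hospitals:
    print(name,
          "low-risk:", stratum_rate(name, "low_risk"),
          "high-risk:", stratum_rate(name, "high_risk"),
          "raw:", raw_rate(name))
```

    In this made-up example hospital A does better in both risk groups yet looks worse on the single summary figure, which is the kind of misreading the raw statistic invites when case mix is not reported alongside it.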

  • Stuart Armstrong

    Stuart, I don’t understand why you think making available additional noisy measures hurts when people “seek to find the first available tolerable doctor.”

    Well, the main noisy piece of info that we already have is government approval or registration. This involves a lot of studying, and work experience at low pay. Some of this will be signaling, some will improve doctor medical competence.

    The cost of this sort of noisy signal is to reduce the pool of available doctors, and the benefit is to filter out some medically unfit doctors at the start.

    Now consider the noisy signal of a biased medical league table. Add the following premises:
    1) A given doctor can improve in the league table not by becoming better, but by transferring efforts from one domain to the other.
    2) This transfer is medically detrimental.
    3) People seek the first available tolerable doctor.

    To simplify the model, assume a level of “tolerability” going from 0 to 100, with 50 being tolerable. Assume the transfer of effort costs the doctor 5 points on the tolerable scale.

    Then any doctor with a tolerability above 55 should opt for the medically detrimental transfer of effort, since it will not cost him, and will benefit him if anyone is paying attention to the noisy measure of quality. Only doctors with tolerability in the 50-55 range will be motivated to actually improve. (This is a wildly over-simplistic model, but it does capture the essence of what could happen, if the premises are reasonably correct. A more sophisticated and realistic analysis would rest on the marginal gain of transferring effort, compared with the marginal loss of reduced patient experience).
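
    The toy model above can be sketched in a few lines of Python. This is only an illustration of the stated premises: the threshold of 50 and the cost of 5 points come from the comment, while the function names and the population of one doctor at each integer tolerability level are assumptions made for the example.

```python
THRESHOLD = 50      # "tolerable" cutoff on the 0-100 scale
TRANSFER_COST = 5   # points lost by shifting effort toward measured aspects


def opts_for_transfer(tolerability, threshold=THRESHOLD, cost=TRANSFER_COST):
    """True if the doctor is above threshold + cost, so that the
    transfer of effort still leaves him above the tolerable cutoff."""
    return tolerability - cost > threshold


def after_transfer(tolerability, threshold=THRESHOLD, cost=TRANSFER_COST):
    """Tolerability once doctors who can afford the transfer have made it."""
    if opts_for_transfer(tolerability, threshold, cost):
        return tolerability - cost
    return tolerability


# Assumed population: one doctor at each integer level from 0 to 100.
population = list(range(101))
gamers = [t for t in population if opts_for_transfer(t)]

mean_before = sum(population) / len(population)
mean_after = sum(after_transfer(t) for t in population) / len(population)
print(len(gamers), mean_before, mean_after)
```

    In this sketch every doctor above 55 makes the transfer and stays tolerable, so average quality falls even though no patient using the "first tolerable doctor" rule would notice. It leaves out the marginal gains and losses that a more realistic analysis would need.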

    Compare that with the one-off cost of government approval. Its main detrimental effect is to increase costs. Its main benefit is to reduce gross medical error, and reduce (somewhat) the efforts a patient needs to furnish to find a tolerable doctor. Depending on the cost/benefit there, this could result in a net good (as the cost of a gross medical error is probably much higher than some extra monetary cost).

    So to summarise: there exist market structures that can make some noisy signals beneficial (mainly one-off signals), and others detrimental (updated biased league tables). That market structure does not seem totally unreasonable. Therefore we need to look at the data to see if the market actually has that structure. Anders’ example is an argument against this.

    My personal prediction is that biased league tables will result in an increase in quality among bad doctors, and a decrease among good doctors. Anyone know if this prediction is borne out or refuted by the evidence?

    PS: this argument probably falls apart (even in my simple model) if one considers biased league tables in the absence of a governmental approval scheme. There, the number of bad doctors in the system will be much higher, so the benefits of the league tables will become substantially higher than their drawbacks.

  • Stuart Armstrong

    A more sophisticated and realistic analysis would rest on the marginal gain of transferring effort, compared with the marginal loss of reduced patient experience

    And, I forgot to add, the marginal medical loss for transferring effort.

  • Stuart Armstrong

    Here is an old but puzzling phenomenon: people seem remarkably intolerant of allowing people to act on noisy measures of medical quality.

    I strongly doubt, however, that those advocating suppression of noisy measures have a proper economic model. They seem to have much more intuitive models, and focus only on the cost, not the benefits.

  • Stuart, I agree that it is possible for government clues to induce fewer bad incentives than other added signals. My question is what reasons anyone has to think that they actually do.

  • Stuart Armstrong

    My question is what reasons anyone has to think that it actually does.

    I actually feel that it does, because I feel the medical market is quite close to the description I gave of it. The reasons for my belief are all subjective (personal experiences, friends’ experiences, doctors-who-are-friends’ experiences, and, even worse, newspaper reports), so my belief isn’t very strong, but it is there (and very specific to the perverse-incentive aspect of noisy measures). Anders’ example has undermined my belief to some extent though.

  • Stuart Armstrong

    PS: been talking this issue over with some American friends, and they see the medical market very differently. There may be a Continental Europe versus US issue here.