I assign my undergrad students papers on unusual policy proposals, and grade those papers on the number of important relevant arguments offered, pro or con. While I ask students to take an overall position on the policy proposal, this position doesn’t influence grades.

Overall my students oppose change, moderately favoring whatever is the status quo. So I was quite surprised to see them favoring change in the last paper I assigned in Health Econ.  85% of my students said yes to: Should all medical practice data be published, aside from data identifying patients?

The idea is to publish all births, deaths, disabilities claimed, and all medical records, including doc visits, med tests, drug prescriptions made/used, amounts billed, etc. Docs would be fully identified, but patients would be identified by age, gender, weight, height, etc. — not enough to tell which ordinary person it was. After a semester of seeing how little we really know about modern med, students appreciated that revealing all this info would greatly aid comparison shopping, of docs, hospitals, treatments, etc., both in quality and in price terms.

Could this be a politically feasible change from the status quo?!

Added 8a: Well of course even identifying people by age is enough to identify folks if done with perfect accuracy (e.g., born 23/12/87, 8:36:27.14625 am).  So of course the idea is to not identify age, weight, height with perfect accuracy.  Of course even then there would be some rate of error when info is mistakenly revealed; but surely some rate is tolerable.

Added 10p: Wow – while some are concerned about violating rights and government power, no commenter here seems to think this proposal would reduce overall social welfare or economic efficiency! Seems most agree that this is a clear net win that just won’t happen anytime soon.

Added 10June: Karl Smith says reveal it all.

  • patients would be identified by age, gender, weight, height, etc. — not enough to tell which ordinary person it was.

    Regardless of everything else, these days it will be trivially easy to figure out who this data is about. See this for a comparison point, and that was far less information than med data would reveal.

    So essentially what you’re proposing is full disclosure of everyone’s medical information. This might still be a good idea, but it would be very drastic.

  • I agree with Thomasz.
    It is surprising how little information you now need to identify people. Take the example of people’s Netflix film queues: http://userweb.cs.utexas.edu/~shmat/netflix-faq.html

    Say a doctor’s catchment area had 100,000 people in it. Gender roughly halves the number of people, height even more so as does weight, and age really limits the possibilities. There are some privacy-preserving transforms on this kind of data but they are very tricky.
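The arithmetic in the comment above can be sketched roughly as follows. This is only a back-of-the-envelope illustration: the function name `expected_matches` and the bucket counts are hypothetical, and the calculation assumes attributes are independent and evenly spread across buckets, which real populations are not.

```python
from math import prod
from typing import Mapping


def expected_matches(population: int, bucket_counts: Mapping[str, int]) -> float:
    """Expected number of people sharing one combination of attribute values.

    Assumes (unrealistically) that attributes are independent and that
    people are spread evenly across the buckets of each attribute.
    """
    return population / prod(bucket_counts.values())


# Hypothetical bucket counts, chosen only for illustration.
buckets = {"gender": 2, "height": 10, "weight": 10, "age": 50}

# 100,000 / (2 * 10 * 10 * 50) = 10 people per combination on average.
print(expected_matches(100_000, buckets))
```

Under these assumed buckets, a record would on average match only about ten people in the catchment area, and any further attribute (a visit date, a city) shrinks that group again.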

  • Newerspeak

    Could this be a politically feasible change from the status quo?!

    Asked and answered.

    Students are disproportionately young and healthy. Ask if any of them have tried Viagra. You might get some answers like “Hell yes, and it was better than Ecstasy!”

  • David J

    Robin, accidental exposure of identifying information is hard to prevent. In a real-world situation it’s hard to see how the implementation issues could be separated from the other issues.

    Another example: the Netflix case from December ’09:

  • I just added to the post.

  • db

    I think this would be a very good thing, provided that someone found a way to actually anonymise the data. As others have said, this is much harder than just removing the names and might actually be impossible with a dataset as rich as this would be.

  • Indy

    How about teachers? How about welfare recipients? How about prisoners?

    In other words – is the instinct here really specifically medical in nature – or is it just a pro-epidemiology, better-management-through-more-and better-data attitude?

    I would think young facebook/google-generation students would be more optimistic and less paranoid about the potential of deep database collection and analysis in general.

    Remember also that there’s a faith/hope function to that optimism. There are lots of problems which have confounded American society for decades and seem immune to our best efforts and intentions. Health care, education, poverty, etc…

    If the new generation is to be other than resigned to their limitations, they have to believe the “secret” is out there – and that it remains possible that we (or they) can solve these confounding puzzles through rigorous empirical scientific means.

    The attitude of “Hope and Change and Progress and Improvement” and so on depends on believing that one’s difficult and seemingly intractable problems remain solvable.

    What alternative is there in general besides placing great faith in the power of statistics?

  • Steve Dodson

    Exact age wouldn’t be necessary. If you knew the person’s birthday, city of residence and a day on which they had an appointment, you could probably identify 90% of people. Add race and you could probably identify 99% of people in a minority race. If you knew the dates of two hospital visits, you’d probably be able to identify almost anyone.

    I would bet that you just need to know four common pieces of information from someone’s file or three uncommon pieces to identify a person 80% of the time. A really simple one is people who’ve moved: you just need the state or city they were born in plus where they live now. Very few people will have a combination of their age plus both state/cities, especially if the state/city is small.

    Your boss and coworkers would have this information almost certainly (they do, after all, notice you’re not around). There would probably be little you could do to protect identities from interested, capable parties without making the data useless. Employers, of course, are the obvious group which would qualify as both interested and capable. Anonymity collapses very quickly in the face of interested and capable investigators using large amounts of public, individualized data.

    The upside, or downside, is that it would put pressure on people to not use medical care because there would be a significant chance that it would be exposed. Obviously this might help control cost, but it would also make it more difficult to treat diseases considered embarrassing. STDs come immediately to mind.

  • Corbond

    Could this be a politically feasible change from the status quo ? ! {Should all medical practice data be published, aside from data identifying patients?}


    The proposal requires strong police sanctions/force to coerce widespread release/catalog of such private data.

    New ‘crimes’ must be invented to threaten/punish citizens & doctors who would choose not to release their private data.

    Exactly what new punishments are proposed to induce cooperation ??

    How would you specifically guarantee patient privacy in such a vast database ?

    Proposer sees no significant downsides whatsoever to this proposal ?

    Interesting how draconian social-planning proposals are always couched in such casual, innocuous terms.

  • michael vassar

    You could explicitly add random factors to the data as you collect it and scientists could simply anticipate and correct for those known random factors when analyzing the data
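One way to read the suggestion above is sketched below: add zero-mean noise with a publicly known spread to each published value, and let analysts correct aggregate statistics for it. The noise level `NOISE_SD`, the simulated weights, and the function names are all assumptions made for illustration, and nothing here shows that such noise would be enough to stop re-identification.

```python
import random
import statistics

NOISE_SD = 5.0  # assumed, publicly documented noise level (kg)


def perturb(values, rng):
    """Add independent zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, NOISE_SD) for v in values]


def corrected_variance(noisy_values):
    """Estimate the variance of the original values.

    Independent noise adds its own variance to the data, so an analyst
    who knows NOISE_SD can subtract it back out.
    """
    return statistics.pvariance(noisy_values) - NOISE_SD ** 2


rng = random.Random(0)
# Simulated patient weights; not real data.
true_weights = [rng.gauss(75.0, 12.0) for _ in range(50_000)]
published = perturb(true_weights, rng)

# The mean needs no correction because the noise averages out to zero.
print(statistics.mean(true_weights), statistics.mean(published))
# The variance is inflated by the noise and is corrected explicitly.
print(statistics.pvariance(true_weights), corrected_variance(published))
```

Aggregates survive this kind of perturbation well; individual records are only as protected as the noise is large, which is the trade-off other commenters point to.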

  • James Andrix

    I also think this would be very difficult to anonymize and keep useful. How would you obscure the dates of doctor visits themselves?

  • aram

    The way to anonymize properly is called differential privacy, and is an active area of CS research. This blog has a good discussion of online anonymity, and how easy it is to compromise it.
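As a minimal sketch of the idea named above, the standard Laplace mechanism answers a counting query with noise scaled to 1/epsilon, since adding or removing one person changes a count by at most 1. The toy records, the `private_count` name, and the choice of epsilon are assumptions for illustration; a real deployment would also need to track the privacy budget across queries.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of records matching predicate (Laplace mechanism).

    A count has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
# Toy records; not real patient data.
records = [{"age": a} for a in range(20, 80)]


def is_senior(record):
    return record["age"] >= 65


# Smaller epsilon means more noise and stronger privacy.
print(private_count(records, is_senior, epsilon=0.5, rng=rng))
```

This releases answers to queries rather than the raw records themselves, which is a narrower thing than the full publication the post proposes.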

  • I’m 26, and feel like I’m sort of in between the status quo (“no, I don’t use twitter”) and an emerging generation (“no, I don’t remember life without the internet”).

    Obviously, the men and women making decisions in our world are dominated by an older generation, who, by no fault of their own, are forced to adapt to a connected world… one where the rules are being reconsidered on a largely unfamiliar level.

    I’m confident that when your students are “in charge”, there will be much more well-informed power pushing for the sort of open data initiatives that we hear so much about these days.

    I work with an 18yo programmer, and it’s been my belief that these citizens that don’t know a world without the internet will more substantially change the way we interact… socially, economically, and in business.

    The medical field reminds me of the inefficiencies of an insurance firm I used to work for. I was, basically, the Excel guy; and just about all of their practices were up to 1993 standards. Both of these industries need improvement, and the sort of open health information your questions ask about is right along those lines.

    Will we see it soon? I wouldn’t say so. It’s a clash of two polar mindsets. I’m for it.

  • Do you read much Bruce Schneier? http://www.schneier.com

    He compares our treatment of data to pollutants in the 50s and 60s; that we’re currently creating a computing environment polluted with tons of data points, a structure which will undermine privacy, and that it’s a poor legacy to leave to our children. It’s because of this sort of structure, that a little bit of medical data would be enough to identify an individual.

    That being said – this type of data being available would greatly benefit us, unlike the trail created by facebook, google-analytics, etc. It seems to me that it ought to be possible to keep the raw data private, and only the analyses of it made public – but even if this weren’t possible, the advancements it would lead to in treatment are probably worth it.

    Whether or not doing so would be possible is a whole other story.

  • merodoacher2

    Imagine the ability to mine this data for new side effects of existing drugs.

    Do patients prescribed mianserin have longer lives on average?
    Is there any existing drug that incidentally delays the onset of dementia?

    A comprehensive public record of such a kind after anonymization may spur some interesting research directions.

  • Looks like my work (on reversing anonymization) has been linked to 3 separate times on this thread 🙂 It’s really unfortunate — I think most of us agree that there is a huge benefit to releasing this data, and most of your students concur as well, but it is highly unlikely that the privacy issues are going to be worked around any time soon.

    One compromise would be to release aggregate data on a variety of marginals deemed to be interesting/useful.

    • Arvind, what does it take to “work around” the privacy issues? Why can’t we just implement whatever is the current state of the art?

      • gwern

        De-anonymizing is like cryptography – you have to defeat not just current or past attacks, but the future ones as well. (A breach in someone’s privacy in 10 years is nearly as bad as one right now.)

      • The current state of the art is, for the most part, dictated by HIPAA, which mandates removing obviously-identifying information. (Like date of birth; whereas year of birth is OK.) This has worked reasonably well in the context of health information that is shared with specific parties like researchers who don’t have a malicious intent.

        Making medical records public is a whole different can of worms. The naive anonymization that is standard practice today doesn’t really prevent re-identification if you’re up against someone who can write code and can cross-reference the data with readily available auxiliary sources. (This is what I and other researchers have been going around demonstrating.)

        Think of anonymization as a keep out sign rather than a secure lock. There’s a lot of evidence that this might be a fundamental limitation. I would bet against the possibility of anonymizing data while preserving the level of detail that you’re hoping for.

        In the long run, I think society will be forced to move in the direction of lower privacy expectations (of course, that is already happening.) But for now it is going to be a constant tug-of-war.

        There’s a nice game-theoretic formulation: if no one had any privacy over their medical records, there would be a stable equilibrium, and society as a whole would be way better off because everyone would benefit from the availability of data. Right now we’re in another equilibrium where no single party is willing to take the lead and make data public, to the detriment of all.

        Veering off topic here, but I’m interested in finding out if economists have studied privacy from the game-theory perspective (and potentially interested in collaboration if it turns out there’s something interesting to be said). Any pointers would be very helpful. Thanks.

  • Buck Farmer

    Doesn’t the U.S. Census introduce noise on the micro-level that’s designed to cancel out with aggregate analysis?

    I’ve always been a fan of the (possibly apocryphal) Dutch model…leave your windows open, do whatever you want, and studiously avoid looking into other people’s windows.

    I am filled with contempt for the employers (or for their clients) who care that you went drinking and partying in college. I hope that with the new generation, the equilibrium is less privacy but more tolerance.

  • This won’t fly for awhile — too many potential skeletons in the closet relating to race, sex, etc. What if it turned out that a good fraction of those claiming to be Native American (and enjoying benefits from that) were barely genetically distinguishable from Europeans? What if blacks are found to have a higher prevalence of some socially undesirable gene? Or if women are found to have a higher prevalence of neurotic-related genes?

    There’s already such a big stink about these matters, and we are very far from complete open access. Look at the BiDil brou-ha-ha for instance: lots of people don’t want to collect info on race even if it will prolong and improve the lives of blacks. To them, that’s just a Faustian bargain.

    • Firaga

      What if it turned out that a good fraction of those claiming to be Native American (and enjoying benefits from that) were barely genetically distinguishable from Europeans?

      In order to get native american benefits you have to be recognized by a tribe. A few tribes have “blood quantum” requirement that must be attested to by a blood test. Most require that you document direct ancestry to a person recorded on the Dawes rolls.

      There has been Indian interbreeding since Columbus so of course there would be a great deal of genetic similarity. That doesn’t negate the treaty history.

      My great grandmother was Cherokee but I don’t have her birth certificate to prove ancestry and none of the Cherokee tribes accept blood quantum tests.

  • Seems the universe of those who have the motive and the resources to reverse the anonymizing process would be rather small. Make it a crime to do so, just as it’s a crime to release individual census data. I’d think the social benefits far exceed the potential harms.

  • Aron

    Privacy is a very expensive privilege. People should pay into the commons for it.

  • Nobody is going to want to go to a doctor who says he might reveal your medical records to insurers and future employers. Obfuscation seems likely to be insufficiently reassuring – in the face of professional data mining. So: the proposal seems impractical.

  • Hot intern

    So, I’m taking a class from Professor Hanson and he assigns a paper about medical policy. Should I write a paper that takes the conventional view or one that doesn’t?

    He promises that it is just the arguments that count in the grading, that he isn’t biased at all to be more inclined to grade more favorably arguments he agrees with (maybe he’s even given some anecdote about how he is impressed with arguments he disagrees with). Do I really believe that crap?

    So 15 percent of the class is not making the conventional move. Most of them probably not the best students, some too earnest (in Hanson’s class this likely interferes with getting the point too), some stupidly believing they can counter signal, and maybe one person correctly believing he or she can counter signal.

  • Proper Dave

    “Should all medical practice data be published, aside from data identifying patients?”

    I say yes.

    OK now we implement this policy according to the above criteria and run into the “aside from data identifying patients”
    And you find that to do that, you only end up with very vague aggregates…

  • Tony

    The big problem here is that the interpretation of the data will not be entirely rational. Doctors that have high “success” rates under superficial perspectives of the data will be rewarded, and they will cater to those superficial perspectives in ways that undermine effective care. For example, they may refuse to treat patients with a low prospect for recovery, or encourage patients to not be treated for likely future complications.

    Evaluating physician effectiveness with this data will be VERY hard to do in a way that reveals their actual skill. Amateur statisticians will have a field day with unfair criticisms of good doctors, and bad doctors will game that system relentlessly.

  • Bo

    My wife and I talked about this for a while, and it made for an interesting discussion. We both assumed that it would be possible for someone to identify people based on medical history if a serious effort was made.

    I said that privacy is something to be valued, but does the negative utility that the risk of a privacy breach would pose outweigh the positive utility that access to this data might represent? That is, I wouldn’t want my medical records released to the general public, but could I reasonably object if some minimal privacy-protection measures were in place?

    She brought up that people might shy away from needed-but-embarrassing procedures if they thought there was a chance that someone might be able to identify them and publicize the treatment after the fact. This would be especially harmful for people with depression, who have been abused, or, say, need an abortion. This effect, she argued, might persist even if steps were taken to limit access to the data (e.g. only to serious researchers), and even if the data weren’t released until long after the fact (e.g. even after the patient’s death).

    Overall, I was more in favor of letting medical records hang out and my wife was more against it. We were able to agree that we might support an opt-out version of this type of program, if there were some minimal sanitization of the data (enough to discourage casual snoopers, not enough to seriously skew research opportunities), and especially if there was a significant time delay (e.g. decades) between a medical procedure and its publication.

    Great topic! What else do you ask your students, Robin?

  • Doug S.

    I’d be more inclined to favor something like this under a health care system that didn’t require you to hire a lawyer every time you see a doctor.

