Variance-Induced Test Bias

Discussion of the Science article on gender differences in math test variance got me thinking.  Since a test score is a noisy measure of some underlying ability, an unusually high score can come either from an unusually high ability, or from an unusually positive measurement error (or both).  If higher male score variance is due more to a higher male ability variance than to a higher male measurement error variance, then a high female score is more likely to be due to measurement error than is the same high male score.  If so, treating the same score value as the same ability, independent of gender, as is common in school admissions, creates a bias (vs. men) in favor of high scoring, and against low scoring, women.

More precisely, assume that each test score s is a sum s = a + e of an ability a and a measurement error e, and that ability and measurement error are normally and independently distributed with variances A and E.  This implies test score variance is S = A + E, and that mean (and median) ability estimates given scores s are E[a|s] = m+(s-m)*(1-E/S), where m is the mean score.  The discounting factor, 1-E/S, is between 0 and 1.
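
As a minimal sketch of the formula above, the function below computes E[a|s] = m+(s-m)*(1-E/S). The function name and the sample numbers (mean 500, score variance 10,000, error variance 2,000) are illustrative assumptions, not figures from the Science article.

```python
def ability_estimate(score, mean_score, error_var, score_var):
    """Posterior mean ability E[a|s] = m + (s - m) * (1 - E/S).

    Assumes s = a + e with a and e independent and normally distributed,
    so that score variance S = A + E.
    """
    if not 0 <= error_var <= score_var:
        raise ValueError("error variance must lie between 0 and the score variance")
    discount = 1.0 - error_var / score_var  # discounting factor, between 0 and 1
    return mean_score + (score - mean_score) * discount


# Hypothetical inputs: m = 500, S = 100**2, E = 0.2 * S, observed score 700.
estimate = ability_estimate(700, 500, 2_000, 10_000)
print(estimate)
```

With no measurement error (E = 0) the estimate equals the score itself; as E approaches S the estimate shrinks all the way to the mean m.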

Now assume men and women have the same mean score m and measurement error variance E, let R be the ratio of male to female score variance, and let N be the ratio of measurement error variance E to female score variance.  In this case, the ratio of female to male discounting factors is (1-N)/(1-N/R), which is < 1 for R > 1.  For example, if R = 1.16, the mid-estimate from the Science article, then for error fraction N values of 0.1, 0.2, or 0.4, the discounting factor ratios are 0.985, 0.967, or 0.916; that is, female scores must be discounted by these factors (relative to mean scores) to be fairly comparable to male scores.  For example, applied to the math SAT (female mean 504) you’d want to subtract off (again for my sample N values) 3.7, 8.2, or 20.7 points from a 750 point female SAT score to make it comparable to a male score.  (For a 600 point score, you’d subtract 1.4, 3.2, or 8.1 points.)
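
The arithmetic in this paragraph can be checked with a short sketch. The function names are hypothetical; the inputs (R = 1.16, N = 0.1, 0.2, 0.4, female mean 504) are the ones quoted above, and the adjustment is taken to be the score's distance from the mean times one minus the discounting factor ratio.

```python
def discount_ratio(R, N):
    """Ratio of female to male discounting factors, (1-N)/(1-N/R)."""
    return (1 - N) / (1 - N / R)


def points_to_subtract(score, female_mean, R, N):
    """Points to remove from a female score so it maps to the same
    ability estimate as a male score, measured from the mean."""
    return (score - female_mean) * (1 - discount_ratio(R, N))


R = 1.16            # male-to-female score variance ratio
FEMALE_MEAN = 504   # math SAT female mean used in the text

for N in (0.1, 0.2, 0.4):
    ratio = discount_ratio(R, N)
    at_750 = points_to_subtract(750, FEMALE_MEAN, R, N)
    at_600 = points_to_subtract(600, FEMALE_MEAN, R, N)
    print(f"N={N}: ratio={ratio:.3f}, subtract {at_750:.1f} at 750, {at_600:.1f} at 600")
```

The adjustment grows linearly with distance from the mean, so it is largest exactly where admissions decisions at selective schools are made.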

No doubt there are many other factors to consider in comparing male and female candidates, but do any schools make such corrections?  Are they even aware of this bias?  Are they aware but uninterested in correcting for it?

Added 4Aug: College Admission Futures would solve this problem and many more.

  • Matt Huang

    Not only am I convinced that schools are not aware of this bias, but I would be willing to bet that even presented with the evidence of such a bias, they would refuse to correct for it.

  • http://knol.google.com/k/james-miller/james-miller/1j9f9ffxxeue5/1# James Miller

    Too bad for you that your result didn’t come out the other way. If you had come up with a statistically valid justification for subtracting off from high scoring male SAT scores then you would have been declared one of the most important academics of our time by many politically correct professors. And I bet you could have made millions consulting for elite college admissions offices.

  • conchis

    Not quite the same issue, but Oxford was in the news a couple of years back when it emerged that some admissions fellows were discounting female students’ grades on the basis that they were more likely to reflect conscientiousness than talent.

    What would we expect about the degree to which score variance is driven by ability vs. errors? I know there have been suggestions that females are more risk averse test-takers, which would presumably lower their error variance relative to males. More general concerns about stereotype threat etc. could also suggest that the expected error in female scores is negative — though I’m unsure how much weight to put on such factors.

    More generally, I would think that there must be work out there on the predictive ability of test scores in determining some set of relevant outcomes (pass rates, job placements, etc.) for males and females, which might be more directly relevant to the question?

  • Andrew Breese

    I love this post. Narcissistically, cause I’ve thought of it too 🙂

    Another way of thinking about the core issue: Information/evidence (here, about the college candidates: sex or race or SES or library rental history) ALMOST ALWAYS continues to MATTER; better, newer information just reduces its importance. New information (an SAT score, a single day’s results from random questions) hardly ever completely subsumes (obviates) all the earlier evidence.

    Call all the Ashkenazim who score 700V “Judd.” Call all the gentiles who score 700V “Bob.” I would make so much money if anyone wanted to bet on the Bobs’ versus the Judds’ retests!!

    Matt Huang is completely right, and dryly understates it. Schools actually “correct” in the OPPOSITE directions by being overly impressed with unusual results, both as a matter of public policy (“diversity”) and by going gaga over dramatic miracle-type stories (doing poorly in the past is “hardship”; the more it contrasts with a current test result, the more is “overcome”).

    @James Miller: Indeed.

  • conchis

    P.S @James Miller: it’s easy enough to get the result to come out the other way. Just fix the ability variance A equal for males and females, instead of the error variance E. Then the discount ratio for females to males is just R.
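
    A minimal sketch of this alternative assumption, for illustration only: hold ability variance A equal across groups and let male score variance be R times female score variance. The function name and the sample ability fraction are hypothetical.

```python
def discount_ratio_equal_ability(R, female_ability_frac):
    """Female-to-male discounting factor ratio when ability variance A
    (not error variance E) is the same for both groups.

    female_ability_frac is A / S_f; male score variance is S_m = R * S_f.
    """
    female_discount = female_ability_frac        # A / S_f
    male_discount = female_ability_frac / R      # A / (R * S_f)
    return female_discount / male_discount


# Hypothetical inputs: R = 1.16 and ability making up 80% of female score variance.
print(discount_ratio_equal_ability(1.16, 0.8))
```

    Under this assumption the ratio is R regardless of the ability fraction, so it is male scores that would be discounted more heavily.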

  • http://profile.typekey.com/halfinney/ Hal Finney

    From what I could find, approximate mean and standard deviation for SAT math scores are 500 and 100, so variance is about 10,000. Robin’s exemplar ratios N of measurement error variance to score variance, 0.1, 0.2 and 0.4, would therefore correspond to measurement error standard deviations of 32, 45 and 63 respectively.

    Juniors who re-take the SAT as seniors gain an average of 13 points while 1 in 25 gain 100 points or more. If we assume that the 13 points is due to greater experience and intelligence, this suggests that 4% of students score 87 points above expectation due to measurement error. That would imply a measurement error SD of about 50, roughly in the middle of Robin’s sample range, close to the N=0.2 value.
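
    The back-of-envelope numbers here can be reproduced with the sketch below. It assumes, as the comment does, a score SD of 100, and it treats the 87-point excess as a single normally distributed error draw; if the gain were instead modeled as the difference of two independent noisy scores, the implied per-test SD would be smaller by a factor of √2.

```python
from math import sqrt
from statistics import NormalDist

SCORE_VAR = 100 ** 2  # assumed SAT math score variance (SD of 100)

# Measurement error SDs implied by error fractions N = 0.1, 0.2, 0.4.
error_sds = [round(sqrt(N * SCORE_VAR)) for N in (0.1, 0.2, 0.4)]
print(error_sds)

# 1 in 25 retakers gain 100+ points; net of the 13-point average gain,
# that is 87 points above expectation at the 96th percentile.
z_top_4_percent = NormalDist().inv_cdf(1 - 1 / 25)
implied_sd = (100 - 13) / z_top_4_percent
print(round(implied_sd, 1))

# Error fraction N that this SD would correspond to.
implied_N = implied_sd ** 2 / SCORE_VAR
print(round(implied_N, 2))
```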

  • http://entitledtoanopinion.wordpress.com/ TGGP

    GNXP on the issue here. With a picture for those who don’t feel like reading.

  • http://neuraltransmissions.wordpress.com MZ

    No doubt there are many other factors to consider in comparing male and female candidates

    No doubt. Correcting the scores by 1, 5, or even 20 points doesn’t make much of a difference in the overall quality of an admissions application. So you have to consider the cost-benefit ratio of implementing the corrections. It’s probably not worth doing.

    I know a guy who got into Princeton with a 1200 on the SATs. Not bad, but not stellar either. He also wrote articles for the local (not school) newspaper while in high school and was an accomplished violin player. Just goes to show that you can make up for hundreds of SAT points with other skills, talents, and accomplishments.

  • http://hanson.gmu.edu Robin Hanson

    Hal, retest variance gives a lower bound on error variance; we also expect a systematic error, which is the same for each retest. If a systematic variance was about the same size as an unsystematic variance of N=0.2, then we’d really have near N=0.4.

    MZ, so you’d predict that if some school did make this change people wouldn’t bother to lobby to change it back?

  • http://www.scienceblogs.com/gnxp razib

    the berkeley math dept. is sexist!

    also, your blog makes me want to throw up 🙂

  • http://www.iSteve.blogspot.com Steve Sailer

    Right, as was first pointed out to me in 1999, the black guy who got a perfect score on the SAT is less likely to retest at that level than the Ashkenazi guy with the same high score. So, a Bayesian approach to college admissions would knock some points off black high scorers on the grounds that their impressive scores are more likely to be flukes.

    But no college does that, and it’s easy to see why. Colleges have a hard enough time defending the use of the politically incorrect SAT without doing something so prima facie unfair that it can only be justified by an insight so sophisticated that it eluded even the Overcoming Bias boys for years.

    Overall, it’s better just to rely upon the relatively high consistency of the SAT and ACT in a colorblind manner than to play Bayesian games, as interesting as they are to contemplate.

  • http://hanson.gmu.edu Robin Hanson

    Steve, shouldn’t we recommend all bias corrections we can think of, regardless of how politically popular they are? If we only suggest politically popular corrections, how sure can we be that will add up to a net improvement?

  • http://www.thechiao.com/wordpress Chiao

    Is there any reason to suppose that the error would be additive, as opposed to multiplicative? What if S = A * E?

  • Andrew Breese

    Robin,
    shouldn’t we recommend all bias corrections we can think of, regardless of how politically popular they are? If we only suggest politically popular corrections, how sure can we be that will add up to a net improvement?

    If any of us suggested ALL the corrections we understand (as obsessive expert generalists)…we’d be ignored (as out-of-touch) before having a chance to be appreciated.

    Relatedly, Eli is right to build some complicated ideas up, piece by piece.

    Your personal list of “absurd” beliefs is awesome (to me) and laughably outlandish (to most, including essentially everyone whose powers we hope to convince). For a relatively uncurious (overconfident) audience, would you want to LEAD with that list?

    Sadly, we can’t be much SURE that ANYthing we advocate will lead to good (due to likely nonreactions, bad reactions, misapplications, unintended consequences…). In various contexts, we should just “pick our spots” and work/hope for the best.

    There is TONS that is simply “off the table” nowadays, or in any existing polis. Steve Sailer, ironically (because you’re asking HIM if “we” should be more extremely whole-truth-and-nothing-but), addresses more politically infeasible topics more in-the-trenches persuasively than anyone I know…and his underpopularity speaks volumes.

  • Douglas Knight

    How can we tell whether males have larger error variance or ability variance? If it’s per-test error, it’s easy to measure, but is the data available?

  • Carl Shulman

    This has been discussed at Gene Expression in years past. I agree that the prima facie unfairness is too great to transparently implement an algorithm that made use of this information. Even if it were understood to improve accuracy, advocates would raise a danger of a positive feedback effect, where members of groups to which a statistical penalty is attributed feel the system is rigged against them and disidentify with it, further reducing their performance and increasing the penalty.

    A related phenomenon is that when your measure of something that varies between individuals (wealth, intelligence, etc) is noisy, you will tend to underestimate the degree to which it is influenced by any particular factor (genes, in utero environment, schooling, etc). Psychometricians do know about the effects of imperfectly reliable tests and at least sometimes adjust for them.

  • Lara Foster

    Blech- you all need to get over test-scores and priors and just look at what people *do*. The truly talented will make their way whether or not they get into Princeton… Tests seem to grossly undervalue real-world ability… navigating bureaucracies, dominance hierarchies, basically getting things done… What’s truly annoying is when someone who is *clearly* talented by all real-world measures is ignored for not having the proper intellectual pedigree… No Ivy-league degree? Spent 5 years as a real-estate agent in Beijing? 1 and 1/3 kids? Blah! Away with you heathen! Loser fuck-up!

  • http://hanson.gmu.edu Robin Hanson

    What is the point of social science, or indeed any analysis, if we only ever use it to offer support for beliefs people already have? My argument above is very simple and solid, and took only four paragraphs and two simple algebraic equations. Why ever bother with a twenty page journal article, or a three hundred page book, full of math if even this short post is too complex to persuade anyone who doesn’t already believe the conclusion? Shall we just wait until society decides, for example, that we need affirmative action and then go searching for data and models to support that conclusion, and then wait for society’s next political fashion so we can go find support for that?

  • http://jamesdmiller.blogspot.com/ James Miller

    Robin wrote “What is the point of social science, or indeed any analysis, if we only ever use it to offer support for beliefs people already have? My argument above is very simple and solid… Why ever bother with a twenty page journal article, or a three hundred page book, full of math if even this short post is too complex to persuade anyone who doesn’t already believe the conclusion?”

    Colleges do a horrible job at rationally considering issues of race and gender. Thus, the fact that the argument you present here won’t persuade anyone to change their position doesn’t imply that on most issues college professors are unwilling to be persuaded by logical arguments.

  • Andrew Breese

    if even this short post is too complex to persuade anyone who doesn’t already believe the conclusion?

    Nah. It’s merely too complex to persuade 99%+ of those who weren’t already there 😉

    (To be persuaded, someone must both understand your analysis on its face + be confident that there’s nothing for another side that’s unspoken and more important.)

    All good arguments matter to SOME people (however few)…and that has to be enough to justify publishing. I’m reminded of the “Elite For The Elite” Demotivator a few weeks back.

    Robin, your broadcasting niche is well-established to include WEIRD. So, don’t hold back! (…much, if any…)

    …though I’d be curious to see the different reaction (even here at OB) if you posted the obvious even stronger and more striking version of your 4 simple paragraphs for racial differences (…with their differing means as well as variances) instead of gender. Bet on the amount of firestorm (now or later)? On impact to your career? (especially any govt-funded advisory role you might accept…)

    I wish each truly great WEIRD idea (like Idea Futures) could attract several popular advocates with otherwise perfectly mainstream reputations…

  • Andrew Breese

    Shall we just wait until society decides, for example, that we need affirmative action and then go searching for data and models to support that conclusion, and then wait for society’s next political fashion so we can go find support for that?

    Whether we’re the ones to do it or not…surely you’ve noticed that the marketplace most rewards EXACTLY THAT! So, it will get done (however tortuously — e.g., Rawls) and it will break big.

    Perhaps the most effective subversion (of a rising dominant trend you actually oppose) would be to become its best spokesperson (and therefore gain chances to limit its scope!!)…if only one were psychologically capable of such.

  • Carl Shulman

    Robin wrote:

    “What is the point of social science, or indeed any analysis, if we only ever use it to offer support for beliefs people already have? My argument above is very simple and solid, and took only four paragraphs and two simple algebraic equations. Why ever bother with a twenty page journal article, or a three hundred page book, full of math if even this short post is too complex to persuade anyone who doesn’t already believe the conclusion?”

    The post is not too complex to convince many quantitatively sophisticated academics that the bias discussed exists, but it is too complex for many other academics, not to mention unsophisticated, rationally ignorant, rationally irrational voters, particularly if it is presented inaccurately by the media or political advocates. The people to contact on this would be the Educational Testing Service’s research division or the College Board, organizations of psychometricians that produce score interpretation recommendations for their tests. I expect that they will understand the point of the post but will also explain that the political pressures they operate under prevent them from drawing attention to it.

  • Lara Foster

    Robin- Nothing in your post proposes what to do about said bias, or even suggests that anything should be done… It is an exercise in simple mathematical equations and that is *all.* What did you think we would say, “Oh dear! Women with high math scores *still* are likely to suck at math… better not hire all those female grad-students just because they got high GREs!” I mean, God, what *was* the point of your post? “Nah nah nah nah nah nah- you actually are different in spite of equally test scores! I’m smarter than you are ’cause I gotta penis! And *that* prior is the most reliable predictor of success *anyone* has to go on! I mean, how better to cut down half of the work of selection, then automatically taking a man over a woman! I mean, look at history: Napoleon, Mussoulini, Hitler, Churchill, General George Washington, every US president for that matter! Women… what’s the point… Oh yeah, sex! Damn bitches seem to want insane criminal types for some reason…Gee… Stupid hoes… You know, prisoners do really appreciate it more than I ever could… Maybe I should spend 5 years in the slammer ‘cuz then I’ll actually get laid! Maybe they could teach me how to keep my hoes in the rows…”

    Really.

    And you wonder why women can’t stand the things you post?

    There are few enough people willing and capable of doing what is necessary without excluding them prima facie for not fitting your priors. Stop trying to divide people by color, gender, sexuality, whathaveyou and come up with some real-world, actually usable solutions already.

  • Aspiring Vulcan

    Lara, if it was such an “exercise in simple mathematical equations”, why are you getting so worked up over it? I wonder how the world will progress when a man is not allowed to post simple mathematical equations without accusations of being sexist.

    And why can’t a person post a problem without proposing a solution? I don’t see any precedent for this attitude in the world of science. That’s like telling Young, “Oh, you performed that double slit experiment! You’ve just shown Newtonian Physics is wrong. What’s the point? I don’t see your solution to the problem. Go work on a solution instead of coming up with problems. Idiot.”

  • http://www.cmp.uea.ac.uk/~jrk Richard Kennaway

    conchis: Not quite the same issue, but Oxford was in the news a couple of years back when it emerged that some admissions fellows were discounting female students’ grades on the basis that they were more likely to reflect conscientiousness than talent.

    Interestingly, in the cited discussion, some people argued that men excel disproportionately in various fields because of their greater “obsessiveness”; and this was intended not to diminish their achievements but merely to explain why they achieve. But it seems that in the eyes of some Oxford admissions fellows, when women work hard, it’s mere “conscientiousness”.

    Perhaps that is specific to the culture of Oxford University. I have heard someone from Oxford denigrate someone as having merely a “hard work first [class honours degree]”. Is that a concept that would even occur to anyone at an American university?

  • Carl Shulman

    Knowing what a Bayesian approach tells you about the expected performance of various candidates is useful information even when you want to trade off the feature being measured for the promotion of social integration, or to reduce the salience of such unchosen features: it helps to measure the extent of discrimination and to make the tradeoffs clear. The political conflict over such analysis comes because people don’t want others with different values or biases to be bolstered in views that they oppose.

  • Vladimir Slepnev

    Lara, Robin has identified an existing bias in favor of women, not proposed a policy in favor of men.

  • Tim Tyler

    The GNXP references given so far do not appear to mention the topic under discussion here. Rather they simply discuss the well-known male/female variance difference.

  • http://yudkowsky.net/ Eliezer Yudkowsky

    I thought over this issue a while back, and concluded that the argument against such “corrections” is not so much an argument against the principle of regression to the mean, but rather, an argument that people’s social reputation should be construed in such fashion as to depend only on proximal factors under their volitional control.

    The point of good reputation / bad reputation is not just to provide information about people, but to encourage people to behave well in order to acquire good reputations – this requires reputations to depend on behavior. Consider the effect on the Tit for Tat equilibrium, of some agents starting off with a string of Defections that they never actually made, listed against them.

    If you want to distinguish gang members and, say, not offer them taxi rides, then discriminate against a certain style of clothing, not against people colored black. Clothing is volitional, color is not.

  • steven

    Eliezer, that sounds to me like an argument to make corrections for both non-volitional and volitional factors, and then make some *extra* corrections for volitional factors where appropriate. It also seems important not to let game-theoretical factors like this influence your actual beliefs (e.g. your estimate of someone’s intelligence) as opposed to your actions.

  • steven

    I mean, what you’re saying is that, when two people score the same on the SAT but prior knowledge says one is probably a bit smarter than the other, we should discount this prior knowledge because doing so rewards the stupider one for doing better on the SAT than reflects his intelligence, even though doing better on the SAT than reflects your intelligence doesn’t benefit others in any obvious way.

  • http://hanson.gmu.edu Robin Hanson

    All, I find it curious that my argument is seen as being against women, when it only hurts high scoring women – it helps low scoring women. Why is it that when we discuss wages the usual presumption is that it is better to help low wage folks, even if that hurts high wage folks, but the opposite presumption holds for female test scores?

    Eliezer, with steven I don’t follow you – we are usually comfortable with social reputations that depend on a great many things not under volitional control. People have social reputations for being pretty, witty, having popular friends, having successful or failed businesses, and so on. And the adjusted score I propose would still be under your volitional control.

    Lara, I thought I was clear about my proposed solution: adjust the scores when ranking candidates.

  • Lara Foster

    Vulcan- That was posted in response to Robin’s comment: “Why ever bother with a twenty page journal article, or a three hundred page book, full of math if even this short post is too complex to persuade anyone who doesn’t already believe the conclusion? Shall we just wait until society decides, for example, that we need affirmative action and then go searching for data and models to support that conclusion, and then wait for society’s next political fashion so we can go find support for that?”

    Which I interpreted as suggesting the need that we actually *do* something about women’s test scores not meaning the same thing as men’s, and even women’s achievements having more to do with luck than skill just based on variance and priors… Indeed, he just said: “Lara, I thought I was clear about my proposed solution: adjust the scores when ranking candidates.”

    Which is absurd and dangerous! It doesn’t matter if he’s right about the math- when you start introducing systematic discriminatory practices based on priors beyond people’s control, you get totalitarianism… It’s just… DISGUSTING. And from a *libertarian*??? INCOMPREHENSIBLE. I mean, lets make it so black people *can’t* vote again, since statistically they are less likely to be up-to-date on the news… I’m disturbed that so many people are flocking to Robin’s aid on this… I guess I can start to see how Hitler was able to convince the Germans that Jews were sub-human in spite of having any seeming humanity…

    Robin, defending this disgusting prejudgement: “People have social reputations for being pretty, witty, having popular friends, having successful or failed businesses, and so on. And the adjust score I propose would still be under your volitional control.”

    BAD EXAMPLES!!! Pretty: Lose the fucking weight, dress better, get nicer beauty care products… trust me, unless you’re quasimodo, this makes a *tremendous* difference in how you will be perceived by others. You don’t need to be blessed with ‘natural’ beauty, whatever that’s supposed to mean anyway.

    Witty: Well, that’s just obvious. Go read an Oscar Wilde Play and go to a few bars to practice… Demosthenes? Anyone?

    Having Popular Friends: Now this is a moral choice… Do I *use* that person? I don’t like to, and so opt out, but yeah, this would not be a problem if one was determined just to look good for the judges… Maybe I should change my name to Larry and start dressing up as an orthodox Jew to please the great and powerful Hanson…

    Having failed or successful businesses: Well, there is an aspect of luck here, but there is also a tremendous aspect of skill and foresight involved. A friend of mine made it semi-big in the dot-com bubble, didn’t sell his company, and lost it all… Learning experience maybe?

    But don’t listen to me… I’m just a woman… My priors tell you that in all odds anything I have done is more likely luck… They asked me if I cheated on my CTBS memory test with the word associations when I was in 3rd grade, cuz I got them all right… Oh, lucky me…

  • http://hanson.gmu.edu Robin Hanson

    Lara, there is difference between having some control over a reputation factor and having complete control. You would still have some control over adjusted scores, just as you have some but far from complete control over being pretty.

  • Lara Foster

    Robin- you just don’t get it. How big of an effect do you really think this variance has to be justifying deliberately discounting it? You, of all people, should know that human beings are extraordinarily bad at appropriate mental calculations… let’s just say it’s a 2% discount.. well, you tell the judges that and in their minds there’s this difference that they know they have to account for, but it’s in what in total is a subjective decision… so Jane is slightly more qualified than Jim, but we know Jane’s a woman, and we *should* discount for that, so take Jim… Now Moira is more than slightly more qualified than Marco, but this bias is already in the judges head, so they *overcompensate* and take Marco, reflexively… This is to a large extent *already* how it is without giving the judges more ammunition for automatically taking the man over the woman or the white over the black… Do you really think women are overvalued in science? I personally don’t like affirmative action, because it casts a shadow on the abilities of the people who are there legitimately… ‘Black woman? Oh, affirmative action case, obviously.’ Then again, would the judges ever let her in without it?

  • Tim Tyler

    Why is it that when we discuss wages the usual presumption is that it is better to help low wage folks, even if that hurts high wage folks, but the opposite presumption holds for female test scores?

    Excessive taxing of the rich and giving to the poor would be bad – if it caused all your talented individuals to emigrate to where their efforts would be better appreciated.

    However, the fact that the rich tend to set the laws, control the police, run the country, and control the voters like sheep may mean that the extent of wealth inequalities is larger than is necessary for efficient operation – at least in some countries.

    I don’t see an analogous effect for female test scores.

    Also, the distribution of wealth is a bit different from the distribution of intelligence – the tail of the distribution of wealth extends mostly in one direction – so there are many poor folk and only a few rich ones.

  • Tim Tyler

    The taxi driver replies:

    Look, I’m trying to stop my taxi getting trashed, and make sure my fares pay. I prefer to leave the job of self-sacrifice aimed at reforming society up to other people. Also, isn’t much of the point of reputations that they are difficult to forge? If you could buy a reputation as easily as you can buy a suit, all my fares would be wearing them.

  • Chris

    Robin: *My argument above is very simple and solid, and took only four paragraphs and two simple algebraic equations. Why ever bother with a twenty page journal article,…*

    I’m persuaded by your math, but I’m not the average person, or even the average academic. There is value in a little more exposition, especially for a fact of such general interest.

    My suggestion: write a somewhat longer explanation for people less familiar with statistics. A short explanation of the statistics you use (I know it’s very basic), an analogy to something non-controversial (e.g., pixel values in an image) showing that this is just a general result of noisy measurements, followed by your conclusion.

    You might also look for a place in academia where female test scores overpredict performance, and see if the ratios there agree with your data.

    I’d also suggest leaving out the normative conclusion that we *should* correct for this bias. Stick to the facts, it makes irrational criticism more difficult.

  • http://yudkowsky.net/ Eliezer Yudkowsky

    What I’m suggesting is that “reputation” is a construct that glues society together. It is not the same as your expected performance. “Reputation” is something you’re supposed to earn only by actual performance, because part of the point of keeping track of reputations, and treating people solely based on their reputations, is to encourage performance. If you allow anything into the reputation-construct that isn’t performance, even the smallest shred of it, human beings react to this as tremendously unjust and they will cease to respect the system. (At least if they have the prior sense that the metric was supposed to be performance-based. This is one of those systems with a selective off-switch – i.e., all the fantasy novels about the heir to the throne with royal blood, and so on.)

    If you find yourself saying something like, “People shouldn’t be subject to systemic penalties they can’t avoid, for something they never did, that they can’t possibly dig themselves out from under no matter what they do,” then this is the moral intuition I’m defending and trying to formalize. Deploying any possible test you like, is one matter – there’s at least the theoretical possibility of trying harder and scoring better. If you penalize someone for sex or color, they can’t change their sex or color by trying harder.

    Now, yes, there are kids out there who won’t score 1600 on the SAT no matter how hard they try. But that’s a different category of injustice, that will take a higher order of technology to remedy.

    If you can’t see the difference between these types of injustice, then I suspect you of trying too hard to ignore it.

    And yes, the second kind of injustice is actually much worse – but the first kind of injustice is a human injustice, not an injustice of Nature; so it is easier to remedy; and therefore, easier to acknowledge as a problem.

  • http://hanson.gmu.edu Robin Hanson

    I agree we should be careful to separate data from inference about data, and I agree we are often irate that others draw what we consider to be inaccurate inferences about us from data about other people. But most uses of the word “reputation” surely do depend not only on what people did, but also on factors outside their control, many of which can be greatly influenced by the rest of us. For example, whether you get accepted to a school depends now on your nation and state of residence, whether an admission officer just likes your style, whether other students voted for you for class president, whether a teacher liked you enough to write a nice letter of recommendation, whether you were pretty enough to be selected for the lead in a play, and whether a school paper editor thought your articles popular enough with readers. Even your actual SAT score depends on how many times you can afford, in time and money, to retake the test (most schools just use your max score).

  • steven

    It’s hard enough for me to calculate my actual beliefs, I’m not sure I also want to have to calculate the beliefs I would have had if I knew some things but not others.

  • steven

    …and worse, people might confuse those two things.

  • Lara Foster

    Robin- And you want to add more unjust appraisals to this list, why?

  • j

    The results of ONE test are highly unreliable. It does not filter out a certain percentage of lucky fellows, those wunderkinder who later, unavoidably fail. It could be easily avoided by retesting the outliers.

  • Aspiring Vulcan

    Lara, I agree with some of what you say, especially this:

    so Jane is slightly more qualified than Jim, but we know Jane’s a woman, and we *should* discount for that, so take Jim… Now Moira is more than slightly more qualified than Marco, but this bias is already in the judge’s head, so they *overcompensate* and take Marco, reflexively

    However, I’m not sure if I agree with “when you start introducing systematic discriminatory practices based on priors beyond people’s control, you get totalitarianism”. We systematically discriminate against kids, by not allowing them to drive until they’re 16/18 because statistics show that kids really suck at driving cars.

    Also the Hitler comparison was unnecessary. (Godwin’s Law)

  • http://profile.typekey.com/halfinney/ Hal Finney

    It’s interesting to imagine actually applying this policy, and the resulting implications.

    It would be a reversal of affirmative action – women and under-represented minorities would be penalized on their test scores, while those minorities who do better than average, perhaps Asians and Jews, would get a bonus (here we are focusing on high end scores). There might be an effort for members of certain minorities to identify sub-groups which have scored exceptionally well. Perhaps females do somewhat worse than males, but females from Massachusetts (I’m just making this up) do better, so they deserve a bonus rather than a penalty. Meanwhile members of below-average minorities would try to hide their minority status in order to avoid the penalty.

    So from one direction there would be constant pressure to discriminate more finely in order to benefit groups whose priors are high, and in the other direction there would be attempts to avoid being caught by fine-tuned discrimination that would identify low-prior groups. At the same time people might try to lie about characteristics in order to gain the benefits of being associated with a high-scoring group.

  • Cyan

    All, I find it curious that my argument is seen as being against women, when it only hurts high scoring women – it helps low scoring women.

    Since people are probably imagining an admissions officer with high standards doing an evaluation based partly on SAT score, the general view is not inaccurate. If there’s a fixed Pass/Fail level, then your argument either hurts high-scoring women or helps low scoring women, but not both. (Which possibility occurs depends on whether the bar is set above or below the mean.)
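
    The pass/fail point can be checked against the post's own formula. The following is a minimal sketch, assuming the post's normal model with R = 1.16 and a female mean of 504; the error share N = 0.2 is one of the post's sample values, and the two bars (700 and 400) and the function names are hypothetical.

```python
# Sketch of the post's adjustment applied around a fixed pass/fail bar.
# R, N and the mean of 504 come from the post; the bar values are hypothetical.

def discount_ratio(n_frac, r=1.16):
    """Ratio of female to male discounting factors, (1 - N) / (1 - N / R)."""
    return (1.0 - n_frac) / (1.0 - n_frac / r)

def adjusted_score(score, mean=504.0, n_frac=0.2, r=1.16):
    """Shrink a female score toward the mean to make it comparable to a male score."""
    return mean + (score - mean) * discount_ratio(n_frac, r)

def outcome_change(score, bar, **kw):
    """Return 'hurt', 'helped', or 'unchanged' for a pass/fail decision at `bar`."""
    before = score >= bar
    after = adjusted_score(score, **kw) >= bar
    if before and not after:
        return "hurt"
    if after and not before:
        return "helped"
    return "unchanged"

if __name__ == "__main__":
    for bar in (700, 400):  # hypothetical bars above and below the mean
        changes = {outcome_change(s, bar) for s in range(200, 801)}
        print(bar, sorted(changes))
```

    With a bar above the mean, only scores just above the bar can change outcome, and they change for the worse; with a bar below the mean, only scores just below it change, and for the better.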

  • Billy C

    When discussing policy adjustments, I’m surprised no one has mentioned trying to decrease error, which could be done fairly simply by increasing the number/length of tests.

  • http://knol.google.com/k/james-miller/james-miller/1j9f9ffxxeue5/1# James Miller

    Robin wrote: “All, I find it curious that my argument is seen as being against women, when it only hurts high scoring women – it helps low scoring women.”

    Colleges with high average student SAT scores reject many applicants, but colleges with low average student SAT scores often reject few students or have open admissions. Consequently, a woman with an SAT score well below the mean would gain very little if this score were slightly increased, whereas a woman with a score much above the mean would have a high expected loss from having her SAT score slightly decreased. Thus Robin’s plan would harm high scoring women but not much help low scoring ones. Still, this is not a valid reason for opposing it.

    Razib who wrote “your blog makes me want to throw up” cites an article saying that the ratio of men to women who have high IQs is much lower than the ratio of men to women at Berkeley’s math department. Razib thinks this means that Berkeley’s math department is sexist. But before you label Berkeley’s math department sexist you need to prove that men with high IQs don’t like math more than women with high IQs do.

  • http://www.allancrossman.com Allan Crossman

    I’m hesitant to enter this discussion, and I especially don’t want to say something absurd like “Even if your maths is right, you’re still wrong.” However:

    All, I find it curious that my argument is seen as being against women, when it only hurts high scoring women – it helps low scoring women.

    I suspect that, as a matter of psychological fact, the annoyance of being told “you will never have members in the top 1%” is greater than the comfort of “you will never have members in the bottom 1%”.

  • http://occludedsun.wordpress.com Caledonian

    I especially don’t want to say something absurd like “Even if your maths is right, you’re still wrong.”

    That isn’t absurd. Using calculations in an argument doesn’t magically proof it against errors of all sorts, any more than spelling all the words in an argument correctly means it has to be right.

    The greater problem I find with what Robin suggests is that we want the tests to tell us the performance of individuals as individuals, not as members of a group category. Statistics are fantastic at making statements about groups, and for giving us grounds for educated guesses about individuals when we lack the necessary data, but they aren’t a substitute for that data, or an excuse not to bother collecting it.

    One of the reasons affirmative action has worked out so poorly is that it treats people as members of classes rather than individuals – classes that are presumed to be inferior, too. That’s not the intent, but that’s the result.

  • http://www.allancrossman.com Allan Crossman

    Caledonian: Using calculations in an argument doesn’t magically proof it against errors of all sorts

    You’re quite right, of course.

    Still, to the extent that the argument was solely about maths, there can be no rebuttal except criticizing the maths. But extending the argument to actual recommendations brings in all sorts of unintended consequences that would need to be thought through.

  • Anna Salamon

    If we have multiple indicators, such as multiple test scores, grades, or research performance, you lose predictive power by working with gender-corrected test scores rather than retaining the gender *and* the test scores. For example, suppose that an SAT-M score of 800 in a woman has the same predictive implications as an SAT-M score of 750 in a man. Three SAT-M scores of 800 in the same woman do not then have the same predictive implications as three SAT-M scores of 750 in the same man.

    If we want to maximize predictive power, gender-correction should be applied once at the end (to the weighted sum of indicators that most correlates with the variable we’re trying to predict) rather than being applied separately to each piece. The “correct once at the end” procedure also has the virtue that it doesn’t automatically exclude women from the top percentile in many cases where test-by-test gender-correction *would* automatically exclude women from the top percentile. (I’m not advocating either procedure as social policy.)
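
    The repeated-score point can be made concrete with the post's numbers. This is a minimal sketch, assuming the post's normal model, measurement errors that are independent across sittings, and the post's sample values N = 0.4 and R = 1.16; the function names are hypothetical.

```python
# Sketch: how repeated scores shrink the gender adjustment from the post.
# N (error share of female single-test score variance) and R (male/female score
# variance ratio) follow the post; independent errors across tests are assumed.

def discount_factor(ability_var, error_var, n_tests):
    """Weight on the average of n_tests scores in the estimate E[a | scores]."""
    return ability_var / (ability_var + error_var / n_tests)

def male_equivalent(avg_score, n_tests, mean=504.0, n_frac=0.4, r=1.16):
    """Male average score implying the same ability estimate as a female average."""
    error_var = n_frac  # female single-test score variance is normalized to 1
    female = discount_factor(1.0 - n_frac, error_var, n_tests)
    male = discount_factor(r - n_frac, error_var, n_tests)
    return mean + (avg_score - mean) * female / male

if __name__ == "__main__":
    for n in (1, 3):
        # points subtracted from a female average of 750 over n sittings
        print(n, round(750 - male_equivalent(750, n), 1))
```

    Averaging more sittings divides the error variance while leaving the ability variance alone, so the two discount factors converge and the single-test correction overstates the adjustment for a candidate with several consistent scores.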

    There’s a literature on how people rate identical resumes with male and female candidates or black and white candidates. Does anybody know if the correction factor that people unconsciously apply is bigger or smaller than the correction factor that would be recommended by Bayesian analysis? Does anybody know if the correction factor decreases in the correct way when there are multiple ability-indicators on the resume?

    On a related note: much of what bothers me about sexism amounts to the effects of indicator-by-indicator gender-correction that add in a distorted way when people don’t adjust for gender as a common cause of their perceptions. For example, suppose my knowledge that I’m female affects my estimate of my math ability, and suppose Bob’s, Jim’s and Sarah’s knowledge that I’m female also affects their estimates of my math ability. If I use Bob’s, Jim’s and Sarah’s estimates to update my own estimate of my math ability, and I don’t realize that a large component of their estimates is based on my gender and that my gender should be factored in only once, I end up pulling my estimate too strongly toward the female prior.
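
    A toy Gaussian version of this double-counting, offered only as a sketch: assume every observer shares the same prior for the group, each sees one independent noisy signal of ability with the same error variance, and each reports a posterior mean. All names and numbers below are hypothetical.

```python
# Sketch: counting a shared group prior once versus once per observer.
# Assumes a normal prior and independent normal signals with equal error variance.

def pooled_once(signals, prior_mean, prior_var, error_var):
    """Correct pooling: the prior enters once, the signals enter once each."""
    prior_prec = 1.0 / prior_var
    signal_prec = len(signals) / error_var
    weighted = prior_mean * prior_prec + sum(signals) / error_var
    return weighted / (prior_prec + signal_prec)

def pooled_naively(signals, prior_mean, prior_var, error_var):
    """Naive pooling: average posterior means that each already contain the prior."""
    prior_prec = 1.0 / prior_var
    signal_prec = 1.0 / error_var
    posteriors = [
        (prior_mean * prior_prec + s * signal_prec) / (prior_prec + signal_prec)
        for s in signals
    ]
    # equal precisions, so treating the posteriors as independent evidence
    # reduces to their simple average
    return sum(posteriors) / len(posteriors)

if __name__ == "__main__":
    signals = [2.0, 2.0, 2.0, 2.0]  # self plus three others, all above the prior mean
    print(pooled_once(signals, 0.0, 1.0, 1.0))
    print(pooled_naively(signals, 0.0, 1.0, 1.0))
```

    In this toy setup the naive estimate sits closer to the prior mean than the correct one, because the prior is weighted as if it were four separate pieces of evidence.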

    Is there any literature on how groups of people socially update their ratings of the identical resumes?

    Finally, regarding the high-ability tails of low-performing groups: I would expect the high-ability tail of women’s performance to be higher than one would expect from fitting a normal distribution to women’s performance as a whole. Gender (and race) isn’t all that binary as a causal biological factor; men have nipples, which do in some circumstances produce milk, and genetic/developmental variants in women can similarly dip into the pool of biological possibilities built around the male normal. Is there any evidence for or against this conjecture in the case of women’s cognitive performance?

  • http://yudkowsky.net/ Eliezer Yudkowsky

    For example, suppose my knowledge that I’m female affects my estimate of my math ability, and suppose Bob’s, Jim’s and Sarah’s knowledge that I’m female also affects their estimates of my math ability. If I use Bob’s, Jim’s and Sarah’s estimates to update my own estimate of my math ability, and I don’t realize that a large component of their estimates is based on my gender and that my gender should be factored in only once, I end up pulling my estimate too strongly toward the female prior.

    Good point, Anna. Though I would point out that many other kinds of reputational reasoning are also going to double-count lots of evidence. Actually, this opens up quite a large problem to think about. I’d previously thought in terms of “herd effects”, but when you phrase it as “double-counting evidence” the problem looks much much worse.

  • Billy C

    Anna is right to question whether performance is normally distributed. This applies even more when discussing corrections for error taking into account race, as has been discussed in the comments, since when accounting for race, economic/class factors aren’t controlled for as they are for sex.

    On a side note, I learned this past fall applying to Ivy League and other elite colleges that some of these schools (say that they) only look at a student’s highest SAT score (or even sectional subscore), introducing much further error.

  • Anna Salamon

    Though I would point out that many other kinds of reputational reasoning are also going to double-count lots of evidence.

    Yes, I’ve thought about that too, as a general problem in how to implement Aumann agreement among groups of rationalists with partially overlapping evidence. I haven’t come up with much in the way of useful practices; has anyone else?

    In the case of reputational reasoning, race and gender are somewhat unusual in that they are visible and salient to nearly everyone who forms an opinion of the individual. But I agree that one can get similar dynamics around where someone went to college, what their SAT scores were, etc.

  • http://hanson.gmu.edu Robin Hanson

    Anna, if gender were directly a relevant factor in choosing who to admit, then one would indeed want to count for it directly, and should worry about double-counting if several other factors already included that factor. But what is going on here is just that we are looking at gender as a signal about the degree of noise in another signal about another factor – we are not accounting for any direct relevance of gender to the decision.

  • http://hanson.gmu.edu Robin Hanson

    Billy, yes not only does looking only at the higher score increase noise, it increases biases from differing gender variances and biases from some people being better able to afford retests.
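
    The effect of keeping only the highest score can be illustrated with a short simulation. This is a sketch under stated assumptions: ability is fixed across sittings, measurement errors are independent standard normals, and the trial count and seed are arbitrary.

```python
# Sketch: upward bias in a reported score when only the best of several
# sittings is kept. Errors are assumed independent N(0, error_sd^2) per sitting.
import math
import random

def mean_max_error(n_takes, error_sd=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of E[max of n_takes measurement errors]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, error_sd) for _ in range(n_takes))
    return total / trials

if __name__ == "__main__":
    for takes in (1, 2, 4):
        print(takes, round(mean_max_error(takes), 3))
    # closed form for two sittings: error_sd / sqrt(pi)
    print(round(1.0 / math.sqrt(math.pi), 3))
```

    The inflation grows with the number of sittings and scales with the error standard deviation, so applicants who can afford more retakes gain more from a max-score rule.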

  • Anna Salamon

    Robin, no. Gender, GPA, previous test scores, and other ability-signals are *all* signals of the degree of noise in an individual’s SAT score. One therefore obtains better estimates of the signal vs. noise component of a woman’s SAT score by considering the raw SAT score together with her gender, GPA, etc., and not by gender-correcting the SAT score in isolation.

  • Carl Shulman

    In some absurdly mission-critical application, it seems that the way to use an adjustment like this without complications from overcorrection and stereotyping would be to evaluate profiles of candidates from which the group membership information has been removed, and in which the original scores are replaced with their adjusted counterparts.

  • Lara Foster

    Carl- But why adjust for gender and not other factors that might be much more relevant? Income, access to tutors, ethnicity, religion, parents’ level of education, etc.? Just because it is obvious and people haven’t yet started covering it up? And if you decide to correct for all of these factors, then who gets to decide the master equation and with what evidence? Who even gets to decide what evidence should be permissible? This approach just seems far too dangerous to me for introducing much worse systematic biases. It’s one thing to say, ‘well, the questions on the IQ test are written by white upper-middle-class men, so let’s give poor blacks the benefit of the doubt due to the cultural gap…,’ to a large extent because they *are* a minority. Though even this has the problem I mentioned earlier of casting a shadow on the minority students who *did* score high.

    Your solution also maintains the problem of preventing women and other groups from ever being considered in the top percentage. I got a top score on my SAT… Maybe I needed a harder test, so the admissions committee would know just how many points to knock me down… If the test is not accurate, then fix the test. Introducing systematic biases to correct for imperfect tests seems like a much worse solution, a justification for racism and other forms of discrimination, and a generally divisive practice. That anyone on this blog promotes the implementation of such a foolhardy measure as a ‘practical’ solution continues to disturb me greatly and reminds me that even rationalists can be the geniuses of destruction.

  • Anna Salamon

    Yikes, sorry, my last comment was half wrong or badly stated. AFAIK, Robin’s comment was also mistaken.

    For clarity, assume the Prediction Committee is interested in a candidate’s ability A. Assume A has one distribution for men and another for women (if you like, you can let the two distributions be normal distributions with equal means and a greater male variance, as Robin does). Assume also that we have a variety of indicators I_1, I_2, … I_n, each of which is the sum of the candidate’s A-value and of an indicator-specific normally distributed error term that has the same variance for men and for women.

    In this case: (1) SAT scores alone are worse predictors of candidates’ A-values than are gender-corrected SAT scores; (2) one does optimally by taking the appropriate weighted sum of all the indicators and then correcting *that sum* based on the prior A-distribution for people of the candidate’s gender; (3) one does sub-optimally by gender-correcting the candidate’s individual indicators and then taking the weighted sum of the corrected values; (4) one also does suboptimally by gender-correcting I_1 alone and taking the weighted sum of {gender-corrected I_1} and I_2 through I_n. I am not sure which of these Robin is advocating in his reply to my first comment.
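
    Procedures (2) and (3) can be compared directly under the stated model. The following is a minimal sketch, assuming a normal prior on A for the candidate's group and independent normal indicator errors; the precision weights used in procedure (3) are an assumption, since the comment does not specify the weights, and the numbers are hypothetical.

```python
# Sketch: "correct once at the end" versus "correct each indicator, then combine".
# Normal prior on ability, independent normal errors; all inputs are hypothetical.

def posterior_mean(indicators, error_vars, prior_mean, prior_var):
    """Exact posterior mean of ability given every indicator."""
    prior_prec = 1.0 / prior_var
    num = prior_mean * prior_prec + sum(i / e for i, e in zip(indicators, error_vars))
    den = prior_prec + sum(1.0 / e for e in error_vars)
    return num / den

def correct_once_at_end(indicators, error_vars, prior_mean, prior_var):
    """Procedure (2): pool indicators by precision, then shrink the pooled value."""
    precision = sum(1.0 / e for e in error_vars)
    pooled = sum(i / e for i, e in zip(indicators, error_vars)) / precision
    pooled_error_var = 1.0 / precision
    return prior_mean + (pooled - prior_mean) * prior_var / (prior_var + pooled_error_var)

def correct_each_then_combine(indicators, error_vars, prior_mean, prior_var):
    """Procedure (3): shrink each indicator alone, then take a precision-weighted sum."""
    shrunk = [prior_mean + (i - prior_mean) * prior_var / (prior_var + e)
              for i, e in zip(indicators, error_vars)]
    weights = [1.0 / e for e in error_vars]
    return sum(w * s for w, s in zip(weights, shrunk)) / sum(weights)

if __name__ == "__main__":
    args = ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0], 0.0, 1.0)
    print(posterior_mean(*args), correct_once_at_end(*args), correct_each_then_combine(*args))
```

    In this sketch procedure (2) reproduces the exact posterior mean, while procedure (3) applies the single-indicator shrinkage to every indicator and so pulls the combined estimate too far toward the group mean.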

  • Carl Shulman

    Lara,

    I specifically mentioned a hypothetical “absurdly mission-critical application,” i.e. one in which one needed to maximize the accuracy of one’s predictions at the expense of other considerations (perhaps selecting a pilot for a spacecraft to divert an asteroid from hitting the Earth). In practice, as I suggested earlier, I do not support an attempt to use prior information about differences in score distributions between the sexes to improve the predictive validity of SAT and GRE scores in school admissions testing.

    However, for some purposes the value of accurate prediction is higher relative to other concerns than college admissions. This Ian Ayres piece discusses the use of an algorithm to predict the likelihood of sex offenders re-offending after release and so to determine whether to in fact release them. It turns out that age predicts likelihood of reoffending, even though age is outside the control of the offender, and this is used in parole determinations. Since males have vastly higher rates of violent crimes, and especially of repeated violent crimes, a similar algorithm for the parole of murderers would almost surely favor earlier release of women than of men (effectively concentrating prison resources on those most likely to kill again). If this were to save a sufficient number of lives I could endorse it, as I could endorse the use of height, digit ratio, or left-handedness.

    • dmytryl

      I specifically mentioned a hypothetical “absurdly mission-critical application,” i.e. one in which one needed to maximize the accuracy of one’s predictions at the expense of other considerations

      This is fairly amusing. I take it as you totally aren’t neglecting all of the much more significant factors, such as academic credentials and the like, in your absurdly mission-critical application?

    • CarlShulman

      Yes, academic credentials, particularly more difficult and relevant ones, are much stronger evidence than these minor factors, and have fewer harmful effects.

      That’s why one doesn’t need to postulate exotic hypotheticals to imagine using them to eke out tiny bits of incremental validity at the expense of serious equity problems.

  • Douglas Knight

    Eliezer Yudkowsky:
    If you allow anything into the reputation-construct that isn’t performance, even the smallest shred of it, human beings react to this as tremendously unjust and they will cease to respect the system.

    If I understand you correctly, you’re saying that we shouldn’t use, say, affirmative action in college admissions, because it damages the credibility of the college admissions process and through some positive feedback loop, the whole system breaks down?

    As is probably clear from my choice of “some positive feedback loop,” that’s the part of the argument that I think is wrong.

    As many people have said, there are a lot of factors that go into reputations that are not fully under the control of the reputee. I think college admissions is fairly mysterious with a purpose of obscuring and legitimizing these factors, some byproducts of noisy measures and some egregious, like legacies.

  • Andrew Charles

    So, if the variance of male height is greater than the variance of female height, we should automatically subtract a few centimetres from the height of a tall woman?

    Robin, your maths isn’t wrong as far as I can tell, but it’s unconvincing, like most unqualified results resting on dubious simplifying assumptions. Constructively, to start to convince me you would have to justify that the scores and errors are normally distributed, that the measurement error is the same for both groups, and that measurement error and score are in fact independent.

    A clearer example on a less loaded topic would also help.

    It also seems odd to me to correct a score that by definition comes from the distribution you are correcting it to – this changes the distribution by modifying its extreme members, resulting in an even smaller variance in your corrected test results.

  • Mysterious

    “So, if the variance of male height is greater than the variance of female height, we should automatically subtract a few centimetres from the height of a tall woman?”
    No, because the reliability of height measurements is close to 1.
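
    The role of reliability can be read off the post's formula. This is a minimal sketch, assuming the post's normal model; the SAT inputs are the post's, while the height figures (a 163 cm mean, a 180 cm measurement, an error share of 0.001, and the same variance ratio) are hypothetical.

```python
# Sketch: size of the post's adjustment as measurement reliability approaches 1.
# The height figures below are hypothetical and only illustrate the scaling.

def adjustment(score, mean, error_share, variance_ratio=1.16):
    """Amount subtracted from a female score under the post's normal model.

    error_share is N, the error variance as a share of female score variance
    (one minus reliability); variance_ratio is R.
    """
    ratio = (1.0 - error_share) / (1.0 - error_share / variance_ratio)
    return (score - mean) * (1.0 - ratio)

if __name__ == "__main__":
    print(adjustment(750, 504, 0.4))    # SAT case from the post
    print(adjustment(180, 163, 0.001))  # hypothetical height case, in cm
```

    As the error share goes to zero the ratio of discounting factors goes to 1, so the adjustment vanishes whatever the variance ratio is.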

  • Cyan

    Andrew Charles,

    If a man and a woman were each measured to be 6 feet tall by conventional methods, and then you were forced to make a costly 50:50 wager on which was actually taller based on a ridiculously accurate measurement, which way would you bet?

  • http://achillesshrugged.blogspot.com Benquo

    The SAT does not directly measure how desirable an applicant is (I’ll call this admissibility). I’d expect admissions offices to weight the significance of outlier scores on the basis of their reliability as a predictor of whatever qualities they’re really looking for. So there’s no immediately obvious way of distinguishing between (1) the error introduced by using SAT-measurable intelligence as a proxy for admissibility and (2) the error in SAT scores themselves as measures of that kind of ability.

    SAT score (predicts)-> SAT-measured intelligence (predicts)-> admissibility

    Both arrows contain some amount of error or equivalent noise. And it’s pretty clear that whatever information that does exist about SAT scores as predictors of admissibility is not calibrated by gender, so we’d have to throw all that out the window and start over from scratch. Even if we had a well-quantified measure of admissibility (which would call into question the purpose of using the SAT as an indicator), it would require further work to figure out what part of the error is proper to the SAT itself. And we don’t know to what extent the variance in SAT scores corresponds to a variance in admissibility.

    If it were possible but expensive to measure admissibility directly (or it might only be measurable after the fact, which would require admissions offices to gather more data on current students or alumni — also expensive), then Robin’s suggestion would be feasible. But my understanding is that most colleges don’t have a clear (much less a quantifiable) definition of what makes a desirable student. That, and not SAT-error, is the most relevant dearth of information. So I don’t see any really good way to ensure colleges don’t overcorrect or undercorrect if they just apply Robin’s analysis as is, no matter how good their algebra and intentions are. It looks to me an awful lot like Robin’s searching for his keys under the street lamp.

    To do it right, without losing the information already contained in the current evaluation of test scores, you’d have to figure out:

    A) How to measure admissibility directly, and data on admissibility and its variance by gender.
    B) How to measure SAT-intelligence more reliably, and data on SAT-intelligence and its variance by gender.

    A, combined with existing SAT data, would give you the necessary information to calibrate your error-estimates by gender. A combined with B would allow you to add back in the imperfection of SAT scores as measures of admissibility.

  • http://achillesshrugged.blogspot.com Benquo

    Also, the sample size for women in the SAT data Robin linked is significantly larger than that for men — I’m not the most statistically literate person in the world, but shouldn’t we expect less variance even if the underlying variance in populations is the same?
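
    One way to check this intuition is a quick simulation. The following is a sketch, assuming normally distributed scores with hypothetical parameters (mean 500, standard deviation 100) and arbitrary sample sizes; it compares the usual unbiased sample variance across many small and many large samples drawn from the same population.

```python
# Sketch: does the sample variance depend on sample size?
# Scores are assumed normal; the mean, SD, sizes, and seed are hypothetical.
import random
import statistics

def sample_variances(sample_size, replications=400, sd=100.0, seed=0):
    """Unbiased sample variances from repeated samples of a given size."""
    rng = random.Random(seed)
    return [
        statistics.variance([rng.gauss(500.0, sd) for _ in range(sample_size)])
        for _ in range(replications)
    ]

if __name__ == "__main__":
    for size in (50, 500):
        variances = sample_variances(size)
        print(size, round(statistics.mean(variances)), round(statistics.stdev(variances)))
```

    Under these assumptions the average sample variance is centered on the population variance at both sizes; what the larger sample buys is a less noisy estimate of that variance, not a smaller one.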

  • Tim Tyler

    The sample size for women in the SAT data Robin linked is significantly larger than that for men – I’m not the most statistically literate person in the world, but shouldn’t we expect less variance even if the underlying variance in populations is the same?

    Robin only cited that document to establish the female SAT average.

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    This reminds me of something I still don’t understand about why race-based affirmative action plays out the way it does in the USA.

    Let’s say you begin discounting female SAT scores. Suddenly more than 90% of the applicants to competitive universities now claim on their applications to be male. What do you do then?

  • http://achillesshrugged.blogspot.com Benquo

    @Tim:

    The laws of probability — if they are in fact laws about the world — apply to more than the mere internal consistency of a single document.

  • Tim Tyler

    I wasn’t arguing with the laws of probability – merely pointing out that the cited study containing the data used to compare male and female SAT variances is not the same as the cited document with more female than male test results.

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    I notice no one responded to my post. I think it points to an enormous hole in current expressed understandings of human behavior, particularly as applies to the expression of “immutable” identities like gender and race. I think this is part of a general economics of self-restraint, and recent research into behavioral economics is starting to point to the clues of why people express diverse identities even though rationally they should contextually express the identity that maximizes their rewards (for example, why 100% of applicants to a position don’t claim the race or other characteristic that would maximize their odds of getting the position).

  • Carl Shulman

    Hopefully,

    The applicants would need to coordinate with the providers of their letters of reference, which would normally use ‘he’ or ‘she.’ Legal name changes, avoiding personal contact with admissions officers, and similar efforts would be required. Also, it would be easier to expel someone for falsifying sex information than racial information.

  • http://achillesshrugged.blogspot.com Benquo

    @Tim:

    The study seems to be behind a paywall, so I can’t see it, but maybe you can tell me — did they correct for the differing sample size, and the non-random selection of SAT-takers? If so, how? If they didn’t, then my point still holds — there’s a larger current sample available of women than of men, so we should expect more precision in the former aggregate than the latter.

    And sorry about the snark. With the benefit of time, I now see that my previous comment was less than perfectly constructive.

  • http://achillesshrugged.blogspot.com Benquo

    Unless you’re saying that there’s difference in variance, by gender, across multiple tests, which would weaken my argument a bit. I still think the basic problems would remain.

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    Carl,
    Actually all that’s required in principle is to check the “female” box. I don’t see why that’s falsification if at that moment (and going forward) the person believes themselves to be female and identifies as such. That nobody (or very, very close to nobody) does this in a nation of 300 million people is to me an interesting example of the economics of almost perfect mass restraint. Even more so, with respect to race, where the arguments are stronger on the “personal identity” side.

  • Tim Tyler

    Variance differences caused by sample size don’t explain results like this:

    The results obtained by both procedures establish that by age 13 a large sex difference in mathematical reasoning ability exists and that it is especially pronounced at the high end of the distribution: among students who scored greater than or equal to 700, boys outnumbered girls 13 to 1.
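
    How a variance difference shows up in the tail can be sketched with normal tail probabilities. This is an illustration only, assuming normal score distributions (which other commenters question); the default variance ratio of 1.16 is the post's mid-estimate, the optional mean gap is a hypothetical parameter, and the sketch does not by itself claim to reproduce the 13 to 1 figure.

```python
# Sketch: male/female ratio above a cutoff when the two score distributions
# are normal with different variances. Normality is an assumption.
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def tail_ratio(cutoff_sd, variance_ratio=1.16, mean_gap_sd=0.0):
    """Male/female ratio above a cutoff given in female SDs above the female mean.

    mean_gap_sd is a hypothetical male-minus-female mean gap in female SDs.
    """
    male_sd = math.sqrt(variance_ratio)
    male = upper_tail((cutoff_sd - mean_gap_sd) / male_sd)
    female = upper_tail(cutoff_sd)
    return male / female

if __name__ == "__main__":
    for cutoff in (0.0, 2.0, 3.0, 4.0):
        print(cutoff, round(tail_ratio(cutoff), 2))
```

    The ratio is 1 at the common mean and rises steadily with the cutoff, which is the pattern that a sample-size difference alone would not produce.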

  • Douglas Knight

    HA,
    I think that admissions officers will see the checkbox, see the male pronouns in the letters of reference and decide that they don’t want to deal with it. Expulsion afterwards is unlikely, but sex does have long-running consequences, eg, roommates. Race seems much safer to me. As I said before, I think people could get away with it.
    But I don’t think “self-restraint” is the best way of thinking about it. It’s part of it, but creativity and bothering to think about how rules are actually enforced is probably a bigger part. It doesn’t seem terribly irrational to me for high school students to think that it’s not worth the risk that there is some mechanism that they don’t know.

    But that leads to: why don’t people experiment? They could check the box on only some applications, or just have different letter writers for different schools.

  • http://achillesshrugged.blogspot.com Benquo

    @Tim:

    Fair enough. Now if anyone has evidence that disaggregating the two kinds of error I pointed out can be done cheaply with some degree of accuracy, then my other objection can be dismissed as well.

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    Douglas, you start with “But I don’t think “self-restraint” is the best way of thinking about it.”

    But then I think you move more in my direction with:

    “But that leads to: why don’t people experiment? They could check the box on only some applications, or just have different letter writers for different schools.”

    Exactly, the USA is a nation of 300 million, and these types of decision moments probably occur for nations of additional hundreds of millions outside of the USA. That’s a massive number of individuals engaging in self-restraint, and it seems to me elements of core identity (gender, race, perhaps religion and some other things) can cause these near perfect instances of individual self-restraint in populations of millions.

    I think there is a behavioral economics of self-restraint which probably ties into repugnancy bias too. I hope to write more about this soon, but my time’s up right now.

  • Lara Foster

    Doug, HA, et al-

    People *do* try to get away with classifying themselves as ‘minorities’ all of the time. I knew an Indian girl who said she was ‘black’ on her applications. Since she had brown skin, she was never actively challenged, and who knows what kind of advantages she got as a ‘black’ over a typical Indian applicant.

    When I said ‘not yet covering it up,’ I meant that people would deliberately carry a lie throughout the application, not just check one box. It’s true that this is harder to do with gender… In a more absurd scenario, however, it wouldn’t be that hard for narrower women to pass as effeminate men simply by getting crew cuts and dressing in shirts and ties. There was a woman in my previous lab who did this named ‘Chris,’ and I had no idea what her sex actually was for a good month. Shemale? HeShe? When I needed to refer to ‘it’, I always used ‘Chris.’ Finally, I caught on that the pronoun others used was ‘she.’ It’s true that for much more gynecoid types, like myself, it would be nearly impossible to pass as a man without major reconstructive surgery. Though I’m sure a talented enough make-up artist could make a believable male fat-suit that would cover up the incriminating body curves… Though this might worsen chances of admission. As geneticist James Watson said, “When you interview fat people, you feel bad, because you know you’re not going to hire them.”

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    “I knew an Indian girl who said she was ‘black’ on her applications. Since she had brown skin, she was never actively challenged, and who knows what kind of advantages she got as a ‘black’ over a typical Indian applicant.”

    Is this really true? Because I intuit that it has the hallmarks of a fabricated anecdote.

    “People *do* try to get away with classifying themselves as ‘minorities’ all of the time.”

    The data indicates otherwise to me.

  • Lara Foster

    Actually, this is all missing the point. One wouldn’t need to lie about the cold facts on an application to a prestigious university; the essay would be enough. And you wouldn’t even need to lie *per se,* just exaggerate certain opinions and use schmaltzy language, like all applicants already do. Take it from someone who at this point could be considered a professional applicant with a good success record: it’s not about truth and accuracy as Robin would have us believe. It’s about telling them what they want to hear. If women and blacks lose a couple of points on the SAT, you better believe their essays will deal directly with the hardship of being considered a lesser human being, and how in spite of it all they’ve triumphed and *will* continue to do so no matter what the scores say, because they have looked the ugly demon of discrimination in the face and said, “No! Not today! I WILL cure cancer, for my grandmother, for my mother, for MYSELF, and NO ONE can tell me that my black skin or my uterus will stop me…” This drivel practically writes itself. Yet… That’s what they WANT. REALLY. Hard to believe it, but I’ve talked to a lot of people about college/MD/PhD essays, and the schmaltz has repeatedly floated to the top, leaving any shred of true intention stuck to the bottom of the barrel. Here’s the question: Why do admissions officers want so badly to be lied to?

  • Lara Foster

    HA- Check with Mike Vassar. He knows her too.

  • michael vassar

    The degree to which Lara knows what she’s talking about on the schmaltz point is simply astounding, and definitely deserves attention. It’s more general than admissions though. So many people set up systems that so richly reward lies for what seem like incredibly bad reasons. In general the world is filled with situations of mutual pseudo-deception where neither side is fooled and where common knowledge of nominal deception exists, but where outrage results if this knowledge is made explicit. To some extent even religion fits this bill, but so does much bureaucracy, such as much of airport security. The psychology is very hard for me to empathize with but probably relates to establishing membership/conformity/belonging.

    She may exaggerate how frequently the race trick is played though, esp. by non-Indians.

    I think that HA is right that the infrequency with which people lie about their race on college applications is worthy of serious study. It would be incredibly valuable to be able to create similarly strong taboos against other kinds of fraud. It’s especially odd given that so many people consider the mere fact of gathering such data to be a form of oppression and could easily see themselves as simply fighting back and claiming what is theirs.

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    Michael,
    Do you and Lara both know a specific Indian individual who you know for a fact applied to a competitive academic program claiming to be black? I ask because I was privy to racial data at a university I attended (through a work-study job), and I was surprised when I looked at the racial data that I couldn’t identify anyone who had lied about being a minority (black and Hispanic in particular). And like most decent universities, the school had a large Indian population.

    Also, if this happened with any noticeable frequency, I’m surprised I haven’t read about any exposures.

    So I’m skeptical. But if you both know the SAME Indian, that would mitigate my skepticism a bit about this anecdote.

  • Douglas Knight

    I’m sure self-restraint plays a role, I just think that creativity and semi-legitimate fear play much bigger roles.

    You ask: why don’t people do this, but those same people may ask: do colleges do something to verify the checkboxes, at least in the most blatant cases? It would be very easy for them to do so; how is the high school student to know that they don’t? That is what I mean by semi-legitimate fear.

  • michael vassar

    Yes it’s the same person. However, it’s good to know that you have better general data. Anecdotes just tell us that it sometimes happens, while you can apparently tell us how infrequently, e.g. very. I do know another person who checked as Hispanic based on very slight Mexican ancestry and a white guy who got put in an “African American Heritage Dorm” at Stanford, though this may have been a SNAFU, not the result of a lie.

  • Daniel

    Let’s say you begin discounting female SAT scores. Suddenly more than 90% of the applicants to competitive universities now claim on their applications to be male. What do you do then?

    Easy. You can re-estimate the probability of maleness using the SAT score. A close to average SAT score may indicate a female claiming to be male. It’s trivial to generalize Robin’s adjustment formula to this case. 🙂
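
The re-estimation idea can be sketched with Bayes' rule. This is only an illustration under stated assumptions: equal male and female mean scores, normally distributed scores, and a male-to-female score variance ratio R = 1.16 (as in the post). The female standard deviation of 110 points, the 50/50 prior, and the function names are hypothetical choices made for the example, not figures from the post.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def prob_female(score, mean=504.0, female_sd=110.0, R=1.16, prior_female=0.5):
    """Posterior probability that a score came from a woman.

    Assumes equal means for both groups and male score variance equal to
    R times female score variance. female_sd and prior_female are
    hypothetical values chosen only for illustration.
    """
    female_var = female_sd ** 2
    male_var = R * female_var
    weight_f = normal_pdf(score, mean, female_var) * prior_female
    weight_m = normal_pdf(score, mean, male_var) * (1.0 - prior_female)
    return weight_f / (weight_f + weight_m)

# Scores near the mean lean female; scores far from it lean male.
for s in (504, 650, 800):
    print(s, round(prob_female(s), 3))
```

Under these assumptions the posterior depends only on how far the score is from the mean, so the same effect appears in the low tail as in the high tail.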

  • current grad student

    If the ratio of female to male discount factors is less than 1, then doesn’t this mean that female scores should be shrunk to the mean to a degree slightly less than the discount applied to male scores?

    For example
    discount male 1600 to 1550
    discount female 1600 to 1575.

    Thus, the discount factor ratios would seem to favor high-scoring women, so isn’t not discounting actually a bias AGAINST high-scoring women?
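
For reference, the post's formulas can be evaluated directly. The sketch below assumes the post's definitions: E[a|s] = m + (s-m)*(1-E/S), with female factor 1-N and male factor 1-N/R. Because the factor multiplies the distance from the mean, a smaller factor means more shrinkage toward the mean, not less. The function names are illustrative only.

```python
def discount_factors(N, R):
    """Return (female, male) discounting factors 1 - E/S.

    N: measurement error variance / female score variance.
    R: male score variance / female score variance.
    """
    female = 1.0 - N
    male = 1.0 - N / R
    return female, male

def ability_estimate(score, mean, factor):
    """Expected ability given a score: m + (s - m) * factor."""
    return mean + (score - mean) * factor

R = 1.16      # mid-estimate of the variance ratio quoted in the post
mean = 504.0  # math SAT mean used in the post
score = 750.0

for N in (0.1, 0.2, 0.4):
    female, male = discount_factors(N, R)
    ratio = female / male
    # Points to subtract from a female score to make it comparable to a male score
    adjustment = (score - mean) * (1.0 - ratio)
    print(N, round(ratio, 3), round(adjustment, 1))
```

With R > 1 the female factor is the smaller of the two, so for the same above-mean score the female ability estimate sits closer to the mean than the male one.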
