Why Broken Evals?

This review article published 36 years ago shows that it was well known back then that teacher evaluations by college students are predictably influenced by time of day, class size, course level, course electivity, and more. Thus one could get more reliable teacher evaluations by building a statistical model to predict student evaluations using these features plus who taught what, and then using each teacher’s coefficient as that teacher’s evaluation. Yet colleges almost never do this. Why?
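To make the proposal concrete, here is a minimal sketch of such a model. Everything in it is an assumption for illustration: the three teachers, the effect sizes, and the class list are made up, and the ratings are generated without noise so the fit can be checked against known values. It regresses ratings on one dummy variable per teacher plus three context features, then reads each teacher's coefficient as the adjusted evaluation.

```python
# Synthetic, noise-free illustration; every number below is made up.
TRUE_TEACHER = {"A": 3.5, "B": 4.0, "C": 3.0}
TRUE_CONTEXT = {"morning": -0.5, "elective": 0.4, "size_hundreds": -0.6}

# (teacher, morning class?, elective course?, class size)
CLASSES = [
    ("A", 0, 1, 30), ("A", 0, 1, 40), ("A", 0, 0, 50), ("A", 1, 1, 20),
    ("B", 1, 0, 200), ("B", 1, 0, 150), ("B", 1, 1, 100), ("B", 0, 0, 180),
    ("C", 0, 1, 25), ("C", 1, 0, 60), ("C", 0, 0, 90), ("C", 0, 1, 35),
]


def simulated_rating(teacher, morning, elective, size):
    """Rating implied by the assumed teacher and context effects."""
    return (TRUE_TEACHER[teacher]
            + TRUE_CONTEXT["morning"] * morning
            + TRUE_CONTEXT["elective"] * elective
            + TRUE_CONTEXT["size_hundreds"] * size / 100)


def design_row(teacher, morning, elective, size, teachers):
    """One dummy per teacher (no intercept), then the context features."""
    dummies = [1.0 if teacher == t else 0.0 for t in teachers]
    return dummies + [float(morning), float(elective), size / 100]


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


teachers = sorted(TRUE_TEACHER)
ratings = [simulated_rating(*c) for c in CLASSES]
rows = [design_row(*c, teachers) for c in CLASSES]
coefficients = fit_ols(rows, ratings)

# Adjusted evaluation: the teacher's own coefficient.
adjusted = dict(zip(teachers, coefficients[:len(teachers)]))

# Raw evaluation: the plain average of the ratings each teacher received.
raw_means = {
    t: sum(r for c, r in zip(CLASSES, ratings) if c[0] == t)
       / sum(1 for c in CLASSES if c[0] == t)
    for t in teachers
}

print("raw:", raw_means)
print("adjusted:", adjusted)
```

In this toy data, teacher B is assigned the large, required, morning classes, so B's raw average trails A's even though B's assumed underlying effect is the highest; the teacher coefficients undo that. Real evaluation data would be noisy, so real coefficients would come with standard errors, and the choice of context features would itself be open to dispute.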

Actually, most orgs also use known-to-be-broken worker evaluation systems:

There is a lot of systematic evidence on the connections between job performance and career outcomes. … The data shows that performance doesn’t matter that much for what happens to most people in most organizations. That includes the effect of your accomplishments on those ubiquitous performance evaluations and even on your job tenure and promotion prospects. …

[For example,] supervisors who were actively involved in hiring people whom they favored rated those subordinates more highly on performance appraisals than they did those employees they inherited or the ones they did not initially support. In fact, whether or not the supervisor had been actively engaged in the selection process had an effect on people’s performance evaluations even when objective measures of job performance were statistically controlled. (more)

So why don’t firms correct employee evaluations for this who-hired-you bias? And it isn’t just this one bias; there are lots:

Extensive research on promotions in organizations, with advancement measured either by changes in position, increases in salary, or both, also reveals the modest contribution of job performance in accounting for the variation in what happens to people. In 1980, economists … observed that salaries in companies were more strongly related to age and organizational tenure than they were to job performance. Ensuing research has confirmed and extended their findings, both in the United States and elsewhere. … One meta-analysis of chief executive compensation found that firm size accounted for more than 40 percent of the variation in pay while performance accounted for less than 5 percent. (more)

An obvious explanation here is that coalition politics dominates worker evaluations. Coalitions like being able to ignore job performance to favor their allies and punish their rivals. Winning coalitions tend to benefit from the current broken rules. But, you might ask, why don’t people at the top put a stop to this? Doesn’t allowing politics such free rein hurt overall org performance? This story hints at an answer:

A few years ago, Bob, the CEO of a private, venture-backed human capital software company, invited me to serve on the board of directors as the company began a transition to a new product platform and sought to increase its growth rate and profitability. Not long after I joined the board, in the midst of an upgrading in management talent, the CEO hired a new chief financial officer, Chris. Chris was an ambitious, hardworking, articulate individual who had big plans for the company— and himself. Chris asked Bob to make him chief operating officer. Bob agreed. Chris asked to join the board of directors. Bob agreed. I could see what was coming next, so I called Bob and said, “Chris is after your job.” Bob’s reply was that he was only interested in what was best for the company, would not stoop to playing politics, and thought that the board had seen his level of competence and integrity and would do the right thing. You can guess how this story ended— Bob’s gone, Chris is the CEO. What was interesting was the conference call in which the board discussed the moves. Although there was much agreement that Chris’s behavior had been inappropriate and harmful to the company, there was little support for Bob. If he was not going to put up a fight, no one was going to pick up the cudgel on his behalf. (more)

People at the top play coalition politics as hard as anyone. Rules to limit politics at lower levels can hurt lower level allies of top people, and can set expectations that limit politics at higher levels. When mob bosses who are best at violence rise to the top of a competition for boss-hood, why should they and their allies favor non-violent criteria for how to pick bosses?

Some more data:

A study using data from Dutch aircraft manufacturer Fokker reported that white-collar workers who received performance ratings of “very good” were only 12 percent more likely to be promoted than colleagues rated “good.” Meanwhile, many studies have documented the influence of numerous factors, ranging from educational credentials to race and gender, on careers, with performance often having a statistically significant but substantively small effect on advancement. For instance, a study of more than 200 employees from a variety of companies found that managers considered job tenure, educational credentials, overtime work, and absence as well as job performance in determining internal mobility for employees. A study of federal civil service employees, an excellent setting because of the extensive measures captured in the database, noted that performance ratings were weakly tied to actual productivity and that people with more educational credentials were more likely to be promoted even if they weren’t the best employees. (more)

  • brendan_r

    Buffett and Munger (the investment duo) emphasize management honesty as one of the crucial factors in their stock picking process; they say it’s not just a matter of principle, but that it is profitable.

    If the effect of coalition politics is to leave lots of low-hanging fruit for performance improvement, fruit that honest, conscientious (non-brilliant) managers could easily pick, then that helps explain why selecting companies by manager honesty works.

  • Jess Riedel

    I can’t tell if you are making an argument just for coalition politics being stable, or in addition for anti-politics being unstable. Both could be stable, although presumably you’d then argue that young companies must reliably start in the politics regime.

    Suppose you magically created a company that used good performance metrics and had rules that did a reasonably good job at preventing coalition politics at all levels. Presumably, such a company would be more competitive in the marketplace. Would you predict the internal company dynamics to quickly decay to the status quo, despite its initial success? Would you predict it would take at least as long as a career (~35 years), or several careers’ worth of time, for this decay, since you’d have to wait for the Chrises to rise through the ranks?

  • TheBrett

    I figured that broken-eval systems were used mostly to assuage upper management anxiety and reduce liability issues when you fire someone. Nobody really cares whether they work or not, as long as the company seems to be hitting its numbers and working reasonably well.

  • Silent Cal

    I think we need to dig more into why people at the top don’t (or can’t) discourage politics below. Even if they’re politicking themselves, they could benefit by preventing it at lower levels–it would just take a little hypocrisy. It doesn’t seem that hard to say that senior management performance can’t be evaluated the same way as at lower levels. And I’m no mob expert, but I bet the bosses don’t take it well if a subordinate whacks a rival to get a promotion, no matter how many times the boss has done exactly that.

    Why do firms have evals at all, if they’re known to be broken? It could be that, while they don’t measure performance accurately, they’re doing *something* perfectly well, like reassuring stockholders or making employees think their performance matters. Maybe it’s not feasible to achieve meritocratic promotions, and accurate evals would only put it on paper how un-meritocratic your processes are. Or maybe ‘better’ evals would just be gamed at higher organizational cost.

    • And I’m no mob expert, but I bet the bosses don’t take it well if a subordinate whacks a rival to get a promotion, no matter how many times the boss has done exactly that.

      If the whacked rival was a member of an opposing coalition, they might love it.

      • Silent Cal

        Non-rhetorical question: how much are subordinates’ coalitions the same as bosses’, especially as you go up multiple levels? If the boss is neutral in most of the politics below, the incentive is to prevent costly politicking, but if he has reason to prefer one side over the other, he’ll encourage it. I can’t imagine low-level employees’ politics being relevant to upper management; in what meaningful way could a base-level worker take sides against the CEO? Which means upper management could benefit by enforcing fairer evals at low levels.

    • IMASBA

      “Why do firms have evals at all, if they’re known to be broken?”

      HR isn’t the stock market; we can’t just assume (almost) all information is already known and widely disseminated. Even if everyone knew evals are broken, it would take many cycles for alternatives to be tried and tested, and for the results to be widely understood and disseminated. A single cycle can last for many years because we’re talking about people’s careers here. Quite possibly the system just hasn’t been running long enough.

    • AspiringRationalist

      Perhaps they create a lower bound for employee performance.
      This is similar to the way that even though elections don’t give us great people as leaders, we get much better leaders than we would get if they didn’t have at least _some_ form of accountability.

      • IMASBA

        “Perhaps they create a lower bound for employee performance.”

        This may be true for professors: the grades put on the evals are mostly worthless, but the optional written comments will highlight negative excesses. Really bad grades on the eval will also signal something (class hours and cupcakes may mean the difference between a 6 out of 10 and an 8 out of 10, but if multiple classes give a professor a 2 out of 10 there may be something to it). Evals may be able to catch negative excesses.

  • Plausible explanation, tortured evidence.

    Correcting teacher evaluations for time of day, etc., wouldn’t diminish coalition politics. So, you have an example showing that measurements aren’t improved even when there’s no cost to coalition power. This actually weakens the argument.

    • John Laing

      If teaching during a particular time slot consistently yields lower evaluations, and evaluations matter at all for tenure, then whoever sets the schedule can threaten teachers who want tenure with unfavorable time slots, which then factors into coalition politics.

  • What makes this even more horrifying are some of the models that suggest that actual contribution of the better people may be highly nonlinear in their job performance.

  • Also I think the assumption that ANYONE really cares about actual teacher performance, much less their evaluations, is misguided.

    Most academics tend to care more about their research and, for the most part, being academics themselves, so do most people in the administration. So why have evals at all or consider them in granting tenure? Well, I think they are largely a mechanism to placate students and parents while also giving the school at least some control over the truly awful educators and making sure professors don’t ignore teaching so completely that it creates actual bad PR and resentment from students.

    I think those people in academia who honestly care about teaching (and there are a substantial number) already realize that, even controlling for all these factors, there is no reason to think teaching evals truly reflect the pedagogical performance of the teacher rather than their willingness to engage in eval-boosting but student-harming tricks. For instance, when a substantial fraction of your curriculum is out of your hands, student evals do better when you feign excitement and lie about the usefulness of useless material (like integration tricks that should be relegated to computers), even though the students are better off realizing they should focus their time elsewhere. Furthermore, if time of day and the like affect evals so strongly that controlling for them makes a substantial difference, then, after controlling for them, things like handing out cupcakes (yes, I know people who do it) would probably make an even bigger difference and render the evals negatively correlated with the professor’s honesty.

    So I think that once people think hard about evals they either don’t want them to be taken seriously or realize that they can be so easily gamed as to render their results worthless. Thus no one wants to further legitimize them by appearing to take what they show seriously by building a complicated model that people might blindly trust.

  • “performance often having a statistically significant but substantively small effect on advancement”

    The glass could be half full rather than half empty. How much variation among orgs is there? If there is much, do the ones in which performance has a larger effect on advancement outperform others? Obviously not by enough for them to dominate, which would change the quote to “…and…large”.

  • Ronfar

    If people really cared about this, they’d make everyone’s salaries public.

  • Ronfar

    The purpose of performance evaluations is to justify decisions, not to make them.

  • Ely Spears

    Definitely check out the book Moral Mazes if you haven’t already.

  • Philip Goetz

    But this is worse than no explanation at all. “When mob bosses who are best at violence rise to the top of a competition for boss-hood, why should they and their allies favor non-violent criteria for how to pick bosses?” Because they don’t want to die. Bosses who got their jobs thru politics should be especially wary of underlings who use politics. Like Stalin, they should get rid of anybody near the top who poses a threat.
