Judgment Can Add Accuracy

Humans responsible for forecasting often feel the need to adjust the output of sophisticated forecasting systems. In many contexts this makes forecasts worse, but not always. In the Fall 2007 issue of Foresight, Robert Fildes and Paul Goodwin looked at four British-based supply chain companies:

  • A nationwide retailer.
  • A leading international food company.
  • A subsidiary of a U.S. pharmaceutical company.
  • A manufacturer of own-label domestic cleaning products.

We collected data on over 60,000 forecasts, interviewed the companies’ forecasters, and observed forecast review meetings where managers discussed and approved any adjustments that they thought were necessary. …

Those working for the food manufacturer adjusted 91% of the forecasts that had been generated by their expensive and sophisticated forecasting software. The four forecasters employed by the retailer adjusted only about 8% of their forecasts, but then they had over 26,000 forecasts to make each week, so there probably wasn’t enough time to put their mark on each forecast. The pharmaceutical company held 17 forecast review meetings every month, tying up about 80 person hours of valuable management time. On average 75% of the statistical forecasts in our companies were adjusted. …

Many of the adjustments were small, and in some cases very small. It was as if forecasters sometimes simply wanted to put their calling card on forecasts by tweaking them slightly to show that they were still doing their job. Indeed, we received anecdotal evidence from a consultant that people at review meetings tend to adjust more of the forecasts that are presented earlier in the meetings, rather than later on. As the meeting progresses they tire and feel that they have already done enough to justify the meeting, so later forecasts are simply waved through. …

Despite these concerns, judgmental adjustments to statistical forecasts can still play a useful role in improving accuracy. Our study found that on average they lowered the mean absolute percentage error (MAPE) by 3.6 percentage points for all companies except the retailer. … [1)] Larger adjustments tend to improve accuracy [more] and 2) negative adjustments tend to be much more beneficial than positive …
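
To make the MAPE comparison in the excerpt concrete, here is a minimal Python sketch of how the measure is computed before and after adjustment. The demand figures and the names `actual`, `statistical`, and `adjusted` are invented for illustration; they are not data from the Fildes and Goodwin study.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero, since each error is divided by it.
    """
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts))
    return 100.0 * total / len(actuals)


# Hypothetical demand figures, invented for illustration only.
actual = [100, 80, 120, 50]
statistical = [120, 100, 110, 70]   # raw system forecast
adjusted = [110, 85, 115, 60]       # after judgmental adjustment

# A positive value means the adjustments lowered the MAPE.
improvement = mape(actual, statistical) - mape(actual, adjusted)
print(f"MAPE before: {mape(actual, statistical):.1f}%")
print(f"MAPE after:  {mape(actual, adjusted):.1f}%")
print(f"Improvement: {improvement:.1f} percentage points")
```

The "percentage points" in the excerpt correspond to the plain difference between the two MAPE values, as in `improvement` here.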

When we analyzed the accuracy of the retailer’s adjustments they looked awful. The positive adjustments its forecasters made more than doubled the MAPE from 32% to 65%. Moreover 83% of these adjustments were either too large or in the wrong direction. … [However,] Most people would probably consider a forecast to be an estimate of the most likely level of future demand. It turned out that the retail forecasters were estimating a different quantity. Often they were trying to determine the levels of demand that only had a small chance of being exceeded, that is, the level that would limit stock-outs.
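
One way to see why the retailer’s numbers "looked awful" is to score the same forecasts against the quantity they were actually aiming at. The sketch below uses the quantile (pinball) loss, which is a standard way to evaluate a forecast of a high demand quantile; it is my choice of measure, not one the study says it used, and the demand figures and the 0.95 target are invented for illustration.

```python
def pinball_loss(actuals, forecasts, q):
    """Average quantile (pinball) loss for a target quantile q in (0, 1).

    Under-forecasts are weighted by q and over-forecasts by (1 - q), so a
    high q penalizes running short far more than holding too much.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, f in zip(actuals, forecasts):
        diff = a - f
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(actuals)


# Hypothetical weekly demand, invented for illustration only.
demand = [90, 110, 100, 95, 105, 130, 100, 100]
typical_level = [100] * len(demand)    # "most likely demand" style forecast
stocking_level = [125] * len(demand)   # level meant to be rarely exceeded

for q in (0.5, 0.95):
    print(q,
          pinball_loss(demand, typical_level, q),
          pinball_loss(demand, stocking_level, q))
```

With these made-up numbers, the stocking-level forecast scores worse at q = 0.5 but better at q = 0.95, so the ranking depends on which quantity the forecaster was trying to estimate.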

  • Doug S.

    That’s interesting. I wonder what methods and information they used to adjust the predictions? Could any of them be used to improve the computer algorithms?

  • Doug S.

    I just read something great on this topic. It’ll be stuck behind a pay gate for a while, though, so I’m going to post some excerpts here. (I’m cutting out the Magic: the Gathering related content, because I suspect that most people here don’t play Magic and therefore won’t understand it.)

    [begin article]

    It turns out that even very intelligent human beings are very bad at making optimal strategic decisions in a world of dynamic complexity.

    For well over twenty years, the management gurus at MIT’s Sloan School of Management have been showing just how badly sharp undergraduates, brilliant graduate students, and experienced executives can be at making decisions with even a simplified model of the real world. When asked to participate in the “Beer Distribution Game,” even the brightest among us find themselves frustrated, confused, and most importantly, wildly wrong.

    The game is set up as follows. Participants are asked to divide themselves into four groups: retailer, wholesaler, distributor, and the factory, and told to minimize costs. The retailer is presented with customer demand for beer. Each team is eager to sell off its inventory because buildup costs money. After four weeks of depleting inventory and making orders to replace that inventory, the consumer demand spikes upward. At that point, chaos reigns.

    In most cases, the participants are not allowed to see the basic “board” state, but here’s a good picture of what’s going on.

    The retailer has a two-week shipping delay in receiving his orders from the wholesaler, who has the same delay in receiving from the distributor, and so on. There is also a two-week delay from orders placed to orders made. Most importantly, there is an even longer production delay as raw materials are shipped into the factory and processed. Although demand is held constant at 4 cases of beer in the first four weeks and then 8 cases for the remainder of the game, the teams of wholesalers, distributors, and factories sketched a pattern of perceived consumer demand with huge amplitude fluctuations. The end result was that average costs were more than ten times greater than optimal.

    When it was revealed that customer demand was in fact constant, many voiced disbelief. According to Professor Sterman, “many participants are quite shocked when the actual pattern of customer orders is revealed.”

    These sorts of studies are well-known and easily replicated in business schools.

    [TypePad’s spam filter is acting up, so I’ll have to split this into multiple parts.]

  • Dagon

    That last paragraph is critical, as it’s a common error when discussing any prediction methodology. Judging effectiveness of a prediction can ONLY be done based on the goals of that prediction. Minimizing out-of-stock events is a far different goal than predicting actual sales.

    When measuring the accuracy of the different forecasting methods, they should have used a measure that tracks the purpose of the forecasts. Average percentage error makes sense for a true forecast. Lost profit from out-of-stock less carrying costs for overstock is likely a better measure of the retailer’s accuracy.

  • Michael

    Seems to me the fact that software is “expensive and sophisticated” doesn’t mean it’s right, or even any good.

    People wanting to make changes to ‘make their mark’ is nothing new – you know, “Why should I care what color the bikeshed is?”

  • Boy, that study sure didn’t turn out the way I would have expected.

  • P Decker

    In 1965, Herman Kahn and his buddies at Rand published a book called The Year 2000, in which they made a host of predictions for the millennial year. One of you who has the time might be interested in reading it and letting us know the %#@&* accuracy of their predictions.

  • Unknown

    Eliezer, does that mean there’s a serious problem with your model?