Tag Archives: Math

Optimum Prevention

Assume you use prevention efforts P to reduce a harm H, a harm which depends on those efforts via some function H(P). If you measure these in the same units, then at a prevention optimum you should minimize P+H(P) with respect to P, giving (for an interior optimum) dH/dP = -1.  And since in general dlnX = dX/X, this implies:

-dlnH/dlnP = P/H.

That is, the elasticity of harm with respect to prevention equals the ratio of losses from prevention to losses from harm. (I previously showed that this applies when H(P) is a power law, but here I’ve shown it more generally.)
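
To make the general claim concrete, here is a small numeric check, a sketch of my own rather than anything from the post: pick a smooth decreasing harm function H(P) (here an exponential decay, deliberately not a power law), find the P that minimizes P+H(P), and confirm that -dlnH/dlnP there equals P/H. The particular function and numbers are just illustrative assumptions.

```python
# A numeric check (my own sketch, not from the post): at the interior optimum
# of total loss P + H(P), dH/dP = -1, so the elasticity -dlnH/dlnP equals P/H.
# H(P) below is an exponential-decay harm function, chosen only to illustrate
# that the result does not depend on assuming a power law.
import math
from scipy.optimize import minimize_scalar

H0, s = 100.0, 7.0                      # assumed harm scale and prevention effectiveness

def H(P):
    return H0 * math.exp(-P / s)        # any smooth decreasing harm function works

opt = minimize_scalar(lambda P: P + H(P), bounds=(1e-6, 1e3), method="bounded")
P_star = opt.x
h = 1e-6
dH_dP = (H(P_star + h) - H(P_star - h)) / (2 * h)    # numeric derivative at the optimum
print("dH/dP at optimum:", round(dH_dP, 3))           # ~ -1
print("-dlnH/dlnP:", round(-dH_dP * P_star / H(P_star), 3))
print("P/H:        ", round(P_star / H(P_star), 3))   # matches the elasticity above
```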

Yesterday I estimated that for Covid in the U.S., the ratio P/H seems to be around 5.3. So to be near an optimum of total prevention efforts, we’d need the elasticity -dlnH/dlnP to also be around 5.3. Yet when I’ve done polls asking for estimates of that elasticity, they have been far lower and falling. I got 0.23 on May 26, 0.18 on Aug. 1, and 0.10 on Oct. 22. That most recent estimate is a factor of 50 too small!

So you need to argue that these poll estimates are far too low, or admit that in the aggregate we have spent far too much on prevention. Yes, we might have spent too much in some categories even as we spent too little in others. But overall, we are spending way too much.

Note that if you define P to be a particular small sub-category of prevention efforts, instead of all prevention efforts, then you can put all the other prevention efforts into the H, and then you get a much smaller ratio P/H. And yes, this smaller ratio takes a smaller elasticity to justify. But beware of assuming a high enough elasticity out of mere wishful thinking.


Lognormal Priorities

In many polls on continuous variables over the last year, I’ve seen that lognormal distributions typically fit poll responses well. And of course lognormals are also one of the most common distributions in nature. So let’s consider the possibility that, regarding problem areas like global warming, falling fertility, or nuclear war, distributions of priority estimates are lognormal.

Here are parameter values (M = median, A = (mean) average, S = sigma) for lognormal fits to polls on how many full-time equivalent workers should be working on each of the following six problems:

Note that priorities as set by medians are quite different from those set by averages.

Imagine that someone is asked to estimate their (median) priority of a topic area. If their estimate results from taking the product of many estimates regarding relevant factors, then not-fully-dependent noise across different factors will tend to produce a lognormal distribution regarding overall (median) estimates. If they were to then act on those estimates, such as for a poll or choosing to devote time or money, we should see a lognormal distribution of opinions and efforts. When variance (and sigma) is high, and effort is on average roughly proportional to perceived priority, then most effort should come from a quite small fraction of the population. And poll answers should look lognormal. We see both these things.
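
As a sanity check on this story, here is a small simulation of my own (an illustration, not the post’s code): give each person a priority estimate that is the product of many noisy factor estimates, then see that log estimates look roughly Gaussian and that, when effort is proportional to the estimate, a small fraction of people supply most of the effort. All numbers are made-up assumptions.

```python
# A small simulation (my sketch): each person's priority estimate is a product
# of many independently noisy factor estimates, so estimates come out roughly
# lognormal; and with effort proportional to one's estimate, most total effort
# comes from a small fraction of people when sigma is high.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_factors = 100_000, 12
noise = rng.lognormal(mean=0.0, sigma=0.5, size=(n_people, n_factors))
estimates = noise.prod(axis=1)                  # product of noisy factors

log_est = np.log(estimates)
print("sigma of log estimates:", round(log_est.std(), 2))   # ~ 0.5*sqrt(12) ~ 1.73

effort = estimates                              # effort roughly proportional to estimate
top10_share = np.sort(effort)[-n_people // 10:].sum() / effort.sum()
print("share of total effort from the top 10%:", round(top10_share, 2))
```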

Now let’s make our theory a bit more complex. Assume that people see not only their own estimate, but sometimes also estimates of others. They then naturally put info weight on others’ estimates. This results in a distribution of (median) opinions with the same median, but a lower variance (and sigma). If they were fully rational and fully aware of each others’ opinions, this variance would fall to zero. But it doesn’t; people in general don’t listen to each other as much as they should if they cared only about accuracy. So the poll response variance we see is probably smaller than the variance in initial individual estimates, though we don’t know how much smaller.

What if the topic area in question has many subareas, and each person gives an estimate that applies to a random subarea of the total area? For example, when estimating the priority of depression, each person may draw conclusions by looking at the depressed people around them. In this case, the distribution of estimates reflects not only the variance of noisy clues, but also the real variance of priority within the overall area. Here fully rational people would come to agree on both a median and a variance, a variance reflecting the distribution of priority within this area. This true variance would be less than the variance in poll responses in a population that does not listen to each other as much as they should.

(The same applies to the variance within each person’s estimate distribution. Even if all info is aggregated, if this distribution has a remaining variance, that is “real” variance that should count, just as variance within an area should count. It is the variance resulting from failing to aggregate info that should not count.)

Now let’s consider what this all implies for action biases. If the variance in opinion expressed and acted on were due entirely to people randomly sampling from the actual variance within each area, then efforts toward each area would end up being in proportion to an info-aggregated best estimate of each area’s priority – a social optimum! But the more that variance in opinion and thus effort is also due to variance in individual noisy estimates, the more that such variance will distort efforts. Efforts will track the average of each distribution, rather than its median. The priority areas with higher variance in individual noise will get too much effort, relative to areas with lower variance.
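
A tiny numeric illustration of that last point (mine, with made-up numbers): hold the median priority fixed and vary only the noise sigma; effort that tracks individual estimates scales with the lognormal mean, median*exp(sigma^2/2), so noisier areas attract more effort at equal median priority.

```python
# Equal median priority, different estimate-noise sigmas: the lognormal mean,
# and hence estimate-proportional effort, grows as median*exp(sigma^2/2).
import numpy as np

rng = np.random.default_rng(1)
median = 10.0
for sigma in (0.5, 1.0, 2.0):
    estimates = rng.lognormal(mean=np.log(median), sigma=sigma, size=200_000)
    print(f"sigma={sigma}: sample median={np.median(estimates):.1f}, "
          f"sample mean={estimates.mean():.1f}, "
          f"theory mean={median * np.exp(sigma**2 / 2):.1f}")
```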

Of course there are other relevant factors that determine efforts, besides these priorities. Some priority areas have organizations that help to coordinate related efforts, thus reducing free riding problems. Some areas become fashionable, giving people extra social reasons to put in visible efforts. And other areas look weird or evil, discouraging visible efforts. Even so, we should worry that too much effort will go to areas with high variance in priority estimate noise. All else equal, you should avoid such areas. Unless estimate variance reflects mostly true variance within an area, prefer high medians over high averages.

Added 3p: I tried 7 more mundane issues, to see how they varied in variance. The following includes all 13, sorted by median.


Risk-Aversion Sets Life Value

Many pandemic cost-benefit analyses estimate larger containment benefits than did I, mainly due to larger costs for each life lost. Surprised to see this, I’ve been reviewing the value of life literature. The key question: how much money (or resources) should you, or we, be willing to pay to gain more life? Here are five increasingly sophisticated views:

  1. Infinite – Pay any price for any chance to save any human life.
  2. Value Per Life – $ value per human life saved.
  3. Quality Adjusted Life Year (QALY) – $ value per life year saved, adjusted for quality.
  4. Life Year To Income Ratio – Value ratio between a year of life and a year of income.
  5. Risk Aversion – Life to income ratio comes from elasticity of utility w.r.t. income.

The first view, of infinite value, is the simplest. If you imagine someone putting a gun to your head, you might imagine paying any dollar price to not be shot. There are popular sayings to this effect, and many even call this a fundamental moral norm, punishing those who visibly violate it. For example, a hospital administrator who could save a boy’s life, but at great expense, is seen as evil and deserving of punishment, if he doesn’t save the boy. But he is seen as almost as evil if he does save the boy, but thinks about his choice for a while.

Which shows just how hypocritical and selective our norm enforcement can be, as we all make frequent choices that express finite values on life. Every time we don’t pay all possible costs to use the absolutely safest products and processes because they cost more in terms of time, money, or quality of output, we reveal that we do not put infinite value on life.

The second view, where we put a specific dollar value on each life, has long been shunned by officials, who deny they do any such thing, even though they in effect do. Juries have awarded big claims against firms that explicitly used value-of-life calculations in deciding not to adopt safety features, even when they used high values of life. Yet it is easy to show that we can have both more money and save more lives if we are more consistent about the price we pay for lives in the many different death-risk-versus-cost choices that we make.

Studies that estimate the monetary price we are willing to pay to save a life have long shown puzzlingly great variation across individuals and contexts, perhaps in part because the topic is politically charged. Those who seek to justify higher safety spending, stronger regulations, or larger court damages re medicine, food, environmental, or job accidents tend to want higher estimates, while those who seek to justify less and weaker versions of such things tend to want lower estimates.

The third view says that the main reason to not die is to gain more years of life. We thus care less about deaths of older and sicker folks, who have shorter remaining lives if they are saved now from death. Older people are often upset to be thus less valued, and Congress put terms into the US ACA (Obamacare) medicine bill forbidding agencies from using life years saved to judge medical treatments. Those disabled and in pain can also be upset to have their life years valued less, due to lower quality, though discounting low-quality years is exactly how the calculus says that it is good to prevent disability and pain, as well as death.

It can make sense to discount life years not only for disability, but also for distance in time. That is, saving you from dying now instead of a year from now can be worth more than saving you from dying 59 years from now, instead of 60 years from now. I haven’t seen studies which estimate how much we actually discount life years with time.

You can’t spend more to prevent death or disability than you have. There is thus a hard upper bound on how much you can be willing to pay for anything, even your life. So if you spend a substantial fraction of what you have for your life, your value of life must at least roughly scale with income, at least at the high or low end of the income spectrum. Which leads us to the fourth view listed above, that if you double your income, you double the monetary value you place on a QALY. Of course we aren’t talking about short-term income, which can vary a lot. More like a lifetime income, or the average long-term incomes of the many associates who may care about someone.

The fact that medical spending as a fraction of income tends to rise with income suggests that richer people place proportionally more value on their life. But in fact meta-analyses of the many studies on value of life seem to suggest that higher income people place proportionally less value on life. Often as low as value of life going as the square root of income.

Back in 1992, Lawrence Summers, then Chief Economist of the World Bank, got into trouble for approving a memo which suggested shipping pollution to poor nations, as lives lost there cost less. People were furious at this “moral premise”. So maybe studies done in poor nations are being slanted by the people there to get high values, to prove that their lives are worth just as much.

Empirical estimates of the value ratio of life relative to income still vary a lot. But a simple theoretical argument suggests that variation in this value is mostly due to variation in risk-aversion. Which is the fifth and last view listed above. Here’s a suggestive little formal model. (If you don’t like math, skip to the last two paragraphs.)

Assume life happens at discrete times t. Between each t and t+1, there is a probability p(e_t) of not dying, which is increasing in death prevention effort e_t. (To model time discounting, use δ*p here instead of p.) Thus from time t onward, expected lifespan is L_t = 1 + p(e_t)*L_{t+1}. Total value from time t onward is similarly given by V_t = u(c_t) + p(e_t)*V_{t+1}, where utility u(c_t) is increasing in that time’s consumption c_t.

Consumption c_t and effort e_t are constrained by budget B, so that c_t + e_t ≤ B. If budget B and functions p(e) and u(c) are the same at all times t, then so are the unique interior optimum values of e and c, and hence also L and V. Thus we have L = 1/(1-p), and V = u/(1-p) = u*L.
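
Here is a minimal numeric sketch of this stationary model (my own illustration; the functional forms for u and p and all numbers are assumptions): split the per-period budget between consumption and death prevention to maximize V = u(c)/(1-p(e)).

```python
# A minimal sketch (my illustration) of the stationary model above: each period
# split budget B between consumption c and prevention effort e = B - c, survive
# with probability p(e), so V = u(c) + p(e)*V, i.e. V = u(c)/(1 - p(e)).
# The functional forms and numbers are assumptions chosen to make it concrete.
import numpy as np
from scipy.optimize import minimize_scalar

B = 1.0
def u(c): return np.sqrt(c)                   # assumed concave utility
def p(e): return 0.95 * (1 - np.exp(-5 * e))  # assumed survival probability, < 1

opt = minimize_scalar(lambda c: -u(c) / (1 - p(B - c)),
                      bounds=(1e-6, B - 1e-6), method="bounded")
c_star = opt.x
e_star = B - c_star
L = 1 / (1 - p(e_star))                       # expected remaining lifespan
V = u(c_star) * L                             # total value, = u/(1-p)
print(f"optimal c={c_star:.3f}, e={e_star:.3f}, lifespan L={L:.1f}, value V={V:.2f}")
```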

In this model, the life to income value ratio is the value of increasing L_t from L to L+x, divided by the value of increasing c_t from c to c(1+x), for x small and some particular time t. That is:

(dL * dV/dL) / (dc * dV/dc) = x*u / (x * c * du/dc) = [ c * u'(c) / u(c) ]^-1.

Which is just the inverse of the elasticity of u with respect to c.

That non-linear (concave) shape of the utility function u(c) is also what produces risk-aversion. Note that (relative) risk aversion is usually defined as -c*u”(c)/u’(c), to be invariant under affine transformations of u and c. Here we don’t need such an invariance, as we have a clear zero level of c, the level at which u(c) = 0, so that one is indifferent between death and life with that consumption level.

So in this simple model, the life to income value ratio is just the inverse of the elasticity of the utility function. If elasticity is constant (as with power-law utility), then the life to income ratio is independent of income. A risk-neutral agent puts an equal value on a year of life and a year of income, while an agent with square root utility puts twice as much value on a year of life as a year of income. With no time discounting, the US EPA value of life of $10M corresponds to a life year worth over four times average US income, and thus to a power law utility function where the power is less than one quarter.
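
Here is a rough arithmetic version of that last claim. The specific income and remaining-life-year figures below are my own illustrative assumptions, not numbers from the post; they just show how the “over four times income” and “power below one quarter” conclusions fit together.

```python
# Rough check (assumed numbers): EPA value of a statistical life ~$10M, ~40
# remaining life years, average US income ~$60K/year. Then a life year is worth
# ~4x a year of income, and since the life-to-income ratio is 1/a for power-law
# utility u(c) = c^a, the implied power a is a bit under 1/4.
value_of_life = 10_000_000     # assumed EPA figure, dollars
remaining_years = 40           # assumption
avg_income = 60_000            # assumption, dollars per year

value_per_life_year = value_of_life / remaining_years
ratio = value_per_life_year / avg_income    # life-year value / income-year value
implied_power = 1 / ratio                   # ratio = 1/a for u(c) = c^a
print(f"life year ≈ ${value_per_life_year:,.0f}, ratio ≈ {ratio:.1f}x income, "
      f"implied power a ≈ {implied_power:.2f}")
```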

This reduction of the value of life to risk aversion (really concavity) helps us understand why the value of life varies so much over individuals and contexts, as we also see puzzlingly large variation and context dependence when we measure risk aversion. I’ll write more on that puzzle soon.

Added 23June: The above model applies directly to the case where, by being alive, one can earn budget B in each time period to spend in that period. This model can also apply to the case where one owns assets A, assets which when invested can grow from A to rA in one time period, and be gambled at fair odds on whether one dies. In this case the above model applies for B = A*(1-p/r).

Added 25June: I think the model gives the same result if we generalize it in the following way: B_t and p_t(e_t,c_t) vary with time, but in a way such that the optimal c_t = c is constant in time, and dp_t/dc_t = 0 at the actual values of c_t, e_t.


Modeling the ‘Unknown’ Label

Recently I’ve browsed some specific UFO encounter reports, and I must admit they can feel quite compelling. But then I remember the huge selection effect. We all go about our lives looking at things, and only rarely do any of us officially report anything as so strange that authorities should know about it. And then when experts do look into such reports, they usually assign them to one of a few mundane explanation categories, such as “Venus” or “helicopter.”  For only a small fraction do they label it “unidentified”. And from thousands of these cases, the strangest and most compelling few become the most widely reported. So of course the UFO reports I see are compelling!

However, that doesn’t mean that such reports aren’t telling us about new kinds of things. After all, noticing weird deviations from existing categories is how we always learn about new kinds of things. So we should study this data carefully to see if random variation around our existing categories seems sufficient to explain it, or if we need to expand our list of categories and the theories on which they are based. Alas, while enormous time and effort has been spent collecting all these reports, it seems that far less effort has been spent to formally analyze them. So that’s what I propose.

Specifically, I suggest that we more formally model the choice to label something “unknown”. That is, model all this data as a finite mixture of classes, and then explicitly model the process by which items are assigned to a known class, versus labeled as “unknown.” Let me explain.

Imagine that we had a data set of images of characters from the alphabet, A to Z, and perhaps a few more weird characters like წ. Nice clean images. Then we add a lot of noise and mess them up in many ways and to varying degrees. Then we show people these images and ask them to label them as characters A to Z, or as “unknown”. I can see three main processes that would lead people to choose this “unknown” label for a case:

  1. Image is just weird, sitting very far from prototype of any character A to Z.
  2. Image sits midway between prototypes of two particular characters in A to Z.
  3. Image closely matches prototype of one of the weird added characters, not in A to Z.

If we use a stat analysis that formally models this process, we might be able to take enough of this labeling data and then figure out whether in fact weird characters have been added to the data set of images, and to roughly describe their features.

You’d want to test this method, and see how well it could pick out weird characters and their features. But once it works at least minimally for character images, or some other simple problem, we could then try to do the same for UFO reports. That is, we could model the “unidentified” cases in that data as a combination of weird cases, midway cases, and cases that cluster around new prototypes, which we could then roughly describe. We could then compare the rough descriptions of these new classes to popular but radical UFO explanations, such as aliens or secret military projects.

More formally, assume we have a space of class models, parameterized by A, models that predict the likelihood P(X|A) that a data case X would arise from that class. Then given a set of classes C, each with parameters A_c and a class weight w_c, we could for any case X produce a vector of likelihoods p_c = w_c*P(X|A_c), one for each class c in C. A person might tend more to assign the known label L when the value of p_L was high, relative to the other p_c. And if a subset U of classes C were unknown, people might tend more to assign the label “unknown” when either:

  1. even the highest p_c was relatively low,
  2. the top two p_c had nearly equal values, or
  3. the highest p_c belonged to an unknown class, with c in U.

Using this model of how the label “unknown” is chosen, then given a data set of labeled cases X, including the unknown label, we could find the best parameters w_c and A_c (and any parameters of the labeling process) to fit this dataset. When fitting such a model to data, one could try adding new unknown classes, not included in the initial set of labels L. And in this way find out if this data supports the idea of new unknown classes U, and with what parameters.
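
To make the proposal concrete, here is a toy sketch of such a labeling model (my own illustration, not the post’s implementation); the Gaussian classes, weights, and thresholds are all assumptions. A real analysis would then search over class parameters, including extra candidate unknown classes, to maximize the likelihood of the observed labels.

```python
# A toy sketch (mine) of the proposed labeling model: classes are weighted
# Gaussian clusters, and a case is labeled "unknown" when (1) even the best
# class fits poorly, (2) the top two classes fit about equally well, or (3) the
# best-fitting class is itself an unknown class. All parameters are assumptions.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# two known classes plus one "unknown" class (e.g. a weird added character)
classes = {
    "A": dict(w=0.45, mean=[0, 0], cov=np.eye(2)),
    "B": dict(w=0.45, mean=[4, 0], cov=np.eye(2)),
    "U": dict(w=0.10, mean=[2, 4], cov=np.eye(2)),
}
unknown_classes = {"U"}
LOW_FIT, NEAR_TIE = 1e-3, 0.8          # assumed labeling thresholds

def label(x):
    p = {c: v["w"] * multivariate_normal.pdf(x, v["mean"], v["cov"])
         for c, v in classes.items()}
    ranked = sorted(p, key=p.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if p[best] < LOW_FIT:               # process 1: just weird
        return "unknown"
    if p[second] / p[best] > NEAR_TIE:  # process 2: midway between prototypes
        return "unknown"
    if best in unknown_classes:         # process 3: matches an unknown class
        return "unknown"
    return best

# simulate cases from the mixture and tally the labels people would report;
# fitting would search over class parameters (including added unknown classes)
# to maximize the likelihood of these observed labels.
X = np.vstack([rng.multivariate_normal(v["mean"], v["cov"], int(1000 * v["w"]))
               for v in classes.values()])
labels = [label(x) for x in X]
print({lab: labels.count(lab) for lab in set(labels)})
```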

For UFO reports, the first question is whether the first two processes for producing “unknown” labels seem sufficient to explain the data, or if we need to add a process associated with new classes. And if we need new classes, I’d be interested to see if there is a class fitting the “military prototype” theory, where events happen closer to military bases, more on days and at times when those folks tend to work, with more intelligent responses, more noise, less tendency to make nearby equipment malfunction, and impressive but not crazy extreme speeds and accelerations that increase over time with research abilities. And I’d be especially interested to see if there is a class fitting the “alien” theory, with more crazy extreme speeds and accelerations, enormous sizes, nearby malfunctions, total silence, apparent remarkable knowledge, etc.

Added 9a: Of course the quality of such a stat analysis will depend greatly on the quality of the representations of data X. Poor low-level representations of characters, or of UFO reports, aren’t likely to reveal much interesting or deep. So it is worth trying hard to process UFO reports to create good high level representations of their features.

Added 28May: If there is a variable of importance or visibility of an event, one might also want to model censoring of unimportant hard-to-see events. Perhaps also include censoring near events that authorities want to keep hidden.


Constant Elasticity Prevention

While many engaged Analysis #1 in my last post, only one engaged Analysis #2. So let me try again, this time with a graph.

This is about a simple model of prevention, one that assumes a constant elasticity (= power law) between harm and prevention effort. An elasticity of 1 means that 1% more effort cuts harm by 1%. For an elasticity of 2, 1% more effort cuts harm by 2%, while for an elasticity of 0.5, 1% more effort cuts harm by only 0.5%.

Such simple “reduced form” models are common in many fields, including economics. Yes of course the real situation is far more complex than this. Even so, reduced forms are typically decent approximations for at least small variations around a reference policy. As with all models, they are wrong, but can be useful.

Each line in the following graph shows how total loss, i.e., the sum of harm and prevention effort, varies with the fraction of that loss coming from prevention. The different lines are for different elasticities, and the big dots which match the color of their lines show the optimum choice on each line to min total loss. (The lines all intersect at prevention = 1/20, harm = 20.)

As you can see, for min total loss you want to be on a line with higher elasticity, where prevention effort is more effective at cutting harm. And the more effective is prevention effort, then the more effort you want to put in, which will result in a larger fraction of the total harm coming from prevention effort.
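
For those who want to recreate the graph’s logic, here is a short sketch (mine): for each elasticity a, harm is H(P) = k*P^(-a), with k set so every curve passes through the common reference point, and the interior optimum of total loss has P/H = a.

```python
# Sketch (mine) of the graph's construction: H(P) = k*P**(-a), with k chosen so
# each curve passes through the reference point (prevention=1/20, harm=20). The
# interior optimum of P + H(P) is P = (k*a)**(1/(1+a)), where P/H = a.
P_ref, H_ref = 1 / 20, 20

for a in (0.25, 0.5, 1.0, 2.0, 4.0):
    k = H_ref * P_ref**a                   # forces H(P_ref) = H_ref
    P_opt = (k * a) ** (1 / (1 + a))       # loss-minimizing prevention effort
    H_opt = k * P_opt ** (-a)
    total = P_opt + H_opt
    print(f"elasticity a={a}: min total loss={total:.2f}, P/H={P_opt / H_opt:.2f}, "
          f"fraction of loss from prevention={P_opt / total:.2f}")
```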

So if locks are very effective at preventing theft, you may well pay a lot more for locks than you ever suffer on average from theft. And in the US today, the elasticity of crime with respect to spending on police is ~0.3, explaining why we suffer ~3x more losses from crime than we spend on police to prevent crime.

Recently, I ran a few polls on using lockdown duration as a way to prevent pandemic deaths. In these polls, I asked directly for estimates of elasticity, and in this poll, I asked for estimates of the ratio of prevention loss to health harm loss. And here I asked if the ratio is above one.

In the above graph there is a red dot on the 0.5 elasticity line. In the polls, 56% estimate that our position will be somewhere to the right of the red dot on the graph, while 58% estimate that we will be somewhere above that grey 0.5 elasticity line (i.e., on a line with less elasticity). Which means they expect us to do too much lockdown.

Fortunately, the loss at that red dot is “only” 26% higher than at the min of the grey line. So if this pandemic hurts the US by ~$4T, the median poll respondent expects “only” an extra $1T lost due to extra lockdown. Whew.

Added 26May: Follow-up surveys on US find (via lognormal fit) median effort to harm ratio of 3.6, median elasticity of 0.23. For optimum these should be equal – so far more than optimal lockdown!

Added 1Aug: Repeating same questions now gives median effort to harm ratio of 4.0, median elasticity of 0.18. That is, they see the situation as even worse than they saw it before.

Added 22Oct: Repeating the questions now gives median effort to harm ratio of 5.2, median elasticity of 0.10. The estimated deviation between these two key numbers has continued to increase over time.


2 Lockdown Cost-Benefit Analyses

Back on Mar. 21 I complained that I hadn’t seen any cost-benefit analyses of the lockdown policies that had just been applied where I live. Some have been posted since, but I’ve finally bothered to make my own. Here are two.

ANALYSIS #1: On the one side are the costs of economic disruption. Let us estimate that a typical strong lockdown cuts ~1/3 of the income, i.e., of the econ/social value gained per unit time. (It would be more due to the harm from time needed to recover afterward, and due to stress and mental health harms.) If one adds 9 weeks of lockdown, perhaps on and off spread out over a longer period, that’s a total of 3 weeks’ income lost.

On the other side are losses due to infection. I estimate an average infection fatality rate (IFR) of 0.5%, and half as much additional harm to those who don’t die, due to other infection harms. (E.g., 3% have severe symptoms, and 40% of those get 20% disabled.) I estimate that eventually half would get infected, and assume the recovered are immune. Because most victims are old, the average number of life years lost seems to be about 12. But time discounting, quality-of-life adjustment, and the fact that they are poorer, sicker, and wouldn’t live as long as others their age, together arguably cut that figure by 1/3. And a standard health-econ estimate is that a life-year is worth about twice annual income. Multiply these together and you get an expected loss of 3 weeks’ income.
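
Here is the arithmetic of those two estimates, spelled out (my own restatement of the numbers above), in units of weeks of income per person:

```python
# Arithmetic check of Analysis #1, in years (then weeks) of income per person.

# Lockdown side: ~1/3 of income lost per unit time, for 9 added weeks.
lockdown_cost = (1 / 3) * (9 / 52)

# Infection side: 0.5% IFR plus half as much again in non-death harm, half the
# population eventually infected, ~12 life years per death cut by 1/3 for
# discounting/quality/health, each life year worth ~2 years of income.
harm_per_infection = 0.005 * 1.5
infected_fraction = 0.5
life_years = 12 * (2 / 3)
income_per_life_year = 2
infection_cost = harm_per_infection * infected_fraction * life_years * income_per_life_year

print(f"lockdown side:  {lockdown_cost * 52:.1f} weeks of income")
print(f"infection side: {infection_cost * 52:.1f} weeks of income")
```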

As these two estimates are roughly equal, this seems a convenient reference point for analysis. Thus, if we believed these estimates, we should be indifferent between doing nothing and a policy of spending 9 added weeks of lockdown (beyond the perhaps 4-8 weeks that might happen without government rules) to prevent all deaths, perhaps because a vaccine would come by then. Or, if death rates would actually be double this estimate due to an overloaded medical system, we should be indifferent between doing nothing and spending 9 added weeks of lockdown to avoid that overloading. Or we should be indifferent between doing nothing and 4 added weeks of lockdown which somehow cuts the above estimated death rate in half.

Unfortunately, the usual “aspirational” estimate for a time till vaccine is far longer, or over 18 months. And a doubling of death rates seems a high estimate for medical system overload effects, perhaps valid sometimes but not usually. It seems hard to use that to argue for longer lockdown periods when medical systems are not nearly overwhelmed. Especially in places like the US with far more medical capacity.

During the 1918 flu epidemic, duration variations around the typical one month lockdown had no noticeable effect on overall deaths. In the US lately we’ve also so far seen no correlation between earlier lockdowns and deaths. And people consistently overestimate the value of medical treatment. Also, as the death rate for patients on the oft-celebrated ventilators is 85%, they can’t cut deaths by more than 15%.

We’ve had about 6 weeks of lockdown so far where I live. A short added lockdown seems likely to just delay deaths by a few months, not to cut them much, while a long one seems likely to do more damage than could possibly be saved by cutting deaths.

Of course you don’t have to agree with my reference estimates above. But ask yourself how you’d change them, and what indifferences your new estimates imply. Yes, there are places in the world that seem to have done the right sort of lockdown early enough in the process to get big gains, at least so far. But if your place didn’t start that early nor is doing that right sort of lockdown, can you really expect similar benefits now?

ANALYSIS #2: Consider the related question: how much should we pay to prevent crime?

Assume a simple power-law (= constant elasticity) relation between the cost H of the harm resulting directly from the crimes committed, and the cost P of efforts to prevent crime:

H = k*P^(-a),  or  dlnH/dlnP = -a,

where a is the (positive) elasticity of harm H with respect to prevention P. To minimize total loss L = H + P, you set P = (k*a)^(1/(1+a)), at which point we have a nice simple expression for the cost ratio, namely P/H = a.

So, when you do it right, the more effective is prevention at stopping harm, then the larger is the fraction of total loss due to prevention. If 1% more prevention effort cuts 1% of crime, you should lose about the same amounts from harm and prevention. If 1% more prevention cuts 2% of crime, then you should lose twice as much in prevention as you do in harm. And if it takes 2% more prevention effort to cut 1% of crime, you should lose about twice as much in harm as you do in prevention.
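
A three-line check of those statements (my sketch; the scale k is arbitrary since the ratio doesn’t depend on it):

```python
# At the loss-minimizing prevention level P = (k*a)**(1/(1+a)), the ratio of
# prevention loss to harm loss equals the elasticity a.
k = 1.0                                   # arbitrary scale
for a in (1.0, 2.0, 0.5):
    P = (k * a) ** (1 / (1 + a))
    H = k * P ** (-a)
    print(f"elasticity a={a}: prevention/harm loss ratio = {P / H:.2f}")
```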

This model roughly fits two facts about US crime spending: the elasticity is less than one, and most loss comes from the crimes themselves, rather than prevention efforts. Typical estimates of elasticity are around 0.3 (ranging 0.1-0.7). US governments spend $280B a year on police, courts, and jails, and private security spends <$34B. Estimates of the total costs of crime range $690-3410B.

Now consider Covid19 prevention efforts. In this poll respondents said 3.44 to 1 that more harm will come from econ disruption than from direct health harms. And in this poll, 56% say that more than twice the loss will come from econ disruption. For that to be optimal in this constant elasticity model, a 10% increase in lockdown, say adding 12 days to a 4 month lockdown, must cut total eventual deaths (and other illness harm) by over 20%. That seems very hard to achieve, and in this poll 42% said they expect us to see too much econ disruption, while only 29% thought we’d see too little.

(More on Analysis #2 in the next post.)

In this post I’ve outlined two simple analyses of lockdown tradeoffs. Both suggest that we are at serious risk of doing too much lockdown.

10am: On reflection, I changed my estimate of the lockdown from 25% to 27% of income, and my estimate of non-death harm from as-much-as to half-as-much-as the death harm. So my reference added shutdown duration is now 4 months instead of 6.

12pm: Even if recovery gave immunity for only a limited period, then as long as you were considering lockdown durations less than that period, the above calculation still applies, but now it applies to each such period. For example, if immunity only lasts a year, then these are annual costs, not eventual costs. And that’s only if infection chances are independent each period. If, more likely, it is the same people who are more at risk each year, then in later years gains from lockdowns decline.

29Apr, 3am: We are now at 73 comments, and so far all of them are about analysis #1, and none about analysis #2. Also, tweet on #1 got 18 retweets, tweet on #2 got none.

29Apr, 1pm: In two more polls, over half estimate that a 10% increase in lockdown duration gives <5% decrease in deaths, for both world and US. Instead of the >20% that would be required to justify allowing twice the damage from lockdowns as from health harms. See also results on the cost of masks.

28May:  I’ve updated the numbers a bit.

22Oct: This analysis from March 22, based on happiness, also suggests far more harm from the economy dip than from deaths. And I confirm my analysis with more recent estimates here.

23Oct: I’ve just shown that the above condition, that at the optimum -dlnH/dlnP = P/H, holds for any function H(P).


Beware R0 Variance

The big push now re Covid19 is to use “social distancing” to cut “R0”, the rate at which infection spreads. More precisely, R0 is the average number of other people that one infected person would infect, if those others were not already infected. With no efforts to reduce it, estimates for natural R0 range from 2 to 15, with a best estimate perhaps around 4. The big goal is to get this number below 1, so that the pandemic is “suppressed” and goes away, and stays away, until a vaccine or other strong treatment arrives, allowing most to escape infection. In contrast, if R0 stays above 1 we might “flatten the curve”, so that each infected person can get more medical resources when they are sick, but soon most everyone gets infected.

Apparently even with current “lockdown” efforts, all of 11 European nations studied now still have best estimate R0 over 2, with a median ~3.7. So they must do a lot more if they are to suppress. But how much more? My message in this post is that it is far from enough to push median R0 down below 1; one must also push down its variance.

Imagine a population composed of different relatively-isolated subpopulations, each with a different value of R0. Assume that few are infected, so that subpopulation pandemic growth rates are basically just R0. Assume also that these different R0 are distributed log-normally, i.e., the logarithm of R0 has a Gaussian distribution across subpopulations. This is (correctly) the usual distribution assumption for parameters bounded below by zero, as usually many small factors multiply together to set such parameters. The total effective R0 for the whole population is then found simply by integrating (via the lognormal) the effective growth over the subpopulations’ differing R0 values.

For example, assume that the R0 lognormal distribution has log mean (mu) -2 and sigma 1. Here the mode of the distribution, i.e., the most common R0 number, is 0.05, the median R0 is 0.14, only 5% of subpopulations have R0 above 0.70, and only 2% have R0 >1. Even so, if each of these subpopulations maintain their differing R0 over ten infection iterations, the mean growth factor R0 of the whole population is 20 per iteration!

As another example (for log mean -1, sigma 0.5), the R0 mode is 0.29, the median is 0.37, only 5% of subpopulations have an R0 over 0.85, only 2% have R0>1. Yet over ten infection iterations maintaining these same R0 factors per subpopulation, the mean growth factor R0 of the whole population is 1.28 per iteration. That is, the pandemic grows.
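
Here is a short check of those two examples (my own sketch): for each (log mean, sigma) pair it recomputes the mode, median, upper percentile, fraction of subpopulations with R0 above 1, and the effective per-iteration growth factor E[R0^10]^(1/10) = exp(mu + 5*sigma^2).

```python
# Check of the two lognormal R0 examples above.
import numpy as np
from scipy.stats import lognorm, norm

for mu, sigma in [(-2.0, 1.0), (-1.0, 0.5)]:
    dist = lognorm(s=sigma, scale=np.exp(mu))
    mode = np.exp(mu - sigma**2)
    frac_above_1 = 1 - norm.cdf(-mu / sigma)            # P(R0 > 1) = P(log R0 > 0)
    per_iter = np.exp(mu + 5 * sigma**2)                # E[R0^10]**(1/10)
    print(f"mu={mu}, sigma={sigma}: mode={mode:.2f}, median={dist.median():.2f}, "
          f"95th pct={dist.ppf(0.95):.2f}, P(R0>1)={frac_above_1:.3f}, "
          f"per-iteration growth={per_iter:.2f}")
```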

Of course these growth numbers eventually don’t apply to finite subpopulations, once most everyone in them gets infected. Because when most of a population is infected, then R0 no longer sets pandemic growth rates. And if these subpopulations were completely isolated from each other, then all of the subpopulations with R0<1 would succeed in suppressing. However, with even a modest amount of interaction among these populations, the highly infected ones will infect the rest.

The following graph tells a somewhat more general story. On the x-axis I vary the median value of R0 among the subpopulations, which sets the log-mean. For each such value, I searched for the log-sigma of the lognormal R0 distribution that makes the total average R0 for the whole population (over ten iterations) exactly equal to 1, so that the pandemic neither grows nor shrinks. Then on the graph I show the standard deviation, in R0 terms, that this requires, and the fraction of subpopulations that grow via R0>1.

As you can see, we consistently need an R0 standard deviation of less than 0.21, and the lower the median R0, the lower the fraction of subpopulations with R0>1 that we can tolerate.
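
And here is a sketch (mine) of how such a graph can be constructed: for each median R0, find the sigma at which the population’s effective per-iteration growth over ten iterations equals 1, then report the implied R0 standard deviation and the fraction of subpopulations with R0 > 1.

```python
# For each median R0 < 1, find the lognormal sigma making the whole-population
# effective growth exactly 1 over ten iterations: exp(mu + 5*sigma^2) = 1, i.e.
# sigma = sqrt(-mu/5). Then report the R0 standard deviation and P(R0 > 1).
import numpy as np
from scipy.stats import norm

for median_R0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    mu = np.log(median_R0)
    sigma = np.sqrt(-mu / 5)
    mean = np.exp(mu + sigma**2 / 2)
    std = mean * np.sqrt(np.exp(sigma**2) - 1)     # lognormal std in R0 terms
    frac_grow = 1 - norm.cdf(-mu / sigma)          # fraction with R0 > 1
    print(f"median R0={median_R0}: sigma={sigma:.3f}, R0 std={std:.3f}, "
          f"fraction with R0>1={frac_grow:.3f}")
```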

So, as long as there is substantial mixing in the world, or within a nation, it is far from enough to get the R0 for the median subpopulation below 1. You also need to greatly reduce the variation, especially the fraction of subpopulations in which the pandemic grows via R0>1. For example, when the median R0 is 0.5, you can tolerate less than 3% of subpopulations having an R0>1, just to hold the pandemic at a constant overall level. And to suppress in limited time, you need to go a lot further.

Different subpopulations with differing R0 seem plausible not just because our world has different nations, classes, cultures, professions, industries, etc., but because Covid19 policy has mostly been made at relatively local levels, varying greatly even within nations. In addition, most things that seem log-normally distributed actually have thicker-than-lognormal tails, which makes this whole problem worse.

All of which is to say that suppressing a pandemic like this, with high R0 and many asymptomatic infected, after it has escaped its initial size and region, is very hard. Which is also to say, we probably won’t succeed. Which is to say: we need to set up a Plan B, such as variolation.

Spreadsheet for all this here.


Beware Multi-Monopolies

Back in 1948, the Supreme Court ordered Paramount, Metro-Goldwyn-Mayer and other movie studios to divest themselves of their theater chains, ruling that the practice of giving their own theaters preference on the best movies amounted to illegal restraint of trade.

In 1962, MCA, then the most powerful force in Hollywood as both a talent agency and producer of TV shows, was forced to spin off its talent agency after the Justice Department concluded that the combination gave it unfair advantage in both markets.

And in 1970, the Federal Communications Commission prohibited the broadcast networks — ABC, CBS and NBC — from owning or producing programming aired during prime time, ushering in a new golden era of independent production.

In recent decades, however, because of new technology and the government’s willful neglect of the antitrust laws, most of those prohibitions have fallen by the wayside. (more)

My last post talked about how our standard economic models of firms competing in industries typically show industries having too many, not too few, firms. It is a suspicious and damning fact that economists and policy makers have allowed themselves and the public to gain the opposite impression, that our best theories support interventions to cut industry concentration.

My last post didn’t mention the most extreme example of this, the case where we have the strongest theory reason to expect insufficient concentration:

  • Multi-Monopoly: There’s a linear demand curve for a product that customers must assemble for themselves via buying components separately from multiple monopolists. Each monopolist must pay a fixed cost and a constant marginal cost per component sold. Monopolists simultaneously set their prices, and the sum of these prices is intersected with the demand curve to get a quantity, which becomes the quantity that each firm sells.

The coordination failure among these firms is severe. It produces a much lower quantity and welfare than would result if all these firms were merged into a single monopolist who sold a single merged product. So in this case the equilibrium industry concentration is far too low.
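
Here is a small numeric sketch of that coordination failure (my own example, with assumed demand and cost numbers, ignoring fixed costs): n component monopolists each price their component against a linear demand curve for the assembled product, versus a single merged monopolist selling the bundle.

```python
# Multi-monopoly sketch (assumed numbers): demand Q = A - B*(total price), each
# of n component monopolists has marginal cost m. In the symmetric Nash
# equilibrium of simultaneous component pricing, Q = (A - n*B*m)/(n+1); a
# merged monopolist with marginal cost n*m instead sells Q = (A - B*n*m)/2.
A, B, m = 100.0, 1.0, 5.0

def welfare(q, total_mc):               # consumer value minus production cost
    return (A * q - q**2 / 2) / B - total_mc * q

for n in (2, 3, 5):
    q_separate = (A - n * B * m) / (n + 1)
    q_merged = (A - B * n * m) / 2
    print(f"n={n}: quantity {q_separate:.1f} (separate) vs {q_merged:.1f} (merged), "
          f"welfare {welfare(q_separate, n * m):.0f} vs {welfare(q_merged, n * m):.0f}")
```

With these illustrative numbers, both quantity and welfare are lower under separate component monopolists, and the gap widens as the number of components grows.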

This problem continues, though to a lesser extent, even when each of these monopolists is replaced by a small set of firms, each of whom faces the same costs, firms who compete to sell that component. This is because the problem arises due to firms having sufficient market power to influence their prices.

For example, this multi-monopoly problem shows up when many towns along a river each separately set the tax they charge for boats to travel down that river. Or when, to get a functioning computer, you must buy both a processing chip and an operating system from separate firms like Intel and Microsoft.

Or when you must buy a movie or TV experience from (1) an agent who makes actors available, (2) a studio who puts those actors together into a performance, and (3) a theatre or broadcast network who finally shows it to you. When these 3 parties separately set their prices for these three parts, you have a 3-way monopoly (or strong market power) problem.

This last example is why the quote above by Steven Pearlstein is so sad. He calls for anti-trust authorities to repeat some of their biggest ever mistakes: breaking monopolies into multi-monopolies. And alas, our economic and policy authorities fail to make clear just how big a mistake this is. In most industrial organization classes, both grad and undergrad, you will never even hear about this problem.


What’s So Bad About Concentration?

Practical men, who believe themselves to be quite exempt from any intellectual influences, are usually slaves of some defunct economist. (Keynes)

Many have recently said 1) US industries have become more concentrated lately, 2) this is a bad thing, and 3) inadequate antitrust enforcement is in part to blame. (See many related MR posts.)

I’m teaching grad Industrial Organization again this fall, and in that class I go through many standard simple (game-theoretic) math models about firms competing within industries. And it occurs to me to mention that when these models allow “free entry”, i.e., when the number of firms is set by the constraint that they must all expect to make non-negative profits, then such models consistently predict that too many firms enter, not too few. These models suggest that we should worry more about insufficient, not excess, concentration.

Two examples:

  • “Cournot” Quantity Competition Firms pay (the same) fixed cost to enter an industry, and (the same) constant marginal cost to make products there. Knowing the number of firms, each firm simultaneously picks the quantity it will produce. The sum of these quantities is intersected with a linear demand curve to set the price they will all be paid for their products.
  • “Circular City” Differentiated Products Customers are uniformly distributed, and firms are equally distributed, around a circle. Firms pay (the same) fixed cost to enter, and (the same) constant marginal cost to serve each customer. Each firm simultaneously sets its price, and then each customer chooses the firm from which it will buy one unit. This customer must pay not only that firm’s price, but also a “delivery cost” proportional to its distance to that firm.
  • [I also give a Multi-Monopoly example in my next post.]

In both of these cases, when non-negative profit is used to set the number of firms, that number turns out to be higher than the number that maximizes total welfare (i.e., consumer value minus production cost). This is true not only for the specific models I’ve just described, but also for most simple variations that I’ve come across. For example, quantity competition might have increasing marginal costs, or a sequential choice of firm quantity. Differentiated products might have a quadratic delivery cost, allow price discrimination by consumer location, or have firms partially pay for delivery costs.
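
As an illustration of that claim for the Cournot case (a sketch of my own, with assumed demand and cost parameters, treating the number of firms as an integer): compare the free-entry number of firms to the number a welfare-maximizing planner would pick.

```python
# Cournot free entry vs. welfare-optimal entry (assumed numbers): inverse
# demand P = A - Q, marginal cost m, fixed entry cost F. With n firms each
# produces q = (A - m)/(n+1) and earns q**2 - F.
A, m, F = 100.0, 20.0, 100.0

def cournot(n):
    q = (A - m) / (n + 1)                       # symmetric equilibrium output per firm
    Q = n * q
    profit = q**2 - F                           # (P - m)*q - F
    welfare = (A - m) * Q - Q**2 / 2 - n * F    # consumer surplus + total profits
    return profit, welfare

n_free = max(n for n in range(1, 200) if cournot(n)[0] >= 0)   # entry until profit < 0
n_opt = max(range(1, 200), key=lambda n: cournot(n)[1])        # planner's choice
print(f"free-entry firms: {n_free}, welfare-maximizing firms: {n_opt}")
```

With these particular numbers, seven firms enter under free entry while total welfare peaks at three firms.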

Furthermore, we have a decent general account that explains this general pattern. It is a lot like how there is typically overfishing if new boats enter a fishing area whenever they expect a non-negative profit per boat; each boat ignores the harm it does to other boats by entering. Similarly, firms who enter an industry neglect the costs they impose on other firms already in that industry.

Yes, I do know of models that predict too few firms entering each industry. For example, a model might assume that all the firms who enter an industry go to war with each other via an all-pay auction. The winning firm is the one who paid the most, and gains the option to destroy any other firm. Only one firm remains in the industry, and that is usually too few. However, such models seem more like special cases designed to produce this effect, not typical cases in the space of models.

I’m also not claiming that firms would always set efficient prices. For example, a sufficiently well-informed regulator might be able to improve welfare by lowering the price set by a monopolist. But that’s about the efficiency of prices, not of the number of firms. You can’t say there’s too much concentration even with a monopolist unless the industry would actually be better with more than one firm.

Of course the world is complex and space of possible models is vast. Even so, it does look like the more natural result for the most obvious models is insufficient concentration. That doesn’t prove that this is in fact the typical case in the real world, but it does at least raise a legitimate question: what theory model do people have in mind when they suggest that we now have too much industry concentration? What are they thinking? Can anyone explain?

Added 11a: People sometimes say the cause of excess concentration is “barriers to entry”. The wikipedia page on the concept notes that most specific things “cited as barriers to entry … don’t fit all the commonly cited definitions of a barrier to entry.” These include economies of scale, cost advantages, network effects, regulations, ads, customer loyalty, research, inelastic demand, vertical integration, occupational licensing, mergers, and predatory pricing. Including these factors in models does not typically predict excess concentration.

That wiki page does list some specific factors as fitting “all the common definitions of primary economic barriers to entry.” These include IP, zoning, agreements with distributors and suppliers, customers switching costs, and taxes. But I say that models which include such factors also do not consistently predict excess firm concentration. And I still want to know which of these factors complainers have in mind as the source of the recent increased US concentration problem that they see.

Added 7Sep: Many have in mind the idea that regulations impose fixed costs that are easier on larger firms. But let us always agree that it would be good to lower costs. Fixed costs are real costs, and can’t be just assumed away. If you know a feasible way to actually lower such costs, great let’s do that, but that’s not about excess concentration, that’s about excess costs.


Non-Conformist Influence

Here is a simple model that suggests that non-conformists can have more influence than conformists.

Regarding a one dimensional choice x, let each person i take a public position x_i, and let the perceived mean social consensus be m = Σ_i w_i*x_i, where w_i is the weight that person i gets in the consensus. In choosing their public position x_i, person i cares about getting close to both their personal ideal point a_i and to the consensus m, via the utility function

U_i(x_i) = -c_i*(x_i - a_i)^2 - (1-c_i)*(x_i - m)^2.

Here c_i is person i’s non-conformity, i.e., their willingness to have their public position reflect their personal ideal point, relative to the social consensus. When each person simultaneously chooses their x_i while knowing all of the a_i, w_i, c_i, the (Nash) equilibrium consensus is

m = [ Σ_i w_i*c_i*a_i / (c_i + (1-c_i)*(1-w_i)) ] * [ 1 - Σ_j w_j*(1-c_j)*(1-w_j) / (c_j + (1-c_j)*(1-w_j)) ]^-1

If each w_i << 1, then person i’s contribution to the consensus is close to w_i*c_i*a_i, i.e., their relative weight in the consensus is close to w_i*c_i. So how much their ideal point a_i counts is roughly proportional to their non-conformity c_i times their weight w_i. So all else equal, non-conformists have more influence over the consensus.
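
Here is a small numeric check of the formula and the influence claim (my sketch; the weights, non-conformities, and ideal points are random made-up values): iterate the best responses to confirm the closed-form consensus, then see that each person’s influence dm/da_i is roughly proportional to w_i*c_i.

```python
# Check (my sketch) of the equilibrium consensus and the influence claim.
import numpy as np

rng = np.random.default_rng(0)
n = 50
w = np.full(n, 1 / n)                    # small, equal consensus weights
c = rng.uniform(0.1, 0.9, n)             # non-conformity
a = rng.normal(0.0, 1.0, n)              # ideal points
D = c + (1 - c) * (1 - w)

def consensus(a):
    # closed form implied by the best responses below
    return (w * c * a / D).sum() / (1 - (w * (1 - c) * (1 - w) / D).sum())

# iterate best responses x_i = [c_i*a_i + (1-c_i)*(1-w_i)*m] / D_i
x = a.copy()
for _ in range(1000):
    x = (c * a + (1 - c) * (1 - w) * (w @ x)) / D
print("closed-form m:", round(consensus(a), 6), " iterated m:", round(float(w @ x), 6))

# each person's influence dm/da_i, compared to their w_i*c_i share
eps = 1e-6
influence = np.array([(consensus(a + eps * np.eye(n)[i]) - consensus(a)) / eps
                      for i in range(n)])
print("relative influence (first 4):", np.round(influence / influence.sum(), 4)[:4])
print("w*c shares (first 4):       ", np.round(w * c / (w * c).sum(), 4)[:4])
```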

Now it is possible that others will reduce the weight w_i that they give the non-conformists with high c_i in the consensus. But this is hard when c_i is hard to observe, and as long as this reduction is not fully (or more than fully) proportional to their increased non-conformity, non-conformists continue to have more influence.

It is also possible that extremists, who pick x_i that deviate more from that of others, will be directly down-weighted. (This happens in the weights w_i = k/|x_i - x_m| that produce a median x_m, for example.) This makes more sense in the more plausible situation where x_i, w_i are observable but a_i, c_i are not. In this case, it is the moderate non-conformists, who happen to agree more with others, who have the most influence.

Note that there is already a sense in which, holding constant their weight w_i, an extremist has a disproportionate influence on the mean: a 10 percent change in the quantity x_i - m changes the consensus mean m twice as much when that quantity x_i - m is twice as large.
