Tag Archives: Math

Lognormal Priorities

In many polls on continuous variables over the last year, I’ve found that lognormal distributions typically fit poll responses well. And of course lognormals are also one of the most common distributions in nature. So let’s consider the possibility that, regarding problem areas like global warming, falling fertility, or nuclear war, distributions of priority estimates are lognormal.

Here are parameter values (M = median, A = (mean) average, S = sigma) for lognormal fits to polls on how many full-time equivalent workers should be working on each of the following six problems:

Note that priorities as set by medians are quite different from those set by averages.

Imagine that someone is asked to estimate their (median) priority of a topic area. If their estimate results from taking the product of many estimates regarding relevant factors, then not-fully-dependent noise across different factors will tend to produce a lognormal distribution regarding overall (median) estimates. If they were to then act on those estimates, such as for a poll or choosing to devote time or money, we should see a lognormal distribution of opinions and efforts. When variance (and sigma) is high, and effort is on average roughly proportional to perceived priority, then most effort should come from a quite small fraction of the population. And poll answers should look lognormal. We see both these things.
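To illustrate (a minimal simulation sketch, with made-up noise sizes and factor counts, not fit to any actual poll): multiplying together several noisy factor estimates yields a lognormal spread of overall estimates, whose mean far exceeds its median, and where a small share of people supply most of the total effort if effort tracks perceived priority.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each person's priority estimate is the product of
# 10 noisy factor estimates, each with multiplicative noise.
n_people, n_factors = 100_000, 10
noise = rng.lognormal(mean=0.0, sigma=0.4, size=(n_people, n_factors))
estimates = noise.prod(axis=1)  # product of noisy factors -> lognormal overall

sigma = np.log(estimates).std()               # sigma of the fitted lognormal
median, mean = np.median(estimates), estimates.mean()

# If effort is roughly proportional to perceived priority, what share of
# total effort comes from the top 1% of people?
top_share = np.sort(estimates)[-n_people // 100:].sum() / estimates.sum()

print(f"sigma ~ {sigma:.2f}, median ~ {median:.2f}, mean ~ {mean:.2f}")
print(f"share of effort from top 1%: {top_share:.0%}")
```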

Now let’s make our theory a bit more complex. Assume that people see not only their own estimate, but sometimes also estimates of others. They then naturally put info weight on others’ estimates. This results in a distribution of (median) opinions with the same median, but a lower variance (and sigma). If they were fully rational and fully aware of each others’ opinions, this variance would fall to zero. But it doesn’t; people in general don’t listen to each other as much as they should if they cared only about accuracy. So the poll response variance we see is probably smaller than the variance in initial individual estimates, though we don’t know how much smaller.

What if the topic area in question has many subareas, and each person gives an estimate that applies to a random subarea of the total area? For example, when estimating the priority of depression, each person may draw conclusions by looking at the depressed people around them. In this case, the distribution of estimates reflects not only the variance of noisy clues, but also the real variance of priority within the overall area. Here fully rational people would come to agree on both a median and a variance, a variance reflecting the distribution of priority within this area. This true variance would be less than the variance in poll responses in a population that does not listen to each other as much as they should.

(The same applies to the variance within each person’s estimate distribution. Even if all info is aggregated, if this distribution has a remaining variance, that is “real” variance that should count, just as variance within an area should count. It is the variance resulting from failing to aggregate info that should not count.)

Now let’s consider what this all implies for action biases. If the variance in opinion expressed and acted on were due entirely to people randomly sampling from the actual variance within each area, then efforts toward each area would end up being in proportion to an info-aggregated best estimate of each area’s priority – a social optimum! But the more that variance in opinion and thus effort is also due to variance in individual noisy estimates, the more that such variance will distort efforts. Efforts will then track the average of each distribution, rather than its median. The priority areas with higher variance in individual noise will get too much effort, relative to areas with lower variance.

Of course there are other relevant factors that determine efforts, besides these priorities. Some priority areas have organizations that help to coordinate related efforts, thus reducing free riding problems. Some areas become fashionable, giving people extra social reasons to put in visible efforts. And other areas look weird or evil, discouraging visible efforts. Even so, we should worry that too much effort will go to areas with high variance in priority estimate noise. All else equal, you should avoid such areas. Unless estimate variance reflects mostly true variance within an area, prefer high medians over high averages.

Added 3p: I tried 7 more mundane issues, to see how they varied in variance. The following includes all 13, sorted by median.


Risk-Aversion Sets Life Value

Many pandemic cost-benefit analyses estimate larger containment benefits than did I, mainly due to larger costs for each life lost. Surprised to see this, I’ve been reviewing the value of life literature. The key question: how much money (or resources) should you, or we, be willing to pay to gain more life? Here are five increasingly sophisticated views:

  1. Infinite – Pay any price for any chance to save any human life.
  2. Value Per Life – $ value per human life saved.
  3. Quality Adjusted Life Year (QALY) – $ value per life year saved, adjusted for quality.
  4. Life Year To Income Ratio – Value ratio between a year of life and a year of income.
  5. Risk Aversion – Life to income ratio comes from elasticity of utility w.r.t. income.

The first view, of infinite value, is the simplest. If you imagine someone putting a gun to your head, you might imagine paying any dollar price to not be shot. There are popular sayings to this effect, and many even call this a fundamental moral norm, punishing those who visibly violate it. For example, a hospital administrator who could save a boy’s life, but at great expense, is seen as evil and deserving of punishment, if he doesn’t save the boy. But he is seen as almost as evil if he does save the boy, but thinks about his choice for a while.

Which shows just how hypocritical and selective our norm enforcement can be, as we all make frequent choices that express finite values on life. Every time we don’t pay all possible costs to use the absolutely safest products and processes because they cost more in terms of time, money, or quality of output, we reveal that we do not put infinite value on life.

The second view, where we put a specific dollar value on each life, has long been shunned by officials, who deny they do any such thing, even though they in effect do. Juries have awarded big claims against firms that explicitly used value of life calculations to decide not to adopt safety features, even when they used high values of life. Yet it is easy to show that we can have both more money and save more lives if we are more consistent about the price we pay for lives in the many different death-risk-versus-cost choices that we make.

Studies that estimate the monetary price we are willing to pay to save a life have long shown puzzlingly great variation across individuals and contexts. Perhaps in part because the topic is politically charged. Those who seek to justify higher safety spending, stronger regulations, or larger court damages regarding medical, food, environmental, or job accidents tend to want higher estimates, while those who seek to justify fewer and weaker such things tend to want lower estimates.

The third view says that the main reason to not die is to gain more years of life. We thus care less about deaths of older and sicker folks, who have shorter remaining lives if they are saved now from death. Older people are often upset to be thus less valued, and Congress put terms into the US ACA (Obamacare) medicine bill forbidding agencies from using life years saved to judge medical treatments. Those disabled and in pain can also be upset to have their life years valued less, due to lower quality, though discounting low-quality years is exactly how the calculus says that it is good to prevent disability and pain, as well as death.

It can make sense to discount life years not only for disability, but also for distance in time. That is, saving you from dying now instead of a year from now can be worth more than saving you from dying 59 years from now, instead of 60 years from now. I haven’t seen studies which estimate how much we actually discount life years with time.

You can’t spend more to prevent death or disability than you have. There is thus a hard upper bound on how much you can be willing to pay for anything, even your life. So if you spend a substantial fraction of what you have for your life, your value of life must at least roughly scale with income, at least at the high or low end of the income spectrum. Which leads us to the fourth view listed above, that if you double your income, you double the monetary value you place on a QALY. Of course we aren’t talking about short-term income, which can vary a lot. More like a lifetime income, or the average long-term incomes of the many associates who may care about someone.

The fact that medical spending as a fraction of income tends to rise with income suggests that richer people place proportionally more value on their life. But in fact meta-analyses of the many studies on value of life seem to suggest that higher income people place proportionally less value on life. Often as low as value of life going as the square root of income.

Back in 1992, Lawrence Summers, then Chief Economist of the World Bank, got into trouble for approving a memo which suggested shipping pollution to poor nations, as lives lost there cost less. People were furious at this “moral premise”. So maybe studies done in poor nations are being slanted by the people there to get high values, to prove that their lives are worth just as much.

Empirical estimates of the value ratio of life relative to income still vary a lot. But a simple theoretical argument suggests that variation in this value is mostly due to variation in risk-aversion. Which is the fifth and last view listed above. Here’s a suggestive little formal model. (If you don’t like math, skip to the last two paragraphs.)

Assume life happens at discrete times t. Between each t and t+1, there is a probability p(e_t) of not dying, which is increasing in death prevention effort e_t. (To model time discounting, use δ*p here instead of p.) Thus from time t onward, expected lifespan is L_t = 1 + p(e_t)*L_{t+1}. Total value from time t onward is similarly given by V_t = u(c_t) + p(e_t)*V_{t+1}, where utility u(c_t) is increasing in that time’s consumption c_t.

Consumption c_t and effort e_t are constrained by budget B, so that c_t + e_t ≤ B. If budget B and functions p(e) and u(c) are the same at all times t, then the unique interior optimums of e and c are as well, and so are L and V. Thus we have L = 1/(1-p), and V = u/(1-p) = u*L.

In this model, the life to income value ratio is the value of increasing L_t from L to L+x, divided by the value of increasing c_t from c to c(1+x), for x small and some particular time t. That is:

(dL * dV/dL) / (dc * dV/dc) = x*u / (x * c * du/dc) = [ c * u’(c) / u(c) ]^(-1).

Which is just the inverse of the elasticity of u with respect to c.

That non-linear (concave) shape of the utility function u(c) is also what produces risk-aversion. Note that (relative) risk aversion is usually defined as -c*u’’(c)/u’(c), to be invariant under affine transformations of u and c. Here we don’t need such an invariance, as we have a clear zero level of c, the level at which u(c) = 0, so that one is indifferent between death and life with that consumption level.

So in this simple model, the life to income value ratio is just the inverse of the elasticity of the utility function. If elasticity is constant (as with power-law utility), then the life to income ratio is independent of income. A risk-neutral agent puts an equal value on a year of life and a year of income, while an agent with square root utility puts twice as much value on a year of life as a year of income. With no time discounting, the US EPA value of life of $10M corresponds to a life year worth over four times average US income, and thus to a power law utility function where the power is less than one quarter.
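Here is a small numeric sketch of the model above, with illustrative (made-up) functional forms for p(e) and a power-law u(c); it just checks that the life to income value ratio comes out as the inverse of the utility elasticity, here 1/α = 4.

```python
import numpy as np

# Illustrative (made-up) functional forms and parameters:
alpha = 0.25                       # power-law utility u(c) = c**alpha
B = 1.0                            # per-period budget, split as c + e = B
u = lambda c: c**alpha
p = lambda e: 1 - 1 / (2 + 10*e)   # survival probability, increasing in effort e

# Grid-search the optimal split of the budget between consumption and prevention.
e_grid = np.linspace(1e-4, B - 1e-4, 100_000)
V_grid = u(B - e_grid) / (1 - p(e_grid))   # V = u(c) / (1 - p(e)) = u * L
e_opt = e_grid[V_grid.argmax()]
c_opt = B - e_opt
L = 1 / (1 - p(e_opt))                     # expected remaining lifespan

# Compare a small gain in life years to a small proportional gain in one
# period's consumption, as in the ratio defined above.
x = 1e-6
dV_life = u(c_opt) * x                     # dV/dL = u, so x more life-years is worth u*x
dV_income = u(c_opt * (1 + x)) - u(c_opt)  # one period's consumption rises from c to c(1+x)
print(f"life-to-income ratio ~ {dV_life / dV_income:.2f}  (1/alpha = {1/alpha:.2f})")
```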

This reduction of the value of life to risk aversion (really concavity) helps us understand why the value of life varies so much over individuals and contexts, as we also see puzzlingly large variation and context dependence when we measure risk aversion. I’ll write more on that puzzle soon.

Added 23June: The above model applies directly to the case where, by being alive, one can earn budget B in each time period to spend in that period. This model can also apply to the case where one owns assets A, assets which when invested can grow from A to rA in one time period, and be gambled at fair odds on whether one dies. In this case the above model applies for B = A*(1-p/r).

Added 25June: I think the model gives the same result if we generalize it in the following way: B_t and p_t(e_t,c_t) vary with time, but in a way so that the optimal c_t = c is constant in time, and dp_t/dc_t = 0 at the actual values of c_t, e_t.


Modeling the ‘Unknown’ Label

Recently I’ve browsed some specific UFO encounter reports, and I must admit they can feel quite compelling. But then I remember the huge selection effect. We all go about our lives looking at things, and only rarely do any of us officially report anything as so strange that authorities should know about it. And then when experts do look into such reports, they usually assign them to one of a few mundane explanation categories, such as “Venus” or “helicopter.”  For only a small fraction do they label it “unidentified”. And from thousands of these cases, the strangest and most compelling few become the most widely reported. So of course the UFO reports I see are compelling!

However, that doesn’t mean that such reports aren’t telling us about new kinds of things. After all, noticing weird deviations from existing categories is how we always learn about new kinds of things. So we should study this data carefully to see if random variation around our existing categories seems sufficient to explain it, or if we need to expand our list of categories and the theories on which they are based. Alas, while enormous time and effort has been spent collecting all these reports, it seems that far less effort has been spent to formally analyze them. So that’s what I propose.

Specifically, I suggest that we more formally model the choice to label something “unknown”. That is, model all this data as a finite mixture of classes, and then explicitly model the process by which items are assigned to a known class, versus labeled as “unknown.” Let me explain.

Imagine that we had a data set of images of characters from the alphabet, A to Z, and perhaps a few more weird characters like წ. Nice clean images. Then we add a lot of noise and mess them up in many ways and to varying degrees. Then we show people these images and ask them to label them as characters A to Z, or as “unknown”. I can see three main processes that would lead people to choose this “unknown” label for a case:

  1. Image is just weird, sitting very far from prototype of any character A to Z.
  2. Image sits midway between prototypes of two particular characters in A to Z.
  3. Image closely matches prototype of one of the weird added characters, not in A to Z.

If we use a stat analysis that formally models this process, we might be able to take enough of this labeling data and then figure out whether in fact weird characters have been added to the data set of images, and to roughly describe their features.

You’d want to test this method, and see how well it could pick out weird characters and their features. But once it works at least minimally for character images, or some other simple problem, we could then try to do the same for UFO reports. That is, we could model the “unidentified” cases in that data as a combination of weird cases, midway cases, and cases that cluster around new prototypes, which we could then roughly describe. We could then compare the rough descriptions of these new classes to popular but radical UFO explanations, such as aliens or secret military projects.

More formally, assume we have a space of class models, parameterized by A, models that predict the likelihood P(X|A) that a data case X would arise from that class. Then given a set of classes C, each with parameters A_c and a class weight w_c, we could for any case X produce a vector of likelihoods p_c = w_c*P(X|A_c), one for each class c in C. A person might tend more to assign the known label L when the value of p_L was high, relative to the other p_c. And if a subset U of classes C were unknown, people might tend more to assign the label “unknown” when either:

  1. even the highest p_c was relatively low,
  2. the top two p_c had nearly equal values, or
  3. the highest p_c belonged to an unknown class, with c in U.

Using this model of how the label “unknown” is chosen, then given a data set of labeled cases X, including the unknown label, we could find the best parameters w_c and A_c (and any parameters of the labeling process) to fit this dataset. When fitting such a model to data, one could try adding new unknown classes, not included in the initial set of labels L. And in this way find out if this data supports the idea of new unknown classes U, and with what parameters.
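To make the labeling rule concrete, here is a minimal sketch (made-up 2D Gaussian classes, arbitrary thresholds, and no fitting step) of the three processes by which a case gets the “unknown” label.

```python
import numpy as np

# Made-up 2D feature space: three known classes plus one "weird" unknown class,
# each a unit-covariance Gaussian around its prototype.
known_means = [np.array([0., 0.]), np.array([4., 0.]), np.array([0., 4.])]
unknown_means = [np.array([8., 8.])]
means = known_means + unknown_means
weights = np.array([0.3, 0.3, 0.3, 0.1])   # class weights w_c
n_known = len(known_means)

def gauss_pdf(x, mean):
    """Unit-covariance 2D Gaussian density."""
    diff = x - mean
    return np.exp(-0.5 * diff @ diff) / (2 * np.pi)

def label(x, low_thresh=0.001, ambiguity=1.2):
    """Apply the three 'unknown' processes: weird case, midway case, unknown-class case."""
    p = weights * np.array([gauss_pdf(x, m) for m in means])   # p_c = w_c * P(X|A_c)
    order = np.argsort(p)[::-1]
    best, second = order[0], order[1]
    if p[best] < low_thresh:                # 1. far from every prototype
        return "unknown"
    if p[best] < ambiguity * p[second]:     # 2. midway between two prototypes
        return "unknown"
    if best >= n_known:                     # 3. best match is an unknown class
        return "unknown"
    return f"class_{best}"

# Example cases: near a known prototype, midway between two, and near the weird class.
for x in [np.array([0.2, -0.1]), np.array([2.0, 0.0]), np.array([8.2, 7.9])]:
    print(x, "->", label(x))
```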

For UFO reports, the first question is whether the first two processes for producing “unknown” labels seems sufficient to explain the data, or if we need to add a process associated with new classes. And if we need new classes, I’d be interested to see if there is a class fitting the “military prototype” theory, where events happened more near to military bases, more at days and times when those folks tend to work, with more intelligent response, more noise and less making nearby equipment malfunction, and impressive but not crazy extreme speeds and accelerations that increase over time with research abilities. And I’d be especially interested to see if there is a class fitting the “alien” theory, with more crazy extreme speeds and accelerations, enormous sizes, nearby malfunctions, total silence, apparent remarkable knowledge, etc.

Added 9a: Of course the quality of such a stat analysis will depend greatly on the quality of the representations of data X. Poor low-level representations of characters, or of UFO reports, aren’t likely to reveal much interesting or deep. So it is worth trying hard to process UFO reports to create good high level representations of their features.

Added 28May: If there is a variable of importance or visibility of an event, one might also want to model censoring of unimportant hard-to-see events. Perhaps also include censoring near events that authorities want to keep hidden.


Constant Elasticity Prevention

While many engaged Analysis #1 in my last post, only one engaged Analysis #2. So let me try again, this time with a graph.

This is about a simple model of prevention, one that assumes a constant elasticity (= power law) between harm and prevention effort. An elasticity of 1 means that 1% more effort cuts harm by 1%. For an elasticity of 2, then 1% more effort cuts harm by 2%, while for an elasticity of 0.5, 1% more effort cuts harm by 0.5%.

Such simple “reduced form” models are common in many fields, including economics. Yes of course the real situation is far more complex than this. Even so, reduced forms are typically decent approximations for at least small variations around a reference policy. As with all models, they are wrong, but can be useful.

Each line in the following graph shows how total loss, i.e., the sum of harm and prevention effort, varies with the fraction of that loss coming from prevention. The different lines are for different elasticities, and the big dots which match the color of their lines show the optimum choice on each line to min total loss. (The lines all intersect at prevention = 1/20, harm = 20.)

As you can see, for min total loss you want to be on a line with higher elasticity, where prevention effort is more effective at cutting harm. And the more effective is prevention effort, then the more effort you want to put in, which will result in a larger fraction of the total harm coming from prevention effort.

So if locks are very effective at preventing theft, you may well pay a lot more for locks than you ever suffer on average in theft. And in the US today, the elasticity of crime with respect to spending on police is ~0.3, explaining why we suffer ~3x more losses from crime than we spend on police to prevent crime.
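Here is a small numeric sketch of this reduced-form model (arbitrary scale constant): harm falls as a power law in prevention effort, and at the loss-minimizing effort level the ratio of prevention effort to remaining harm equals the elasticity, which is the optimum condition used in the addenda below.

```python
import numpy as np

def total_loss(effort, elasticity, k=1.0):
    """Harm falls as a power law in prevention effort: harm = k * effort**(-elasticity)."""
    return k * effort**(-elasticity) + effort

# For each elasticity, find the effort that minimizes total loss (grid search),
# and check that at the optimum, effort/harm equals the elasticity.
effort = np.linspace(0.01, 10, 1_000_000)
for elasticity in [0.3, 0.5, 1.0, 2.0]:
    losses = total_loss(effort, elasticity)
    e_opt = effort[losses.argmin()]
    harm_opt = e_opt**(-elasticity)
    print(f"elasticity {elasticity}: optimal effort/harm ratio ~ {e_opt / harm_opt:.2f}")
```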

Recently, I asked a few polls on using lockdown duration as a way to prevent pandemic deaths. In these polls, I asked directly for estimates of elasticity, and in this poll, I asked for estimates of the ratio of prevention to health harm loss. And here I asked if the ratio is above one.

In the above graph there is a red dot on the 0.5 elasticity line. In the polls, 56% estimate that our position will be somewhere to the right of the red dot on the graph, while 58% estimate that we will be somewhere above that grey 0.5 elasticity line (with less elasticity). Which means they expect us to do too much lockdown.

Fortunately, the loss at that red dot is “only” 26% higher than at the min of the grey line. So if this pandemic hurts the US by ~$4T, the median poll respondent expects “only” an extra $1T lost due to extra lockdown. Whew.

Added 26May: Follow-up surveys on US find (via lognormal fit) median effort to harm ratio of 3.6, median elasticity of 0.23. For optimum these should be equal – so far more than optimal lockdown!

Added 1Aug: Repeating same questions now gives median effort to harm ratio of 4.0, median elasticity of 0.18. That is, they see the situation as even worse than they saw it before.


Beware R0 Variance

The big push now re Covid19 is to use “social distancing” to cut “R0”, the rate at which infection spreads. More precisely, R0 is the average number of other people that one infected person would infect, if they were not already infected. With no efforts to reduce it, estimates for natural R0 range from 2 to 15, with a best estimate perhaps around 4. The big goal is to get this number below 1, so that the pandemic is “suppressed” and goes away, and stays away, until a vaccine or other strong treatment, allowing most to escape infection. In contrast, if R0 stays above 1 we might “flatten the curve”, so that each infected person can get more medical resources when they are sick, but soon most everyone gets infected.

Apparently even with current “lockdown” efforts, all of 11 European nations studied now still have best estimate R0 over 2, with a median ~3.7. So they must do a lot more if they are to suppress. But how much more? My message in this post is that it is far from enough to push median R0 down below 1; one must also push down its variance.

Imagine a population composed of different relatively-isolated subpopulations, each with a different value of R0. Assume that few are infected, so that subpopulation pandemic growth rates are basically just R0. Assume also that these different R0 are distributed log-normally, i.e., the logarithm of R0 has a Gaussian distribution across subpopulations. This is (correctly) the usual distribution assumption for parameters bounded by zero below, as usually many small factors multiply together to set such parameters. The total effective R0 for the whole population is then found simply by integrating (via a lognormal) the effective growth over R0 subpopulations.

For example, assume that the R0 lognormal distribution has log mean (mu) -2 and sigma 1. Here the mode of the distribution, i.e., the most common R0 number, is 0.05, the median R0 is 0.14, only 5% of subpopulations have R0 above 0.70, and only 2% have R0 >1. Even so, if each of these subpopulations maintain their differing R0 over ten infection iterations, the mean growth factor R0 of the whole population is 20 per iteration!

As another example (for log mean -1, sigma 0.5), the R0 mode is 0.29, the median is 0.37, only 5% of subpopulations have an R0 over 0.85, only 2% have R0>1. Yet over ten infection iterations maintaining these same R0 factors per subpopulation, the mean growth factor R0 of the whole population is 1.28 per iteration. That is, the pandemic grows.
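Here is a sketch that reproduces these two examples by integrating growth over the lognormal R0 distribution, as described above; the closed form for the per-iteration effective R0 over N iterations is exp(mu + N*sigma^2/2).

```python
import math
import numpy as np

def effective_R0(mu, sigma, iterations=10):
    """Per-iteration growth factor for a population whose subpopulations each keep
    their own lognormal(mu, sigma) R0 for the given number of iterations, found by
    integrating total growth R0**iterations over the lognormal distribution.
    Closed form: exp(mu + iterations * sigma**2 / 2)."""
    N = iterations
    # Integrate E[R0**N] over y = ln(R0) ~ Normal(mu, sigma).
    y = np.linspace(mu - 8*sigma, mu + N*sigma**2 + 8*sigma, 400_001)
    pdf = np.exp(-(y - mu)**2 / (2*sigma**2)) / (sigma * math.sqrt(2*math.pi))
    mean_growth = (np.exp(N*y) * pdf).sum() * (y[1] - y[0])
    return mean_growth**(1/N)

for mu, sigma in [(-2.0, 1.0), (-1.0, 0.5)]:
    median = math.exp(mu)
    frac_R0_above_1 = 1 - 0.5*(1 + math.erf((0 - mu) / (sigma*math.sqrt(2))))
    print(f"log-mean {mu}, sigma {sigma}: median R0 {median:.2f}, "
          f"share of subpopulations with R0>1 {frac_R0_above_1:.1%}, "
          f"effective R0 per iteration ~ {effective_R0(mu, sigma):.2f}")
```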

Of course these growth numbers eventually don’t apply to finite subpopulations, once most everyone in them gets infected. Because when most of a population is infected, then R0 no longer sets pandemic growth rates. And if these subpopulations were completely isolated from each other, then all of the subpopulations with R0<1 would succeed in suppressing. However, with even a modest amount of interaction among these populations, the highly infected ones will infect the rest.

The following graph tells a somewhat more general story. On the x-axis I vary the median value of R0 among the subpopulations, which sets the log-mean. For each such value, I searched for the log-sigma of the lognormal R0 distribution that makes the total average R0 for the whole population (over ten iterations) exactly equal to 1, so that the pandemic neither grows nor shrinks. Then on the graph I show the standard deviation, in R0 terms, that this requires, and the fraction of subpopulations that grow via R0>1.

As you can see, we consistently need an R0 standard deviation less than 0.21, and the lower the median R0, the lower a fraction of subpopulations with R0>1 we can tolerate.

So, as long as there is substantial mixing in the world, or within a nation, it is far from enough to get the R0 for the median subpopulation below 1. You also need to greatly reduce the variation, especially the fraction of subpopulations in which the pandemic grows via R0>1. For example, when the median R0 is 0.5, you can tolerate less than 3% of subpopulations having an R0>1, just to hold the pandemic at a constant overall level. And to suppress in limited time, you need to go a lot further.

Different subpopulations with differing R0 seem plausible not just because our world has different nations, classes, cultures, professions, industries, etc., but because Covid19 policy has mostly been made at relatively local levels, varying greatly even within nations. In addition, most things that seem log-normally distributed actually have thicker-than-lognormal tails, which makes this whole problem worse.

All of which is to say that suppressing a pandemic like this, with high R0 and many asymptomatic infected, after it has escaped its initial size and region, is very hard. Which is also to say, we probably won’t succeed. Which is to say: we need to set up a Plan B, such as variolation.

Spreadsheet for all this here.


Beware Multi-Monopolies

Back in 1948, the Supreme Court ordered Paramount, Metro-Goldwyn-Mayer and other movie studios to divest themselves of their theater chains, ruling that the practice of giving their own theaters preference on the best movies amounted to illegal restraint of trade.

In 1962, MCA, then the most powerful force in Hollywood as both a talent agency and producer of TV shows, was forced to spin off its talent agency after the Justice Department concluded that the combination gave it unfair advantage in both markets.

And in 1970, the Federal Communications Commission prohibited the broadcast networks — ABC, CBS and NBC — from owning or producing programming aired during prime time, ushering in a new golden era of independent production.

In recent decades, however, because of new technology and the government’s willful neglect of the antitrust laws, most of those prohibitions have fallen by the wayside. (more)

My last post talked about how our standard economic models of firms competing in industries typically show industries having too many, not too few, firms. It is a suspicious and damning fact that economists and policy makers have allowed themselves and the public to gain the opposite impression, that our best theories support interventions to cut industry concentration.

My last post didn’t mention the most extreme example of this, the case where we have the strongest theory reason to expect insufficient concentration:

  • Multi-Monopoly: There’s a linear demand curve for a product that customers must assemble for themselves via buying components separately from multiple monopolists. Each monopolist must pay a fixed cost and a constant marginal cost per component sold. Monopolists simultaneously set their prices, and the sum of these prices is intersected with the demand curve to get a quantity, which becomes the quantity that each firm sells.

The coordination failure among these firms is severe. It produces a much lower quantity and welfare than would result if all these firms were merged into a single monopolist who sold a single merged product. So in this case the equilibrium industry concentration is far too low.

This problem continues, though to a lesser extent, even when each of these monopolists is replaced by a small set of firms, each of whom faces the same costs, firms who compete to sell that component. This is because the problem arises due to firms having sufficient market power to influence their prices.
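Here is a minimal numeric sketch of the two-monopolist case (linear demand, zero marginal costs, fixed costs left out, and a made-up demand intercept): the separately-priced components yield a lower quantity and lower total welfare than a single merged monopolist.

```python
# Linear demand for the assembled product: Q = a - (sum of component prices).
# Zero marginal costs and no fixed costs, for simplicity (made-up parameters).
a = 12.0

# Two separate component monopolists, each best-responding to the other's price:
# firm i maximizes p_i * (a - p_i - p_j), giving p_i = (a - p_j) / 2.
p1 = p2 = a / 3              # symmetric Nash equilibrium of the pricing game
Q_sep = a - (p1 + p2)
profit_sep = (p1 + p2) * Q_sep
cs_sep = Q_sep**2 / 2        # consumer surplus under linear demand
welfare_sep = profit_sep + cs_sep

# A single merged monopolist selling the assembled product maximizes P * (a - P),
# giving P = a / 2.
P = a / 2
Q_merged = a - P
welfare_merged = P * Q_merged + Q_merged**2 / 2

print(f"separate: quantity {Q_sep:.1f}, welfare {welfare_sep:.1f}")
print(f"merged:   quantity {Q_merged:.1f}, welfare {welfare_merged:.1f}")
```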

For example, this multi-monopoly problem shows up when many towns along a river each separately set the tax they charge for boats to travel down that river. Or when, to get a functioning computer, you must buy both a processing chip and an operating system from separate firms like Intel and Microsoft.

Or when you must buy a movie or TV experience from (1) an agent who makes actors available, (2) a studio who puts those actors together into a performance, and (3) a theatre or broadcast network who finally shows it to you. When these 3 parties separately set their prices for these three parts, you have a 3-way monopoly (or strong market power) problem.

This last example is why the quote above by Steven Pearlstein is so sad. He calls for anti-trust authorities to repeat some of their biggest ever mistakes: breaking monopolies into multi-monopolies. And alas, our economic and policy authorities fail to make clear just how big a mistake this is. In most industrial organization classes, both grad and undergrad, you will never even hear about this problem.


What’s So Bad About Concentration?

Practical men, who believe themselves to be quite exempt from any intellectual influences, are usually slaves of some defunct economist. (Keynes)

Many have recently said 1) US industries have become more concentrated lately, 2) this is a bad thing, and 3) inadequate antitrust enforcement is in part to blame. (See many related MR posts.)

I’m teaching grad Industrial Organization again this fall, and in that class I go through many standard simple (game-theoretic) math models of firms competing within industries. And it occurs to me to mention that when these models allow “free entry”, i.e., when the number of firms is set by the constraint that they must all expect to make non-negative profits, then such models consistently predict that too many firms enter, not too few. These models suggest that we should worry more about insufficient, not excess, concentration.

Two examples:

  • “Cournot” Quantity Competition: Firms pay (the same) fixed cost to enter an industry, and (the same) constant marginal cost to make products there. Knowing the number of firms, each firm simultaneously picks the quantity it will produce. The sum of these quantities is intersected with a linear demand curve to set the price they will all be paid for their products.
  • “Circular City” Differentiated Products: Customers are uniformly distributed, and firms are equally distributed, around a circle. Firms pay (the same) fixed cost to enter, and (the same) constant marginal cost to serve each customer. Each firm simultaneously sets its price, and then each customer chooses the firm from which it will buy one unit. This customer must pay not only that firm’s price, but also a “delivery cost” proportional to its distance to that firm.
  • [I also give a Multi-Monopoly example in my next post.]

In both of these cases, when non-negative profit is used to set the number of firms, that number turns out to be higher than the number that maximizes total welfare (i.e., consumer value minus production cost). This is true not only for these specific models I’ve just described, but also for most simple variations that I’ve come across. For example, quantity competition might have increasing marginal costs, or a sequential choice of firm quantity. Differentiated products might have a quadratic delivery cost, allow price discrimination by consumer location, or have firms partially pay for delivery costs.
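As a sketch of the Cournot free-entry case (made-up demand and cost parameters): the number of firms that enter until profits hit zero exceeds the number that maximizes total welfare.

```python
import numpy as np

# Made-up parameters: inverse demand P = a - b*Q, marginal cost c, fixed entry cost F.
a, b, c, F = 10.0, 1.0, 2.0, 1.0

def cournot(n):
    """Symmetric Cournot equilibrium with n firms: per-firm profit and total welfare."""
    q = (a - c) / (b * (n + 1))          # each firm's equilibrium quantity
    Q = n * q
    price = a - b * Q
    profit = (price - c) * q - F
    consumer_surplus = b * Q**2 / 2      # triangle under linear demand above the price
    welfare = consumer_surplus + n * profit
    return profit, welfare

ns = np.arange(1, 30)
profits, welfares = np.array([cournot(n) for n in ns]).T

n_free_entry = ns[profits >= 0].max()    # firms enter while expected profit is non-negative
n_optimal = ns[welfares.argmax()]        # number of firms that maximizes total welfare

print(f"free-entry number of firms: {n_free_entry}")
print(f"welfare-maximizing number:  {n_optimal}")
```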

Furthermore, we have a decent general account that explains this general pattern. It is a lot like how there is typically overfishing if new boats enter a fishing area whenever they expect a non-negative profit per boat; each boat ignores the harm it does to other boats by entering. Similarly, firms who enter an industry neglect the costs they impose on other firms already in that industry.

Yes, I do know of models that predict too few firms entering each industry. For example, a model might assume that all the firms who enter an industry go to war with each other via an all-pay auction. The winning firm is the one who paid the most, and gains the option to destroy any other firm. Only one firm remains in the industry, and that is usually too few. However, such models seem more like special cases designed to produce this effect, not typical cases in the space of models.

I’m also not claiming that firms would always set efficient prices. For example, a sufficiently well-informed regulator might be able to improve welfare by lowering the price set by a monopolist. But that’s about the efficiency of prices, not of the number of firms. You can’t say there’s too much concentration even with a monopolist unless the industry would actually be better with more than one firm.

Of course the world is complex and space of possible models is vast. Even so, it does look like the more natural result for the most obvious models is insufficient concentration. That doesn’t prove that this is in fact the typical case in the real world, but it does at least raise a legitimate question: what theory model do people have in mind when they suggest that we now have too much industry concentration? What are they thinking? Can anyone explain?

Added 11a: People sometimes say the cause of excess concentration is “barriers to entry”. The wikipedia page on the concept notes that most specific things “cited as barriers to entry … don’t fit all the commonly cited definitions of a barrier to entry.” These include economies of scale, cost advantages, network effects, regulations, ads, customer loyalty, research, inelastic demand, vertical integration, occupational licensing, mergers, and predatory pricing. Including these factors in models does not typically predict excess concentration.

That wiki page does list some specific factors as fitting “all the common definitions of primary economic barriers to entry.” These include IP, zoning, agreements with distributors and suppliers, customers switching costs, and taxes. But I say that models which include such factors also do not consistently predict excess firm concentration. And I still want to know which of these factors complainers have in mind as the source of the recent increased US concentration problem that they see.

Added 7Sep: Many have in mind the idea that regulations impose fixed costs that are easier on larger firms. But let us always agree that it would be good to lower costs. Fixed costs are real costs, and can’t be just assumed away. If you know a feasible way to actually lower such costs, great let’s do that, but that’s not about excess concentration, that’s about excess costs.


Non-Conformist Influence

Here is a simple model that suggests that non-conformists can have more influence than conformists.

Regarding a one dimensional choice x, let each person i take a public position x_i, and let the perceived mean social consensus be m = Σ_i w_i x_i, where w_i is the weight that person i gets in the consensus. In choosing their public position x_i, person i cares about getting close to both their personal ideal point a_i and to the consensus m, via the utility function

U_i(x_i) = -c_i*(x_i - a_i)^2 - (1-c_i)*(x_i - m)^2.

Here c_i is person i’s non-conformity, i.e., their willingness to have their public position reflect their personal ideal point, relative to the social consensus. When each person simultaneously chooses their x_i while knowing all of the a_i, w_i, c_i, the (Nash) equilibrium consensus is

m = [ Σ_i w_i*c_i*a_i / (c_i + (1-c_i)*(1-w_i)) ] * [ 1 - Σ_j w_j*(1-c_j)*(1-w_j) / (c_j + (1-c_j)*(1-w_j)) ]^(-1)

If each w_i << 1, then the weight that each person’s ideal point a_i gets in the consensus is roughly proportional to w_i*c_i, i.e., to their non-conformity c_i times their weight w_i. So all else equal, non-conformists have more influence over the consensus.
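A quick numeric sketch (random made-up population) that checks the equilibrium formula by iterating best responses, and shows the consensus is close to a w_i*c_i-weighted average of ideal points when each w_i is small.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up population: ideal points a_i, consensus weights w_i (summing to 1),
# and non-conformity levels c_i in (0, 1).
n = 200
a = rng.normal(size=n)
w = np.full(n, 1/n)
c = rng.uniform(0.05, 0.95, size=n)

# Iterate best responses x_i = [c_i a_i + (1-c_i)(1-w_i) m] / [c_i + (1-c_i)(1-w_i)]
# until the consensus m = sum_i w_i x_i converges.
x = a.copy()
for _ in range(1000):
    m = w @ x
    x = (c*a + (1 - c)*(1 - w)*m) / (c + (1 - c)*(1 - w))

# Closed-form consensus from the formula above.
D = c + (1 - c)*(1 - w)
m_formula = (w*c*a/D).sum() / (1 - (w*(1 - c)*(1 - w)/D).sum())
print(f"iterated m = {w @ x:.6f}, formula m = {m_formula:.6f}")

# With small w_i, the effective weight on a_i is roughly proportional to w_i * c_i.
m_approx = (w*c*a).sum() / (w*c).sum()
print(f"w_i*c_i-weighted approximation: {m_approx:.6f}")
```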

Now it is possible that others will reduce the weight w_i that they give the non-conformists with high c_i in the consensus. But this is hard when c_i is hard to observe, and as long as this reduction is not fully (or more than fully) proportional to their increased non-conformity, non-conformists continue to have more influence.

It is also possible that extremists, who pick x_i that deviate more from that of others, will be directly down-weighted. (This happens in the weights w_i = k/|x_i - x_m| that produce a median x_m, for example.) This makes more sense in the more plausible situation where x_i, w_i are observable but a_i, c_i are not. In this case, it is the moderate non-conformists, who happen to agree more with others, who have the most influence.

Note that there is already a sense in which, holding constant their weight w_i, an extremist has a disproportionate influence on the mean: a 10 percent change in the quantity x_i - m changes the consensus mean m twice as much when that quantity x_i - m is twice as large.


High Dimensional Societies?

I’ve seen many “spatial” models in social science. Such as models where voters and politicians sit at points in a space of policies. Or where customers and firms sit at points in a space of products. But I’ve never seen a discussion of how one should expect such models to change in high dimensions, such as when there are more dimensions than points.

In small dimensional spaces, the distances between points vary greatly; neighboring points are much closer to each other than are distant points. However, in high dimensional spaces, distances between points vary much less; all points are about the same distance from all other points. When points are distributed randomly, however, these distances do vary somewhat, allowing us to define the few points closest to each point as that point’s “neighbors”. “Hubs” are closest neighbors to many more points than average, while “anti-hubs” are closest neighbors to many fewer points than average. It turns out that in higher dimensions a larger fraction of points are hubs and anti-hubs (Zimek et al. 2012).
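A quick simulation sketch of this effect (i.i.d. Gaussian points, with arbitrary sample size and neighbor count k): as dimension grows, the counts of how often each point appears among others’ nearest neighbors become more skewed, with more extreme hubs and more anti-hubs.

```python
import numpy as np

rng = np.random.default_rng(3)

def knn_counts(points, k=10):
    """For each point, count how many other points list it among their k nearest neighbors."""
    sq = (points**2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * points @ points.T   # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                             # exclude self-neighbors
    nn = np.argsort(d2, axis=1)[:, :k]                       # each row: that point's k nearest neighbors
    return np.bincount(nn.ravel(), minlength=len(points))

n = 500
for dim in [2, 10, 100]:
    counts = knn_counts(rng.normal(size=(n, dim)))
    hubs = (counts > 2*counts.mean()).mean()   # share of points that are strong hubs
    anti = (counts == 0).mean()                # share that are nobody's near neighbor
    print(f"dim {dim:3d}: hub share {hubs:.2%}, anti-hub share {anti:.2%}, "
          f"max neighbor count {counts.max()}")
```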

If we think of people or organizations as such points, is being a hub or anti-hub associated with any distinct social behavior? Does it contribute substantially to being popular or unpopular? Or does the fact that real people and organizations are in fact distributed in real space overwhelm such things, which only happen in a truly high dimensional social world?


Chip Away At Hard Problems

Catherine: And your own research.
Harold: Such as it is.
C: What’s wrong with it?
H: The big ideas aren’t there.
C: Well, it’s not about big ideas. It’s… It’s work. You got to chip away at a problem.
H: That’s not what your dad did.
C: I think it was, in a way. I mean, he’d attack a problem from the side, you know, from some weird angle. Sneak up on it, grind away at it.
(Lines from movie Proof; Catherine is a famous mathematician’s daughter.)

In math, plausibility arguments don’t count for much; proofs are required. So math folks have little choice but to chip away at hard problems, seeking weird angles where indirect progress may be possible.

Outside of math, however, we usually have many possible methods of study and analysis. And a key tradeoff in our methods is between ease and directness on the one hand, and robustness and rigor on the other. At one extreme, you can just ask your intuition to quickly form a judgement that’s directly on topic. At the other extreme, you can try to prove math theorems. In between these extremes, informal conversation is more direct, while statistical inference is more rigorous.

When you need to make an immediate decision fast, direct easy methods look great. But when many varied people want to share an analysis process over a longer time period, more robust rigorous methods start to look better. Easy direct methods tend to be more uncertain and context dependent, and so don’t aggregate as well. Distant others find it harder to understand your claims and reasoning, and to judge their reliability. So distant others tend more to redo such analysis themselves rather than building on your analysis.

One of the most common ways that wannabe academics fail is by failing to sufficiently focus on a few topics of interest to academia. Many of them become amateur intellectuals, people who think and write more as a hobby, and less to gain professional rewards via institutions like academia, media, and business. Such amateurs are often just as smart and hard-working as professionals, and they can more directly address the topics that interest them. Professionals, in contrast, must specialize more, have less freedom to pick topics, and must try harder to impress others, which encourages the use of more difficult robust/rigorous methods.

You might think their added freedom would result in amateurs contributing proportionally more to intellectual progress, but in fact they contribute less. Yes, amateurs can and do make more initial progress when new topics arise suddenly far from topics where established expert institutions have specialized. But then over time amateurs blow their lead by focusing less and relying on easier more direct methods. They rely more on informal conversation as analysis method, they prefer personal connections over open competitions in choosing people, and they rely more on a perceived consensus among a smaller group of fellow enthusiasts. As a result, their contributions just don’t appeal as widely or as long.

I must admit that compared to most academics near me, I’ve leaned more toward amateur styles. That is, I’ve used my own judgement more on topics, and I’ve been willing to use less formal methods. I clearly see the optimum as somewhere between the typical amateur and academic styles. But even so, I’m very conscious of trying to avoid typical amateur errors.

So instead of just trying to directly address what seem the most important topics, I instead look for weird angles to contribute less directly via more reliable/robust methods. I have great patience for revisiting the few biggest questions, not to see who agrees with me, but to search for new angles at which one might chip away.

I want each thing I say to be relatively clear, and so understandable from a wide range of cultural and intellectual contexts, and to be either a pretty obvious no-brainer, or based on a transparent easy to explain argument. This is partly why I try to avoid arguing values. Even so, I expect that the most likely reason I will fail is that I’ve allowed myself to move too far in the amateur direction.
