# Tag Archives: Math

## Is Nothing Sacred?

“Is nothing sacred?” is a phrase used to express shock when something you think is valuable or important is being changed or harmed (more)

Human groups often unite by agreeing on what to treat as “sacred”. While we don’t all agree on what is sacred, or how sacred, almost all of us treat some things as pretty sacred. Sacred things are especially valuable, sharply distinguished, and idealized, so they show less decay, messiness, inhomogeneity, or internal conflict.

We are not to mix the sacred (S) with the non-sacred (NS), nor to trade S for NS. Thus S should not have clear measures or money prices, and we shouldn’t enforce rules that promote NS at S’s expense. We are to desire S “for itself”, understand S intuitively not cognitively, and not choose S based on explicit calculation or analysis. We didn’t make S; S made us. We are to trust “priests” of S, give them more self-rule and job tenure, and their differences from us don’t count as “inequality”. Objects, spaces, and times can become S by association.

Treating things as sacred will tend to bias our thinking when such things do not actually have all these features, or when our values regarding them don’t actually justify all these sacred valuing rules. Yes, the benefits we get from uniting into groups might justify paying the costs of this bias. But even so, we might wonder if there are cheaper ways to gain such benefits. In particular, we might wonder if we could change what things we see as sacred, so as to reduce these biases. Asked another way: is there anything that is in fact naturally sacred, so that treating it as such induces the least bias?

Yes, I think so. And that thing is: math. We do not create math; we find it, and it describes us. Math objects are in fact quite idealized and immortal, mostly lacking internal messy inhomogeneities. Yes, proofs can have messy details, but their assumptions and conclusions are much simpler. Math concepts don’t even suffer from the cultural context-dependence or long-term conceptual drift suffered by most abstract language concepts.

We can draw clear lines distinguishing math vs. non-math objects. Usually no one can own math, avoiding the vulgarity of associated prices. And while we think about math cognitively, the value we put on any piece of math, or on math as a whole, tends to come intuitively, even reverently, not via calculation.

Compared to other areas, math seems to sit at an extreme of ease of evaluating abilities and contributions, and thus math can suppress factionalism and corruption in such evaluations. This helps us use math to judge mental ability, care, and clarity, especially in the young. So we use math tests to sort and assign prestige early in life.

As math is so prestigious and so reliable to evaluate, we can just let math priests tell us who is good at math, and then use that to choose whom to hire to do math. We can thus avoid using vulgar outcome-based forms of payment to compensate math workers. It doesn’t work so badly to give math priests self-rule and long job tenures. Furthermore, so many want to be math priests that their market wages are low, making math inequality feel less offensive.

The main way math doesn’t fit the sacred template is that today treating math as sacred doesn’t much help us unite some groups in contrast to other groups. That did happen long ago (e.g., among the ancient Greeks). But I don’t at all mind this aspect of math today.

The main bias I see is that treating math as sacred induces us to treat it as more valuable than it actually is. Many academic fields, for example, put way too high a priority on math models of their topics. Which distracts from actually learning about what is important. But, hey, at least math does in fact have a lot of uses, such as in engineering and finance. Math was even crucial to great advances in many areas of science.

Yes, many over-estimate math’s contributions. But even so, I can’t think of something else that is in fact more naturally “sacred” than math. If we all in fact have a deep need to treat some things as sacred, this seems a least biased target. If something must be sacred, let it be math.


## Fading Past Blocks Simulation Argument

The simulation argument was famously elaborated by Nick Bostrom. The idea is that our descendants may be able to create simulated creatures like you, and put them in simulated environments that look like the one you now find yourself in. If so, you can’t be sure that you are not now one of these future simulated people. The chance that you should assign to this possibility depends on the number of such future creatures, relative to the number of real creatures like you today.

More precisely, let P be the fraction of descendant civs that become able to create these ancestor simulations, I the fraction of these that actually do so, N the average number of ancestors simulated by each such civ per ancestor who once existed, and S the chance that you are now such an ancestor sim. Bostrom says that S = P*I*N/(P*I*N+1), and that N is very large, which implies that either P or I is very small, or that S is near 1. That is, if the future will simulate many ancestors, then you almost surely are one.

However, I will now show that this argument collapses if we allow the inclination to simulate ancestors to depend on the time duration that has elapsed between those ancestors and the descendants who might simulate them. My main claim is that our interest in the past generally seems to fall away with time faster than the rate at which the population grows with time. For example, while over the last century world population has doubled roughly every 40 to 60 years, this graph shows much faster declines in how often books mention each of these specific past years: 1880, 1900, 1920, 1940, 1960.

Let us now include this fading past effect in a simple formal model. Let t denote a cultural “time” (not necessarily clock time), relative to which population (really a density of observer-moments) grows exponentially forever, while interest in the past declines exponentially. More formally, assume that it is already possible to create ancestor sims, that population grows as e^(g*t), that a constant fraction a of this population is turned into simulated ancestors, and that the relative fraction of these simulated ancestors associated with simulating a time t units into the past goes as e^(-b*g*t). Thus for b>1 per-person interest in past people falls as e^(-(b-1)*g*t).

Given these assumptions, the ratio of future ancestor simulations of the current population to that actual current population is F = a*b/(b-1), and S = F/(F+1). So, for example, if at any one time 10% of people are ancestor simulations, and if interest in the past falls by 12% every time population rises by 10%, then a = 0.1, b = 1.2, and F = 0.6, giving each person who seems to be real an S = 3/8 chance of instead being an ancestor simulation. If a = 0.001 instead, then F = 0.006, and each person should estimate an S ≈ 0.6% chance of being an ancestor simulation.

The above assumed that ancestor sims are possible and are being done now. If instead sims can’t start being created until c time units in the future, then we instead have F = a*(b/(b-1))*e^(-(b-1)*g*c), giving an even smaller chance S of your being an ancestor simulation.
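These formulas are easy to check numerically. Here is a minimal sketch (the function name and defaults are mine, not from the post):

```python
import math

def sim_chance(a, b, g=0.0, c=0.0):
    """Chance S that you are an ancestor simulation, where a is the
    fraction of people who are sims at any time, b is the ratio of the
    interest-decay rate to the population-growth rate g, and c is the
    delay until sims can first be created."""
    F = a * (b / (b - 1.0)) * math.exp(-(b - 1.0) * g * c)
    return F / (F + 1.0)

print(sim_chance(0.1, 1.2))    # the 10% example: F = 0.6, so S = 3/8
print(sim_chance(0.001, 1.2))  # the 0.1% example: S ≈ 0.6%
```

With c > 0 the delay factor e^(-(b-1)*g*c) only shrinks S further.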

By the way, these calculations can also be done in terms of rank. If all people in history are ordered in time, with r=0 being the first person ever, and all others having r>0, then we could assume that a fraction a of people are always ancestor simulations, and that interest in past people falls as r^(-b), and we’d again get the same result F = a*b/(b-1).

Thus given the realistic tendency to have less interest in past people the further away they are in time, and the likely small fraction of future economies that could plausibly be devoted to simulating ancestors, I feel comfortable telling you: you are most likely not an ancestor simulation.

Added 9pm: See this more careful analysis by Anders Sandberg of falling interest in year names. It seems to me that the fall in interest is in fact faster than the population growth rate, even a century after the date.


## How Far To Grabby Aliens? Part 2.

In my last post, I recommended these assumptions:

1. It is worth knowing how far away are grabby alien civs (GCs), even if that doesn’t tell us about other alien types.
2. Try-try parts of the great filter alone make it unlikely for any one small volume to birth a GC in 14 billion years.
3. We can roughly estimate GC expansion speed, and the number of hard try-try steps in the great filter.
4. Earth is not now within the sphere of control of a GC.
5. Earth is at risk of birthing a GC soon, making today’s date a sample from the GC origin time distribution.

I tried to explain how these assumptions can allow us to estimate how far away are GCs. And I promised to give more math details in my next post. This is that next post.

First, I promised to elaborate on how well t^n works as the chance that a small volume will birth a GC at time t. The simplest model is that eternal oases like Earth are all born at some t=0, and last forever. Each oasis must pass through a great filter, i.e., a sequence of hard steps, from simple dead matter to simple life to complex life, etc., ending at a GC birth. For each hard step, there’s a (different) constant chance per unit time to make it to the next step, a chance so low that the expected time for each step is much greater than t.

In this case, the chance of GC birth per unit time in a small volume is t^n, with n = h-1, where h is the number of hard steps. If there are many oases in a small volume with varying difficulty, their chances still add up to the same t^n dependence, as long as they all have the same number of hard steps between dead matter and a GC.
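This t^(h-1) rate dependence can be checked against the Erlang distribution, i.e., the sum of h exponential step times; the numbers below are illustrative, not from the post:

```python
import math

def erlang_cdf(h, lam, t):
    """Chance that h sequential steps, each with constant hazard rate
    lam per unit time, have all completed by time t."""
    partial = sum((lam * t) ** k / math.factorial(k) for k in range(h))
    return 1.0 - math.exp(-lam * t) * partial

# Hard steps: the expected step time 1/lam vastly exceeds the window t,
# so the cumulative chance goes as t^h, and the rate as t^(h-1).
h, lam = 3, 1e-3
ratio = erlang_cdf(h, lam, 2.0) / erlang_cdf(h, lam, 1.0)
print(ratio)  # ≈ 2^h = 8
```

Doubling the available time multiplies the cumulative chance by about 2^h, just as a t^h dependence predicts.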

If there are try-once steps in the great filter, steps where an oasis can fail but which don’t take much time, that just reduces the constant in front of t^n, without changing the t^n dependence. If there are also easy steps in this filter, steps that take expected time much less than t, these just add a constant delay, moving the t=0 point in time. We can accommodate other fixed delays in the same way.

We have so far assumed that, once the prior steps have happened, the chance of each step happening is constant per unit time. But we can also generalize to the case where this step chance per time is a power law t^m, with t the time since the last step was achieved, and with a different m_i for each step i. In this case, h = Σ_i (1+m_i). These step powers m can be negative, or fractional.

Instead of having the oases all turn on at some t=0, oases like Earth with a chance t^n can instead be born at a constant rate per unit time after some t=0. It turns out that the integrated chance across all such oases of birthing a GC at time t is again proportional to t^n, but now with n = h.

A more elaborate model would consider the actual distribution of star masses, which have a CDF that goes as m^(-1.5), and the actual distribution of stellar lifetime L per mass m, which has a CDF that goes as m^(-3). Assuming that stars of all masses are created at the same constant rate, but that each star drops out of the distribution when it reaches its lifetime, we still get that the chance of GC birth per unit time goes as t^n, except that now n = h-1.5.

Thus the t^n time dependence seems a decent approximation in more complex cases, even if the exact value of n varies with details. Okay, now let’s get back to this diagram I showed in my last post:

If the GC expansion speed is constant in conformal time (a reasonable approximation for small civ spatial separations), and if the civ origin time x that shapes the diagram has rank r in this civ origin time distribution, then x,r should satisfy:

((1-r)/r) ∫_0^x t^n dt = ∫_x^1 t^n (1 − ((t-x)/(1-x))^D) dt.
Here D is the space dimension. D = 3 is appropriate on the largest scales and on small many-star scales, but D = 2 across galaxy disks, and D = 1 along filaments of galaxies. This equation can be solved numerically. The ratio of the time from a GC origin til that GC directly meets aliens, relative to universe age at civ origin, is (1-x)/x, and is shown in this figure:

The x-axis here is the power n in t^n, and the y-axis is shown logarithmically. As you can see, aliens can be close, in the sense that the time to reach aliens is much smaller than the time it takes to birth a GC. This time til meeting is also smaller for higher powers n and for more spatial dimensions D.
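The integral equation above can be solved numerically. Here is a rough sketch, using a midpoint-rule integral plus bisection (the function name, grid sizes, and the median-rank default r = 1/2 are my choices):

```python
def meet_ratio(n, D, r=0.5, steps=4000):
    """Solve ((1-r)/r) * Int_0^x t^n dt = Int_x^1 t^n (1-((t-x)/(1-x))^D) dt
    for x by bisection, then return the meet-to-origin time ratio (1-x)/x."""
    def gap(x):
        lhs = ((1.0 - r) / r) * x ** (n + 1) / (n + 1)
        h = (1.0 - x) / steps
        rhs = 0.0
        for i in range(steps):  # midpoint rule for the right-hand integral
            t = x + (i + 0.5) * h
            rhs += t ** n * (1.0 - ((t - x) / (1.0 - x)) ** D) * h
        return lhs - rhs
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(60):  # gap is negative below the root, positive above
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return (1.0 - x) / x
```

Consistent with the figure, the returned ratio falls as either the power n or the dimension D rises.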

Note that these meet-to-origin time ratios don’t depend on the GC expansion speed. As I discussed in my last post, this model suggests that spatial distances between GC origins double if either the median GC origin time doubles, or if the expansion speed doubles. The lower the expansion speed relative to the speed of light, the better the chance a civ has of seeing an approaching GC before meeting it directly. (Note that we only need a GC expansion speed estimate to get distributions over how many GCs each can see at its origin, and how easy they are to see. We don’t need speeds to estimate how long til we meet aliens.)

To get more realistic estimates, I also made a quick Excel-based sim for a one-dimensional universe. (And I am happy to get help making better sims, such as in higher dimensions.) I randomly picked 1000 candidate GC origins (x,t), with x drawn uniformly in [0,1], and t drawn proportional to t^n in [0,1]. I then deleted any origin from this list if, before its slated origin time, it could be colonized from some other origin in the list at speed 1/4. What remained were the actual GC origin points.
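A pure-Python version of this candidate-deletion sim can be sketched as follows (the parameter defaults and seed are mine):

```python
import random

def grabby_1d(n=10, num=1000, speed=0.25, seed=1):
    """Draw num candidate GC origins (x, t), with x uniform in [0,1] and
    t drawn as t^n on [0,1] via the inverse CDF u**(1/(n+1)).  Process
    candidates in time order, deleting any that a surviving earlier
    origin could colonize, at the given speed, before its slated time."""
    rng = random.Random(seed)
    cands = sorted((rng.random() ** (1.0 / (n + 1)), rng.random())
                   for _ in range(num))  # (t, x) pairs, sorted by t
    origins = []
    for t, x in cands:
        if all(t0 + abs(x - x0) / speed > t for t0, x0 in origins):
            origins.append((t, x))
    return origins

origins = grabby_1d()
print(len(origins))  # surviving GC origins, out of 1000 candidates
```

Projecting each surviving origin outward at the given speed then draws the Vs and Λs of the space-time diagram.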

Here is a table with key stats for 4 different powers n:

I also did a version with 4000 candidate GCs, speed 1/8, and power n = 10, in which there were 75 GC origins. This diagram shows the resulting space-time history (time vertical, space horizontal):

In the lower part, we see Vs where a GC starts and grows outward to the left and right. In the upper part, we see Λs where two adjacent GCs meet. As you can see, for high powers GC origins have a relatively narrow range of times, but a pretty wide range of spatial separations from adjacent GCs.

Scaling these results to our 13.8 billion year origin date, we get a median time to meet aliens of roughly 1.0 billion years, though the tenth percentile is about 250 million years. If the results of our prior math model are a guide, average times to meet aliens in D=3 would be about a factor of two smaller. But the variance of these meet times should also be smaller, so I’m not sure which way the tenth percentile might change.

A more general way to sim this model is to:

• A) set a power n in t^n, and estimate 1) a density in space-time of origins of oases which might birth GCs, 2) a distribution over oasis durations, and 3) a distribution over GC expansion speeds,
• B) randomly sample 1) oasis spacetime origins, 2) durations to produce a candidate GC origin after each oasis origin time, using t^n, and 3) an expansion speed for each candidate GC,
• C) delete candidate GCs if their birth happens after their oasis ends, or if a colony from another GC could reach there before then at its expansion speed.
• D) The GC origins that remain give a distribution over space-time of such GC origins. Projecting the expansion speeds forward in time gives the later spheres of control of each GC, until they meet.

I’ll add to this post if I ever make or find more elaborate sims of this model.


## Optimum Prevention

Assume you use prevention efforts P to reduce a harm H, a harm which depends on those efforts via some function H(P). If you measure these in the same units, then at a prevention optimum you should minimize P+H(P) with respect to P, giving (for an interior optimum) dH/dP = -1.  And since in general dlnX = dX/X, this implies:

-dlnH/dlnP = P/H.

That is, the elasticity of harm with respect to prevention equals the ratio of losses from prevention to losses from harm. (I previously showed that this applies when H(P) is a power law, but here I’ve shown it more generally.)
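The power-law case is easy to verify numerically: for H(P) = k*P^(-ε), the elasticity -dlnH/dlnP is just ε, and minimizing P+H(P) does give P/H = ε. A sketch (function name and illustrative values are mine):

```python
def optimal_prevention(k=1.0, eps=5.3):
    """Minimize total loss P + H(P) for H(P) = k * P**(-eps) by ternary
    search; at the interior optimum, the ratio P/H should equal eps."""
    total = lambda P: P + k * P ** (-eps)
    lo, hi = 1e-6, 1e6
    for _ in range(300):  # total is convex, so ternary search converges
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if total(m1) < total(m2):
            hi = m2
        else:
            lo = m1
    P = 0.5 * (lo + hi)
    return P, P / (k * P ** (-eps))  # (optimal P, ratio P/H)
```

So a measured ratio P/H of 5.3 is only consistent with optimal prevention if the elasticity is also 5.3.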

Yesterday I estimated that for Covid in the U.S., the ratio P/H seems to be around 5.3. So to be near an optimum of total prevention efforts, we’d need the elasticity -dlnH/dlnP to also be around 5.3. Yet when I’ve done polls asking for estimates of that elasticity, they have been far lower and falling. I got 0.23 on May 26, 0.18 on Aug. 1, and 0.10 on Oct. 22. That most recent estimate is a factor of 50 too small!

So you need to argue that these poll estimates are far too low, or admit that in the aggregate we have spent far too much on prevention. Yes, we might have spent too much in some categories even as we spent too little in others. But overall, we are spending way too much.

Note that if you define P to be a particular small sub-category of prevention efforts, instead of all prevention efforts, then you can put all the other prevention efforts into the H, and then you get a much smaller ratio P/H. And yes, this smaller ratio takes a smaller elasticity to justify. But beware of assuming a high enough elasticity out of mere wishful thinking.


## Lognormal Priorities

In many polls on continuous variables over the last year, I’ve seen lognormal distributions typically fit poll responses well. And of course lognormals are also one of the most common distributions in nature. So let’s consider the possibility that, regarding problem areas like global warming, falling fertility, or nuclear war, distributions of priority estimates are lognormal.

Here are parameter values (M = median, A = (mean) average, S = sigma) for lognormal fits to polls on how many full-time equivalent workers should be working on each of the following six problems:

Note that priorities as set by medians are quite different from those set by averages.

Imagine that someone is asked to estimate their (median) priority of a topic area. If their estimate results from taking the product of many estimates regarding relevant factors, then not-fully-dependent noise across different factors will tend to produce a lognormal distribution regarding overall (median) estimates. If they were to then act on those estimates, such as for a poll or choosing to devote time or money, we should see a lognormal distribution of opinions and efforts. When variance (and sigma) is high, and effort is on average roughly proportional to perceived priority, then most effort should come from a quite small fraction of the population. And poll answers should look lognormal. We see both these things.
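To see how large the median-versus-average gap and the effort concentration can get, here is a quick sampling check (the median, sigma, and sample size are made-up illustrative values):

```python
import math
import random

def lognormal_summary(median=1.0, sigma=2.0, n=100_000, seed=0):
    """Sample a lognormal with the given median and sigma; return the
    sample mean and the share of the total coming from the top 1%."""
    rng = random.Random(seed)
    mu = math.log(median)  # for a lognormal, median = e^mu
    xs = sorted(rng.lognormvariate(mu, sigma) for _ in range(n))
    top = xs[int(0.99 * n):]
    return sum(xs) / n, sum(top) / sum(xs)

mean, top_share = lognormal_summary()
# With sigma = 2 the mean is e^(sigma^2/2) ≈ 7.4 times the median,
# and the top 1% of responses supply over a quarter of the total.
```

So when effort is roughly proportional to perceived priority, a thin slice of the population does most of the work, as the post describes.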

Now let’s make our theory a bit more complex. Assume that people see not only their own estimate, but sometimes also estimates of others. They then naturally put info weight on others’ estimates. This results in a distribution of (median) opinions with the same median, but a lower variance (and sigma). If they were fully rational and fully aware of each others’ opinions, this variance would fall to zero. But it doesn’t; people in general don’t listen to each other as much as they should if they cared only about accuracy. So the poll response variance we see is probably smaller than the variance in initial individual estimates, though we don’t know how much smaller.

What if the topic area in question has many subareas, and each person gives an estimate that applies to a random subarea of the total area? For example, when estimating the priority of depression, each person may draw conclusions by looking at the depressed people around them. In this case, the distribution of estimates reflects not only the variance of noisy clues, but also the real variance of priority within the overall area. Here fully rational people would come to agree on both a median and a variance, a variance reflecting the distribution of priority within this area. This true variance would be less than the variance in poll responses in a population that does not listen to each other as much as they should.

(The same applies to the variance within each person’s estimate distribution. Even if all info is aggregated, if this distribution has a remaining variance, that is “real” variance that should count, just as variance within an area should count. It is the variance resulting from failing to aggregate info that should not count.)

Now let’s consider what this all implies for action biases. If the variance in opinion expressed and acted on were due entirely to people randomly sampling from the actual variance within each area, then efforts toward each area would end up being in proportion to an info-aggregated best estimates of each area’s priority – a social optimum! But the more that variance in opinion and thus effort is also due to variance in individual noisy estimates, then the more that such variance will distort efforts. Efforts will go more as the average of each distribution, rather than its median. The priority areas with higher variance in individual noise will get too much effort, relative to areas with lower variance.

Of course there are other relevant factors that determine efforts, besides these priorities. Some priority areas have organizations that help to coordinate related efforts, thus reducing free riding problems. Some areas become fashionable, giving people extra social reasons to put in visible efforts. And other areas look weird or evil, discouraging visible efforts. Even so, we should worry that too much effort will go to areas with high variance in priority estimate noise. All else equal, you should avoid such areas. Unless estimate variance reflects mostly true variance within an area, prefer high medians over high averages.

Added 3p: I tried 7 more mundane issues, to see how they varied in variance. The following includes all 13, sorted by median.


## Risk-Aversion Sets Life Value

Many pandemic cost-benefit analyses estimate larger containment benefits than did I, mainly due to larger costs for each life lost. Surprised to see this, I’ve been reviewing the value of life literature. The key question: how much money (or resources) should you, or we, be willing to pay to gain more life? Here are five increasingly sophisticated views:

1. Infinite – Pay any price for any chance to save any human life.
2. Value Per Life – $ value per human life saved.
3. Quality Adjusted Life Year (QALY) – $ value per life year saved, adjusted for quality.
4. Life Year To Income Ratio – Value ratio between a year of life and a year of income.
5. Risk Aversion – Life to income ratio comes from elasticity of utility w.r.t. income.

The first view, of infinite value, is the simplest. If you imagine someone putting a gun to your head, you might imagine paying any dollar price to not be shot. There are popular sayings to this effect, and many even call this a fundamental moral norm, punishing those who visibly violate it. For example, a hospital administrator who could save a boy’s life, but at great expense, is seen as evil and deserving of punishment, if he doesn’t save the boy. But he is seen as almost as evil if he does save the boy, but thinks about his choice for a while.

Which shows just how hypocritical and selective our norm enforcement can be, as we all make frequent choices that express finite values on life. Every time we don’t pay all possible costs to use the absolutely safest products and processes, because they cost more in terms of time, money, or quality of output, we reveal that we do not put infinite value on life.

The second view, where we put a specific dollar value on each life, has long been shunned by officials, who deny they do any such thing, even though they in effect do. Juries have awarded big claims against firms that explicitly used value of life calculations to justify not adopting safety features, even when they used high values of life. Yet it is easy to show that we can have both more money and save more lives if we are more consistent about the price we pay for lives in the many different death-risk-versus-cost choices that we make.

Studies that estimate the monetary price we are willing to pay to save a life have long shown puzzlingly great variation across individuals and contexts. Perhaps in part because the topic is politically charged. Those who seek to justify higher safety spending, stronger regulations, or larger court damages re medicine, food, environmental, or job accidents tend to want higher estimates, while those who seek to justify less spending and weaker rules tend to want lower estimates.

The third view says that the main reason to not die is to gain more years of life. We thus care less about deaths of older and sicker folks, who have shorter remaining lives if they are saved now from death. Older people are often upset to be thus less valued, and Congress put terms into the US ACA (Obamacare) medicine bill forbidding agencies from using life years saved to judge medical treatments. Those disabled and in pain can also be upset to have their life years valued less, due to lower quality, though discounting low-quality years is exactly how the calculus says that it is good to prevent disability and pain, as well as death.

It can make sense to discount life years not only for disability, but also for distance in time. That is, saving you from dying now instead of a year from now can be worth more than saving you from dying 59 years from now, instead of 60 years from now. I haven’t seen studies which estimate how much we actually discount life years with time.

You can’t spend more to prevent death or disability than you have. There is thus a hard upper bound on how much you can be willing to pay for anything, even your life. So if you spend a substantial fraction of what you have for your life, your value of life must at least roughly scale with income, at least at the high or low end of the income spectrum. Which leads us to the fourth view listed above, that if you double your income, you double the monetary value you place on a QALY. Of course we aren’t talking about short-term income, which can vary a lot. More like a lifetime income, or the average long-term incomes of the many associates who may care about someone.

The fact that medical spending as a fraction of income tends to rise with income suggests that richer people place proportionally more value on their life. But in fact meta-analyses of the many studies on value of life seem to suggest that higher income people place proportionally less value on life. Often as low as value of life going as the square root of income.

Back in 1992, Lawrence Summers, then Chief Economist of the World Bank, got into trouble for approving a memo which suggested shipping pollution to poor nations, as lives lost there cost less. People were furious at this “moral premise”. So maybe studies done in poor nations are being slanted by the people there to get high values, to prove that their lives are worth just as much.

Empirical estimates of the value ratio of life relative to income still vary a lot. But a simple theoretical argument suggests that variation in this value is mostly due to variation in risk-aversion. Which is the fifth and last view listed above. Here’s a suggestive little formal model. (If you don’t like math, skip to the last two paragraphs.)

Assume life happens at discrete times t. Between each t and t+1, there is a probability p(e_t) of not dying, which is increasing in death prevention effort e_t. (To model time discounting, use δ*p here instead of p.) Thus from time t onward, expected lifespan is L_t = 1 + p(e_t)*L_{t+1}. Total value from time t onward is similarly given by V_t = u(c_t) + p(e_t)*V_{t+1}, where utility u(c_t) is increasing in that time’s consumption c_t.

Consumption c_t and effort e_t are constrained by budget B, so that c_t + e_t ≤ B. If budget B and functions p(e) and u(c) are the same at all times t, then unique interior optimums of e and c are as well, and so are L and V. Thus we have L = 1/(1-p), and V = u/(1-p) = u*L.

In this model, the life to income value ratio is the value of increasing L_t from L to L+x, divided by the value of increasing c_t from c to c*(1+x), for x small and some particular time t. That is:

(dL * dV/dL) / (dc * dV/dc) = x*u / (x * c * du/dc) = [ c * u’(c) / u(c) ]^(-1).

Which is just the inverse of the elasticity of u with respect to c.

That non-linear (concave) shape of the utility function u(c) is also what produces risk-aversion. Note that (relative) risk aversion is usually defined as -c*u''(c)/u'(c), so as to be invariant under affine transformations of u and c. Here we don’t need such an invariance, as we have a clear zero level of c: the level at which u(c) = 0, so that one is indifferent between death and life at that consumption level.

So in this simple model, the life to income value ratio is just the inverse of the elasticity of the utility function. If elasticity is constant (as with power-law utility), then the life to income ratio is independent of income. A risk-neutral agent puts an equal value on a year of life and a year of income, while an agent with square root utility puts twice as much value on a year of life as on a year of income. With no time discounting, the US EPA value of life of $10M corresponds to a life year worth over four times average US income, and thus to a power-law utility function where the power is less than one quarter.
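A quick numeric check of this inverse-elasticity result, using power-law utility u(c) = c^a (the function and values below are mine, for illustration):

```python
def life_income_ratio(a=0.5, c=1.0, x=1e-6):
    """With V = u(c)*L, compare the value of x extra expected life-years,
    x*u(c), to the value of raising one period's consumption from c to
    c*(1+x); for u(c) = c**a this ratio tends to 1/a as x shrinks."""
    u = lambda c: c ** a
    return (x * u(c)) / (u(c * (1 + x)) - u(c))

print(life_income_ratio(a=0.5))   # square-root utility: ratio ≈ 2
print(life_income_ratio(a=0.25))  # power 1/4: ratio ≈ 4
```

A risk-neutral agent (a = 1) gets a ratio of 1, matching the text.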

This reduction of the value of life to risk aversion (really concavity) helps us understand why the value of life varies so much over individuals and contexts, as we also see puzzlingly large variation and context dependence when we measure risk aversion. I’ll write more on that puzzle soon.

Added 23June: The above model applies directly to the case where, by being alive, one can earn budget B in each time period to spend in that period. This model can also apply to the case where one owns assets A, assets which when invested can grow from A to rA in one time period, and be gambled at fair odds on whether one dies. In this case the above model applies for B = A*(1-p/r).

Added 25June: I think the model gives the same result if we generalize it in the following way: B_t and p_t(e_t,c_t) vary with time, but in a way such that the optimal c_t = c is constant in time, and dp_t/dc_t = 0 at the actual values of c_t, e_t.


## Modeling the ‘Unknown’ Label

Recently I’ve browsed some specific UFO encounter reports, and I must admit they can feel quite compelling. But then I remember the huge selection effect. We all go about our lives looking at things, and only rarely do any of us officially report anything as so strange that authorities should know about it. And then when experts do look into such reports, they usually assign them to one of a few mundane explanation categories, such as “Venus” or “helicopter.”  For only a small fraction do they label it “unidentified”. And from thousands of these cases, the strangest and most compelling few become the most widely reported. So of course the UFO reports I see are compelling!

However, that doesn’t mean that such reports aren’t telling us about new kinds of things. After all, noticing weird deviations from existing categories is how we always learn about new kinds of things. So we should study this data carefully to see if random variation around our existing categories seems sufficient to explain it, or if we need to expand our list of categories and the theories on which they are based. Alas, while enormous time and effort has been spent collecting all these reports, it seems that far less effort has been spent to formally analyze them. So that’s what I propose.

Specifically, I suggest that we more formally model the choice to label something “unknown”. That is, model all this data as a finite mixture of classes, and then explicitly model the process by which items are assigned to a known class, versus labeled as “unknown.” Let me explain.

Imagine that we had a data set of images of characters from the alphabet, A to Z, and perhaps a few more weird characters like წ. Nice clean images. Then we add a lot of noise and mess them up in many ways and to varying degrees. Then we show people these images and ask them to label them as characters A to Z, or as “unknown”. I can see three main processes that would lead people to choose this “unknown” label for a case:

1. Image is just weird, sitting very far from prototype of any character A to Z.
2. Image sits midway between prototypes of two particular characters in A to Z.
3. Image closely matches prototype of one of the weird added characters, not in A to Z.

If we use a stat analysis that formally models this process, we might be able to take enough of this labeling data and then figure out whether in fact weird characters have been added to the data set of images, and to roughly describe their features.

You’d want to test this method, and see how well it could pick out weird characters and their features. But once it works at least minimally for character images, or some other simple problem, we could then try to do the same for UFO reports. That is, we could model the “unidentified” cases in that data as a combination of weird cases, midway cases, and cases that cluster around new prototypes, which we could then roughly describe. We could then compare the rough descriptions of these new classes to popular but radical UFO explanations, such as aliens or secret military projects.

More formally, assume we have a space of class models, parameterized by A, models that predict the likelihood P(X|A) that a data case X would arise from that class. Then given a set of classes C, each with parameters Ac and a class weight wc, we could for any case X produce a vector of likelihoods pc = wc*P(X|Ac), one for each class c in C. A person might tend more to assign the known label L when the value of pL was high relative to the other pc. And if a subset U of classes C were unknown, people might tend more to assign the label “unknown” when either:

1. even the highest pc was relatively low,
2. the top two pc had nearly equal values, or
3. the highest pc belonged to an unknown class, with c in U.

Using this model of how the label “unknown” is chosen, then given a data set of labeled cases X, including the unknown label, we could find the best parameters wc and Ac (and any parameters of the labeling process) to fit this dataset. When fitting such a model to data, one could try adding new unknown classes, not included in the initial set of labels L. And in this way find out if this data supports the idea of new unknown classes U, and with what parameters.
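As a minimal sketch of this labeling model, here is a toy version in code. Everything here is a hypothetical illustration, not a fit to real data: the classes are 1-D Gaussians standing in for class models P(X|Ac), and the thresholds `low_thresh` and `ratio_thresh` are made-up stand-ins for however people actually weigh the three processes above.

```python
from math import exp, pi, sqrt

def gauss(mu, sd):
    """1-D Gaussian likelihood, standing in for a class model P(x|A_c)."""
    return lambda x: exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

def assign_label(x, weights, likelihoods, known,
                 low_thresh=1e-4, ratio_thresh=1.5):
    # Score each class: p_c = w_c * P(x|A_c).
    scores = [w * f(x) for w, f in zip(weights, likelihoods)]
    ranked = sorted(range(len(scores)), key=lambda c: -scores[c])
    best, second = ranked[0], ranked[1]
    if scores[best] < low_thresh:                     # 1. just weird: far from everything
        return "unknown"
    if scores[best] < ratio_thresh * scores[second]:  # 2. midway between two classes
        return "unknown"
    if best not in known:                             # 3. best match is an unknown class
        return "unknown"
    return "class %d" % best

# Toy data: two known classes plus a third, rarer cluster not in the known set.
weights = [0.45, 0.45, 0.10]
likelihoods = [gauss(0.0, 1.0), gauss(4.0, 1.0), gauss(10.0, 1.0)]
known = {0, 1}

print(assign_label(0.1, weights, likelihoods, known))   # -> class 0
print(assign_label(2.0, weights, likelihoods, known))   # -> unknown (midway)
print(assign_label(10.0, weights, likelihoods, known))  # -> unknown (new cluster)
```

Fitting would then mean choosing the wc, Ac, and labeling-process parameters to maximize the likelihood of the observed labels, including “unknown”, and trying added unknown classes to see if they improve the fit.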

For UFO reports, the first question is whether the first two processes for producing “unknown” labels seem sufficient to explain the data, or if we need to add a process associated with new classes. And if we need new classes, I’d be interested to see if there is a class fitting the “military prototype” theory, where events happen more near to military bases, more at days and times when such folks tend to work, with more intelligent response, more noise, less tendency to make nearby equipment malfunction, and impressive but not crazy-extreme speeds and accelerations that increase over time with research abilities. And I’d be especially interested to see if there is a class fitting the “alien” theory, with more crazy-extreme speeds and accelerations, enormous sizes, nearby malfunctions, total silence, apparent remarkable knowledge, etc.

Added 9am: Of course the quality of such a stat analysis will depend greatly on the quality of the representations of data X. Poor low-level representations of characters, or of UFO reports, aren’t likely to reveal much that is interesting or deep. So it is worth trying hard to process UFO reports to create good high-level representations of their features.

Added 28May: If there is a variable of importance or visibility of an event, one might also want to model censoring of unimportant hard-to-see events. Perhaps also include censoring near events that authorities want to keep hidden.


## Constant Elasticity Prevention

While many engaged with Analysis #1 in my last post, only one engaged with Analysis #2. So let me try again, this time with a graph.

This is about a simple model of prevention, one that assumes a constant elasticity (= power law) relation between harm and prevention effort. An elasticity of 1 means that 1% more effort cuts harm by 1%. For an elasticity of 2, 1% more effort cuts harm by 2%, while for an elasticity of 0.5, 1% more effort cuts harm by 0.5%.

Such simple “reduced form” models are common in many fields, including economics. Yes of course the real situation is far more complex than this. Even so, reduced forms are typically decent approximations for at least small variations around a reference policy. As with all models, they are wrong, but can be useful.

Each line in the following graph shows how total loss, i.e., the sum of harm and prevention effort, varies with the fraction of that loss coming from prevention. The different lines are for different elasticities, and the big dots, which match the colors of their lines, show the optimum choice on each line to minimize total loss. (The lines all intersect at prevention = 1/20, harm = 20.)

As you can see, for min total loss you want to be on a line with higher elasticity, where prevention effort is more effective at cutting harm. And the more effective is prevention effort, then the more effort you want to put in, which will result in a larger fraction of the total harm coming from prevention effort.
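A small numerical check of this logic: the snippet below assumes the post's calibration that every line passes through (prevention, harm) = (1/20, 20), takes the constant elasticity form harm = k*P^(−a), and grid-searches for the loss-minimizing prevention effort. At the optimum, the prevention-to-harm ratio comes out equal to the elasticity.

```python
def total_loss(P, a):
    k = 20 * 0.05 ** a        # choose k so every line passes through P=1/20, H=20
    return k * P ** (-a) + P  # total loss = harm + prevention effort

for a in [0.5, 1.0, 2.0]:
    # Grid-search the loss-minimizing prevention effort P.
    grid = [i / 1000 for i in range(10, 5000)]  # P from 0.01 to 5
    P_opt = min(grid, key=lambda P: total_loss(P, a))
    H_opt = total_loss(P_opt, a) - P_opt
    print("a=%.1f: optimal P/H = %.2f" % (a, P_opt / H_opt))
```

So with higher elasticity, the optimum indeed puts a larger share of total loss into prevention effort.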

So if locks are very effective at preventing theft, you may well pay a lot more for locks than you ever suffer on average in theft. And in the US today, the elasticity of crime with respect to spending on police is ~0.3, explaining why we suffer ~3x more losses from crime than we spend on police to prevent crime.

Recently, I asked a few polls on using lockdown duration as a way to prevent pandemic deaths. In these polls, I asked directly for estimates of elasticity, and in this poll, I asked for estimates of the ratio of prevention loss to health harm loss. And here I asked if the ratio is above one.

In the above graph there is a red dot on the 0.5 elasticity line. In the polls, 56% estimate that our position will be somewhere to the right of the red dot on the graph, while 58% estimate that we will be somewhere above that grey 0.5 elasticity line (with less elasticity). Which means they expect us to do too much lockdown.

Fortunately, the loss at that red dot is “only” 26% higher than at the min of the grey line. So if this pandemic hurts the US by ~\$4T, the median poll respondent expects “only” an extra \$1T lost due to extra lockdown. Whew.

Added 26May: Follow-up surveys on US find (via lognormal fit) median effort to harm ratio of 3.6, median elasticity of 0.23. For optimum these should be equal – so far more than optimal lockdown!

Added 1Aug: Repeating same questions now gives median effort to harm ratio of 4.0, median elasticity of 0.18. That is, they see the situation as even worse than they saw it before.

Added 22Oct: Repeating the questions now gives median effort to harm ratio of 5.2, median elasticity of 0.10. The estimated deviation between these two key numbers has continued to increase over time.


## 2 Lockdown Cost-Benefit Analyses

Back on Mar. 21 I complained that I hadn’t seen any cost-benefit analyses of the lockdown policies that had just been applied where I live. Some have been posted since, but I’ve finally bothered to make my own. Here are two.

ANALYSIS #1: On the one side are costs of economic disruption. Let us estimate that a typical strong lockdown cuts ~1/3 of the income, or econ/social value, gained per unit time. (It would be more due to harm from the time needed to recover afterward, and due to stress and mental health harms.) If one adds 9 weeks of lockdown, perhaps on and off spread out over a longer period, that’s a total of 3 weeks’ income lost.

On the other side are losses due to infection. I estimate an average infection fatality rate (IFR) of 0.5%, and half as much additional harm to those who don’t die, due to other infection harms. (E.g., 3% have severe symptoms, and 40% of those get 20% disabled.) I estimate that eventually half would get infected, and assume the recovered are immune. Because most victims are old, the average number of life years lost seems to be about 12. But time discounting, quality-of-life adjustment, and the fact that victims are poorer, sicker, and wouldn’t live as long as others their age, together arguably cut that figure by 1/3. And a standard health-econ estimate is that a life-year is worth about twice annual income. Multiply these together and you get an expected loss of 3 weeks’ income.
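Multiplying the post's own numbers through, as a check of that arithmetic:

```python
# Numbers from the estimates above:
ifr = 0.005                      # infection fatality rate
harm_per_infection = ifr * 1.5   # add half again for non-death harms
frac_infected = 0.5              # eventually half get infected
life_years_lost = 12 * (2 / 3)   # ~12 years lost, cut by 1/3
value_per_life_year = 2.0        # a life-year worth ~2x annual income

loss_years = harm_per_infection * frac_infected * life_years_lost * value_per_life_year
print("%.1f weeks of income" % (loss_years * 52))  # -> 3.1 weeks of income
```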

As these two estimates are equal, they make a convenient reference point for analysis. Thus, if we believed these estimates, we should be indifferent between doing nothing and a policy of spending 9 added weeks of lockdown (beyond the perhaps 4-8 weeks that might happen without government rules) to prevent all deaths, perhaps because a vaccine would come by then. Or, if death rates would actually be double this estimate due to an overloaded medical system, we should be indifferent between doing nothing and spending 9 added weeks of lockdown to avoid that overloading. Or we should be indifferent between doing nothing and 4 added weeks of lockdown which somehow cuts the above estimated death rate in half.

Unfortunately, the usual “aspirational” estimate for a time till vaccine is far longer, or over 18 months. And a doubling of death rates seems a high estimate for medical system overload effects, perhaps valid sometimes but not usually. It seems hard to use that to argue for longer lockdown periods when medical systems are not nearly overwhelmed. Especially in places like the US with far more medical capacity.

During the 1918 flu epidemic, duration variations around the typical one month lockdown had no noticeable effect on overall deaths. In the US lately we’ve also so far seen no correlation between earlier lockdowns and deaths. And people consistently overestimate the value of medical treatment. Also, as the death rate for patients on the oft-celebrated ventilators is ~85%, ventilators can’t cut deaths by more than 15%.

We’ve had about 6 weeks of lockdown so far where I live. A short added lockdown seems likely to just delay deaths by a few months, not to cut them much, while a long one seems likely to do more damage than could possibly be saved by cutting deaths.

Of course you don’t have to agree with my reference estimates above. But ask yourself how you’d change them, and what indifferences your new estimates imply. Yes, there are places in the world that seem to have done the right sort of lockdown early enough in the process to get big gains, at least so far. But if your place didn’t start that early nor is doing that right sort of lockdown, can you really expect similar benefits now?

ANALYSIS #2: Consider the related question: how much should we pay to prevent crime?

Assume a simple power-law (= constant elasticity) relation between the cost H of the harm resulting directly from the crimes committed, and the cost P of efforts to prevent crime:

H = k*P^(−a),  or  dlnH/dlnP = −a,

where a is the (positive) elasticity of harm H with respect to prevention P. To minimize total loss L = H + P, you set P = (k*a)^(1/(1+a)), at which point we have a nice simple expression for the cost ratio, namely P/H = a.

So, when you do it right, the more effective is prevention at stopping harm, then the larger is the fraction of total loss due to prevention. If 1% more prevention effort cuts 1% of crime, you should lose about the same amounts from harm and prevention. If 1% more prevention cuts 2% of crime, then you should lose twice as much in prevention as you do in harm. And if it takes 2% more prevention effort to cut 1% of crime, you should lose about twice as much in harm as you do in prevention.
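For completeness, here is the short derivation behind that optimum, starting from the constant elasticity form above:

```latex
H = k P^{-a}, \qquad L = H + P, \qquad
\frac{dL}{dP} = -\,a k P^{-a-1} + 1 = 0
\;\Longrightarrow\; P^{1+a} = a k
\;\Longrightarrow\; P = (a k)^{1/(1+a)},

\frac{P}{H} \;=\; \frac{P}{k P^{-a}} \;=\; \frac{P^{1+a}}{k} \;=\; \frac{a k}{k} \;=\; a .
```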

This model roughly fits two facts about US crime spending: the elasticity is less than one, and most loss comes from the crimes themselves, rather than prevention efforts. Typical estimates of elasticity are around 0.3 (ranging 0.1-0.7). US governments spend \$280B a year on police, courts, and jails, and private security spends <\$34B. Estimates of the total costs of crime range \$690-3410B.

Now consider Covid19 prevention efforts. In this poll respondents said, by 3.44 to 1, that more harm will come from econ disruption than from direct health harms. And in this poll, 56% say that more than twice the loss will come from econ disruption. For that to be optimal in this constant elasticity model, a 10% increase in lockdown, say adding 12 days to a 4 month lockdown, must cut total eventual deaths (and other illness harm) by over 20%. That seems very hard to achieve, and in this poll 42% said they expect us to see too much econ disruption, while only 29% thought we’d see too little.

(More on Analysis #2 in the next post.)

In this post I’ve outlined two simple analyses of lockdown tradeoffs. Both suggest that we are at serious risk of doing too much lockdown.

10am: On reflection, I changed my estimate of the lockdown from 25% to 27% of income, and my estimate of non-death harm from as-much-as to half-as-much-as the death harm. So my reference added shutdown duration is now 4 months instead of 6.

12pm: Even if recovery gave immunity for only a limited period, then as long as you were considering lockdown durations less than that period, the above calculation still applies, but now it applies to each such period. For example, if immunity only lasts a year, then these are annual costs, not eventual costs. And that’s only if infection chances are independent each period. If, more likely, it is the same people who are most at risk each year, then in later years gains from lockdowns decline.

29Apr, 3am: We are now at 73 comments, and so far all of them are about analysis #1, and none about analysis #2. Also, tweet on #1 got 18 retweets, tweet on #2 got none.

29Apr, 1pm: In two more polls, over half estimate that a 10% increase in lockdown duration gives <5% decrease in deaths, for both world and US. Instead of the >20% that would be required to justify allowing twice the damage from lockdowns as from health harms. See also results on the cost of masks.

28May:  I’ve updated the numbers a bit.

22Oct: This analysis from March 22, based on happiness, also suggests far more harm from the economy dip than from deaths. And I confirm my analysis with more recent estimates here.

23Oct: I’ve just shown that the above optimum condition, −dlnH/dlnP = P/H, holds for any function H(P).


## Beware R0 Variance

The big push now re Covid19 is to use “social distancing” to cut “R0”, the rate at which infection spreads. More precisely, R0 is the average number of other people that one infected person would infect, if they were not already infected. With no efforts to reduce it, estimates for natural R0 range from 2 to 15, with a best estimate perhaps around 4. The big goal is to get this number below 1, so that the pandemic is “suppressed” and goes away, and stays away, until a vaccine or other strong treatment, allowing most to escape infection. In contrast, if R0 stays above 1 we might “flatten the curve”, so that each infected person can get more medical resources when they are sick, but soon most everyone gets infected.

Apparently even with current “lockdown” efforts, all of the 11 European nations studied still have a best estimate R0 over 2, with a median of ~3.7. So they must do a lot more if they are to suppress. But how much more? My message in this post is that it is far from enough to push median R0 down below 1; one must also push down its variance.

Imagine a population composed of different relatively-isolated subpopulations, each with a different value of R0. Assume that few are infected, so that subpopulation pandemic growth rates are basically just R0. Assume also that these different R0 are distributed log-normally, i.e., the logarithm of R0 has a Gaussian distribution across subpopulations. This is (correctly) the usual distribution assumption for parameters bounded below by zero, as usually many small factors multiply together to set such parameters. The total effective R0 for the whole population is then found simply by integrating the effective growth over subpopulations, weighted by the lognormal density.

For example, assume that the R0 lognormal distribution has log mean (mu) -2 and sigma 1. Here the mode of the distribution, i.e., the most common R0 number, is 0.05, the median R0 is 0.14, only 5% of subpopulations have R0 above 0.70, and only 2% have R0 >1. Even so, if each of these subpopulations maintain their differing R0 over ten infection iterations, the mean growth factor R0 of the whole population is 20 per iteration!

As another example (for log mean -1, sigma 0.5), the R0 mode is 0.29, the median is 0.37, only 5% of subpopulations have an R0 over 0.85, only 2% have R0>1. Yet over ten infection iterations maintaining these same R0 factors per subpopulation, the mean growth factor R0 of the whole population is 1.28 per iteration. That is, the pandemic grows.
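These two examples follow from standard lognormal formulas: for ln R0 ~ Normal(mu, sigma), the mode is exp(mu − sigma²), the median is exp(mu), and since E[R0^10] = exp(10*mu + 50*sigma²), the mean whole-population growth factor per iteration over ten iterations is exp(mu + 5*sigma²). A quick check in code:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def summarize(mu, sigma):
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    frac_R0_above_1 = 1 - Phi(-mu / sigma)        # P(ln R0 > 0)
    mean_growth = math.exp(mu + 5 * sigma ** 2)   # E[R0^10]^(1/10)
    return mode, median, frac_R0_above_1, mean_growth

print(summarize(-2.0, 1.0))  # ~ (0.05, 0.14, 0.02, 20.1)
print(summarize(-1.0, 0.5))  # ~ (0.29, 0.37, 0.02, 1.28)
```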

Of course these growth numbers eventually don’t apply to finite subpopulations, once most everyone in them gets infected. Because when most of a population is infected, then R0 no longer sets pandemic growth rates. And if these subpopulations were completely isolated from each other, then all of the subpopulations with R0<1 would succeed in suppressing. However, with even a modest amount of interaction among these populations, the highly infected ones will infect the rest.

The following graph tells a somewhat more general story. On the x-axis I vary the median value of R0 among the subpopulations, which sets the log-mean. For each such value, I searched for the log-sigma of the lognormal R0 distribution that makes the total average R0 for the whole population (over ten iterations) exactly equal to 1, so that the pandemic neither grows nor shrinks. Then on the graph I show the standard deviation, in R0 terms, that this requires, and the fraction of subpopulations that grow via R0>1.

As you can see, we consistently need an R0 standard deviation less than 0.21, and the lower the median R0, the smaller the fraction of subpopulations with R0>1 we can tolerate.

So, as long as there is substantial mixing in the world, or within a nation, it is far from enough to get the R0 for the median subpopulation below 1. You also need to greatly reduce the variation, especially the fraction of subpopulations in which the pandemic grows via R0>1. For example, when the median R0 is 0.5, you can tolerate less than 3% of subpopulations having an R0>1, just to hold the pandemic at a constant overall level. And to suppress in limited time, you need to go a lot further.
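A sketch of how such knife-edge numbers can be computed under the lognormal model above: pick the median R0 = exp(mu), solve exp(mu + 5*sigma²) = 1 for sigma (so the whole population's mean growth over ten iterations is exactly 1 per iteration), then report the implied standard deviation of R0 and the fraction of growing subpopulations. For a median of 0.5 this roughly reproduces the ~0.21 and ~3% figures quoted above.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def knife_edge(median_R0):
    mu = math.log(median_R0)          # requires median_R0 < 1
    sigma = math.sqrt(-mu / 5)        # makes exp(mu + 5*sigma^2) = 1
    # Standard deviation of a lognormal with parameters (mu, sigma).
    sd = math.exp(mu + sigma ** 2 / 2) * math.sqrt(math.exp(sigma ** 2) - 1)
    frac_growing = 1 - Phi(-mu / sigma)   # P(R0 > 1)
    return sd, frac_growing

sd, frac = knife_edge(0.5)
print("median 0.5: R0 std dev %.2f, fraction with R0>1 = %.1f%%" % (sd, 100 * frac))
```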

Different subpopulations with differing R0 seem plausible not just because our world has different nations, classes, cultures, professions, industries, etc., but because Covid19 policy has mostly been made at relatively local levels, varying greatly even within nations. In addition, most things that seem log-normally distributed actually have thicker-than-lognormal tails, which makes this whole problem worse.

All of which is to say that suppressing a pandemic like this, with high R0 and many asymptomatic infected, after it has escaped its initial size and region, is very hard. Which is also to say, we probably won’t succeed. Which is to say: we need to set up a Plan B, such as variolation.