
Wow, I missed this conversation by a year. Good comments, and I think those who defended "no one can know" got part of what I was saying.

The other part was a simple reminder that "peak oil" cannot be directly measured. All we have as measurable data are price and current production data. The next step is always an extrapolation based upon an assumption. One starts, for instance, with the assumption that Hubbert's method will hold for world production, and that a calculation done today will yield an accurate "high production" and "high production date."

How do you put error bars on that assumption, that Hubbert's method, a heuristic, will hold?

(And I might also comment that in the year since this post, the "Hubbert's date" for peak oil has been moved and argued over again and again.)


Eliezer, I agree; I was only using the example as something we can all agree we can't know. If you would like an unarguable example where the distribution as well as the expected value is unknowable, how about the number of intelligent life forms in a galaxy outside our light cone? My point was really that there is a range from things that we can know well to things that we can't know at all. But when we get a distribution from someone, how do we know how well or how much it is underpinned by real knowledge?

If we look at, say, the global warming predictions, we get a range of possible rises in average temperature - I have heard from 3 to 6 deg C. But how much faith should we put in this distribution? Clearly it is of worse quality than if the same distribution were provided for the temperature in New York tomorrow. How could we "measure" or otherwise agree on this quality factor? Could the measure include whether the model that produced the distribution can be tuned by real feedback or not?


ChrisA, "No one can know next week's lottery numbers" is what fools say when they buy tickets; "You can't know I *won't* win." Lottery numbers are something we have very exact probability distributions over, and anyone who departs from this probability distribution is fooling themselves and losing money in the process. Computing the expectation, or any other simple derived quantity, would be no trouble at all. "I don't know, and no one can know" is a very dangerous thing to say about a lottery ticket; "That ticket has an exactly 1 in 28,203,400 chance of winning and if you bet at other odds you will lose money" is much more helpful to say to someone considering buying a ticket.


Russ,

Sometimes there are technical problems with assigning equal odds to all outcomes. If I know that a cube factory produces cubes with edges that vary between 1 and 2 units of length, then I also know that the factory produces cubes whose volumes vary between 1 and 8 units cubed. Suppose that's all I know about the factory. If my probability distribution for lengths is uniform, then my probability distribution for volumes is not uniform, and vice versa. For example, if I think that there's a 0.5 chance that a randomly selected cube will have edge length > 1.5 units, then I must think that there's a <0.5 chance that a randomly selected cube will have volume > 4 units cubed. I cannot consistently assign equal odds to all possible lengths, and assign equal odds to all possible volumes.
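A quick simulation sketch (my addition, just to make the inconsistency concrete) shows that a distribution uniform over edge length and one uniform over volume answer the same questions differently:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Uniform over edge length on [1, 2]: P(volume > 4) comes out below 0.5.
lengths = rng.uniform(1.0, 2.0, n)
print("uniform lengths, P(volume > 4):", np.mean(lengths ** 3 > 4))        # ~0.41

# Uniform over volume on [1, 8]: P(edge > 1.5) comes out well above 0.5.
volumes = rng.uniform(1.0, 8.0, n)
print("uniform volumes, P(edge > 1.5):", np.mean(volumes ** (1/3) > 1.5))  # ~0.66
```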

We might think this sort of a case is a good candidate for an "I don't know" response. If somebody else has the same information that I do, and he says his estimate that a randomly selected cube has an edge of length >1.5 is 0.5, I might say I think he's doing something wrong. Namely, he's decided to be uniform over length rather than volume for no reason. I would have the same objection if he had a degree of belief 0.5 that the volume of a random cube would be >4. You might think that in this situation, it would be inappropriate to have precise degrees of belief about the expected length, or the expected volume, of a random cube in the factory.

If I were asked what my expected length for a random cube in the factory was, I think I'd just say that I expect that the length of a cube is between 1 and 2. I don't think it would be reasonable to offer any particular expected length between 1 and 2.


Interesting thread.

ChrisA: The intent of having a "no one can know" option is interesting. It's just going to be very difficult to implement practically or price realistically. Nice theoretical concept though.

My simplistic view is that "I don't know" is equivalent to assigning equal odds to all outcomes. That solves Robin's scoring rule and sign challenge.


Just to note that there is a difference between saying "I don't know" and "no-one can know". The "I don't know" statement can be made for many different reasons and, I agree, is not terribly helpful in moving the debate forward. The "no-one can know" statement is more useful - it puts the problem into a particular class. There are things that everyone can agree no-one can know the answer to - next week's lottery numbers, for instance - but there is a range from this to the case of an absolutely known probability function. An estimate of next week's lottery numbers should be met by a response of "no one can know". You can have a very useful debate about whether something can be partly, or not at all, estimated or known.

Any prediction market which is trying to estimate a "no-one can know" factor might be recognisable by a very flat distribution, though not necessarily. So perhaps prediction markets should routinely include a "no one can know" option. This would pay off if the prediction market result turned out to be false (say, outside a sigma or two). If this option attracted a lot of money, that would perhaps indicate that the prediction made by the market is not very useful.
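As a rough sketch of the idea (my own illustration, assuming "false" means the realised outcome lands outside the market's one- or two-sigma band; the function name is hypothetical), the option's settlement rule might look like:

```python
def no_one_can_know_pays(market_mean: float, market_sigma: float,
                         realised: float, k: float = 2.0) -> bool:
    """Pay off the 'no one can know' option if the realised outcome falls
    outside k standard deviations of the market's central estimate."""
    return abs(realised - market_mean) > k * market_sigma

# Example: the market settled on 100 +/- 10, reality came in at 135.
print(no_one_can_know_pays(100.0, 10.0, 135.0))   # True: the option pays
print(no_one_can_know_pays(100.0, 10.0, 104.0))   # False: the prediction held up
```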


My claim, then, is twofold:

"You're wrong" automatically implies that some expected value of some random variable is different. In the absence of explanation, there are generally obvious default ones such as expected utility.

"We just don't know" has several possible rational interpretations as used in rhetorical discourse as mentioned above, any one of which may be meant.

It is of course preferable to fully explain what you mean in the course of subsequent argument, but I do not think that the summarized statement is "worse than useless."

It is also possible that the person making the argument means it in an irrational form; however, it seems to me that you stated that we're already assuming that the participants are both rational and can agree on what random variables to measure. If you insist on translating the rhetorical into that particular irrational meaning, then I must agree with you that it is an impermissible statement. If that's what your entire argument is, then I totally agree with you.

But as a statement about actual argumentation and rhetoric, I disagree, as outlined above.


"some expected value of yours" means the expected value of some random variable given your probability distribution. I don't see why you would think that I am "assuming that the people discussing are already considering *all* possible alternate course of actions," nor why you think it reasonable to interpret "you are wrong because we don't know" as "let's change the subject and talk about something else.""

In that case, I think that your original phrasing of the post was essentially useless and misleading. If someone says "you are wrong," then it's an entirely natural consequence to assume that they're saying that expected utility (or whatever) is lower by taking the other person's advice. According to the trivial interpretation of your argument, that's a "sign of the bias" right there. It thus seems to me useless to insist that someone who says "you're wrong" in *any* way provide a "sign of the bias of some random variable given a probability distribution." Merely by saying "you're wrong," we can assume that they're claiming that expected utility would be maximized by taking some other option.

I agree that the person claiming "you're wrong" has an obligation to offer an alternate course of action, but if one is not mentioned I believe it is reasonable to assume that the alternative is the status quo, as opposed to the position offered by the other party. Thus it seemed to me that your objection is essentially contentless and redundant, at least in this trivial fashion.

Since it's obvious to me that any claim that someone is wrong involves a claim that their expected utility claims are wrong and do not actually maximize utility, to me the natural interpretation of your original point "tell me the sign of the bias" was a claim that someone had to offer a point of dispute with the original new piece of information offered by the other party-- i.e., dispute the price of oil itself or its distribution. I see that that's not what you meant.

"That's a signed bias on the probability mass for that set of points - too much, rather than too little. In plain-language strategic thinking, this would come out as "You're focusing too much on what you think is the most likely outcome, and not thinking about possible exceptions.""

Yes absolutely, but my point, as above, is that it's "worse than useless" pedantry to attempt to claim that "you are wrong," especially as used in casual rhetoric, does not automatically assume a claim that *some* sort of expected value (esp. expected utility) is different. Again, in casual rhetoric, I believe that a reply of "show me where my bias is" is generally going to be taken as a request to focus on the particular measurements of the random variables brought up originally, especially on focusing on the original expected value offered, such as a price of oil.

I think fundamentally we agree that "you're wrong" must rationally contain a disagreement about some expected value of some random variable. Where we disagree is in what the natural interpretations of rhetorical discourse are. To me, "you're wrong, because we just don't know" should automatically indicate that I disagree about some expected value of a random variable and prefer some alternate course to the one you're suggesting-- if not explicitly stated, then the status quo. It may be that the point you're attempting to make is so trivially obvious to me that to demand that people state it in rhetorical discourse seemed useless and redundant, so I incorrectly searched for an alternate meaning. To me, "you're wrong, because we just don't know" is not "worse than useless," because I automatically interpret "you're wrong" as implying the necessary claim that *some* expected value is different and then interpret "because we just don't know" as an imprecise statement of one of several possibilities.

One possibility is a statement that the other person's variance is wrong, or that the first person is not considering other alternatives, or that the random variable brought up by the first person is driven by poorly understood states, making it difficult to hedge against, unlike other alternatives. To me it is also reasonable to assume that the first person may have made one of several very common mathematical errors, especially that of deriving variables which are non-linear measurable functions of estimated random variables by using only the expected values of the estimates rather than the entire distributions. In particular, if the function is complicated enough, then it can be too computationally difficult or intensive to properly derive the variable of interest (such as utility) using the entire distributions of the estimated variables. In which case, "we just don't know" also has, to me, a natural interpretation of "in order to perform this calculation given our resources, we must make many simplifying assumptions (often including using expected or maximum likelihood values for estimated parameters). However, the equation of interest is highly non-linearly dependent on the estimated parameters, and if the calculation were performed correctly without such mathematical simplifications, the results would be different."

Considering how poorly understood this mathematical point is, I think it's a reasonable interpretation. (For example, much of the statistical literature recommends estimating a population's standard deviation from a sample by taking the square root of an unbiased estimator of the variance; that square root is not itself an unbiased estimator of the standard deviation, though few people realize it. The very concept of the unbiased estimator has problems for certain probability distributions, and so do maximum likelihood estimators in certain situations.)
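A small simulation sketch (my addition) of that standard-deviation point: the usual sample standard deviation is the square root of an unbiased variance estimator, yet it is itself biased low as an estimator of the standard deviation, noticeably so for small samples.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sigma = 1.0
n, trials = 5, 200_000            # small samples make the bias visible

samples = rng.normal(0.0, true_sigma, size=(trials, n))
s2 = samples.var(axis=1, ddof=1)  # unbiased estimator of the variance
s = np.sqrt(s2)                   # the usual "sample standard deviation"

print("mean of s^2:", round(s2.mean(), 3))   # ~1.00, unbiased for sigma^2
print("mean of s:  ", round(s.mean(), 3))    # ~0.94, biased low for sigma
```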


John, "more variance than you anticipate" -> "You are assigning too much probability density to the mean/mode/center of your distribution." That's a signed bias on the probability mass for that set of points - too much, rather than too little. In plain-language strategic thinking, this would come out as "You're focusing too much on what you think is the most likely outcome, and not thinking about possible exceptions."


John, "some expected value of yours" means the expected value of some random variable given your probability distribution. I don't see why you would think that I am "assuming that the people discussing are already considering *all* possible alternate course of actions," nor why you think it reasonable to interpret "you are wrong because we don't know" as "let's change the subject and talk about something else."


In practical argument, I feel that "we just don't know" also has a high chance of meaning "you haven't properly considered the risks involved, you've only made a lot of assumptions and then done calculations based on your expected value for each assumption, when in reality if you included your full distribution at each step, you would obtain a resultant expected value which would give a very different answer than the one you've given."

People very frequently take random variable X, find its expected value, and then proceed to estimate X^2 or all sorts of other derived variables by pretending that E[X^2] = (E[X])^2. But that's not true in general. Once you start throwing lots and lots of variables into the calculation, you can get some tremendously different answers.
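A two-line sketch (my addition) of that error for a skewed variable; any non-degenerate X shows the same gap, which is exactly Var(X):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # an arbitrary non-degenerate X

print("(E[X])^2:", round(x.mean() ** 2, 2))      # ~4
print("E[X^2]:  ", round((x ** 2).mean(), 2))    # ~8; the difference is Var(X)
```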


"I don't see how you could give any info about your probability distribution which does not implicitly tell us something about the sign of some expected value of yours. Can you find a counterexample?"

What do you mean by "some expected value of yours?" Do you mean the expected value of some function of the random variable of interest, such as one of its moments?

I gave one counterexample above of random variables with different distributions but the same expected value. Such a thing is trivial. Of course, yes, they do have different moments. (Expected value of the random variable raised to different powers.) I'm sorry, I don't really understand what you mean by "some expected value of yours"; are you saying that one should be able to give the difference in the expected value of some function of the original random variable? (Forgive me for coming at this from the perspective of a probabilist.)

Of course, we can have random variables identical in distribution, but different because they take different values on different events. This leads to different values of cross-correlations and various joint statistics. If you expand to include all expected values of all possible random variables, including random variables not originally of interest, then I don't suppose anyone could disagree.

However, I don't think that was how you originally phrased the problem. You talked about people not being able to say "we just don't know" or "we don't have enough information" without disagreeing about the expected value (or, as we've gone to, the distribution.) If it's merely a case of me feeling that your original language was imprecise, then I apologize.

I grant that there are natural meanings of "we just don't know" that are precisely as you describe, and for which your objection is absolutely natural. However there are, in my opinion, some entirely natural meanings of "we just don't know" that do not warrant your dismissal.

The first is a reply of "we just don't know very well the events which lead to this one, making it difficult to find other well-correlated events, and thus difficult to hedge."

The second is a reply of "we just don't know a great deal right now, but our information will improve in the future, so we should wait."

Now, in both cases it is of course possible to find some scalar expressing a belief, and that scalar will have an expected value which is different from what has previously been discussed. In both cases, however, the argument introduces a different random variable, a different statistic, or a different possible course of action than those that have previously been discussed. It is a way of changing the subject. I feel that "we just don't know" is an acceptable way to introduce the argument "even if I grant yours as the best estimate for that particular problem, your figures contain a considerable amount of uncertainty and thus risk if we pursue your course of action; let me shift your attention to an alternative course of action for which we are able to hedge quite effectively and thus can guarantee doing some good."

I feel that you've stipulated the problem into something unrealistic, such as assuming that the people discussing are already considering *all* possible alternate courses of action. Not only would, IMO, the average person take your "sign of the bias" comment to mean that someone should offer a sign of the bias in the original prediction offered (rather than some measurable function of the original random variable or a complicated cross-correlation argument), but quite often people make arguments having only considered a finite number of courses of action, or feel it sufficient to demonstrate that their proposal is better than the status quo but not better than all possible alternatives.

Alternatively, it is extremely common to find arguments which deal only with the expected result based on current best forecasts, and which have failed to consider the risk or uncertainty involved. In that case, a "tell me the sign of the bias" type comment will normally be interpreted as a request to offer a better expected value, such as a better expected price of oil, to use your original example. It will *not* be interpreted as a request to explain that, due to a larger than expected variance, the risk premium is particularly high for the recommended course of action, and that an entirely different course of action combining two hedging strategies (which may include waiting until information improves, since committing now to a radical course of action carries a high risk premium) is preferable, so that the expected utility of the originally recommended action is different. Most people, I reckon, would feel that this second argument is best summarized not by "tell me the sign of the bias of my estimate of the price of oil," but by "we just don't know what the price of oil will be, so your suggestion is risky."

But YMMV.


John, I wasn't talking about investment strategies in particular. All I said was that if you have a complaint about someone else's expected value of something, you should at least give a sign of some expected value of yours, relative to what you think they said. I didn't say which random variable you should talk about; I explicitly listed several possibilities. I don't see how you could give any info about your probability distribution which does not implicitly tell us something about the sign of some expected value of yours. Can you find a counterexample?


"You can't say something about your probability distribution without making a claim about an expected value; after all, the set of all expected values determines a probability distribution."

Consider the infinite family of probability distributions P_x where P_x(x+2) = P_x(-x) = 1/2 for all x > 0. Let the random variables A_x have distribution P_x for all x > 0. Clearly E[A_x] = 1 for all x.

Now, obviously if you have *all* the other moments, or otherwise know the distribution via some of the ways I pointed out above, then you can specify the distribution.

Again, there's still room for pointing out when we do or do not know enough to hedge. Let random variables B_x and C_x also have the same distribution. Let B_x(\omega) = 2 - A_x(\omega) for all \omega \in \Omega of the probability space; this swaps the two outcomes, so B_x has the same distribution as A_x and the two are perfectly anti-correlated. Let C_x be uncorrelated with both A_x and B_x.

Then, it is perfectly legitimate for someone to argue that an investment strategy should avoid C_x and invest equally in A_x and B_x in order to hedge and guarantee a payoff of 1 (avoiding any possibility of loss), since "we don't know enough about C_x." We know its *distribution* perfectly, but still not enough about the random variable.
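Here is a simulation sketch (my addition, using B_x = 2 - A_x as the anti-correlated copy described above) showing that the A/B portfolio is riskless while a portfolio using the identically distributed but uncorrelated C_x is not:

```python
import numpy as np

rng = np.random.default_rng(0)
x, n = 3.0, 100_000                     # any x > 0

# A_x takes the value x+2 or -x, each with probability 1/2, so E[A_x] = 1.
A = np.where(rng.integers(0, 2, n) == 1, x + 2, -x)
B = 2 - A                               # same distribution, perfectly anti-correlated
C = np.where(rng.integers(0, 2, n) == 1, x + 2, -x)   # same distribution, independent

print("means:", A.mean(), B.mean(), C.mean())              # all ~1
print("hedged (A+B)/2 outcomes:", np.unique((A + B) / 2))  # always exactly 1.0
print("unhedged (A+C)/2 std dev:", ((A + C) / 2).std())    # ~2.83: genuine risk
```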


"You can't say something about your probability distribution without making a claim about an expected value; after all, the set of all expected values determines a probability distribution."

Okay, now you're confusing terminology and changing your question again. What are you using "the set of all expected values" to mean? Certainly knowing the expected value of a single random variable does not determine the probability distribution of the random variable. That's a totally wrong statement.

On the other hand, if you mean by "the set of all expected values" the set of the expected values of all measurable functions of the random variable, then yes, that certainly determines the probability distribution. Similarly, knowing the value of the random variable on all members of the probability space, or all expected values of the random variable with its domain restricted to each of the members of the σ-algebra of the probability space, or the probability of the random variable lying in each Borel set (or even just in each half-line of the form (-∞, c], i.e. knowing the cumulative distribution function) determines the distribution. Also, I suppose that you could mean by "all the expected values" knowing all the moments, or, equivalently, the moment generating function.*

*-- Technically, knowing the last two formulations is only enough to know the probability distribution up to equality in distribution, which is not enough to specify the random variable completely, and which could be particularly important in the case of cross-correlations with other random variables. A and B might be equal in distribution, but for hedging purposes it is important whether they are perfectly correlated, perfectly anti-correlated, or whatever.

I know of very few papers or meta-analyses that give the full moment generating function of the random variable that they study, or otherwise completely specify the distribution. Instead, they tend to give the best possible guess, whether mean, median, or even mode, sometimes a confidence interval or Bayesian credible interval, usually assuming or implying a normal distribution of some sort.

However, in many of those cases the normal is implied by the fact that the average result over many trials tends to a normal distribution by the Central Limit Theorem, not because the quantity being studied is itself actually normally distributed.
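A short simulation sketch (my addition) of that point: averages over many trials look roughly normal even when the underlying quantity is strongly skewed.

```python
import numpy as np

def skewness(a):
    # Simple sample skewness: third central moment over cubed std deviation.
    a = np.asarray(a, dtype=float)
    return ((a - a.mean()) ** 3).mean() / a.std() ** 3

rng = np.random.default_rng(0)

raw = rng.exponential(scale=1.0, size=100_000)                  # clearly not normal
trial_means = rng.exponential(scale=1.0, size=(100_000, 50)).mean(axis=1)

print("skewness of the raw quantity:", round(skewness(raw), 2))          # ~2.0
print("skewness of 50-trial means:  ", round(skewness(trial_means), 2))  # ~0.3
```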

I will grant, surely, that in any case where someone has fully specified the probability distribution, including by giving the expected value of every integer power of the random variable in question, the distribution is fully specified for that random variable and indeed for all measurable functions of it-- provided that we aren't interested in cross-correlations, since this only specifies the distribution, not correlations with other random variables that might be of interest for hedging purposes.


I think that, in addition to all the other arguments, we've already seen the argument that, for example, a disagreement might be about a distribution and not an expectation value, or the disagreeing party might hold that only trivial predictions are possible. Anyway, I'm dropping out at this point.
