It's tautological that if only those with quality signals voted, we'd get a better outcome. But in real life we don't know the quality of our own signal. I don't understand the need for the power law - what does it add to the analysis?

Re: "Large voter turnouts seem to me better understood as overconfidence leading to disagreement – we each think we just know better than others what is good for society".

You think voters vote in order to improve society? How about the thesis that voters vote because they are manipulated into doing so by politicians?

aj, my analysis ignored voting costs, and even when people want to vote in my model, for most of them the benefit, in terms of increasing the probability that the better candidate wins, is very small.

Actually, if you take this analysis the other way -- doesn't it provide an argument for why (at least in some circumstances) it is worthwhile to vote on a purely individual basis?

Isn't the conventional "economic" argument that it's not worthwhile for any individual to vote because they'll almost never be the one vote that puts the better candidate over the line? Whereas this analysis indicates that (in many cases, but obviously depending on the distribution of q) no matter how many other people are voting, your participation actually increases the odds of a socially optimal outcome, even if every other voter is smarter than you?

aj, your variation gives an asymptotic power of zero, so as my analysis predicts it favors having everyone vote. Also, see what I added to the post.

This seems to be biased by two things: first, as has been noted, the top-ranked voters are assumed to be much smarter than the remaining 99.9% of the population, and it's going to take a lot of distributed information to overcome that; and second, the information held by the lowest-ranked voters disappears to insignificance very, very quickly, which stops it from ever doing so.

If you tweak the formula to assume *everyone* has some definite, better-than-even chance of voting for what's good for society, so that q = q1*r^(-n) + c, where q1, n, c are constants and r is the ranking, then no matter how small c is, you eventually get more voters improving the odds of selecting the socially optimal candidate again. For q1 = 0.1, n = 1 and c = 0.001 (thus giving a minimum probability of 50.05% of selecting the best candidate), the best voter has a 55.05% chance of getting the right outcome; then it's downhill until you have 475 voters and a 52.10367% chance, then uphill after that, getting back to 55.0503% at 14013 voters and continuing to increase thereafter. This is when only the top 100 voters have a better than 50.1% chance of picking the socially optimal candidate; everyone else varies between that and 50.05%, but at least they vary independently.
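
For anyone who wants to check these numbers, here is a minimal Common Lisp sketch (the function names and the exact-majority calculation are mine, not from the post) that computes the probability that a strict majority of the top N voters, voting independently under the floored power law above, picks the socially optimal candidate:

(defun q-floor (r)
  ;; q = q1*r^(-n) + c with q1 = 0.1, n = 1, c = 0.001
  (+ (* 0.1 (expt r -1)) 0.001))

(defun p-floor (r)
  ;; probability that the rank-r voter picks the socially optimal candidate
  (/ (+ 1.0 (q-floor r)) 2.0))

(defun majority-correct-prob (n)
  ;; Exact Poisson-binomial calculation: dist(k) holds the probability that
  ;; exactly k of the top n voters vote correctly; return P(k > n/2).
  (let ((dist (make-array (+ n 1) :initial-element 0.0d0)))
    (setf (aref dist 0) 1.0d0)
    (loop for r from 1 to n
          for pr = (p-floor r)
          do (loop for k from r downto 1
                   do (setf (aref dist k)
                            (+ (* (aref dist k) (- 1.0d0 pr))
                               (* (aref dist (- k 1)) pr))))
             (setf (aref dist 0) (* (aref dist 0) (- 1.0d0 pr))))
    (loop for k from (+ (floor n 2) 1) to n
          sum (aref dist k))))

With those parameters, (majority-correct-prob 475) and (majority-correct-prob 14013) should come out near the 52.1% and 55.05% figures quoted above.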

Self-interest probably works fine as a proxy for q too: the socially optimal candidate is socially optimal because he or she benefits lots of people, so they're likely to benefit any particular individual -- so if you have a signal q' for a candidate that helps you, you probably have a signal q=q'/?? for the socially optimal candidate.

I think that what may have led Robin to his distribution is that elections are often nearly evenly split. If you suppose that voters have at least a 50 percent chance of being right, then to explain this, you must further suppose that nearly all voters have just a tiny bit over a 50 percent chance of being right. You would probably end up with Robin's model.

A simple solution would be to suppose that some voters have a less than 50 percent chance of being right. But I don't think that would save this model, because it isn't modeling the fact that politics is an adaptive system. Whatever the dispositions of the voters are, political parties adjust the alternatives offered until the expected vote is again split 50/50. Voters are polarized into 2 groups (in the US), which self-adjust to each claim about 50% of the voters. And if you believe that your group is right, then it is always rational to vote - even on issues you don't understand, or for candidates you've never heard of.

Robin - Caplan clearly didn't have the right slogan. I'm thinking 'Rock The Ignorance!'

Or possibly 'Vote and Die!'

Expert Political Judgment by Philip Tetlock seems like a relevant book on the topic of how competence relates to getting questions right. It's worth noting that all the experts did MUCH better than Berkeley undergraduates. The outliers among the experts were pretty strikingly not from the same distribution as the bulk of experts. All were worse than sophisticated algorithms, though unlike algorithms the experts can ask questions, not just answer them.

I wasn't very clear. Robin isn't giving the probability a power-law distribution; he is basing it on a power-law distribution.

Also, when I spoke of using a log-normal distribution for IQ, that was misleading. While it may be true that you can fit some of the data better with a log-normal distribution, this log-normal distribution will be so close to a normal distribution that if you plotted it and showed it to a statistician, he would call it a normal distribution.

Skills usually have approximately normal distributions, because they are the combination of a large number of random factors. You can sometimes use a log-normal distribution to account for a skew caused when the distribution is bounded below but not above.

Now, how does a skill, with a nearly normal distribution, map into a probability of voting correctly? Looking at our usual data, such as times in running the 100m dash, might be problematic because it isn't clear whether to consider times bounded or unbounded.

So I plotted the 1273 scores in the Netflix competition. These scores are in root-mean-square error of guessed movie ratings, and theoretically range from 0 to about 1.05 (what you get if you guess the average value for each rating). In practice, the lower bound can be approximated by having a person try to guess their own ratings for movies that they already rated in the past; based on one experiment, this lower bound is about .79.

This is a pretty good substitute for a probability, because the RMSE is closely related to the probability of guessing the right rating. (It is sqrt(9*p(off by 3) + 4*p(off by 2) + p(off by 1)).)
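
As a tiny illustration of that relation, here is a one-line helper (my own, with made-up off-by probabilities) that turns the formula as stated into code:

(defun rmse-from-errors (p1 p2 p3)
  ;; RMSE implied by the probabilities of being off by 1, 2, and 3 stars,
  ;; using the formula exactly as stated above
  (sqrt (+ (* 9 p3) (* 4 p2) p1)))

;; e.g. (rmse-from-errors 0.55 0.06 0.002) gives roughly 0.9,
;; in the middle of the range described below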

Sadly, I can't post the picture of the plot here, but I can tell you what the histogram looks like. It occupies the range .864 to .951, with a mode of about .905. Most of the mass is between .90 and .93, with a rather sharp drop-off down towards .86. The end with the lowest-RMSE scores (corresponding to our highest-ranked voters) is nearly flat in the histogram. In other words, it looks vaguely normal, but heavy in the range .9 to .945. As with all skill-based scores, in this distribution, there are a few people who are very bad, and a few (fewer) who are very good, with most being in the middle.

This is in contrast to Robin's proposed distribution, in which almost everyone is very bad. That distribution is extremely sharp; it looks like an L if you plot p(correct) vs. rank, for all parameters I have tried. That is not a realistic distribution. And it is that unrealistic distribution (and possibly some issue with how Robin handles variance) that leads directly to the conclusion that almost everyone shouldn't vote.

Mike makes a good point in bringing up correlation. If you assume that voters are uncorrelated, most models will probably conclude that everyone should vote. The main problem with uninformed voters is when their votes are correlated (because of, e.g., advertisements, cultural biases, or systematic errors in reasoning).

Since your model doesn't mention correlations, and yet comes up with small numbers of voters being optimal, I have to continue to suspect that you aren't taking variance into account correctly.

Re: "Phil, I most certainly am explicitly taking variance into account. Many things that can vary by large magnitudes are distributed as power laws, and I stand by my claim that 'log-normals with a wide variance look like power laws over their mid-range.' Mid-range is less than one standard deviation. I disagree that I need to assume folks identify rank precisely; we need only posit voters have a decent idea of how they rank. If you arbitrarily assume that the info of the top M folks is no better than that of rank M in my model, you will only ensure that at least M folks must vote; the rest will be the same."

- Perhaps you could post more of the model, so we can see how variance is taken into account, and how you compute the numbers for cases 1 and 2.
- You still haven't explained why you want to use a power law, beyond saying that many things that can vary by large magnitudes are expressed as power laws.
  - The quantity you are computing is a probability that ranges from .5 to 1. It cannot vary by large magnitudes. Some related quantity, something like Eliezer's "optimization power" measurement, might vary according to a power law; but the probability, being bounded below and above, is not a good candidate.
  - Even if you were looking at an underlying measure of "voting intelligence", rather than the probability, standard practice is to use a normal distribution for this kind of thing. Only radicals like me and Mike Vassar use a log-normal.
  - Using a power law is what gives you your result. Your entire case boils down to the claim that voting intelligence (actually, the probability of choosing correctly, which is an even more extreme claim) has a power-law distribution. There's no point discussing anything else until that is cleared up, because your point is that your model says fewer people should vote, and I believe it says that because you're using a power law.
  - The mid-range is not the problem. The top end, rank 1-10 or so, is the problem.
- For case 1, you do need to identify rank exactly. You couldn't say that everybody but the top-ranked voter should abstain, unless everyone knew who the top-ranked voter was. If you were likely to end up with voter #5 instead of voter #1, you would be better off taking the (estimated) top several voters.

I am curious about the magnitude of the social benefit of abstaining in this model. It seems like they only remove a very small amount of noise, so I am guessing it doesn't make a huge difference in terms of the probability of choosing the correct candidate.

In reality, I am more concerned about voters who are biased because they are uninformed. You could examine that by allowing for more correlation among the less informed voters. Maybe a model where some signals are common and some are rare, so that uninformed voters are more likely to have common signals. I am guessing you would get similar results about abstaining, but there would be a higher social cost to uninformed voting.

Phil, I most certainly am explicitly taking variance into account. Many things that can vary by large magnitudes are distributed as power laws, and I stand by my claim that "log-normals with a wide variance look like power laws over their mid-range." Mid-range is less than one standard deviation. I disagree that I need to assume folks identify rank precisely; we need only posit voters have a decent idea of how they rank. If you arbitrarily assume that the info of the top M folks is no better than that of rank M in my model, you will only ensure that at least M folks must vote; the rest will be the same.

Robin, in the first case, log-normals are not power laws. I don't know how you define "midrange", but power laws dramatically emphasize the impact of the first few ranked items; log-normals do not, unless you set the distribution's variance too high. They should give very different results in this case. And since we always use either a normal or a log-normal distribution for IQ, why on earth did you choose to use a power-law distribution for voter acuity? You need a very strong justification as to why you chose a power law, and haven't given any.

If you used the more-reasonable log-normal model, and interpreted (1+q)/2 as an estimate of the person's expected probability of being correct on any given issue (that is, a voter is not described by a probability, but by a probability distribution), and you used reasonable parameter values, you would get very different results, invalidating the point you are making with this post.

This LISP code will help demonstrate the behavior of your model:

(setq power -1)
(setq q1 .1)

;; q(r): signal quality of the rank-r voter under the power law
(defun q (r q1 power) (* q1 (expt r power)))

;; p(r): probability that the rank-r voter votes for the better candidate
(defun p (r) (/ (+ 1 (q r q1 power)) 2))

;; genlist(r): list of p(1) .. p(r)
(defun genlist (r)
  (cond ((eq r 1) (list (p 1)))
        (t (append (genlist (- r 1)) (list (p r))))))

Now do this:

> (genlist 20)  ;; list p(correct) for top 20 voters
(0.55 0.525 0.51666665 0.5125 0.51 0.5083333 0.50714284 0.50625 0.50555557 0.505 0.50454545 0.50416666 0.50384617 0.50357145 0.50333333 0.503125 0.5029412 0.50277776 0.5026316 0.5025)

The top 20 voters are all pretty stupid. Say we have 100,000,000 voters turn out:

> (p 1000000)
0.50000006

According to these parameters, the one-millionth-best voter (the top 1%) does better than random by only 6 parts in 100 million. This is why their votes are useless in your calculations.

If you set q1 to the maximum, which is 1, you of course want only 1 voter, because the top voter is right all the time. But look:

> (setq q1 .99)
> (p 1000000)
0.5000005

Now the voter at the 1% boundary is better than random by only 5 parts in 10 million. With power = -1, you get this same basic result no matter what q1 is.

Now let's try a different exponent:

> (setq power -.1)
> (p 1000000)
0.6243384
> (genlist 20)
(0.995 0.96185136 0.94349945 0.9309225 0.9214133 0.91379964 0.90746975 0.9020649 0.8973571 0.89319247 0.8894627 0.8860887 0.8830106 0.88018274 0.8775688 0.87513983 0.8728725 0.8707472 0.8687482 0.8668616)

These parameters are somewhat more believable - but not for the top-ranked people; no person is right 99.5% of the time in these matters, or even 96% (except by chance).

So your model has three problems. One is that, because it uses a power law, the top-ranked 1 to 3 people will dominate for most parameters. The second is that the benefit of having a large number of voters is reducing variance, and I don't think you're taking variance into account. The third is that it relies on being able to identify everyone's rank not approximately but precisely, because you have to identify the rank 1, 2, and 3 people exactly right. Even if the power law turns out to be justified (and I don't think it is), you would have to account for the uncertainty in ranking, which would dramatically steer your results in the direction of "more voters is better".
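
To make the log-normal alternative from the start of this comment concrete, here is one possible way to set it up in the same Lisp style; the clamping of q into [0,1], the mapping to (1+q)/2, and the parameters mu and sigma are all my own illustrative assumptions, not anything specified in the post or the comments:

(defun random-normal ()
  ;; standard normal draw via the Box-Muller transform;
  ;; (- 1.0 (random 1.0)) keeps the argument of log strictly positive
  (* (sqrt (* -2.0 (log (- 1.0 (random 1.0)))))
     (cos (* 2.0 pi (random 1.0)))))

(defun lognormal-q (mu sigma)
  ;; one draw of signal quality q, clamped into [0,1]
  (min 1.0 (exp (+ mu (* sigma (random-normal))))))

(defun majority-right-p (n mu sigma)
  ;; T if a strict majority of n voters picks the socially optimal candidate,
  ;; each voting correctly with probability (1+q)/2 for an independent draw of q
  (let ((correct 0))
    (dotimes (i n (> (* 2 correct) n))
      (when (< (random 1.0) (/ (+ 1.0 (lognormal-q mu sigma)) 2.0))
        (incf correct)))))

(defun estimated-win-rate (n mu sigma trials)
  ;; Monte Carlo estimate of the probability that the majority is right
  (let ((wins 0))
    (dotimes (i trials (/ wins (float trials)))
      (when (majority-right-p n mu sigma) (incf wins)))))

Because every voter here is right with probability above one half and votes independently, the Condorcet jury logic applies: (estimated-win-rate n mu sigma 100000) should rise toward 1 as n grows, which is the contrast being drawn with the power-law case.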

Perhaps we in the US ought to be celebrating our right to abstain. In some countries, such as Brazil, voting is mandatory.
