# Inequality Math

Here is a distribution of aeolian sand grain sizes:

Here is a distribution of diamond sizes:

On a log-log scale like these, a power law is a straight line, while a lognormal distribution is a downward-facing parabola. These distributions look like a lognormal in the middle, with power-law tails on either side.
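
The straight-line-versus-parabola contrast can be checked numerically. Here is a minimal Python sketch. It is an illustration only, not from the post or the papers behind the plots, and the function names and parameter values are hypothetical. It evaluates log density against log size on an evenly spaced grid and takes second differences. A power law gives zero curvature. A lognormal gives the constant -h**2/sigma**2, which is the signature of a downward parabola.

```python
import math

def lognormal_log_density(x, mu=0.0, sigma=1.0):
    """ln of the lognormal pdf, evaluated at size z = exp(x) (natural logs throughout)."""
    z = math.exp(x)
    pdf = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (z * sigma * math.sqrt(2 * math.pi))
    return math.log(pdf)

def power_law_log_density(x, alpha=2.0):
    """ln of a pure power-law density z**-alpha (normalization ignored), at z = exp(x)."""
    return -alpha * x

def second_differences(f, start=-2.0, h=0.5, n=9):
    """Second differences of y = f(x) on an evenly spaced grid in x = ln(size)."""
    ys = [f(start + i * h) for i in range(n)]
    return [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, n - 1)]

# A straight line has zero curvature; a downward parabola has a constant negative one.
print(second_differences(power_law_log_density))   # all ~0.0
print(second_differences(lognormal_log_density))   # all ~-0.25, i.e. -h**2 / sigma**2
```

The 1/z factor in the lognormal pdf only adds a term linear in log size, so it tilts the parabola without changing its curvature.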

Important social variables are distributed similarly, including the size of firms (measured in employees):

and of cities:

In these two cases the upper tail follows Zipf’s law, with a tail exponent very close to one, implying that each factor of two in size contains the same number of people. That is, there are just as many people in all the cities with 100,000 to 200,000 people as there are in all the cities with one million to two million people. (Since there is an infinite number of such ranges, this adds up to an infinite expected number of people in huge cities, but actual samples are finite.)
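
The "same number of people per doubling" claim is a one-line integral. This Python sketch assumes an idealized pure Zipf tail. In that tail the density of cities of size s is proportional to 1/s**2, so the count of cities larger than s falls like 1/s. The normalization and the function name are hypothetical. It integrates size times density numerically over two different doublings.

```python
import math

def people_in_size_range(a, b, steps=100_000):
    """Population living in cities with sizes between a and b.

    Assumes an idealized Zipf upper tail: the density of cities of size s is
    n(s) = 1 / s**2 (arbitrary normalization). Each such city holds s people,
    so we integrate s * n(s) = 1 / s with the midpoint rule.
    """
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        s = a + (i + 0.5) * width
        total += s * (1.0 / s ** 2) * width
    return total

small = people_in_size_range(100_000, 200_000)
large = people_in_size_range(1_000_000, 2_000_000)
print(small, large, math.log(2))   # all three agree: each doubling holds ln(2) units of people

# Summing k doublings gives about k * ln(2), which grows without bound as k increases.
print(sum(people_in_size_range(1e5 * 2 ** k, 1e5 * 2 ** (k + 1), steps=10_000) for k in range(10)))
```

Because the integrand is exactly 1/s, any range [a, 2a] contributes ln 2. This is why the total diverges when infinitely many doublings are allowed.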

The double Pareto lognormal distribution models this as lognormal growth that runs for an exponentially distributed lifetime. In a simple diffusion process, positions that start out concentrated at a point spread out into a normal distribution whose variance increases steadily with time. With a normal distribution over the point where this process started, and a constant chance per unit time of ending it, the distribution over ending positions is normal in the middle but has fat exponential tails. A log transform then turns this into a lognormal with power-law tails.
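
The generative story above can be simulated directly. Here is a minimal Python sketch under stated assumptions. Parameter names and default values are hypothetical, and log-size is assumed to follow Brownian motion with constant drift and volatility. Each item starts at a normally distributed log-size, lives for an exponentially distributed time, and diffuses in the meantime. The excess kurtosis of the result shows the fat, exponential-type tails in log-size compared with a plain normal sample.

```python
import math
import random

def simulate_log_sizes(n, mu0=0.0, s0=0.5, drift=0.0, sigma=1.0, death_rate=1.0, seed=0):
    """Simulate final log-sizes under the double Pareto lognormal story.

    Each item starts at a normally distributed log-size, lives for an
    exponentially distributed time (constant chance per unit time of ending),
    and its log-size diffuses as Brownian motion with the given drift and volatility.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x0 = rng.gauss(mu0, s0)                   # normal starting point
        t = rng.expovariate(death_rate)           # exponential lifetime
        # Diffusion for time t: normal with mean drift*t and variance sigma**2 * t.
        out.append(x0 + drift * t + sigma * math.sqrt(t) * rng.gauss(0.0, 1.0))
    return out

def excess_kurtosis(xs):
    """Sample excess kurtosis: 0 for a normal, 3 for a Laplace (two-sided exponential)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0

logs = simulate_log_sizes(100_000)
rng = random.Random(1)
normal = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(excess_kurtosis(logs))    # well above 0: fat, exponential-type tails in log-size
print(excess_kurtosis(normal))  # close to 0 for a plain normal sample
```

Exponentiating the simulated values gives sizes with a lognormal body and power-law tails. With zero drift, the standard normal-Laplace result is that both log-size tails decay like exp(-sqrt(2*death_rate)/sigma * |x|). On the size scale, that means P(S > s) falls like s**-alpha with alpha = sqrt(2*death_rate)/sigma, about 1.41 for the defaults here.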

This makes sense as a model of sizes for particles, firms, and cities when such things have widely (e.g., exponentially) varying lifetimes. Random collisions between grains chip off pieces, giving both a fluctuating drift in particle size and an exponential distribution of grain ages (measured from when each grain broke off as a chip). Firms and cities also tend to start and die at somewhat constant rates, and to drift randomly in size.

In the math, a Zipf upper tail, with a power near one, implies little local net growth of each item, so that size drift nearly counters birth and death rates. For example, if a typical thousand-person firm grows by 1% per year (with half growing slower and half growing faster than 1%), but has a 1% chance each year of dying (assuming no firms start at that size), it will keep the same expected number of employees. Such a firm has no local net growth.
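
The link between "no local net growth" and a tail power of one can be made explicit under a standard continuous-time assumption. The assumption is that log-size follows Brownian motion with drift mu and volatility sigma, and that items die at constant rate lambda, as in the double Pareto lognormal setup. The upper tail exponent alpha is then the positive root of (sigma**2/2)*a**2 + mu*a - lambda = 0. Setting a = 1 gives mu + sigma**2/2 = lambda, meaning the mean growth rate of a lognormal process exactly offsets the death rate. The Python sketch below checks this with hypothetical parameter values. The function names are illustrative, not from any library.

```python
import math

def upper_tail_exponent(drift, sigma, death_rate):
    """Zipf-style exponent alpha in P(S > s) ~ s**-alpha for the upper tail.

    Assumes log-size follows Brownian motion with the given drift and volatility
    and items die at a constant rate; alpha is then the positive root of
    (sigma**2 / 2) * a**2 + drift * a - death_rate = 0.
    """
    q = sigma ** 2 / 2
    return (-drift + math.sqrt(drift ** 2 + 4 * q * death_rate)) / (2 * q)

def expected_size_factor(drift, sigma, death_rate, years):
    """Expected size after `years`, relative to today, counting a dead item as size 0.

    Mean growth rate of a lognormal process is drift + sigma**2 / 2;
    the survival probability is exp(-death_rate * years).
    """
    return math.exp((drift + sigma ** 2 / 2 - death_rate) * years)

# Hypothetical parameters: 20% volatility, 1% death rate, and drift chosen so that
# mean growth (drift + sigma**2 / 2) equals the death rate, i.e. no local net growth.
sigma, death_rate = 0.2, 0.01
drift = death_rate - sigma ** 2 / 2
print(upper_tail_exponent(drift, sigma, death_rate))        # ~1.0: Zipf
print(expected_size_factor(drift, sigma, death_rate, 10))   # ~1.0: same expected size

# Discrete-year version of the firm example: 1% growth, 1% chance of dying.
print(1000 * 1.01 * 0.99)   # ~999.9 expected employees next year
```

If mean growth exceeds the death rate, the root falls below one and the tail gets fatter than Zipf. If mean growth falls short, the tail gets thinner.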

Interestingly, individual wealth is distributed similarly. More on that in my next post.

• Khoth

Not really related to the point, but I can’t help wondering why the city size graph has a bunch of lines in it. Do some people have a weird compulsion to try to fit a straight line to anything?

• genauer

If you read the original paper, the various lines show the errors linear regression makes if you cut off the complete distribution at various points.
From my perspective, you not only get increasingly wrong power coefficients, you also miss half the dynamics, especially what happens to small and shrinking cities and regions, and how to adjust public investment to that.

Pretty interesting paper!

• Khoth

I did see from the paper that it was linear regressions. I just didn’t see why anyone would even think of it. It makes sense to do it for the parts that look basically like straight lines, but the other ones are just completely meaningless.

• y81

A lognormal distribution is not a parabola, unless you are using the term parabola very loosely.

• Michael Wengler

A log-normal distribution is precisely a parabola on a log-log plot.

A regular normal distribution is a parabola on a log-y (semi-log) plot.

P = C*exp( -(v-u)^2/D ) is the pdf of a normally distributed v. Take the log:

log(P) = E - (v-u)^2/D, where E = log(C).
If y = log(P) and x = v, this is the equation of a (downward) parabola.

But if v = log(z), then z is log-normally distributed. Then
y = log(P) and x = log(z) is the equation of a parabola. (Strictly, the pdf of z picks up an extra factor 1/z, but that only adds a term linear in log(z), so the curve is still a parabola.)

• genauer

@Khoth
It's way more complicated.
If you look at the statistical physics of self-organized criticality phenomena, perturbation theory, and similar stuff, there are also good reasons to cut off the right side, where the distribution runs into the system size.
You can see this, if you know what you are looking for, in the “firm size” plot above, for example. Just shift the linear fit line up a little to match the central points.

The only problem is that the plot doesn’t pass the smell test.

In the journal with the highest impact factor of all, “Science”, from which that plot was taken (see the link above), they now very often find it unnecessary to do any reasonable peer review. Otherwise they would easily have seen that in a nation of 3e8 people, with probably some 1e7 to 1e8 corporations, you cannot have a frequency of less than 3e-9, meaning that at least two points on the right are garbage.

Garbage in, garbage out. Just like the New York Times piece “the physicists does the city”, with links to the next “high impact factor” PNAS garbage.

Grumble. Outspoken arrogance here.

That ordinary people neither know nor understand quantum mechanics is no problem. Scaling theory, of course not. Elementary statistics is actually not that difficult, and it is helpful in daily life. But this is just elementary math: no frequency of 1e-13 is possible in a population of 3e8, and the “crème de la crème” doesn’t catch it, showing a massive decay of quality in the US over the last 30 years.

• Khoth

Hah, I thought the firm size graph looked suspiciously neat. I hadn’t noticed that the scale on the y-axis was impossible. Nice catch.

• genauer

@Khoth
I was a little bit too fast with my praise of the Eeckhout paper (cities).

If you take a closer look at the whole thing, more than 20% of the population is missing from the plot.
Most (European) countries do not count places with fewer than 2,000 to 5,000 people as “cities”, which would cut off the left 2/3 of the data points, or half of the ln-size distribution.
On the right side, as the paper states, he uses the narrower US definition of a city, putting LA at 3.7 million instead of the 16 million given by a definition closer to European views, which would show a strong deviation from a lognormal or Zipf distribution.
That leaves only the transition region for fitting, and you can fit that in many ways. You can even find a 2009 comment exchange with Levy on this on the author's homepage.
Beyond that, if you look up data on “urbanization” (e.g., CIA, Wikipedia), you will find that over 50% of the differences are due to different definitions of “city” (see the German Wikipedia article “Stadt”: DK: 200; JP: 50,000).

Bottom line: if you want, you can fit a lognormal distribution, but many others as well, and that doesn’t prove anything about the underlying mechanisms.
Very typical of results in economics / sociology.