@Khoth I was a little too quick to praise the Eeckhout paper (cities).

If you take a closer look at the whole thing, more than 20% of the population is missing from the plot. Most (European) countries do not count places with fewer than 2,000 to 5,000 inhabitants as "cities", which cuts off the left 2/3 of the data points, or half of the ln(size) distribution. On the right side, as the paper states, he uses the narrower US city definition, which puts LA at 3.7 million instead of the 16 million you get with a definition closer to European practice, and which would show a strong deviation from a log-normal or Zipf distribution. That leaves only the transition region for fitting, and you can fit that in many ways. On the author's homepage you can even find a 2009 comment exchange with Levy on this. Beyond that, if you look up data on "urbanization" (e.g. CIA, Wikipedia), you will find that over 50% of the differences are due to different definitions of "city" (see the German Wikipedia article "Stadt": DK: 200; JP: 50,000).

Bottom line: if you want, you can fit the log-normal distribution, but many others as well, and that doesn't prove anything about the underlying mechanisms. A very typical result for economics / sociology.

@kloth Way more complicated. If you look at self-organized criticality phenomena, perturbation theory and similar topics in statistical physics, there are also good reasons to cut off the right side where you hit the system size. You can see this, if you know what to look for, for example in the "firm size" plot above: just shift the linear fit line up a little to match the central points.
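The comment doesn't say what the right-hand cutoff looks like. One common model, assumed here purely for illustration, is a power law with an exponential cutoff at a system-size scale $x_c$: $p(x) \propto x^{-a} e^{-x/x_c}$. Its log-log slope is $d\ln p / d\ln x = -a - x/x_c$, so the plot is straight while $x \ll x_c$ and bends down as $x$ approaches the system size, which drags a single fitted line below the central points. The sketch checks this numerically; `a`, `xc` and the helper names are hypothetical.

```python
import math

def log_pdf(x, a, xc):
    """Unnormalized ln p(x) for a power law with exponential cutoff (assumed model)."""
    return -a * math.log(x) - x / xc

def local_slope(x, a, xc, h=1e-5):
    """Numerical d ln p / d ln x via a central difference in ln x."""
    lx = math.log(x)
    up = log_pdf(math.exp(lx + h), a, xc)
    down = log_pdf(math.exp(lx - h), a, xc)
    return (up - down) / (2 * h)

a, xc = 2.0, 1e6  # hypothetical exponent and "system size"
for x in (1e1, 1e3, 1e5, 1e6, 3e6):
    print(f"x = {x:9.0e}  numeric slope = {local_slope(x, a, xc):7.3f}"
          f"  analytic = {-a - x / xc:7.3f}")
```

Far below $x_c$ the slope is essentially $-a$; at $x = x_c$ it has already steepened by 1, and beyond that it keeps falling.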

The only problem is that the plot doesn't pass the smell test.

Science, the journal with the highest impact factor of all and the source of that plot (see the link above), now very often seems to consider reasonable peer review unnecessary. Otherwise the reviewers would easily have seen that in a nation of 3e8 people, with probably some 1e7 to 1e8 corporations, you cannot have a frequency below about 3e-9 (one in 3e8, even using the whole population as the denominator), which means that at least the two rightmost points are garbage.
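The arithmetic behind the smell test, as a minimal sketch: a nonzero plain fraction (items in a bin divided by the total) cannot be smaller than 1/total. This assumes the plot's y-axis is such a plain fraction of firms per size bin; if it were instead a density divided by bin width in employees, values below 1/N would be possible and this bound would not apply. The function names are hypothetical.

```python
def frequency_floor(count_total: float) -> float:
    """Smallest nonzero fraction a single item can make up out of count_total items."""
    return 1.0 / count_total

def is_possible_fraction(freq: float, count_total: float) -> bool:
    """A nonzero plain fraction (items in bin / total) cannot go below 1 / total."""
    return freq >= frequency_floor(count_total)

US_POPULATION = 3e8  # figure used in the comment

# The comment's range for the number of US corporations, plus the whole
# population as the most generous possible denominator.
for total in (1e7, 1e8, US_POPULATION):
    print(f"total {total:.0e}: smallest nonzero fraction {frequency_floor(total):.1e}")

# The 1e-13 value questioned further down in the thread.
print("1e-13 possible?", is_possible_fraction(1e-13, US_POPULATION))
```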

Garbage in, garbage out. Just like the New York Times piece "the physicists does the city", which links to the next piece of "high impact factor" PNAS garbage.

Grumble: outspoken arrogance here.

That ordinary people neither know nor understand quantum mechanics is no problem. Scaling theory, of course not. Elementary statistics is actually not that difficult, and it helps in daily life. But this is just elementary math: no frequency of 1e-13 is possible in a population of 3e8, and the "crème de la crème" doesn't catch it, which shows a massive decline in quality in the US over the last 30 years.

I did see from the paper that they were linear regressions; I just didn't see why anyone would even think of doing them. They make sense for the parts that look basically like straight lines, but the other ones are completely meaningless.

If you read the original paper, the various lines show the errors you make with linear regression if you cut off the complete distribution at various points. From my perspective, you not only get increasingly wrong power-law coefficients, you also miss half the dynamics, especially what happens to small and shrinking cities and regions, and how public investment should adjust to that.

Not really related to the point, but I can't help wondering why the city size graph has a bunch of lines in it. Do some people have a weird compulsion to try to fit a straight line to anything?

Hah, I thought the firm size graph looked suspiciously neat. I hadn't noticed that the scale on the y-axis was impossible. Nice catch.

A log-normal distribution is precisely a parabola on a log-log plot.

A regular normal distribution is a parabola on a semi-log plot (log y, linear x).

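$P = C\exp\left(-\frac{(v-u)^2}{D}\right)$ is the pdf of a normally distributed $v$ (with $D = 2\sigma^2$). Take the log:

$\log P = E - \frac{(v-u)^2}{D}$, with $E = \log C$. If $y = \log P$ and $x = v$, this is the equation of a parabola.

But if $v = \log z$, then $z$ is log-normally distributed. Its pdf picks up a Jacobian factor $1/z$, i.e. $p(z) = P(\log z)/z$, so with $y = \log p(z)$ and $x = \log z$ we get $y = E - x - \frac{(x-u)^2}{D}$. That is still the equation of a parabola; the extra $-x$ term only shifts the vertex.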
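A quick numerical check of that argument, as a sketch with hypothetical parameters ($\mu = 1$, $\sigma = 0.8$, grid step $h = 0.25$): on an evenly spaced grid, a parabola has constant second differences equal to $y'' h^2$. The code applies this to $\ln P$ of a normal pdf against $v$ (semi-log plot) and to $\ln p(z)$ of a log-normal pdf, including the $1/z$ Jacobian, against $\ln z$ (log-log plot). In both cases $y'' = -2/D = -1/\sigma^2$.

```python
import math

def normal_log_pdf(v, mu, sigma):
    """ln of the normal pdf of v."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sigma ** 2)

def lognormal_log_pdf(z, mu, sigma):
    """ln of the log-normal pdf of z; the extra -ln z is the 1/z Jacobian."""
    x = math.log(z)
    return normal_log_pdf(x, mu, sigma) - x

def second_differences(ys):
    """Discrete second differences; constant for a parabola on an even grid."""
    return [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]

mu, sigma, h = 1.0, 0.8, 0.25           # hypothetical parameters and grid step
xs = [-2.0 + h * i for i in range(25)]  # evenly spaced x

# Normal: y = ln P against linear x (semi-log plot).
d2_normal = second_differences([normal_log_pdf(x, mu, sigma) for x in xs])
# Log-normal: y = ln p(z) against x = ln z (log-log plot).
d2_lognormal = second_differences([lognormal_log_pdf(math.exp(x), mu, sigma) for x in xs])

# Both should equal y'' * h^2 = -h^2 / sigma^2 (i.e. -2 h^2 / D with D = 2 sigma^2).
expected = -h ** 2 / sigma ** 2
print(expected, min(d2_lognormal), max(d2_lognormal))
```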

Pretty interesting paper!

A lognormal distribution is not a parabola, unless you are using the term parabola very loosely.
