Life Before Earth

This paper is two years old now, but still seems big news to me:


Genetic complexity, roughly measured by the number of non-redundant functional nucleotides … Linear regression of genetic complexity (on a log scale) extrapolated back to just one base pair suggests the time of the origin of life = 9.7 ± 2.5 billion years ago. … There was no intelligent life in our universe at the time of the origin of Earth, because the universe was 8 billion years old at that time, whereas the development of intelligent life requires ca. 10 billion years of evolution. (source; discussion; HT Stuart LaForge)

That estimate seems remarkably close to the age of the universe, 13.8 billion years. Yes, it might be a coincidence, but we have other reasons to suspect life began before Earth. So I take this as a substantial if hardly overwhelming confirmation.
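
To make the extrapolation concrete, here is a minimal sketch of the kind of calculation the quoted abstract describes: fit a straight line to log10(complexity) against age, then solve for the age at which the fitted complexity drops to one base pair. The function names and the data points are assumptions for illustration. The points are synthetic, generated from a made-up line, and are not the paper's data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def extrapolate_origin(ages_gyr, complexities, target=1.0):
    """Age (billions of years ago) at which the fitted line reaches `target`.

    Fits log10(complexity) = intercept + slope * age and solves for age.
    """
    logs = [math.log10(c) for c in complexities]
    slope, intercept = fit_line(ages_gyr, logs)
    return (math.log10(target) - intercept) / slope

# SYNTHETIC example points (NOT the paper's data): generated from the
# made-up line log10(complexity) = 9 - 0.9 * age, which hits 1 bp at age 10.
ages = [3.5, 2.0, 1.0, 0.5, 0.1]
complexity = [10 ** (9 - 0.9 * a) for a in ages]

origin = extrapolate_origin(ages, complexity)
print(f"extrapolated origin: {origin:.2f} billion years ago")
```

The answer depends entirely on which points go into the fit and on the assumption that a single straight line holds all the way back, so changing either one moves the estimated origin.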


    Life with just 1 base pair cannot exist (just as I’m sure Moore’s law doesn’t hold for the very earliest of transistor devices). Even viruses contain thousands of base pairs, and since we’re talking about a Moore’s law type of dynamic, the first couple of thousand base pairs represent quite a large portion of this 10 billion year stretch. So if the first couple of thousand (I say couple of thousand, but it could actually be closer to tens of thousands; I’m not a geneticist) came about rather quickly, we’re back at conventional numbers. For example, 10k DNA base pairs might have quickly evolved from 10k RNA nucleotides, which form more easily than DNA base pairs and therefore could have evolved quite fast from even simpler self-replicating molecules. Finally, with large organic molecules existing in sterile environments (confirmed by space probes and spectroscopy), the whole chain of events already starts out at a scale larger than 1 small molecule. Errors that result in too much copying can then easily change 10k base pairs into 20k base pairs (the design is still simple: less can go wrong, and more beneficial features still await unlocking through simple mutations).

    Still, any geneticist worth his or her salt could have come up with the same arguments as I just did, so there must be more to it.
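
The head-start argument above can be put into rough numbers. Under a constant doubling time, the share of the total time spent getting from 1 base pair to some starting size is the ratio of the logarithms, whatever the doubling time is. The sketch below assumes exactly that log-linear model. The function names and the example sizes are illustrative assumptions, not figures from the paper.

```python
import math

def head_start_fraction(n_head, n_final):
    """Fraction of the extrapolated time spent growing from 1 to n_head,
    assuming complexity grows exponentially from 1 up to n_final."""
    return math.log(n_head) / math.log(n_final)

def head_start_gyr(n_head, n_final, total_gyr):
    """Time (in billions of years) the regression attributes to 1 -> n_head."""
    return total_gyr * head_start_fraction(n_head, n_final)

# Illustrative sizes only (assumptions): a 10k-base-pair starting point and
# a present-day complexity of 1e8. 9.7 is the paper's extrapolated age.
fraction = head_start_fraction(1e4, 1e8)
skipped = head_start_gyr(1e4, 1e8, total_gyr=9.7)
print(f"fraction of timeline: {fraction:.2f}, time attributed: {skipped:.2f} Gyr")
```

With these illustrative sizes, 10^4 is halfway to 10^8 on a log scale, so half of the extrapolated time is attributed to the first 10k base pairs. If that stage actually happened quickly, the implied origin moves correspondingly later.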

    • vaniver

      “Life with just 1 base pair cannot exist (just as I’m sure Moore’s law doesn’t hold for the very earliest of transistor devices).”
      Here’s Kurzweil’s plot that extends Moore’s Law to three computing technologies before transistors:

      There are issues with that graph–the early days of transistors do indeed look flat, and it may be the case that the exponential growth is being snuck in through the normalization by cost.

  • Personzorz

    This is an EXTREMELY poorly done paper that says much more about assumptions of uniformitarianism, and about how you pick your datapoints from the huge cloud of those you could pick, than anything about astrobiology.

    • RobinHanson

      The main complaint is “They cherrypicked their data points”. Yet where is the alternative data set one could look at instead? Complaining about a data set without offering an alternative data set is way too easy, sloppy criticism.

      • Stephen Diamond

        “Yet where is the alternative data set one could look at instead?”

        Didn’t Myers provide an alternative data set:

        “They cherrypicked their data points. They didn’t include lungfish, ferns, onions, or some protists because that would totally undermine their premise; those are contemporary organisms with much larger genomes than mammals’, and their shallow, stupid exercise in curve-fitting would have flopped miserably. It’s a great example of garbage in, garbage out.”

      • RobinHanson

        Those are words. In this case a data set is a set of pairs of times and genome sizes.

      • Stephen Diamond

        But it’s enough information (isn’t it?) to create an alternative data set and test Myers’s criticism?

      • RobinHanson

        Obviously not, or you would have just posted the alternative data set.

      • IMASBA

        Robin, don’t be such a bully. Exclusion of known organisms with huge genomes is a valid criticism, and it is not diminished when a commenter here doesn’t spend weeks trawling genetic databases and figuring out the data format and hidden assumptions/conventions of academic genetics and this one particular paper (it only takes one flaw to bust a theory). Regression doesn’t take into account different priors; it will just treat a lungfish as a fluke/noise to be averaged out, even though we know our data about the lungfish is reliable and that panspermia has a low prior probability, especially from a first generation solar system.

      • RobinHanson

        The fact that with more effort one could collect more data and draw stronger inferences does not excuse one from drawing what inferences one can from what data one has at hand.

      • Stephen Diamond

        “does not excuse one from drawing what inferences one can”

        I understand this is one of your credos, but it doesn’t apply here. The authors of the target work make no claim that they sampled data based on availability, and the criticism is that they must have deliberately (or at least negligently) chosen to use a biased data set.

        [Or do you contend that even deliberately biased “science” is better than nothing?]

      • IMASBA

        Yup, it’s not like the approximate size of the genomes of those species was unknown two years ago. This paper just does not represent the latest knowledge of two years ago, let alone that of today. The “we used all the data that was available at the time” defense does not hold here.

      • Personzorz

        Also what the hell is the genome size of “worms”?

      • dmytryl

        The times are in the 20th century for the most part. We have no clue what genomes bacteria had a billion years ago.

      • Personzorz

        That paper has the same author, it is not an independent source. The reviewers tore it to shreds, and rightly so.

        There is no methodology given for picking the sizes or even the locations of the ‘eukaryote’ and ‘prokaryote’ datapoints. And what prokaryote and eukaryote are they using? The nearly 1 terabase amoeba? The 12 megabase yeast? Some ferns are known to have GIGANTIC genomes in the tens of gigabases, and that lineage has been around for 350 megayears. More to the point, where do you draw the boundary between a large clade and the small subclade you then include as a separate datapoint further to the right?

        Bacteria are still around and constantly speciating. Why not include those speciation events rather than speciation events within the vertebrates on the right?

        Within any of those large genome clades, genome sizes random walk up and down dependent upon the exact history of duplication events and transposon resistances and DNA repair mechanisms in any given lineage. Within land plants you have genomes ranging over three orders of magnitude in size and gene numbers ranging over a factor of four. Similar in the metazoa. Where do you pick your cutoff size?

        If you want a more rigorous treatment of genome size and its evolution, I recommend a great book by a person I very nearly worked for: “The Origins of Genome Architecture” by Dr. Michael Lynch of Indiana University. It does the actual math of population genetics and lays out very interesting ideas about how, in the long run, genome size is controlled much more by long-term population size than by organismal complexity. Incredibly common organisms like bacteria experience direct selection towards small genome size, and the ability to select against lots of DNA goes through a few discontinuities as population numbers drop, first with larger eukaryotic cells and then with gargantuan multicellular creatures with piddlingly tiny population sizes. Particulars of DNA replication and DNA repair come into play as well, as do the histories of each lineage, in that some picked up virulent transposons or large deletion events and some did not.

        The people who wrote these papers, to use a phrase from physics, aren’t even wrong.

  • Jonathan Graehl

    I also consider this mildly convincing. But why should we assume that the rate of selection+mutation (pre-sex) is roughly constant? Perhaps early life had less shield against transcription+mutation type errors, or early earth was punishing. Also, there must be some equilibrium between length of DNA/whatever and metabolic cost, so perhaps the first life to use such mechanisms quickly accumulated cheaper initial length. Also, early life was missing a lot of features (waving little hairs/appendages to move, impermeable membrane/shell, digestive tract, skeleton, light/smell/vibration/temperature sensors, …), and whoever got them had a big advantage, whereas incremental improvements once you have a sensible body plan might not pay for much additional code length.

  • Michael Bishop

    Did their regression use more data than the points plotted? (I hope so). I agree this is suggestive, but I’ve learned to be very careful about making predictions out of sample.

  • Dude Man

    So all of the data they use is based off of Earth-based organisms, and they try to use this to extrapolate to what the universe looked like before the Earth was formed? It seems like this may be a case of over-extrapolation.

  • Daniel Carrier

    A lot of people think that evolution causes life to get more adapted over time. This is false. Once it’s adapted enough that harmful mutations are as frequent as helpful adaptations, it just drifts. But when life first evolved, it hadn’t reached that equilibrium. Perhaps at a hundred thousand base pairs, evolution switched from adaptation to drift, and the increase in base pairs rapidly slowed.

  • Grant

    I’m no biologist, but from what I’ve read early single-cell lifeforms (as well as some current bacteria) evolved by horizontal gene transfer as well as Darwinian evolution. Since horizontal transfer probably doesn’t work so well in multi-celled organisms, is it possible this method sped up (early) single-celled evolution when compared to multi-cell evolution?

    I suppose the above graph would indicate “no”, but I’ve not read the paper.

    • free_agent

      You write, “Since horizontal transfer probably doesn’t work so well in multi-celled organisms”.

      That’s true, but it’s by design — the cells of a multi-celled organism can cooperate only because they “know” that their neighbor cells are genetically identical.

  • Petter

    All three reviewers seem very critical of the methods in this paper.

  • Petter

    One should also keep in mind that all life forms that we currently have on Earth have evolved for exactly the same amount of time. Bacteria have not stayed the same for a billion years, small animals not for half a billion, etc. Yet they have different molecular complexities. That must be for reasons other than time, then.

  • Curt Adams

    Aside from the methodological criticisms below, this paper is based on some really bad assumptions. Nucleic acid polymers are extremely thermodynamically unfavorable and you’ve got to have something “living” to even create the building blocks of current life. We really have no idea what “pre-life” life was like, and certainly no reason to think it had similar evolutionary processes and rates to current life.

    Further, current prokaryotes are sometimes rather “minimal” in that they have basically the minimum number of genes for a free-living organism to exist. IMO they’re probably actually simplified versions of more complex (but less efficient) ancestors. In any case, there’s no good reason to think modern prokaryotes, the product of literally billions of years of selection for efficiency, rapid reproduction, and mutation resistance, are in any way representative of early Earth life.

  • Doug Jones/logarithmichistory

    Maybe relevant is this: Lithopanspermia in Star Forming Clusters which argues that panspermia would have been a lot easier at the very beginning of the solar system. If the origin of prokaryote life was really difficult and took a long time, on another planet, then there might be patches here and there of planets seeded with bacteria, because their parent molecular cloud was “infected,” while many more planets might have missed out on this head start.

  • Dallas Weaver

    I am not sure it is at all relevant. Genome size varies a lot more than indicated. I understand pine trees, Paris japonica, and lungfish make mammals’ genomes look trivial.

  • Martin-2

    This post reminded me that it would be fun to see you do one of those “totally conventional views that I hold” posts that some of your colleagues have been doing.

  • dmytryl

    So, 10 billion years ago, we had one base pair lifeforms.

    This is quite ridiculous. Clearly at some point it’s not life any more, and it’s not governed by evolution, it’s just chemicals in a chemical equilibrium (which implies there will be fairly complicated molecules at low concentrations). You start with quite a bit of ‘complexity’ already, and then you acquire extra complexity much more rapidly (because you aren’t yet at equilibrium).
