Why Do Algorithms Gain Like Chips?

Computer hardware has famously improved much faster than most other kinds of hardware, and most other useful things. Computer hardware is about a million times cheaper than four decades ago; what other widely useful thing has grown remotely as fast? Oddly, computer algorithms, the abstract strategies by which computer hardware solves real problems, seem to have typically improved at a roughly comparable rate. (Algorithm growth rates seem well within a factor of two of hardware rates; quotes below.) This coincidence cries out for explanation.

On the surface the processes that produce faster hardware and faster algorithms seem quite different. Hardware is made by huge companies that achieve massive scale economies via high levels of coordination, relying largely on internal R&D. Algorithms instead seem to be made more by many small artisans who watch and copy each other, and mostly focus on their special problem area. How is it that these two very different processes, with very different outputs, both grow at roughly the same remarkably fast rate? The obvious hypothesis is that they share some important common cause. But what? Some possibilities:

  • Digital – Both computer hardware and algorithms are digital technologies, which allow for an unusually high degree of formal reasoning to aid their development. So maybe digital techs just intrinsically grow faster. But aren’t there lots of digital techs that aren’t growing nearly as fast?
  • Software – Maybe software development is really key to the rapid growth of both techs. After all, both hardware and algorithm experts use software to aid their work. But the usual descriptions of both fields don’t put a huge weight on gains from being able to use better productivity software.
  • Algorithms – Maybe progress in hardware is really driven behind the scenes by progress in algorithms; new algorithms are what really enables each new generation of computer hardware. But that sure isn’t the story I’ve heard.
  • Hardware – Maybe there are always lots of decent ideas for better algorithms, but most are hard to explore because of limited computer hardware. As hardware gets better, more new ideas can be explored, and some of them turn out to improve on the prior best algorithms. This story seems to at least roughly fit what I’ve heard about the process of algorithm design.

This last story of hardware as key has some testable predictions. It suggests that since gains in serial hardware have slowed down lately, while gains in parallel hardware have not, parallel algorithms will continue to improve as fast as before, but serial algorithm gains will slow down. It also suggests that when even parallel hardware gains slow substantially in the future, because reversible computing is required to limit power use, algorithm gains will also slow down a comparable amount.

If true, this hardware as key theory also has policy implications. It suggests that it is much better to subsidize hardware research, relative to algorithm research; even with less research funding algorithm gains will happen anyway, if just a bit later. This theory also suggests that there is less prospect for self-improving algorithms making huge gains.

So what other explanations can we come up with, and what predictions might they make?

Added 5June: There are actually several possible ways that software progress might be determined by hardware progress. In the post I mentioned better hardware letting one explore more possible ideas, but it could also be that people already knew of better algorithms that couldn’t work on smaller hardware. Algorithms vary both in their asymptotic efficiency and in their initial overhead, and we might slowly be switching to bigger overhead algorithms.
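
To make that overhead point concrete, here is a minimal sketch with purely hypothetical cost constants: a low-overhead O(n^2) method beats a high-overhead O(n log n) method on small inputs, and the crossover only becomes worth exploiting once hardware can handle large n.

```python
import math

# Hypothetical operation-count models, chosen only for illustration.
def cheap_quadratic(n):   # low overhead, poor asymptotics
    return 5 * n ** 2

def fancy_nlogn(n):       # big setup cost and constant factor, better asymptotics
    return 2000 * n * math.log2(n) + 50_000

for n in [10, 100, 1_000, 10_000, 100_000]:
    a, b = cheap_quadratic(n), fancy_nlogn(n)
    winner = "quadratic" if a < b else "n log n"
    print(f"n = {n:>7,}: quadratic {a:.1e} ops, n log n {b:.1e} ops -> {winner} wins")
```

With these made-up constants the crossover falls between n = 1,000 and n = 10,000; on hardware that only fits small problems, the "better" algorithm never pays for its overhead.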

Those promised quotes:

The White House advisory report cited research, including a study of progress over a 15-year span on a benchmark production-planning task. Over that time, the speed of completing the calculations improved by a factor of 43 million. Of the total, a factor of roughly 1,000 was attributable to faster processor speeds. .. Yet a factor of 43,000 was due to improvements in the efficiency of software algorithms.

Solve some test LPs … hardware seems to be currently winning the speed race. Even so, studies suggest that over time, say 20 years, the speed-ups have been quite evenly divided between mathematics/algorithms and hardware improvements.

25 years progress solving sparse linear systems. Hardware alone: 4 orders of magnitude. Software alone: 6 orders of magnitude. The N-Body Problem: in 30 years 10^7 hardware, 10^10 software.

Added 5June: roystgnr points us to a similar result in magnetohydrodynamics.

These quotes are from a forthcoming MIRI tech report by Katja Grace currently titled “Assorted evidence on algorithmic progress.”

Chess programs … over the last four decades. … Estimates for the significance of hardware improvements are very noisy, but are consistent with hardware improvements being responsible for approximately half of progress. … Go programs … for the last three decades. Hardware doublings produce diminishing Elo gains, on a scale consistent with accounting for around half of progress. … Improvements in a variety of physics simulations (selected after the fact to exhibit performance increases due to software) appear to be roughly half due to hardware progress.

  • http://www.gwern.net/ gwern

    > Hardware – Maybe there are always lots of decent
    ideas for better algorithms, but most are hard to explore because of
    limited computer hardware. As hardware gets better, more new ideas can
    be explored, and some of them turn out to improve on the prior best
    algorithms. This story seems to at least roughly fit what I’ve heard
    about the process of algorithm design.

    I like this explanation best. The gains seem to, except in some mathematical fields, be extensive rather than intensive – where is the Moore’s law for sorting an array of integers, for example? So it’s that we are running more diverse taskloads and problems, with equally diverse algorithms. (Somewhere in a Knuth book, I think, he mentions that in the 1960s something like 30% of all CPU time was devoted to sorting; does anyone think that is remotely the case today?) With a larger field of new applications and new algorithms to tweak and cross-breed, there will be a growth of algorithms in general. We see something similar with computer languages: never before have so many computer languages been actively used at the same time.

    Another related factor is that as you gain more computing power, constant factors matter less and asymptotics matter more. Bit-twiddling because it saves you a few fixed cycles is unimportant and a waste of time, while improving your algorithm to be O(n) rather than O(n^2) is very important because it means when you get a twice-as-fast new computer in three years, you can run on twice the data size. Who then cares about making the original data size run 1% faster if the runtime blows up quadratically when you double the input?

    • Doug

      “Another related factor is that as you gain more computing power, constant factors matter less and asymptotics matter more.”

      This would augur a slowdown in algorithmic improvement. Constant time improvements can be done by nearly any programmer, and usually involve tedious manual optimization of the code. Asymptotic improvements on the other hand require fundamental changes to the structure of the algorithm itself. This almost always means genuine theoretical breakthroughs on the problem.

      In other words constant time improvements are low-hanging fruit, and asymptotic improvements are the product of rare genius. If the latter’s becoming a larger component of algorithmic improvement, then we’d expect the task to get harder and slow down.

      • http://www.gwern.net/ gwern

        > In other words constant time improvements are low-hanging fruit, and
        asymptotic improvements are the product of rare genius. If the latter’s
        becoming a larger component of algorithmic improvement, then we’d expect the task to get harder and slow down.

        I don’t see how this addresses my point that with the continued obsolescence of computers, increasing size of data, and other factors we could add like the expense of programmer time, constant-factor gains with very short lifespans are bad buys compared to asymptotic algorithmic gains.

    • Brutus

      There’s another meta-level point: Programming theory used to not consider asymptotic time to be an important field of study.

      I don’t think current-gen hardware design asks the question “If we put a large amount of this hardware together, how well will it perform?”, instead picking a single scale and designing hardware optimized for that scale.

      And the algorithm optimization I am aware of tries to minimize the number of cycles that a single process requires, rather than trading off between the total number of cycles required for a task and the number of operations dependent on the results of other operations; it rarely asks “How can I do this fastest with a large number of parallel threads?”
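
      (A minimal sketch of the distinction, using the standard work/span bound rather than anything specific to this thread: total cycle count divided by processor count limits one side, and the longest chain of dependent operations limits the other. All numbers below are hypothetical.)

      ```python
      # Greedy-scheduling lower bound on parallel time: max(work / p, span),
      # where "work" is total operations and "span" is the critical path length.
      def parallel_time_bound(work, span, p):
          return max(work / p, span)

      work = 1_000_000   # hypothetical total operations for the task
      span = 5_000       # hypothetical longest dependency chain

      for p in [1, 8, 64, 1024]:
          t = parallel_time_bound(work, span, p)
          print(f"p = {p:>4}: time >= {t:>9,.0f} ops  (speedup <= {work / t:,.1f}x)")
      ```

      Past a certain processor count the dependency chain, not the total cycle count, sets the limit, which is roughly the tradeoff this comment points at.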

      • http://www.gwern.net/ gwern

        > There’s another meta-level point: Programming theory used to not consider asymptotic time to be an important field of study.

        When was it not important and in what sense was it not important? Study of Big O has been important in computer science since very early on; take for example Knuth’s TAOCP, started in the ’60s, which is surely a long time ago by any standard.

      • dmytryl

        Read up on what Big O is (mathematically). The limit behaviour as N → infinity became relevant when typical Ns got very big. When they’re not so big it’s more important to keep track of slower-growing terms in the sum as well.

      • http://www.gwern.net/ gwern

        I don’t see how that’s a response to my comment. I know perfectly well that Big O is about behavior as N gets bigger, my point is that as Moore’s law continued, problems *got* bigger.

      • dmytryl

        You said, “Study of Big O”. The analysis of algorithms has never been the “study of Big O”; big O is just a fairly idiosyncratic (and not quite correct) use of mathematical notation, misused to describe a tight asymptotic bound.

      • http://www.gwern.net/ gwern

        Um… so what? You’re quibbling over terminology; who cares whether it’s ‘fairly idiosyncratic’, it’s how CS operates and has operated for half a century, which is the point. Characterizing the asymptotics, proving bounds, and improving bounds are big parts of studying algorithms, and have been since early on, as I said.

      • dmytryl

        Terminology is important when you try to argue with Brutus…

        If you look at e.g. “the art of computer programming”, or other old books, you frequently see a full formula for the run time of an algorithm, or at least, for the number of key operations (e.g. comparisons).

      • http://www.gwern.net/ gwern

        Yes, it is important, but look at what he said:

        > There’s another meta-level point: Programming theory used to not consider asymptotic time to be an important field of study.

        How is this even remotely accurate? Stop dodging the question and quibbling about whether big O is exactly right, and answer the question: how did ‘programming theory used to not consider asymptotic time to be an important field of study’? Because this seems exactly opposite of the historical reality, and you admit as much:

        > at least, for the number of key operations (e.g. comparisons).

        Which of course are exactly what you would want to be doing asymptotics on, the ‘key operations’…

        Maybe ordinary programmers spent most of their time worrying about constant-factors and bit-banging, but the theorists didn’t.

      • dmytryl

        It’s hardly a “field of study”. Algorithms are a field of study, proofs of the minimum number of operations required for something are a field of study, and so on. Finding the asymptote is high school math, not a field of study.

        Also, plenty of theorists are most notable for decreasing a constant factor in some algorithm which is at its theoretical best asymptotically (e.g. sorting). And plenty of important problems (e.g. breaking cryptographic functions) are O(1) by definition, just the constant is incredibly huge.

      • http://www.gwern.net/ gwern

        > Finding the asymptote is high school math, not a field of study.

        More playing with words, more ignoring my historical point. Why aren’t you addressing my original point? (And calculus is not ‘a field of study’? OK then…) But you know what, I’ll bite on your tangent:

        > Also, plenty of theorists are most notable for decreasing a constant factor in some algorithm which is at it’s theoretical best
        asymptotically (e.g. sorting).

        Name three. And note that people responsible for things like radix sort are not studying comparison sorts where the sorts are at their theoretical best.

      • Philip Goetz

        I think Brutus probably meant the opposite, that programming theory used to not consider constant time to be worth study. It was all about O(n).

  • rpandya

    Here’s some more history on the software/hardware tradeoff, by someone I have worked with at Microsoft Research:
    Spending Moore’s Dividend, James Larus
    http://research.microsoft.com/apps/pubs/default.aspx?id=70581

    • lemmycaution

      That is a good article. Basically, the hardware speeds up and software expands in features to consume the new speed.

      Improvements in “benchmark production-planning tasks”, “sparse linear systems”, and “linear programming” are all well and good but they don’t significantly speed up software as a whole.

      • http://www.facebook.com/CronoDAS Douglas Scheinberg

        Wirth’s Law: Software is getting slower more rapidly than hardware becomes faster.

      • IMASBA

        Yeah, expect to be looking at 4k res ad banners in 2015, thank god for adblock!

  • Don Geddis

    I’m a software guy, and I didn’t originally believe your claim, that software gains could match known hardware gains over decades. But your quotes at the end are intriguing. I’d like more data about software gains across different problem domains, over decades.

    (Probably one fruitful domain: chess. What credit is due to hardware vs. software for dominance of chess by computers? My impression is that it was “mostly hardware”, up to Deep Blue (just) beating Kasparov. But then a major software innovation happened in the last decade, and chess computers are now far, far better than any human, on hardware far far weaker than Deep Blue used.)

    • Doug

      I’m skeptical as well. The problems seem cherry picked to me. I’d like a broader cross-section of problems. LP and chess are interesting, but probably make up less than 0.1% of aggregate global CPU time.

      On the flip side, problems that computers work on a lot more than those two: memory page replacement, compression, matrix inversion, code compilation. None of these areas has seen algorithmic speed improvements anywhere near the hardware improvement.

      • http://www.facebook.com/CronoDAS Douglas Scheinberg

        Video compression has gotten a lot better (in terms of producing smaller files), but I think that’s due to better understanding of people, not algorithms; we know more about what kind of information you can get rid of and still have a decent looking picture.

      • Ilya Shpitser

        Alex Gray’s talk at last year’s UAI talked about staggering computational advances for “big data” algorithms (sorry I don’t have the slides):

        http://www.auai.org/uai2012/invited.shtml

  • IMASBA

    “Hardware – Maybe there are always lots of decent ideas for better algorithms, but most are hard to explore because of limited computer hardware. As hardware gets better, more new ideas can be explored, and some of them turn out to improve on the prior best algorithms. This story seems to at least roughly fit what I’ve heard about the process of algorithm design.”
    This seems like the most important factor. Many algorithms are actually quite old (at least in physics) and it used to be that some algorithms were rejected, or considered impractical, simply because no computer capable of running them existed yet. In the past it was really hard to invent a good algorithm. You couldn’t just think of how you would solve the problem with your human brain (intuitive algorithm), you really had to think about hardware limitations (“can’t use multiplication here, that’s too slow, have to find a way to do it through addition”). Modern computers are far more capable of running intuitive algorithms, especially now that videogaming has spurred the creation of processors that specialize in parallel computing (graphics cards). Case in point, I once derived an algorithm that I later found out was identical to the Hoshen-Kopelman algorithm. For them it was a difficult task to design and make it work in the 1970s, for me it was just the most intuitive solution because I expected my modern PC to have no difficulty running any of the steps of the algorithm, and that turned out to be correct.

    P.S. I dispute the claim that algorithms have become thousands of times better, like hardware has. Algorithms have improved but not by that much.

    • IMASBA

      I have to point out that just using the FLOPS and number of transistors of a processor can be deceptive. Hardware manufacturers continuously adapt chip design: replacing half of the multiplication units on a chip by extra addition units may make the processor far better at running certain algorithms even though the FLOPS and number of transistors remain the same. It may be that in such cases the credit for faster applications is given to the algorithms.

    • Dale

      In many cases, algorithms have improved by that much. And more.

      http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf

      From p. 71.

      “Even more remarkable – and even less widely understood – is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.

      “The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade. It’s difficult to quantify the improvement, though, because it is as much in the realm of quality as of execution time.

      “In the field of numerical algorithms, however, the improvement can be quantified. Here is just one example, provided by Professor Martin Grötschel of Konrad-Zuse-Zentrum für Informationstechnik Berlin. Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later – in 2003 – this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008.”

      Sources available in that document.

      • IMASBA

        Like other people here I’m not convinced those examples are representative and chip design changes may have been overlooked. I suppose a big question we have to ask is whether it’s enough to look at the execution speed? If, mathematically speaking a new algorithm is only slightly different from an older algorithm but executes 100 times faster was the algorithm improved 100 times or only slightly? Also, algorithm advances are rare, what we’ve seen may just be sporadic harvesting of low-hanging fruit. Many applications have seen only two or three generations of algorithms, nothing to suggest a trend as dependable and robust as Moore’s law.

      • Brutus

        If a chip design is changed only slightly from a previous design but executes operations 100 times faster, is it 100 times better or only slightly?

  • arch1

    I find synergy in both directions plausible. That said, as to the basic trend noted I suspect there’s less commonality than meets the eye:

    1) HW gains, while not fully generic (e.g. sequential vs parallel), tend to be much more widely applicable than algorithm gains.
    2) The algorithm gains cited, while impressive, get attention precisely *because* they are areas in which research has made particularly large improvements.
    a) I can’t locate the quote but recall Knuth saying that many (most?) of the
    algorithms in his (admittedly sequential-focused) Art of Computer
    Programming series had reached their final form before he incorporated them in his books.

    A more balanced approach might be to weight algorithm gains by machine cycles they productively consume worldwide (whether consumed pre or post improvement, and whether and how such weighting might usefully be applied to the HW side as well, are exercises for the reader:-)

  • http://twan.home.fmf.nl Twan van Laarhoven

    > It suggests that it is much better to subsidize hardware research,
    relative to algorithm research; even with less research funding
    algorithm gains will happen anyway, if just a bit later.

    That is only true if the marginal impact per dollar of hardware research is comparable to that of software research. It could be the case that progress in hardware is constrained by something other than available money.

    As another data point, I recall hearing from a university professor that speedups in computational fluid dynamics are also evenly divided between software and hardware.

    • roystgnr

      There was a high-profile report that I saw quoted this year with a graph which claimed that large-scale magnetohydrodynamics problem speed improvements are evenly distributed between software and hardware:

      http://users.ices.utexas.edu/~roystgnr/MHD_moore_law.pdf

      More prosaic computational fluid dynamics problems are probably in a similar position. None of the software improvements they list are hard to apply to plain subsonic fluids problems.

      • http://CommonSenseAtheism.com lukeprog

        What report is this from?

      • roystgnr

        SCaLeS Report, Vol. 2, D. Keyes et. al. eds, 2004

      • Jmiller

        Do you know where it can be found?

      • roystgnr

        http://science.energy.gov/~/media/ascr/pdf/program-documents/archive/Scales_report_vol2.pdf

        Looks like the graph is on page 127. It’s surely not the only thing in those 300 pages of relevance to this discussion, it just happens to be the one bit that stood out enough to get reused in a couple talks I attended.

  • Douglas Knight

    You have more quotes than examples. All three quotes talk about linear programming, suggesting that is a special case. At least the third quote talks about a second domain.

    • http://overcomingbias.com RobinHanson

      I got permission for more quotes; they have been added.

  • Carl

    “So maybe digital techs just intrinsically grow faster. But aren’t there lots of digital techs that aren’t growing nearly as fast?”

    Storage, bandwidth, sensors…there are a lot of other digital technologies with very high growth rates.

    • IMASBA

      Gentech maybe? Gene sequencing has become so much faster and cheaper than it used to be and knowledge about specific genes has also grown a lot. Btw, I’d say “storage” is a subdomain of computer hardware.

  • kevinsdick

    I’m pretty sure causality runs in both directions.

    Faster hardware allows you to not only more quickly test the algorithms themselves, it allows you to run much more computationally expensive tools to develop the algorithms.

    Recall the Wheeler/Lampson quote, “All problems in computer science can be solved by another level of indirection.” But each level of indirection requires more computation.

    Then if you talk to a chip guy about the software they use for designing chips, it’s always the most cutting edge in terms of the algorithms. So virtuous circle, plain and simple.

  • Dave Lindbergh

    Improvements in hardware speeds are driven by very large investments that aim to match or exceed Moore’s Law – by necessity, since the competition is expected to achieve that rate of progress.

    The algorithmic improvements you cite all seem to be in small niche areas where relatively little effort has been invested – those fields may still be harvesting the “low hanging fruit” of easily discovered improvements.

    In contrast, lossy compression of audio and video – something I’m familiar with – has received a lot of investment since the introduction of the CD (~1980) and digital video (~1990).

    The early gains were huge, but after a very brief early period when moderately sophisticated algorithms replaced dumb brute force (CDs store uncompressed samples), algorithmic progress has been FAR slower than hardware progress.

    In the last 20 years, we’ve seen compression efficiency improve by maybe 10x, and computation efficiency (computes per compressed sample) has gotten worse, not better, by about the same amount. While hardware has gotten about 10000x faster.

    I could argue that the compression gains are mostly driven by the availability of faster hardware, which makes less-efficient (but more effective) algorithms practical.

    It’s only a single data point, but I wonder if algorithmic gains really do keep up with hardware gains, or if you just found a couple of exceptional examples.

    • IMASBA

      “Improvements in hardware speeds are driven by very large investments that aim to match or exceed Moore’s Law – by necessity, since the competition is expected to achieve that rate of progress.”

      The competition is expected to achieve that rate of progress because of investments. New lithography techniques don’t invent themselves.

      “The algorithmic improvements you cite all seem to be in small niche areas where relatively little effort has been invested – those fields may still be harvesting the “low hanging fruit” of easily discovered improvements.”

      I’m inclined to agree, but I might be wrong.

      • Dave Lindbergh

        “New lithography techniques don’t invent themselves.”

        Yes, of course. Moore’s Law is a self-fulfilling prophecy. It’ll stop when people stop believing in it.

      • Stephen Diamond

        Moore’s Law is a self-fulfilling prophecy.

        Very significant claim: Moore’s law has inspired futurologists who believe that human history will be found to demonstrate similarly quantifiable trends.

      • Dave Lindbergh

        Significant claim? Hardly – it’s the prevailing view in the industry (just ask Google).

        The fascinating thing is that it still works EVEN THOUGH everyone involved knows it’s a self-fulfilling prophecy.

        If you’re an industry participant, as long as you think Moore’s Law has predictive power, you have to make investments to keep up with it, therefore making the Law true.

        Whether you think it’s a self-fulfilling prophecy or not.

        It’s like a gigantic consensual illusion that strikes your company dead if you don’t believe in it.

      • IMASBA

        So Dave, you really believe that corporations keep adding investments until Moore’s law is satisfied and withhold further investment after that?

        – “Well, sir I’ve calculated we can increase our profits by 15% if we hasten the introduction of the 14nm production process.”

        – “You know we can’t do that John: transistor count would go up faster than Moore’s law predicts.”

      • ZHD

        Dave,
        This is a pretty interesting thing to think about. Lately I feel that there has been an inherent Moore’s Law Bias pervasive in all software start-ups since Thiel invested in Facebook.

        Perhaps it’s a rational part of valuation. However, I do know that most of the initial and major investors of the early and biggest tech startups since 2006 all drink from the same futurist punch bowl.

      • Brutus

        People will continue to believe it until there is evidence that it is no longer true.

        One possibility is that the evidence will take the form of “We have reached the theoretical/practical limits of physics”; I think it is more likely that the evidence would take the form of “Speeds have not been increasing at the same exponent for the last few years.”

      • ZHD

        I think Moore’s Law will become less true if competition becomes unnecessary and the world population declines.

    • http://overcomingbias.com RobinHanson

      If the field is a mix of areas where gains are similar to hardware and areas where gains are much slower, what does that say differently about the prospects for self-improving algorithms, or the gains from algorithm R&D? Seems to me the answer is similar.

    • http://newstechnica.com David Gerard

      Yep. They’re seeing the beginning of a logistic curve and they’re assuming it’s exponential all the way.

      PRETTY MUCH EVERYTHING PROGRESSES ON A LOGISTIC CURVE.

      • http://www.gwern.net/ gwern

        As has been pointed out since the start, observing that Moore’s law is probably the first part of a sigmoid is about as useful as knowing that ‘in the long run’, the market is always rational. That may well be, but in the long run we are all dead.

  • Ravi

    Developments in chess can be a good example. Once hardware became powerful enough, usage/development of algorithms increased proportionally. Monetization is important for any field to grow and algorithms can be monetized only when they can be implemented, for which appropriate hardware is required.

  • Siddharth

    The ‘hardware as key’ explanation can also explain why the quantum computing community is having a hard time designing quantum algorithms: in the ~20 years the community has existed, there have been few new algorithms. The slow progress is because they don’t have hardware (working quantum computers) to play with.

    • IMASBA

      That and the fact that quantum algorithms are so damn hard to design for creatures that evolved in a Newtonian world.

  • Anonymous

    So what other explanations can we come up with, and what predictions might they make?

    The forbidden explanation: demographics. Hardware is designed primarily by men, primarily by the young, primarily by Asians and Whites. The forbidden prediction that comes from the forbidden explanation: if more women and more non-Asian & non-White men and women become involved in hardware design, the gain will decrease.

    It is unthinkable that the non-young will become involved in hardware design. That claim is forbidden even to me.

    • Brutus

      Why would the demographics of the designers have any effect?

    • free_agent

      Even if demographics was the key driver, and even if a severe affirmative-action program forced a substantial fraction of the people hired as designers to be demographically-challenged, the brutal competitiveness of the market would ensure that the products sold would be dominated by the best designs. The cost would be an extra few thousand designers who did not contribute to sellable products, which is relatively small compared to the overall cash flow of the industry. (Hell, we have thousands of startups spending money on products that will never sell, and nobody suggests that this augurs the doom of the tech industry.)

    • free_agent

      Conversely, many of the industries which have *not* shown dramatic increases in productivity are dominated by white males, in the management, technical, and production sectors. So it doesn’t seem that the areas that have done well have done well solely because of a better demographic.

  • Christopher Chang

    Software speed improvements can be divided into two classes: reductions in fundamental computational complexity, and reductions in just the leading constant.

    The most important thing to note is that a reduction in computational complexity is a gift that keeps on giving: as hardware improves, standard problem sizes grow larger, and that by itself increases the software speedup multiple. I’m guessing this is the main driver of the “1000x vs. 43000x” result.
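
    A minimal illustration of that compounding effect, with purely illustrative complexity models: if the benchmark is always sized to what current hardware can handle, the measured "software" speedup of an O(n log n) method over an O(n^2) one grows on its own as n grows.

    ```python
    import math

    def old_alg_ops(n):   # hypothetical O(n^2) baseline
        return n ** 2

    def new_alg_ops(n):   # hypothetical O(n log n) replacement
        return n * math.log2(n)

    # Benchmark problem sizes grow along with hardware generations.
    for n in [1_000, 100_000, 10_000_000]:
        speedup = old_alg_ops(n) / new_alg_ops(n)
        print(f"benchmark size n = {n:>10,}: measured algorithmic speedup ~ {speedup:,.0f}x")
    ```

    Neither algorithm changes between rows; only the problem size does, yet the reported speedup climbs from roughly 100x to roughly 400,000x.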

    As others have observed, for *well-studied* problems, computational complexity reductions are rare. But the diversity of problems being solved by software is increasing fairly rapidly. Thus, at least for the foreseeable future, there will be a fairly wide frontier of newer problems where complexity improvements can still realistically be found.

    As for leading constants, it often isn’t clear how to assign credit to hardware vs. software for a particular speedup. For example, old software rarely exploited parallelism very well, because while the hardware existed, it wasn’t cheap. However, even though almost all new computers are multicore and have significant support for vector operations today, most software still doesn’t really exploit these features. While changing this state of affairs requires a lot of careful reasoning (which is why it doesn’t happen much), it does not really require deep insights. So when someone spends the time and effort to reap a large parallel dividend that has been sitting around and growing for years, how do you count it?

    • arch1

      Thanks for the excellent point made in your second paragraph.

    • Eliezer Yudkowsky

      (Remark: Supposing that the parallel dividend is waiting around sitting for someone to reap it, and they do so only when they can’t get gains some other way, is a demand-side explanation for the speed of algorithmic progress. I think this is very plausible, but other discussants including Hanson are much more skeptical of demand-side explanations.)

      • http://overcomingbias.com RobinHanson

        I have no problem seeing a preference for serial over parallel algorithms. I just don’t see how anything related explains the hardware and algorithm growth rates being similar and much faster than other growth rates.

      • dmytryl

        This “algorithm growth” came from improving the performance as a function of the problem size. E.g. the “growth” due to a simple O(n^2) n-body simulator versus some advanced O(n*log(n)) n-body simulator is roughly proportional to n, and the n which would be used in, say, a 1-minute benchmark is roughly proportional to hardware speed, so of course you get yourself “similar growth rates”. Especially if you consider that the runtime depends on several variables (the length of the simulation interval, the accuracy, et cetera), and several of them could have such gains.

        edit: specifically, suppose at some date an O(n^2) algorithm performs close enough in a typical use case to an O(n) one – the latter algorithm has a big constant factor comparable to n itself. Let’s suppose hardware has improved by a factor of 1000 since then. Now the O(n) algorithm handles a 1000x larger task in the same time, a task that would take the O(n^2) algorithm 1000x longer even on the new hardware.

    • dmytryl

      Also note that once you get it down to something like O(N*log(N)), if the problem requires at least looking at every item, there’s simply no place for massive improvement left, none whatsoever.

      As for multicore, for most software it is not really worthwhile to optimize it, and the general kinds of software which are worth optimizing for multicore have been running in parallel since the 1980s or earlier. And in the majority of software I think the hardware speed was utilized to cut down programmer effort.

  • DHM

    You can claim an arbitrarily large reduction in computing time from a change in algorithm that has a reduced Big-O simply by choosing a large enough example of the problem.

  • Brutus

    We have both historical hardware and historical algorithms available.

    How well do modern algorithms perform on historical hardware, and how well does modern hardware execute historical algorithms? If they are equally responsible for improved performance, each alone should provide 41% of the total improvement.
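
    A minimal sketch of what that experiment would compute, with made-up benchmark times (none of these numbers come from the post): run old and new algorithms on old and new hardware, then attribute the overall speedup multiplicatively.

    ```python
    # Hypothetical wall-clock times (seconds) for the same benchmark problem.
    times = {
        ("old_hw", "old_alg"): 10_000.0,
        ("old_hw", "new_alg"): 500.0,
        ("new_hw", "old_alg"): 100.0,
        ("new_hw", "new_alg"): 5.0,
    }

    total    = times[("old_hw", "old_alg")] / times[("new_hw", "new_alg")]
    hardware = times[("old_hw", "old_alg")] / times[("new_hw", "old_alg")]
    software = times[("old_hw", "old_alg")] / times[("old_hw", "new_alg")]

    print(f"total speedup {total:g}x = hardware {hardware:g}x * software {software:g}x"
          f" (residual {total / (hardware * software):g}x)")
    ```

    If the residual is far from 1, hardware and algorithm gains interact (for example through cache behavior or problem size), which is part of what makes a clean attribution tricky.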

  • dmytryl

    To give technical backing to Robin Hanson’s view here:

    The algorithm’s performance is generally described with a formula that takes in the task size and returns the number of cycles, memory requirements, and the like.

    For instance, a simple and easy-to-implement algorithm for solving the problem might require N^2*a + N*b + c cycles. Significant gains come from, through considerable effort, improving such an algorithm to, for example, N*log(N)*a2 + c2 (where a2 and c2 are usually larger than a, b, and c).

    One can immediately see that the latter algorithm can’t be described as specific number of times “faster” than the former without specifying N.

    What happens is that initially N is small (often due to hardware limitations), and the former algorithm is an acceptable or even the best solution, and it stays feasible until N is large enough that considerable gains can be obtained from switching to the latter algorithm. Gains which are promptly realized.

    The situation is mirrored in the case of e.g. chess, where the problem complexity grows exponentially in the depth of the search, and minor improvements in culling the exponential’s base give seemingly large gains when expressed in terms of the hardware improvements necessary for an equivalent gain (gains which are still quite small when expressed in terms of search depth, and which come at the price of missing good moves).

    This could make a good case study into how lack of any domain expertise invariably translates into entirely worthless opinions and research that is not even wrong.

    • IMASBA

      “To give technical backing to Robin Hanson’s view here:

      The algorithm’s performance is generally described with a formula that takes in the task size and returns the number of cycles, memory requirements, and the like.”

      Yes, of course it’s runtime/N that matters, but even so you can still get “free” speed increases from using a larger N, for example if you improve a process whose runtime was first proportional to N, to become proportional to sqrt(N).

      • dmytryl

        Yes, of course. Sorry if that was not clear – it is exactly my point that the percentage gains (N/runtime) due to algorithmic improvement increase as N grows, thus increasing the advantage of algorithms that grow more slowly in N, until it becomes feasible to develop them, at which point you get a speed boost which looks larger and larger as N grows.

        (I develop software for computer graphics for a living. I’m also ranked #10 of all time for marathon match impressive debuts on TopCoder)

      • IMASBA

        Ok, I understand what you were getting at now, I think we are in agreement and I wasn’t trying to discredit your coding abilities, I’m sure they’re much better than mine (I can only do c++, for scientific simulations, I don’t know all the tricks for optimization and I wouldn’t know where to start if someone asked me to write graphics software).

    • Philip Goetz

      I would hope that the word “benchmark” meant they were comparing programs that operated on the same data.

  • Michael Bishop

    A friend writes:

    It’s also important to distinguish between algorithms and software, the realization of algorithms. This article seemed to assume that software exists in a constant state of perfection. It’s sort of the opposite. Software tools like compilers, which convert programming language statements into machine instructions, generate substantially more efficient programs now than they did in the 1980s. And of course programs (and programmers) get better. So improvement “from software” is not just about better algorithms. In fact early research implementations of algorithms are often pretty awful.

    A more subtle factor is architecture. That is how computer software and hardware fit together to be good at solving certain kinds of problems. If you just count up how much faster a computer is at say multiplying two numbers now versus in the 1980s, you’ll think that hardware is X amount faster. But actually for each time you multiply two numbers, you have to get the two numbers together to multiply and you have to put the result somewhere. That process of storage and retrieval is greatly expedited by the memory architecture in modern computers; and as programmers have been freed of having to worry about other mundane details by better tools, they can more efficiently exploit this architecture. My guess is that a lot of the savings one might naively attribute to “algorithms” is actually due to architectures that have become better adapted to the problems those algorithms try to solve.

    For example, lots of signal processing algorithms do the same operation to many pieces of data at the same time. Modern computer processors include instructions to facilitate that, and software libraries make them relatively easy to use (compared to programming a crazy ass Cray supercomputer), leaving the programmer to focus on getting the ducks in a row. An algorithm like the fast Fourier transform may be largely unchanged but it’s just easier and works better.
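
    A minimal illustration of that "same operation on many pieces of data" point, using NumPy only as an example of a library that exposes the hardware's data-parallel paths (the measured ratio is machine-dependent and merely indicative):

    ```python
    import time
    import numpy as np

    x = np.random.rand(2_000_000)

    # Same arithmetic two ways: an interpreted element-by-element loop, and one
    # vectorized library call that can use the CPU's wide/vector instructions.
    t0 = time.perf_counter()
    acc = 0.0
    for v in x:
        acc += v * v
    t1 = time.perf_counter()

    dot = float(np.dot(x, x))
    t2 = time.perf_counter()

    print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s  agree: {np.isclose(acc, dot)}")
    ```

    The algorithm (a dot product) is unchanged between the two versions; the gap comes entirely from how well the implementation fits the machine.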

    Also interesting is how new types of algorithms grow up around new types of computer systems. Like, “distributed algorithms” really took off at some point in the 80s to deal with problems involving groups of computers collaborating over networks, which didn’t really exist before. So that’s another way in which it’s limiting to think of the lines between algorithms, hardware and software as being quite so definite.

    • http://overcomingbias.com RobinHanson

      I see that you can make these distinctions, but I don’t see how introducing them has helped explain the puzzle.

  • JW Ogden

    Funny my impression is that faster hardware has allowed for simpler algorithms to work.

  • free_agent

    I remember reading an article by Robert Noyce that noted that the rate of improvement in processor costs per transistor was quite in line with the usual “experience curve”, which is roughly a 30% or 40% reduction in unit costs when total historical production doubles. What makes processors special is the extreme price-sensitivity of their applications: a small reduction in price causes a large increase in total sales, which causes the next moderate reduction in price.

    From this point of view, anything we call an “algorithm” does some well-defined, high-volume process that we would gladly do much more of if it were cheaper to do. This suggests that the same intense feedback loop is present for any problem that we would consider an “algorithm”. Conversely, things where there is a limited demand would not be expected to show such an increase in efficiency once the yearly processing volume became relatively small compared to the historical total. Examples might be ATM transaction processing or e-mail delivery.
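
    A minimal sketch of the experience-curve arithmetic being described, with a made-up starting cost and a 35% learning rate (the comment's "30% or 40%"):

    ```python
    # Experience curve: unit cost falls by a fixed fraction each time
    # cumulative production doubles.
    cost_0 = 1.0          # hypothetical unit cost at some reference volume
    learning_rate = 0.35  # cost drop per doubling of cumulative output

    for doublings in [0, 5, 10, 15, 20]:
        cost = cost_0 * (1 - learning_rate) ** doublings
        print(f"after {doublings:>2} doublings of cumulative output: unit cost x{cost:.2e}")
    ```

    Twenty doublings at that rate gives a unit-cost reduction of several thousand-fold, which is why the curve only stays steep while demand, and hence cumulative volume, keeps growing quickly.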

  • Simon

    Memory and bandwidth have also increased at a rate similar to Moore’s law. This means that many algorithms will be more likely to exploit using extra space in order to save time, giving them a speedup roughly equivalent to Moore’s law.

  • Philip Goetz

    So why does Microsoft Word get slower every year?

    Intriguing, but I’d like to see results on a random sample. Sorting is still N logN. I think internet secure connections still use RSA. The most-used algorithm in bioinformatics, BLAST, is 23 years old. It has, however, been more and more hardware-optimized over time. BLAST is much faster than its predecessor algorithm, but at the expense of accuracy. A functionally similar algorithm, HMMER, has increased in speed on a par with hardware.

    I’m surprised there wasn’t a blowing-up effect from recent programs being written in slower languages, like Perl, and from being bogged down with more and more features, like Windows.

    The average runtime of programs is dominated by very badly-written programs that get improved drastically but trivially when a competent programmer reworks them. I took a crucial program at work that took one month of computer time on about a thousand multi-core computers each with 4G of RAM, and I made it run on a single thread on a single CPU with 8G in one day. I didn’t write any clever algorithm. I just rewrote it to do what it had to do and not do all the useless things it had been doing.

    You also need to factor compiler optimizations out of that comparison. The same Java program compiled today will run 4 times as fast on the same hardware as it did a decade ago.

    Re. the hardware hypothesis, bigger faster hardware encourages people to write sloppy algorithms.

    Still, it makes me wonder where we might be if there were funding for computer science research, or at least a recognition that computer science is a thing.

  • Philip Goetz

    The 2010 White House report is at http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf . The relevant section is in a box on p. 71:

    Even more remarkable – and even less widely understood – is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.

    The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade. It’s difficult to quantify the improvement, though, because it is as much in the realm of quality as of execution time.

    In the field of numerical algorithms, however, the improvement can be quantified. Here is just one example, provided by Professor Martin Grötschel of Konrad-Zuse-Zentrum für Informationstechnik Berlin. Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later – in 2003 – this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008.

    • dmytryl

      Yeah, and in whatever year (let’s say 2018) the hardware gets 1000 times faster than 2003’s, you could write a report again and say, about the exact same pair of algorithms from 2003 and 1988, that the benchmark which takes 1 minute in 2018 would have taken billions of years using 1988 technology, with a gain of 1 million attributable to hardware, and a gain of several billion attributable to software.

      This is because the software improvement improved the runtime as a function of the problem size. And a benchmark would usually have problem size so that it runs for a minute or so.

  • Matthew Belcher

    Many problems involve space/time tradeoffs. One neglected aspect of this analysis is that computer memory densities have also been increasing exponentially, allowing us to use more space in order to save more time. We pre-compute massive lookup tables now that would have been impossible to store 20 years ago.
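
    A minimal sketch of that tradeoff (the function and table size are arbitrary stand-ins): spend memory up front on a precomputed table so every later query becomes a constant-time lookup.

    ```python
    import math

    DOMAIN = 100_000    # a table this size is trivial to hold in RAM today

    def expensive(n):   # stand-in for a costly per-query computation
        return sum(math.isqrt(n * k + 1) for k in range(50))

    # Pay the space (and one-time compute) cost up front...
    table = [expensive(n) for n in range(DOMAIN)]

    # ...so that each query afterwards is just an index into memory.
    def query(n):
        return table[n]

    print(query(12_345))
    ```

    On a machine with a tiny memory the table simply would not fit, so the "faster algorithm" here is really purchased with hardware.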

  • http://newstechnica.com David Gerard

    This is basic Unix philosophy: http://www.faqs.org/docs/artu/ch01s06.html Rob Pike’s rules 2 and 3 are:

    >Rule 2. Measure. Don’t tune for speed until you’ve measured, and even then don’t unless one part of the code overwhelms the rest.

    >Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don’t get fancy. (Even if n does get big, use Rule 2 first.)

    That is: we have better algorithms now entirely because we finally need them.

    Would our current algorithms be of *practical* use for the same sort of problems 20 years ago? Or would the constants have overwhelmed them? I’m sure there will be a few cases where this is true, but I would strongly suspect that in the vast majority of cases, the new algorithms wouldn’t be an improvement for the old *use cases*.

  • http://twitter.com/lucaswiman Lucas Wiman

    “Maybe progress in hardware is really driven behind the scenes by progress in algorithms; new algorithms are what really enables each new generation of computer hardware. But that sure isn’t the story I’ve heard.”

    I don’t have much personal experience with semiconductor manufacturing, but one of my former coworkers worked in semiconductor CAD and would talk about these issues frequently. My understanding is that many of the advances in lithography and other fabrication techniques go hand-in-hand with software to generate masks and automate design of ever larger circuits with ever smaller feature sizes. Reducing the feature size (ie what Moore’s law describes) introduces new opportunities for subtle quantum interference effects that need to be compensated for. As the chips get more complex, the chip needs to be laid out by a computer as well. Quality control is another issue. Blown up to the size that a human can see individual features, a modern CPU is enormous. It’s far too big for humans to check, so the process needs to be automated, algorithmically identifying errors in fabrication by analyzing enormous image files.

    I think these are for the most part extremely special purpose algorithms, but they do depend on advances in general scientific computation, as well as image processing, databases and combinatorial optimization techniques.

    • IMASBA

      The software you speak of here isn’t really groundbreaking. Existing programs could easily design 5nm chips, it’s always a question of waiting for the physicists and chemists to develop new lithography techniques.

  • ZHD

    I think this issue is a lot simpler than what you’re making it out to be. These “artisans” are otherwise known as “grad students”–although I’m sure they’d appreciate the former moniker just as well.

    The university system for Computer Science programs is very analogous to your description of large corporations achieving large scale economies through coordination and investment. Most open source projects begin as thesis research on how to build a better mouse trap. This is desirable in academia because narrowly-scoped research in algorithm design and complexity takes considerably less overhead to fund than hardware-based projects.

    This research also achieves coordination across disciplines. For example, the generalizations of the von Mises distributions and asymmetrical exponential power distributions were collaborative efforts that ended up in lengthy proofs for grad students in statistics. Then the computer science students who were interested in turning these into data packages for projects were able to put those theses into data sets, identify areas where computational efficiency could be increased, and then back out more publishable material from that.

    With a consistent supply of students and funding for STEM fields, it only makes sense that algorithm efficiency would increase just like hardware. Efficiency begets efficiency. And this field is also consistent with the “publish or perish” doctrine.

    More examples of research in algorithm design turning into open source projects:
    most popular math packages (e.g. weka)
    agent based modeling packages
    pseudo-random number generators
    CUDA wrappers
    database packages (e.g. Berkeley DB)

    None of the above magically materialized from lonely artists sitting with their slab of marble and chisel. Someone in the highly competitive and interdependent academic universe needed to publish.

  • Pingback: Overcoming Bias : Drexler Responds

  • Pingback: "Algorithmic Progress in Six Domains" Released | Machine Intelligence Research Institute

  • TioMoco

    I’d say that the development of specialized programming languages is responsible for much of what is observed. The languages are designed to support, or work in a manner that addresses, many of the problems of previous languages and the constraints they placed on how you had to think about a problem. When a language is designed to get out of your way and allow you to focus on higher-level constructs, you’d be surprised how much more you can do with it.

    I’m also leaving out the incredible amount of work that modern compilers must do to translate purely imaginary constructs invented for a particular application into something the hardware can run.

    The stuff you can do with a few lines of something as widely known as Java or Python is trivial compared to what one had to write to accomplish the same thing in say, FORTRAN – and that assumes that FORTRAN can, somehow – through sheer volume of code – accomplish the same task at all.

    It’s like asking your kid to mow the yard with tweezers. I’ll take my ExMark 36″ commercial mower any day.

  • Pingback: Overcoming Bias : I Still Don’t Get Foom