What I’m about to say is pretty obvious to most artificial intelligence (AI) researchers, but I think others might benefit from hearing it.
In general, AI systems are created by combining computer hardware, other hardware, access to data, an architecture, parameter settings, and hand-made software. Machine learning (ML) systems have relatively little hand-made software, while robotic systems have more other hardware.
Over the last few decades, our most impressive AI systems have mostly been of the non-robotic ML type, have had limited scopes of application, and have had a wide range of architectures. AI developers have considered architecture to be important in the sense that they repeatedly start over with new architectures, instead of continuing to build up systems based on old architectures.
Clearly the AI world is to some extent searching in the space of possible AI system architectures. Plausibly this is because, so far, architecture gains have made a substantial contribution to performance gains, just as more generally algorithm gains have been similar to computer hardware gains. (As architecture is only part of algorithms, its gains are likely less than hardware gains.)
Many are anticipating a future day when we may have AIs with a relatively wide range of abilities (call them AGIs = “artificial general intelligences”), and at human-level or better for most of those abilities. Many others are anticipating a probably different day, earlier or later, when we will see an AI which is good at the general task of searching in the space of new AI architectures (call them SIAIs = “self-improving artificial intelligences”). Depending on just how good a SIAI is, and just how hard or easy that task is, the abilities in its lineage might (or might not) rapidly improve.
The best architectures today are plausibly those behind large language models, which are indeed impressive, but only about as impressive as the best new systems of each decade have looked in the past. They do not seem especially close to being either AGI or SIAI, and it seems pretty clear to me that they would not become such just by adding more hardware and data. Many people have plausible ideas for architecture modifications that might move such systems in these directions. But the question is how fast to expect such innovations to appear.
I continue to see past track records of progress in such things as our best basis for estimating future progress. Recent advances have been exciting, but they seem within the usual long-term range of variation in the frequency and lumpiness of architecture innovations. And judging from past trends in the key metric of the fraction of world income that goes to pay for AI systems, we still seem to be a long way from an economy dominated by AI.
Here’s a project idea to help track progress in AI architectures. Let’s try to define an uninformed Bayesian prior over the space of all AI architectures, and then track the prior values assigned to particular new AI systems over time, along with other noteworthy facts about each system. We might similarly set a prior over parameter settings and track changes in those prior values. In this way we could create a dataset to support forecasting future rates, variation, and lumpiness of progress in AI architecture and parameter settings.
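To make the idea a bit more concrete, here is a rough Python sketch of what such a tracking dataset might look like. The encoding of an architecture as a list of component names, the component vocabulary, the description-length prior, and the example systems are all toy assumptions of mine, just to illustrate the shape of the data one would collect, not a proposal for the actual prior.

```python
# Toy sketch: an "architecture" is a list of component names, and the
# uninformed prior is a simple description-length prior (shorter descriptions
# get more mass). Vocabulary and example systems are hypothetical.
import math
from datetime import date

# Hypothetical vocabulary of architectural components.
VOCAB = ["attention", "convolution", "recurrence", "mixture_of_experts",
         "retrieval", "feedforward", "normalization", "residual"]

def description_length_bits(components):
    """Approximate bits to encode: the component count, then each choice from VOCAB."""
    n = len(components)
    return math.log2(n + 1) + n * math.log2(len(VOCAB))

def log_prior(components):
    """Toy uninformed prior via description length: log2 P = -description length."""
    return -description_length_bits(components)

# A running log of (date, system name, architecture sketch, log-prior, notes).
tracked = []

def track(system_name, components, when, notes=""):
    entry = {"date": when, "system": system_name, "components": components,
             "log2_prior": log_prior(components), "notes": notes}
    tracked.append(entry)
    return entry

# Hypothetical entries, for illustration only.
track("SystemA", ["convolution", "feedforward", "normalization"], date(2015, 1, 1))
track("SystemB", ["attention", "feedforward", "residual", "normalization"], date(2018, 1, 1))
track("SystemC", ["attention", "mixture_of_experts", "retrieval",
                  "feedforward", "residual", "normalization"], date(2023, 1, 1))

for e in tracked:
    print(e["date"], e["system"], round(e["log2_prior"], 1), "log2-prior (bits)")
```

With a more serious encoding, the time series of these prior values for notable new systems would be the raw material for forecasting the rate and lumpiness of architecture innovation.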
"We might similarly set a prior over parameter settings and track changes in those prior values."
Practically speaking, you can't do that. Every time you train a network you're going to get wildly different values of almost all of your parameters. The value of an individual parameter is meaningless and random, so tracking it would serve no purpose.
Here's something you probably don't know. In practice, the "fully trained" network has almost identical edge weights to the initial "randomly initialized" network. The training makes only small changes, essentially just fine tuning the random initialization. This is because the weight space (the vector space of all the weights) is so high dimensional, which means that wherever you are in weight space, you're *close* to a viable solution, of which there are very many. So it only takes a small adjustment from the random initialization to get to the closest viable solution.
This "small adjustment" makes a big difference in the network's behavior, but it is small in terms of the absolute sizes of the weight updates.
I feel like "economy dominated by AI" is a very flawed measure: if AI is cheap, then whatever it does won't be a large fraction of economic activity.
I mean, suppose that in 40 years virtually no human is physically involved in any sort of manufacturing, resource extraction, farming, routine janitorial work, or construction, because AI can supply that labour really cheaply. Does that count or not? If we get really good at programming AI without much effort, then it seems likely that those aspects of production become a relatively small share of economic activity (in dollar terms).
Doesn't AI only come to dominate economic activity if it's both very useful and also requires some relatively limited resource (e.g., huge amounts of power)? Unless you just mean the amount of economic activity it's involved with, which seems like a bad measure, since then everyone wearing AI-enabled smart glasses would make it true even if the AI only adds a bit of value.