I’ll do three public talks at U Rochester next week:
- Monday noon, Harkness 208, on Agreeing to Disagree.
- Monday 7:30p, Dewey 1-101, on The Age of Em.
- Tuesday 5:15p, Morey 321, on Vote on Values, Bet on Beliefs.
We can explain human behavior on many levels. For example, we can explain a specific choice in terms of that person’s thoughts and feelings at the time. Or we can explain typical patterns of individual behavior in terms of their stable preferences, resources, abilities, and a rough social equilibrium in which people find themselves. Or one can try to explain why different social worlds find themselves in different local equilibria.
For example, while pressures to conform are indeed often powerful, that power makes conformity especially inadequate as a total explanation. Yes, in an equilibrium where everyone squawks like a chicken when they meet, you’d seem weird if you didn’t also squawk. But if we found a place where that was in fact the equilibrium, we might still puzzle over why that happened there.
Last week I tried to outline an explanation for why young people in rich nations today spend so much energy signaling their work potential via school. Yes in today’s equilibrium you look weird if you try to skip prestigious schools to show your work potential in other ways. So yes we can explain the typical pattern of personal school choices today in terms of the equilibrium that people find themselves in.
But centuries ago few went to school, and the few who went didn’t go long. So young people mostly showed their work potential in other ways, such as via family background and child labor. And then over the last few centuries enthusiasm for school grew greatly, until today 2/3 of US kids graduate from high school, and 2/3 of those at least start college. Mere conformity pressures seem quite inadequate to explain this vast change.
My tentative story tries less to explain individual behavior given a local equilibrium, and more to explain why cultures changed to support new different equilibria. I can believe that today school’s main function is to signal work potential, and that child labor has always been better than school at signaling work potential and at acclimating kids to work habits, if the local culture supports that pattern.
But as I said in my last post, cultures around the world and through history have typically been hostile to industrial work habits, such as frequent explicit novel orders and ranking. Adults resisted both taking such jobs themselves and sending their kids to learn such jobs. And culture seems to have contributed a lot to this, such as via status concepts; people were often ashamed to take such jobs.
Because schools have long and widely had a more prestigious and noble image, people have been more eager to send their kids to school. So schools could habituate kids into industrial workplace styles, and parents could be less ashamed of accepting this. I’m not saying that this was a conscious plan (though sometimes it was), but that this was a lower-resistance path for cultural evolution. Societies that adopted more industry friendly schooling tended to get richer and then other societies were more willing to copy them.
Bryan Caplan seems to accept part of my story:
Let me propose a variant on Robin’s story. Namely: While school is not and never was a good way to acclimate kids to the world of work, it does wrap itself in high-minded rhetoric or “prestige.” “Teaching every child to reach his full potential” sounds far nobler than “Training every child for his probable future.” As a result, making the political case for ample education funding is child’s play. Education’s prestigious image in turn cements its focal status role, making academic achievement our society’s central signal of conformity.
Where Bryan disagrees is that he sees government as the main agent pushing school. He says it wasn’t individual workers who were unwilling to adopt industrial work habits, it was government regulators:
The main problem of development isn’t that people in poor places won’t individually submit to foreign direction, but that people in poor places won’t collectively submit to foreign direction. “Letting foreigners run our economy” sounds bad, but individuals are happy to swallow their pride for higher wages. Voters and politicians in LDCs, in contrast, loathe to put a price on pride – and therefore hamstring multinationals in a hundred different destructive ways.
And he says it wasn’t individuals who were eager to send their kids to school, it was government:
While I don’t dwell on history, my book does answer the question, “Why does schooling pass the market test?” My answer is: “Market test?! Government showers almost a trillion dollars a year on the status quo, and you call that ‘passing the market test’?!” … When individuals spend their own money, of course, they at least ponder whether what sounds wonderful is really worth the cost. For collective spending, in contrast, Social Desirability Bias reigns supreme.
But these just don’t match the history I’ve read. For example, in the US there was lots of non-government school funding before government took over:
The school system remained largely private and unorganized until the 1840s. Public schools were always under local control, with no federal role, and little state role. The 1840 census indicated that of the 3.68 million children between the ages of five and fifteen, about 55% attended primary schools and academies.
On typical worker reluctance to follow orders, see Greg Clark’s classic “Why Isn’t the Whole World Developed? Lessons from the Cotton Mills”:
Moser, an American visitor to India in the 1920s, is even more adamant about the refusal of Indian workers to tend as many machines as they could “… it was apparent that they could easily have taken care of more, but they won’t … They cannot be persuaded by any exhortation, ambition, or the opportunity to increase their earnings.” In 1928 attempts by management to increase the number of machines per worker led to the great Bombay mill strike. Similar stories crop up in Europe and Latin America.
Chris Dillow says my viewpoint is not new, and quotes some 70s Marxist scholars:
Robin would, I guess, reach for the holy water and crucifix on learning this, but his idea is an orthodox Marxian one. I don’t say this to embarrass him. Quite the opposite. I do so to point out that Marxists and libertarians have much in common. We both believe that freedom is a – the? – great good; Marxists, though, more than right-libertarians, are also troubled by non-state coercion. We are both sceptical about whether state power can be used benignly. … However, whereas Marxists have engaged intelligently with right-libertarianism, the opposite has, AFAIK, not been the case – as Robin and Bryan’s ignorance of the intellectual history of Robin’s theory of schooling demonstrates. This is perhaps regrettable.
To be clear, I’m only somewhat libertarian, I’m happy to credit Marxist scholars with useful insight, and I wasn’t claiming my view on schools to be starkly original. I’m well aware that many have long seen school as training kids in industrial work habits. What I haven’t seen elsewhere, though I could easily believe it has been said before, is the idea of schools being an easier to swallow form of work habituation due to the ancient human connection between prestige and learning.
Most animals in the world can’t be usefully domesticated. This isn’t because we can’t eat their meat, or feed them the food they need. It is because all animals naturally resist being dominated. Only rare social species can let a human sit in the role of dominant pack animal whom they will obey, and only if humans do it just right.
Most nations today would be richer if they had long ago just submitted wholesale to a rich nation, allowing that rich nation to change their laws, customs, etc., and just do everything their way. But this idea greatly offends national and cultural pride. So nations stay poor.
When firms and managers from rich places try to transplant rich practices to poor places, giving poor place workers exactly the same equipment, materials, procedures, etc., one of the main things that goes wrong is that poor place workers just refuse to do what they are told. They won’t show up for work reliably on time, have many problematic superstitions, hate direct orders, won’t accept tasks and roles that deviate from their non-work relative status with co-workers, and won’t accept being told to do tasks differently than they had done them before, especially when new ways seem harder. Related complaints are often made about the poorest workers in rich societies; they just won’t consistently do what they are told. It seems pride is a big barrier to material wealth.
The farming mode required humans to swallow many changes that didn’t feel nice or natural to foragers. While foragers are fiercely egalitarian, farmers are dominated by kings and generals, and have unequal property and classes. Farmers work more hours at less mentally challenging tasks, and get less variety via travel. Huge new cultural pressures, such as religions with moralizing gods, were needed to turn foragers into farmers.
But at work farmers are mostly autonomous and treated as the equal of workers around them. They may resent having to work, but adults are mostly trusted to do their job as they choose, since job practices are standardized and don’t change much over time. In contrast, productive industrial era workers must accept more local domination and inequality than would most farmers. Industry workers have bosses more in their face giving them specific instructions, telling them what they did wrong, and ranking them explicitly relative to their previous performance and to other nearby workers. They face more ambiguity and uncertainty about what they are supposed to do and how.
How did the industrial era get at least some workers to accept more domination, inequality, and ambiguity, and why hasn’t that worked equally well everywhere? A simple answer I want to explore in this post is: prestigious schools.
While human foragers are especially averse to even a hint of domination, they are also especially eager to take “orders” via copying the practices of prestigious folks. Humans have a uniquely powerful capacity for cultural evolution exactly because we are especially eager and able to copy what prestigious people do. So if humans hate industrial workplace practices when they see them as bosses dominating, but love to copy the practices of prestigious folks, an obvious solution is to habituate kids into modern workplace practices in contexts that look more like the latter than the former.
In his upcoming book, The Case Against Education, my colleague Bryan Caplan argues that school today, especially at the upper levels, functions mostly to help students signal intelligence, conscientiousness, and conformity to modern workplace practices. He says we’d be better off if kids did this via early jobs, but sees us as having fallen into an unfortunate equilibrium wherein individuals who try that seem non-conformist. I agree with Bryan that, compared with the theory that older students mostly go to school to learn useful skills, signaling better explains the low usefulness of school subjects, low transfer to other tasks, low retention of what is taught, low interest in learning relative to credentials, big last-year-of-school gains, and student preferences for cancelled classes.
My main problem with Caplan’s story so far (he still has time to change his book) is the fact that centuries ago most young people did signal their abilities via jobs, and the school signaling system has slowly displaced that job signaling system. Pressures to conform to existing practices can’t explain this displacement of a previous practice by a new practice. So why did signaling via school win out over signaling via early jobs?
Like early jobs, school can have people practice habits that will be useful in jobs, such as showing up on time, doing what you are told even when that is different from what you did before, figuring out ambiguous instructions, and accepting being frequently and publicly ranked relative to similar people. But while early jobs threaten to trip the triggers that make most animals run from domination, schools try to frame a similar habit practice in more acceptable terms, as more like copying prestigious people.
Forager children aren’t told what to do; they just wander around and do what they like. But they get bored and want to be respected like adults, so eventually they follow some adults around and ask to be shown how to do things. In this process they sometimes have to take orders, but only until they are no longer novices. They don’t have a single random boss they don’t respect, but can instead be trained by many adults, can pick the most prestigious adults around, and can stop training with each when they like.
Schools work best when they set up an apparently similar process wherein students practice modern workplace habits. Start with prestigious teachers, like the researchers who also teach at leading universities. Have students take several classes at a time, so they have no single “boss” who personally benefits from their following his or her orders. Make class attendance optional, and let students pick their classes. Have teachers continually give students complex assignments with new ambiguous instructions, using the excuse of helping students to learn new things. Have lots of students per teacher, to lower costs, to create excuses for having students arrive and turn in assignments on time, and to create social proof that other students accept all of this. Frequently and publicly rank student performance, using the excuse of helping students to learn and decide which classes and jobs to take later. And continue the whole process well into adulthood, so that these habits become deeply ingrained.
When students finally switch from school to work, most will find work to be similar enough to transition smoothly. This is especially true for desk professional jobs, and when bosses avoid giving direct explicit orders. Yes, workers now have one main boss, and can’t as often pick new classes/jobs. But they won’t be publicly ranked and corrected nearly as often as in school, even though such things will happen far more often than their ancestors would have tolerated. And if their job ends up giving them prestige, their prior “submission” to prestigious teachers will seem more appropriate.
This point of view can help explain how schools could help workers to accept habits of modern workplaces, and thus how there could have been selection for societies that substituted schools for early jobs or other child activities. It can also help explain unequal gains from school; some kinds of schools should be less effective than others. For example, teachers might not be prestigious, teachers may fail to show up on time to teach, teacher evaluations might correlate poorly with student performance, students might not have much choice of classes, school tasks might diverge too far from work tasks, students may not get prestigious jobs, or the whole process might continue too long into adulthood, long after the key habituation has been achieved.
In sum, while students today may mostly use schools to signal smarts, drive, and conformity, we need something else to explain how school displaced early work in this signaling role. One plausible story is that schools habituate students in modern workplace habits while on the surface looking more like prestigious forager teachers than like the dominating bosses that all animals are primed to resist. But this hardly implies that everything today that calls itself a school is equally effective at producing this benefit.
I recently posted on a hypothetical “kilo-vote” scenario intended to help show that most of us don’t vote mainly to influence who wins the election. However, the ability of any given scenario to convince a reader of such a result depends on many details of the scenario, and of reader beliefs about behavior. So on reflection, I’ve come up with a new scenario I think can persuade more people, because in it fewer things change from the prototypical voting scenario.
Imagine that polls stayed open for a month before the election deadline, and that a random one percent of voters were upgraded to “super-voters,” who can privately vote up to twenty times, as long as they wait at least an hour between votes. When a super-voter votes all twenty times, their votes are doubled, and counted as forty votes. “Privately” means no one else ever knows that this person was a super-voter. (Yes that could be hard to achieve, but just assume that it is achieved somehow.)
To a voter who cares mainly about picking the election winner, and who casts only a tiny fraction of the votes, the value of voting is proportional to their number of votes. Twice the votes gives twice the value. If such a person votes when they are an ordinary voter, then they should be greatly tempted to vote twenty times as a super-voter; their costs aren’t much more than twenty times their costs from voting once, yet for that effort they get forty votes.
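To make that proportionality concrete, here is a tiny Python sketch. The unit cost and unit value are made-up placeholders, not estimates from the post; only the 20x-cost, 40-counted-votes structure comes from the scenario.

```python
# Toy comparison of cost vs. influence for an ordinary voter and a
# "super-voter" whose twenty votes are counted as forty.
# COST_PER_VOTE and VALUE_PER_VOTE are arbitrary illustrative units.

COST_PER_VOTE = 1.0    # time and trouble of casting one vote
VALUE_PER_VOTE = 5.0   # value per counted vote, to an outcome-focused voter

ordinary_cost = COST_PER_VOTE * 1
ordinary_value = VALUE_PER_VOTE * 1     # one counted vote

super_cost = COST_PER_VOTE * 20         # roughly twenty times the effort
super_value = VALUE_PER_VOTE * 40       # counted as forty votes

# For an outcome-focused voter, the value-to-cost ratio doubles:
print(ordinary_value / ordinary_cost)   # 5.0
print(super_value / super_cost)         # 10.0
```

Whatever placeholder numbers are used, the ratio for the super-voter comes out twice that of the ordinary voter, which is why an outcome-focused voter should be strongly tempted to vote all twenty times.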
I feel pretty sure that most of the people assigned to super-voter status would not in fact vote twenty times. Yes I haven’t tested this, but I’d be willing to bet on it. Most voters care a lot more about seeming to have done their duty than they do about maximizing any new opportunities that arise from being assigned super-voter status. So most super-voters would think they’d done their duty with their first vote. After all, if voting once is good enough for ordinary voters who are not assigned to super-voter status, why shouldn’t that be good enough for super-voters as well?
Software systems are divided into parts, and we have two main ways to measure the fraction of a system that each part represents: lines of code, and resources used. Lines (or bits) of code is a rough measure of the amount of understanding that a part embodies, i.e., how hard it is to create, modify, test, and maintain. For example, a system that is more robust or has a wider range of capacities typically has more lines of code. Resources used include processors, memory, and communication between these items. Resources measure how much it costs to use each part of the system. Systems that do very narrow tasks that are still very hard typically take more resources.
Human brains can be seen as software systems composed of many parts. Each brain occupies a spatial volume, and we can measure the fraction of each brain part via the volume it takes up. People sometimes talk about measuring our understanding of the brain in terms of the fraction of brain volume that is occupied by systems we understand. For example, if we understand parts that take up a big fraction of brain volume, some are tempted to say we are a big fraction of the way toward understanding the brain.
However, using the software analogy, brain volume seems usually to correspond more closely to resources used than to lines of code. For example, different brain regions seem to have roughly similar levels of activity per unit volume, which isn’t what we’d expect if volume corresponded more to lines of code than to resources used.
Consider two ways that we might shrink a software system: we might cut 1% of the lines of code, or 1% of the resources used. If we cut 1% of the resources used via cutting the lines of code that use the fewest resources, we will likely severely limit the range of abilities of a broadly capable system. On the other hand, if we cut the most modular 1% of the lines of code, that system’s effectiveness and range of abilities will probably not fall by remotely as much.
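A toy Python model, with entirely invented numbers, can illustrate this asymmetry: one narrow-but-hard “core” part that eats most resources in few lines, plus many small features that each add one ability cheaply.

```python
# Toy model: each part has a lines-of-code count, a resource usage, and
# one ability it enables. All numbers are invented for illustration only.

core = {"lines": 1_000, "resources": 99_000, "abilities": 1}
features = [{"lines": 990, "resources": 10, "abilities": 1} for _ in range(99)]
parts = [core] + features

total_lines = sum(p["lines"] for p in parts)          # 99,010
total_resources = sum(p["resources"] for p in parts)  # 99,990

# Cut ~1% of resources, dropping the parts that use the fewest resources:
budget, cut, lost_by_resources = total_resources // 100, 0, 0
for p in sorted(parts, key=lambda p: p["resources"]):
    if cut + p["resources"] > budget:
        break
    cut += p["resources"]
    lost_by_resources += p["abilities"]

# Cut ~1% of lines, dropping the smallest (most modular) parts first:
budget, cut, lost_by_lines = total_lines // 100, 0, 0
for p in sorted(parts, key=lambda p: p["lines"]):
    if cut + p["lines"] > budget:
        break
    cut += p["lines"]
    lost_by_lines += p["abilities"]

print(lost_by_resources)  # 99 of 100 abilities lost
print(lost_by_lines)      # 1 of 100 abilities lost
```

With these invented numbers, trimming 1% of resources wipes out almost every ability, while trimming 1% of lines costs almost nothing, because abilities track lines of code rather than resources.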
So there can be a huge variation in the effective lines of code corresponding to each brain region, and the easiest parts to understand are probably those with the fewest lines of code. So understanding the quarter of brain volume that is easiest to understand might correspond to understanding only 1% or less of lines of code. And continuing along that path we might understand 99% of brain volume and still be a very long way from being able to create a system that is as productive or useful as a full human brain.
This is why I’m not very optimistic about creating human level AI before brain emulations. Yes, when we have nearly the ability to emulate a whole brain, we will have better data and simulations to help us understand brain parts. But the more brain parts there are to understand, the harder it will be to understand them all before brain emulation is feasible.
Those who expect AI-before-emulations tend to think that there just aren’t that many brain parts, i.e., that the brain doesn’t really embody very many lines of code. Even though the range of capacities of a human brain, even a baby brain, seems large compared to most known software systems, these people think that this analogy is misleading. They guess that in fact there is a concise powerful theory of intelligence that will allow huge performance gains once we understand it. In contrast, I see the analogy to familiar software as more relevant; the vast capacity of human brains suggests they embody the equivalent of a great many lines of code. Content matters more than architecture.
I leave Friday on a nine day trip to give six talks, all but one on Age of Em:
Imagine that at every U.S. presidential election, the system randomly picked one U.S. voter and asked them to pay a fee to become a “kilo-voter.” Come election day, if there is a kilo-voter then the election system officially tosses sixteen fair coins. If all sixteen coins come up heads, the kilo-voter’s vote decides the election. If not, or if there is no kilo-voter, the election is decided as usual via ordinary votes. The kilo-voter only gets to pick between Democrat and Republican nominees, and no one ever learns that they were the kilo-voter that year.
“Kilo-voters” are so named because they have about a thousand times the chance of deciding the election that an ordinary voter has. In the 2008 U.S. presidential election the average voter had a one in sixty million chance of deciding who won the election. The chance that sixteen fair coins all come up heads is roughly a thousand times larger than this.
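The arithmetic behind that “roughly a thousand times” is easy to check; the one-in-sixty-million figure is the post’s own estimate for the 2008 election.

```python
from fractions import Fraction

# Chance that sixteen fair coins all come up heads
p_coins = Fraction(1, 2) ** 16        # 1/65536

# Rough chance an average voter decided the 2008 election
p_voter = Fraction(1, 60_000_000)

ratio = p_coins / p_voter             # about 915, i.e. roughly a thousand
print(float(ratio))
```

The exact ratio is about 915, close enough to a thousand for the scenario’s purposes.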
Consider: 1) How much is the typical voter willing to pay to become a kilo-voter? and 2) How much does it cost the typical voter, in time and trouble, to actually vote in a U.S. presidential election? As long as these numbers are both small compared to a voter’s wealth, then for a voter motivated primarily by the chance to change the election outcome, these numbers should differ by at least a factor of one thousand.
For example, if it takes you at least a half hour to get to the voting booth and back, and to think beforehand about your vote, and if you make the average U.S. hourly wage of $20, then voting costs you at least $10. In this case you should be willing to pay at least $10,000 to become a kilo-voter, if you are offered the option. Me, I very much doubt that typical voters would pay $10,000 to become secret kilo-voters.
Yes, the 2008 election influenced the lives of 305 million U.S. residents, and someone who cared enough might pay a lot for a higher chance of deciding such an election. But typical voters would not pay a lot. Which suggests that the chance to decide the election is just not the main reason that they vote. The chance of being decisive actually doesn’t seem to matter remotely as much to typical voting behavior as it should to someone focused on changing outcomes. For example, states where voters have much higher chances of being decisive about the president don’t have much higher voter turnout rates, and turnout is actually lower in local and state elections, where the chances of being decisive are higher.
My conclusion: we don’t mainly vote to change the outcome.
My first book, The Age of Em: Work, Love, and Life When Robots Rule the Earth, is moving along toward its June 1 publication date (in UK, a few weeks later in US). A full book jacket is now available:
Blurbs are also now available, from: Sean Carroll, Marc Andreessen, David Brin, Andrew McAfee, Erik Brynjolfsson, Matt Ridley, Hal Varian, Tyler Cowen, Vernor Vinge, Steve Fuller, Bryan Caplan, Gregory Benford, Kevin Kelly, Ben Goertzel, Tim Harford, Geoffrey Miller, Tim O’Reilly, Scott Aaronson, Ramez Naam, Hannu Rajaniemi, William MacAskill, Eliezer Yudkowsky, Zach Weinersmith, Robert Freitas, Neil Jacobstein, Ralph Merkle, and Michael Chwe.
Kindle and Audible versions are in the works, as is a Chinese translation.
I have a page that lists all my talks on the book, many of which I’ll also post about here at this blog.
Abstracts for each of the thirty chapters should be available to see within a few weeks.
My ex-co-blogger Eliezer Yudkowsky recently made a Facebook post saying that recent AI Go progress confirmed his predictions from our foom debate. He and I then discussed this there, and I thought I’d summarize my resulting point of view here.
Today an individual firm can often innovate well in one of its products via a small team that keeps its work secret and shares little with other competing teams. Such innovations can be lumpy in the sense that gain relative to effort varies over a wide range, and a single innovation can sometimes make a big difference to product value.
However big lumps are rare; typically most value gained is via many small lumps rather than a few big ones. Most innovation comes from detailed practice, rather than targeted research, and abstract theory contributes only a small fraction. Innovations vary in their generality, and this contributes to the variation in innovation lumpiness. For example, a better washing machine can better wash many kinds of clothes.
If instead of looking at individual firms we look at nations as a whole, the picture changes because a nation is an aggregation of activities across a great many firm teams. While one firm can do well with a secret innovation team that doesn’t share, a big nation would hurt itself a lot by closing its borders to stop sharing with other nations. Single innovations make a much smaller difference to nations as a whole than they do to individual products. So nations grow much more steadily than do firms.
All of these patterns apply not just to products in general, but also to the subcategory of software. While some of our most general innovations may be in software, most software innovation is still made of many small lumps. Software that is broadly capable, such as a tool-filled operating system, is created by much larger teams, and particular innovations make less of a difference to its overall performance. Most software is created via tools that are shared with many other teams of software developers.
From an economic point of view, a near-human-level “artificial general intelligence” (AGI) would be a software system with a near-human level competence across almost the entire range of mental tasks that matter to an economy. This is a wide range, much more like the scope of abilities found in a nation than that found in a firm. In contrast, an AI Go program has a far more limited range of abilities, more like those found in typical software products. So even if the recent Go program was made by a small team and embodies lumpy performance gains, it is not obviously a significant outlier relative to the usual pattern in software.
It seems to me that the key claim made by Eliezer Yudkowsky, and others who predict a local foom scenario, is that our experience in both ordinary products in general and software in particular is misleading regarding the type of software that will eventually contribute most to the first human-level AGI. In products and software, we have observed a certain joint distribution over innovation scope, cost, value, team size, and team sharing. And if that were also the distribution behind the first human-level AGI software, then we should predict that it will be made via a great many people in a great many teams, probably across a great many firms, with lots of sharing across this wide scope. No one team or firm would be very far in advance of the others.
However, the key local foom claim is that there is some way for small teams that share little to produce innovations with far more generality and lumpiness than these previous distributions suggest, perhaps due to being based more on math and basic theory. This would increase the chances that a small team could create a program that grabs a big fraction of world income, and keeps that advantage for an important length of time.
Presumably the basis for this claim is that some people think they see a different distribution among some subset of AI software, perhaps including machine learning software. I don’t see it yet, but the obvious way for them to convince skeptics like me is to create and analyze a formal dataset of software projects and innovations. Show us a significantly-deviating subset of AI programs with more economic scope, generality, and lumpiness in gains. Statistics from such an analysis could let us numerically estimate the chances of a single small team encompassing a big fraction of AGI software power and value.
That is, we might estimate the chances of local foom. Which I’ve said isn’t zero; I’ve instead just suggested that foom has gained too much attention relative to its importance.
Imagine a not-beloved grade school teacher who seemed emotionally weak to his students, and was fastidious about where exactly everything was on his desk and in his classroom. If the students moved things around when the teacher wasn’t looking, this teacher would get visibly upset and give long boring lectures against such behavior. This sort of reaction might well encourage students to move things, just to get a rise out of the teacher.
Imagine a daughter who felt overly controlled and under considered by clueless parents, and who was attracted to and tempted to get involved with a particular “bad boy.” Imagine that these parents seemed visibly disturbed by this, and went out of their way to lecture her often about why bad boys are a bad idea, though never actually telling her anything she didn’t think she already knew. In such a case, this daughter might well be more tempted to date this bad boy, just to bother her parents.
Today a big chunk of the U.S. electorate feels neglected by a political establishment that they don’t especially respect, and is tempted to favor political bad boy Donald Trump. The main response of our many establishments, especially over the last few weeks, has been to constantly lecture everyone about how bad an idea this would be. Most of this lecturing, however, doesn’t seem to tell Trump supporters anything they don’t think they already know, and little of it acknowledges reasonable complaints regarding establishment neglect and incompetence.
By analogy with these other cases, the obvious conclusion is that all this tone-deaf sanctimonious lecturing will not actually help reduce interest in Trump, and may instead increase it. But surely an awful lot of our establishments must be smart enough to have figured this out. Yet the tsunami of lectures continues. Why?
A simple interpretation in all of these cases is that people typically care more about making sure they are seen to take a particular moral stance than they care about the net effect of their lectures on behavior. The teacher with misbehaving students cares more about showing everyone he has a valid complaint than he does about reducing misbehavior. The parents of a daughter dating a bad boy care more about showing they took the correct moral stance than they do about whether she actually dates him. And members of the political establishment today care more about making it clear that they oppose Trump than they do about actually preventing him from becoming president.