Philosophy Vs. Duck Tests

Jul 21, 2017

Philosophers, and intellectuals more broadly, love to point out how things might be more complex than they seem. They identify more and subtler distinctions, suggest more complex dependencies, and warn against relying on “shallow” advisors less “deep” than they. Subtlety and complexity are basically what they have to sell.

I’ve often heard people resist such sales pressure by saying things like “if it looks like a duck, walks like a duck, and quacks like a duck, it’s a duck.” Instead of using complex analysis and concepts to infer and apply deep structures, they prefer to use such a “duck test” and judge by adding up many weak surface clues. When a deep analysis disagrees with a shallow appearance, they usually prefer to go shallow.

Interestingly, this whole duck example came from philosophers trying to warn against judging from surface appearances:

In 1738 a French automaton maker fooled the world into thinking he’d replicated life, and accidentally created a flippant philosophical conundrum we are still using. You’ve heard the phrase “if it looks like a duck, walks like a duck and quacks like a duck, then it’s a duck” haven’t you? .. People were saying this .. in the 18th century about a certain mechanical duck. And they were being very serious.

That mechanical duck was built to astound audiences, by quacking, moving its head to eat some grain which the mechanical marvel seemingly digested and then after a short time, the machine would round things off by plopping out a dollop of, what has been described as, foul smelling sh*t. ..The “looks like a duck” phrase (or Duck Test as some call it) is now thought of as a mildly amusing philosophical argument but back in the 18th century would certainly have been more akin to the way the Turing Test challenges artificial intelligence systems to fool the assessor into believing the system is a real human and not a computer. (more)

Philosophers had lectured, saying, “See, you wouldn’t want to be fooled by surface appearances into calling this automaton a duck, would you?” But then others defiantly embraced this example, saying, “We plan to do exactly that; if it appears in enough ways to be a duck, that’s good enough for us.”

Philosophy of mind topics, such as classifying and judging minds, are topics where many intellectuals offer deep analysis. Imagine you have a wide range of creatures, and objects that have various similarities to creatures. For each one you want to estimate many capacities. Does it have a distinct personality? Can it plan, remember, communicate, think, desire, exercise self-control, or judge right and wrong? Does it get embarrassed, proud, or enraged? Can it guess how others feel? Does it feel fear, hunger, joy, pain, or pleasure? Is it conscious? You also want to judge: if you had two such characters, which one would you try harder to make happy, save from destruction, avoid harming, or punish for causing a death? And which is more likely to have a soul?

This is a huge range of topics, on which learned intellectuals have written many thousands of books and articles, arguing for the relevance of a great many distinctions, to be taken into account in many subtle ways. But if ordinary people use simple-minded duck tests on such topics, they’d tend to judge each one by simply adding up many weak clues. And if people were especially simple-minded, they might even judge them all using roughly the same set of weak clues. They would do so even though some of the above capacities (e.g., plan, remember) are ones that many machines have today, if weakly, while others (e.g., conscious, soul) are especially abstract and contentious.

Amazingly, as I posted a few days ago, this extreme scenario looks pretty close to the truth! At least as a first approximation. When 2400 people compared 13 diverse characters on the above 18 capacities and 6 judgements, one factor explained 88% of the total variance in capacities, while a second factor explained 8%, leaving only 4% unexplained. The study found some weak correlations with political and religious attitudes, but otherwise its main result is that survey responses on these many mind topics are mostly made using the same simple duck test (plus noise of course).
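To make the “one factor explains 88%” result a bit more concrete, here is a toy sketch in Python. It is not the study’s analysis, which used real survey responses; the data here are simulated under a hypothetical assumption that every capacity rating is one shared latent “looks like a mind” score plus small independent noise. It then measures what fraction of the total variance across capacities the top principal component captures. The function names `simulate_ratings` and `first_factor_share`, the noise level, and the use of principal components via power iteration are all illustrative choices, not anything taken from the paper.

```python
import random

def simulate_ratings(n_chars=13, n_caps=18, noise=0.3, shared=True, seed=0):
    """Hypothetical ratings: one row per character, one column per capacity."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_chars):
        if shared:
            # Assumed duck test: one latent score drives every capacity rating.
            mind = rng.gauss(0, 1)
            rows.append([mind + rng.gauss(0, noise) for _ in range(n_caps)])
        else:
            # Control case: each capacity is rated independently.
            rows.append([rng.gauss(0, 1) for _ in range(n_caps)])
    return rows

def first_factor_share(ratings, iters=500):
    """Fraction of total variance captured by the top principal component."""
    n, k = len(ratings), len(ratings[0])
    means = [sum(row[j] for row in ratings) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in ratings]
    # Sample covariance between capacities (a k x k matrix).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    total = sum(cov[j][j] for j in range(k))  # trace = total variance
    # Power iteration converges toward the eigenvector with the largest eigenvalue.
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v estimates that largest eigenvalue.
    top = sum(v[a] * cov[a][b] * v[b] for a in range(k) for b in range(k))
    return top / total

print(round(first_factor_share(simulate_ratings()), 2))
print(round(first_factor_share(simulate_ratings(shared=False)), 2))
```

Under the shared-score assumption the first component soaks up most of the variance, much as in the survey; when each capacity is rated independently, the variance spreads across many components and the first one captures far less.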

Now this study is hardly the last word. I’d love to see a survey with even more characters, and the judgements should be included in the factor analysis. And we also know that people are capable of “dehumanization”, i.e., using motivated reasoning to give lower scores to humans when they want to avoid blame for mistreatment.

But if these basic results continue to hold, they have big implications for how most people will treat various possible future creatures, including aliens, chimeras, alters, robots, AI, and ems. We don’t need a subtle analysis to predict how people will treat such things. We need only predict a wide range of apparent capacities for such creatures, and perhaps also a degree of motivated reasoning. The more such capacities creatures have, and at higher levels, and the weaker the motivated reasoning, the higher people will rate them. And when people are motivated to rate creatures lower, they will do this by rating them lower on many capacities at once, as slave-owners have often done with slaves.

If you believe that such ratings will often be influenced by whether creatures are made out of silicon or biochemicals, or whether they are natural or artificial, then you must believe either that such factors will only work indirectly via a broad influence over all of these capacities and judgements together, or that the factor analysis of a bigger survey will find big factors associated with such things. I’ve offered to bet that a new bigger survey will not find such big factors.

Bryan Caplan says that he disagrees about how future ems will be treated, but calls survey factor analyses irrelevant, and so won’t bet on them. He is instead very impressed that subjects gave a low rating on the main factor to a character the paper calls “robot”, which was described to survey participants this way:

Kismet is part of a new class of “sociable” robots that can engage people in natural interaction. To do this, Kismet perceives a variety of natural social signals from sound and sight, and delivers his own signals back to the human partner through gaze direction, facial expression, body posture, and vocal babbles.

Apparently Bryan is confident that ems and all future artificial creatures will be rated as low as this character, so he offers to bet on how a survey will rank an “em” character. Alas, it is unlikely that the next few surveys would include such a character, in part because it is a pretty unfamiliar concept for most people.

I just don’t see “robot” as a useful category here, such that we should expect most everything given this label to rate the same. That seems to me like expecting all bipedal creatures to rate low if a bipedal Barbie doll rates low.

What the survey above suggests instead is that what matters is how creatures are rated on many specific capacities. I expect that most people correctly estimated, from their experience with many other machines they’ve seen and heard of, that Kismet is in fact pretty bad at most of the listed capacities. In contrast, when most people encounter fictional “robots” portrayed as quite good at many of these capacities, they consistently rate those “robots” relatively high on most other capacities and judgments. I’d bet a survey would also show this if such characters were included.

Because while people are often impressed with intellectuals’ subtle analysis, they still usually judge creature mental capacities via a simple duck test. If it quacks like a mind, it’s a mind.

Overcoming Bias