You can think of knowing how to write as knowing how to correlate words. Given no words, what first word should you write? Then, given one word, what second word best correlates with it? Then, given two words, what third word best fits with those two? And so on. Thus your knowledge of how to write can be broken into what you know at these different correlation orders: one word, two words, three words, and so on. Each time you pick a new word you can combine knowledge at these different orders, by weighing all their different recommendations for your next word.
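To make "weighing recommendations from different orders" concrete, here is a minimal sketch of a fixed-weight interpolated n-gram model, which is one simple way to do it and not necessarily what any real system uses. It assumes a toy corpus and hand-picked weights; the function names `train`, `next_word_scores`, and `predict` are hypothetical. Order 0 looks at no previous words, order 1 at the last word, order 2 at the last two, and each order votes with its relative frequencies, scaled by its weight.

```python
from collections import Counter, defaultdict

def train(words, max_order):
    """For each context of 0..max_order-1 preceding words, count which word came next."""
    counts = defaultdict(Counter)  # context tuple -> Counter of following words
    for i, word in enumerate(words):
        for k in range(max_order):  # k = how many preceding words form the context
            if i - k < 0:
                break
            counts[tuple(words[i - k:i])][word] += 1
    return counts

def next_word_scores(counts, history, weights):
    """Blend each order's recommendation; weights[k] applies to the k-word context."""
    scores = Counter()
    for k, weight in enumerate(weights):
        if k > len(history):
            break  # not enough history for this order yet
        context = tuple(history[len(history) - k:])  # last k words (empty when k == 0)
        followers = counts.get(context)
        if not followers:
            continue  # this order has nothing to say about an unseen context
        total = sum(followers.values())
        for word, c in followers.items():
            scores[word] += weight * c / total  # weighted relative frequency
    return scores

def predict(counts, history, weights):
    """Pick the highest-scoring next word, or None if no order has any data."""
    scores = next_word_scores(counts, history, weights)
    return max(scores, key=scores.get) if scores else None

corpus = "the cat sat on the mat the cat ate".split()
model = train(corpus, max_order=3)
weights = [0.1, 0.3, 0.6]  # hypothetical: trust longer contexts more
print(predict(model, [], weights))             # the
print(predict(model, ["the"], weights))        # cat
print(predict(model, ["on", "the"], weights))  # mat
```

With no history only the one-word frequencies speak, so the most common word wins; after "on the" the three-word order, which has seen "on the mat", outvotes the two-word order's preference for "cat".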
This correlation order approach can also be applied at different scales. For example, given some classification of your first sentence, what kind of second sentence should follow? Given a classification of your first chapter, what kind of second chapter should follow? Many other kinds of knowledge can be similarly broken down into correlation orders, at different scales. We can do this for music, paintings, interior decoration, computer programs, math theorems, and so on.
Given a huge database, such as of writings, it is easy to get good at very low orders; you can just use the correlation frequencies found in your dataset. After that, simple statistical models applied to this database can give you good estimates for correlations to use at somewhat higher orders. And if you have enough data (roughly ten million examples per category, I’m told), then recently popular machine learning techniques can improve your estimates at the next set of higher orders.
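A small illustration of why raw frequencies only carry you so far, assuming a toy six-word list: as the order grows, a larger share of word runs are seen exactly once, so their raw frequencies say little. Add-k (Laplace) smoothing is swapped in here as one example of a "simple statistical model" for somewhat higher orders; the post doesn't say which models it has in mind, and the function names are hypothetical.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count every run of n consecutive words (assumes n <= len(words))."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def singleton_fraction(words, n):
    """Share of distinct n-grams seen exactly once: a rough gauge of data sparsity."""
    counts = ngram_counts(words, n)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

def add_k_prob(words, context, word, k=1.0):
    """Add-k estimate of P(word | context); unseen combinations still get some probability."""
    context = tuple(context)
    grams = ngram_counts(words, len(context) + 1)
    vocab_size = len(set(words))
    joint = grams[context + (word,)]  # Counter returns 0 for unseen n-grams
    context_total = sum(c for g, c in grams.items() if g[:-1] == context)
    return (joint + k) / (context_total + k * vocab_size)

words = "a b a b a c".split()
for n in range(1, 5):
    print(n, round(singleton_fraction(words, n), 2))
print(add_k_prob(words, ["a"], "b"))  # 0.5
print(add_k_prob(words, ["c"], "a"))  # unseen context falls back to uniform over the vocabulary
```

On this toy list the share of once-seen runs climbs from about a third at orders one and two to all of the four-word runs, which is the sparsity that makes raw counts unreliable at higher orders.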
There are some cases where this is enough; either you can get enormous amounts of data, or learning low order correlations well is enough to solve your problem. These cases include many games with well defined rules, many physical tasks where exact simulations are feasible, and some kinds of language translation. But there are still many other cases where this is far from enough to achieve human level proficiency. In these cases an important part of what we know can be described as very high order correlations produced by “deep” knowledge structures that aren’t well reduced to low order correlations.
After eighteen years of being a professor, I’ve graded many student essays. And while I usually try to teach a deep structure of concepts, what the median student actually learns seems to mostly be a set of low order correlations. They know what words to use, which words tend to go together, which combinations tend to have positive associations, and so on. But if you ask an exam question where the deep structure answer differs from the answer you’d guess by looking at low order correlations, most students give the wrong answer.
Simple correlations also seem sufficient to capture most polite conversation, such as “the weather is nice,” “how is your mother’s illness,” and “damn that other political party.” Simple correlations are also most of what I see in inspirational TED talks, and in what public intellectuals and talk show guests say when they pontificate on topics they don’t really understand, such as quantum mechanics, consciousness, postmodernism, or the need for more regulation always and everywhere. After all, media entertainers don’t need to understand deep structures any better than do their audiences.
Let me call styles of talking (or music, etc.) that rely mostly on low order correlations “babbling”. Babbling isn’t meaningless, but to ignorant audiences it often appears to be based on a deeper understanding than is actually the case. When done well, babbling can be entertaining, comforting, titillating, or exciting. It just isn’t usually a good place to learn deep insight.
As we slowly get better at statistics and machine learning, our machines will slowly get better at babbling. The famous Eliza chatbot went surprisingly far using very low order correlations, and today chatbots best fool us into thinking they are human when they stick to babbling style conversations. So what does a world of better babblers look like?
First, machines will better mimic low quality student essays, so schools will have to try harder to keep such students from using artificial babblers.
Second, the better machines get at babbling, the more humans will try to distinguish themselves from machines via non-babbling conversational styles. So expect less use of simple easy-to-understand-and-predict speech in casual polite conversation, inspirational speeches, and public intellectual talk.
One option is to put a higher premium on talk that actually makes deep sense, in terms of deep concepts that experts understand. That would be nice for those of us who have always emphasized such things. But alas there are other options.
A second option is to put a higher premium on developing very distinctive styles of talking. This would be like how typical popular songs from two centuries ago could be sung and enjoyed by most anyone, whereas popular music today is matched in great detail to the particular features of particular artists. Imagine most all future speakers having personal talking styles as distinctive as those of today’s popular singers.
A third option is more indirect, ironic, and insider-style talk, such as we tend to see on Twitter today. Here people use words, phrases, and cultural references in ways that only folks very near in cultural space can clearly recognize as within recent local fashion. Artificial babblers might not have enough data to track changing fashions in such narrow groups.
Bottom line: the more kinds of conversation styles that simple machines can manage, the more humans will try to avoid talking in those styles, at least when not talking to machines.
This post reminds me of something I read somewhere about psychopaths: that they don't understand the emotional content of normal human language, but learn how to use language instrumentally to con people into doing what they want. I think this was in Robert Hare's book "Without Conscience". In one sense the psychopath has a shallower understanding of language than a normal person, and in another sense a deeper one. Similarly, I wonder if machines could become surprisingly persuasive.
Yes and no. This is an area I’ve worked on: adding fluctuations to computer-performed music (specifically, note length and volume) to make it sound human-like. Pretty crude randomness of the right kinds is sufficient to fool the ear.
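The comment doesn’t describe the method beyond “crude randomness of the right kinds”, so what follows is only a hypothetical sketch of that kind of humanizing. It assumes Gaussian jitter on each note’s duration and on a MIDI-style velocity (loudness) value, with made-up spread values; the `Note` type and `humanize` function are illustrative names, not the commenter’s code.

```python
import random
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI note number (left unchanged)
    duration: float  # note length in seconds
    velocity: int    # MIDI-style loudness, 1..127

def humanize(notes, duration_spread=0.03, velocity_spread=8, seed=None):
    """Return copies of notes with small random changes to length and loudness.

    Assumed model: duration scaled by a Gaussian factor around 1.0,
    velocity shifted by a rounded Gaussian amount; both spreads are illustrative.
    """
    rng = random.Random(seed)
    out = []
    for note in notes:
        factor = max(0.5, rng.gauss(1.0, duration_spread))  # guard against tiny or negative lengths
        velocity = note.velocity + round(rng.gauss(0.0, velocity_spread))
        velocity = min(127, max(1, velocity))  # clamp to the valid MIDI range
        out.append(Note(note.pitch, note.duration * factor, velocity))
    return out

melody = [Note(60, 0.5, 80), Note(62, 0.5, 80), Note(64, 1.0, 90)]
for note in humanize(melody, seed=42):
    print(note)
```

Passing a seed makes a given “performance” repeatable, and setting both spreads to zero returns the notes unchanged, which makes it easy to compare the mechanical and jittered versions side by side.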