Think of a verse from a holy book. It doesn’t matter which - whichever you’re most familiar with will do.
If it contained at least one layer of recursion (“that lived in the house that Jack built”), you’re probably in the minority, and if it contained a logic operator (“IF-THEN”), then you definitely are.
The reason behind this may seem obvious - people are more likely to place their faith in a god who makes bold assertions than one who adds caveats. The same thing can be seen in advertising and political slogans: a good statement is one that presents a fact rather than inviting the hearer to question its logic. The Chinese Communist Party are past masters of this, and the red propaganda banners that are ubiquitous throughout China often contain noun-phrases of epic length with little or no actual recursion and certainly no logic operators.
This is not the only explanation, however. There is something else going on here: a deeper and far more unsettling reflection of the ways in which humans process and store information.
The Tao of Machine Learning
Recently the Daoist Methodologies Discord server, and in particular remember#5329 (without whom this article would never have been written), has been discussing the use of large language models (linguistic AIs such as GPT-3) to generate plausible imitations of ancient philosophical texts, inspired by the enchanting and often genuinely profound @gods_txt bot on Twitter.
While @gods_txt’s tweets are produced by GPT-2 models fine-tuned using religious texts, remember#5329’s plan was to train a GPT-J model using English translations of Confucian, Daoist and legalist texts, such that it would produce a snippet imitating each school upon command. Since the open source GPT-J model has more parameters than GPT-2, it should, in theory, have produced more convincing results. This turned out to be only partially true.
The Confucian model was the first to complete training and start producing texts that were recognisably Confucian in tone and ethos. The Daoist model took a little longer, but managed it in the end. The legalist model, however, never really made it. It wasn’t so much that it was bad at replicating legalist texts (though it wasn’t great), but that it seemed to be doing everything it could to avoid making the attempt. Given a legalist training set and asked to produce something similar, it would imitate quotations from Confucian texts cited within the corpus, or produce boilerplate physical descriptions, or even mimic the footnotes. If it could avoid trying to copy the long chains of necessary-and-sufficient dialectic propositions that make up the main body of most legalist texts, it would do so. A typical output would be something like this:
He took the seat upon the throne facing south and looked down upon his people, then spoke to them in such a way as to make them understand what was essential. Then there were those who, by paying attention to their own affairs and paying little or no attention to the state of the world, were able to make their states prosperous and their people well provided for.33 Such were men who were good at handling affairs.34 33. This refers to the same practice as in A 32 above and the following lines. 34. The text here follows Xunzi (2:45) and Zhuangzi (1:9) in taking 'those who pay attention to their own affairs',in their treatment of affairs, as a reference to the Confucian gentleman. 5.7 "In order to make your states flourish, there must be people who are good at handling affairs." Now, it was not that all states were the same and that all men were the same. Some states did not flourish, while others did.
It opens with uncontroversial statements of fact before segueing into an apparent imitation of an academic footnote. The last section replicates the vocabulary used in classical legalist texts but without any of the reasoning. (You can find a larger selection of sample legalist outputs here. Some Confucian and Daoist results can be found here and here.)
While the experiment is far from complete, the most probable explanation seems to be that GPT-J has not merely learnt to imitate natural language, but to identify and avoid sentence structures that are likely to cause it to perform poorly, notably those with extensive and complex recursion and frequent use of logic operators - i.e. the kind of sentences that make up the vast majority of most legalist texts.
“You shall know a word by the company it keeps”
In fact, this is a common tendency in large language models, though it is much less obvious when not working with a training set composed largely of the kinds of sentences described above. Consider this prompt and its corresponding GPT-3 output:
Write a story about Scheherazade telling the Sultan a story about Sinbad the Sailor.
Once upon a time, Scheherazade told the Sultan a story about Sinbad the Sailor. Sinbad was a brave and bold sailor who went on many adventures. He sailed to faraway lands and saw many wonders. One day, he sailed to a land called the Land of the Giants. The giants were so big that they could pick up Sinbad's ship and throw it into the sea. Sinbad and his men were terrified. They thought they would die for sure. But Sinbad was a resourceful man. He came up with a plan. He had his men build a giant raft. They tied the raft to the ship. When the giant came, he picked up the ship and threw it into the sea. But the raft held fast. Sinbad and his men were safe. They sailed away and lived to tell the tale.
Dubious hydrodynamics aside, it’s an impressive technological feat, but how recursive is it?
As a human used to parsing vague or mangled data on a regular basis, you very likely read the first two sentences recursively. The first tells us that we’re going to hear a story about Sinbad and the second begins that story. However, there is nothing in the text to connect the two. The reader’s brain fills in the gap and concludes, naturally, that sentence two is the story promised in sentence one. It isn’t a complete logical disconnect: the model has correctly worked out that the introduction should refer to Scheherazade and the Sultan while the rest of the text focuses on Sinbad; it just hasn’t added any of the verbal chaff a human would use to emphasise this. Similarly, after the giants are introduced we get - as one would expect - a description of giants, but still without any explicit verbal explanation of the link between the two. Most readers will fill this in themselves out of habit, but why is the AI so reluctant to do the job itself? The likely explanation is that adding logic operators and recursion disproportionately increases the chances that the model will make a mistake.
Look at the following string of sentences: “He came up with a plan. He had his men build a giant raft. They tied the raft to the ship.” A human (or at least a human who had never encountered a boat before) would be much more likely to say “He came up with a plan to have his men build a giant raft and tie it to the ship.” However, this requires an ability on the part of the speaker to keep multiple ideas in his head at the same time, something that large language models cannot do: they build their sentences by calculating the most likely next word(s) based upon examples encountered in training. Being simple probability engines, they have no concept of concepts. When producing a sentence, such models rely upon a map of word “embeddings” (numerical representations derived from the contexts in which each word has previously been encountered) to decide which word to use next. Thus, if a model has begun a sentence with “mix butter and sugar before adding the”, then “flour” is a more likely next word than “brown recluse spiders”.
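The “most likely next word” mechanic can be sketched with a toy bigram model. The corpus and counting scheme below are invented for illustration; real models learn vector embeddings over billions of words rather than raw counts, but the principle of picking the continuation most often seen in training is the same:

```python
from collections import Counter, defaultdict

# A tiny illustrative corpus. Real models train on billions of words
# and use learned vector embeddings, not raw bigram counts.
corpus = (
    "mix butter and sugar before adding the flour . "
    "mix butter and sugar before adding the flour . "
    "mix butter and sugar before adding the eggs ."
).split()

# Count which words follow each word: the crudest possible
# stand-in for "most likely next word given the context so far".
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the continuation seen most often in training."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("the"))  # "flour" beats "eggs" two to one
```

Everything the model “knows” is in that frequency table; there is no representation of what flour or eggs actually are.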
By contrast, the connecting words used to express logic and recursion - “but”, “hence”, “then” etc. - are often used in identical contexts despite having the power to entirely reverse the sense of a given statement. “Frank was in the pub so I didn’t hang around” and “Frank was in the pub but I didn’t hang around” are equally plausible, but say entirely different things about your relationship with Frank. The model is incapable of understanding that there exists a logical relationship between the first and second halves of the sentence that it is building - it has no concept of concepts - so it is unable to do much better than make a random guess when it comes to working out whether “so” or “but” would be more appropriate. It does, however, know that if it uses any conjunction at all rather than simply saying “Frank was in the pub; I didn’t hang around” it is far more likely to be judged to have failed. Thus it eschews connectors altogether. Returning to the original example, the model - being unable to conceptualise the relationship between the plan, the raft and the ship - has learnt that the most successful approach is simply to state that these things exist and allow the reader to supply the connections, rather than taking a guess and getting it wrong.
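The asymmetry can be made concrete under the same toy frequency model (all counts below are invented for illustration): a content word like “flour” dominates its context, while “so” and “but” split theirs almost evenly, so any guess risks reversing the sense of the sentence:

```python
from collections import Counter

# Hypothetical frequency counts (invented for illustration): how often
# each candidate word followed a given context in "training".
after_recipe_context = Counter({"flour": 950, "brown recluse spiders": 1})
after_pub_context = Counter({"so": 510, "but": 490})

def confidence(counts, word):
    """P(word | context) under a simple frequency model."""
    return counts[word] / sum(counts.values())

# The content word is close to certain; the connectives are a coin
# flip, and picking the wrong one reverses the sentence's meaning.
print(confidence(after_recipe_context, "flour"))
print(confidence(after_pub_context, "so"))
print(confidence(after_pub_context, "but"))
```

Seen this way, stating facts side by side and omitting the connective is simply the statistically safer strategy.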
Human-Level Intelligence
And here’s where it gets interesting, because humans have no innate capacity for recursive speech either, and not all humans do it. Most famously, the Pirahã language of Brazil seems to contain no recursive structures, and stories told in it show a similar rhythm to the one given above[1]. While some facets of language can be traced back to particular areas of the brain, recursion is a learned skill. If you don’t pick it up in the first few years of your life you will never manage it. One current hypothesis concerning its emergence in human languages posits the intervention of a genetic mutation around 70,000 years ago that slowed down the brain development of children in one family, giving them enough time to develop a recursive “twin language” before their neuroplasticity faded with age. These children grew up and taught their private language to their own children, and so on down to us. Or some of us, at any rate.
Because recursive speech is an acquired characteristic, its development - at least among children raised in a recursive speech environment - seems to be strongly tied to intelligence. While few of us would struggle with “he came up with a plan to have his men build a giant raft and tie it to the ship”, experiments show that those with lower IQs tend to have difficulty with more complex forms of recursion, such as stories involving one person speculating on how another might think in a given situation. Add in second-order effects and even those with relatively high IQs will begin to struggle: political science students all memorise the features of the checks and balances system - a mechanism whose effects are produced by the interaction of multiple individuals’ responses to one another’s actions - but a quick skim of their essays soon reveals that only a small proportion ever genuinely understand it. When asked to write on the topic, they simply string together vague approximations of sentences that they have encountered in their reading, in much the same way as GPT-3 produced the Sinbad story above: they are simulating cognition rather than actually doing it. The reason that GPT-3-generated essays have become so prevalent in higher education is precisely because students were already using much the same techniques to generate their own content, and the differences between the outputs are thus minimal.
Conversely, the absence of recursion and logic operators in advertising and propaganda slogans suggests that it is possible for strategic actors to deactivate these and related psychological functions. The utility of “sparing-no-effort-in-composing-a-new-chapter-of-comprehensively-building-a-socialist-modern-Beijing-under-the-guidance-of-Xi-Jinping-thought-on-socialism-with-Chinese-characteristics-for-a-new-era” does not lie in its capacity to encourage action, but in the impossibility of argument. Because it is not a true sentence there is no way to dispute it. Recursive elements would open it up to question, and logic operators would be the kiss of death: by that point the reader’s proof-checking circuits have been engaged and you have lost them. Give them a conceptually bewildering but linguistically simple statement, however, and the vast majority of readers will not even register it as an invitation to interact, with most of the remainder being too lazy to attempt to extract any sense from it. A significant proportion of the people who read the Sinbad story above probably did not notice that it is not an accurate description of how either rafts or boats work. (Be honest: did you?) Because it does not demand that the reader’s brain engage in an unnatural activity - multithreading - it flies under their intellectual radar, and because multithreading is a learned skill, one may assume that it declines with lack of use. The less it is triggered, the less likely it is to be applied in the future: something that will make the jobs of both advertising agency reps and the Communist Party much easier[2].
All of which is not to say that the success of Confucianism and the world’s other major faiths and philosophies is due to the fact that they appeal principally to low-IQ individuals - anyone who has ever read a Ming dynasty commentary on the Analects would immediately dispute that. However, it does suggest that the popularity and the longevity of the core Confucian texts are likely connected. This can be attributed to two related and essentially mathematical principles. Firstly, they can be stored and reproduced by the brains of a comparatively high proportion of the population. The more back-ups a data element has, the longer its half-life. Secondly, even readers capable of parsing the most highly recursive legalist texts will nevertheless tend to find Confucian material more immediately satisfying, calling as it does upon innate functions in preference to learned ones. In simple terms, Confucianism is easy to like. Even if you know a hundred linguistically complex Bible quotes, chances are that when challenged to cite one you chose one of the simpler sentences, just because those are the ones that the brain finds easiest and most enjoyable to store and reproduce.
Not that the legalist authors would necessarily have been bothered by this - indeed, their reliance upon sentence structures that defy the comprehension of a large section of society may well have been a deliberate strategy on their part. While Confucianism requires mass adherence to function, legalist policies tend to work better when as few people as possible know that they are being applied. The works that have come down to us contain descriptions of sophisticated cryptographic systems intended to ensure that information is transferred accurately on a strictly need-to-know basis, and there is no reason that their literary style should not be another example of this.
[1] Daniel Everett gives multiple examples of this. For example, a story told by a Pirahã named Kaaboogí: “There the jaguar pounced on my dog, killing him. It happened with respect to me. There the jaguar killed the dog by pouncing on it. With respect to it, the jaguar pounced on the dog. I thought I saw it. Then I, thus the panther, pounced on my dog. Then I spoke. That this [is the work of] a panther. Then I spoke with respect to the panther. Here is where it went. I think I see [where it went]. Uh, I said. The jaguar then jumped up on the log. As for the dog, the panther pounced on it. The panther killed the dog by hitting it. Then when I had gunshot the jaguar it began to fall. To Kaapási I spoke. Throw a basket [to me]. Throw me a basket. [It is] to put the dog into. The cat is the same. It pounced on the dog. The panther pounced on the dog. Thus it caused him to be not. Put the jaguar into the same basket with the dog. Put it in with the dog, he caused the dog to be not. He has therefore already [died]. You have the jaguar parts in the basket. Put the basket on your head. The dog then at night smelled him for sure then. It is right on top of the dog. It pounced on the dog and killed him. It wanted to pounce on the dog. It really wanted to. Then I was talking, then Kaapási he, animal, he… Don’t shoot from far away. Be shooting down on it. I moved quickly down toward the action onto the trunk, [I] killed it, thus it changed [died]. It was dying. I wasn’t able to leave therefore. OK, then, it thus came to die. Then it was coming to die. Then Kaapási, OK, he shot it. Then the animal thus changed and was dying. The animal stood up. It went away again. Its dying was lingering. I therefore shot it again, breaking its elbow. Then I shot it again. I then shot it again then. It came to die. It came to die. It had thick fir [a Pirahã way of saying that it was tough]. It intended thus to die. He did not move. It is really tough. He had not died. 
[I said] “That foreigner, you [Dan], the foreigner, have not seen a jaguar dead. Then right away, [I] moved it, right then. Then cats, Xisaitaógi [Steve Sheldon] has already seen. Here jaguars [he has seen], only panthers Steve Sheldon has not yet seen. Now the Pirahãs have just now shot [a jaguar], right now. Then the Pirahãs are intensely afraid of panthers. OK, I’m done.” It is important to note that this should not be read as indicating low intelligence on the part of the Pirahã - as the story shows, they are perfectly capable of making logical deductions and dealing with the consequences of these. (Could you deal so competently with a jaguar attacking your dog?) However, anyone who was not raised in proximity to recursive speech during the crucial few years of childhood in which it can be acquired will find it inaccessible forever after, even the most intelligent Pirahã. Had you been raised in a Pirahã-speaking environment you would currently be in the same situation, though with the benefit of English you have already made it to the end of this article. Children raised speaking Pirahã and another language, on the other hand, appear to have no trouble dealing with recursion when speaking their other language.
[2] As in so many other fields the Pirahã seem to offer a challenge to this, being perfectly capable of exercising logical thought. It seems likely that seeing an IF-THEN operator is only one of multiple triggers for active reasoning/questioning. A jaguar pouncing on one’s dog likely performs the same function as well or better.
I like levels-of-recursion as a shortcut for cognitive complexity. Having spent the greater part of my working life as a corporate lawyer, the most draining work per minute of time was complex drafting, precisely because it requires the draftsman to fix multiple concepts in their mind simultaneously and hold them there until the paragraph is finished. I’ve found programming similar in terms of butting up against my own cognitive limits.
GK Chesterton’s essay “On the Novel with a Purpose” comes to mind here.