Wittgenstein and Searle's Chinese room
Written by Sil Hamilton in March 2022.

Academics working in the philosophy of mind have spent the last few decades developing methods for refuting a well-cited paper by John Searle. In what is commonly dubbed the Chinese Room Argument, Searle argues that because computer programs are syntactic in nature (i.e., they manipulate symbols according to formal rules, as in a λ-calculus), they cannot access the semantic meaning of any human input (say, a sentence written in some language). That is to say, they are not capable of understanding the meaning of this input. Please see the link for a more detailed description if you are unfamiliar with the argument. The gist is that computer programs cannot understand the fundamental meaning of a sentence written in some human language: they recognize the form, but not the content. The program might know both the grammar of the sentence and the dictionary meaning of each word, using these to produce a response to any question according to the rules of engagement laid out by a language; but none of these facts allows the computer to understand the sentence. Searle believes computers cannot assign meaning to the symbols forming a sentence, meaning they are effectively blind to everything save what their programming allows them to accomplish.
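To make the point concrete, here is a toy sketch of the kind of program Searle has in mind (the rulebook and canned replies are wholly invented for illustration): it matches the form of its input against patterns and emits stock responses, with no representation anywhere of what the symbols mean.

```python
import re

# A toy "Chinese room": a purely syntactic pattern-matcher. The rulebook
# below is entirely hypothetical; the point is that nothing in it
# represents what any symbol means.
RULEBOOK = [
    (r"how are you\??", "I am well, thank you."),
    (r"what is your name\??", "My name is of no consequence."),
    (r".*weather.*", "I hear it is pleasant this time of year."),
]

def respond(sentence: str) -> str:
    """Reply by matching the form of the input, never its content."""
    normalized = sentence.strip().lower()
    for pattern, reply in RULEBOOK:
        if re.fullmatch(pattern, normalized):
            return reply
    return "I do not understand."  # no rule fired: the room is blind

print(respond("How are you?"))      # -> I am well, thank you.
print(respond("Why do we dream?"))  # -> I do not understand.
```

Whatever fluency the rulebook buys, the program's competence ends exactly where its patterns do.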

His argument implicitly depends on a larger distinction in linguistics between private and public languages. In his 1986 book Knowledge of Language, Noam Chomsky argued humans command two classes of language. He dubbed these the E-Language and I-Language, standing for externalized and internalized respectively. The E-Language is what we use to talk to each other. It has a very particular set of grammar rules by which sentences are constructed, rules that help ensure consistency and reliability in communicative exchanges. This blog post is an example of it. The I-Language is a deeper structure present within our mind, hidden away from our consciousness. It is a mental encoding of language, that deeper layer where semantics are decompressed. Chomsky believes the I-Language is the source of the E-Language, the interface by which the latter is generated and interpreted: it produces grammatical sentences from some private set of intuitions regarding how the E-Language should work, and interprets them in turn. The classic example for evoking the difference between the two is: "colourless green ideas sleep furiously." This sentence is clearly semantic nonsense, but it is grammatically sound. How do we know this? I-Language.
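Before turning to Wittgenstein, the split can be seen in miniature. A toy context-free grammar (written here with NLTK; the five-word grammar is my own illustrative construction, not a grammar of English) happily parses Chomsky's sentence, certifying its form while consulting no semantics at all.

```python
import nltk

# A five-word toy grammar: it certifies the *form* of the sentence
# and consults no semantics whatsoever.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Adj NP | N
    VP  -> V Adv
    Adj -> 'colourless' | 'green'
    N   -> 'ideas'
    V   -> 'sleep'
    Adv -> 'furiously'
""")

parser = nltk.ChartParser(grammar)
tokens = "colourless green ideas sleep furiously".split()

for tree in parser.parse(tokens):
    tree.pretty_print()  # prints a full parse: the sentence is well-formed

# The reversed word order, equally meaningless but ungrammatical,
# yields no parse at all:
assert not list(parser.parse(tokens[::-1]))
```

Searle argues a computer would find no issue with the sentence, and indeed the parser above finds none. Ludwig Wittgenstein thinks otherwise.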

"For we can avoid unfairness or vacuity in our assertions only by presenting the model as what it is, as an object of comparison – as a sort of yardstick; not as a preconception to which reality must correspond," (s131).

"Every sign by itself seems dead. What gives it life? – In use it lives. Is it there that it has living breath within it? – Or is the use its breath?" (s432).

"…One fails to keep in mind the fact that one calculates, operates, with words, and in due course transforms them into this or that picture. – It is as if one believed that a written order for a cow, which someone is to hand over to me, always had to be accompanied by a mental image of a cow if the order was not to lose its sense," (s449).

"'After he had said this, he left her as he did the day before.' – Do I understand this sentence? Do I understand it just as I would if I heard it in the course of a report? If it stood alone, I'd say I don't know what it's about. But all the same, I'd know how this sentence might perhaps be used; I could even invent a context for it," (s151).

Wittgenstein's position is clear: a particular instance of E-Language is given meaning by the reader, not the speaker. Owing to the privacy of qualia, we cannot know the mind-state of another (although we may develop very good guesses). We are fundamentally alienated from each other. It is our I-Language that provides the semantics, the context, the interpretation of a sentence. If our I-Language can provide these, then we can say we understand the sentence, whatever the (so-called) true intention behind it may have been. How do we verify this? We play the relevant language-game. In practice, the object of the game is to predict the behaviours of others. If we wish to persuade or coerce another into performing a set of actions, we undertake a series of steps: we first judge which particular language the two of us share; we then generate a grammatical sentence (imperative or not) and communicate it. If the other agent acts according to our expectations, then we can say we have imparted meaning onto the sentence (that is, the sentence is meaningful from our point of view). I argue the same is true for the computer program.
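Read operationally, that language-game is a small protocol. The sketch below renders its steps as code; every name in it (Agent, play_language_game, and so on) is my own invention for illustration, not Wittgenstein's formalism.

```python
from dataclasses import dataclass
from typing import Callable

# A schematic rendering of the language-game as behaviour prediction.
@dataclass
class Agent:
    languages: set
    act: Callable[[str], str]  # maps an utterance to observable behaviour

def play_language_game(our_languages: set, other: Agent,
                       utterance: str, expected_behaviour: str) -> bool:
    """The sentence counts as meaningful (for us) iff the other agent
    behaves as we predicted upon hearing it."""
    # Step 1: judge which language the two agents share.
    if not (our_languages & other.languages):
        return False  # no common game to play
    # Step 2: communicate the grammatical sentence and observe.
    observed = other.act(utterance)
    # Step 3: meaning is imparted iff behaviour matches expectation.
    return observed == expected_behaviour

# e.g. asking someone to pass the salt and watching what they do
clerk = Agent(languages={"English"}, act=lambda u: "passes the salt")
print(play_language_game({"English", "French"}, clerk,
                         "Could you pass the salt?", "passes the salt"))  # True
```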

To return to Searle: his assumption went unchallenged by working systems for many years. Programs deploying natural language processing were script-like in nature, the mid-1960s ELIZA being a famous example. They were developed with grammars in mind: a finite set of rewrite rules for generating grammatical sentences. ELIZA could not handle situations its scripts did not anticipate. This remained the case for nearly all chatbots until just four years ago, with the advent of (seriously) large language models like BERT and GPT. Various advances in deep learning have allowed for the construction of language models capable of producing convincingly human text. They achieve this by dynamically selecting the words in a given sentence most indicative of its semantic context; in this way they are able to determine the semantic content of a sentence. They do not achieve this by any sort of preprogrammed rule-based interpretation strategy, but rather through a "black box" consisting of many matrix-transformation functions trained by reading vast amounts of human text drawn from the Internet. I argue models like GPT have intuited an I-Language by studying examples of human communication: learning which idiomatic structures are intended for which situations; learning how each word in the English language relates to every other word given some set of preceding words. This is unprecedented: many believed a context-free grammar à la Chomsky was required for goading computers into understanding written language. GPT has proven them wrong: nothing but pure examples appears to be required for learning the intimate underpinnings of a particular language. This is because words do not inherently have meaning: we produce the meaning ourselves as we go along, using our understanding of language as a source. It is as Wittgenstein says: "What gives it life? – In use it lives."
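As a small demonstration of that claim, here is a sketch using the Hugging Face transformers library (the gpt2 checkpoint and the scoring scheme are my own illustrative choices): we ask a pretrained model how expected each word of a sentence is given the words before it.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Score how strongly a pretrained causal LM expects each next token.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_logprob(sentence: str) -> float:
    """Mean log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # loss = mean negative log-likelihood
    return -out.loss.item()

# One would expect the grammatical nonsense to score well above its
# scrambled twin: the model has absorbed the distributional form of
# English from nothing but examples of its use.
print(avg_logprob("Colourless green ideas sleep furiously."))
print(avg_logprob("Furiously sleep ideas green colourless."))
```

No grammar was ever written down for it; use alone did the work. To each their own.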