In Part I, I began to frame the discussion around the capabilities of LLMs, and around language in general, in terms of Chad Hansen’s thesis that, according to classical Chinese philosophy, the primary role of language is prescriptive (so it can be used to coordinate and regulate behavior), rather than descriptive (so it can be used to represent or picture facts and reality). This is the difference between pragmatic and semantic notions of truth. According to the Chinese view (which Hansen terms “daoist” to indicate the core concept of dao or way, a potential pattern of behavior or discourse), the chief criterion is the empirical success of actions and behaviors regulated by a linguistically expressed dao, rather than the propositional truth of sentences in that language. In this context, Hansen distinguishes the discourse dao from the performance dao. The former is encoded in language; the latter is realized through acts. These acts include speech acts, which opens the way to self-reference and the various paradoxes associated with it.[1] For Hansen, classical Chinese philosophy of language has two main concerns:
Is the discourse dao a reliable guide for performance dao?
How can performance dao be aligned with discourse dao?
According to what Hansen calls the “computer analogy,” we use language to program others and are in turn programmed by them. This is how language (a World 3 artifact, in Popper’s interactionist philosophy) makes contact with, and has causal efficacy in, World 1 of material entities. Here’s how Hansen frames it:
Society uses language to guide our behavior. Elders teach us to conform to conventional ways of making distinctions among thing kinds in choosing and rejecting courses of action. We do not learn language in isolation from other ritual practices. Language guides behavior because learning the community's language induces us to adopt a socially shared way of reacting differentially to the world. Chinese thinkers explain this function of language in terms of the scope structure that dominated their attention.
There are two main currents of thought regarding this: the pragmatic realist one (underlying both baseline Confucianism, with its emphasis on proper rites and a well-ordered social hierarchy, and various reactions to it, such as Mozi’s utilitarianism, Mencius’ innatism, or the neo-Mohist school of names) and the skeptical one (daoism proper, associated with Laozi and Zhuangzi). I will leave the daoist skeptics aside for Part III; here I will focus on the realists.
The Western-Chinese axis as the symbolist-connectionist axis
Hansen’s computer metaphor was inspired by the cognitivist turn in philosophy of mind, associated with people like Daniel Dennett or Paul Churchland. The cognitive science revolution has rekindled the debate between the adherents of the symbolic paradigm of Good Old-Fashioned AI and the followers of the connectionist view originally promoted by Frank Rosenblatt and later instantiated in the theory and practice of neural net learning. The symbolic view emphasized sequential information processing and logical reasoning, whereas the connectionist view emphasized parallel distributed processing, without any explicit symbolic intermediate representations. Hansen comes down on the connectionist side as he rejects the view of programs as arguments-on-paper and instead goes for the input-output formulation.
From this vantage point, the GOFAI symbolic approach is tied to logical reasoning and problem solving, as laid out in the works of Herbert Simon and Allen Newell: The agent’s environment poses problems, which are reflected as external experience. Experience, in the form of sense data, is converted into internal symbolic representations which are then manipulated according to computational rules of inference. David Marr expresses this sentiment forcefully in the final chapter of his Vision:[2]
[Retinal image] is a continuous two-dimensional array with few points of manifest interest. Yet by the time we talk about people or cars or fields or trees, we are clearly being very symbolic, and I think again that most would find suggestions of symbols in Hubel and Wiesel's (1962) recordings. Our view is that vision goes symbolic almost immediately, right at the level of zero-crossings, and the beauty of this is that the transition from the analogue arraylike representation to the discrete, oriented, sloped zero-crossing segments is probably accomplished without loss of information.
This is the internalist-symbolic take on cognition, tied to Fodor’s mentalese. By contrast, from the connectionist viewpoint, the external experience is an input that triggers the execution of a program implemented in a parallel distributed fashion by a deep neural net. This is how Hansen phrases it:
Western thought, in effect, treats experience as the programming step: experience generates the inner language of ideas with which we calculate. Chinese thought treats the programming as the social process of reading in guiding discourse. Experience then merely triggers execution of the program. The computer has program control tied to the external state of affairs. The senses provide discrimination—branching input to trigger different parts of the socialized program.
Hansen does not mention Hayek, but this is the Hayekian sensory order in a nutshell, a proto-connectionist framework emphasizing the role of the nervous system (or, in daoist terms, xin, the heart-mind) as an instrument of sophisticated classification. The classifications are made in a parallel, distributed fashion, involving both pattern recognition and pattern prediction. As we will see next, the classical Chinese emphasis on names (ming), distinctions (bian), and the this/not-this (shi-fei) system of positive and negative reinforcement fits these ideas quite well.
Rectification of names, rectification of tokens?
Hansen frames his analysis of Chinese social-pragmatic philosophy of language in terms of the basic notion of ming, names. The dao of the guiding discourse consists of these names, which closely correspond to the concept of tokens in LLMs. Viewed through this lens, rectification of names, zhengming, refers to the process of aligning the tokens in the discourse dao with the correct tokens in the performance dao. In other words, it’s what we know as RLHF (reinforcement learning from human feedback). Hansen puts it thus:
Executing a dao-program requires that we correctly register the external conditions and sensitively adjust our responses. At the base of that requirement is this: we should know the boundary conditions for applying each name used in the guiding discourse. This explains the role of rectifying names and its importance in the Confucian model. Confucius does not use definitions. His concern would be with correct behavioral response, not with cognitive content or meaning. Having control of the word amounts to triggering the right procedure in response to external conditions. Besides, giving a definition would merely duplicate the interface problem. More words or internal programming can help only if the programmer has already properly adjusted our de (virtuosity) to the external conditions for application of those words. A definition can only help if we correctly apply the words in the definition, so it cannot be the general solution to the problem of adjusting behavior. The basic solution is the equivalent of debugging. Run the program in real time and have the teacher (programmer) correct errors.
Linguistic input, arranged as a string of tokens, induces distinctions (Hayekian classifications or discriminations, which can be arbitrarily complex), which ideally should result in acts and behaviors deemed shi as opposed to acts and behaviors deemed fei. The often-expressed view is that LLMs serve as an empirical, constructive proof that intelligent behavior is all about “predicting the next token.” But, if we apply the more sophisticated view based on aligning the discourse dao with the performance dao, it’s not really about predicting the next token, but rather about performing appropriate actions in appropriately categorized situations. You need a good tokenizer (the generator of ming) and a good policy for interacting with your social milieu via the interface of tokens.
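To make the "debugging" picture a little more concrete, here is a toy sketch of rectification as error-driven correction. Everything in it is my own illustrative assumption rather than anything found in Hansen or in an actual RLHF pipeline: a single name is applied to a scalar observation whenever the observation reaches a boundary, a teacher supplies the shi/fei verdict, and the hypothetical `rectify` function nudges the boundary after each fei until the teacher stops correcting.

```python
def rectify(boundary, examples, step=0.5, max_rounds=20):
    """Adjust the boundary for applying a name until the teacher stops correcting.

    `examples` is a list of (observation, teacher_applies_name) pairs.
    All names and the update rule are hypothetical, chosen for illustration.
    """
    for _ in range(max_rounds):
        corrections = 0
        for observation, teacher_applies_name in examples:
            learner_applies_name = observation >= boundary
            if learner_applies_name == teacher_applies_name:
                continue  # shi: the response is approved, nothing changes
            corrections += 1  # fei: the teacher corrects the response
            if teacher_applies_name:
                boundary -= step  # the name should have applied: widen its scope
            else:
                boundary += step  # the name should not have applied: narrow it
        if corrections == 0:
            break  # a full pass with no corrections: the name is "rectified"
    return boundary


# The teacher applies the name to observations of 3.0 and above.
lessons = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]
learned = rectify(boundary=0.0, examples=lessons)
print(learned)
```

Note that the learner is never given a definition of the name. It only receives corrections on its responses, which is the point of Hansen's remark that a definition would merely duplicate the interface problem.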
The Mohist critique (reward is not quite enough)
Mozi’s critical response to the Confucian tradition centered on the issue of reliability and constancy. If the goal of rectifying names is to promote appropriate distinctions that lead to appropriate responses and actions, how do we know that the discourse dao sets the right standard? Mozi’s philosophy is utilitarian, although his utilitarianism is neither hedonistic nor subjective. His evaluation criteria are framed not in terms of happiness, pleasure, or the satisfaction of desire, but in terms of quantifiable material standards. If we stick with our reinforcement learning analogy, then Mozi’s concern is that reward (the shi-fei signal) is not enough; it needs to be tied to a constant standard. To Mozi, with his background as a craftsman and an engineer, accuracy is just such a constant standard. An action taken in response to a language-driven distinction is reliable if it is reliably accurate in some sense. Low perplexity in LLMs, both at training time and during inference and execution, is one such measure of accuracy. This was elaborated by the philosophers of the neo-Mohist school of names, who constructed a pragmatic semantics of knowledge by framing knowledge as skill (Gilbert Ryle’s “knowing-how”) rather than as justified true belief in the Socratic sense. Moreover, this skill must be matched by a reliable disposition to take successful action, such that the operative distinctions are reliably projectable to previously unseen but similar situations. As I will detail in Part III, Laozi and Zhuangzi were (rightly) skeptical about the constancy and reliability of language informed by fluid and ambiguous social conventions.
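For readers who have not met the term, perplexity is the exponential of the average negative log-probability that a model assigns to the tokens that actually occurred. A minimal sketch follows, assuming we already have those per-token probabilities in hand; the function name and the sample numbers are made up for illustration and are not the output of any real model.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability assigned to each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities: one model is confident in the observed tokens,
# the other spreads its bets and is often surprised.
confident = perplexity([0.9, 0.8, 0.95, 0.85])
hesitant = perplexity([0.2, 0.1, 0.3, 0.25])
print(confident < hesitant)
```

A model that always spreads its probability evenly over four candidates has perplexity 4, and a model that is always certain and always right has perplexity 1, so lower values mean the model's distinctions are tracking what actually comes next.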
Summing up (so far)
The main point made by Hansen is that classical Chinese philosophy of language provides a convincing demonstration that we can develop a rich theory of the interaction of language and action without relying on propositional reasoning and serial information processing. In other words, we can bridge Popper’s Worlds 1 and 3 without going through World 2 of symbol-like mental states. The classical Chinese view of language (and, as it turns out, the modern connectionist paradigm in AI as well) interprets strings of names (tokens) as the external interface guiding program execution, where the program is understood not propositionally as a proof or a syllogism, but as a parallel and distributed mechanism for pattern recognition, pattern prediction, and action. The way I see it, framing the capabilities of LLMs in terms of reasoning, internal representations, and mental states only complicates matters. Instead, inspired by Hansen’s take on daoist “anti-language” skeptics like Laozi and Zhuangzi and by Manuel DeLanda’s materialist phenomenology, I propose a semiotic reframing of LLMs using tokens derived from natural signs (i.e., icons and indices) as opposed to conventional signs (i.e., symbols). This will be the subject of Part III.
(to be continued)
[1] However, as Hansen hastens to emphasize, the paradoxes of self-reference in Chinese philosophy are very different from the ones we are used to in the Western tradition (e.g., the liar paradox). See, e.g., Gongsun Long’s White Horse Dialogue.
[2] I would like to thank Lana Lazebnik for pointing out Marr’s quote to me.
So many cool connections to think about! But I'm a bit confused by this distinction between symbolism and connectionism (maybe in general, but also as you've described it):
> The classical Chinese view of language (and, as it turns out, modern connectionist paradigm in AI as well) interprets strings of names (tokens) as the external interface guiding program execution, where program is understood not propositionally as a proof or a syllogism, but as a parallel and distributed mechanism for pattern recognition, pattern prediction, and action.
Why doesn't the difference just amount to layer(s) of abstraction, i.e. are these two framings actually inconsistent? If such a program exists and will be executed, why can't we probe how it arrives at its outputs, and wouldn't doing that reveal some type of internal representation? Maybe your next post will answer these - I'm very curious to see what "natural signs" means, and why it's inconsistent with "framing the capabilities of LLMs in terms of reasoning, internal representations, and mental states".
Also, I think I am missing something - why the emphasis on parallel+distributed?