4. The Evolution of Mind and Language

4.1 Primate Social Intelligence
4.2 From Scripts to M-scripts
4.3 M-script Evolution
4.4 Domains of Regularity in Language
4.5 Greenberg Universals and the Head Parameter
4.6 Subject and Object
4.7 Competing Explanations of Regularity

The m-script theory gives a working computational model of language learning and use - a theory of `where we are now' in language, differing from others in important ways. For instance, in this theory a language is not defined by a few principles and parameters, and a child has no innate parameter setting mechanisms in his head.

No theory of where we are now is complete without a picture of how we got here - of the processes which led to m-script based language. These processes operate over two timescales:

  1. Processes of biological evolution, operating over millions of years, which gave us our innate capacity to form scripts and m-scripts in our heads.
  2. Processes of language evolution, operating over hundreds and thousands of years, which have led present-day languages to have the form they do.

By understanding these processes, we can see how the approximately regular, parametrised forms of many languages arose from historical language change rather than biological evolution, and so are not an innate endowment of our species. This has important implications for theories of language learning.

4.1 Primate Social Intelligence

I have proposed (Worden 1996) that the cognitive faculties underlying language evolved to support primate social intelligence. As this hypothesis can help us understand the nature and robustness of language learning, I summarise here the key arguments for it:

  1. Language is used for social purposes; therefore it is likely to have arisen from social intelligence.
  2. One facet of our social intelligence is a theory of mind, to represent what others may be thinking. A theory of mind is necessary for language.
  3. Language meanings have many properties in common with the internal representations of social situations, which are required for social intelligence.
  4. Social intelligence requires a fast, robust learning mechanism, which may then have been co-opted for language.
  5. Because of an evolutionary speed limit (Worden 1995) there can be at most 5 kilobytes of new innate design information in the human brain created by evolution since our divergence from chimps. This is too little for the complete design of a language faculty; therefore language must be largely based on pre-existing faculties. Social intelligence is the best candidate.

We know from many observations that most primates have an acute social intelligence, not found in other land mammals (see e.g. Cheney and Seyfarth 1990; de Waal 1982; Tomasello & Call 1994). They recognise one another as individuals, know all about each other's kin and alliance relations, and can rapidly learn regularities about who will do what in what circumstances. To have this social intelligence, all primates need:

  1. To represent in their minds information about social situations past and present - facts about their peers and their social actions.
  2. To learn and represent the causal regularities whereby one social situation leads to another - regularities such as “if X screams and Y is X's mother, then Y will react”.
  3. To combine a knowledge of the present with their knowledge of causal regularities to predict what may happen next, and so to choose actions which further their own ends of stronger alliances and increased rank.

So primates need a mental representation of social situations and causal regularities. In order to be effective, representations in the brain should match the properties of the things they represent (Marr 1982; Johnson-Laird 1983). The social representation in the primate brain should match the properties of social situations, which are:

  1. Structured: A social situation consists of a number of individuals with attributes (identity, sex, rank, mood...) and relationships or interactions (mother-of, grooming, threatening...). The structural way in which these are combined is important; it matters who is grooming whom.
  2. Complex and Open-ended: There may be several individuals in one incident, in a variety of relationships; and several incidents together may constitute a particular situation; the set of possible social situations is a very large set.
  3. Discrete-Valued: Many of the important variables which characterise social situations are discrete-valued (e.g. identity, sex, rank, kin and alliance relations).
  4. Extended in Space and Time: The incidents which make up a social situation may take place over several days or more, at different places.
  5. Dependent on Sense Data of all Modalities: Important information about the social situation may come from vision, hearing, smell, movement or bodily feelings; the social representation must be connected in the brain to all these sense data.

This list bears a remarkable resemblance to the properties of scripts, and of language meanings. The meanings we can express in a script, or in a sentence, are structured, complex and open-ended, discrete-valued, extended in space and time, and may involve sense data of all modalities. This leads to the hypothesis that scripts evolved to support primate social intelligence, and were then co-opted for language; it applies not just to the script representation, but also to the learning and inference mechanisms.

I have built a computational model of primate social intelligence using scripts and compared it with diverse data (particularly from Cheney and Seyfarth, 1990) on primate social intelligence (Worden 1996). This model uses script intersection for learning, and unification for inference. It gives a simple account of many observations of primate social intelligence - such as the learning of alarm calls, habituation to false alarms, use of facts about rank and kinship, and attachment behaviour. These comparisons show that:

It seems likely that strong, sustained selection pressure for social intelligence has led to the near-optimal Bayesian form of social learning.

Therefore general primate social intelligence, as seen in vervet monkeys, is a likely evolutionary origin for the script representation and operations. However, monkeys' social intelligence can be modelled using scripts, not m-scripts; this does not yet account for the unbounded script function capability of m-scripts, which is also essential for language.
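
The two script operations named above, intersection for learning and unification for inference, can be sketched in a few lines. This is a minimal illustration and not the model of Worden (1996): the encoding of a script node as a dictionary of slots with a list of child nodes, the helper names node, intersect and match, and the example episodes are all assumptions made here. The match function is a simplified, one-way form of unification, in which a rule script with variables is fitted onto a fact script.

```python
# Illustrative encoding (an assumption, not the model of Worden 1996):
# a script node is a dict of slot values plus an ordered list of child nodes.

def node(*children, **slots):
    return {"slots": slots, "children": list(children)}


def intersect(a, b):
    """Generalise two scripts by keeping only the information they share."""
    shared = {k: v for k, v in a["slots"].items() if b["slots"].get(k) == v}
    kids = [intersect(x, y) for x, y in zip(a["children"], b["children"])]
    return {"slots": shared, "children": kids}


def match(rule, fact, bindings=None):
    """One-way unification: fit a rule script onto a fact script.

    Slot values beginning with '?' are variables over individuals.
    Returns the variable bindings, or None if rule and fact conflict.
    """
    bindings = {} if bindings is None else bindings
    for slot, wanted in rule["slots"].items():
        if slot not in fact["slots"]:
            return None
        actual = fact["slots"][slot]
        if isinstance(wanted, str) and wanted.startswith("?"):
            # A variable must take the same value everywhere it occurs.
            if bindings.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    if len(rule["children"]) > len(fact["children"]):
        return None
    for r, f in zip(rule["children"], fact["children"]):
        if match(r, f, bindings) is None:
            return None
    return bindings


# Learning: two observed episodes that differ only in who gave the call.
episode_1 = node(node(id="Amy", sex="f"), call="eagle-alarm", response="look-up")
episode_2 = node(node(id="Bob", sex="m"), call="eagle-alarm", response="look-up")
learned = intersect(episode_1, episode_2)

# Inference: a rule with a variable, applied to a new fact.
rule = node(node(id="?X"), act="scream")
fact = node(node(id="Amy", sex="f"), act="scream")
bindings = match(rule, fact)
```

In this sketch the intersection of the two episodes keeps the call and the response but drops the identity and sex of the caller, which is the kind of generalisation from a few examples that the text attributes to script learning.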

4.2 From Scripts to M-scripts

Why did the more powerful m-script facility (needed for language) evolve? Answers to this question are of necessity conjectural, and are not an integral part of the theory of language learning (it need only assume that an m-script capability evolved for some reason); but if there are plausible answers, they lend some support to the theory.

I assume that the m-script capability evolved well before language, and did not evolve just to support language. There are two possible selection pressures (needs for a more capable social intelligence) which may have driven the evolution from scripts to m-scripts, well before language. These are (a) the need for re-construal of social situations, and (b) the need to support a theory of mind.

A. Re-construal of social situations: Any social situation can be regarded from several different points of view - for instance, with different entities or agents `foregrounded' as initiators of change. One may construe the same situation (verbally) as Lucy cried or as Charlie made Lucy cry; in another example, as The green bottle broke or Charlie broke the green bottle. These different verbal construals correspond to different script meaning structures, which are different viewpoints on the same situation.

The more different construals one can make of the same situation, the more one is able to calculate interesting consequences, and so to do something about it. There is a selection pressure to make multiple construals. If there is any automatic way to create second and third construals, it will be selected for.

Given a script representation A of the situation Charlie broke the green bottle, there is a fairly automatic transformation to a script B for the second construal The green bottle broke. Script B is a script function of script A, and this function can be represented as an m-script (not as a simple script). To get from script A to script B, the whole subtree representing the green bottle (or any other breakable entity) must be moved from one place to another in a script tree; trump links are the feature of m-scripts which enables them to do just this.
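
The transformation from script A to script B can be sketched as a function on trees. The encoding below is an assumption made for illustration: scripts are nested tuples of the form (label, child, ...), and the role labels agent, patient and undergoer are hypothetical names, not taken from the theory. The point it shows is that the moved subtree is carried across whole, whatever its size, which is the job the text assigns to trump links.

```python
# Script trees as nested tuples: (label, child, child, ...).
# Labels and role names are hypothetical, chosen only for this sketch.

script_a = (
    "scene:break",
    ("agent", ("entity:Charlie",)),
    ("patient", ("entity:bottle", ("colour:green",))),
)


def reconstrue(script):
    """Map 'X broke Y' onto 'Y broke' by moving the whole Y subtree."""
    label, *children = script
    roles = {c[0]: c[1] for c in children if len(c) == 2}
    if label == "scene:break" and "patient" in roles:
        # The subtree is moved as a unit; its internal structure is not inspected.
        return (label, ("undergoer", roles["patient"]))
    return script


script_b = reconstrue(script_a)
```

Because the function never looks inside the patient subtree, the same rule applies to any breakable entity, however elaborately it is described.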

Linguistically, such re-construal m-scripts are important in the analysis of alternating verb argument structure. One of these m-scripts, for the locative alternation, is shown in Figure 5.2 in section 5.7. However, the important point is that re-construal is an entirely non-linguistic operation; it is needed as a part of general primate social intelligence, independent of language. This may be a selection pressure that led to the evolution of m-scripts, before language.

B. The theory of mind: It appears that most primate species (such as monkeys) have no theory of mind. In predicting each other's social actions, they seem to act like behaviourists. A monkey seems to know social rules like `If I do A, then monkey X will do Y' (e.g. if I groom X, then he may help me in a later fight with Z), but does not seem to know social rules of the form `If I do A, then X will know B, so he will do Y'. Monkeys can reason about how other monkeys will act, but not about what other monkeys perceive, know, want or plan. One monkey's mental states are opaque to another monkey.

It is unclear whether chimpanzees and other great apes have a theory of mind, but it is very clear that human beings do, and use it practically every moment of the day. We use it as a part of our social intelligence, to analyse what others will think about our actions; and it is an essential prerequisite for language. If we did not realise that other people know things and do not know things, there would be little point in talking to them.

This implies that (a) a working theory of mind has evolved at some stage in our ancestry (possibly since our divergence from chimps) as a facet of social intelligence, and (b) the theory of mind has a close link to language, if only as a pragmatic guide to what is worth saying.

I shall argue that the link to language is even closer; that the formal, computational extension of script-based social intelligence, required for a theory of mind, is just the introduction of m-scripts - and that this m-script capability was then co-opted as the computational basis of language.

What extensions to script-based social intelligence are needed for a theory of mind?

It seems that monkeys require script trees of depth about 3-4 nodes, to represent scenes with other individuals, their actions and attributes. A theory of mind requires deeper script trees, to represent `X knows Y' - where the top of the script tree represents `X knows...' and the rest of the tree is Y - standing for any script which X may know. Script trees of depth about 8 can represent a basic `X knows Y', but greater depths would be needed to represent `X knows that Z knows that Y' and so on.

A more fundamental extension is also needed. To account for vervet monkey intelligence, rule scripts with simple variables, such as `?X', to represent `any individual', are sufficient. As explained in section 2, these rules act as simple, bounded script functions. To reason with a theory of mind, you need to represent not only facts such as `X knows Y', but also general theory-of-mind rules, such as `If X sees A, then he will know B' or `If I know rule R, then X also knows rule R'. Without these general theory-of-mind rules (and the ability to learn them), your theory-of-mind inferences will be very limited.

In rules such as `If X sees A, then he will know B' and `If I know rule R, then X knows rule R', the variables A, B and R stand not for individuals (which have only a finite number of possible identities), but for whole script subtrees, describing scenes or causal regularities. There is an unbounded set of these subtrees. So theory-of-mind rules need to be unbounded script functions, with unbounded variables in their arguments and results. Such variables can be represented by trump nodes, joined by trump links.
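
A small sketch may make the contrast between bounded and unbounded variables concrete. It assumes the same kind of nested-tuple encoding of script trees, (label, child, ...), and a single hypothetical rule, `if X sees A, then X knows A'; none of these names comes from the theory itself. Here X ranges over individuals, while A is bound to a whole subtree, so the rule can be applied to its own kind of output and the trees grow deeper with each level of embedding.

```python
# Script trees as nested tuples: (label, child, child, ...). Illustrative only.

def sees_implies_knows(fact):
    """Apply the rule 'if X sees A, then X knows A'.

    X is a bounded variable (an individual); A is an unbounded variable
    that binds an entire subtree, whatever its size.
    """
    label, *children = fact
    if label == "sees" and len(children) == 2:
        observer, scene = children
        return ("knows", observer, scene)
    return None


def depth(tree):
    """Number of nodes on the longest path from the root to a leaf."""
    return 1 + max((depth(child) for child in tree[1:]), default=0)


grooming = ("groom", ("entity:Amy",), ("entity:Cal",))
bea_sees = ("sees", ("entity:Bea",), grooming)

# Bea sees the grooming, so Bea knows about it.
bea_knows = sees_implies_knows(bea_sees)

# Dan sees Bea watching, so Dan knows that Bea saw the grooming.
dan_knows = sees_implies_knows(("sees", ("entity:Dan",), bea_sees))
```

The depths in this toy encoding are not meant to reproduce the figures of 3-4 and 8 nodes given above; the sketch only shows that each further level of `X knows that ...' adds depth, and that one rule covers every possible content of A.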

I suggest that the extension of social intelligence to contain a theory of mind required (and to some extent drove) the evolution of scripts to m-scripts. The script operations of unification and intersection were extended to m-unification (for applying unbounded rules) and m-intersection (for learning them). This extension (made possibly within the last 5 million years) was an evolutionary line of least resistance to give a working theory of mind. It was then a small step to co-opt these mechanisms to support language, as described in sections 2 and 3. The theory of mind supports not only the pragmatic `what to say now' aspects of language, but also the computational mechanisms by which we say, hear and learn it.

This evolutionary account is not unique or certain; but if it is true, it explains:

In this picture, although the script and m-script faculty for social intelligence may be autonomous (capable of learning and inference by its own mechanisms) it is not isolated, in the sense which seems to be implied by the phrase `the autonomy of syntax'. The social representation of scripts and m-scripts is richly interlinked to other representations in the brain, such as mental images and body schemata. It has to be, or it would serve no biological purpose.

In this theory, syntax is embodied in m-scripts, and m-scripts are richly connected to other non-symbolic meaning representations. Aspects of language (such as metaphor) which depend on other meaning representations are not excluded from this theory.

The rich mental faculties which underpin language evolved in response to strong, sustained selection pressure over 20 million years - not in a rapid burst of recent evolution (which could only produce 5 kilobytes of new design) or a freak mutation (which could produce even less). If this economical account can fit the facts of language, it is to be preferred.

4.3 M-script Evolution

The m-script theory gives a robust and flexible model of language. Individual word m-scripts can compose together the meanings of their arguments in powerful ways, to build complex meanings in several stages; and each word m-script can propagate robustly and stably through a population of speakers, by a learning mechanism which evolved to learn arbitrary symbolic regularities (of primate social life) from noisy, fragmentary data.

The theory is fully lexicalised. There are no separate phrase structure rules; each word effectively carries around its own phrase structure rule in its m-script, so the set of phrase structure rules could be completely irregular and word-specific.

The partial regularity of languages is a well-established fact, and has been at the base of many theories of language. The m-script theory should give some account of that partial regularity - for instance, the fact that languages seem to have a few core phrase structure rules, or a few core parameters. If, as far as the innate language faculty is concerned, languages do not need to be regular, why do they turn out to be nearly so?

The answer lies in the process of language change, which can be envisaged (by analogy with biological evolution) as a process of evolution of word m-scripts.

The m-script for each word is a small information structure, with typically 10-100 bits of information. By use and learning, these structures propagate through a speaking population, like parasites in the human brain; they are an example of Dawkins' (1976) memes. By the fundamental theorem of language learning (section 3.12) they propagate stably, without basically changing their form from one generation to the next. They can be regarded as a simple parasitic life form, subject to variation, selection and evolution. It is this process of m-script evolution which leads to the quasi-regular structure of languages we see today.

4.4 Domains of Regularity in Language

If language structure results from selective pressures on individual word m-scripts, why should those changes lead to partial regularity?

Language is used by people to communicate. To do this effectively, it should be expressive, robust, economical and learnable. If any word m-script tends to make the language more expressive, robust, economical or learnable, that word will tend to be favoured by speakers, used more frequently, and therefore learnt more by listeners. This is the basic selective force which leads the populations of different word m-scripts to wax and wane - which leads to language change.

The fitness of any word m-script depends not just on that m-script alone, but on how well it works together with other word m-scripts. M-scripts tend to hunt in packs, and to hunt best in regular packs; that is the selective force which builds the partial regularity of language.

As a first example, consider case-marked versus unmarked languages. Every language needs devices to distinguish between the two arguments of a simple transitive verb - to distinguish Fred hits Joe from Joe hits Fred. Languages have two main ways to do this:

  1. Word order: the verb m-script fixes the time order of its arguments, and the noun meanings carry no semantic role slots.
  2. Case marking: the noun meanings carry semantic role slots, shown by case markings, and the verb m-script imposes no time order on its arguments.

In both cases, verb m-scripts and noun m-scripts should match up: either the verbs have time-order arrows and the noun meanings have no semantic role slots, or the verbs have no time-order arrows and the nouns have semantic role slots. A mismatch leads either to redundant meaning in the nouns, or to rampant ambiguity.

This means that if a few prominent verbs in a language go one way (e.g. require case markings), then there is a strong selection pressure on noun m-scripts to conform to that need; so all the noun m-scripts will tend over time to `line up' with those verbs, having the required case markings. This in turn puts the same selection pressure back on all the verbs of the language - they will tend to use case markings rather than word order. So over time, small incremental changes tend to line up all verbs and nouns with the argument matching system of the dominant verbs.

This interaction back and forth between the nouns and the verbs of a language provides a 'weak force' which tends to 'align' all the nouns and verbs in the same 'direction'. This force is strongest for nouns which are semantically similar, since they tend to be the most interchangeable (those nouns tend to be used with the same verbs); the force is a local force in the space of words, but also has a long-range component across the whole language.

This is much like the local inter-atomic forces in a ferromagnetic solid (interactions between magnetic fields of different atoms) which tend to line up the magnetic moments of all atoms in the same direction. Such a solid typically consists of small regular crystalline domains with irregular boundaries between domains; within each domain, the magnetic moments of the atoms are all aligned, but different domains have different alignments at random.

Similarly we would expect the weak forces of language change to produce local domains of regularity in any language - sets of words of similar meaning and with similar syntax. Each domain tends to be self-sustaining and stable against change; but different domains may have different orientation, allowing overall irregularities in the language. When languages collide, domains are broken up and rearranged.
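
The ferromagnetic analogy can be illustrated with a toy simulation. Everything in it is an assumption made for the sketch: words are laid out on a line so that neighbours stand for semantically similar words, each word has an orientation of +1 or -1 (for example word order versus case marking), and in each step a word adopts the majority orientation among itself and its two neighbours. It is not a model of real language change, only a demonstration that a purely local aligning force yields large uniform domains with a boundary between them.

```python
def relax(orientations, sweeps=10):
    """Repeatedly let each word follow the local majority of its neighbourhood."""
    words = list(orientations)
    n = len(words)
    for _ in range(sweeps):
        for i in range(n):
            # At the ends of the line, the missing neighbour is replaced by the word itself.
            left = words[i - 1] if i > 0 else words[i]
            right = words[i + 1] if i < n - 1 else words[i]
            # The sum of three values of +1 or -1 is never zero, so there is always a majority.
            words[i] = 1 if left + words[i] + right > 0 else -1
    return words


def count_boundaries(words):
    """Number of places where adjacent words have different orientations."""
    return sum(a != b for a, b in zip(words, words[1:]))


initial = [1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1, -1]
settled = relax(initial)
```

Starting from this irregular pattern, the isolated exceptions are absorbed by their neighbours, while the two larger blocks survive as separate domains with opposite orientations.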

For instance, amongst case-marked languages, the choice between nominative/accusative and absolutive/ergative markings also makes a force for regularity, tending to line up nouns and verbs on the same choice. Here, the penalties for mixing are weaker, and mixed languages exist; but still the mix is not random, and there are large domains of regularity.

4.5 Greenberg Universals and the Head Parameter

Perhaps the most important selective force for regularity concerns word order, and is the force leading to the Greenberg-Hawkins universals and the so-called `Head parameter' of languages.

Recall from section 2 that structural ambiguities are handled (in language understanding) by taking a script intersection of the two ambiguous meanings, thus avoiding a combinatorial explosion of possible readings of a sentence. In practice (as shown by the computational implementation of m-script-based understanding) this works well; but it only does so for languages which obey the Greenberg-Hawkins universals.

This can be seen from an example, involving Greenberg's universal number 2, that "in languages with prepositions, the genitive almost always follows the governing noun, while in languages with postpositions it almost always precedes." This is statistically one of the most reliable of the universals discovered by Greenberg.

Consider the sentence “John saw the lid of the box on the table”. The noun phrase can be read in two different ways, as (the lid of (the box on the table)) or ((the lid of the box) on the table). Because English obeys Universal no. 2 - having prepositions and the genitive following the governing noun - both of these readings refer to some kind of lid. If, however, `on' were a postposition and so `A on B' referred to some kind of B, then the first reading would be a lid, while the second reading would be a table.

Taking the script intersection of the two senses preserves their common information that it is a lid, helping in further analysis of the sentence; but if English had postpositions, taking the script intersection (of a `lid' script and a `table' script) would leave little useful information for the rest of the analysis.
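
The effect of Universal no. 2 on script intersection can be checked with a small sketch. The encoding is an assumption made here: each reading of the noun phrase is a nested dictionary whose kind slot names the entity the phrase refers to, and the slot names of, on and bearing are hypothetical. The intersect function keeps only what the two readings agree on.

```python
def intersect(a, b):
    """Keep the slots on which both scripts agree, recursing into sub-scripts."""
    common = {}
    for key in a.keys() & b.keys():
        if isinstance(a[key], dict) and isinstance(b[key], dict):
            sub = intersect(a[key], b[key])
            if sub:
                common[key] = sub
        elif a[key] == b[key]:
            common[key] = a[key]
    return common


# English as it is: both readings refer to some kind of lid.
# (the lid of (the box on the table))
english_1 = {"kind": "lid", "of": {"kind": "box", "on": {"kind": "table"}}}
# ((the lid of the box) on the table)
english_2 = {"kind": "lid", "of": {"kind": "box"}, "on": {"kind": "table"}}

# Hypothetical English in which 'A on B' refers to some kind of B.
# First reading: still a lid, of a table that bears a box.
postpos_1 = {"kind": "lid", "of": {"kind": "table", "bearing": {"kind": "box"}}}
# Second reading: a table, bearing the lid of the box.
postpos_2 = {"kind": "table", "bearing": {"kind": "lid", "of": {"kind": "box"}}}

shared_english = intersect(english_1, english_2)
shared_postpos = intersect(postpos_1, postpos_2)
```

In this sketch the English readings share the fact that the phrase denotes a lid of a box, whereas the hypothetical postpositional readings share nothing, so the hearer would have to carry both forward in parallel.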

So to make this particular structural ambiguity easier to handle, the word m-scripts of English pre/postpositions and possessives must evolve together to obey Greenberg's universal no. 2. Any postposition or reversed genitive (in a dominantly prepositional language) would give rise to hard ambiguities, where script intersection does not work and multiple senses must be handled in parallel. Such an m-script would be strongly selected against by speakers and usage. After a generation or so of restricted use, any such `reversed' m-script would die out, leaving Universal no. 2 true again.

Similar considerations apply to other kinds of headedness of phrases, giving selection pressures on word m-scripts to make them obey the other Greenberg universals. Since, without the universals, structural ambiguities would be very hard to handle, this is a strong selection pressure on word m-scripts, and it tends to line up the whole language in one domain of regularity. There is a strong tendency for languages to be either head-first (like English) or head-last (like Japanese).

Thus the Head parameter of languages does not reflect a fundamental constraint in our capacity to use or learn languages. It did not evolve as part of the human brain to make languages learnable, but evolved in the m-scripts of each language, to make ambiguities easy to handle. A similar analysis may apply to other parameters in the `principles and parameters' model of language.

4.6 Subject and Object

In this theory, matching of verbs with their arguments is done purely in terms of semantic roles, and the grammatical functions of subject and object emerge for other reasons. I shall first describe how, in case-marked languages, verbs match up their arguments with semantic roles; then describe how subject and object emerge, and how they relate to semantic roles.

The account of nominative/accusative and ergative/absolutive case-marked languages is as follows: entity nodes in scenes typically have two slots: an `actor' slot (with values act:yes or act:no), which denotes whether that entity initiated the action; and a `change state' slot (with values chs:yes or chs:no), which denotes whether that entity undergoes a change of state. In their use with transitive and intransitive verbs, these slots have values:

Georgy [act:yes, chs:no] kisses the girl [act:no, chs:yes].

Georgy [act:yes, chs:yes] runs away.

These values are an intrinsic part of the meaning scripts we form, pre-linguistically. Every language needs at least a binary marking (or position) to distinguish the two arguments of a transitive verb. Nominative/accusative languages define the case marking by the `actor' slot on the noun entity (nom = act:yes, acc = act:no), and morphologically ergative languages define it by the `change state' slot (erg = chs:no, abs = chs:yes). In each type of language, for economy the more marked case tends to be the rarer (the `1 out of 3' cases in the two examples above - act:no and chs:no).
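
The two case systems can be written as two functions of the same pair of slots. The sketch below uses the three noun occurrences from the two example sentences; the function and variable names are illustrative assumptions. It counts how often each case would be assigned, to show where the `1 out of 3' figure comes from.

```python
from collections import Counter

# The three noun occurrences in 'Georgy kisses the girl' and 'Georgy runs away'.
ARGUMENTS = [
    ("Georgy", {"act": True, "chs": False}),     # actor of 'kisses'
    ("the girl", {"act": False, "chs": True}),   # undergoer of 'kisses'
    ("Georgy", {"act": True, "chs": True}),      # sole argument of 'runs away'
]


def nom_acc_case(slots):
    """Nominative/accusative marking follows the actor slot."""
    return "nom" if slots["act"] else "acc"


def erg_abs_case(slots):
    """Ergative/absolutive marking follows the change-of-state slot."""
    return "abs" if slots["chs"] else "erg"


nom_acc_counts = Counter(nom_acc_case(slots) for _, slots in ARGUMENTS)
erg_abs_counts = Counter(erg_abs_case(slots) for _, slots in ARGUMENTS)
```

In both systems one case covers two of the three occurrences and the other covers one; the rarer case (accusative, ergative) is the one that tends to carry the overt marking.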

When sentences are constructed by m-unification, verbs are matched to their arguments directly by these semantic slots, rather than through the grammatical functions of subject and object. Therefore there is no need for `linking rules' between grammatical functions and semantic roles. But the m-script theory still owes us an explanation of how grammatical subjects and objects have arisen in languages.

The answer is driven by the requirement of economy of expression in language - in particular, by the need to avoid unnecessary repetitions of noun phrases. (The need to express pragmatic information also plays a part.)

People tend to say several things in succession about the same thing - particularly when building up a complex meaning script, where entity nodes for the same thing may occur repeatedly. In these cases, it is uneconomical to use the same noun phrase repeatedly to refer to one thing. Languages have evolved to have a whole battery of economy devices to avoid doing so:

E1. Reflexive pronouns, referring to the nearest entity in certain roles

E2. Pronouns, typically referring to some recently-mentioned entity

E3. Omission of a noun phrase in coordinate constructions

E4. Omission of a noun phrase in relative constructions

E5. Omission of a noun phrase in complements

E6. Cross-referencing on verbs, so that semantic roles which are not explicitly given may be identified more easily.

E7. Switch-reference in strings of statements about the same thing

All of these devices achieve economy of communication by avoiding repetition of 'obvious' noun phrases - making it somehow easy for the hearer to work out (by a convention) what is being referred to, without hearing it described explicitly.

Economy devices need rules or conventions to help listeners pick out the thing not described. These rules need to be simple and consistent across any language, in order to minimise the difficulty of learning and using them. This provides another force to make domains of regularity.

To make consistent economy devices across a language, it is useful to have a particular 'privileged' entity node in any verb meaning script, which any economy device may then use without risk of misunderstanding; listeners rapidly learn to use that entity node in consistent ways to construct the intended meaning.

In this respect, the entity node attached directly to the top scene node of a verb meaning script - the entity node with a slot [act:yes] - is particularly useful, because:

• Every verb (transitive or intransitive) has one

• It is easy to locate it in the verb meaning script

• Being the entity which initiates and controls the action, it is always important (and so is likely to be multiply referred to)

These entity nodes are the subject of the verb, and the ways in which the economy devices E1 - E7 use them are the syntactic criteria for identifying subjects in a language.

For any language, some subset of the devices E1 - E7 exist, and use the subject entity in the way described above. While this usually provides a good economy device, it is not the only way, and any language might use some different device (not involving the subject entity) for any of them. Therefore there is no consistent subset of E1 - E7 which precisely defines the notion of subject in all languages (Andrews 1985).

The economy devices are particularly involved in complex constructs such as complementation and coordination, and subjects are central to most economy devices; this is what leads to the autonomous 'grammatical' functions of subjects, independent of semantic roles, and to the historic importance of subjects in the study of syntax.

In the development of child language, on this account we would expect the semantic roles, and their identification in verb argument frames, to come first, and the more complex economy devices, with their use of the notions of subject and object, to come later.

4.7 Competing Explanations of Regularity

The m-script theory explains the partial regularity of languages as a consequence of historic language change - the evolution of word m-scripts to make communication more economical and less ambiguous.

This explanation has the advantage that, not depending on innate biological structures in the brain, it does not demand complete regularity. Any account of language which starts from regularity, linking it to structures in the brain, will sooner or later find irregularity an embarrassment. How do brain structures designed for regularity cope with irregularity [1], and why do they tolerate it?

On the m-script account, however, m-scripts evolve to build `domains of regularity' in a language; but where two different regular domains collide (as, for instance, where two languages collide) there must be an irregular border between them. The brain copes easily with this irregularity, because the brain requires no regularity.

One might feel that this is just another, competing, theory of language regularity. Regularity might have evolved innately as part of the human brain, or it might have evolved by m-script evolution - how can we decide between these two accounts?

We can do so by considering the speed at which the two competing processes take place. Both are processes of evolution and selection - on the one hand, selection of human brains to handle regular languages, and on the other hand, selection of word m-scripts to be approximately regular.

We can see, both theoretically (from the evolutionary speed limit in Worden 1995) and empirically, that the evolution of m-scripts is much, much faster than the evolution of the human brain. The form of a language can change radically in a few hundred years - whereas the design of the brain has not changed significantly in tens or hundreds of thousands of years.

If you have two competing mechanisms for the same effect - in this case, for the partial regularity of language - then there are two strong reasons to believe only the faster mechanism:

M-script evolution, being the faster mechanism, is clearly preferred. We would need to show that m-script evolution is either incoherent or fails to fit the facts, to justify a belief in the alternative.


Footnotes

[1] For instance, if a child learns a language by setting a few parameters, how does a bilingual child get by? If parameters evolved to simplify the learning task, why did we complicate it again by providing multiple parameter sets just for multi-lingual children?