5.1 Key Facts of Language Learning
5.2 Learning Word Meanings
5.3 Phrase Structure Rules
5.4 Morphology
5.5 Complementation and Control
5.6 Auxiliaries
5.7 Alternating Verb Argument Structures
5.8 Pronouns, Gaps, Quantifiers and Movement
5.9 Bilingualism and Language Change
5.10 Extra Assumptions
In this section I compare the theory with the evidence on language learning. These are summary comparisons which cannot do justice, in a small space, to the great wealth of evidence on child language learning which has been gathered in recent years. However, they show that the theory agrees well with the main known facts - in most cases, in a natural and unforced manner.
I have compared the theory with about 101 empirical facts about language learning, under the following headings:
A. Key Facts of Language Learning
B. Early Word Learning
C. Phrase Structure Rules
D. Morphology
E. Complementation and Control
F. Auxiliaries
G. Verb Argument Structures
H. Pronouns, Gaps and Movement
I. Bilingualism and Language Change
Comparisons under these headings are numbered (A1), (A2), etc. I have classified 4 different possible outcomes when comparing the theory with the evidence:
The comparisons are summarised in the table below, classifying the result of each comparison in the columns DA....CE at the right of the table. For each item of data, the further left the asterisk, the better the agreement.
GENERAL PROPERTIES OF LANGUAGE
Languages are highly expressive
Languages are very diverse in structure
Languages are partially regular, but all have irregularities
Diverse languages are stable over time
Languages are learnt rapidly
Language learning starts slowly, then accelerates
Word segmentation is necessary for language learning
Language learning is very robust
We learn only structure-dependent rules
Language learning is lexically-based and conservative
Comprehension precedes production
Children make many types of transient errors, and correct them all
Languages change continually through intermediate forms
There are no major differences between the acquisition of sign language and of spoken language
LEARNING WORD MEANINGS
Word meanings are very diverse and rich in structure
Words are matched only intermittently with their meanings
The learner may observe many things which are not part of a word's meaning
Each word meaning is acquired from limited evidence
Children tend to link words to whole objects, and to types rather than thematically-related objects
The tendency to label whole objects and types is a bias, not a constraint
Some early words are very context-specific
Word meanings change gradually through intermediate forms
Children's over-extensions of word meanings may have a prototype structure
Some word meanings have a prototype structure
The set of meanings grammatically encoded in any language is quite limited
Children separate mutually less relevant elements of meaning into distinct words
True synonyms are very rare
Children apply a uniqueness bias in learning new words
Nouns predominate in the first 100 words learnt
Verbs and adjectives are learnt more rapidly after the first nouns are learnt
Learning closed-class morphemes accelerates at 400-600 words vocabulary
Early meanings tend to be over-specialised rather than over-generalised
Children over-extend some words in production (less so in understanding)
Names for basic-level categories are learnt first
Word meanings change by metaphor and metonymy
Children confuse names for parts of arms and legs
NP-type nouns describe social routines
In disambiguating homonyms, we favour common word senses
Gender has little to do with sex
PHRASE STRUCTURE
The syntax of any part of speech can be represented and learnt
Early syntax centres on verbs
Children link arguments to verbs correctly from the start of verb learning
Syntactic constraints also guide verb learning from an early age
There are no sharp, language-wide transitions observed in language learning
There is no dissociation between syntax and vocabulary size
Early analysed noun vocabulary predicts later syntactic ability; rote production does not
There is scant evidence for unmarked parameter values in language learning
There are no major blind alleys in language learning
Languages have regularities captured in X-bar syntax
In assigning agent roles, cue strength depends on overall cue validity
In many languages, cue strengths change dramatically between ages 6 and 16
Meaning elements encoded locally in the sentence are learnt most easily
Children tend to mark individual meaning elements explicitly and separately
Negation, interrogatives and conditionals are moved outside clause boundaries
Double-marking of negation is a later transient error
MORPHOLOGY
Individual word morphology is learnt before productive morphology rules
Productive regular inflections can be learnt even if regular forms are not in a majority
We learn which dimensions of meaning are encoded by inflectional morphemes
The speed of learning of inflectional morphology varies between languages
Productive inflections are learnt faster in agglutinating languages than synthetic languages
In agglutinating languages, inflections are learnt from the outside inwards
In agglutinating languages, children make no errors in ordering affixes
Irregular forms are initially learnt correctly
There is transient over-regularisation of irregular forms
English noun plurals and past tense verbs are over-regularised with low frequency
Specific Language Impairment affects regular morphology
High use of closed-class morphemes at 20 months leads to low use at 28 months
Ergative and accusative case markers are initially under-extended
COMPLEMENTATION AND CONTROL
Children acquire some complement-taking verbs early
Errors of control in complement-taking verbs are very rare
`Tough-movement' complements are acquired more slowly
There are mistakes in inflection of embedded verbs
Verbs acquired with missing complementisers are slow to acquire them
Verbs with optional complementisers are correctly learned
The `wanna' contraction is not made over a gap
The rare `promise' control structure is learnt more slowly
AUXILIARIES
Highly irregular English auxiliaries are learnt reliably
Over-generalisation of auxiliaries does not occur
Errors of auxiliary control almost never occur
Children often fail to invert subjects and auxiliaries in Wh-questions
Complement verbs are sometimes overtensed
ALTERNATING VERB ARGUMENT STRUCTURES
Alternate argument structures for the same verb are learnt early and without confusion
Alternations of argument structure are in broad classes, yet respect narrow-range rules
Children use the alternations productively
Over-generalisations which violate the narrow rules occur, but are corrected
There are `idiosyncratic' non-alternators, which children learn
Children learn passives of `action' verbs before others
LONG-RANGE MOVEMENT PHENOMENA
Anaphors and pronouns have complementary binding domains
Reflexives are used correctly before pronouns
Pronoun reference principles have irregular edges
Some constraints on long-range movement are known from an early age
Ross' Island Constraints are obeyed from an early age
BILINGUALISM AND LANGUAGE CHANGE
Two or three languages can be learnt simultaneously
Children learn overlapping vocabularies for two languages
There is no evidence for a single grammatical system early in learning two languages
The course of bilingual language learning is very similar to the course of monolingual learning
Code-switching is done most frequently with nouns
Neighbouring languages do not completely intermix
Creoles form very rapidly from Pidgins
Creoles use simple analytic forms to express meanings
Tense, Mood and Aspect appear in order TMA for creoles, MTA for most languages
The assessments of agreement in the table (distinctive/ unforced/ with extra assumptions/ conflicting) are merely my subjective judgements; readers will want to check for themselves that the accounts of learning phenomena in this theory really do work, by reading and checking some of the descriptions.
You may suspect that the choice of data for the table is not unbiased. If so, you are right; in writing the paper, I have tried (while addressing the key data) to pick out evidence which gives the most interesting, clear-cut comparisons with the theory, either for or against.
The table seems to show the theory in good agreement with a broad sweep of evidence about language learning. I have not found any major contradictions to the theory in this search of the evidence (and so the right-hand CE `Contradictory Evidence' column is unpopulated); but I am not the best person to look. I leave it to others to make a more objective evaluation. Since it is problems which move theories forward, I will be grateful if someone can provide some solid entries in the `CE' column.
In the comparisons which follow, I shall refer to sections of the paper which describe the relevant part of the theory, by giving a section number in square brackets, as e.g. [3.4]. I shall make brief comparisons in passing with some other theories of language acquisition; but the main comparison with other theories is deferred to section 6.
I first survey some of the broad, well known facts about language and language learning, and describe their interpretation in the theory. While these key facts are all well known, they are nonetheless remarkable and stand in need of explanation; many theories of language and language learning have difficulty accounting for them.
(A1) Languages are highly expressive: This is perhaps the central mystery of language learning - that languages can express an infinite range of complex meanings, using only finite resources; and that as children, somehow we all learn how to do this. That is what makes language learning seem so far beyond any form of animal learning, and has made it a central challenge for cognitive science.
This theory provides a working computational answer to that challenge. An unbounded set of language meanings is represented by tree-like feature structures (scripts) [2.1]. Each word meaning is a script, and language combines these word meanings by function application. Every word is a script function (m-script) which combines its own intrinsic meaning (its right branch script) with the meanings of its arguments [2.3]. This process can build up an unbounded set of meanings, or generate sentences from them - as has been shown by a working computer program [2.6]. We use a robust, general method of learning the word m-scripts [3.8] to learn this unbounded language capability. All this has been demonstrated in a working computer program.
(A2) Languages are very diverse in structure: Shopen (1985) documents the great syntactic and expressive diversity of the world's languages. I have not found any forms in that survey which are not expressible in word m-scripts - which suggests that word m-scripts can capture the syntax of any language. This conclusion is supported by the correspondence between the m-script formalism and Lexical Functional Grammars [2.4] - which have been successfully applied to a very wide range of languages (e.g. Austin & Dalrymple 1995).
Since there is a robust, general method to learn any word m-script [3.2 - 3.13], this gives us a way to acquire the syntax of any language - in spite of their huge syntactic diversity.
(A3) Languages are partially regular, but all have irregularities: Data in (Shopen 1985) and other surveys confirm that no language is completely regular, and equally, no language is completely irregular. The regularities have been the source of elegant syntactic theories, to which the irregularities have been annoying exceptions.
In this theory, the syntax of any language is embodied in the m-scripts of its words, which can embody complete regularity, complete irregularity, or anything in between. By m-script learning we can acquire an arbitrary set of word m-scripts, anywhere along this spectrum [3.8]. So irregularity is not a problem for the theory. Nor is regularity; the partial regularity in languages can be understood as arising from historic changes in their population of word m-scripts, rather than from any structure in the human brain [4.3-4.7].
(A4) Languages are stable over time: The remarkable fact is that such a diverse range of languages are all individually stable over hundreds or thousands of years (e.g. Renfrew 1994). It might not be so; many conceivable learning mechanisms might, over the generations, `funnel' any language towards one of a few standard forms, apart from vocabulary variations.
In the m-script theory, this stability of language diversity follows from two facts:
(A5) Languages are learnt rapidly: From children's peak learning rates of several words per day (Ingram 1989), it seems that each word must be fully learnable (in its syntax and semantics) from just a few exposures. In this theory, each word m-script can be completely learnt from (of the order of) six clear examples of hearing the word in use [3.2]. From everyday experience and observations of young children, it seems clear that children are exposed to at least this number of learning examples.
Therefore the m-script theory can account, to within an order of magnitude, for the observed speed of language learning. Some other theories (such as simple neural nets, whose training times are measured in `epochs', or thousands of examples) clearly do not.
(A6) Language learning starts slowly, then accelerates: Many studies have shown early learning rates of only one word every few days, some 50 times less than the later peak rates of order 10 words/day. There is widespread evidence that many (but not all) children undergo a `vocabulary burst', or rapidly increased learning rate, when their productive vocabulary is in the range 50-100 words. However, the effect seems to be a steady acceleration rather than a takeoff point (Bates et al 1993).
Many different mechanisms have been proposed to account for this burst. In this theory there is one obvious factor which may account for some, if not all, of this rapid acceleration.
Initially, children have few clues to segment the sound stream and to guess what is being referred to; so their initial learning must involve many wrong guesses, picking a signal out from a high noise level [3.7]. The learning mechanism can do this, but slowly. Later on, children can partially understand many sentences, giving strong clues as to what sounds are new words, and what they refer to [3.9]. The partial understanding process gives powerful constraints on what an unknown word may refer to; it is like having a pair of tweezers to pick up unknown meanings, instead of clumsy fingers. This gives the child a much cleaner learning signal; so learning is expected to accelerate rapidly.
On this account, we would expect the number of clean learning examples per day to increase linearly (or perhaps even more rapidly) with vocabulary size V. As learning rate is proportional to the number of learning examples, this gives dV/dt = λV for some constant rate λ, whose solution V(t) = V(0) exp(λt) is an exponential vocabulary growth, approximately as is observed (Van Geert 1991; Bates and Carnevale 1993).
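The growth law in the preceding paragraph (learning rate proportional to current vocabulary) can be checked with a small numerical sketch. The starting vocabulary and rate constant below are hypothetical values chosen only for illustration; they are not taken from the paper or from the cited data. The code integrates the law in small steps and compares the result with the closed-form exponential.

```python
import math


def simulate_vocabulary(v0, rate, days, steps_per_day=100):
    """Integrate dV/dt = rate * V with small Euler steps; return V after `days`."""
    dt = 1.0 / steps_per_day
    v = v0
    for _ in range(days * steps_per_day):
        v += rate * v * dt  # words gained in this step are proportional to V
    return v


def closed_form(v0, rate, days):
    """Exact solution of dV/dt = rate * V, i.e. V(t) = v0 * exp(rate * t)."""
    return v0 * math.exp(rate * days)


# Hypothetical values, for illustration only.
V0 = 50.0    # starting vocabulary (words)
RATE = 0.01  # fractional vocabulary growth per day

for day in (0, 100, 200, 300):
    print(day, round(simulate_vocabulary(V0, RATE, day)), round(closed_form(V0, RATE, day)))
```

Under these assumptions the two columns agree closely, and the vocabulary doubles roughly every ln(2)/RATE days, which is the signature of steady acceleration rather than a single takeoff point.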
(A7) Word Segmentation is Required for Language Learning: Most theories of language acquisition need to assume a separate mechanism for word segmentation, and these segmentation theories are not yet very satisfactory. Few of them address the problems of segmentation for sentences containing novel words. Some theories depend on particular prosodic cues or styles of caregiver speech, and these cues and patterns are not cross-culturally universal.
In this theory, provided the child can segment the sound stream into some kind of sub-word units such as phonemes, their grouping into words is learnt directly by the m-intersection learning mechanism, as described in [3.7, 3.8]. Just as m-intersection projects out the script meaning of a word (in the right branch of its m-script) from a large amount of extra meaning in learning examples, so it projects out the sound of the word (in the left branch of its m-script) from a large amount of surrounding sound in learning examples. Whatever is not common to all learning examples (meaning or sounds) is efficiently pruned out by the m-intersection mechanism. A separate word segmentation mechanism is not required.
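As a toy illustration of the idea that a word's sound is whatever is common to all of its learning examples, the sketch below finds the longest run of sub-word units shared by several utterances. It is not the m-intersection mechanism of [3.7, 3.8], which is not specified in this section; the function name, the one-character-per-unit encoding and the example utterances are all hypothetical.

```python
def common_sound(utterances):
    """Return the longest contiguous run of units present in every utterance.

    Each utterance is a string with one character per sub-word unit.
    Ties are broken in favour of the earliest run in the first utterance.
    """
    if not utterances:
        return ""
    first, rest = utterances[0], utterances[1:]
    best = ""
    for start in range(len(first)):
        # Only try candidates strictly longer than the best found so far.
        for end in range(start + len(best) + 1, len(first) + 1):
            candidate = first[start:end]
            if all(candidate in other for other in rest):
                best = candidate
            else:
                break  # no longer run from this start can match either
    return best


# Made-up, unsegmented learning examples that all contain one target word.
examples = ["lookatthedoggy", "doggywantsfood", "nicedoggyhere"]
print(common_sound(examples))
```

With these made-up utterances the surrounding sounds differ from example to example, so only the shared run "doggy" survives; nothing in the procedure depends on prosodic cues or on the units being sounds rather than gestures.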
An attractive feature of this account is that it does not depend on any special properties of the sound stream, and so can account for word segmentation in the acquisition of sign language just as well as in spoken language, by the same m-intersection mechanism.
(A8) Language learning is very robust: Children learn the correct meaning and syntax of words from a few examples which (a) may be interspersed between many noisy or misinterpreted cases and (b) each have a large amount of irrelevant information present; and (c) they can learn without any explicit prompting or instruction.
The social learning mechanism, which is the basis of language learning, evolved to do a very similar task - to learn social regularities robustly from examples sparsely spread amongst noise [4.1], with no explicit instruction.
The Bayesian learning mechanism evolved to meet this need. It can learn a rule or m-script which applies to only 5% of all qualifying cases [3.2-3.4] from around 6 good learning examples, where each example has a large amount of extraneous information. We can show mathematically that the Bayesian learning mechanism is capable of this performance [3.4]. Thus children's learning examples may be heavily loaded with other information, interspersed with a large number of false examples, and they will still successfully learn word m-scripts.
This robustness applies not just to the learning of individual word meanings; since the syntax of a language is embodied in word m-scripts, the learning of syntax is equally robust. Since syntax is distributed across many word m-scripts, its learning is more robust than if it were embodied in a few parameters.
(A9) We learn structure-dependent rules: Children do not seem to make errors which would follow from learning a surface-order dependent rule in the place of a structure-dependent rule (Chomsky 1991); for instance, it has been shown that when children ask questions, they use forms which are a structure-dependent (rather than surface-order-dependent) modification of the indicative forms (Crain 1991).
In the m-script theory, both statements and questions are generated directly from script meaning structures [2.5]. The meaning script for a question is a structure-dependent alteration of the indicative meaning structure (it can only be structure-dependent, because meaning scripts are structures; they have no surface order [2.1]). Therefore we can only form questions in structure-dependent ways.
More generally, the way we generate language is fundamentally structure-dependent [2.5], and learning any word is centrally dependent on meaning structures [3.7]. The whole theory is built on script meaning structures rather than surface order; it would be extremely difficult, in this theory, to force some surface-order dependent error.
(A10) Language Learning is Lexically-Based and Conservative: For many years, linguistic theory has been dominated by the study of syntax and its language-wide approximate regularities, and some underlying cross-language universals. Child language research has been led by an expectation that the prime learning task is to acquire these language-wide syntactic regularities (implicitly, as fast as possible) perhaps in the process revealing cross-language universals.
These expectations have not, in broad terms, been met. Instead, the empirical picture of child language learning has repeatedly been one of lexically-based learning - grammatical structures being learnt in close association with individual words, with only cautious, conservative generalisations beyond this. For instance Maratsos (1983), in summarising the evidence, concluded:
A recurring finding of the last years is that children often make highly specific analyses of combinations, and apply possible generalisations cautiously, rather than rapidly making highly general ones which are productively extended immediately.
Similar points are made by Tomasello (1992) and Pinker (1996).
This broad finding fits well with the m-script learning theory in which the entire language is lexically based, the learning process centres on discovery of m-scripts for individual words [3.8], learning is mainly bottom-up, from narrow rules to broader generalisations [3.5], and where any further generalisations (secondary learning) must all pass a test of statistical significance [3.11]. Therefore the m-script theory predicts exactly the kind of lexically-based, conservative learning which children have repeatedly shown.
(A11) Comprehension Precedes Production: It is a robust cross-linguistic finding that vocabulary for comprehension grows well in advance of productive vocabulary. Between 12 and 18 months, comprehension vocabulary typically exceeds productive vocabulary by a factor of 4 or more, with wide variations between children (Bates et al 1993).
This might seem to be an obvious fact, which must be true in any theory of language learning, but it is not so. If, for instance, language learning proceeded by doing - by trying out words to see if they work - it might be the case that production was closely matched with comprehension. Compared to many other skills - where we can only learn by doing - it is remarkable that language learning proceeds very effectively almost entirely by observation, with hardly any doing.
In the m-script theory, the learning mechanism is based on a mixture of comprehension and `silent' internal generation [3.9]; however, in the earliest stages (e.g. to learn nouns) comprehension dominates [3.7]. This means that, if children are under some selective pressure to learn language as fast as possible, it is comprehension which is under pressure to advance most rapidly. Although a word m-script, once fully learned, can be equally used for comprehension or production [2.3-2.6], we would expect the child at any stage to have a large number of word m-scripts partially learned - well enough to be used in comprehension (with contextual help) and to give valuable clues when learning other words - but not known with enough confidence for use in production.
The other obvious factor in explanation is that the child is exposed to (and so automatically learns) many words which she just has no interest in using for speech. This remains true into adulthood.
In summary, the lead of comprehension over production has no single neat explanation, but equally poses no problems for the m-script theory.
(A12) Children make many types of transient errors, and correct them all: Although the word `errors' may reflect an over-simplistic, normative view (Givon 1985), nevertheless there are many distinct ways in which children's speech transiently differs from adult speech - many distinct `error types' which are universally eventually corrected.
We have reliable cross-linguistic evidence on an increasing number of these error types - including over-regularisation of morphology, over-extensions of word meanings, alternations of verb argument structure, and pronoun usage (see e.g. Slobin et al 1985). The ways in which these errors arise (while others do not) are diverse and provide many insights into the learning mechanism, which will be discussed individually in the sections which follow.
These errors are all, of course, corrected before adulthood. Learning theories have struggled to explain how each individual error type is corrected; in each case it has been a struggle, because explicit negative evidence is known to be absent or insufficient to support learning (Marcus 1993). However, the fact that they are all corrected (and we may find many more transient error types, also corrected) is itself a big experimental fact which calls for a simple explanation. We cannot carry on constructing piecemeal accounts of all error corrections.
In this theory, there is a straightforward mechanism for gathering and using negative evidence. When a child hears an adult sentence, and can infer the meaning, the normal learning process [3.9] is to partially understand the sentence, and also to partially generate a sentence from the inferred meaning. This gives the child the opportunity to observe `where I might have said X, an adult said Y' - a piece of negative evidence about construct X. The adult has supplied an implicit correction without intending to [3.10].
If the child gathers enough of this negative evidence about X (enough evidence to be statistically significant) she may conclude that in certain circumstances, construct X cannot be used [3.6]. This offers a general account of the correction of many distinct types of transient errors. Some of the specific instances are discussed below; but having an effective general mechanism for a very general phenomenon is an asset for the theory.
(A13) Languages change slowly through intermediate forms: While a few languages seem to have settled on highly regular and stable forms (e.g. Turkish: Slobin 1982), the majority of the world's languages seem to be mixtures, or in a state of transition, or both (McMahon 1994).
This is problematic for some theories, where much of the structure of a language is carried in a few phrase structure rules or parameters. For these theories, it is hard to understand the intermediate forms of language which must persist for at least a few generations during the course of language change (e.g. between one parameter value and another). How do these languages work (in production or in understanding) and how do children manage to learn a language with mixed or intermediate-valued parameters? Theories which rely on a few regular discrete structures or parameter values cannot easily accommodate incremental changes.
In this theory, where the structure of the language is embodied in a large set of word m-scripts [2.3], a language in transition is defined simply by a mixed set of m-scripts - some for the initial state, and some for the final state. A mixed language contains m-scripts from both parent languages. There are then no difficulties about how the language works, either in production [2.5] or in comprehension [2.4] (since these processes can use an arbitrary mixture of word m-scripts), or about acquisition (since this process can reliably acquire an arbitrary set of word m-scripts [3.12]). The m-script theory is entirely compatible with mixed and transitional languages.
(A14) There are no major differences between the acquisition of sign language and of spoken language: Sign language has all the syntactic complexity and expressive power of spoken language, and is acquired in a very similar manner. For instance, as summarised by Pettito (1993), deaf children acquiring signed languages from birth and hearing children acquiring spoken languages from birth achieve all linguistic milestones on an identical time course.
In this theory, language learning is based on a general mechanism of social intelligence, which evolved to learn causal regularities about sounds, gestures and data of any sensory modality [4.1]. So it is no surprise that language learning can be coupled to acoustic and visual channels with equal ease. Theories which postulate a more recent, language-specific origin for the learning mechanism might have difficulty with this fact.
The m-script learning mechanism gives a practical answer to the `poverty of the stimulus' argument which has motivated much linguistic theory. If you have a robust Bayesian learning mechanism, evolved over millions of years to learn complex social regularities in a noisy social milieu, then there is no lack of stimulus to learn language:
The Bayesian learning mechanism is provably capable of this performance, and it is now clear that children of all cultures get adequate inputs. They hear many hundreds of words per day, in situations (such as well-rehearsed family routines) where they know what is going on, and what is being referred to. What seemed like poverty of the stimulus was in fact poverty of learning mechanisms.
Points (A1) - (A14) are some of the most salient and remarkable facts about language learning - facts which have made it one of the major challenges for cognitive science. The m-script learning theory seems to give a natural and unforced account of all of them.
The acquisition of word meaning has been called `perhaps the deepest mystery in the study of language acquisition' (Bloom 1993) because, while syntax and phonology might be argued to be closed or parameter-driven systems, word meanings span an open set of concepts - possibly the whole of human cognition.
Points (B1) - (B4) list some of the key reasons why this has seemed such a deep mystery, and describe briefly how the m-script theory can give an un-mysterious, practical answer to each of them.
(B1) Word meanings are very diverse and rich in structure: The size of any dictionary reminds us of the great diversity of word meanings, while introspection confirms that many word meanings are highly structured things - which could not, for instance, be captured as points in any small-dimensioned space. The set of possible word meanings is systematic and productive (singular things can be made plural; things that can be done may be undone or re-done; and so on). The meaning of a word may refer to almost any situation which we can conceptualise, and to sense data of any sensory modality. Somehow we can internally represent these diverse, richly structured word meanings, and learn each one of them from only a few examples. This is a major challenge for theories of language.
In the m-script theory, word meanings are scripts [2.1], symbolic structures which derive from primate social intelligence [4.1]. Primate social situations are diverse, highly structured, multi-sensory and systematic; so the script representation has evolved to have these properties. Being tree structures of unlimited depth and breadth, scripts can express a systematic set of meanings of arbitrary complexity - just as is required for word meanings.
Although the basic script representation can capture much of the complexity of word meanings, scripts are not an isolated representation in the brain. For social cognition, they must couple to other internal representations, such as mental images, or representations of procedural and physical skills (Worden 1996b). These links to other representations extend the representational power of scripts. For instance, the word for a physical skill may be represented by a script which links to a mental image, or to a dynamic encoding of the skilled movements.
Each time a word is heard in a sentence, its meaning is embedded in the sentence meaning [2.3]. If the child can infer this meaning non-linguistically, for a few learning examples of each word, the operation of script intersection will rapidly project out the word meaning script from example-specific extra meaning [3.7] - learning the meaning of the word.
Therefore the script representation has sufficient expressive power to represent the meanings of words, and script intersection can rapidly learn word meanings from examples.
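The way intersection projects a shared meaning out of several richer examples can be sketched with a toy stand-in. Here nested Python dicts play the role of tree-like feature structures; the actual script representation is defined in [2.1] and is not reproduced here, so the function `intersect`, the slot names and the three example situations are all hypothetical.

```python
from functools import reduce


def intersect(a, b):
    """Keep only the structure shared by two tree-like feature structures.

    Trees are nested dicts and leaves are plain values. A slot survives only
    if both trees have it and their contents agree (for leaves) or share some
    sub-structure (for subtrees). Returns None if nothing is shared.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        shared = {}
        for slot in a.keys() & b.keys():
            sub = intersect(a[slot], b[slot])
            if sub is not None:
                shared[slot] = sub
        return shared or None
    return a if a == b else None


# Three made-up situations in which the same word was heard.
ex1 = {"entity": {"kind": "dog", "colour": "brown"}, "action": "run", "place": "garden"}
ex2 = {"entity": {"kind": "dog", "colour": "white"}, "action": "sleep", "place": "kitchen"}
ex3 = {"entity": {"kind": "dog", "size": "small"}, "place": "garden"}

meaning = reduce(intersect, [ex1, ex2, ex3])
print(meaning)
```

In this toy case the only slot common to all three examples is the kind of entity, so the colour, action and place details, which vary from one example to the next, are pruned away.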
(B2) Words are matched only intermittently with their meanings: In the learning input of the young child, there is only (in Gleitman's (1990) phrase) a `fitful fit of word to world'. Studies of corpora of utterances heard by children show that for typical early verbs, the verb's action is visible, or the things which are the verb's arguments are in view, on less than 50% of the occasions when the verb is spoken (Beckwith, Tinker and Bloom 1989). It has been found (Golinkoff 1986) that mothers immediately understand their children's wishes on only about 50% of occasions - so how could the children do any better? With so much dilution of the learning signal, how can a learner find out the true (and possibly complex) meaning of a word?
The analysis of the Bayesian learning algorithm in [3.2] shows that a signal:noise ratio of 50% can be handled easily; the mechanism can still pick out a sound/meaning regularity even when it only applies 5% of the time. The key criterion is the number of positive learning examples, irrespective of how many false examples there are to dilute them. Therefore the m-script learning method is highly robust against dilution of the learning signal.
(B3) The learner may observe many different things which might be part of a word's meaning: Again to quote Gleitman (1990), it seems that `an observer who notices everything can learn nothing, for there is no end of categories known and constructable to describe a situation'. This problem has also been famously elaborated by Quine (1960).
In the m-script theory, the answer to this problem comes in two parts: first, our primate social intelligence pre-disposes us to construe the situations around us in a limited number of ways, expressible as scripts (see (B5) below) [4.1]; and second, the script intersection mechanism [2.1, 3.7] is a highly efficient way to project out the common meaning from two or more scripts, rejecting noise.
To illustrate its efficiency, suppose the child observes 200 bits of information in a typical situation, and that some word has a meaning with information content of 40 bits. Each learning example for this word has a superfluous `noise' information content of 160 bits, almost drowning the signal; but intersecting two such example scripts together prunes the tree, and removes at least 3/4 of the superfluous `noise' information (sparing only the slots where the two examples coincide accidentally), leaving the noise at less than 40 bits. As the learning examples are uncorrelated, intersecting three examples leaves under 10 bits of coincidental noise; four leaves only about 2 bits of noise, and after that there is probably only the pure word meaning script left. Script intersection rapidly projects out the true word meaning from large amounts of noise.
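The arithmetic above can be checked with a short sketch. It assumes, purely for illustration, that each further intersection keeps exactly one quarter of the remaining coincidental noise (the text gives this only as an upper bound). The function name `residual_noise_bits` and the `survival` parameter are hypothetical and are not part of the theory.

```python
def residual_noise_bits(noise_bits, examples, survival=0.25):
    """Coincidental noise (in bits) left after intersecting `examples` scripts.

    Assumption: every intersection after the first example keeps a fixed
    fraction `survival` of the noise that remained before it.
    """
    if examples < 1:
        raise ValueError("at least one learning example is needed")
    return noise_bits * survival ** (examples - 1)


total_bits = 200      # information observed in a typical situation
meaning_bits = 40     # information content of the word meaning
noise_bits = total_bits - meaning_bits  # 160 bits of superfluous information

for n in range(1, 6):
    print(n, "example(s):", residual_noise_bits(noise_bits, n), "bits of noise")
```

Under this assumption the noise falls from 160 bits to 40, 10 and 2.5 bits after two, three and four examples, matching the figures quoted above.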
(B4) Each word meaning is acquired from limited evidence: Word meanings of unbounded complexity are acquired from a small number of examples per word, at peak learning rates often over ten words per day (Ingram 1989). At this speed there is not time to get very many learning examples per word.
The analysis of the script and m-script learning algorithms showed how any meaning script can be acquired reliably from around six (i.e. 3-10) learning examples. This is enough to establish that the apparent word meaning is not just coincidence between the examples, and to remove any other meaning coincidences between examples. It seems reasonable that the child can find 3-10 learning examples per word, at a peak rate up to 10 words/day. This learning mechanism has a likely biological origin [4.1], and can be shown mathematically to give adequate performance and noise rejection [3.3-3.8].
(B5) Children tend to link words to whole objects, and to types rather than thematically-related objects: Markman (1990) has shown that children have a learning bias to associate word meanings with whole objects rather than their parts, and to extend word meanings taxonomically (rather than thematically). They do not show the same biases for non-linguistic tasks. For learning language, however, children have some effective solution to Quine's (1960) `gavagai' problem.
If words are learnt by a mechanism which evolved for social learning [4.1], we expect biases in word learning to reflect the needs of the social domain - which is mainly concerned with whole objects rather than parts, and involves taxonomies (e.g. types of food). Thus we expect a learning bias towards whole objects, towards labels for individuals (precocious learning of proper names - Bloom 1990), and towards taxonomies or types.
In practice, the needs of the social domain have shaped the script representation, which then shapes language learning. The biases are reflected in the architecture of scripts. The most important properties (for social reasoning) are expected to be near the roots of script meaning trees, so that social learning (by script intersection) preserves them best.
For instance, we would expect whole objects to be represented by nodes near the roots of script trees, and their component parts by subordinate nodes below each whole-object node. Then the learning mechanism of script intersection [2.1, 3.3] will preferentially preserve whole-object nodes, giving a learning bias towards whole objects - a bias which originates in social cognition, but is reflected in language learning.
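The root-first bias can be sketched in code. This is a minimal toy, not the intersection operation defined in [2.1] and [3.3]: it assumes a script is a labelled tree with slot/value pairs on each node, that child nodes are matched by label, and that a node survives only if its parent survived. The class `Node`, the function `intersect` and the example scenes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    slots: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def intersect(a: Node, b: Node) -> Optional[Node]:
    """Keep only the structure shared by two script trees (toy version)."""
    if a.label != b.label:
        return None
    # Keep a slot only when both examples carry it with the same value.
    slots = {k: v for k, v in a.slots.items() if k in b.slots and b.slots[k] == v}
    children = []
    unused = list(b.children)
    for child_a in a.children:
        for child_b in unused:
            shared = intersect(child_a, child_b)
            if shared is not None:
                children.append(shared)
                unused.remove(child_b)
                break
    return Node(a.label, slots, children)


# Two situations in which the same new word is heard (invented examples).
first = Node("scene", {"place": "field"}, [
    Node("object", {"kind": "rabbit"}, [Node("ear", {"pose": "up"}), Node("leg")]),
])
second = Node("scene", {"place": "garden"}, [
    Node("object", {"kind": "rabbit"}, [Node("tail")]),
])

common = intersect(first, second)
print(common)
```

In this toy, the whole-object node with its `kind` slot survives, while the differing part nodes beneath it are pruned. A subordinate node can survive only if every node on the path from the root also matches, so nodes near the root are preserved more often.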
(B6) The tendency to label whole objects and types is a bias, not a constraint: As argued by Nelson (1988, 1990), children do not always make the taxonomic choice, and they should not (and do not) always choose whole-object meanings. However, the social learning process is intrinsically probabilistic, not categorical; so its learning biases are just biases (in Bayesian priors, reflecting the social environment) rather than constraints [3.1]; they can be overcome by sufficient evidence.
The learning mechanism normally requires of the order of 6 examples to learn any word meaning script, simple or complex. This is because the prior probability of a simple meaning script is higher than that for a complex one; but a complex meaning script accounts for more data, so can gather confirmation from the learning examples faster [3.2], catching up after 6 examples. However, if a child is called upon to guess a meaning on the basis of an insufficient number of examples (e.g. one example, as in some of Markman's experiments), the prior probability bias to simple meaning scripts will win - causing the child to prefer the simpler whole-object hypothesis.
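The trade-off between prior and evidence can be illustrated with a toy calculation in log-odds. The numbers are invented so that the crossover falls at six examples; they are not taken from the analysis in [3.2]. The function name and both parameters (`prior_gap_bits`, `gain_bits_per_example`) are hypothetical.

```python
def log2_odds_complex_vs_simple(n_examples, prior_gap_bits=30.0,
                                gain_bits_per_example=5.0):
    """Posterior log2-odds of a complex script against a simple one.

    Assumptions (illustrative only): the complex script starts
    `prior_gap_bits` behind in prior probability, and every learning
    example confirms it by `gain_bits_per_example` more than the simple one.
    """
    return n_examples * gain_bits_per_example - prior_gap_bits


for n in (1, 3, 6, 7):
    odds = log2_odds_complex_vs_simple(n)
    if odds > 0:
        leader = "complex"
    elif odds < 0:
        leader = "simple"
    else:
        leader = "level"
    print(n, "example(s):", odds, "bits ->", leader)
```

With a single example the simple hypothesis is far ahead, as in the one-example guessing case; the complex hypothesis draws level at six examples and leads thereafter.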
(B7) Some early words are very context-specific: Nelson (1985) and others have noted how some of a child's very first words are formulae which seem to denote some whole routine (or event description) rather than any adult word meaning.
The script intersection learning mechanism approaches the true meaning script for a word from above, rather than from below; early approximations have more nodes and slots than the true meaning, and successive script intersections remove these extra nodes and slots [3.3, 3.4].
If, then, adults tend to use some words in stereotyped and highly context-specific ways, script intersection will learn the words with those narrow, context-specific meanings; this is what we expect from a bottom-up learning process [3.6]. The child will tend to learn and use context-specific rote formulae rather than true words. Only later can he broaden the meaning by further script intersections, learning words by a hybrid of the primary and secondary learning mechanisms [3.11].
Scripts are very closely related to Nelson's event descriptions [2.2, 4.1], and may well be the same thing; so the script learning mechanism, which learns early words, is expected to learn some words as complex event descriptions [3.3, 3.7].
In Nelson's evidence, context-specific, formulaic words tend to be used more by `expressive' than by `referential' children. In this analysis, the expressive/referential distinction may depend more on parental input than on the innate dispositions of the child.
(B8) Word meanings change gradually through intermediate forms: It is a well known fact of language change that the meaning of a word may drift over hundreds of years, to become completely different from its original meaning. The incremental changes of whole languages were explained, in the m-script theory, as changes in the population of word m-scripts which constitute a language (A14), but the slow change in meaning of an individual word seems to pose a bigger challenge for the theory. If each word meaning is a script - a discrete tree structure with discrete-valued slots on its nodes - how can one script change gradually over time?
The answer to this puzzle lies in our ability to re-construe the same external situation in different, but related ways [4.2]. At any time, a word's meaning may be a changing mixture of construals.
For instance, consider Bybee's (1995) analysis of the changing meanings of English auxiliary verbs. She notes that the auxiliaries should, would, and could were all originally the past forms of shall, will and can - but that now their meanings have departed considerably from those past forms, in each case accruing a much more conditional/hypothetical reading. I will means I definitely will; whereas I would means I would only under some circumscribed conditions. Bybee traces this change through texts in Old English, Middle English and Shakespearean English.
Suppose the meaning script of will embodies a definite intention to do. When used in the past tense, willed (at first) also embodies this definite intention; but there is an implication (as the speaker says X willed and does not say X did ) that maybe X did not actually carry out his intention, because it was conditional on something which did not happen; this second construal is represented by a distinct script, and listeners may learn this implication generally as a non-linguistic m-script [1] which transforms between the two construals [4.2].
Thus in understanding a sentence, the original `pure intention' meaning of willed can be followed up by the use of a non-linguistic re-construal m-script which generates the second meaning `intention maybe not carried out'. This second meaning may then be learnt as a kind of homonym for willed.
As the word becomes a mixture of two closely related meanings, the relative weights of the two senses (their learned rule probabilities [3.2]) may change over time, until the second form dominates (and meanwhile the pronunciation changes to would). Throughout all this change, speakers and listeners can use the re-construal m-script to understand one another.
So any word in a language may change meanings between two or more `pure script' forms by being a mixture along the way. The rule probabilities of the word m-scripts, learnt by children of each generation, measure the progress of the change over time.
(B9) Children's over-extensions of word meanings may have a prototype structure: An example comes from Bowerman (1978) who reported that Eva (at around 18 months) had a prototype meaning of kick with three main characteristics: (a) waving a limb (b) sudden sharp contact between the body and an object, and (c) propulsion forward of the object. In various extensions, she used the word kick when only one or two of these applied - a kitten with a ball near its paw, a moth fluttering on a table, pushing an object against her sister's chest. Some extensions had nothing in common with others, but all overlapped with the prototype.
In the m-script theory, prototype-like over-extension arises as a result of production with limited vocabulary. Language production is always a best-fit process, making use of the words you know to say something useful - even if you have to include things which are not part of your intended meaning [2.5]. For a child with limited vocabulary, this is a hard problem; so if you have any word which overlaps strongly with what you want to say, you may use it, even if parts of its meaning do not apply. Thus the word kick conveyed some useful meaning in all of its over-extended uses - which, as a group, have a prototype structure.
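A best-fit choice of this kind can be sketched as follows. The sketch simplifies meaning scripts to flat sets of features, and its scoring rule (shared features minus a small penalty for unintended ones) is an assumption made for illustration, not the generation procedure of [2.5]. The function `best_word`, the feature names and the toy vocabulary are hypothetical.

```python
def best_word(intended, vocabulary, extra_penalty=0.25):
    """Pick the known word whose meaning best overlaps the intended meaning.

    Assumption: meanings are plain feature sets, scored by the number of
    shared features minus a penalty for each feature that is not intended.
    """
    best, best_score = None, 0.0
    for word, meaning in vocabulary.items():
        overlap = len(intended & meaning)
        if overlap == 0:
            continue  # a word sharing nothing with the intent is useless
        score = overlap - extra_penalty * len(meaning - intended)
        if best is None or score > best_score:
            best, best_score = word, score
    return best


# A small vocabulary; the three features of "kick" follow Bowerman's analysis.
vocabulary = {
    "kick": {"limb_waving", "sudden_contact", "object_propelled"},
    "ball": {"round_object", "toy"},
    "up": {"upward_motion"},
}

moth_fluttering = {"limb_waving", "insect", "on_table"}
pushing_object = {"sudden_contact", "object_propelled", "against_person"}

print(best_word(moth_fluttering, vocabulary))
print(best_word(pushing_object, vocabulary))
```

Both intended meanings select `kick`, although they share no feature with one another; each overlaps only with the central meaning, which is the prototype pattern described above.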
(B10) Word meanings may have a prototype structure: The meanings of some adult words are well described by such a prototype structure, in which all senses share some properties with a prototype (or central sense), but where some peripheral senses have nothing in common with other peripheral senses. A typical case is the word game, first discussed as a prototype structure by Wittgenstein.
The production mechanism described above explains how this arises, when we try to express new meanings with our limited vocabulary. A word m-script describes a central sense, but the word may be used when only parts of this central meaning script apply. This extension of word usage may be done by speakers of any age. If one of these extended uses is commonly made, then listeners will start to learn it; the word then becomes polysemous, with several distinct meaning scripts. If they all originate from some narrower `prototype' meaning script, each taking on different parts of the original meaning script, then the set of meanings will have a prototype structure.
If you hear six good learning examples of a word with broader-than-usual meaning, you will learn a new m-script for that meaning [3.4]; so the speed of the learning mechanism enables words to rapidly become polysemous with a prototype structure.
(B11) The set of meanings which are grammatically encoded in any language is quite limited: This clear cross-linguistic regularity has been described by Talmy (1983):
[Grammatical forms] represent only certain categories, such as space, time (hence, also form, location and motion), perspective-point, distribution of attention point, force, causation, knowledge state, reality status, and the current speech event, to name some main ones. And, importantly, they are not free to express just anything within these conceptual domains, but are limited to quite particular aspects and combinations of aspects, ones that can be thought to constitute the structure of these domains.
If you had to form a list of properties to abstract from general cognition for use in social reasoning (about peers and the interactions with them for food, space, alliances, dominance, and so on) then that list would look much like Talmy's list. In this theory, those meanings which are grammatically encoded (e.g. in closed-class morphemes) consist of small atomic elements of the script meaning representation, which can be easily added to or taken from any script. These elements are the basic coinage of social interaction [4.1]; and they are worth encoding compactly for efficient communication.
Different languages may encode different subsets of this limited set of encodable notions (Bowerman 1985). This does not constitute evidence against the existence of a limited set. For communicative efficiency, the words of a language should not try to encode too much (otherwise the meaning of each word would be too narrow; too many words will be required), so the language must be selective. These selective choices of what to encode are often language-wide choices; this happens because word m-scripts evolve to make large domains of regularity in a language [4.4].
(B12) Children separate mutually less relevant elements of meaning into distinct words: Just as languages differ in what they encode grammatically, so they differ in what they group together in open-class word meanings. Bybee (1985) has introduced the notion of the mutual `relevance' of two meaning elements, noting that languages tend to encode mutually relevant elements of meaning in the same word; but where languages do not do this, children's production errors often tend to group them back together (Slobin 1985). (This finding is closely related to the finding (C14) below).
An example is Talmy's (1985) observation that while verbs of motion may encode motion, direction, form of the moving object, and manner of motion, in any one language the verbs encode only two of these elements. In English, verbs encode motion and manner. Some American languages encode motion and form of the moving object. Spanish encodes motion and direction; manner must be encoded separately by an adverb. For `roll down', in Spanish one must say `descend rollingly'. Spanish children, however, often group meaning according to the English analysis as in *correr abajo `run down' (Slobin 1985).
Suppose verbs of motion do not encode all elements of motion, manner, direction and shape, because doing so would make verbs too specific; too many different verbs would be required. The choice of which meaning elements are encoded in the verb is a language-wide choice, because of the weak forces for regularity in languages; once a few main motion verbs encode some set of meaning elements, other motion verbs are forced to `line up' with them in a domain of regularity [4.4]. (This also stabilises and perpetuates the differences between languages.)
We can interpret Bybee's Relevance as proximity in script meaning structures. If two elements of meaning are slots on the same node, they are maximally mutually relevant; if they are on neighbouring nodes, less so, and so on [2.1]. The advantage of encoding mutually relevant meaning elements in the same word is not one of learnability (the m-intersection learning method can project out a large script structure spanning the mutually irrelevant meaning elements [3.3]), but of ease of language generation. Generation carves a meaning script apart into pre-defined building blocks [2.5], and is more easily done `piece by piece' using small local building blocks, than by carving out large, sparse, overlapping component meaning scripts. So languages generally evolve [4.3] to have compact building blocks, which encode mutually relevant meanings, making generation easier.
Of the elements commonly encoded in verbs of motion, manner and motion have greatest mutual relevance - being both intrinsic to the verb meaning script, while other elements (the shape of the moving object and the direction of movement) are more separable in the script meaning and so less relevant to the motion. However, some languages, such as Spanish, somehow started to encode motion and direction in the verb (Talmy 1985), and this choice then became frozen into the language by the `domains of regularity' mechanism [4.4].
This faces children not with a learning problem, but with a generation problem; Spanish verbs of motion do not encode the local, mutually relevant parts of the meaning structure which are easiest to pick apart in individual words. Any bias to encode small parts of the meaning structure in individual words will tend to produce English-like forms such as *correr abajo.
As another example, children easily acquire verb affixes encoding tense/aspect and person/number, but have more difficulty learning to mark verbs for gender or definiteness of the direct object (Slobin 1985). Again, I interpret this as a bias in language generation, rather than in learning.
A closely related notion is Bowerman's (1985) `hierarchy of accessibility' of meaning elements which, she proposes, affects (a) what children learn easily, and (b) what most of the world's languages tend to encode grammatically. Accessibility might depend on some script-related notion (e.g. proximity to the root node) or may be influenced by many other factors (e.g. perceptual salience). Bybee's relevance is a kind of relative accessibility; in practice the two notions may be very hard to distinguish, and partially interchangeable.
(B13) True synonyms are very rare: This fact, which aids language learning, is sometimes thought to be an innate property of the human language faculty - proposed as a uniqueness principle (Pinker 1984) or principle of contrast (Clark 1987) which children impose on the meanings they learn.
This theory is able to have its cake and eat it on the uniqueness issue. On the one hand, the observed lack of synonyms can be accounted for by other means, without a uniqueness principle; and on the other, there are independent grounds in the theory to expect some kind of uniqueness principle, as observed in experiments.
The observed lack of synonyms can be accounted for without a uniqueness principle, as follows:
However, although a uniqueness principle is not required by language data, there is independent evidence for it.
(B14) Children apply a uniqueness bias in learning new words: Markman (1990, 1992) finds experimental evidence that 3- and 4-year olds use a uniqueness bias when faced with new words, tending to believe that some new word must have a meaning distinct from a word they already know.
While the learning mechanism described in [3.3]-[3.9] seems to have no intrinsic bias against learning two words with identical meanings, the bias may be applied when the child comes to store the m-script for a newly-learnt word. For the purpose of language generation, word m-scripts must be stored in some structure which indexes them by meaning, so that a speaker may find the word with the best meaning at every moment. The particular form of storage proposed in this theory is an inclusion graph, so that by descending the graph we find meanings successively closer to the desired meaning [2.8].
Whatever meaning-indexed storage structure is adopted, it will encounter a difficulty when trying to store two words of exactly the same meaning, as they will try to occupy the same place in this structure. We may suppose the structure is not designed to support this, as it serves no useful communicative purpose; there is no point in having two equally good words to say one thing. Therefore a storage difficulty will force the child to try to find a distinct meaning for a distinct word.
(This raises the question of how bilinguals learn words in two languages with the same meaning. It is likely that they learn very early a socially-conditioned `language context flag' which serves to distinguish the lexical entries for their two languages, and allows them to have distinct storage locations.)
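The storage collision, and the way a language context flag avoids it, can be sketched with a simple meaning-indexed store. A dictionary keyed by meaning stands in for the inclusion graph of [2.8], and meanings are simplified to feature sets; the class `MeaningIndex`, its `store` method and the example words are hypothetical.

```python
class MeaningIndex:
    """Toy lexicon indexed by meaning (a stand-in for the inclusion graph)."""

    def __init__(self):
        self._by_meaning = {}

    def store(self, word, meaning, language=None):
        """Store a word; return False if another word already holds this slot.

        Assumption: the optional `language` flag is part of the storage key,
        so identical meanings in different languages get distinct locations.
        """
        key = (language, frozenset(meaning))
        holder = self._by_meaning.get(key)
        if holder is not None and holder != word:
            return False  # collision: the learner must seek a distinct meaning
        self._by_meaning[key] = word
        return True


monolingual = MeaningIndex()
print(monolingual.store("sofa", {"furniture", "seat", "upholstered"}))
print(monolingual.store("couch", {"furniture", "seat", "upholstered"}))

bilingual = MeaningIndex()
print(bilingual.store("dog", {"animal", "canine"}, language="en"))
print(bilingual.store("perro", {"animal", "canine"}, language="es"))
```

In the monolingual store the second word with an identical meaning is refused, which corresponds to the pressure to find a distinct meaning; with the language flag in the key, both translation equivalents are stored without conflict.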
(B15) Nouns predominate in the first 100 words: (Bates et al 1995) A noun m-script can be learnt in the absence of other linguistic knowledge, by inferring what entity is being referred to through non-linguistic means [3.1]. Full verb m-scripts cannot be learnt without knowing some nouns [3.1], and so we broadly expect noun learning to precede verb learning.
Although the bare meaning of a verb might be learnt in the same way as a noun (requiring no other vocabulary), in practice verb senses are not so easy to pick out. Adult speech tends not to refer to a present action in the same way as it refers to a present thing (verbs typically refer to a near-future, desired or near-past action). A knowledge of nouns is therefore needed to give clues about which event (past, future or present) a verb refers to.
(B16) Verbs and Adjectives are learnt more rapidly after the first nouns are learnt: (Bates et al 1995) To learn a full m-script for a verb or adjective, you need to know some nouns [3.8]. Knowledge of entity words also helps the child work out what action (near-future or near-past) is being referred to; so we would expect an acceleration of verb learning after, say, the first 50 nouns are known.
(B17) Learning closed-class morphemes accelerates at 400-600 words vocabulary: (Bates et al 1995) To learn closed-class morphemes, you cannot use the raw data of adult sentences and their inferred meanings [3.9]; it requires a secondary learning process, which depends on knowing some number of word m-scripts already [3.11]. So we expect closed-class morphemes to be learnt only when the child has a significant vocabulary.
(B18) Early meanings tend to be over-specialised rather than over-generalised: While it was once believed that over-extension of word meanings was common (Clark 1973), more recent evidence (Huttenlocher et al 1987) suggests that over-extension is much rarer than was thought; and that when it does occur, it is more likely to be fairly late in the developmental history of a word. In the early use of any word, under-extension is more likely than over-extension (Barrett 1995; Dromi 1987).
The script intersection learning mechanism starts from rich meaning structures of observed situations, and prunes them by comparison with one another - rather than building up a script from meaning elements [3.3]. It approaches the true meaning of a word from above, rather than from below. So we would broadly expect early word meanings to be under-extended (from intersecting a few examples with coincidental similarities, or learning highly context-specific words as in (B7)) rather than over-extended.
(B19) Children over-extend some words in production after having used them correctly: Children are observed to use some words with over-broad meanings.
Usually, this arises by the mechanism discussed under (B9) above; under pressure to communicate with limited vocabulary, children use words when only parts of their learnt meaning scripts apply. However, there are interesting examples of late over-extension, observed by Bowerman (1985), whose explanation may be a little more complex.
Bowerman (1985) noted an interesting form of over-extension between pairs of words with closely related meanings, such as make/let (as in make me watch TV versus let me watch TV ) and give/put (as in give the plate onto the table). In these cases, the two meaning scripts may be quite complex and have a lot of structure in common. What is interesting is that these errors emerge only after each word has been used correctly for some time.
There is a possible interpretation in this theory - that, like many other over-extensions, these arise from the secondary learning process. M-scripts for make and let are initially learnt and used correctly. Then, at some later stage, the child m-intersects these together (looking for some broader generalisation, and forming a higher node on the inclusion graph structure which stores words for fast retrieval in generation [2.8]).
Before this m-intersection is formed, there is no easy way to make the error; but once it is formed, when producing a sentence the child navigates down the inclusion graph, looking for meanings as close as possible to her intended meaning. If the make/let distinction is not an important part of the meaning for the child, she may take a `wrong turn' at this stage, pragmatically adding a small element of extra meaning in order to find a word. Only later does she learn that this element of extra meaning is important in the socially regulated world of adults, and has to be got right.
(B20) Names for basic-level categories are learnt first: Many nouns are names of categories. When children learn these category names, the first names learnt are those for `basic-level' categories (e.g. dog) rather than superordinate (mammal) or subordinate (dachshund) (Brown 1958; Lakoff 1987).
The basic level is not just the middle level of some category hierarchy; it has important distinguishing characteristics. It is, for instance, the highest level at which we readily form a mental image of the category, and the highest level at which we have motor routines for interacting with members of the category (e.g. chair versus furniture). It is the level of distinctive actions, both physical and social (Rosch 1978; Lakoff 1987).
In the m-script theory, the script meaning of a word is not just an isolated structure; it has links to other meaning structures in the brain, including other scripts for social processes and routines, mental images, and physical movement routines [4.1]. These links are an essential part of the meaning; for instance, menu has not much meaning without a link to the well-known restaurant script.
Our pre-linguistic meaning representations contain such links, defined at the broadest, most widely applicable category for which those links work well; this is the basic level. Therefore children tend to learn first the words for basic-level categories, rather than superordinate categories (for which the links don't work) or subordinate categories (whose names occur more rarely in the learning input).
(B21) Word meanings change by metaphor and metonymy: Many word meanings depend on metaphor or metonymy, and this has been identified as an important source of systematic meaning changes (Traugott). Is metaphor compatible with a script-based account of word meaning?
The m-script theory has plenty of room for metaphor and metonymy (although the details have not yet been fully worked out). The reason is that word meaning scripts are strongly linked to other meaning representations in the brain [4.1], such as procedural scripts and spatial models.
These links must be quite diverse and flexible, invoking domain-specific linking procedures (e.g. to convert between a script and a spatial model). We may suppose that the preferred form of linking for a script is specified by a special slot on its root node. In Jackendoff's (1990) terminology, this slot defines a semantic field in which to interpret the script; call it the semantic field slot. By changing the value of the semantic field slot (e.g. from `spatial path' to `time interval' or `possession') while keeping the rest of a script unchanged, we make metaphoric relations between different domains - social, spatial, temporal, etc.
This form of script-mediated metaphor may have been useful before language. For instance, linking social rank to vertical displacement enables some useful reasoning about ranks - implying, for instance, that they are transitive.
Strict word meanings have the semantic field slot fixed. Unfixing this slot is the form of over-extension which leads to metaphor. Metaphoric over-extensions and meaning shifts occur in the same way as other extensions and shifts (see B9, B10); people extend word meanings in production by using the word when only part of its meaning applies [2.5] (i.e. by changing the semantic field slot) and listeners learn the new meaning with a different `metaphoric' value of the semantic field slot - one which makes sense in the context.
Again, the speed of the learning mechanism enables metaphoric meaning shifts to happen rapidly in a speaking, learning population; hearing six examples is enough to learn a new metaphoric sense of a word [3.2]. A similar account applies to metonymy.
(B22) Children confuse names for parts of arms and legs: This is a particular case of (B15) above, with evolutionary significance. Learners of many languages often substitute `finger' for `toe', `wrist' for `ankle', etc.; and the two are not distinguished in some adult languages (Bowerman 1989). If the meaning representation which underlies language evolved after bipedalism, this is puzzling - as hands and feet seem to have little in common. However, if language is based on a script representation of primate social situations, evolved over the last 20 million years [4.1], then it is not surprising; for most of that period, primate hands and feet have had very similar properties and uses, so might well have the same representation.
(B23) NP-type nouns describe social routines: As well as mass and count nouns, there is a third category with distinct syntactic properties - words such as `breakfast' and `church' which seem to behave more like noun phrases than nouns. For instance, they can be used `bare' without quantifiers, as in `Come down for breakfast'. NP-type nouns all seem to describe social routines. Children aged 4-5 productively expect new words describing social routines to behave syntactically in this way (Burns & Soja 1994).
In this theory, the linguistic meaning representation is based on a representation of social situations [4.1]. So we might expect that entities which are social routines have some rather special and `complete' meaning representation in scripts, which in turn leads to special `complete' NP-like syntactic properties. This is consistent with the finding that children apply the distinction productively.
(B24) In disambiguating homonyms, we favour common word senses: It is a commonplace observation, and an established psycholinguistic finding, that it is easier to understand a sentence containing a homonym when one of the common word senses of the homonym is used. This raises the question: how is the fact that it is a common or uncommon word sense represented in the brain, and how is it learned?
The rule learning mechanism of [3.2] automatically learns a rule probability s(R) (which is the probability of the effect, given the cause) from the observed frequencies of cause and effect. In learning homonyms, s(R) is the probability of different word senses, which is learned from their observed frequencies [3.7]. Then, when finding the appropriate sense of an ambiguous word or phrase, the Bayesian maximum likelihood comparison of the different possible senses includes this factor s(R) for each sense [2.4]. For rare word senses, which have s(R) near zero, it is harder to find enough other evidence to counterbalance this small factor - making rare word senses slower to understand.
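The role of the factor s(R) can be illustrated with a small sketch. It assumes that s(R) for each sense is simply the relative frequency with which that sense has been observed, and that the remaining evidence is summarised as one context likelihood per sense. The function names, the counts and the likelihood values are invented for illustration and do not come from [3.2] or [2.4].

```python
def sense_probabilities(counts):
    """Estimate s(R) for each sense from its observed frequency."""
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()}


def best_sense(counts, context_likelihood):
    """Choose the sense maximising s(R) times the context likelihood."""
    s = sense_probabilities(counts)
    return max(s, key=lambda sense: s[sense] * context_likelihood.get(sense, 0.0))


# Invented counts of how often each sense of "bank" was met in the input.
observed = {"financial": 90, "river": 10}

# Mild contextual support for the rare sense is not enough to overcome s(R)...
weak_context = {"financial": 0.2, "river": 0.6}
# ...but strong contextual support is.
strong_context = {"financial": 0.05, "river": 0.9}

print(best_sense(observed, weak_context))
print(best_sense(observed, strong_context))
```

With these numbers the common sense wins under weak context (0.9 x 0.2 against 0.1 x 0.6), and the rare sense wins only when the context strongly favours it (0.1 x 0.9 against 0.9 x 0.05). This is the sense in which rare readings need more counterbalancing evidence.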
(B25) Gender has little to do with sex: In many languages such as German, the gender system seems to correlate with biological sex in a few core cases, but for most words in the language, gender is arbitrary - or perhaps partially predictable by purely phonological regularities (Maratsos 1982).
We may expect that scripts, being a social representation, have a `sex' slot for people and some other animate entities [4.1]; this may be involved in representing grammatical gender for people, but what about inanimate objects with arbitrary gender?
If we assume that there is a degree of flexibility in mapping the script representation onto other meaning representations (and thus onto the world), then the extension of gender to inanimate objects can be understood as a domain of regularity in language, extended by selection of m-scripts [4.4].
Suppose initially that proper nouns, common nouns and pronouns for people all have a `sex' slot defined in their meaning script. This `sex' slot may be correlated with inflectional markers, which then have a number of uses for resolving ambiguities in complex sentences - for instance, in helping to determine the referents of pronouns or the agents of verbs. In all these cases, even one bit of sex information can be very useful (e.g. halving the number of possible referents for a pronoun).
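The halving effect can be shown with a small sketch. It assumes, purely for illustration, that each candidate referent carries a one-bit `sex' slot and that a pronoun is matched against that slot; the names and the function below are hypothetical.

```python
def possible_referents(pronoun_sex, candidates):
    """Keep only the candidates whose sex slot matches the pronoun's marker."""
    return [name for name, sex in candidates if sex == pronoun_sex]

# Hypothetical discourse with four people already mentioned.
candidates = [("Anna", "f"), ("Ben", "m"), ("Clara", "f"), ("David", "m")]

# An unmarked pronoun could refer to any of the four;
# a pronoun marked 'f' is compatible with only two of them.
she = possible_referents("f", candidates)
print(len(candidates), len(she))
```

With the candidates evenly split, the single bit of information removes half of the possible referents before any other evidence is considered.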
The language can then become less ambiguous if the sex marker is extended to other words which do not semantically need it. Meaning scripts for inanimate objects can be artificially marked with a sex slot - which does not interfere with the interpretation of the script, but makes sentences involving those objects easier to interpret, because of the gender clues.
The presence of disambiguation mechanisms, which depend on the sex slot, provides a selection pressure on the word m-scripts for inanimate objects, forcing them to acquire an artificial sex slot - which then becomes their gender. This is a domain of regularity [4.4] which is so useful that it extends across the whole language. Furthermore since, for inanimate objects, the value of the sex slot does not matter, it can be correlated with phonological properties of the words - so making the language more regular and easier to learn.
So the extension from sex to gender is easily accounted for as a result of m-script evolution to form a domain of regularity. The selection pressure is ease of disambiguation.
[1] This m-script is similar to the verb alternation m-scripts of section (5.7), but, unlike the verb alternators, has no syntactic changes of verb arguments on the left branch.