5.3 Phrase Structure Rules

It is in the learning of phrase structure rules that this theory differs most markedly from many theories of language acquisition, such as Pinker's (1984) theory, or the Principles and Parameters approach, in both of which:

  1. A few key rules or parameters play a central role in a language.
  2. They are learned in a discrete, all-or-nothing manner.

In this theory, by contrast:

  1. There are no distinct phrase structure rules or parameters; each word carries its own phrase structure rule in its m-script [2.3], so the phrase structure of a language is learnt incrementally along with the words.
  2. Each word m-script is learnt incrementally by accumulating evidence, with a Bayesian statistical criterion for sufficient evidence [3.2].
  3. The basic style of language learning is bottom-up, through individual lexical items, with rather conservative induction of broader generalisations [3.5].

So broadly, while other theories predict a few key, all-or-nothing milestones in language acquisition, this theory leads to an incremental, undramatic form of learning, in which syntax is acquired gradually, `safely', and linked to the lexicon. For these reasons, the predictions of this theory for broad syntax acquisition tend to be rather negative and undramatic; several of the comparisons in this section have a `negative evidence' flavour.

This theory does not, however, differ markedly from those theories in the continuity assumption. Pinker's theory, and the Principles and Parameters approach, assume that the child is learning the adult grammar, and prefer (for economy of hypothesis) to assume continuity: that the child does not make any major diversions into special `child grammars'. In this theory, too (with a few local exceptions) the child acquires adult word m-scripts, rather than some disposable intermediate forms, from the start.

(C1) The syntax of any part of speech can be represented and learnt: Different parts of speech have characteristically different ways of connecting with their arguments and fitting together. All these ways must be somehow represented in the brain, and learnt; any theory needs to explain how.

In this theory, the syntactic constraints of any word are contained in the structure of the left branch of its m-script. I have illustrated by examples how this can embody the main constraints of word order, agreement, and semantic restriction. Then trump links between left and right branches convey the meanings of arguments into the full meaning structure. In the program which can generate, understand or learn a fragment of English, I have verified this general statement for nouns, verbs, adjectives, adverbs, auxiliaries, articles, quantifiers, pronouns and prepositions. I have also verified in general terms that the approach extends easily to other broad types of language (strongly case-marked, weak word order, agglutinating, isolative, ergative, etc.).

If all syntax can be embodied in m-scripts, then it can be learned, because the general learning mechanism can acquire any m-script [3.8 - 3.11]. The primary and secondary learning algorithms are the ends of a spectrum, variants of one fundamental learning method which can acquire all syntax.

(C2) Early syntax centres on verbs: The verb-centred nature of early syntax is confirmed by several studies (e.g. Bates et al 1988; Tomasello 1992).

In the m-script theory, nouns are learnt first, and then the way is open to learn any part of speech which can combine simply with nouns to express new meanings [3.1]. Verbs qualify on this count; so do a number of other morphemes which children use in `verb-like' ways (Tomasello 1992). With nouns and verbs alone, the child can say many useful things; so the m-script theory is fully consistent with the way children's early syntax centres on verbs.

(C3) Children link arguments to verbs correctly from the start of verb learning: While verb arguments are often omitted in early speech, `argument swapping' errors are extremely rare, and comprehension/ differential attention experiments confirm that children learn verb-argument linkages correctly from a very early age. The basic m-script learning mechanism can learn verbs' argument frames correctly, given only (in some minority of occasions) a knowledge of the meanings of the nouns which fill those frames, and a correct construal of the intended meaning script [3.8].

This is a form of semantic bootstrap, which requires no innate knowledge of `linking rules' between grammatical functions and verbs' semantic roles (Pinker 1984, 1989) - because it works entirely in terms of semantic roles, with no mention of grammatical functions.

Many theories of language learning aim to account for the direct learning of an adult grammar (the continuity hypothesis), and so learn rules which use the grammatical functions of subject and object. For instance, in Pinker's (1984) theory, the child learns phrase structure rules with subjects and objects. She must then also know a set of `linking rules' for each verb, to define which thematic roles are filled by the subject and the object. These must either be innate and universal, or learnt simultaneously.

In young children's language there is little evidence for the grammatical functions as distinct from semantic roles. All the evidence from learning first verbs is consistent with learning direct verb-noun linkages involving just semantic roles, with no intermediate concept of subject and object; that is the way verb argument structure is learnt in this theory [3.8].

Evidence for subject and object only emerges later, when children start to use complex constructions (and the economy devices in them) which define those grammatical functions [4.6].

This theory is also consistent with a continuity hypothesis - in that the child learns adult word m-scripts from the start - but the semantic roles are primary in this process, and grammatical functions like subject and object emerge later, as part of the economy devices. These can only be learnt later, when the child has a large vocabulary and syntax (so can get the evidence for them, and also needs to use them), by separate means.

(C4) Syntactic constraints also guide verb learning from an early age: Gleitman (1990) has argued that children have `syntactic' expectations about the linking of thematic roles to specific argument positions from an early age. They can use SVO order to understand which is the agent and which is the patient of a transitive verb from 17 months (Hirsh-Pasek et al 1985), can use sentence form to distinguish a transitive from an intransitive meaning when learning a new verb at 20 months (Naigles 1990), and can use word order to fix the non-obvious verb argument assignments in learning new `chase/flee' type verbs at age 3-4 (Gleitman 1990).

In this theory, there can be no `syntactic' expectations about argument order until a few verbs have been learnt. From then on, m-intersection of those verb m-scripts (the secondary learning process) can form a few broad m-scripts which embody these expectations (e.g. that agents come before the verb, patients after) [3.11]. Gleitman's results, while showing that children can use word order very early, do not conflict with this interpretation. Since comprehension precedes production, at 17 months (when the first evidence for use of word order occurs) children may well have learnt a few verb m-scripts well enough for comprehension, and well enough to have made word-order generalisations.

(C5) There are no sharp, language-wide transitions observed in language learning: In a theory where syntax hinges on a few key phrase structure rules or parameters, one might expect some noticeable language-wide changes (e.g. changes discernible in the manner of use of most or all verbs) on or around the day when the child learns a key rule, or fixes a key parameter.

No such sharp or language-wide changes are observed; rather, as emphasised by Tomasello (1992) in his `verb island' hypothesis, different verbs (which are the locus of early phrase structure) seem to be learnt independently, so that the best predictor of a child's usage of any verb is her recent use of the same verb. Each verb matures at its own pace. This is just as we would expect in this theory, where each verb m-script is (at least in the early stages of language learning, before secondary learning begins) learnt independently from examples of that verb in use [3.8].

In a theory with discrete, language-wide changes, it is of course possible to find mechanisms whereby the effects of the change are not seen as a sharp transition, or are blurred out over time; but those theories have to be saved from their own predictions, rather than agreeing naturally with the data.

(C6) There is no dissociation between syntax and vocabulary size: If learning of key phrase structure rules (or parameters) were a distinct process from learning words, one might expect some children to set the parameters early or late, compared to their vocabulary development - leading to a statistical dissociation between vocabulary size and syntax.

While many aspects of language learning show strong statistical dissociations across populations (e.g. between comprehension and production), in studying a sample of over 1000 children Bates et al (1993) found no detectable dissociation between syntax and vocabulary.

This is just what we expect in the m-script theory, where the syntax of a language is embodied in the m-scripts for its words [2.3], so syntax must be acquired along with vocabulary, by m-script learning [3.8]. The m-script theory cannot predict a dissociation between syntax and vocabulary.

(C7) Early analysed noun vocabulary predicts later syntactic ability; rote production does not: In a longitudinal study of 27 children, Bates et al. (1988) found that analysed vocabulary at 13 and 20 months correlates very highly with syntactic ability at 28 months. This is just what we expect in this theory - since early syntactic ability is largely a matter of verb mastery; and to learn verbs, you need to know some nouns to work out how the arguments are filled [3.7].

In the same study, Bates et al found, following Nelson (1985) and others, a spectrum of styles in early language learning; at 13 and 20 months there were two main `styles' of language production - analytic (short forms made up of analysed words) and unanalysed rote production of groups of words. They found that the second `rote' style had very little correlation with 28-month syntactic ability (as measured by Mean Length of Utterance), while the first `analytic' style was strongly correlated (C3 above).

This again is easily interpreted in the theory. To learn language involves at least two separate abilities [3.7, 3.8]:

  1. To record in memory `learning examples' - stretches of speech paired with script descriptions of situations.
  2. To m-intersect these together, analysing the stretches of speech down to individual words and their meanings.

If children vary independently in their ability to do (1) and (2), this will account for the observed dissociation. For unanalysed rote production, only (1) is required; whereas to develop syntax, both (1) and (2) are required.

(C8) There is scant evidence for unmarked parameter values in language learning: In theories which centre on a few key rules or parameters, the learning problem can be eased by assuming that some of these rules or parameters are the `unmarked case', or default, which can be assumed until contradicted by evidence. For instance, Pinker's (1984) theory starts from deep, narrowly branching phrase structure rules (as in X-bar theory) and only learns broader and flatter phrase structures (as required for languages such as Latin or Warlpiri) when this hypothesis fails. In Principles and Parameter theories, it is commonly assumed that each parameter starts at some unmarked value (typically the value leading to a more restrictive language, if there is one) looking for `triggers' in the input which might re-set it.

These theories broadly predict that some languages (or aspects of them) should be learnt faster than others; in a language with an unmarked parameter value, that value should be known correctly from the start, whereas children whose language has the marked value might be expected to make characteristic early `parameter switched' errors before they hear the trigger.

I believe that the evidence for these favoured parameter values or rules is scant (although I do not claim it is non-existent). In principle, to prove the case, pairs of languages (with opposite parameter values) should be looked at together. Some examples are:

So in spite of the theoretical attractions of default parameter values, no clear-cut evidence for them has been found. In all of these cases (and others) the m-script theory makes the `neutral' prediction that children learn different `parameter settings' - embodied in different forms of word m-script - with equal ease. This seems broadly consistent with the bulk of the data.

(C9) There are no major blind alleys in language learning: Theories which hinge on discrete, all-or-nothing learning of a few key parameters or phrase structure rules tend to be haunted by a `one false move' prediction. Suppose the child sets some parameter wrongly; will that send her along some blind alley of language learning, and if so, how will she ever recover?

Since these theories also tend to use categorical rather than statistical learning mechanisms, this problem is particularly tricky; there can be no gradations depending on weight of evidence. They may need elaborate mechanisms to reset parameters or unmake generalisations, with complex criteria about when resetting can happen.

While children do over-generalise, leading to mistakes with particular words, there is, as far as I know, no evidence for any major `blind alleys' taken by children in learning a language. That is fully consistent with this theory, in which there are no language-wide `strategic' learning decisions to be made; and any word's m-script can be continually re-learnt (and unlearnt if necessary) on the basis of recent evidence.

(C10) Languages have regularities captured in X-bar syntax: X-bar syntax, as developed by Jackendoff (1977), embodies two main insights:

  1. Different sentence subunits, such as NP, VP, etc., have similarities of behaviour which warrant a common treatment as XP.
  2. For each of these subunits, there can be several `bar levels' such as N, N', N'' etc. with different syntactic privileges depending on level.

In summary, the m-script theory readily accounts for (1), but does not yet have any neat account of (2).

The fact that in any language, all the various clause types tend to be either head-first or head-last, uniformly across the language, is accounted for by the strong force towards regularity needed to handle ambiguities [4.5], leading to the emergence of the Greenberg universals. Other similarities may be understood as general similarities of form between m-scripts for different parts of speech.

The regularities under (2) do not yet have any pleasing account in this theory. In this theory, m-unification matches meaning structures. If the meaning structures are simple scripts as used for illustration in this paper, then there is little in the scripts to distinguish the different bar levels identified by Jackendoff and others. That is not to say that we could not find elements of script structure which are identifiable as markers of bar level; just that it has not been done yet. The existence of bar-level regularities is probably an indication that script meaning structures are, indeed, more complex than the illustrative structures I use here.

(C11) In assigning agent roles, cue strength depends on overall cue validity: Bates, MacWhinney and co-workers (MacWhinney & Bates 1989) have performed experiments where speakers of many different languages and ages hear simple transitive sentences with conflicting cues about which noun is the agent. The cues manipulated include word order, case-marking on the noun, animacy, and verb agreement. These experiments robustly reveal fascinating cross-language differences, and differences between ages, in the importance hearers assign to different cue types in choosing the agent.

For instance, Italian children under 7 give priority to animacy, followed by SVO word order; whereas Italian adults give priority to SV agreement, followed by clitic agreement and animacy. English speakers of all ages give priority to SVO word order.

They find that across many languages, young children assign cue strength on the basis of overall cue validity. Bates and MacWhinney (1989) summarise data which qualitatively confirm this in English, Italian, French, Spanish, German, Dutch, Serbo-Croatian, Hungarian, Turkish, Hebrew, Warlpiri, Chinese, and Japanese.

Cue validity is defined independently of the experiments which measure cue strength, as

Cue validity = Cue availability * Cue reliability

Availability is the proportion of occasions the cue is there to be used, and reliability is the proportion of those occasions when it gives the correct role assignment; so both can be approximately measured for a language from corpora or texts.
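The definition above can be illustrated with a minimal sketch. The counts below are invented purely for illustration (they are not taken from any corpus or from the studies cited), and the names `CueCounts`, `availability`, `reliability` and `validity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CueCounts:
    """Hypothetical tallies for one cue; toy numbers, not real corpus data."""
    sentences: int  # transitive sentences examined
    present: int    # sentences in which the cue is there to be used
    correct: int    # of those, sentences where the cue gives the right agent


def availability(c: CueCounts) -> float:
    # Proportion of occasions on which the cue is there to be used.
    return c.present / c.sentences


def reliability(c: CueCounts) -> float:
    # Proportion of those occasions on which the cue assigns the role correctly.
    return c.correct / c.present if c.present else 0.0


def validity(c: CueCounts) -> float:
    # Cue validity = cue availability * cue reliability.
    return availability(c) * reliability(c)


# Invented example: a word order cue in 1000 toy sentences.
word_order = CueCounts(sentences=1000, present=900, correct=855)
print(availability(word_order), reliability(word_order), validity(word_order))
```

With these toy counts the availability is 0.9, the reliability 0.95 and the validity about 0.855, which is simply the share of all sentences in which the cue is both present and correct.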

The strong correlation between the cue strengths seen in young children and overall cue validity is an important finding, which MacWhinney (1989) interprets in a connectionist learning model of cue competition. It is equally possible to interpret it in the Bayesian learning and processing model.

There has been concern that, because the test sentences often give conflicting cues about the agent assignment, subjects switch to some other (perhaps non-grammatical, conscious) processing strategy. However, the robustness of the findings across different experimental conditions argues against this. In the m-script theory, I propose that hearing a conflicting set of cues acts like a signal that something has been misheard or garbled, and so leads to greater activation of the strategies for handling ambiguities. (This is not a switch in strategies, as ambiguity handling is continually necessary.)

The core ambiguity-handling strategy is (a) to use broad general m-scripts (gathered by secondary learning) to get some clue about what is going on [3.11], then (b) to use Bayesian maximum likelihood estimation to choose the best possible interpretation [2.4].

For each possible cue, we suppose the child has learnt, by secondary learning, a broad m-script which relates the cue to the choice of agency; for instance, English has a very strong SVO word order cue, which can be embodied in a simple m-script - the m-intersection of many verb m-scripts. The rule probability s(R) of this m-script is acquired from the many examples, and is equal to the cue reliability. Then, when Bayesian maximum likelihood inference is used to choose between the different possible interpretations, the rule strength s(R) of each conflicting cue enters into the competition.

For instance, suppose there are three cues, R1, R2 and R3, and R1 indicates a certain agent assignment in contradiction to R2 and R3. The maximum likelihood calculation is to compare s(R1)[1-s(R2)][1-s(R3)] with [1-s(R1)] s(R2) s(R3); thus R1 competes with R2 and R3 on the basis of their cue reliabilities.
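The comparison just described can be written out as a short sketch. The rule strengths used in the example call are invented, and the function names are hypothetical; only the two products being compared come from the text.

```python
from math import prod


def interpretation_likelihood(supporting, contradicting):
    """Score one interpretation: cues that agree with it contribute s(R),
    cues that it contradicts contribute 1 - s(R)."""
    return prod(supporting) * prod(1.0 - s for s in contradicting)


def compare_r1_against_r2_r3(s1, s2, s3):
    """Compare s(R1)[1-s(R2)][1-s(R3)] with [1-s(R1)] s(R2) s(R3)."""
    with_r1 = interpretation_likelihood([s1], [s2, s3])
    against_r1 = interpretation_likelihood([s2, s3], [s1])
    # Ties are given to R2 and R3 here; that choice is arbitrary.
    winner = "R1" if with_r1 > against_r1 else "R2 and R3"
    return with_r1, against_r1, winner


# Invented strengths: one very reliable cue against two moderate ones.
print(compare_r1_against_r2_r3(0.95, 0.6, 0.6))
# Invented strengths: one moderate cue against two fairly reliable ones.
print(compare_r1_against_r2_r3(0.7, 0.8, 0.8))
```

In the first call the single strong cue R1 prevails over the two weaker cues; in the second, the two cues acting together outweigh R1.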

In this theory, therefore, we predict that cue reliability, rather than cue validity, is the best indicator of cue strength in the child. However, cue availability (the other component of cue validity) does enter into the predictions. While the eventual rule strength depends only on reliability, the speed of learning depends on cue availability. Since secondary learning depends on broad generalisations from a lot of evidence, we may assume it proceeds slower than primary learning, and this slower learning may affect the observed results. Detailed examination will be required to decide whether cue validity, or cue reliability paced by cue availability, gives a better fit to the data.

The main point is that a Bayesian maximum likelihood choice of interpretations, with broad cue m-scripts learned by secondary learning, gives a good overall interpretation of cross-linguistic cue competition results in young children.

(C12) In many languages, cue strengths change markedly between ages 6 and 16: The same researchers have measured cue strengths into adulthood in the same set of languages, and found slow but profound changes over the age range from 6 to adult. They interpret these data in terms of the conflict validity of the different cues, where conflict validity is a single number for each cue, defined in terms of how reliable the cue is when in conflict with other cues.

In this theory, the interpretation of these results involves something like conflict validity, but it is not just a single number per cue.

For instance, consider Kail's (1989) results on French. For children under 6, the order of cue strength is SVO word order first, then animacy, then VSO and SOV order. For adults, however, the order of cue strength is completely different: SV agreement, then clitic agreement, then animacy, then SVO order, then word stress. In this theory, we would interpret such slow but profound changes above age 6 as arising from two sources:

(A) Changing Cue Reliability: SVO word order is not a reliable cue to the agent in adult French, largely because of the extensive use of clitics and pronouns (which have different word orders) and the use of diverse word orders for pragmatic/discourse purposes. The child under 6 has probably not yet differentiated or learnt most clitic pronouns, and has not yet learned the pragmatic/discourse markers for other word orders. She works at sentence level, rather than discourse level (Karmiloff-Smith 1979).

If the child restricts her learning set to the sentences she can interpret, this may exclude many sentences which use clitics and exotic word orders; so in this selected sample, the cue reliability of SVO word order is rather high. Later, when the child understands clitics, SVO reliability drops. So cue reliability cannot be simply estimated from text corpora; it depends on how the child filters her learning input.
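The filtering effect can be made concrete with a toy sketch. The twelve `sentences' below are invented solely to show the arithmetic; they are not French data, and the tuple encoding and function name are assumptions made for this illustration.

```python
# Each toy sentence is a pair:
#   (uses a clitic or marked word order, SVO cue gives the correct agent)
# The mix is invented for illustration only.
sentences = (
    [(False, True)] * 6     # plain sentences where SVO order is correct
    + [(False, False)] * 1  # plain sentence where SVO order misleads
    + [(True, True)] * 1    # clitic / marked-order sentence, SVO still correct
    + [(True, False)] * 4   # clitic / marked-order sentences, SVO misleads
)


def svo_reliability(sample):
    """Proportion of sentences in the sample where the SVO cue is correct."""
    if not sample:
        return 0.0
    return sum(1 for _, svo_correct in sample if svo_correct) / len(sample)


# The young child is assumed to learn only from sentences she can interpret,
# which here means sentences without clitics or marked word orders.
child_sample = [s for s in sentences if not s[0]]

print(svo_reliability(child_sample))  # reliability in the filtered sample
print(svo_reliability(sentences))     # reliability over all the input
```

In this toy input the SVO cue is correct in 6 of the 7 sentences the child can interpret, but in only 7 of the 12 sentences overall, so the estimated reliability falls once the excluded sentences enter the learning set.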

(B) Learning Exception Rules: If two cues A and B both predict an agent assignment, the child will first learn individual rules (A → agent) and (B → agent); when these rules conflict, she can use the Bayesian maximum likelihood inference to choose between them. However, given time the child will accumulate enough information to learn an exception rule (A&B → agent), which may have a different rule probability s(R) from that given by maximum likelihood combination. This process of learning exception rules is slow [3.6] and is discrepancy-driven - a new rule is learnt only when the old rules make systematically wrong predictions.

This learning of exception rules will slowly acquire something related to the `conflict validity' of cues proposed by Bates, MacWhinney and their collaborators, and can be expected to account for some of the same effects; but conflict validity will not be a single number per cue.

In summary, the m-script learning theory has within it mechanisms which seem able to account qualitatively for the changing cue strengths which have been observed in several languages; but a detailed analysis and comparison will be tricky, because of the complex overlapping effects of (A) and (B) above.

(C13) Meaning elements encoded locally in the sentence are learnt most easily: Slobin (1973, 1982, 1985) has summarised evidence that greater separation in the sentence of morphemes encoding an element of meaning leads to slower learning and more errors. For instance, German gender is marked on articles rather than on the nouns themselves, and is learnt slowly; part of the cause may be the distance between the article and the noun whose gender is at issue. Slobin (1982) notes that children understand causative constructs earlier in Turkish and Serbo-Croat than in English or Italian, because in the former two languages the causative construct is indicated by a local cue, and suggests that a local cue effect also explains the early acquisition of object inflections in Turkish.

While some of these examples may have multi-faceted explanations, there is a clear causal link from separation in sentences to difficulty of learning, which may contribute to all of them.

A construct is learned by collecting some six or so clear learning examples, forming the SMS for each one, and m-intersecting them together [3.9]. The speed of learning is paced by the time taken to gather these learning examples.

When a child hears a sentence or fragment L phonemes long, the probability P that she hears it clearly, understands all the words and infers the intended meaning can be roughly modelled as P = exp[-µL]; every extra phoneme diminishes P by a multiplicative factor. The exponent µ gets smaller as the child's command of language increases, but for young children, µ is large, leading to a strong cutoff in P with increasing L. Therefore it is harder for the child to gather long learning examples.

A very local construct can be learnt from short learning examples, so can be learnt rapidly; but any construct which extends across many phonemes will require longer learning examples, which (in the early years) take longer to gather. That is why local constructs can be learnt earlier. (Another example of this effect is that nouns tend to be learnt before verbs; nouns require only the noun sound in learning examples, whereas verbs also require their arguments.)
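The length effect can be put in rough numbers with a short sketch. It assumes, purely for illustration, that each hearing is an independent trial with success probability P = exp[-µL], so that the expected number of hearings needed to gather n clear examples is n/P; the value of µ and the lengths used are invented, and the function names are hypothetical.

```python
import math


def p_clear(mu: float, length: int) -> float:
    """P = exp(-mu * L): chance that an L-phoneme stretch yields a clear example."""
    return math.exp(-mu * length)


def expected_hearings(mu: float, length: int, needed: int = 6) -> float:
    """Expected hearings to gather `needed` clear examples, treating each
    hearing as an independent trial (an assumption made for this sketch)."""
    return needed / p_clear(mu, length)


mu = 0.2                 # invented value for a young child
short, long_ = 5, 15     # invented lengths in phonemes

print(expected_hearings(mu, short))   # hearings needed for a local construct
print(expected_hearings(mu, long_))   # hearings needed for a spread-out construct
print(expected_hearings(mu, long_) / expected_hearings(mu, short))
```

Under these assumptions the ratio of the two learning times is exp[µ(L2 - L1)], so it grows exponentially with the extra length and shrinks towards 1 as µ falls with the child's growing command of language.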

(C14) Children tend to mark individual meaning elements explicitly and separately: Slobin (1985) has summarised evidence from many languages that children tend to be explicit in their speech, preferring free morphemes over bound or contracted morphemes (e.g. in English auxiliaries, Bellugi 1967), and under-using ellipsis (e.g. in Japanese, Clancy 1985). Children tend to separate elements of meaning which in adult language are fused; for instance, in French, using de moi instead of mon, separating possession from the pronoun (Karmiloff-Smith 1979; Clark 1985). Similar examples are found in Hungarian (MacWhinney 1985) and Hebrew (Berman 1985). Slobin (1985) summarises this and other evidence in his two operating principles, `Maximal substance' and `Analytic form'.

In the m-script theory, these effects arise not from a preference for a particular `child grammar' or from a production bias; they arise because evidence for the separate forms accumulates faster, so the separate forms are learned earlier. For instance, in French the possessive de is used with high frequency in many places; whereas a pronoun-linked possessive such as mon occurs more rarely. Therefore it takes a child longer to accumulate the necessary clean learning examples for mon than for de and moi; there will be a period during which de is securely known as a possessive, and mon is not; during this period the child is likely to use de moi and similar `analytic' constructs instead of adult-like fused constructs such as mon.

(C15) Negation, interrogatives and conditionals are moved outside clause boundaries: It is a robust finding across many languages (Slobin 1985) that in early uses of negation, children tend to put the negative particle outside the boundary of an unmodified clause, rather than `fused in' to the clause at a variety of places (e.g. on the verb) as it typically is in adult language. Slobin cites evidence from English (Bellugi 1967), Polish (Smoczynska 1985), Turkish (Aksu-Koc & Slobin 1985) and French (Clark 1985).

There is a similar `clause external' placing tendency in interrogatives (English: Bellugi-Klima 1968; Hungarian: MacWhinney 1973) and conditionals (Hungarian: MacWhinney 1973).

This effect can be understood, in the m-script theory, as arising from a difference in the speed of learning of different constructs, dependent on the placement of negation (or conditional status or interrogation) in the script meaning structure.

For instance, the simplest representation of negation seems to be some form of `negation slot' on the top node of a whole scene, whatever the content of that scene. If we assume that negation is represented in meaning structures in this way, and if the adult language has any clause-external form of negation at all, then that word is represented by a very simple and broad-range m-script, shown in figure 5.1 below.

Figure 5.1: word m-script for a clause-external negator

This m-script for `no' means that if any sequence of words which designates an event or process (i.e. results in a slot [des:event] ) is preceded by the sound `no', then the polarity of the resulting event scene is negative - it is asserted not to happen.
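The behaviour just described can be mimicked in a toy sketch. It is not the m-script formalism itself: meaning structures are stood in for by plain dictionaries, the `toy_interpret` lookup is an invented placeholder for whatever interprets the rest of the clause, and all names are hypothetical.

```python
def toy_interpret(words):
    """Invented stand-in for clause interpretation: maps a few word
    sequences to dictionary 'scenes'; returns None if not understood."""
    lexicon = {
        ("dog", "bite"): {"des": "event", "act": "bite", "agent": "dog"},
        ("dog",): {"des": "thing", "kind": "dog"},
    }
    return lexicon.get(tuple(words))


def apply_clause_external_no(words, interpret=toy_interpret):
    """If `no' precedes a word sequence that designates an event, return
    that event scene with negative polarity on its top node; else None."""
    if not words or words[0] != "no":
        return None
    scene = interpret(words[1:])
    if scene is None or scene.get("des") != "event":
        return None
    # The clause itself is left unmodified; only the top-level slot is added.
    return {**scene, "polarity": "negative"}


print(apply_clause_external_no(["no", "dog", "bite"]))
print(apply_clause_external_no(["no", "dog"]))  # not an event, so no match
```

The point of the sketch is only that the negator needs to know nothing about the inside of the clause: any sequence that yields an event can be negated by the same rule.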

The clause-external placement of the negator (in the left branch of the m-script) allows any clause to be negated; just as the top-node positioning of the `polarity' slot allows any event meaning structure to be negated. This m-script is so simple and so easily applicable that, we may suppose, given any opportunity the child will learn it and may then use it as a universal negator.

In contrast, many adult forms of negation - e.g. fused into the verb, or into an auxiliary, or in an agent such as `nobody' - are more complex as m-scripts, and each one is not so universally applicable. Their use by the child is, in the early years, doubly inhibited - there are more of them to learn, and as the evidence for each distinct form of negation accumulates more slowly (only when its special cases occur) each one takes longer to learn. It is then not surprising that, under pressure to communicate, children often fall back on a universal negator.

The question then arises: if the universal clause-external negator is so easily learnable and so adaptable, why do adult languages use many specialised, hard-to-learn forms? I believe that the evolution of these specialised negation m-scripts has been driven by two selection pressures:

For adults, these two advantages of specialised within-clause forms overcome the learnability disadvantage.

(C16) Double-marking of negation and other constructs is a later error: It has been noted in several cases (e.g. Hungarian conditionals: MacWhinney 1973; English negation: Bellugi 1967; English past tense morphology: Kuczaj 1978; French possessives: Karmiloff-Smith 1979) that redundant marking of the same meaning element tends to occur more in older children, not when children first learn to express that meaning element.

Redundant marking is not a generally-forbidden feature of adult language; for instance, it is regularly used in many forms of agreement such as noun-adjective agreement, verb-agent agreement, etc. The m-script language generation method [2.5] supports redundant marking (and the learning mechanism builds it into the m-scripts of words which have agreement constraints), and generally gives speakers the option of doing it - although the mechanism naturally favours economy over redundant marking. Therefore children may easily over-mark meaning elements, especially when they are concerned to convey those elements clearly. They need to learn the conventions of adult language about what can be redundantly marked, and what cannot.

For negation, over-marking will not happen when the child has learnt just one, clause-external negator - but may happen later when she has also learned some clause-internal negators, and has the option to use them as well. Perhaps not yet being confident of the clause-internal forms will increase the tendency to double mark `for safety'.

How do children learn to correct these over-markings? As for many other errors, the theory has a direct mechanism to gather implicit negative evidence [3.9] - discrete occasions where the child can observe `where I would have used a double negative, an adult used a single clause-internal negative'. This is an intrinsic part of the primary learning mechanism [3.11]. Accumulating enough of these examples, the child learns that specific double negative forms are not used; but this is a slow learning process [3.6], and so children correct these errors slowly.

5.4 Morphology