Primate Social Intelligence

R.P.Worden

Charteris Ltd, 6 Kinghorn Street, London EC1A 7HT

rworden@dial.pipex.com


Abstract

A computational theory of primate social intelligence is proposed, in which primates represent social situations internally by discrete symbol structures, called scripts. Three well-defined computational operations on scripts are sufficient to support social learning, planning and prediction. This gives a formal, predictive model in which to analyse how primate social knowledge is acquired, as well as how it is used.

The theory is compared with primate data, such as Cheney and Seyfarth's observations of vervet monkeys. It gives simple, understandable script-based analyses of many observed phenomena - such as the recognition and use of kin relations, learning of alarm calls, habituation to calls, knowledge of rank, tactical deception, and attachment behaviour.

I argue that a tight, concise theory of social cognition, such as script theory, is needed to explain the rapid learning and social guile seen in primates. It also has the benefits of simplicity and testability. The extension of scripts to incorporate a primate theory of mind is described in a subsequent paper.

Published in Cognitive Science, 20(4): 579-616, 1996


1. Introduction

In recent years our knowledge of primate behaviour and intelligence has grown rapidly, giving new insights into the origins and nature of our own intelligence. It has been proposed that the richness and complexity of primate social interactions have been a forcing-house for the growth of primate intelligence (Jolly 1966; Humphrey 1976).

Primate social cognition is often approached by informal verbal descriptions (e.g. Byrne and Whiten 1988; Dennett 1983; Cheney and Seyfarth 1990). This paper presents a working computational model of social intelligence. By making the model complete and consistent, we force all its assumptions into the open and can calculate its predictions unambiguously. The main results are:

- There are good reasons to expect that primate social cognition is based on discrete, symbolic representations of social situations. Scripts are such a representation, chosen to be as simple as possible.

- A complete and consistent theory of social cognition can be built using scripts and three basic operations on them.

- The theory gives simple, understandable accounts of many observations, such as primates' understanding of kin and status relations in their group, of alarm calls and attachment behaviour.

- The theory gives highly adaptable social intelligence, with rapid learning of new social regularities - in broad agreement with observed primate behaviour.

A formal notation to describe primate social knowledge and behaviour has also been proposed by Byrne (1993), using a production rule formalism. The script analysis proposed here has features in common with Byrne's proposal, differing mainly in having an explicit theory of learning, tailored to the social domain.

The theory describes general primate social intelligence, as seen in monkeys and most primates, but not the extended social intelligence - which seems to require a knowledge of others' knowledge and intentions - seen in the great apes and mankind (Premack & Woodruff 1978; de Waal 1982; Byrne and Whiten 1988). The extension of this theory to include the primate 'theory of mind' is described in a subsequent paper (Worden 1995a).

Section 2 discusses the problem of primate social intelligence, and the types of computation in the brain which might underlie it, motivating the approach taken in this paper. Section 3 presents the computational model, which uses tree-like information structures called scripts, and three key operations on them for learning and performance. Scripts are easily envisaged, and the operations can be done with pencil and paper. I describe how these operations are used for learning, planning and prediction of social situations.

Section 4 compares the model with observations - particularly those of Cheney and Seyfarth (1990) on vervet monkeys. I give script-based analyses of monkey alarm calls, use of kin and rank relations, and attachment behaviour. Section 5 discusses further tests of the theory, while Section 6 compares the theory with other work and discusses its general implications.

In spite of the use of the term 'scripts', this computational model of social intelligence contains elements of scripts, mental models and production rule systems; the same script structures can serve as a specialised mental model of social situations, or as rules defining how those situations may develop. In this, the model has much in common with the framework for induction of Holland, Holyoak, Nisbett and Thagard (1986), which combines the same elements. Their models are more elaborate, being applied to human cognition; this simpler model, which maps onto a subset of theirs, applies to general primate social cognition. Both put strong emphasis on learning.

The theory of this paper tackles not only the problem of how social knowledge is represented and used in the brain, but also the related (and harder) problem of how that knowledge is acquired. I hope that by presenting a defined and predictive theory of primate social intelligence I may stimulate those who work with primates to express their findings in its terms, and to devise tests of the theory.

2. The Need for Social Intelligence

2.1 Social Intelligence in the Primate Brain

Social interactions in primates are more complex than those in other mammals. Some examples:

- Kin recognition : Dasser (1987) has shown that monkeys recognise kin relationships amongst their peers.

- Redirected Aggression : (Judge 1982; Smuts 1985) After a fight between two monkeys, relatives of one are likely to threaten relatives of the other, showing again that monkeys recognise kin relations and use them in social exchanges.

- Protective Threat : (Kummer 1967) Female baboons will stay close to a dominant male for protection, and will use elaborate tactics to try to separate some other female from this protection.

Other examples are described in section 4, where they are compared with the theory. These examples show that primates have a detailed knowledge of others in their group, of their kin, status and alliance relations, of their current state and activities, and of the cause-effect regularities of their society; that they combine all this knowledge in flexible ways to achieve diverse goals, such as:

- Attachment to a parent

- Feeding

- Reproduction

- Avoidance of predators

- Maintaining status in the group

- Caregiving to offspring

Each one of these goals involves complex coordinated patterns of behaviour, and can be studied as a behavioral system (Hinde 1982). At any one time, an animal is involved in typically one, or at most two or three behavioural systems. In higher animals, a behavioral system involves not just stereotyped reflexes, but also goal-directed behaviour.

To achieve the goals of any behavioral system, complex locomotor problems, problems of navigation, and social problems may need to be solved. For instance, in order to feed, a primate might have to navigate to a food source, negotiate social obstacles in the form of dominant peers, and then climb a tree to pick fruit.

We assume that there are common 'modules' in the brain to help solve these problems across many behavioral systems (Fodor 1983). In particular, to solve immediate problems of locomotion, there is an internal representation, or mental model, of Local Space and Motion - abbreviated as the LSM model - which is closely linked to the visual system.

We then postulate a Social Intelligence Module (SIM) which is used to achieve social goals, resulting from any behavioral system (e.g. reproductive, feeding, attachment, ...). Jackendoff (1992) has proposed a similar 'faculty of social cognition'. Since social situations can depend on sense data of any modality (vision, smell, hearing, ...), the SIM must receive inputs from all sensory areas of the brain, many of them via the LSM model.

This paper presents a formalism and a theory to analyse the workings of the SIM - in particular, to analyse the learning problem of how social knowledge is acquired. The modularity assumption helps to keep this social learning problem within tractable bounds, by assuming that certain hard problems of learning are already solved by other modules of the brain.

For instance, to learn directly from complex, multi-dimensional sets of input stimuli (such as visual data) there are problems of individuation (deciding which features in the visual field relate to some individual entity or part-entity in the environment), and categorisation (deciding which subspaces of the input space form significant clusters, and which are categorically distinct). Categorisation may involve hierarchically-structured taxonomies. Learning in visual, spatial and olfactory domains depends crucially on solving such problems.

The problems of individuation and categorisation occur in many domains of cognition; they arise for (and are solved by) many non-primate species, and so in evolutionary terms were probably quite well solved (within visual, olfactory, locomotor and other modules of the brain) well before the period around 50 million years ago, when primate social life started to become complex. I therefore assume that feature individuation and categorisation are solved by other modules in the brain, which deliver categorised, individuated symbols to the SIM. Its role is to learn and use social knowledge in the newly-complex domain of kin, alliances, etc.

This assumption is doubtless an approximation, but is a necessary one in order to proceed to a first understanding of the SIM. As we shall see, the social domain has enough complexity of its own, without mixing in those other challenges; maybe a later theory will tackle the interactions - how the SIM itself may contribute to individuation, categorisation, and so on.

2.2 The Structure of the Social Domain

A good strategy in many domains of cognition seems to be to form internal representations of situations in the domain; running an internal simulation of external reality is a low-cost way to check the consequences of possible actions, before doing them for real (for some relevant considerations, see Vera & Simon 1993 and the responses to their article, and Worden 1995c).

To apply the idea of internal representation to the social domain, we first list some important properties of social situations; the theory will use internal representations which match these properties. I shall use examples from a hypothetical troop of monkeys with Roman names; and will contrast the social domain with the spatial/physical domain represented in the LSM. The social domain is:

(S1) A Structured Domain: A social situation is not just an unstructured set of components (such as Romulus, Remus, Portia, and threatening); it is important that Romulus is threatening Portia (rather than Remus threatening Romulus). The structure and interrelations between the components are crucial.

(S2) A Systematic Domain : If it is possible for Romulus to threaten Remus, then it is equally possible for Remus to threaten Romulus. The set of possible social situations is a systematic set, which we can enumerate systematically (Fodor 1987; Fodor & Pylyshyn 1988); so is the set of possible causal relations between situations.

(S3) A Productive Domain: The set of possible situations is very large. If there are many individuals in a monkey's group, then any subset of them can be involved in the current situation; they can be in many different binary relationships (grooming, fighting, mating,...) and each one may have many attributes (large, male, hungry, angry,....). This makes a combinatorially large set of possible situations; and the set of possible causal relations (situation A causes situation B) is even larger.

(S4) A Domain of Discrete Values: A monkey's social milieu involves discrete, identified individuals, who tend to be in discrete, all-or-nothing relations to one another (two monkeys either are siblings, or they are not); and their behaviour tends to be discrete, as defined by their on/off behavioral systems. A monkey is feeding, or not; is in oestrus, or not; and so on. Many of the key variables describing the social situation are discrete variables, each with a few possible discrete values. (The categorisation to find these discrete values is done outside the SIM.)

This is a key difference between the social and spatial/physical domains. Physical situations also are structured, systematic and productive; but they are described by continuous variables such as sizes, distances and velocities.

(S5) Causal Relations Hold Over Long Intervals : The interval between social cause and effect may extend over minutes, hours or days. Remus, being intelligent, can remember for long periods and may bide his time. This is a second major difference between social and physical domains; in the domain of local physical movement, effect generally follows cause within a fraction of a second.

(S6) Generalisations Across Individuals are Important : Many causal regularities, such as "When X makes a distress call, X's mother will react" (Cheney & Seyfarth 1990) are generalisations across individuals; X may denote any juvenile in the troop. These generalisations are very prevalent and important in primate social life.

(S7) There is Chaining of Cause and Effect : If A causes B, and B causes C, then effectively A causes C. This can be used both for anticipation of outcomes and for planning one's own actions.

2.3 Cognitive Models of Social Intelligence

We next compare these seven properties (S1-S7) with four possible classes of cognitive model, to see how well they match:

A. Conditioning Models such as the Rescorla-Wagner (1972) model do not capture the structured, systematic and productive character of social situations (S1-S3), because they represent each causal relation by a single local coupling strength; there is no representation of the structure of the relation, or systematic enumeration of possible relations. They can represent discrete values (S4), causal relations over long intervals (S5) and chaining of cause and effect (S7), but have no way of discovering or representing the generalisations across individuals (S6) which are important in social cognition.

B. Neural Net models (Rumelhart 1991; Denker et al 1987) do not capture the structured, systematic and productive character of the social domain (S1-S3) (Fodor & Pylyshyn 1988). While they can generalise from examples, they have no special sensitivity to the generalisations across individuals which are important in the social domain (S6); most neural nets would not form such generalisations without extensive and exhaustive training data (e.g. thousands of examples) which is not available to the average primate in its lifetime.

C. Mental models (e.g. analogue representations of local space and motion such as the LSM: Johnson-Laird 1983) are probably used by higher animals to predict the movements of objects around them and to plan their own movements - for instance in hunting. To the extent that this spatial/physical domain resembles the social domain (as in properties S1 - S3 and S7) these mental models are suited to the social domain. However, they are not sensitive to many of the key variables of the social domain (e.g. kin and status relations) (S4) or generalisations across individuals (S6), and do not model causal relations which hold over long time intervals (S5). Also, a detailed, continuous space-time model of a situation would be overkill to represent a few simple discrete social facts.

D. Symbolic Processing (Charniak & McDermott 1988) has the structured, systematic and productive character needed for the social domain (S1-S3). It is also well suited to handle the discrete values involved in social situations (S4), the generalisations across individuals (S6) and the chaining of cause and effect (S7). It has no intrinsic bias against representing causal relations which hold over long time intervals (S5).

The match between features of the social domain and these styles of computational model is summarised in table 1.

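                                Assoc.    Neural   Mental   Symb.
                                Condit.   Nets     Models   Proc.
S1  Structured                  -         -        y        y
S2  Systematic                  -         -        y        y
S3  Productive                  -         -        y        y
S4  Discrete Values             y         y        -        y
S5  Long Time Intervals         y         y        -        y
S6  Generalise across indivs    -         -        -        y
S7  Chaining of Cause/Effect    y         y        y        y

(y = the style of model matches the property; - = it does not)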

Table 1: The match between the social domain and four styles of cognitive model

Symbolic processing techniques, as developed in Artificial Intelligence to handle problems such as planning, language and logic (Charniak & McDermott 1988), are well suited to the social domain. Furthermore, there are well-developed theories of symbolic learning. The model of social cognition which we shall describe is largely symbolic, but is not simply a symbol processing model; it takes important features from the other major styles of computation. Like mental models, it uses internal representations of the external situation; and like conditioning models, it can learn regularities from very few training examples, using a statistical criterion of sufficient evidence.

3 A Theory of Social Intelligence

3.1 Structure and Meaning of Scripts

I shall describe the theory at Marr's (1982) algorithmic level - an abstract description of information structures and operations on them - not going to the implementation level to consider possible neural realisations (that is probably the level at which neural nets are relevant, as components of the SIM).

The SIM uses sense data of all modalities, and is concerned with discrete-valued information (S4), which can be encoded concisely. We assume that the sensory systems of the brain and the LSM model send concise, discrete information to the SIM; for instance, the visual cortex and LSM reduce the great volume of information from the eyes to a much smaller volume of output information, containing, for instance, discrete tokens whose meaning is essentially "leopard, over there" or "Remus, angry". Similarly for the auditory cortex, and other sensory modalities. Categorisation and individuation problems are solved in these other brain modules, in this approximation.

In like manner, the outputs of the SIM consist of concisely encoded command symbols such as "attack Romulus" or "run away" or "submit"; the conversion of these high-level commands into detailed motor sequences, changes in hormone levels and so on, is done by other brain subsystems, acting on concise commands from the SIM.

We look for the simplest internal representation of social situations which captures their important properties - the properties (S1) - (S7) of section 2. A script is a tree-like information structure designed to capture these properties. Scripts are derived from the scripts introduced by Schank & Abelson (1977) and are notationally similar to them (and to many other AI knowledge representations). They differ from Schank's scripts in having a precise mathematical theory for their learning and use, which can be used to show that these scripts are an optimal solution to the problem of social cognition - giving the best possible fitness under defined conditions. There is not space here to present the mathematical theory of scripts, or the proof of their optimality; these are the subject of a later paper. Here we use examples to illustrate the key properties of scripts, and to show how they are used for social learning and intelligence. Any sequence of primate social events can be represented as a script, such as that in figure 1.

Figure 1: A script F1 representing a simple sequence of social events. Node types are denoted by sr: script node; se: scene node; and en: entity node.

This script shows a sequence of two scenes. In the first scene, the monkey which 'owns' this script (who is denoted by the identity 'self') bites another monkey, Portia. In the second scene, as he is eating a nut, Portia bites him back. The whole script is denoted by a symbol F1 which will be used later.

The script is constructed of nodes (circles in the diagram) connected in a tree-like structure. The tree is rooted at the top script node, to which are connected several scene nodes - each one denoting the events happening at a place and time. The arrow between the scene nodes indicates that one scene precedes the other.

Below each scene node are entity nodes, denoting animals (peers of the script owner) or things. Each node has some slots, each with a value denoting some property of the node. These are shown as slot:value pairs written next to the node. A slot typically has a small number of allowed discrete values (eg gender can be 'male' or 'female'). The slot 'id:' denotes the identity of an individual.

Further nodes are used to denote binary relationships between individuals and other entities - relationships such as grooming, eating, mother-of, and so on. The slot 'rel:' describes which relationship is involved; it too has a few discrete allowed values.

Using suitable slots and values, scripts can describe social situations and sequences of some complexity; there is no limit to the size of script trees. Many important facets of primate social behaviour can be described using simple scripts, such as the examples in this paper.

Scripts embody by design several of the properties (S1) - (S7) of social situations. They are a structured representation (S1) using tree structures and linking information (in slots) with individuals (nodes). They are systematic (S2), in that there is a systematic set of possible tree structures; and productive (S3) in that the number of possible script trees grows exponentially with their size. Finally, as slots have discrete values, scripts are a discrete-valued representation (S4). It is hard to envisage any more concise information structure which could capture these important properties of the social domain.
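As an illustration only, the tree structure described above can be encoded in a few lines of Python. This is a minimal sketch, not the paper's implementation: the class name Node, the helper functions, the extra node kind "rel" for relationship nodes, and the exact layout chosen for the script of figure 1 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a script tree, carrying slot:value pairs."""
    kind: str                                   # "sr" script, "se" scene, "en" entity, "rel" relationship (assumed)
    slots: dict = field(default_factory=dict)   # e.g. {"id": "Portia"} or {"rel": "bites"}
    children: list = field(default_factory=list)


def entity(ident, **slots):
    """Entity node with an 'id:' slot and optional further slots."""
    return Node("en", {"id": ident, **slots})


def relation(name, agent, patient):
    """Relationship node linking two entity nodes (hypothetical layout)."""
    return Node("rel", {"rel": name}, [agent, patient])


def scene(*relations):
    return Node("se", {}, list(relations))


def script(*scenes):
    """Script node; the order of its children encodes which scene precedes which."""
    return Node("sr", {}, list(scenes))


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)


# The sequence of figure 1 as described in the text:
# scene 1: self bites Portia; scene 2: self eats a nut and Portia bites self.
F1 = script(
    scene(relation("bites", entity("self"), entity("Portia"))),
    scene(
        relation("eats", entity("self"), entity("nut")),
        relation("bites", entity("Portia"), entity("self")),
    ),
)
```

Each slot holds one of a few discrete values, so the whole structure stays small; a richer scene simply adds more nodes or more slots to existing nodes.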

3.2 Factual Scripts and Rule Scripts

In the theory, each primate continually forms script representations of the social events which he or she observes. These are called factual scripts, and form a sort of historic record of the primate's life (or recent past). The purpose of having this representation is to predict likely social outcomes before they happen, and take appropriate actions. To predict outcomes, you need to know the causal relations by which the present influences the future. We need a flexible and expressive way to represent both general and local social causal laws. Scripts also provide this representation, in the form of rule scripts.

Suppose that the same monkey as in figure 1 also observes the sequence described by the script F2 of figure 2. Again he bites another monkey, and again he is bitten back.

Figure 2: A script F2 representing another biting incident, involving the same monkey 'self'

There seems to be an underlying regularity here, of the form "If you bite someone, he or she will bite you back." This regularity is represented by the rule script R in figure 3.

Figure 3 : A rule script R which underlies the examples of figures 1 and 2

This rule script R is interpreted like the factual scripts F1 and F2, but with the following extensions:

- Every rule script has one or more cause scenes and an effect scene; it says that if the cause scenes occur, the effect scene is likely to follow, with a probability defined in the rule.

- The effect scene may follow some time after the cause scenes.

- A generalisation across individuals is expressed by using a wild card identity (the slot id:?X) on two nodes of the script. On its first occurrence, the slot 'id:?X' effectively means 'any individual'; on its other occurrences, it means 'the same individual'. Wild cards are like variables in algebra, or in programming languages such as Prolog (Clocksin & Mellish 1979).

Rule scripts embody two further important properties of the social domain; they allow us to express causal relations which act over long time intervals (S5) and generalisations across individuals (S6).

3.3 Social Planning and Prediction - Applying Rule Scripts

Suppose that a monkey has a number of rule scripts R, S, T ..., similar in form to the rule script R of figure 3, each describing some causal regularity of monkey social life. These can be used in several ways to guide his social actions:

(1) Prediction : Suppose that the factual script F, which describes the current situation, matches with the cause scenes of a rule script R. This means that the rule R is applicable to the current situation, and the effect scene of rule R predicts what will ensue; the monkey may then take appropriate action to anticipate what will ensue.

(2) Forward Planning : Suppose a monkey is considering some action, after which the current situation will be F'. Again, if F' matches the cause scene of some rule R, the effect scene of R predicts what will ensue from his action, and may indicate that the action should or should not be taken.

(3) Goal-directed planning : Suppose a monkey has a social goal which can be described by a script G. Now if G matches the effect scene of some rule R, the cause scenes of R may indicate what the monkey needs to do to reach the goal - to bring about the required effect.

Clearly, then, having a good set of rule scripts can be a major asset in predicting and exploiting social situations. I shall illustrate just one of these cases, the case (2) of forward planning.

Suppose that the same monkey as in the previous examples is considering biting yet a third monkey. His intention to bite is described by the script F' of figure 4; but suppose he also has the rule script R of figure 3. He may use this script to anticipate the consequences of his intention F'. By the test of script inclusion, he may realise that the rule script R matches the script F' which would arise if he carried out this intention to bite; he can then unify the rule script R with his intention script F' to find out the likely consequence.

Figure 4: A script F' describing an intention, to bite someone else.

Unification is a process of matching two scripts, node by node, to get the maximum possible overlap and including all the information from both scripts in the result; it cannot be done if the two scripts have conflicting information. It is much like unification in Prolog (Clocksin & Mellish 1979). The result of unifying R with F' is written as R U F', and is shown in figure 5.

Figure 5: The result F' U R of unifying the scripts in figures (3) and (4), to calculate the consequences of the rule R in situation F'.

Unifying with the rule script does not alter any of the information in F', but adds to it the information implicit in the rule R - drawing out the consequence that Cassius is likely to bite back. In this way, the monkey may anticipate the consequences of his actions, and save himself injury.

Prediction by script unification can be taken further, as the 'effect' scene of one rule may match with the 'cause' scene of another rule; then the second rule script can also be unified to predict a further consequence. Similarly the backward chaining of rules for goal-directed planning (from a desired goal to the required actions) can be chained through several steps if necessary. Thus the rule script mechanism embodies the chaining of cause and effect (S7) in social encounters.
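The unification step and its chaining can be sketched in Python under strong simplifying assumptions. Scripts are flattened here to lists of (relation, agent, patient) facts, so the scene ordering of the tree is dropped. The matching is one-way pattern matching of a rule's cause facts against a situation, which is only a special case of unification. The function names, the dictionary layout of a rule, and the second rule S ("if someone bites you, you flee from them") are hypothetical and introduced purely for illustration.

```python
def is_var(term):
    """Wild card identities are written with a leading '?', as in id:?X."""
    return term.startswith("?")


def match_fact(pattern, fact, bindings):
    """Match one pattern against one fact; return extended bindings, or None on conflict."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # A repeated wild card must denote the same individual.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_all(patterns, facts, bindings):
    """Find bindings under which every pattern matches some fact (simple backtracking)."""
    if not patterns:
        return bindings
    for fact in facts:
        extended = match_fact(patterns[0], fact, bindings)
        if extended is not None:
            result = match_all(patterns[1:], facts, extended)
            if result is not None:
                return result
    return None


def unify(rule, situation):
    """Return the situation plus the rule's predicted effects, or None if the rule does not apply."""
    bindings = match_all(rule["cause"], situation, {})
    if bindings is None:
        return None
    effects = [tuple(bindings.get(t, t) for t in p) for p in rule["effect"]]
    return list(situation) + [e for e in effects if e not in situation]


def chain(rules, situation):
    """Forward chaining: keep applying rules until no rule adds a new fact."""
    facts = list(situation)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = unify(rule, facts)
            if result is not None and len(result) > len(facts):
                facts, changed = result, True
    return facts


# Rule script R of figure 3: if you bite someone, he or she will bite you back.
R = {"cause": [("bites", "self", "?X")], "effect": [("bites", "?X", "self")]}
# Hypothetical second rule, used only to show chaining.
S = {"cause": [("bites", "?Y", "self")], "effect": [("flees", "self", "?Y")]}

intention = [("bites", "self", "Cassius")]   # the intention script F' of figure 4
predicted = unify(R, intention)              # plays the role of R U F'
chained = chain([R, S], intention)
```

The intention facts are carried through unchanged and the rule only adds to them, mirroring the statement that unification does not alter any of the information in the intention script.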

Script unification is similar to the firing of a production rule, as in many AI systems, and as used by Byrne (1993) in his formal notation for primate social intelligence. Scripts can express the same information as these production rules. They are also similar to the scripts introduced by Schank and Abelson (1977) to describe children's social knowledge. In AI terms, therefore, the use of scripts for planning and prediction is not new (apart from its application to the social domain).

3.4 Learning Rule Scripts

Having a notation to describe primates' social knowledge, and a mechanism for them to apply that knowledge, does not yet give us a predictive theory of primate social behaviour. We might still endow a primate with an arbitrarily powerful set of rule scripts, giving it great (and unrealistic) powers of social anticipation. The theory is not predictive until we include a theory of social learning - so we can predict, from a primate's previous history, what particular set of rule scripts it is likely to know.

Some rule scripts may be an innate part of primates' cognitive makeup. Innate scripts cannot be assumed without limit; an arbitrarily powerful endowment of innate scripts would make the theory non-predictive. I shall return to the issue of innate rule scripts in section 4.6; for the moment we shall assume that there are very few innate scripts, and that the majority of useful rule scripts are acquired by learning.

Making a good cognitive model of script learning is harder than modelling the use of scripts. It is an example of a class of problems which have been extensively studied in AI and machine learning - the class of concept learning problems, where some complex concept, or structure (such as a production rule, or rule script) must be induced from examples (Michalski 1986). The theory described here embodies a concept learning procedure which, under well-defined but fairly broad conditions, is optimal for the primate social domain. This is the form of learning which gives best possible fitness, and which we would therefore expect to evolve under the pressure of primate social competition. It is compared with other computational models of concept learning in section 6.2.

The problem of learning rule scripts consists of two sub-problems:

(1) Finding candidate rule scripts : The space of possible rule scripts is a very large one - the number of allowed rule scripts, even including just the simpler structures, may run into many billions. Some means is required to find a few good candidate rules to investigate, out of this vast space of possibilities.

(2) Knowing whether to 'believe' a candidate rule script : In this regard, there are two possible penalties from poor performance: (a) the penalty of not believing a true rule script (and therefore failing to apply it for social planning and prediction), and (b) the penalty of believing some untrue rule script (which is not a true causal regularity of your social milieu, but which appears to be true because of fluke events). Both these penalties lead to decreased fitness, and the learning mechanism needs to minimise the combined penalty of (a) and (b).

To solve both these problems, the concept of the information content of a script is important. For any script S, its information content I(S) can be approximately calculated as a sum of the information content on each node, which in turn is a sum of the information content from each slot on the node; for instance a slot 'gender: male' contributes one bit to this sum. From inspection of examples, typical primate rule scripts appear to have an information content in the range 20 - 100 bits.
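The sum over nodes and slots can be made concrete with a small Python sketch. It assumes that each slot's values are equally likely, so that a slot with k allowed values carries log2(k) bits; this agrees with the one bit quoted for 'gender: male'. The numbers of allowed values for 'id' and 'rel' below are hypothetical, and the script is given as a flat list of per-node slot dictionaries rather than as a tree.

```python
import math

# Number of allowed values per slot. Only the two-valued gender slot comes from the text;
# the sizes for "id" (a troop of 32) and "rel" (8 relationships) are assumptions.
ALLOWED_VALUES = {"gender": 2, "id": 32, "rel": 8}


def slot_bits(slot_name):
    """Bits carried by one filled slot, assuming equally likely values."""
    return math.log2(ALLOWED_VALUES[slot_name])


def information_content(nodes):
    """I(S): the sum over nodes of the sum over each node's slots."""
    return sum(slot_bits(name) for slots in nodes for name in slots)


# "self bites Portia", with Portia's gender recorded as well.
portia_bitten = [
    {"id": "self"},
    {"rel": "bites"},
    {"id": "Portia", "gender": "female"},
]
total_bits = information_content(portia_bitten)
```

Under these assumed slot sizes a single relationship between two identified individuals already costs over ten bits, so a two-scene script with a handful of slots plausibly falls in the 20 - 100 bit range mentioned above.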

If there is some rule script R which underlies a factual script F1, then all the information in R is also contained in F1, but there may also be extra information in F1 which is not in R; in this case we say that F1 includes R, written as F1 incl R. (Script inclusion is the inverse of subsumption in logic programming.) If the same rule script also underlies another factual script F2, then similarly F2 incl R. Given only the examples F1 and F2, but not knowing the rule R, what is the most likely form of the rule? Their script intersection, written as X = F1 int F2, is defined as the script with the largest possible information content which obeys both F1 incl X and F2 incl X; thus it is a good candidate for the rule R.

There is a simple procedure to calculate the intersection of any two or more scripts. This involves matching the scripts together, node by node, to maximise the overlap of information, and retaining only the slots and nodes which match, keeping only structure which is common to the two scripts. For instance, the rule script R of figure 3 is just the script intersection of the factual scripts F1 and F2 of figures 1 and 2. Script intersection automatically discovers the generalisations across individuals (creating wild card identities in R) which are an important property of the social domain (S6).
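A simplified version of this procedure is sketched below in Python. It is not the full node-by-node maximal matching: scripts are lists of scenes, each scene a list of (relation, agent, patient) facts; scenes are assumed to be already aligned in order; and facts are paired greedily by relation name. The function names are hypothetical, and so is the name "Brutus" for the second monkey of figure 2, which the text does not give.

```python
def generalise(a, b, wildcards):
    """Keep a value shared by both scripts; replace a differing pair by a consistent wild card."""
    if a == b:
        return a
    # The same pair of individuals always maps to the same wild card.
    return wildcards.setdefault((a, b), f"?X{len(wildcards)}")


def intersect_scene(scene_a, scene_b, wildcards):
    """Retain only facts whose relation occurs in both scenes, generalising their identities."""
    common, unused = [], list(scene_b)
    for fact_a in scene_a:
        for fact_b in unused:
            if fact_a[0] == fact_b[0]:
                args = tuple(generalise(x, y, wildcards)
                             for x, y in zip(fact_a[1:], fact_b[1:]))
                common.append((fact_a[0],) + args)
                unused.remove(fact_b)
                break
    return common


def intersect(script_a, script_b):
    """Scene-by-scene intersection of two factual scripts with aligned scenes."""
    wildcards = {}
    return [intersect_scene(sa, sb, wildcards) for sa, sb in zip(script_a, script_b)]


F1 = [
    [("bites", "self", "Portia")],
    [("eats", "self", "nut"), ("bites", "Portia", "self")],
]
F2 = [
    [("bites", "self", "Brutus")],
    [("bites", "Brutus", "self")],
]
R = intersect(F1, F2)
```

The nut-eating fact drops out because it has no counterpart in the second script, and the two differing identities collapse into a single wild card shared by both scenes, which is how the generalisation across individuals appears in the result.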

Suppose that a primate has a set of N factual scripts F1, F2, ..., FN, recording his recent social history. Form all the script intersections (Fi int Fj) between pairs of factual scripts. If two scripts Fi and Fj do not arise from some common underlying regularity R, then any similarities between them are mere coincidence, so the information content of their intersection Fi int Fj will be very small. If, however, Fi and Fj arise from the action of a rule script R, their intersection obeys (Fi int Fj) incl R, and must therefore have at least the information content of R; in fact (Fi int Fj) will be a good approximation to R, having only a few extra bits of information from other, coincidental, similarities between Fi and Fj. So taking pairwise intersections of factual scripts, and keeping only those results whose information content is above some threshold, is an effective and efficient way to find candidate rule scripts, giving a practical solution to sub-problem (1). Any rule whose effects have arisen more than once will be found in this way.
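The search for candidate rules can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: each factual script is flattened to a set of feature strings that are already generalised over individuals, every feature is assumed to carry BITS_PER_FEATURE bits, and THRESHOLD_BITS is an arbitrary cut-off chosen at the lower end of the 20 to 100 bit range quoted for rule scripts.

```python
from itertools import combinations

BITS_PER_FEATURE = 5   # assumed, for illustration only
THRESHOLD_BITS = 20    # assumed cut-off for a candidate rule

def info(script):
    """Information content of a flattened script."""
    return BITS_PER_FEATURE * len(script)

def candidate_rules(factual_scripts, threshold=THRESHOLD_BITS):
    """Intersect every pair of factual scripts and keep the intersections
    whose information content reaches the threshold."""
    candidates = set()
    for fi, fj in combinations(factual_scripts, 2):
        x = fi & fj  # set intersection stands in for script intersection
        if info(x) >= threshold:
            candidates.add(x)
    return candidates

# Invented social history: F1 and F2 share a four-feature regularity,
# F3 overlaps with each of them only by coincidence.
F1 = frozenset({"A bites B", "B screams", "B's mother nearby",
                "mother chases A", "raining"})
F2 = frozenset({"A bites B", "B screams", "B's mother nearby",
                "mother chases A", "near river"})
F3 = frozenset({"C grooms D", "raining", "near river"})

for rule in candidate_rules([F1, F2, F3]):
    print(sorted(rule))
```

The coincidental overlaps (a single shared feature of 5 bits) fall below the threshold and are discarded, while the shared four-feature pattern survives as a candidate.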

However, even if some candidate rule script seems to be indicated by two examples, it might have arisen just from spurious coincidences between those two incidents. A primate which accumulated many spurious rules, and acted as if they were true, would be at a disadvantage. How many examples are needed to 'believe' a rule script - without falling into the opposite trap of being an over-cautious slow learner?

There is a Bayesian probabilistic criterion for learning, which minimises the combined penalty of (a) being a slow learner of true rules, and (b) believing spurious rules. Since there are many millions of possible rule scripts, and only a finite number of them are actually true, the prior probability for any rule script R to be true is very small; we model this small probability approximately by a form P(R) = C 2^(-λ I(R)), where λ is of order 2 or 3, thus penalising complex rule scripts with large I(R). Then if a set of factual scripts {F} appears to indicate some rule script R, we calculate the probability that R is true in the light of this evidence, in the usual Bayesian manner - comparing P(R) P({F}|R) with P(not R) P({F}|not R). In this way we can calculate the average expected penalties from (a) failing to believe a true R and (b) believing a spurious R, and minimise the sum of these penalties.
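The comparison described above can be written out as posterior odds. The cost symbols c_a and c_b are introduced here only for illustration: c_a stands for the penalty of failing to believe a true rule and c_b for the penalty of believing a spurious one.

```latex
\[
\frac{P(R \mid \{F\})}{P(\neg R \mid \{F\})}
  = \frac{P(R)\,P(\{F\} \mid R)}{P(\neg R)\,P(\{F\} \mid \neg R)}
\]
% Expected penalty of believing R:      c_b \, P(\neg R \mid \{F\})
% Expected penalty of not believing R:  c_a \, P(R \mid \{F\})
\[
\text{believe } R
\iff c_a\,P(R \mid \{F\}) > c_b\,P(\neg R \mid \{F\})
\iff \frac{P(R \mid \{F\})}{P(\neg R \mid \{F\})} > \frac{c_b}{c_a}
\]
```

With equal penalties the criterion reduces to believing R as soon as it is more probable than not.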

The result of this Bayesian analysis is that most rule scripts can be believed as soon as they have occurred in a rather small number - typically fewer than half a dozen - of examples in the set of factual scripts. The learning procedure is very fast, being able to learn a rule script from a few examples (any faster learning is not useful, because it would incur a greater penalty of learning spurious scripts). This fast learning contrasts with that given by neural nets and other reinforcement learning techniques, which typically require thousands of training examples to learn a regularity.
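A rough numerical sketch shows how a small number of examples can arise from such a criterion. Everything in it is an assumption made for illustration rather than the paper's calculation: the prior is taken to fall off exponentially with information content, P(R) = C * 2^(-lam * I(R)); a true rule is taken to show itself in every example; a false rule is taken to be matched by each example after the first purely by chance, with probability 2^(-I(R)); and the rule is believed once the posterior odds exceed odds_threshold.

```python
def posterior_odds(n_examples, info_bits, lam=2.5, c=1.0):
    """Posterior odds that a rule of the given information content is true
    after n_examples matching factual scripts (illustrative model only)."""
    prior = c * 2.0 ** (-lam * info_bits)
    # Assumed likelihood ratio: the first example fixes the pattern; each
    # further example matches it by chance with probability 2^-I if R is false.
    likelihood_ratio = 2.0 ** (info_bits * (n_examples - 1))
    return prior / (1.0 - prior) * likelihood_ratio

def examples_needed(info_bits, lam=2.5, c=1.0, odds_threshold=1.0):
    """Smallest number of examples at which the posterior odds exceed the
    threshold."""
    n = 1
    while posterior_odds(n, info_bits, lam, c) <= odds_threshold:
        n += 1
    return n

for bits in (20, 50, 100):
    print(bits, examples_needed(bits))
```

Under these assumptions the required number of examples is set mainly by lam rather than by the size of the rule, and stays at four or five for lam between 2 and 4. A single example leaves the odds equal to the prior odds, so it is never sufficient for a rule of appreciable information content.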

Note that the prior probability function favours simpler rule scripts, giving animals a kind of Occam's Razor-like tendency to believe the simplest set of rule scripts which can account for their experience; any extra rule is only believed when the evidence for it is statistically significant. At the same time, however, script intersection finds the most complex (information-rich) possible rule underlying two or more factual scripts; this enables animals to learn complex rules if they are true, and not to over-generalise.

This subtle tradeoff in the learning procedure allows an animal to learn both general rules and more specific exception rules at the same time, if both are true. For instance, it can learn the general 'retaliation' rule of figure 3, and a more specific rule that some individual (eg Claudius) tends not to retaliate. More examples are required to learn both a general rule and an exception, and the theory predicts how many examples are required. In this way primates can rapidly learn the important regularities of their social milieu.

3.5 A Consequence of the Learning Theory

While this learning mechanism is very efficient - learning most rule scripts from just a few examples - it has one simple consequence which may be important for experimental and observational studies. It implies that primates cannot learn any complex rule script from just one example. This result follows in the theory for two reasons:

1. The prior probability of a complex rule script being true is so small, that just one example cannot 'overcome' this small probability; it is more likely that the one case arose just by chance, so the rule should not be believed.

2. With only one example, there is no way to prune away irrelevant information about things which just happened to be going on in the example script, separating it from information which is genuinely involved in the causal relation; so the resulting rule is likely to be too specific to be useful. (With two or more examples, script intersection is a very efficient way to prune out irrelevant information).

These reasons do not depend on the precise details of this theory, and may also hold in many other theories of social learning. The constraint only applies to complex scripts, with fairly large information content; simpler scripts might have such a large prior probability that they can be learnt from one example, just like taste-nausea conditioning in rats (Dickinson 1980) or may even be innate. However, this constraint against one-shot learning does apply to the kinds of complex scripts which would be needed, for instance, for tactical deception (Byrne & Whiten 1990, 1992).

3.6 Scripts in the Architecture of the Brain

In this theory, therefore, the primate Social Intelligence Module (SIM) continually receives pre-categorised, symbolic inputs from other cognitive subsystems such as the visual system. It arranges these inputs into factual scripts which form a record of the primate's social life. The factual scripts are continually input to the rule learning procedure (in 3.4 above) to discover new rule scripts, as soon as the evidence for each one becomes significant. At any moment the whole stock of rule scripts (learnt so far) can be used for prediction and planning of social actions, as described in section 3.3. This results in the SIM sending outputs to other motor subsystems, to execute the actions required by the SIM.

All this may take place as an automatic computation in the SIM, not necessarily linked to conscious awareness. Since our own conscious awareness is generally awareness of sense data (eg visual images, sounds of words) rather than of abstract symbolic structures like scripts, it seems likely that the SIM itself is not in conscious awareness; although it may cause activity in other brain modules, such as the LSM model, which does result in awareness.

An adult monkey may have many hundreds of rule scripts as well as the factual scripts from its experience. At any one time, typically only two or three of the rule scripts may apply. Any interference from other rule scripts would tend to lead to wrong conclusions.

This suggests that there are at least two distinct logical components to the SIM - a processing module where the script for the current situation is constructed, and a few appropriate rule scripts are unified with it to plan and predict, and a long-term memory where all rule scripts, and the historic scripts which are intersected together to form rule scripts, are stored. The long-term memory has a retrieval capability, to retrieve into the processing module just those scripts likely to be relevant to the current situation.

The script operations of unification, intersection and inclusion form a neat mathematical structure - the script algebra - which is similar to elementary set theory. A typical relation of the script algebra, true for any two scripts A and B, is that A = A ∪ (A ∩ B), where ∪ denotes unification and ∩ denotes intersection. These relations help to guarantee the self-consistency of the whole theory; for instance, if a rule script R is induced by script intersection from example scripts A, B, C, then the algebra shows that this can be done in any order, and R will not conflict with the examples which gave it.
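The analogy with set theory can be made concrete by the crudest possible model, in which a script is just a set of (slot, value) features, unification is set union, intersection is set intersection and inclusion is the superset relation. This is an assumed simplification: real unification can fail when two scripts assign conflicting values to a slot, which plain sets do not capture. The feature names are invented.

```python
def unify(a, b):
    """Unification modelled as set union (ignores conflicting slots)."""
    return a | b

def intersect(a, b):
    """Script intersection modelled as set intersection."""
    return a & b

def includes(a, b):
    """a incl b: every feature of b is also in a."""
    return a >= b

A = frozenset({("actor", "male"), ("act", "bite"), ("place", "river")})
B = frozenset({("actor", "male"), ("act", "bite"), ("weather", "rain")})
C = frozenset({("actor", "male"), ("act", "bite"), ("time", "dusk")})

# Absorption relation: A = A unify (A int B).
assert unify(A, intersect(A, B)) == A

# Inducing R from A, B, C gives the same result in any order ...
R = intersect(intersect(A, B), C)
assert R == intersect(A, intersect(B, C)) == intersect(intersect(C, A), B)

# ... and every example includes R, so R cannot conflict with them.
assert all(includes(x, R) for x in (A, B, C))
print(sorted(R))
```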

Are scripts a declarative or a procedural knowledge representation? They can be anywhere along the spectrum between the two. A script with many scenes may represent a fixed procedure to achieve some goal. The same knowledge may also be represented as several smaller scripts (each with fewer scenes) which can be unified together to reach the same goal; but the smaller scripts are more like declarative pieces of cause-effect knowledge, and can be used more flexibly than the single large script. Finally, as we shall see in the next section, a script can represent a purely declarative piece of factual knowledge.

This script theory is distinctive in linking together the operations for inference (script unification) with the operation for learning (script intersection) in a tight, self-consistent structure, to make clear predictions about what can be learnt, how fast it can be learnt, and how it is used.

4. Comparisons with Observations