R.P.Worden
rworden@dial.pipex.com
In the paper `An Optimal Yardstick for Cognition' published in Psycoloquy volume 7, there was not space to include a derivation of the main result, the requirement equation, or of a subsidiary result concerning hill-climbing to the optimum. This is an extract from another paper in draft, which contains those derivations. References are as for the Psycoloquy paper.
Learning is a facet of cognition, so to find the best possible form of learning in a biological context, we must in turn find the best possible form of cognition. Under certain fairly broad conditions, this optimal cognition (i.e. cognition which leads to best fitness) can be defined in a simple equation, the Requirement Equation. This equation is Bayesian in form, and so leads to a Bayesian form of learning; deriving the requirement equation gives us a clear notion of why, and under what conditions, Bayesian learning gives optimal fitness.
In many situations, animals use their brains to make decisions under uncertainty, and their survival depends on the outcome. Such situations have features in common:
• There is an underlying true state of affairs, but the animal may not know what it is.
• The choice of best action depends on the state of affairs; if the animal knew that, the choice would often be fairly straightforward.
• The animal has limited sense data, which depend on the state of affairs. These alter the balance of probability between possible states, but may not pick out one state of affairs uniquely.
Since all forms of cognition have evolved for some biological purpose, and that purpose involves some choice of actions, this characterisation applies in principle to any facet of cognition - deciding whether to eat something on the basis of its smell, whether to go foraging, where to forage, how to get home, and the immediate control of bodily movements. It also, as we shall see, applies to the `choice' of what generalisations can be learnt from experience.
This characterisation of cognition is formalised below so that it can be analysed mathematically. In so doing, we make certain approximations, whose appropriateness must be judged in the biological context for any possible application of the results.
An animal's cognitive system helps it deal with certain classes of external states of affairs denoted by Si. Different classes of states are used to analyse different domains of cognition; we may choose to slice reality in one of several different ways to define the states. We may choose a class of Si which defines the food situation, ignoring all other details (states 'hungry', 'replete'); or a class which defines the predator situation, the social situation, or combinations of these, depending on the problem. The index i ranges over all possible states of affairs in the class.
We consider the animal's life as a succession of discrete, independent encounters. Intelligence helps to maximise the animal's chances of surviving each encounter. During any encounter, just one of the possible states of affairs Si holds, with probability PA(Si).
The duration of encounters is chosen according to the aspect of intelligence which we are analysing. For some aspects (particularly for learning cause-effect relations) long encounters are appropriate; within one encounter, the animal may observe enough events to learn from. A state of affairs Si may describe regularities which hold over an extended interval, such as "red berries are poisonous" or "thin branches bend"; or it may describe the situation at one moment, such as "predator about" or "lake frozen over".
The animal does not know which state of affairs holds; it does not observe states of affairs directly, but observes a configuration of sense data, Dm, caused by the state of affairs. Dm denotes all sense data, of any modality, which the animal has over the duration of the encounter and which relate to the state Si. The index m ranges over all possible configurations of sense data.
There is a causal relation between a state of affairs Si and resulting sense data Dm. This relation is described by a conditional probability P(Dm|Si), to be read as "the probability of having the sense data Dm, given the state of affairs Si".
Based on its sense data, the animal chooses to take some action denoted by Ak. Depending on the state of affairs Si and the action Ak, there is an outcome denoted by On; typical outcomes are "find food", "caught by predator", or "meet mate". The index k ranges over all possible actions, and n ranges over all possible outcomes. Outcomes are results with a defined survival or reproductive significance for the animal. There is a causal link between actions, states of affairs and outcomes, described by a conditional probability P(On|Si Ak).
Each outcome has a value V(On) to the animal. Values may be measured in different currencies such as net energy intake, probability of finding a mate, and so on. Different currencies are used for different problems; all currencies should be convertible to a common scale, defined by the probability of survival to adulthood and reproduction, that is, by the contribution of the outcome to the animal's Darwinian fitness or, more generally, its inclusive fitness. The ideas of currency used here are similar to those in evolutionary game theory (Maynard Smith 1982) and foraging theory (Stevens & Krebs 1983).
The animal needs to choose an action Ak to maximise the average value V of the possible outcomes On. By maximising these values for many different situations of different types throughout its life, the animal maximises its chances of survival to maturity, and of reproduction.
The way the animal chooses an action Ak, given sense data Dm, is summarised in a decision function Fk(Dm). Fk gives a value for each action index k and sense data Dm. The animal acts as if, when having sense data Dm, it calculates all the decision functions Fk for each possible k, and chooses the action Ak with the largest Fk.
Decision functions can be used to describe any relation between sense data and actions; that is, to describe the input-output relation of any cognitive system. So they give an external, functional specification of any possible brain. Our problem is: what is the best possible set of decision functions Fk?
Figure 2 shows the relations between these concepts.
Figure 2: Flow of causation for the mathematical model of animal cognition. Time ordering is from left to right, and arrows flow from causes to effects. The probabilities P of states of affairs S, of sense data D and of outcomes O are discussed in the text. The horizontal dashed line denotes the interface between the animal and its environment.
We have a lot of freedom to choose the definition and duration of encounters, and the states of affairs S, depending on which aspect of cognition we are analysing. So we can usually ensure that assumptions (1) - (3) are justified, so that the requirement equation result (below) holds. There are some cases where we cannot do so; then the result can be used only as a starting point, possibly as a first step in a more complete analysis.
This situation is similar to the analysis of some problems of mechanics. Various powerful general results (such as conservation of energy, or momentum) hold for idealised systems which may need to be completely isolated, perfectly rigid or perfectly elastic, and so on. The real world is not like this, but we can often slice reality in ways which make it a good approximation; or if we cannot, it may still be a good guide to intuition and starting point for further analysis.
The requirement equation defines the best possible decision functions Fk(Dm) - those which give best average outcome V, and which therefore give greatest possible fitness.
The expected value from taking action Ak in the state of affairs Si is the average value of all possible outcomes, weighted by their probabilities:
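In the notation defined above, this expected value can be written as follows. This is a reconstruction from the surrounding definitions; the symbol E(S_i, A_k) is a label assumed here, and the number (2.1) is inferred from the equation's position in the sequence.

```latex
\[
E(S_i, A_k) \;=\; \sum_n P(O_n \mid S_i A_k)\, V(O_n) \tag{2.1}
\]
```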
Using the definition:
we can calculate the expected value from one state of affairs Si , which gives sense data Dm , if the animal uses decision functions Fk to choose an action:
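The two expressions referred to here can be reconstructed from the description in the text. The sketch assumes that the definition is that of the theta function, equal to 1 for the action with the largest decision function and 0 otherwise (ties are ignored); the arguments of U and the equation numbers are inferred from context.

```latex
\[
\theta_k(D_m) \;=\;
\begin{cases}
1 & \text{if } F_k(D_m) > F_j(D_m) \text{ for all } j \neq k, \\
0 & \text{otherwise}
\end{cases}
\tag{2.2}
\]
\[
U(S_i, D_m) \;=\; \sum_k \theta_k(D_m) \sum_n P(O_n \mid S_i A_k)\, V(O_n) \tag{2.3}
\]
```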
The theta function picks out one term from the sum over actions Ak - the term which has the largest Fk. So it embodies the decision rule to pick out one action by calculating the Fk and picking the largest. U is the average value resulting from that choice.
The average value for a single encounter is got by summing over all states of affairs and sense data sets, weighted by their probabilities:
where:
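A reconstruction of these two expressions, consistent with the description of Gk that follows, is sketched here. It assumes that the theta function theta_k(D_m) is 1 for the action with the largest F_k(D_m) and 0 otherwise, and that U(S_i, D_m) is the expected value for one state and one set of sense data.

```latex
\[
V_{\mathrm{avg}}
\;=\; \sum_i \sum_m P_A(S_i)\, P(D_m \mid S_i)\, U(S_i, D_m)
\;=\; \sum_m \sum_k \theta_k(D_m)\, G_k(D_m) \tag{2.4}
\]
\[
G_k(D_m) \;=\; \sum_i \sum_n P_A(S_i)\, P(D_m \mid S_i)\, P(O_n \mid S_i A_k)\, V(O_n) \tag{2.5}
\]
```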
Gk depends only on things outside the animal's control - on probabilities of states, probabilities of sense data, probabilities of outcomes and values of outcomes. Gk(Dm) is effectively the average value of doing action Ak when having sense data Dm .
In the design of a cognitive system, Gk cannot be varied, but the decision functions Fk can.
(2.4) is a sum over all Dm, where (because of the theta function) for each Dm only one Gk enters the sum. The maximum is got by setting Fk(Dm) = Gk(Dm) for every k and Dm (equation (2.6)), since this choice means that for every Dm, the theta function picks out the largest of the Gk and adds it to the sum. Any other choice of Fk can do no better, and does worse whenever it picks a smaller Gk.
Therefore the optimum decision rule is given by
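Written out in terms of the environmental quantities on which Gk is said to depend, the rule can be reconstructed as follows (a sketch from the surrounding text, not the original typesetting):

```latex
\[
F_k(D_m) \;=\; \sum_i \sum_n P_A(S_i)\, P(D_m \mid S_i)\, P(O_n \mid S_i A_k)\, V(O_n) \tag{2.7}
\]
```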
Thus equation (2.7) is the requirement for animal cognition - the optimum form of cognition in all circumstances. I shall call it the Requirement Equation.
We can multiply all the Fk by any positive common factor (one which may depend on Dm but not on k), and it will not alter the choice of action. This enables us to rewrite (2.7) in two parts:
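The two parts can be reconstructed as follows, assuming the common factor is the reciprocal of the total probability of the sense data D_m; the index j in the denominator is a dummy index over states introduced here.

```latex
\[
P(S_i \mid D_m) \;=\; \frac{P_A(S_i)\, P(D_m \mid S_i)}{\sum_j P_A(S_j)\, P(D_m \mid S_j)} \tag{2.8}
\]
\[
F_k(D_m) \;=\; \sum_i \sum_n P(S_i \mid D_m)\, P(O_n \mid S_i A_k)\, V(O_n) \tag{2.9}
\]
```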
(2.8) is recognisable as Bayes' theorem, applied to find the most likely state of affairs Si in the light of the sense data Dm; then (2.9) chooses the best possible action, in the light of the likely states Si. For many problems, finding the state by (2.8) is the hard part, and then choosing an action by (2.9) is comparatively easy.
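A small numerical illustration may help here. The sketch below is a toy predator-detection problem in Python; every number, and every name such as `expected_value`, `g` and `f_bayes`, is invented for illustration and does not come from the paper. It computes the decision functions both in the unnormalised form and in the two-step Bayesian form, and shows that they select the same action.

```python
# Toy predator-detection problem; every number below is invented for illustration.
prior = [0.1, 0.9]            # P_A(S_i): predator present, predator absent
likelihood = [                # P(D_m | S_i): rows are states, columns are sense data
    [0.8, 0.2],               # predator present: rustle, quiet
    [0.3, 0.7],               # predator absent:  rustle, quiet
]
# P(O_n | S_i A_k), indexed [state][action][outcome].
# Actions: flee, stay. Outcomes: caught, safe but unfed, fed.
outcome_prob = [
    [[0.1, 0.9, 0.0], [0.6, 0.0, 0.4]],   # predator present
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # predator absent
]
value = [-10.0, 0.0, 1.0]     # V(O_n)
ACTIONS = ["flee", "stay"]


def expected_value(i, k):
    """Average value of taking action k in state i."""
    return sum(p * v for p, v in zip(outcome_prob[i][k], value))


def g(k, m):
    """Unnormalised decision function: sum over states of prior * likelihood * value."""
    return sum(prior[i] * likelihood[i][m] * expected_value(i, k)
               for i in range(len(prior)))


def posterior(m):
    """Bayes' theorem: probability of each state given sense data m."""
    joint = [prior[i] * likelihood[i][m] for i in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]


def f_bayes(k, m):
    """Two-step form: posterior over states, then posterior-weighted value."""
    post = posterior(m)
    return sum(post[i] * expected_value(i, k) for i in range(len(prior)))


def best_action(score, m):
    """Index of the action with the largest decision function for sense data m."""
    return max(range(len(ACTIONS)), key=lambda k: score(k, m))


for m, name in enumerate(["rustle", "quiet"]):
    print(name, ACTIONS[best_action(g, m)], ACTIONS[best_action(f_bayes, m)])
```

With these invented numbers both forms choose to flee on hearing a rustle and to stay when it is quiet; the normalisation changes the size of the decision functions but not their ranking.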
There are important cases where the environmental probabilities entering the requirement equation (via the term PA(Si) of equation (2.7)) depend on the numbers and behaviour of conspecifics, which in turn are outputs of the equation. These arise particularly in competitive behaviour, such as competition for mates. In these cases the optimum solution of the Requirement Equation turns out to be the Evolutionarily Stable Strategy (ESS) (Maynard Smith, 1982). The evolution of any form of competitive behaviour, such as the Hawk v. Dove example, can be analysed by the requirement equation.
In showing that best-fitness cognition is Bayesian in form, one might think that we have just confirmed an approach to modelling cognition which many people have been using for some time. The importance of this derivation is that:
(1) It shows just what assumptions and approximations are needed (in the mathematical model of this section) to derive the result - and hence shows where the optimum result might not be appropriate.
(2) In equation (2.4) it shows what fitness results from any other (non-Bayesian) decision rule, or from a Bayesian decision rule with incorrect prior probabilities; hence it shows the form of the fitness function away from its peak. This form is important in understanding how evolution might converge towards the peak (see appendix A).
Evolution is a local hill-climbing process in the space of possible designs; might the evolution of brains get stuck on some local maximum, rather than moving towards the global maximum of the requirement equation?
Gould (1980) uses the example of the Panda's thumb to illustrate how evolution adapts whatever is at hand for the job; which may be far from optimal. Many believe that similar local maxima occur in the evolution of brains; so that the brains we see today are sub-optimal bodges. In the framework of the requirement equation, we can prove a result which strongly suggests that they are not.
Every possible design of brain computes some choice of action Ak for each possible set of sense data Dm. Therefore every brain is equivalent to some set of decision functions Fk(Dm) - the set which picks out the same actions as it does for all sense data.
Express the decision functions Fk(Dm) as sums of a set of basis functions Ekl(Dm):
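With weights h_kl, as used in the following paragraphs, the expansion can be reconstructed as below. This is a sketch; the number (3.1) is inferred from the equation's position.

```latex
\[
F_k(D_m) \;=\; \sum_l h_{kl}\, E_{kl}(D_m) \tag{3.1}
\]
```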
If we choose a large enough set of these basis functions Ekl, then we can express the decision functions Fk(Dm) of any actual brain, to a good approximation, in terms of them (just as a continuous function on an interval can be approximated by a Fourier sum of sine waves). So any possible brain is equivalent to some point in the multi-dimensional space of the weights hkl.
There is a particular set of values h0kl which gives the optimal decision functions Fk(Dm) of the requirement equation (2.7). We can then show that there are no other local maxima in the space of the weights hkl; the peak is unique. This result is proved as follows:
By assumption, when all hkl = h0kl (at the peak of fitness) the decision functions are Fk(Dm) = Gk(Dm), where the Gk(Dm) are as given by equation (2.5) or (2.7) (the requirement equation), depending on probabilities in the environment. Away from the optimum, we write hkl = h0kl + λ jkl, so that Fk(Dm, λ) = Gk(Dm) + λ times the sum over l of jkl Ekl(Dm), and consider the behaviour as λ increases from zero. We wish to show that as λ increases along any ray in the space of the h (i.e. with any set of jkl), average fitness decreases monotonically. For any λ, the animal's fitness Vavg is as given by (2.4), repeated here:
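With F_k(D_m, lambda) denoting the decision functions at distance lambda along the ray, the repeated expression can be reconstructed as follows. The notation theta_k(D_m, lambda), for the theta function evaluated on those decision functions, is assumed here.

```latex
\[
V_{\mathrm{avg}}(\lambda) \;=\; \sum_m \sum_k \theta_k(D_m, \lambda)\, G_k(D_m) \tag{3.5}
\]
```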
This is a sum over all possible sets of sense data Dm; so the animal's fitness is a weighted average of the expected payoffs Gk it has in different circumstances which give different sense data. We shall show that each term in the sum (for different Dm) decreases monotonically with λ, so the average fitness also decreases monotonically.
Consider two values of λ: λp and λq, such that 0 < λp < λq. For each λ, the animal chooses the action Ak which has largest Fk(λ) and then gets the average payoff Gk for that action; the chosen actions are Ap and Aq at λ = λp and λq respectively.
For λ = λp, Fp(λp) > Fq(λp) implies (dropping the dependence on Dm):
For λ = λq, Fq(λq) > Fp(λq) implies:
Adding (3.7) and (3.8) gives:
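The three inequalities can be reconstructed as follows. The sketch assumes that, along the ray, each decision function is F_k(lambda) = G_k + lambda times the sum over l of j_kl E_kl, and it arranges the terms so that the statements made about the left-hand side of (3.7) and the sum in (3.9) apply.

```latex
\[
\lambda_p \sum_l \left( j_{pl} E_{pl} - j_{ql} E_{ql} \right) \;>\; G_q - G_p \tag{3.7}
\]
\[
\lambda_q \sum_l \left( j_{ql} E_{ql} - j_{pl} E_{pl} \right) \;>\; G_p - G_q \tag{3.8}
\]
\[
(\lambda_q - \lambda_p) \sum_l \left( j_{ql} E_{ql} - j_{pl} E_{pl} \right) \;>\; 0 \tag{3.9}
\]
```

The right-hand sides of (3.7) and (3.8) cancel when the two are added, which leaves (3.9).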
Since (λq - λp) > 0, the sum in (3.9) must be positive. Therefore the left-hand side of (3.7) is negative; so (3.7) can only be satisfied if Gq < Gp. This proves the result, that the average fitness G for any particular sense data Dm decreases as λ increases (it falls whenever the chosen action changes, and is otherwise unchanged). The result then extends to Vavg, the sum over all possible sets of sense data (the theta function in (3.5) is never negative; it is either 0 or 1).
This shows that in the space of the weights hkl, as we move along any ray outward from the optimum point hkl = h0kl, the fitness decreases monotonically. Conversely, starting from any other point in the space, fitness increases monotonically along a straight line up to the unique maximum. There are no other local maxima in the space of weights hkl.
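The monotonic behaviour along a ray can be checked numerically. The Python sketch below is an illustration only: the payoffs G are random numbers, the ray directions are random, and it assumes the simplest possible basis, an indicator basis in which the weight h_km is itself the value of F_k at sense data D_m. The names `fitness` and `fitness_along_ray` are hypothetical.

```python
import random


def fitness(G, phi, lam):
    """Sum over sense data of the payoff G of the action chosen at distance lam."""
    n_actions, n_data = len(G), len(G[0])
    total = 0.0
    for m in range(n_data):
        # Decision functions along the ray: F_k = G_k + lam * phi_k (indicator basis).
        chosen = max(range(n_actions), key=lambda k: G[k][m] + lam * phi[k][m])
        total += G[chosen][m]
    return total


def fitness_along_ray(G, phi, lams):
    """Fitness at each of an increasing sequence of distances along one ray."""
    return [fitness(G, phi, lam) for lam in lams]


rng = random.Random(0)                      # fixed seed so the run is repeatable
n_actions, n_data = 4, 6
# Invented payoffs G_k(D_m); in the model they would come from the environment.
G = [[rng.uniform(-1.0, 1.0) for _ in range(n_data)] for _ in range(n_actions)]
lams = [0.1 * step for step in range(101)]  # distances from 0.0 to 10.0

never_increases = True
for _ in range(50):                         # 50 random ray directions
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n_data)] for _ in range(n_actions)]
    values = fitness_along_ray(G, phi, lams)
    if any(later > earlier + 1e-12 for earlier, later in zip(values, values[1:])):
        never_increases = False

print("fitness never increases along any sampled ray:", never_increases)
```

A check of this kind samples only a finite set of rays and distances, so it illustrates the result rather than proving it.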
At any moment, the brains of a species form a cluster of points in this space of weights hkl. Different individuals have different genetic makeup, and occupy different points with different fitness. Overall, the species occupies some subset of the space, of much lower dimensionality than the whole space. Natural selection causes the species to explore the sub-space defined by its genetic parameters, and (after some generations) to cluster in the region of greatest fitness.
This peak region may not be anywhere near the overall maximum fitness point h0. However, there is nothing to stop new genes arising, which make the design of the brain more complex and enable it to break out of its previous low-dimensionality design space. The result proved above guarantees that there is some direction of increasing fitness in which it can do so (e.g. along the straight line to the peak).
The physical design of bodies is design in a space of limited dimension, where there may be genuine local maxima, surrounded on all sides by regions of lower fitness. If the design for locomotion of greatest fitness is a wheel, there may be no path from legs to wheels which does not involve lower fitness than either somewhere along the path. However, the space of possible brains has very high dimensionality, and (as we have proved) has no local maxima. Therefore the brain of any species can always become a little more complex (by the evolution of extra genes) to break out of its current design space and circumvent any apparent local maxima; by a series of these breakouts, it can approach arbitrarily close to the true unique peak of fitness, given by the requirement equation.