Rhesus Monkeys (Macaca mulatta) Spontaneously Compute Addition Operations Over Large Numbers

Mathematics is a uniquely human capacity. Studies of animals and human infants reveal, however, that this capacity builds on language-independent mechanisms for quantifying small numbers (< 4) precisely and large numbers approximately. It is unclear whether animals and human infants can spontaneously tap mechanisms for quantifying large numbers to compute mathematical operations. Moreover, all available work on addition operations in non-human animals has confounded number with continuous perceptual properties (e


General methods
The logic, procedures, and apparatus in these experiments have all been employed in previous studies of rhesus monkey numerical representations (Hauser & Carey, 2003; Hauser et al., 1996). Moreover, apart from the particular operations studied here, these methods follow closely those of Hauser and Carey (2003).
All experiments were carried out with adult rhesus monkeys living on the island of Cayo Santiago, Puerto Rico (Rawlins and Kessler, 1987). We used a between-subjects design, and presented each subject with only one test trial, comprising either a possible or an impossible outcome; we expected longer looking times at the impossible events relative to both the prior familiarization trial and the possible test event.

Procedure
Before observing a test trial, we exposed each subject to two familiarization trials that included all of the stimuli, the apparatus, and actions (Fig. 1). During Familiarization 1 (F1), the presenter removed the screen from in front of the stage to reveal the number of lemons (each about 6 cm in diameter) matched to the outcome of that subject's test trial. For instance, a monkey in the possible group of Experiment 1, 3 + 1 = 4 or 8, saw four lemons, while a monkey in the impossible group saw eight. Because this trial included the number of stimuli comprised by the outcome of that subject's test trial, longer looking times during test could not reflect a preference for looking at novel displays, or a particular number of stimuli. At the end of the trial, the experimenter cleared the lemons on the stage.
During Familiarization 2 (F2) the presenter placed the same number of lemons on the stage as would be present at the beginning of the test trial prior to the addition of the occluder. For instance, in Experiment 1, the experimenter placed three lemons on the stage. During Test (T) the presenter started with an empty stage, and then placed the same number of lemons on the stage as he had during F2 (e.g. three, for Experiment 1). He then concealed the contents of the stage by lowering the occluder and, as the monkey watched, added the number of lemons to the stage that constituted the second term in the operation being tested. For example, for a 3 + 1 operation, he added one lemon and then removed the occluder to reveal either a possible or an impossible outcome.
During impossible test trials, the presenter surreptitiously added or removed lemons from the stage using a preloaded pouch attached to the back of the occluder. Before being informed of the experimental condition (possible or impossible) the recorder eliminated from analysis all trials in which the subject looked away from the display area during any portion of the addition events. Digitized video records were coded blind with respect to trial and condition (inter-coder reliabilities, 0.87-0.95). Some monkeys were excluded from the final analysis at coding due to poor video quality. Overall, we attempted to test a total of 264 monkeys. The final data set included 161 monkeys across all five experiments. No monkey participated in more than one of the five experiments presented here.
For all five experiments, we ran unpaired t-tests comparing mean looking times (s) during test (T) between groups who observed possible and impossible outcomes. We also ran paired t-tests for the possible and impossible outcome groups, comparing within subject looking times during F2 and T. All statistical tests are two-tailed with significance set at P < 0.05.
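The two comparisons described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the looking times are invented for the example, the function names are hypothetical, and the unpaired test is assumed to be a pooled-variance Student's t-test. Only the t statistic and degrees of freedom are computed; the two-tailed P value would then be read from the t distribution.

```python
from math import sqrt
from statistics import mean, stdev, variance

def unpaired_t(group_a, group_b):
    """Pooled-variance Student's t for two independent groups; returns (t, df)."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def paired_t(before, after):
    """t for within-subject differences (after - before); returns (t, df)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical looking times in seconds (NOT data from the study).
impossible_T = [4.0, 5.0, 6.0, 5.0]   # test trial, impossible-outcome group
possible_T = [2.0, 3.0, 2.0, 3.0]     # test trial, possible-outcome group
impossible_F2 = [3.0, 3.0, 4.0, 4.0]  # same impossible-group subjects during F2

print(unpaired_t(impossible_T, possible_T))   # between-group comparison at T
print(paired_t(impossible_F2, impossible_T))  # within-subject F2 vs T
```

The between-group test asks whether the impossible outcome draws longer looking than the possible one, while the paired test asks whether looking recovers from F2 to T within a group.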
Experiments 1-3: 3 + 1, 2 + 2, and 4 + 4 = 4 or 8
The first three experiments aimed to establish that rhesus (a) could discriminate between numbers that differed by a large ratio, but fell outside the small number range (1-3), and (b) could correctly discriminate between these numbers as outcomes for various addition operations. Previous work with non-human primates has not found mathematical competence for numbers greater than 3-4, except in laboratory settings involving extensive training. Although tamarins have been shown to spontaneously discriminate between large numbers of sounds, seemingly constrained by a Weber limit (Hauser et al., 2002), no studies have explored whether they can use these representations to compute mathematical operations. Moreover, in a search task with the rhesus monkeys on Cayo Santiago, subjects failed to correctly choose a box with addition outcomes of eight pieces of food over a box with four or three pieces; however, they successfully picked the box with more food when the ratios were smaller but the total amount of food per box did not exceed four (e.g. 2 vs 3 and 3 vs 4, Hauser et al., 2000). Because these experiments involved the serial addition of food items, rhesus failures in the large ratio and large number cases could be the consequence of an inability to operate over magnitude representations of large numbers with addition.
Experiment 1 presented a 3 + 1 operation, Experiment 2 a 2 + 2 operation, and Experiment 3 a 4 + 4 operation. We predicted longer looking at the outcome of eight for Experiments 1 and 2, but longer looking at four for Experiment 3. Experiment 3 therefore provides a control for the possibility that success in Experiments 1 and 2 is the consequence of a preference for looking at larger numbers relative to small numbers.
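The predictions for these three experiments follow mechanically from the operations. The short sketch below spells that out; the names are hypothetical and the code only restates the design described in the text.

```python
# Operations used in Experiments 1-3; each is followed by a revealed
# outcome of either four or eight lemons.
OPERATIONS = {1: (3, 1), 2: (2, 2), 3: (4, 4)}
OUTCOMES_SHOWN = (4, 8)

def classify_outcomes(first, second, outcomes=OUTCOMES_SHOWN):
    """Label each revealed outcome as possible or impossible for first + second."""
    correct = first + second
    return {n: ("possible" if n == correct else "impossible") for n in outcomes}

for experiment, (first, second) in OPERATIONS.items():
    labels = classify_outcomes(first, second)
    impossible = [n for n, label in labels.items() if label == "impossible"]
    print(f"Experiment {experiment}: {first} + {second}, "
          f"longer looking predicted at {impossible[0]}")
```

The impossible outcome is the larger number in Experiments 1 and 2 but the smaller number in Experiment 3, which is what lets Experiment 3 rule out a simple preference for larger displays.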
These data confirm the hypothesis that rhesus monkeys spontaneously discriminate between and operate over representations of large numbers. What these data do not reveal, however, is the format of their representations, nor whether the monkeys in these experiments represent the number of objects in the displays, or just their continuous perceptual properties. Experiments 4 and 5 provide one step toward addressing these issues.

Experiment 4: 2 + 2 = 4 or 6
Given the monkeys' success in the previous three experiments with outcomes that differed by a 1:2 ratio (4 vs 8), here we tested them on numbers differing by a 2:3 ratio. If ratios, rather than absolute numbers, constrain discrimination, then rhesus may fail at the smaller ratios. We therefore presented subjects with either a possible 2 + 2 = 4 event or an impossible 2 + 2 = 6 event.

Results and discussion
The difference in looking time (Fig. 2) for the possible 2 + 2 = 4 outcome (N = 14) and the impossible 2 + 2 = 6 outcome (N = 17) was not statistically significant (t(29) = 1.8, P = 0.91). These results stand in contrast to those in Experiment 2, where subjects succeeded on a 2 + 2 = 4 or 8 task. Thus monkeys spontaneously represent and operate over larger numbers in the form of noisy mental magnitudes. The noise is proportional to the value of the numbers represented, and therefore accounts for an ability to discriminate between numbers that differ by a large ratio, such as 1:2, but not numbers that differ by smaller ratios such as 2:3. Future experiments will explore whether the success at 1:2 extends to larger numbers such as 8 vs 16, as well as to other modalities, as appears to be the case for human infants.
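The claim that noise proportional to magnitude yields ratio-dependent discrimination can be illustrated with a simple model. The sketch below assumes each number n is represented as a Gaussian with mean n and standard deviation w * n; this Gaussian form and the Weber fraction w = 0.3 are assumptions made for illustration, not quantities estimated in this study.

```python
from math import sqrt

def discriminability(a, b, w):
    """Separation of two noisy magnitudes in units of their combined noise.

    Assumes a number n is represented as a Gaussian with mean n and
    standard deviation w * n (scalar variability). w is a hypothetical
    Weber fraction, not a value from the study.
    """
    return abs(a - b) / (w * sqrt(a ** 2 + b ** 2))

W = 0.3  # illustrative value only

for a, b in [(4, 8), (4, 6), (8, 16)]:
    print(f"{a} vs {b}: d' = {discriminability(a, b, W):.2f}")
```

Under this model the separation depends only on the ratio of the two numbers, so 4 vs 8 and 8 vs 16 are equally discriminable, while 4 vs 6 is harder than either.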

Experiment 5: 3 + 1 = 4 large or 8 small lemons
A confound in Experiments 1-4 is that number covaries with several continuous dimensions. For example, monkeys may expect a particular contour length or volume of lemons, as opposed to a specific number of lemons. Studies of large number addition have not yet adequately demonstrated that animals can spontaneously operate over representations of number as opposed to quantity. Accordingly, in Experiment 5 we replicate the design of Experiment 1, but explore whether subjects are sensitive to the size of the lemons used in these operations, and consequently, the overall lemon quantity involved.

Methods
We generated three lemon sizes (Fig. 3A): small, medium, and large. The sizes were chosen so that eight small lemons in a row were equal in total length to six medium lemons or four large lemons. Monkeys in both the possible and impossible test groups observed 3 + 1 operations with medium lemons. At test, however, monkeys in the impossible test group saw an outcome of eight small lemons, while monkeys in the possible group saw an outcome of four large lemons. If monkeys generated expectations on the basis of total lemon contour length or volume, then they should expect to see about four medium lemons in length or volume. Instead, monkeys in both test groups observed outcomes equal to about six medium sized lemons in total contour length and volume. We predicted that if monkeys actually represent the number of objects in the test outcomes, then they should look longer at a numerically impossible outcome of 3 + 1 = 8 compared to a numerically possible outcome of 3 + 1 = 4.

Stimuli
Lemons were cut at both their ends, while lying length-wise, so that large lemons measured 7.3 cm in diameter, medium lemons measured 4.9 cm, and small lemons measured 3.7 cm. Lined up next to one another, eight small lemons, six medium lemons, and four large lemons all occupied a total length of approximately 29.3 cm (Fig. 3A).
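The length matching can be checked directly from the reported diameters, using the set sizes given in Methods (eight small, six medium, four large). The sketch below also computes the row length a monkey would expect if it tracked contour length rather than number. The names are hypothetical, and because the diameters are rounded, the totals only approximate the stated 29.3 cm.

```python
# Diameters (cm) as reported in the Stimuli section.
DIAMETER_CM = {"small": 3.7, "medium": 4.9, "large": 7.3}

def row_length(size, count):
    """Total length (cm) of `count` lemons of one size placed side by side."""
    return round(DIAMETER_CM[size] * count, 1)

# Matched sets: all three rows should come out close to 29.3 cm.
for size, count in [("small", 8), ("medium", 6), ("large", 4)]:
    print(f"{count} {size}: {row_length(size, count)} cm")

# Length expected by an observer tracking contour length: the operation
# 3 + 1 is carried out with medium lemons.
expected_by_length = row_length("medium", 3 + 1)

outcomes = {"possible (4 large)": ("large", 4), "impossible (8 small)": ("small", 8)}
for label, (size, count) in outcomes.items():
    excess = round(row_length(size, count) - expected_by_length, 1)
    print(f"{label}: {excess} cm longer than the length-based expectation")
```

Both test outcomes overshoot the length-based expectation by nearly the same amount, so an observer tracking length alone has no basis for looking longer at one outcome than the other.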

Procedure
During F1 monkeys saw the number and size of lemons that they would see as the outcome of their individual test trial (Fig. 3B). Thus monkeys who would see a numerically impossible 3 + 1 = 8 small lemons during test (T) saw eight small lemons at F1; monkeys who would see a numerically possible 3 + 1 = 4 large lemons at T saw four large lemons at F1. Therefore, monkeys in the possible and impossible test groups were equally familiarized to their individual test outcomes.
During F2 monkeys in both test groups observed three medium sized lemons placed on the stage. During test, monkeys in both groups observed all operations conducted with medium sized lemons, though test outcomes included only small or large lemons. Specifically, all monkeys saw three medium sized lemons placed on the empty stage (as in F2). These lemons were then occluded, and another medium lemon was added to the stage behind the occluder. When the occluder was removed, monkeys in the possible test group saw four large lemons, while monkeys in the impossible test group saw eight small lemons. Accordingly, the events observed during T can be summed up as three medium lemons + 1 medium lemon = 4 large lemons (numerically possible) or eight small lemons (numerically impossible).
Monkeys in both the possible and impossible test groups observed outcomes during T that included individual lemons different in size from those observed during the operation phase of the test trial. Moreover, the large lemons involved in the possible outcome were individually more different from (+2.4 cm in length) the expected medium sized lemons compared to the small lemons (−1.2 cm) used in the impossible test outcome. Similarly, both test outcomes occupied the same, incorrect contour length, and both outcomes included an impossible total lemon volume. Therefore, if monkeys in this experiment tracked a continuous quantity other than number, they would have looked equally long at both outcomes, or longer at the possible test outcome. However, monkeys looked longer at the numerically impossible outcome of eight instead of the numerically possible outcome of four, demonstrating that they represented number as opposed to continuous quantities. This is the first experiment to demonstrate that animals spontaneously represent large numbers, not continuous quantities, and therefore suggests that representations of number, specifically, are naturally available to animals.

Conclusions
In Experiments 1-3 rhesus monkeys correctly discriminated between outcomes of four and eight for the operations 3 + 1, 2 + 2 and 4 + 4. These results demonstrate that rhesus monkeys can spontaneously (no training) use representations of large numbers to compute the correct sums of addition operations. Experiment 4 shows that this capacity is constrained by a Weber limit since the monkeys failed to discriminate between outcomes of four and six when the operation was 2 + 2. Finally, in Experiment 5 we demonstrated that these abilities rely upon representations of number, per se, and not continuous amounts, since the monkeys succeeded in discriminating between outcomes of equal and incorrect continuous dimensions, but differing on the basis of number. Taken together, these results suggest that noisy magnitude representations of large numbers that have been studied extensively in lab training contexts (Biro & Matsuzawa, 1999; Brannon & Terrace, 1998; Kawai & Matsuzawa, 2000; Nieder et al., 2002) are spontaneously available to animals for use in mathematical operations.
One recent critique of expectancy violation looking time methods is the claim that longer looking times during "surprising" test events may sometimes reflect a longer looking preference for novel events, or, in the case of number, merely for larger numbers (Cohen & Marks, 2002) as opposed to an understanding of the principles involved. Our experiments, together with others (Hauser & Carey, 2003), are insulated from this critique in several ways. First, because we employed a between-subjects design, each monkey was familiarized at F1 only to either the possible or impossible test outcome. Consequently, monkeys in both groups were equally familiarized to their individual test outcomes. Second, in Experiment 3, we demonstrated that monkeys look longer at an outcome with a smaller number (4, not 8), so long as it is an impossible outcome. This result obtained even though monkeys in the impossible group of Experiment 3 saw four lemons on the stage during F1, four on the stage during F2, and then during T, they saw 4 + 4 = 4. This ensured that the impossible test group of this experiment always saw the same exact perceptual stimulus (four lemons) throughout a session. Finally, in Experiment 4, monkeys failed to discriminate outcomes of four from six following a 2 + 2 operation. An interpretation of our results involving a preference for novelty, familiarity, or larger numbers would have to account for the significant discrimination of four from eight in Experiment 2, but no discrimination of four from six. We conclude that these five experiments, considered together, can only be interpreted in terms of the rhesus monkeys' understanding of number, as opposed to any other, lower-level, perceptual features.
In previous studies (Flombaum, Junge, Santos, & Hauser, 2003; Hauser & Carey, 2003; Hauser et al., 2000; Sulkowski & Hauser, 2001) monkeys have been shown to represent small numbers (3-4) precisely, apparently with the use of discrete visual object representations as opposed to continuous magnitudes. That is, monkeys in these experiments could compare numbers that differed by small ratios (i.e. 2:3 and 3:4), though they could not make any comparisons between sets of numbers that included any value greater than four. In one striking example, rhesus even fail to choose a bucket with eight pieces of food instead of a bucket with three pieces, though they succeed with comparisons of 3 versus 4 in the same study (Hauser et al., 2000). Given the results of the experiments reported here, an open question about the previous work is why the monkeys fail to use magnitude representations in discriminating ratios of 1:2 or more. One possibility is that the serial presentation of one item at a time in those studies fails to engage systems dedicated to processing large numbers, perhaps because these systems require the presentation of several items at once as occurs in the present studies. An additional possibility is that in the two-box choice experiments, subjects must continuously update the sums in two different spatial locations, as opposed to one in the current experiments; such updating may cause greater noise with respect to large number representations and their sums. Further research is required to illuminate this issue.
What our experiments do reveal, however, is that magnitudes can be employed to represent and manipulate numbers at least as small as 4, values that fall within or at the boundary of the small number range. Rhesus successfully discriminated between 4 and 8 lemons, but not between 4 and 6. No evidence thus far has demonstrated that animals or human infants can make cross format number comparisons by representing a small set, say of four lemons with discrete object representations, and then a larger set, of eight lemons, simply as "more than four". Moreover, such a comparison cannot account for the data presented here because these types of operations should allow for the accurate discrimination of four and six if they allow for the discrimination of four and eight. Therefore, monkeys must represent the number 4, in our studies, with noisy magnitudes.
These results are consistent with training experiments in which monkeys appear to represent all positions on the number line from 1-9 with mental magnitudes (Brannon & Terrace, 1998).
Finally, these experiments raise several important questions for future research. Specifically, we do not yet understand the limits on the large number system, and especially, the extent to which different input modalities and training alter the Weber ratios. One puzzling aspect of our experiments is the particular 1:2 ratio that monkeys spontaneously discriminate. In training experiments (Brannon & Terrace, 1998) as well as spontaneous experiments with tamarin monkeys (Hauser et al., 2002) 2:3 ratios and smaller have been found to be within the resolution of the large number system. One possible reason for this discrepancy is that operating over magnitude representations with addition introduces additional noise that impairs discrimination (Barth, 2001;Barth et al., 2003). Previous experiments involved only numerical comparisons, not addition. Other possibilities, however, include the consequences of extensive training on discrimination, or even species and modality differences. When an animal is trained for months and often years before testing, does such experience improve discriminability of small ratios by altering the analog magnitude system per se, or does it alter attention and thus reduce the level of noise that might influence performance? If a species has evolved sensory specializations in response to particular ecological or social pressures, might their capacity for numerical discrimination differ depending upon whether the test material matches or fails to match these specializations? For example, a highly visual animal such as a rhesus monkey might show better discrimination with visual stimuli than a highly auditory creature such as a bat. Exploring the role of artificial and species-typical experience in fine tuning the large approximate number system will help refine our understanding of the evolution of number representation in animals and humans.