(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright ------------ Modelling how cleaner fish approach an ephemeral reward task demonstrates a role for ecologically tuned chunking in the evolution of advanced cognition ['Yosef Prat', 'Institute Of Biology', 'University Of Neuchâtel', 'Neuchâtel', 'Redouan Bshary', 'Arnon Lotem', 'School Of Zoology', 'Faculty Of Life Sciences', 'Sagol School Of Neuroscience', 'Tel Aviv University'] Date: 2022-01 What makes cognition “advanced” is an open and not precisely defined question. One perspective involves increasing the complexity of associative learning, from conditioning to learning sequences of events (“chaining”) to representing various cue combinations as “chunks.” Here we develop a weighted graph model to study the mechanism enabling chunking ability and the conditions for its evolution and success, based on the ecology of the cleaner fish Labroides dimidiatus. In some environments, cleaners must learn to serve visitor clients before resident clients, because a visitor leaves if not attended while a resident waits for service. This challenge has been captured in various versions of the ephemeral reward task, which has proven difficult for a range of cognitively capable species. We show that chaining is the minimal requirement for solving this task in its common simplified laboratory format that involves repeated simultaneous exposure to an ephemeral and a permanent food source. Adding ephemeral–ephemeral and permanent–permanent combinations, as cleaners face in the wild, requires individuals to have chunking abilities to solve the task. Importantly, chunking parameters need to be calibrated to ecological conditions in order to produce adaptive decisions. 
Thus, the fine-tuning of this ability may be the major target of selection during the evolution of advanced associative learning. Introduction In an effort to understand the evolution of cognition, a wide range of studies has focused on identifying cognitive abilities in animals that appear “advanced” (a term that is commonly used but loosely defined [1]) and exploring the ecological conditions that could favour their evolution (e.g., [2–7]). Accurate navigation [8], social manipulation [9], or flexible communication [10], for example, may all be considered advanced cognitive abilities. Yet, mapping these skills along phylogenetic trees and relating them to social or ecological conditions (e.g., [11,12]) does not explain how such abilities evolved through incremental modifications of their mechanistic building blocks. Earlier views of cognitive evolution were based on postulated, loosely defined genetic adaptations, such as a language instinct [13,14], mind-reading abilities [15], or mirror neurons [16,17], but these are increasingly being replaced by approaches relying on explicit associative learning principles that can gradually form complex representations of statistically learned information [18–26]. In line with these recent views, to understand the critical steps in cognitive evolution, one should identify specific modifications that can elaborate simple learning processes and improve them in some way, so that they can improve decision-making and eventually enhance fitness. In other words, understanding the evolution of cognition requires explaining cognitive abilities first in terms of their possible mechanisms (the proximate level of explanation) and then in terms of how such mechanisms could have evolved as a result of gradual modifications that improve biological fitness (the ultimate level of explanation). 
A relatively simple and well-understood example is the extension of simple conditioning through second-order conditioning in a process known as chaining [27,28]. In this process, a stimulus associated with a primary reinforcer (such as a sound associated with receiving food) becomes a reinforcer itself, and then a stimulus reinforced by the new reinforcer may become a reinforcer, and so on, making it possible to represent sequences of statistical dependencies. Such sequences could, in turn, facilitate navigation [29,30] or even social learning [31], which can clearly be adaptive. Further elaborations of associative learning that may allow the construction of a detailed representation of the environment and support statistical learning and decision-making, such as those required for learning visual or vocal patterns [32,33], grammatical rules [6,34], or for planning sequential actions [35–37], are less well understood. It has become clear, however, that a critical prerequisite for such cognitive skills is the ability to represent 2 or more data units as a group whose meaning differs from (or is independent of) the meaning of its components (as in the word carpet, which is not related to car or pet). This ability has appeared in the literature under different names, such as configurational learning [38,39], chunking [23,40,41], or segmentation [42], all of which are quite similar and involve the learning of configurations, patterns, and hierarchical structures in time and space [43]. In its simple form, known as configurational learning, this ability allows an animal to learn, for example, that the elements A and B are associated with positive reward while their configuration AB is not rewarded and should therefore be avoided (a task known as negative patterning [44]). Configurational learning of this type is contrasted with elemental learning, which is based on the behaviour expected from simple associative learning [45,46]. 
Research on configurational learning has focused mainly on identifying the brain regions supporting this ability (e.g., [39,47–49]), giving relatively little attention to the cognitive processes generating configural representations (but see [39]). More attempts to consider these possible processes have been made in the context of chunking or segmentation (e.g., [18,42,50]), but only recently has theoretical work started to address the question of how chunking mechanisms evolve under different ecological conditions and what their role is in cognitive evolution [23,51,52]. A unique model system that may provide a remarkable opportunity to study the evolution of chunking is the bluestreak cleaner wrasse (Labroides dimidiatus), which feeds on ectoparasites removed from “client” fish [53]. Field observations and laboratory experiments have shown that at least some of these cleaner fish are capable of solving a problem known as the market problem (or the ephemeral reward task) [54–57]. The market problem entails that, if approached by 2 clients, cleaners must learn to serve a visitor client before a resident client, because the latter waits for service while the former leaves if not attended. In other words, a preference for the visitor when approached by a visitor and a resident provides the cleaner with 2 meals, while failing to do so may result in losing one of them (see details in [54,56,58,59]). In order to choose correctly, the cleaner also has to distinguish between residents and visitors based on their appearance or behaviour (and there might be multiple client species acting as residents and visitors within a cleaner’s home range [54]). In the lab, clients of different types were replaced with plates of different colours, each offering 1 food item and acting as either a visitor or a resident, which was sufficient for some of the cleaners to solve the problem correctly [55,60]. 
These experiments suggest that cleaners can distinguish visitors from residents by associating certain visual cues with their previous behaviour. Interestingly, individuals captured in different habitats demonstrated different abilities to learn the market problem in the lab, and adult cleaners seem to learn better than juveniles [56,58,61,62]. Such intraspecific variation in cognitive abilities suggests some role for the ecological and developmental circumstances in the fish’s life history. The lab market task may at first appear to be a two-choice experiment, testing whether animals can learn to choose the option that yields the larger total amount of food. Nevertheless, while preferring a larger amount in a simple two-choice task seems almost trivial for most animals [63], the market version, in which a double amount is the product of a sequence of 2 actions (i.e., choosing the ephemeral item and then approaching the enduring one), has proven difficult for a range of species [62,64–66] (but see [67–69]). Follow-up studies on pigeons and rats (reviewed in [70]) showed that letting the subject make a first decision but delaying the consequences, i.e., delaying access to the rewarding stimuli, strongly improves performance [70,71]. One interpretation of these results is that the delay helped animals to connect their initial choice to both consequences: the first and then the second reward, both of which occurred within a short time span after the relatively long delay. While delaying the consequences of the initial choice may be helpful under some conditions, recent theoretical work suggests that under natural conditions, basic associative learning is insufficient for solving the market task, which instead warrants some form of chunking ability [72]. The reason is that the commonly used laboratory task presents a relatively simple version of the problem compared to the natural situation. 
It only presents visitor–resident pairs, for which choosing the visitor first always entails a double reward and choosing the resident first always entails a single reward. In nature, however, cleaners also face resident–resident and visitor–visitor pairs, and most often, only a single client approaches. As a result, choosing the visitor first may not always entail a double reward (e.g., in visitor–visitor pairs, the second visitor is likely to leave), and choosing the resident first may not always result in a single reward (e.g., in resident–resident pairs, the second resident is likely to stay). Indeed, the theoretical analysis carried out by Quiñones and colleagues [72] showed that solving the natural market problem requires distinct representations of all the different types of client combinations (visitor (v) + resident (r), r + r, v + v, r, and v), which means the ability to represent chunks. Yet, the analyses did not explain how such representations are created, nor to what extent ecology causes variation in the cleaners’ ability to create them. The goal of the present paper is to fill this critical gap. Following Quiñones and colleagues’ demonstration that chunking is necessary for solving the natural market problem, here we use the cleaner fish example as a means to study the evolution of an explicit chunking mechanism and the ecological conditions that favour its success. Thus, we investigate how the very same problem (choosing between 2 options, one of which yields double the amount of food), set into an increasingly complex ecology of varying combinations of these options, selects for the evolution of increasingly advanced associative learning abilities. Our model is based on a weighted directed graph of nodes and edges, which initially forms a simple associative learning model, and can then be modified to become an extended credit (chaining-like) model or a chunking model (see details and definitions below). 
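The pair-dependent payoffs described above can be summarised in a short sketch (a hypothetical helper with our own naming, not the authors' code, assuming 1 food item per client served, and that an unserved visitor leaves while an unserved resident waits):

```python
def meals(pair, first_choice):
    """Total food items obtained when `first_choice` is served first.

    pair: client types present, e.g. ("v", "r"), ("v", "v"), or ("r",).
    Assumes each served client yields 1 item, an unserved visitor
    leaves, and an unserved resident waits to be served next.
    """
    if len(pair) == 1:
        return 1  # a lone client is simply served
    remaining = list(pair)
    remaining.remove(first_choice)
    other = remaining[0]
    # A second meal is obtained only if the other client is a resident.
    return 1 + (1 if other == "r" else 0)

# Only in the mixed pair does the order of service change the payoff:
assert meals(("v", "r"), "v") == 2  # visitor first: the resident waits
assert meals(("v", "r"), "r") == 1  # resident first: the visitor leaves
assert meals(("v", "v"), "v") == 1  # the second visitor leaves
assert meals(("r", "r"), "r") == 2  # the second resident waits
```

Under this payoff structure, "prefer the visitor" is always correct in the laboratory visitor–resident task, but in same-type pairs the first choice is payoff-irrelevant, which is what makes distinct representations of each combination valuable.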
This approach allows us to compare clearly defined learning mechanisms and to pinpoint the modifications responsible for a presumed evolutionary step that improves cognitive ability. We analyse the 3 learning models’ performance in 3 tasks: the basic quantitative choice task, the laboratory market task, and the market task embedded in a sequence of varying configurations (the “natural market task”). For the latter, we explored to what extent different densities and frequencies of client types select for different tendencies to form chunks (a critical parameter in the model), and how such different tendencies may affect the cleaners’ ability to solve the market problem. The core model Internal representation. Our core model consists of a weighted directed graph G = (N, E), with nodes N, edges E, and additionally edge weights W, node weights U, and node values F (Fig 1A). The basic model includes 3 internal nodes representing 3 behavioural states: N = {V, R, X}, where: V–serving (feeding on) a visitor–client, R–serving a resident–client, and X–waiting for clients (empty arena). These are the 3 states (responses to environmental cues) required to represent the market problem and are therefore available to, and perceived by, the cleaner fish in our simulations (Fig 2). Note that at this stage, the cleaner does not understand the behavioural differences between a resident and a visitor, yet we assume it can distinguish between their external characteristics (e.g., it can tell their colours apart). Edge weights are updated according to the sequential appearance of the states, i.e., whenever n_j appears after n_i, the weight of the edge n_i→n_j, i.e., W(n_i, n_j), increases (by 1 unit, in our simulations). Thus, edge weights represent the associative strength between nodes experienced one after the other. 
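The edge-weight update rule can be sketched in a few lines (a minimal illustration with variable names of our own; the text specifies only that edge weights start at zero and increase by 1 unit per observed transition):

```python
from collections import defaultdict

# Behavioural states of the core model: serve a visitor (V), serve a
# resident (R), or wait at an empty cleaning station (X).
NODES = ("V", "R", "X")

# Edge weights W(n_i, n_j): associative strength of "n_j follows n_i".
# A defaultdict makes every unseen edge read as zero, matching the
# model's initial conditions.
W = defaultdict(float)

def observe_transition(n_i, n_j):
    """Strengthen the edge n_i -> n_j by 1 unit when n_j follows n_i."""
    W[(n_i, n_j)] += 1.0

# Example experience: the cleaner serves a visitor, then a resident,
# then waits at an empty station.
for prev, nxt in [("V", "R"), ("R", "X")]:
    observe_transition(prev, nxt)
```

After this experience, W("V", "R") and W("R", "X") equal 1 unit while every other edge remains at zero.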
For simplicity, we ignore weight decay (forgetting) in the present model (see the Discussion section, where we address this issue and explain why it should not significantly change the results). Node weights and values are attached to the cleaner’s decisions (see decision-making below) according to their occurrence and the association of their outcome with food, i.e., whenever node n_i is chosen, the weight U(n_i) increases (by 1 unit, in our simulations), and the value F(n_i) increases by the amount of food reward provided (which, unless otherwise specified, is assumed to be 1 unit per client if served successfully, and zero otherwise). The value of a node can be regarded as the strength of its association with food, which can also be represented as the weight of the edge between the node and a reinforcer food node (the weights of the green arrows in Fig 1A). The weighted directed graph constitutes the cleaner’s internal representation of the market environment. The cleaner’s decisions regarding which clients to serve depend only upon this representation (Fig 1A). Fig 1. Model design—internal representation. (A) The core model contains a network of 3 elements (blue circles) representing perceived states: V–serving a visitor–client, R–serving a resident–client, X–absence of clients. The value of each node is represented by the weight of its association (width of green arrows) with the reinforcer (food reward; green circle). Edge weights (width of black arrows) represent the strength of the associations between sequential states. This is also the internal representation of the extended credit model. (B) An example of a possible representation in the chunking model: a new element (VR; purple circle) represents the configuration (the chunk) of “V and then R”. https://doi.org/10.1371/journal.pbio.3001519.g001 Fig 2. Model simulations. 
The cleaner in our simulations may encounter different combinations of client pairs awaiting its service: (A) the cleaner must choose between 2 clients of different types according to the model’s decision process; (B and C) the cleaner chooses with equal probabilities between 2 clients of the same type; (D and E) the cleaner serves the only available client; and (F) the cleaner waits for clients to visit its cleaning station. https://doi.org/10.1371/journal.pbio.3001519.g002 Initially, the states are considered unknown to the fish, and their corresponding values, weights, and the weights of their connecting edges are set to zero. Most learning models use prior values for cues or states, which are commonly set to zero (often implicitly). Here, we model such a prior by imposing a threshold on the weight of a node before any increase in its value F can occur. Specifically, F(n_k) is initialized to zero and would not change as long as U(n_k)