(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright ------------ Modelling how cleaner fish approach an ephemeral reward task demonstrates a role for ecologically tuned chunking in the evolution of advanced cognition ['Yosef Prat', 'Institute Of Biology', 'University Of Neuchâtel', 'Neuchâtel', 'Redouan Bshary', 'Arnon Lotem', 'School Of Zoology', 'Faculty Of Life Sciences', 'Sagol School Of Neuroscience', 'Tel Aviv University'] Date: 2022-01 What makes cognition “advanced” is an open and not precisely defined question. One perspective involves increasing the complexity of associative learning, from conditioning to learning sequences of events (“chaining”) to representing various cue combinations as “chunks.” Here we develop a weighted graph model to study the mechanism enabling chunking ability and the conditions for its evolution and success, based on the ecology of the cleaner fish Labroides dimidiatus. In some environments, cleaners must learn to serve visitor clients before resident clients, because a visitor leaves if not attended while a resident waits for service. This challenge has been captured in various versions of the ephemeral reward task, which has proven difficult for a range of cognitively capable species. We show that chaining is the minimal requirement for solving this task in its common simplified laboratory format that involves repeated simultaneous exposure to an ephemeral and a permanent food source. Adding ephemeral–ephemeral and permanent–permanent combinations, as cleaners face in the wild, requires individuals to have chunking abilities to solve the task. Importantly, chunking parameters need to be calibrated to ecological conditions in order to produce adaptive decisions. 
Thus, the fine-tuning of this ability may be the major target of selection during the evolution of advanced associative learning. Introduction In an effort to understand the evolution of cognition, a wide range of studies has focused on identifying cognitive abilities in animals that appear “advanced” (a term that is commonly used but loosely defined [1]) and exploring the ecological conditions that could favour their evolution (e.g., [2–7]). Accurate navigation [8], social manipulation [9], or flexible communication [10], for example, may all be considered advanced cognitive abilities. Yet, mapping these skills along phylogenetic trees and relating them to social or ecological conditions (e.g., [11,12]) does not explain how such abilities evolved through incremental modifications of their mechanistic building blocks. Earlier views of cognitive evolution were based on postulated, loosely defined genetic adaptations, such as a language instinct [13,14], mind-reading abilities [15], or mirror neurons [16,17], but these are increasingly being replaced by approaches relying on explicit associative learning principles that can gradually form complex representations of statistically learned information [18–26]. In line with these recent views, to understand the critical steps in cognitive evolution, one should identify specific modifications that can elaborate simple learning processes and improve them in some way, so that they can improve decision-making and eventually enhance fitness. In other words, understanding the evolution of cognition requires explaining cognitive abilities first in terms of their possible mechanisms (the proximate level of explanation) and then in terms of how such mechanisms could have evolved as a result of gradual modifications that improve biological fitness (the ultimate level of explanation). 
A relatively simple and well-understood example is the extension of simple conditioning through second-order conditioning in a process known as chaining [27,28]. In this process, a stimulus associated with a primary reinforcer (such as a sound associated with receiving food) becomes a reinforcer itself, and then a stimulus reinforced by the new reinforcer may become a reinforcer, and so on, making it possible to represent sequences of statistical dependencies. Such sequences could, in turn, facilitate navigation [29,30] or even social learning [31], which can clearly be adaptive. Further elaborations of associative learning that may allow the construction of a detailed representation of the environment and support statistical learning and decision-making, such as those required for learning visual or vocal patterns [32,33], grammatical rules [6,34], or for planning sequential actions [35–37], are less well understood. It has become clear, however, that a critical prerequisite for such cognitive skills is the ability to represent 2 or more data units as a group whose meaning differs from (or is independent of) the meaning of its components (as in the word carpet, which is not related to car or pet). This ability has appeared in the literature under different names, such as configurational learning [38,39], chunking [23,40,41], or segmentation [42], all of which are quite similar and involve the learning of configurations, patterns, and hierarchical structures in time and space [43]. In its simple form, known as configurational learning, this ability allows an animal to learn, for example, that the elements A and B are associated with positive reward while their configuration AB is not rewarded and should therefore be avoided (a task known as negative patterning [44]). Configurational learning of this type is contrasted with elemental learning, which is based on the behaviour expected from simple associative learning [45,46]. 
Research on configurational learning has focused mainly on identifying the brain regions supporting this ability (e.g., [39,47–49]), giving relatively little attention to the cognitive processes generating configural representations (but see [39]). More attempts to consider these possible processes have been made in the context of chunking or segmentation (e.g., [18,42,50]), but only recently has theoretical work started to address the question of how chunking mechanisms evolve under different ecological conditions and what their role is in cognitive evolution [23,51,52]. A unique model system that may provide a remarkable opportunity to study the evolution of chunking is the bluestreak cleaner wrasse (Labroides dimidiatus), which feeds on ectoparasites removed from “client” fish [53]. Field observations and laboratory experiments have shown that at least some of these cleaner fish are capable of solving a problem known as the market problem (or the ephemeral reward task) [54–57]. The market problem entails that, if approached by 2 clients, cleaners must learn to serve a visitor client before a resident client, because the latter waits for service while the former leaves if not attended. In other words, a preference for the visitor when approached by a visitor and a resident provides the cleaner with 2 meals, while failing to do so may result in losing one of them (see details in [54,56,58,59]). In order to choose correctly, the cleaner also has to distinguish between residents and visitors based on their appearance or behaviour (and there might be multiple client species acting as residents and visitors within a cleaner’s home range [54]). In the lab, clients of different types were replaced with plates of different colours, each offering 1 food item and acting as either a visitor or a resident, which was sufficient for some of the cleaners to solve the problem correctly [55,60]. 
These experiments suggest that cleaners can distinguish visitors from residents by associating certain visual cues with their previous behaviour. Interestingly, individuals captured in different habitats demonstrated different abilities to learn the market problem in the lab, and adult cleaners seem to learn better than juveniles [56,58,61,62]. Such intraspecific variation in cognitive abilities suggests some role for the ecological and developmental circumstances in the fish’s life history. The lab market task may at first appear to be a two-choice experiment, testing whether animals can learn to choose the option that yields the larger total amount of food. Nevertheless, while preferring a larger amount in a simple two-choice task seems almost trivial for most animals [63], the market version, in which a double amount is the product of a sequence of 2 actions (i.e., choosing the ephemeral item and then approaching the enduring one), has proven difficult for a range of species [62,64–66] (but see [67–69]). Follow-up studies on pigeons and rats (reviewed in [70]) showed that letting the subject make a first decision but delaying the consequences, i.e., delaying access to the rewarding stimuli, strongly improves performance [70,71]. One interpretation of these results is that the delay helped animals to connect their initial choice to both consequences: the first and then the second reward, both of which occurred within a short time span after the relatively long delay. While delaying the consequences of the initial choice may be helpful under some conditions, recent theoretical work suggests that under natural conditions, basic associative learning is insufficient for solving the market task, which instead warrants some form of chunking ability [72]. The reason is that the commonly used laboratory task presents a relatively simple version of the problem compared to the natural situation. 
It only presents visitor–resident pairs, for which choosing the visitor first always entails a double reward and choosing the resident first always entails a single reward. In nature, however, cleaners also face resident–resident and visitor–visitor pairs, and most often, only a single client approaches. As a result, choosing the visitor first may not always entail a double reward (e.g., in visitor–visitor pairs, the second visitor is likely to leave), and choosing the resident first may not always result in a single reward (e.g., in resident–resident pairs, the second resident is likely to stay). Indeed, the theoretical analysis carried out by Quiñones and colleagues [72] showed that solving the natural market problem requires distinct representations of all the different types of client combinations (visitor (v) + resident (r), r + r, v + v, r, and v), which means the ability to represent chunks. Yet, the analyses did not explain how such representations are created, nor to what extent ecology causes variation in the cleaners’ ability to create them. The goal of the present paper is to fill this critical gap. Following Quiñones and colleagues’ demonstration that chunking is necessary for solving the natural market problem, here we use the cleaner fish example as a means to study the evolution of an explicit chunking mechanism and the ecological conditions that favour its success. Thus, we investigate how the very same problem (choosing between 2 options, one of which yields double the amount of food), set into an increasingly complex ecology of varying combinations of these options, selects for the evolution of increasingly advanced associative learning abilities. Our model is based on a weighted directed graph of nodes and edges, which initially forms a simple associative learning model, and can then be modified to become an extended credit (chaining-like) model or a chunking model (see details and definitions below). 
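The pair-dependent payoffs described above can be summarised in a short sketch (a hypothetical helper with our own naming, not the authors' code, assuming 1 food item per client served, and that an unserved visitor leaves while an unserved resident waits):

```python
def meals(pair, first_choice):
    """Total food items obtained when `first_choice` is served first.

    pair: client types present, e.g. ("v", "r"), ("v", "v"), or ("r",).
    Assumes each served client yields 1 item, an unserved visitor
    leaves, and an unserved resident waits to be served next.
    """
    if len(pair) == 1:
        return 1  # a lone client is simply served
    remaining = list(pair)
    remaining.remove(first_choice)
    other = remaining[0]
    # A second meal is obtained only if the other client is a resident.
    return 1 + (1 if other == "r" else 0)

# Only in the mixed pair does the order of service change the payoff:
assert meals(("v", "r"), "v") == 2  # visitor first: the resident waits
assert meals(("v", "r"), "r") == 1  # resident first: the visitor leaves
assert meals(("v", "v"), "v") == 1  # the second visitor leaves
assert meals(("r", "r"), "r") == 2  # the second resident waits
```

Under this payoff structure, "prefer the visitor" is always correct in the laboratory visitor–resident task, but in same-type pairs the first choice is payoff-irrelevant, which is what makes distinct representations of each combination valuable.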
This approach allows us to compare clearly defined learning mechanisms and to pinpoint the modifications responsible for a presumed evolutionary step that improves cognitive ability. We analyse the 3 learning models’ performance in 3 tasks: the basic quantitative choice task, the laboratory market task, and the market task embedded in a sequence of varying configurations (the “natural market task”). For the latter, we explored to what extent different densities and frequencies of client types select for different tendencies to form chunks (a critical parameter in the model), and how such different tendencies may affect the cleaners’ ability to solve the market problem. The core model Internal representation. Our core model consists of a weighted directed graph G = (N, E), with nodes N, edges E, and additionally edge weights W, node weights U, and node values F (Fig 1A). The basic model includes 3 internal nodes representing 3 behavioural states: N = {V, R, X}, where: V–serving (feeding on) a visitor–client, R–serving a resident–client, and X–waiting for clients (empty arena). These are the 3 states (responses to environmental cues) required to represent the market problem and are therefore available to, and perceived by, the cleaner fish in our simulations (Fig 2). Note that at this stage, the cleaner does not understand the behavioural differences between a resident and a visitor, yet we assume it can distinguish between their external characteristics (e.g., it can tell their colours apart). Edge weights are updated according to the sequential appearance of the states, i.e., whenever n_j appears after n_i, the weight of the edge n_i→n_j, i.e., W(n_i, n_j), increases (by 1 unit, in our simulations). Thus, edge weights represent the associative strength between nodes experienced one after the other. 
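The edge-weight update rule can be sketched in a few lines (a minimal illustration with variable names of our own; the text specifies only that edge weights start at zero and increase by 1 unit per observed transition):

```python
from collections import defaultdict

# Behavioural states of the core model: serve a visitor (V), serve a
# resident (R), or wait at an empty cleaning station (X).
NODES = ("V", "R", "X")

# Edge weights W(n_i, n_j): associative strength of "n_j follows n_i".
# A defaultdict makes every unseen edge read as zero, matching the
# model's initial conditions.
W = defaultdict(float)

def observe_transition(n_i, n_j):
    """Strengthen the edge n_i -> n_j by 1 unit when n_j follows n_i."""
    W[(n_i, n_j)] += 1.0

# Example experience: the cleaner serves a visitor, then a resident,
# then waits at an empty station.
for prev, nxt in [("V", "R"), ("R", "X")]:
    observe_transition(prev, nxt)
```

After this experience, W("V", "R") and W("R", "X") equal 1 unit while every other edge remains at zero.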
For simplicity, we ignore weight decay (forgetting) in the present model (see the Discussion section, where we address this issue and explain why it should not significantly change the results). Node weights and values are attached to the cleaner’s decisions (see decision-making below) according to their occurrence and the association of their outcome with food, i.e., whenever node n_i is chosen, the weight U(n_i) increases (by 1 unit, in our simulations), and the value F(n_i) increases by the amount of food reward provided (which, unless otherwise specified, is assumed to be 1 unit per client if served successfully, and zero otherwise). The value of a node can be regarded as the strength of its association with food, which can also be represented as the weight of the edge between the node and a reinforcer food node (the weights of the green arrows in Fig 1A). The weighted directed graph constitutes the cleaner’s internal representation of the market environment. The cleaner’s decisions regarding which clients to serve depend only upon this representation (Fig 1A). Fig 1. Model design—internal representation. (A) The core model contains a network of 3 elements (blue circles) representing perceived states: V–serving a visitor–client, R–serving a resident–client, X–absence of clients. The value of each node is represented by the weight of its association (width of green arrows) with the reinforcer (food reward; green circle). Edge weights (width of black arrows) represent the strength of the associations between sequential states. This is also the internal representation of the extended credit model. (B) An example of a possible representation in the chunking model: a new element (VR; purple circle) represents the configuration (the chunk) of “V and then R”. https://doi.org/10.1371/journal.pbio.3001519.g001 Fig 2. Model simulations. 
The cleaner in our simulations may encounter different combinations of client pairs awaiting its service: (A) the cleaner must choose between 2 clients of different types according to the model’s decision process; (B and C) the cleaner chooses with equal probabilities between 2 clients of the same type; (D and E) the cleaner serves the only available client; and (F) the cleaner waits for clients to visit its cleaning station. https://doi.org/10.1371/journal.pbio.3001519.g002 Initially, the states are considered unknown to the fish, and their corresponding values, weights, and the weights of their connecting edges are set to zero. Most learning models use prior values for cues or states, which are commonly set to zero (often implicitly). Here, we model such a prior by imposing a threshold on the weight of a node before any increase in its value F can occur. Specifically, F(n_k) is initialized to zero and would not change as long as U(n_k)