Adaptive search space pruning in complex strategic problems

Ofra Amir (Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa), Liron Tyomkin, Yuval Hart (Department of Psychology, Hebrew University of Jerusalem, Jerusalem)

Date: 2022-08

People have limited computational resources, yet they make complex strategic decisions over enormous spaces of possibilities. How do people efficiently search spaces with combinatorially branching paths? Here, we study players’ search strategies for a winning move in a “k-in-a-row” game. We find that players use scoring strategies to prune the search space and augment this pruning by a “shutter” heuristic that focuses the search on the paths emanating from their previous move. This strong pruning has its costs: both computational simulations and behavioral data indicate that the shutter size is correlated with players’ blindness to their opponent’s winning moves. However, simulations of the search while varying the shutter size, complexity levels, noise levels, branching factor, and computational limitations indicate that despite its costs, a narrow shutter strategy is the dominant strategy for most of the parameter space. Finally, we show that in the presence of computational limitations, the shutter heuristic enhances the performance of deep learning networks in these end-game scenarios. Together, our findings suggest a novel adaptive heuristic that benefits search in a vast space of possibilities of a strategic game.

Search problems usually involve a common trade-off between accuracy and computational resources: finding the best solution usually requires an exhaustive search, while limiting computation usually decreases the quality of solutions. Yet humans provide high-quality solutions to complex problems despite having limited computational resources. How do they do that? Here, we analyze people’s behavior in a strategic game of “k-in-a-row” that has an enormous space of possibilities. We find that people strongly prune the search space by using scoring strategies to evaluate each possibility and augment this pruning with a shutter heuristic that limits their search to the possible winning paths from their last move. Similar to other adaptive heuristics, the shutter heuristic provides a strong reduction in the computations the searcher needs to carry out while maintaining comparable accuracy rates. Finally, this adaptive heuristic generalizes to the performance of deep learning networks when playing with limited computational resources.

Funding: YH was supported by a grant from the ISF (grant no. 3081/21) and OA was supported by a grant from the ISF (grant no. 2185/20). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: The data from the user study can be accessed at this link: https://github.com/OptimalSearchSpacePruning/OptimalSearchSpacePruning_Data . The code for this work’s simulations, statistical analyses, and figure plotting can be accessed at this link: https://github.com/OptimalSearchSpacePruning/OptimalSearchSpacePruning_AnalysesCode . The code for the deep learning neural networks can be accessed at this link: https://github.com/lironT74/AlphaZero_Gomoku .

Our study focuses on the computations of end-game scenarios in the strategic game, “k-in-a-row”.
To infer the features of people’s search in this large search space, we map the specific trajectories of people’s search in finer detail, going beyond the observation of only the end decisions of their search (i.e., the decision on the next move). We do that by asking participants to find a winning move in different configurations of “k-in-a-row” boards using a ‘sandbox’ board where they can try different moves before committing to the chosen move. We find that the number of nodes participants explore in their search (i.e., their search size, which is measured by the number of moves they tried on the sandbox board) is limited, and that their accuracy rates fall far more slowly than the massive increase in the algorithmic complexity of the boards. In their search, participants combine two layers of pruning. First, participants use scoring strategies to evaluate the different moves. Second, participants augment their pruning by focusing the search on potential sequences for winning enabled by their last move. This “shutter” heuristic leads to an inherent blindness to the opponent’s winning moves, and thus bears costs. However, we show that the shutter heuristic outperforms other strategies in the context of “k-in-a-row” end-game scenarios. We scan different shutter size values over a broad range of computational conditions (e.g., complexity levels, noise levels, branching factor, and computational limitations), and these scans demonstrate the dominance of the shutter pruning heuristic for the majority of these conditions. Lastly, we find that the shutter heuristic greatly enhances the winning rate of deep learning neural networks in the game when their computational resources are limited.

A fruitful avenue to study these questions is the analysis of people’s behavior in complex strategic games. Games like chess [24, 25], Go [26], and Tic-Tac-Toe [27, 28] provide a rich tree-like search space. Each game state maps to a node in the search tree, with the different possible actions leading to alternative future states, which in turn also branch out to different sequences depending on the chosen actions. Importantly, these search trees have an exponential number of branching possibilities of different values, and thus typically cannot be fully explored. Previous studies explored chess players’ quality of moves, recall of game states, and behavior under time pressure to identify key differences between experts and novices [24, 29, 30]. Of special note are recent works [27, 28] that used a “4-in-a-row” game as a platform to study players’ sequential decision making throughout the game and modeled players’ search as a tree-search process. The search is based on an iterative “best-first” search algorithm that scans possible moves from a given state and prunes all states whose value falls below the best move’s value minus a threshold (a sketch of this pruning rule is given below). The scoring strategy for each board state is estimated by a weighted linear combination of the benefits of creating sequences of connected squares (including two non-connected squares that might still create a winning path) and the value of each square occupied by the player’s pieces. The possible loss due to similar sequences created by the opponent is then subtracted from this gain. Importantly, in all these experimental paradigms, the features of the search are inferred from the actual moves participants decided to play. Their search trajectories, however, are not directly observed, and thus key parameters of the search remain mostly unknown.
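To make this threshold-pruning rule concrete, here is a minimal Python sketch (our own illustration, not the published model of [27, 28]; the move labels and heuristic values are hypothetical) that keeps only the candidate moves whose value lies within a fixed threshold of the best candidate:

```python
def prune_by_threshold(candidate_values, threshold):
    """Best-first style value pruning: keep only candidate moves whose
    heuristic value is within `threshold` of the best candidate's value."""
    best = max(candidate_values.values())
    return {move: value for move, value in candidate_values.items()
            if value >= best - threshold}

# Hypothetical candidate moves and heuristic values for one game state.
values = {"d4": 3.2, "e5": 3.0, "a1": 0.4, "b7": 1.1}
print(prune_by_threshold(values, threshold=0.5))  # keeps only 'd4' and 'e5'
```

Moves that fall below this cut-off are not expanded further, which is what keeps such a modeled search tractable despite the exponential branching of the game tree.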
Recent works on planning and decision making have begun to tackle the question of how people handle extremely large search spaces. These works propose pruning mechanisms that limit the computations involved in the search for the correct action. For example, Huys and colleagues show that people prune the branches of the tree of possibilities that lie after a large loss, even if that behavior results in sub-optimal rewards [5], and that there is an interplay between mechanisms of fragmentation, stochastic memoization, and loss pruning that trade accuracy against computational costs during the search process [6]. Another line of studies [8, 22, 23] suggests arbitration mechanisms that toggle between goal-directed and habitual evaluations based on current information. Goal-directed, model-based planning produces a more accurate decision but is time-consuming, since it searches deep into the tree of future possibilities. Habitual evaluations, on the other hand, are considered model-free and take into consideration only the estimated value of the next action, without searching the tree of future possibilities (and thus saving computations). These works suggest that people arbitrate between goal-directed and habitual estimations, as well as decide which directions of the search tree to further expand (plan-until-habit), by comparing the benefit of an accurate response and the reduction of uncertainty in estimations with the cost of the loss of reward caused by the longer search time.

One line of studies of cognitive search processes suggested that people’s cognitive search is akin to random sampling of the search space [13–16], similar to Markov Chain Monte-Carlo (MCMC) sampling processes [17]. This framework has proved highly useful in the description of many cognitive processes: learning of rules [13], theories [14], language [15], and intuitive physics [16], to name a few. However, this type of search demands high computational resources and long computation times. This raises the question of how these processes may be relevant when the search is done over enormous spaces of thought under time and computational constraints [18–21].

Due to its importance, mapping the cognitive mechanisms that support strategic planning and decision making is an ongoing quest [1–11]. A fundamental approach to the study of strategic planning and decision making is to treat the underlying cognitive mechanism as an information processing system that performs a search in the space of all possible solutions [12]. When searching a branching tree of possibilities there are two layers to the search: first, evaluating the value of the current possibilities, which we term the scoring strategy; second, deciding which possibilities to further explore in order to gain more information. This exploration process can be either random or directed. In the latter case, it usually involves ignoring parts of the space, which we term pruning. Thus, any mapping of cognitive search mechanisms should address these two layers of the search.

Making decisions is hard. Making good decisions is even harder. Yet people make strategic decisions almost every day. From marital and career choices to the decision of where to get lunch, many of our decisions are made over a vast space of interdependent possibilities which form a tree of branching decision sequences. What are the cognitive mechanisms that support the search for solutions in such enormous tree-like spaces?
Results

The experimental task

Each participant (Amazon Mechanical Turk, N = 915, see Methods) saw one of ten different board configurations. The ten boards are composed of five boards of varying difficulty (two 6x6 boards and three 10x10 boards, denoted boards I–V, see Fig 1, Methods and S1 Fig) and five additional versions of the same boards in which we added the next best move for both the ‘X’ and the ‘O’ players. This resulted in truncated versions of the boards with one move toward the solution already revealed. Participants had to find the move that would force a win for the ‘X’ player within 3 (6x6 truncated boards), 4 (6x6 full boards or 10x10 truncated boards), or 5 (10x10 full boards) moves. Participants interacted with a ‘sandbox’ board where they saw the initial board configuration and searched for the winning move by simulating ‘X’ and ‘O’ moves. We recorded all participants’ moves, timings, and whether their answers were correct and validated (see S1 Text and S2 Fig for the analysis of the time intervals between moves). We defined the number of nodes explored by participants as the size of their search (that is, the number of clicks they made on the sandbox board during their session), and for each board configuration, we defined board complexity as the number of nodes an optimal tree search algorithm (alpha-beta pruning search [31], see Methods) needs to explore to find the correct solution.

Fig 1. Examples of board configurations. Left, board I (6x6 squares), full condition, ‘X’ wins within 4 moves (correct solutions are ‘f3’ or ‘f5’). Right, board IV (10x10 squares), full condition, ‘X’ wins within 5 moves (correct solution is ‘e8’). https://doi.org/10.1371/journal.pcbi.1010358.g001

Participants’ success rate declines much more slowly than the increase in the boards’ algorithmic complexity

We found that while board complexity varied ∼2000-fold (49 to 97,098 moves), participants’ search size varied only ∼2-fold (21–40 moves) and did not show a clear relation with board complexity (Spearman correlation, r = 0.08, 95% CI = [0.01, 0.14], p = 0.018, see Fig 2A, blue dots vs. gray dots). Similarly, there was no correlation between board complexity and search time (Spearman correlation, r = 0.02, 95% CI = [-0.05, 0.08], p = 0.56, see S3 Fig, which also shows that the limited search is not due to time limitations). Consequently, the rate of successful solutions decreased with the complexity of the board (Spearman correlation, r = -0.31, 95% CI = [-0.37, -0.25], p < 0.001), yet the success rate decreased by only 2–3-fold over an increase in complexity of three orders of magnitude (Fig 2B).

Fig 2. Participants’ search size does not depend on the board’s complexity. A) Participants’ search size (blue) is small and fixed compared with the search size of alpha-beta pruning (gray). B) The percentage of participants who correctly solved each of the board configurations, by board complexity. As board complexity increases, participants’ success rate decreases, yet at a slower rate compared to the increase in board complexity. Error bars depict 95% confidence intervals. https://doi.org/10.1371/journal.pcbi.1010358.g002

We note that since participants were recruited from Amazon mTurk, there might be an incentive to finish tasks quickly and reduce the number of moves in the task.
However, the slower decline of the success rate compared to the large increase in algorithmic board complexity may suggest that participants prune the search space to overcome the increased complexity of the problem. What, then, are the computational pruning mechanisms that facilitate finding the correct solution despite the limited search size?

Participants use scoring strategies in their search

We hypothesized that participants’ limited tree search can be partially explained by the use of scoring strategies that reflect the deep structure of the game. Each scoring strategy serves to evaluate the quality of moves, thus enabling participants to explore more promising nodes in the tree and to avoid the exploration of low-scoring nodes. We evaluated five different scoring strategies (see Methods and S1 Text, S4 Fig, and Table A in S1 Text for the evaluation of these scoring strategies; a minimal illustrative sketch of the simpler strategies is given at the end of this subsection):

Density: A square’s score is the number of its neighboring squares marked by ‘X’.

Linear: A square’s score is the sum of the scores of its potential winning paths, where each path’s score is the number of squares with ‘X’ in it. Thus, a square on a path with 2 ‘X’s and another path with 3 ‘X’s will have a score of 5.

Non-linear: Same as Linear, but the score of each path increases non-linearly with the number of ‘X’s it contains (x_i denotes the number of ‘X’s in path i; the exact functional form is given in Methods). In a ‘5-in-a-row’ board, the same square as in the Linear example therefore receives a score in which the path with 3 ‘X’s contributes disproportionately more than the path with 2 ‘X’s.

Interaction: Same as the Non-linear scoring, augmented by an interaction term between the paths that share the square (the added interaction score is defined in Methods).

Forcing: This scoring is similar to Interaction, but includes an additional component: if placing an ‘X’ in the square results in an immediate threat (a path that now contains n − 1 ‘X’ markers with a potential to win in the next move), the score of the square is augmented by a large constant (10 points), since by forcing ‘O’ to a particular move, ‘X’ essentially blocks all other paths for ‘O’ and controls the game flow.

To assess the total score of each square for, e.g., the ‘X’ player, we consider the possible benefits for ‘X’ from gaining that square and add to it the incurred costs for the ‘O’ player from missing that square (see Methods, Eq 2). Thus, each square receives scores from all the gained paths with ‘X’ marks going through this square and all the scores from the prevented paths with ‘O’ marks going through this square. We further note that because the algorithm compares squares’ scores within a scoring strategy, it is the relative scores of different squares that are important rather than their absolute values in each scoring strategy (see also Table A in S1 Text).

The experimental paradigm allows us to further examine the search pattern itself, meaning the trajectories of moves on the board that the participants chose to explore. We next calculated the probability distribution of participants’ moves and compared it with the distribution induced by the different scoring strategies (see Fig 3A for an example of the distribution of participants’ first moves compared to the prediction of the “Interaction” scoring strategy, and S6 Fig for the distributions in all board configurations).

Fig 3. Participants’ moves indicate the use of scoring strategies. A) The distributions of first moves predicted by the “Interaction” scoring strategy and the distributions of participants’ actual first moves on this board. B) Log-likelihoods of scoring strategies’ predictions of participants’ moves.
C) Percent of participants fitted to each scoring strategy. D) Log-likelihoods of scoring strategies’ predictions of participants’ moves. There are two configurations for each scoring strategy: using the scoring strategy as is (“base”) and adding the best-fitted shutter value per participant (see Methods). For all scoring strategies, adding the shutter significantly improved the log-likelihood and AIC scores (in all cases p < 10^-5 using a likelihood ratio test). All error bars are 95% confidence intervals. https://doi.org/10.1371/journal.pcbi.1010358.g003

The distribution of participants’ moves reflects participants’ internal values of the squares on the board. To calculate the similarity between the behavioral data and each scoring strategy’s predictions, we computed the log-likelihood of the move probabilities based on the different scoring strategies, with a lapse-rate parameter fitted to each participant (see Methods), and then computed the Akaike information criterion (AIC) to choose the model that best fits participants’ search trajectories (see Methods for more details). We find that the predictions of the more sophisticated scoring strategies (“Forcing”, “Interaction”, and “Non-linear”) are closer to participants’ choices than those of the simpler scoring strategies (mean log-likelihood and 95% CI (closer to zero is better), “Density”: mean = -3.27 (AIC = 5978), 95% CI = [-3.31, -3.22], “Linear”: mean = -3.19 (AIC = 5837), 95% CI = [-3.23, -3.15], “Non-linear”: mean = -2.91 (AIC = 5324), 95% CI = [-2.95, -2.87], “Interaction”: mean = -2.95 (AIC = 5395), 95% CI = [-2.99, -2.91], “Forcing”: mean = -2.86 (AIC = 5236), 95% CI = [-2.9, -2.81]; see Fig 3B, see S5 Fig for a comparison of log-likelihood scores using Monte-Carlo Tree Search, and see S9 Fig for a sensitivity analysis of the “Forcing” strategy’s free parameter).

To examine the distribution of scoring strategies among participants, we also calculated the percentage of participants employing each strategy. For each participant, we simulated all models based on the different scoring strategies with a fitted lapse-rate parameter to account for noisy choices (see Methods), and matched each participant to the scoring strategy with the best AIC score (see Methods). We find that for more than half of the participants, the “Interaction” and “Forcing” scoring strategies provided a better fit (58% of the population). A minority of participants used the simpler scoring strategies, “Density” and “Linear” (percent of participants, Density: mean = 8%, 95% CI = [6%, 9%], Linear: mean = 6%, 95% CI = [5%, 8%], Non-linear: mean = 28%, 95% CI = [25%, 31%], Interaction: mean = 35%, 95% CI = [32%, 38%], Forcing: mean = 23%, 95% CI = [20%, 25%], see Fig 3C and additional analyses in S1 Text and S7 Fig). Since the “Interaction” scoring strategy provides better fits for most participants (Fig 3C) and has one less parameter in its model compared to the “Forcing” scoring strategy, the next sections use the “Interaction” strategy as the base scoring strategy model (unless stated otherwise).

Both participants’ search size and their entire search trajectories indicate that participants search using scoring strategies. These scoring strategies combine non-linear scores of each path and interactions between different paths, allowing participants to vigorously prune the search space. Next, we tested whether players augmented their search with additional pruning mechanisms.
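To make the simpler scoring strategies concrete, the following minimal Python sketch (our own illustration, not the authors’ implementation; the board encoding, winning length k = 5, and helper names are assumptions) computes the “Density” and “Linear” scores of a candidate square; the more sophisticated strategies replace the per-path count with the non-linear, interaction, and forcing terms described above, and the full score additionally adds the prevented gains of the opponent (Eq 2 in Methods):

```python
import numpy as np

K = 5  # winning length ("k-in-a-row"); placeholder value for this sketch

def winning_paths_through(board, r, c, k=K):
    """All length-k runs (rows, columns, diagonals) that pass through (r, c)."""
    n = board.shape[0]
    paths = []
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        for offset in range(-(k - 1), 1):
            cells = [(r + (offset + i) * dr, c + (offset + i) * dc) for i in range(k)]
            if all(0 <= x < n and 0 <= y < n for x, y in cells):
                paths.append(cells)
    return paths

def density_score(board, r, c, player='X'):
    """Density: number of neighboring squares already marked by `player`."""
    n = board.shape[0]
    return sum(board[x, y] == player
               for x in range(max(0, r - 1), min(n, r + 2))
               for y in range(max(0, c - 1), min(n, c + 2))
               if (x, y) != (r, c))

def linear_score(board, r, c, player='X', opponent='O'):
    """Linear: sum, over still-winnable paths through (r, c), of the number
    of `player` marks already on that path."""
    score = 0
    for path in winning_paths_through(board, r, c):
        marks = [board[x, y] for x, y in path]
        if opponent not in marks:  # path is still open for `player`
            score += marks.count(player)
        # Non-linear / Interaction / Forcing replace this per-path count
        # with the terms described in the text.
    return score

# Hypothetical 6x6 position: score an empty square for 'X'.
board = np.full((6, 6), '.')
board[2, 2] = board[2, 3] = 'X'
print(linear_score(board, 2, 4), density_score(board, 2, 4))
```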
Participants’ previous moves influence their current search choices

Another possible mechanism for pruning the search space is using the memory of previous moves (and previous calculations), instead of deciding anew for each game state as the game evolves. To test whether participants’ search patterns exhibit memory (i.e., an influence of previous moves on future moves), we compared participants’ distributions of moves for a given board configuration in two different conditions: one is the distribution of moves in the truncated board (where participants see the board after one optimal move for both the ‘X’ and ‘O’ players), and the second is participants’ distribution of moves on the full board after participants chose the same optimal first moves for ‘X’ and ‘O’ (as in the truncated version). If players are not influenced by their previous choices, the distributions of moves in the full and truncated boards should be similar. We find that the distributions of moves in the two conditions were highly different (Fig 4A and 4B). When comparing the entropy of participants’ moves in the full and truncated conditions of the same board configuration (see Methods), we find higher entropy for participants’ moves in the truncated board than in the full board (full: mean entropy = 1.41, 95% CI = [1.30, 1.50], truncated: mean entropy = 2.42, 95% CI = [2.37, 2.46], around 40% entropy reduction, Mann-Whitney test U = 0, p = 0.006, Rank biserial correlation = 1, Fig 4C). This finding suggests that participants’ choice of the next move depends on the moves made before it. In particular, the lower entropy for moves in the full board suggests that as participants explore a specific search trajectory further, they narrow down the moves they consider (see S10, S11, S12 and S13 Figs for additional analyses and comparisons with the dynamics of a Monte Carlo tree search algorithm).

Fig 4. Participants’ search is influenced by their previous choices and is not explained by an internal search model or by reduced attention to the opponent. A) Distribution of participants’ moves on the full version of board configuration II. B) Distribution of participants’ moves on the truncated version of board configuration II. C) Mean entropy of the distribution of participants’ first moves in the truncated boards and the distribution of moves in the equivalent board state in the full boards. D) Mean entropy of the distribution of participants’ first moves in the truncated boards and the distribution of moves in the equivalent board state in the full boards, as predicted by the “Interaction” scoring strategy with an added shutter heuristic (shutter = 0). **, p < 0.01. E) Mean entropy of the distribution of participants’ first moves in the truncated boards and the distribution of moves in the equivalent board state in the full boards, as predicted by the “Interaction” scoring strategy when ignoring the opponent’s winning paths, thus focusing only on one’s own pieces. F) Mean entropy of the distribution of participants’ first moves in the truncated boards and the distribution of moves in the equivalent board state in the full boards, as predicted by an internal search model, using alpha-beta pruning search with branching factor k = 7 and limited depth d = 1. All error bars are 95% confidence intervals.
https://doi.org/10.1371/journal.pcbi.1010358.g004

We note that comparing the entropy of participants on the full board to the entropy of participants on the truncated board might introduce a selection bias: participants on the full board are required to play two optimal moves before reaching the same board configuration as that of the truncated board. Thus, while in the truncated board all types of participants are represented, in the full board only less noisy participants (i.e., those who reached the optimal position) are represented. One might then suspect that the entropy difference is caused by a selection bias. To test this, we chose each participant’s best-fitting model and simulated either three moves on the full board or one move on the truncated board. In the full board, we kept only the simulated runs that reached the truncated board’s configuration (i.e., made two optimal moves before the third move). We then calculated the entropy over the move distribution for each board and averaged the entropy values across all boards. We find that the selection bias produces only a small entropy reduction (around 7%) that cannot explain the entropy difference in the behavioral data (full board entropy: mean = 3.23, 95% CI = [3.23, 3.23], truncated board entropy: mean = 3.47, 95% CI = [3.47, 3.47], see also section 10, S1 Text and S11 Fig). As another test, we compared the entropies of only the participants who correctly solved the board configurations in both the full and truncated boards. This comparison focuses on participants in both conditions who found the optimal path and are therefore probably less noisy. This analysis produced similar results to the entire behavioral data (see S11 Fig). These two analyses suggest that the entropy difference in the behavioral data does not stem from a selection bias. Another option is that participants use different scoring strategies due to the different complexities of the two board configurations; however, we did not find a difference in the distribution of the fitted scoring strategies between the two conditions (see S8 Fig).

A path shutter prunes the search space and predicts players’ blindness to winning ‘O’ moves

The influence of previous moves on current choices suggests that participants use additional pruning mechanisms beyond the use of scoring strategies. We hypothesized that this pruning is done via a path shutter heuristic, which focuses the search on potential winning sequences stemming from the last move in the search tree. We measured shutter size by calculating the distance of a move from the potential winning sequences induced by the previous ‘X’ move, such that moves that are part of a currently possible winning sequence are considered at zero distance, the moves next to them (Manhattan distance = 1) are at distance 1, and so on (see Methods and Fig 5A; a minimal illustrative sketch of this distance computation is given below).

Fig 5. Participants who exhibited a narrower shutter were more likely to find the winning move, but were also more likely to miss winning moves for the ‘O’ player. A) Illustration of the shutter heuristic: assuming the player’s last move was f5 (shown in a black circle), there are three potential paths to win induced by this move, f6–f3, f5–f2 and c5–f5 (their squares marked in blue). Squares on these paths are considered at distance 0 from the last move. Squares adjacent to these squares (Manhattan distance = 1) are considered at distance 1 from the last move (marked in orange).
B) The probability of finding the winning move for participants with different shutter sizes. All differences were statistically significant: **, p < 0.005; ***, p < 0.001. C) The ratio between the probability of missed ‘X’ winning moves and the probability of missed ‘O’ winning moves in the computational simulations (left) and for participants (right). Narrow shutter shown in blue, medium in orange, and wide in green. Differences between narrow and wide shutter sizes and between medium and wide shutter sizes were statistically significant, ***, p < 0.001. All error bars are 95% confidence intervals. https://doi.org/10.1371/journal.pcbi.1010358.g005

The shutter heuristic can explain the differences in entropy between equivalent full and truncated board states observed in the behavioral data. To that end, we used the “Interaction” scoring strategy with a shutter heuristic (shutter = 0) to simulate moves on the board and compute their expected entropy. We observed a reduction in entropy in the full boards compared to the truncated boards that is similar to that observed in the behavioral data (full: mean entropy = 1.3, 95% CI = [1.23, 1.39], truncated: mean entropy = 2.76, 95% CI = [2.66, 2.88], around 50% entropy reduction, Fig 4D).

We further find that participants exhibiting a smaller shutter size in their search (hence, a more focused search) were more successful at solving the given board configuration (success rates, the number of trials solved correctly out of the total trials, narrow shutter: mean = 0.49, 95% CI = [0.44, 0.55]; medium shutter: mean = 0.31, 95% CI = [0.25, 0.36]; wide shutter: mean = 0.21, 95% CI = [0.17, 0.26]; statistical significance, narrow vs. wide: Mann-Whitney test U = 31114, p < 0.001, Rank biserial correlation = 0.28; narrow vs. medium: Mann-Whitney test U = 37356, p < 0.001, Rank biserial correlation = 0.19; medium vs. wide: Mann-Whitney test U = 39967, p = 0.004, Rank biserial correlation = 0.1; see Fig 5B). We note that the success rates are fairly low despite the solution being within a few moves. Yet, since the searched space is combinatorial in nature and the branching factor is large, the search space is large as well, making the problem difficult (as is also indicated by the boards’ algorithmic complexity).

However, the shutter heuristic also bears costs: if participants are using a shutter that focuses on moves on potential ‘X’ paths to win, they should miss more ‘O’ winning moves than ‘X’ winning moves (where misses of winning moves are events in which a participant could make a winning move, but made another move instead). To infer the effects of a shutter on the probability of missing either ‘O’ or ‘X’ winning moves, we simulated the search while varying the shutter size: as the shutter narrows, fewer squares outside the possible winning paths of ‘X’ are considered (see Methods). Indeed, the computational simulations indicate that as the shutter narrows, the likelihood of missing ‘O’ winning moves increases and the likelihood of missing ‘X’ winning moves decreases, resulting in a decreasing ratio of their likelihoods (simulations, median ratio of missed ‘X’ wins to missed ‘O’ wins: narrow shutter: median = 0.05, 95% CI = [0, 0.16], medium shutter: median = 0.45, 95% CI = [0.38, 0.53], wide shutter: median = 0.67, 95% CI = [0.62, 0.73], all differences are significant with p < 0.001, Fig 5C, see S1 Text for the statistical tests). Thus, the computational simulations predict that the ratio of missed ‘X’ wins to missed ‘O’ wins grows with shutter size.
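Before turning to the behavioral comparison, the following minimal Python sketch (our own illustration, not the authors’ code; the board encoding and k = 5 are assumptions, and “potential winning sequence” is simplified here to any length-k run through the previous move that contains no ‘O’) shows one way to compute the shutter distance of a candidate move given the previous ‘X’ move:

```python
import numpy as np

def shutter_distance(board, prev_move, move, k=5, opponent='O'):
    """Shutter distance of `move` given the previous 'X' move `prev_move`:
    0 if `move` lies on a still-winnable length-k run through `prev_move`,
    otherwise the minimal Manhattan distance to any square on such a run."""
    n = board.shape[0]
    pr, pc = prev_move
    path_squares = set()
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:  # row, column, diagonals
        for offset in range(-(k - 1), 1):
            cells = [(pr + (offset + i) * dr, pc + (offset + i) * dc) for i in range(k)]
            inside = all(0 <= x < n and 0 <= y < n for x, y in cells)
            if inside and all(board[x, y] != opponent for x, y in cells):
                path_squares.update(cells)  # this run is still winnable
    if not path_squares:  # no open path remains through the previous move
        return None
    mr, mc = move
    return min(abs(mr - x) + abs(mc - y) for x, y in path_squares)

# Hypothetical 6x6 position: previous 'X' move at (2, 3).
board = np.full((6, 6), '.')
board[2, 3] = 'X'
print(shutter_distance(board, (2, 3), (2, 5)))  # 0: lies on an open run through (2, 3)
print(shutter_distance(board, (2, 3), (5, 5)))  # 1: adjacent to an induced winning path
```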
These predictions agree with the behavioral data: as participants’ shutter size increases, the ratio of missed ‘X’ wins to missed ‘O’ wins grows as well (participants, median ratio of missed ‘X’ wins to missed ‘O’ wins: narrow shutter: median = 0, 95% CI = [0, 0], medium shutter: median = 0.33, 95% CI = [0.27, 0.42], wide shutter: median = 0.5, 95% CI = [0.44, 0.67], all differences are significant with p < 0.001, Fig 5C, see S1 Text for the statistical tests). This increase in participants’ ratio of missed wins as shutter size increases is driven by both a significant increase in missed ‘X’ wins and a significant decrease in missed ‘O’ wins (see S15 Fig).

We note that the problem of finding a winning move is different from a general search problem, since participants know that there is a sequence of moves that necessarily wins. This knowledge may guide participants to adhere to the optimal path to win, and this might appear as a shutter heuristic. Another possibility is that shutter moves have intrinsically higher values and are therefore chosen. However, participants more frequently chose moves that adhere to the shutter heuristic even when their previous move was not the optimal one (i.e., not on the winning path; 65% of moves had shutter 0 when the previous move was not optimal). In addition, as exemplified by their blindness to their opponent’s winning moves, participants more frequently chose to adhere to the shutter heuristic even when their current choice was not the optimal one given their current board configuration (50% of moves with shutter 0 were not optimal, see also section S15 in S1 Text and S16 Fig). Similarly, on board V, using the “Forcing” strategy diverts participants from the correct solution, yet they still use it regardless (68% of participants show maximal likelihood for the “Forcing” strategy).

Finally, we examined the fit of participants’ moves to a model that combines the scoring strategies and the shutter heuristic. We find that adding the shutter heuristic significantly improved the log-likelihood of the predictions made by all scoring strategies (when comparing each base model to a model augmented by a shutter parameter, all models significantly improved: AIC(base) − AIC(base+shutter) > 826 and p < 10^-5 using a likelihood ratio test; see Fig 3D).

An internal search model and adjusted weighting of opponent moves do not explain the observed effect of previous moves

Another possible explanation for the entropy difference between the full and truncated boards is that people use a limited look-ahead strategy, such as a short mental internal search of the tree before they mark their chosen move on the ‘sandbox’ board. In such a case, participants can re-use previous calculations in their next internal searches. These previous calculations might prune the search space and thus reduce the entropy of the move distribution on the third move. To test this, we implemented two internal search models, one that prunes the space via the alpha-beta pruning algorithm and the other that implements a Monte Carlo Tree Search (MCTS) algorithm. The alpha-beta pruning internal search model implements a depth-first search up to depth d and uses the scoring strategy to evaluate the nodes it reaches. The algorithm prunes tree branches that show worse performance compared to already-found alternatives. Based on this search, the next move on the board is chosen.
For the next move, only branches emanating from this specific tree branch are considered, thus reducing the possibilities for next moves. For the entropy calculation, we considered only the chosen moves in the simulation and not the moves in the entire internal search that the simulated agent performed to reach the chosen move. We simulated different realizations of the model’s two parameters: the depth of the internal search (with values between 1–3) and the number of nodes considered at each level of the tree (with values between 5–10). In Fig 4F we show that this model still does not achieve the entropy reduction shown in the behavioral data. The alpha-beta pruning internal search model predicts a lower entropy in the truncated boards (entropy of the best-fitting model, full: mean = 1.51, 95% CI = [1.4, 1.61], truncated: mean = 1.83, 95% CI = [1.73, 1.92], around 17% entropy reduction, Fig 4F). However, the reduction in entropy is smaller than that observed in the behavioral data. In particular, while the entropy value in the full boards is similar to that observed in the behavioral data, the entropy in the truncated boards is lower (1.83 for the internal search vs. 2.42 in the behavioral data), possibly because of the added internal search. Furthermore, the log-likelihood of the best-fitted alpha-beta pruning model is worse than the log-likelihood of the shutter heuristic (log-likelihood of the best-fitting alpha-beta internal search model: -3.17, 95% CI = [-3.24, -3.09], compared with a log-likelihood of -2.21, 95% CI = [-2.25, -2.17], for the “Forcing” strategy augmented with a shutter model). We also implemented a different internal search model using Monte Carlo Tree Search (MCTS) with a limited search and with a memory of past computations (see Methods). However, this model also does not predict the reduction in entropy between the full and truncated board configurations, and does not fit the behavioral data well in terms of log-likelihood (see section S11 in S1 Text and S12 and S13 Figs).

The observed effect of “blindness” to the opponent’s winning moves could alternatively be explained by a mechanism that gives more attention to players’ own pieces (‘X’) than to their opponent’s pieces (‘O’). To test whether this model can better explain the behavioral data, we simulated agents with differential attention to their own pieces versus their opponent’s pieces (ranging from equal attention to both players, i.e., 50% attention to the opponent’s pieces, to no attention to the opponent’s pieces, in steps of 10%, see Methods and Eq 2). First, for each participant we took their best-fitting strategy and compared the log-likelihood scores of this scoring strategy with the shutter heuristic or with the differential weighting model. The two models achieve similar log-likelihood results (shutter heuristic log-likelihood: -2.12, differential weighting log-likelihood: -2.15, see section S11 in S1 Text and S17 Fig). We then calculated the entropy of moves in the full and truncated boards according to the differential-weighting-of-attention simulations. We find that there is no entropy reduction, i.e., the entropies of the full board and the truncated board are not significantly different when not paying attention to the opponent’s pieces (mean entropy full: 2.53, 95% CI = [2.43, 2.6], mean entropy truncated: 2.53, 95% CI = [2.46, 2.64]; Mann-Whitney test U = 124695, p = 0.47, Rank biserial correlation = 0.002, Fig 4E).
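The entropy measure used throughout these full-versus-truncated comparisons can be illustrated with a minimal Python sketch (our own illustration; the move labels and samples are hypothetical, and the log base is an assumption, so the numerical scale need not match the values reported above):

```python
import numpy as np
from collections import Counter

def move_entropy(moves):
    """Shannon entropy (in bits) of an empirical distribution of chosen moves."""
    counts = np.array(list(Counter(moves).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical move samples for the same board state, reached via the full
# board (third move, after two optimal moves) vs. the truncated board (first move).
full_moves = ["e8", "e8", "e8", "f5", "e8", "e8"]        # concentrated -> low entropy
truncated_moves = ["e8", "f5", "c4", "g7", "e8", "b2"]   # spread out   -> high entropy
print(move_entropy(full_moves), move_entropy(truncated_moves))
```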
Lastly, we note that augmenting the shutter model with decreased attention to the opponent’s pieces improved the log-likelihood fit to the behavioral data, indicating that while differential attention cannot fully account for the behavior of participants in the task, it might play a role in players’ search mechanism (see S17 Fig).

Path shutter pruning is dominant for most search parameters and board configurations

The use of a path shutter has both benefits (a higher likelihood of solving the problem and a smaller search size) and costs (missing potential winning moves of the opponent, leading to a wrong solution). We next examined whether using a shutter heuristic incurs a trade-off between the computational resources used and the probability of reaching the correct solution. To that end, we simulated the search strategy using the alpha-beta pruning algorithm with the “Interaction” scoring strategy and varied the size of the shutter (Methods). We ran the algorithm on all ten board configurations (I–V, full and truncated) and varied the search algorithm’s main parameters: the noise levels in the score of each square (noise values ranging from 0–2.5), the branching factor, which is the maximal number of expanded squares at each level of the search tree (values ranging from 3–10), and the limit on the maximally allowed search size (values ranging from 30–200). In total, we scanned 1760 different board and search configurations, with each simulation repeated 100 times for a total of 176,000 runs (see Methods). To assess the optimality of the search, we considered two performance measures: 1) the probability of finding the winning move (accuracy), and 2) the number of score evaluations performed throughout the search (computational resources). For each configuration, we examined whether there is a trade-off between the reduction in computation achieved by the shutter heuristic and the probability of finding the winning move. We consider a configuration to present a trade-off if there was a statistically significant difference (p < 0.05, using bootstrap) between the mean values of the two measures when comparing different shutter values. We find that in most settings, the algorithm with the narrow shutter dominates all other shutter sizes, as it requires substantially less computation while not sacrificing the probability of finding a winning move compared to larger shutter size values (Fig 6A). Only in a few scenarios do we observe a trade-off between computation size and successful search, where multiple shutter size values lie on the Pareto front (Fig 6B).

Fig 6. For most possible search parameters, search with a narrow path shutter is the dominant search strategy. Computation reduction is computed as described in Methods. A) Example of a board configuration where a narrow shutter is the dominant search strategy, since it shows similar probabilities of finding the winning move at a substantially lower number of computations. Simulation parameters were: complexity [830], noise level [0.5], branching [5], limit number of moves [30]. B) Example of a board configuration with a trade-off between computation amount and the probability of finding a winning move. For the given simulation parameters, an increase from shutter size 0 to shutter size 1 yields a significantly higher probability of finding a winning move, but the number of computations increases as well.
Simulation parameters were: complexity [49], noise level [0.5], branching [5], limit number of moves [30]. C) The phase space of board complexity vs. noise levels (aggregated over branching factor and search size limitations, see Methods). Each square shows the proportion of configurations in which there was a trade-off between using narrow vs. wider shutter size values: dark blue indicates configurations where the narrow shutter dominates the Pareto front (no trade-off); dark red indicates a trade-off between narrow and wider shutter values (a trade-off between computational resources and accuracy). All error bars are 95% confidence intervals. https://doi.org/10.1371/journal.pcbi.1010358.g006

A comparison between the states where a narrow shutter is dominant and the states where there is a trade-off between shutter sizes indicates that trade-offs are more likely to occur when the complexity of the board is low and noise levels are high (Fig 6C). When squares’ scores are accurately computed based on the scoring strategy (low noise), the path shutter is less likely to hamper performance, as the best moves are likely to be selected. In such cases, focusing on the current path reduces computation while maintaining the accuracy rate. However, when solving low-complexity boards in the presence of higher noise in score evaluations, exploring alternative paths becomes more important and viable, and thus a wider shutter can improve performance. Importantly, for high-complexity boards, the probability of finding the winning move is a priori very low while the probability of following non-beneficial paths is high; thus, widening the shutter substantially increases the computations made and can also lead to deleterious effects by exploring less promising parts of the search space.
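As an illustration of how such a dominance check might be carried out on aggregated simulation output, here is a minimal Python sketch (our own, hedged reconstruction rather than the authors’ analysis code; the bootstrap procedure, variable names, and simulated per-run outcomes are assumptions, and the comparison of computation counts is omitted for brevity) that tests whether any wider shutter achieves a significantly higher success probability than the narrowest shutter:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05):
    """Bootstrap confidence interval for mean(a) - mean(b)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = [rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
             for _ in range(n_boot)]
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

def narrow_shutter_dominates(success_runs):
    """`success_runs` maps a shutter size to an array of per-run success
    indicators (1 = winning move found). The narrowest shutter is treated as
    dominant if no wider shutter achieves a significantly higher success
    probability; the full analysis also compares the number of score evaluations."""
    narrow = min(success_runs)
    for shutter, runs in success_runs.items():
        if shutter == narrow:
            continue
        lo, hi = bootstrap_diff_ci(runs, success_runs[narrow])
        if lo > 0:  # wider shutter significantly more accurate -> trade-off
            return False
    return True

# Hypothetical per-run outcomes (100 simulated searches per shutter size).
success_runs = {0: rng.binomial(1, 0.62, 100),
                1: rng.binomial(1, 0.60, 100),
                2: rng.binomial(1, 0.63, 100)}
print(narrow_shutter_dominates(success_runs))
```

With outcomes like these, where widening the shutter brings no significant gain in accuracy, the narrow shutter would be classified as dominant, matching the pattern reported for most of the scanned parameter space.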