[HN Gopher] Mastering the Game of Stratego with Model-Free Multi... ___________________________________________________________________ Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning Author : aklein Score : 100 points Date : 2022-07-11 15:32 UTC (7 hours ago) (HTM) web link (arxiv.org) (TXT) w3m dump (arxiv.org) | spywaregorilla wrote: | Stratego is an odd choice I feel. Evaluating it must be really | hard. A significant chunk of the game is just trying to remember | which unit is which of the things you've seen so far. Which | humans generally can't do very well but machines can do easily. | Beating hand-crafted bots is good. | | Can expert humans beat the hand-crafted bots? I'm guessing no. | Also, what's Stratego like without the hidden units? Is that... | hard? | boringg wrote: | I don't think you are that familiar with how Stratego is | played well. It isn't strictly about memorizing where your | opponent's units are. There's a significant degree of bluffing | and posturing. | thekiptxt wrote: | To an extent that machines can't replicate. If I place my | hand on a piece that's illegal to move, feign that I'm | reconsidering, and then move a different piece, then my | opponent may suspect that the originally touched piece may be | moved. | | How can I use these 200 IQ moves against a bot? | elcomet wrote: | This does not apply when playing machines, just as facial | expressions or gestures do not apply when playing poker | with machines. | spywaregorilla wrote: | It isn't strictly about memorizing where your opponents are, | but having a perfect memory would be an enormous advantage, | to the point of being an entirely different game imo. | ghaff wrote: | Yes, playing it as a kid I remember one pretty common thing | to do was to make movements that made it easy for your | opponent to get confused about whether a piece was that | known piece or something else.
Not the whole strategy, but | take that piece out and you definitely change the game, I would | think, relative to typical human play. Especially in the context | of it normally being more of a kids' game. | treis wrote: | Seems like it would be easy to test. See how the AI does | when revealed pieces stay revealed compared to a standard | game. | | Also, given this is online competitive play, it's likely that | a significant portion of top human players are cheating and | have aides to keep track of pieces. | spywaregorilla wrote: | > Seems like it would be easy to test. See how the AI | does when revealed pieces stay revealed compared to a | standard game. | | It would effectively need to be a different AI to use | that information. And what does it play? Humans? Other | AI? | | > Also, given this is online competitive play, it's likely | that a significant portion of top human players are cheating | and have aides to keep track of pieces. | | That's a great point. It may even be condoned in some | circles. | TemplateRex wrote: | The pace in online games is too fast (4s per move | usually, with a 12m buffer) to use pen-and-paper aides. | Short of having some sort of screen-scraping AI tool | that auto-labels pieces as they are revealed (doable by a | decent programmer, but Stratego is no poker, there's no | money in it), I think it's safe to assume top-level | players don't cheat. Top-level play in live tournaments | has a high correlation with top-level online play as | well. | EthanHeilman wrote: | Most good players of Stratego have perfect memory of all | revealed information during a game. That's basic table | stakes for playing, like memorizing the dictionary is for | Scrabble or memorizing openings for chess. It isn't very | hard to do, since most moves don't reveal the value of a | piece, and those moves that do reveal a piece tend to lead | to the destruction of that piece within a move or two. A | computer isn't going to have an advantage there.
| Imnimo wrote: | From the paper: | | > Successes in the game have been limited, with artificial | agents only able to play at a level comparable to a human | amateur, see e.g. (14-20). | riku_iki wrote: | It could be because no one has seriously tried to build a | competitive AI player. | Buttons840 wrote: | _I think_ the best human players can beat the best computer | players in Stratego. Thus, Stratego is an excellent choice. | janosett wrote: | You might consider reading the linked abstract: "... Stratego | has been a grand challenge for the field of AI for decades, and | existing AI methods barely reach an amateur level of play." | spywaregorilla wrote: | Sounds like bullshit to me. I'm not convinced. Here's a paper | from 2021 that suggests as much. It's hard, sure, but it's | also not really seriously explored. | | > Compared to other games like Chess and Go, not much work | has been done on creating an AI agent for Stratego. As such, | the available literature is far and few between, and mostly | consists of bachelor's and master's theses. In fact, most | agents created for Stratego are largely undocumented or | closed-source, making it difficult to effectively ascertain | exactly which particular methods and techniques have been | applied and how effective they were. | | edit: the claim about bots not beating humans appears to | hold, but I'm not convinced it's just shoddy bot quality. | JoshCole wrote: | Just to give a sense of hardness, Stratego's game tree is | infinitely larger than Go's game tree, because in imperfect | information you select a continuous strategy vector over the | action space whereas in Go you select a discrete action. | Meanwhile Stratego is also infinitely more complex than Go | because in Go there are fewer moves with every move, but in | Stratego moves don't monotonically decrease the remaining | game length.
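To put rough numbers on the hidden-information side of this: the setup space can be sanity-checked in a few lines. A sketch, assuming the standard modern Stratego piece distribution (1 Marshal, 1 General, 2 Colonels, 3 Majors, 4 Captains, 4 Lieutenants, 4 Sergeants, 5 Miners, 8 Scouts, 1 Spy, 6 Bombs, 1 Flag); that distribution is my assumption here, not something taken from the paper:

```python
from math import factorial, prod

# Multiplicities of identical pieces in a standard 40-piece setup
# (assumed distribution: 1 Marshal, 1 General, 2 Colonels, 3 Majors,
# 4 Captains, 4 Lieutenants, 4 Sergeants, 5 Miners, 8 Scouts,
# 1 Spy, 6 Bombs, 1 Flag).
counts = [1, 1, 2, 3, 4, 4, 4, 5, 8, 1, 6, 1]
assert sum(counts) == 40

# Distinct deployments per player: permutations of 40 pieces, divided
# by permutations of the indistinguishable duplicates.
setups = factorial(40) // prod(factorial(c) for c in counts)
print(f"setups per player: {setups:.3e}")   # on the order of 10**33
print(f"joint deployments: {setups**2:.3e}")  # on the order of 10**66

# Treating every hidden piece as fully distinguishable instead gives
# the cruder 33-factorial-style upper bound quoted in this thread.
print(f"33! = {factorial(33)}")
```

Per player that works out to roughly 1.4 x 10^33 distinct setups, so the two players' deployments jointly land around 10^66 before a single move is played.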
| | Beyond those two infinities of greater complexity, Stratego | is imperfect information, so it is played relative to the | infostate, not the state. In Stratego there are thirty-three | pieces with unknown information on the first move. We can | definitely get a tighter upper bound by applying | abstraction via domain knowledge, but just so I don't have | to deal with the complexity I'll state that there are at | most 8683317618811886495518194401280000000 different states | associated with the infostate of your first move. | Meanwhile, on your first move in Go, you are in at most 1 | state. | | In practice, though, the average length of a game of Go is | ~200ish, and the average length of Stratego ~400ish. | | Much like chess, I would expect optionality to be an | important strategic consideration in Stratego. So good | agents likely select for moves that keep the branching | factor high. In contrast to something like Go, where the | branching factor would tend to diminish over time. | | Mostly sharing this because I think most people are bad at | reasoning about complexity - our mind is really good at | making complex things seem simple and simple things seem | complex, because we can actually deal with the complexity of | simple things in their full complexity, while dealing with the | full complexity of the complicated is intractable. I | imagine a lot of people's minds latch onto things like | "we get information so playing well means memorization" and | don't pay as much attention to the dizzying complexity; but | playing well isn't memorizing. Playing well is playing | perfectly according to your uncertainty, and it just so | happens that as part of doing that you sometimes reduce | your uncertainty by scouting. | spywaregorilla wrote: | This is all technically true, but it's also misleading I | feel. Chess and Go have a lot of states, and you need all | of them. The entropy of the games is enormous.
Move that | bishop and the significance of every piece on the board | could change. Stratego has verrrryy low entropy. You can | only move one piece one space at a time aside from | scouts. You can also enter scenarios where the game just | simplifies immensely on some dimensions, to the point that | some fairly dumb algorithms can win the game. If the enemy | loses their top dude and their spy, your top guy is | invincible against anything that has ever moved, which | isn't victory assured but it's probably relatively easy to | write an AI to win from that point on. | | Maybe this is just me, but when I thought about Stratego | I'm pretty sure I only thought about a few units at a | time. Moving all of them is a bad idea because of bombs | anyway. Your mental model of the game does not consider | the state of units in the back row of either side. It | probably doesn't even consider all of the units in the | front. | | I also suspect a lot of optimal play is just doing dumb | shit back and forth to force your opponent into making | the more aggressive move in a lot of cases without | technically forfeiting, which most humans likely wouldn't | do but an AI would like to do if not constrained otherwise. | | But I agree a simple tree search on possible moves is | unlikely to work very well until the end game. | JoshCole wrote: | If I'm understanding you correctly, you're saying that | the rules of Stratego let you apply an abstraction rule | which drastically simplifies the game. I agree you can do | this. I think this point is remarkably similar to the | technique of blueprint abstraction. | | Of course, we ought to be focused on the situations which | lead to these nice situations - which leads us back to | the imperfect information situations which lead to those | positions we can reason about. | | Interestingly, you can invert your point and get a | similar rule more generally. The trick is to apply | backward induction on the abstracted category transitions.
So your | point is actually so valid that it is valid even in | situations where your examples don't apply. I hope you | can see I'm steelmanning here. I agree with you that | this should be done and is critical to making things | tractable. | | There is an issue though, at least in the generalized | case, which is that perfect information is much easier | than imperfect information. | | See, in perfect information games, when you do the | backward induction step you actually just flat out solve | the game. Meanwhile, in imperfect information games, this | backward induction step merely makes solving the game | more tractable. Chess endgames are the backward induction | version of your forward induction from the rules | abstraction, but the abstraction rule is so hard to | determine we wouldn't usually think of it that way. | Notice that chess is hard enough that we don't have | endgame tables that go all the way back to the start of | the game? Yet the average game length in Stratego is | hundreds of moves longer and there are more than double | the number of pieces. | | I still think your point is right and someone who pushes | hard enough in this direction would manage to tackle the | complexity. So I basically agree with you. But if you | take standard approaches like counterfactual regret | minimization and throw them at the game without adjusting | them for the fact that the problem is "hard", then they | just won't terminate. | spywaregorilla wrote: | I guess I have two points. Perfect information Stratego | is not a hard game at all. There are still a lot of moves | one can make, but perfect play is easy to calculate. | Near-perfect play is probably easy enough even for an | amateur player. My gut says many games will converge on | this state (early) if you have an unbeatable piece. | | Without perfect information, the number of states is | still mostly a red herring, because the differences | between the states are immaterial. The moves aren't super | important.
If you decide to move frontline unit A against | the frontline unit on the opposite side of the field, | there may be several moves in between, but you've only | made one noteworthy decision. Could be bad intuition. I | think a more abstract model here would do better and be | far simpler, with some minor tactical move prioritization | or whatever. | TemplateRex wrote: | The best available bot, which is also mentioned in the paper, | is Probe. Expert humans will score the same as DeepNash against | Probe. The best humans have no trouble recalling every piece | that moved, and the square it originated from. Top-level play | usually has very few moved pieces (since they are vulnerable to | your opponent's general once your own marshal gets revealed), | so memory is important but typically not the main bottleneck. | spywaregorilla wrote: | > Top-level play usually has very few moved pieces (since | they are vulnerable to your opponent's general once your own | marshal gets revealed), so memory is important but typically | not the main bottleneck. | | Interesting. Is top-level play... boring? Stratego doesn't | have a lot of nuance to positional advantage aside from | moving forward or back, and while I'd imagine there are | stalemate rules, there's probably a lot of nothing moves to | dance around getting super minor and uninteresting | advantages. Is that a correct statement? | TemplateRex wrote: | It depends on player attitudes. The maneuvering is simpler | than in chess or even checkers, although you can have very | intricate multi-piece pincers around the lakes. Most top | players however tend to be quite aggressive, sacking majors | or even colonels to expose the opponent's top two pieces, | while at the same time having their counterattacks ready. | Typically there are 3 simultaneous battles, one for each of | the 3 alleys. But, yeah, as in chess, there can be bloodless | draws or fortress positions where neither side wants to or | can make headway.
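An aside on the counterfactual regret minimization mentioned earlier in the thread: its core primitive, regret matching, is only a few lines, and the catch for Stratego is that CFR keeps this bookkeeping per information set. A toy self-play sketch on rock-paper-scissors (my own illustration, nothing from the paper):

```python
import random

# Regret matching, the primitive inside counterfactual regret
# minimization (CFR), demonstrated on rock-paper-scissors. CFR keeps
# this per-information-set bookkeeping, which is what blows up on
# games with astronomically many infostates.
PAYOFF = [[0, -1, 1],    # rock     vs (rock, paper, scissors)
          [1, 0, -1],    # paper
          [-1, 1, 0]]    # scissors

def strategy(regrets):
    """Mix actions proportionally to positive regret; uniform if none."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

def train(iterations=100_000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]   # one regret vector per player
    strategy_sum = [0.0] * 3           # running total for player 0
    for _ in range(iterations):
        sig = [strategy(regrets[0]), strategy(regrets[1])]
        for a in range(3):
            strategy_sum[a] += sig[0][a]
        acts = [rng.choices(range(3), weights=s)[0] for s in sig]
        # Regret of action a = payoff had we played a, minus the
        # payoff we actually realized this round.
        for a in range(3):
            regrets[0][a] += PAYOFF[a][acts[1]] - PAYOFF[acts[0]][acts[1]]
            regrets[1][a] += PAYOFF[a][acts[0]] - PAYOFF[acts[1]][acts[0]]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# The *average* strategy approaches the Nash equilibrium (1/3, 1/3, 1/3).
print(train())
```

At Stratego scale, a `regrets` table per information set is a non-starter, which is why a model-free self-play method that never enumerates information sets is notable.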
| dr_orpheus wrote: | > A significant chunk of the game is just trying to remember | which unit is which of the things you've seen so far | | While this does get hard the further you go in the game, I | think a more significant portion of the strategy is trying to | predict newly moving units based on what is happening in the | game. e.g. I just took a 7 unit with my 8, and now my opponent | is coming straight towards me with a unit from elsewhere. Is it | a 9 or 10? Is it a bluff to drive me off, or to drive me into a | spot where the actual 9 or 10 is waiting? | spywaregorilla wrote: | Computers are better at such problems. | aidenn0 wrote: | Did they renumber the pieces? In the set I played as a kid a | scout was 9 and a Marshal was 1... | wakamoleguy wrote: | Yes, around 2000 they swapped the numbers. Most recent | versions treat the Spy as 1 and the Marshal as 10. Classic | versions ranked the Marshal as 1 and Scouts as 9, with the | Spy designated as S. | andruby wrote: | I bought a set to play with my kids and felt like | something was off but couldn't really recall why. Thanks! | miiiiiike wrote: | Got a copy of Stratego (one of the old-style ones with descending | rank, as pleases the gods) so I could show my board game loving | girlfriend one of my favorite games from when I was a kid. She | hated it. | LionTamer wrote: | Seeing the paper title gave me so much nostalgia for playing | Stratego as a kid; that was always my favorite board game. Glad | to see I'm not the only one who used to love that game; few of | my friends growing up played it. | jrussino wrote: | Did you still enjoy it? I loved playing this game with my Dad | when I was a kid and I'm wondering if it still holds up. | TedDoesntTalk wrote: | Played it recently with a 12 year old boy. He loves it. | jasone wrote: | It held up for me. I played a lot of Stratego with family and | friends as a kid, but there was one friend in particular who | routinely wiped the battlefield with my pieces.
I couldn't | understand how he so consistently beat me, and as an adult I | was able to 1) reason more deeply about the game, and 2) | quickly learn deeper strategy from the Internet. As a result, | the game is even richer to me now than it was in the 1980s. | evouga wrote: | I mean. Stratego is a great game; I had a lot of fun playing it | at summer camps when I was a young boy. It's cool there's a good | AI for it. | | But this result feels a bit anticlimactic in a world where AIs | can already beat expert humans at Go, six-player poker, | Starcraft, ... | 60secz wrote: | These are interesting not because you're solving for a game, | but because you're potentially partially solving a category of | problem. | goodside wrote: | It's explained right in the abstract why Stratego is a more | difficult game for AI than Go or poker. | TemplateRex wrote: | You can view Stratego as the "Cartesian product" of a public | information board game and an imperfect information "card" | game. The board game has much simpler local tactics than e.g. | chess or checkers, although whole-board tactics where 2+ high | pieces are trapping 1 lower piece defended by 1+ high piece | are extremely complicated to reason about. | | The "card" game can be viewed as a form of Limit Poker. The | bidding in poker is done with secret cards and public bets. | In Stratego, you bet _with_ your secret pieces, so it's more | like a closed-bid auction. But since there are only 10 | moveable ranks, the range of bluffs you can pull off is | rather limited compared to e.g. No Limit Poker. | | Each of the "subgames" in itself is quite tractable for | computers. But the numerical product of all public game | states times the number of secret information states is | humongous. Combine this with the fact that imperfect | information game trees don't decompose as nicely as e.g.
| chess game trees, and computers will also not be able to | divide-and-conquer their way out of the numerical complexity | with brute force. Whereas humans can come a long way with | heuristics. | kzrdude wrote: | How far did they get with Starcraft? Stratego should be a | stepping stone to get there - it introduces imperfect | information. | confuseshrink wrote: | Starcraft in the form of Alphastar worked in the sense that | it could beat humans, at least in the short term. The problem | with the whole technique is that they had to tether it to the | human examples they had gathered in the form of a divergence | loss. | | I haven't checked out the linked paper yet but if they | managed to do something from first principles that would | still be an interesting development. | anonymoushn wrote: | spywaregorilla left a good reply about Starcraft. In the | games I watched, despite the AI being handicapped to use | human-like levels of APM and human-like viewport management, | it primarily fought using *clearly superhuman* blink stalker | micromanagement. Stalkers with barely any health would | typically survive engagements and get to re-engage when their | shields were fully recovered. On the other hand, a human | player managed to confuse the bot by repeatedly airdropping | units in its base, picking them up, and dropping them | elsewhere. The bot uselessly moved its mass of stalkers back | and forth while losing many probes and buildings. | | Edit: After looking at some games again, the AI also benefits | from precise target selection for the phoenix's graviton | beam. A human player might take 3 phoenixes and a bunch of | stalkers into a fight against an army containing 3 immortals, | some sentries, some stalkers, and some zealots, and use | graviton beam to pick up some mix of units including units | other than immortals. The bot can pick up only the immortals. | spywaregorilla wrote: | It's very difficult to say. 
There are many cookie-cutter | strategies for RTS games, and it's difficult to handicap the | AI to not use its machine precision and quickness of | thinking to just win everything on a tactical level. | Actions-per-minute handicaps are not nearly nuanced enough | to capture this. Generally it seems that the absolute top | humans are better than bots strategically, but the execution | to do so is really, really tough. And then of course there | are dumb exploits where you find some dumb weak point that | any sane human would quickly adjust their behavior for. | warrenm wrote: | I haven't played Stratego in decades! | | Loved it as a kid, though | hervature wrote: | I'm still trying to grok and implement the paper, but I studied | AlphaGo/AlphaZero/MuZero during my PhD. The core contribution | here is the Nash equilibrium component for imperfect information | games using only self-play. Note, there is no MCTS being done in | this paper. This differs from counterfactual regret methods | (like the most famous Poker AIs) because it does not need to | compute over all possible "information sets", a requirement that | makes those methods intractable for even sufficiently | complicated poker variants. It should also be noted (as they do | in the paper) that this is more incremental, rather than | methodologically innovative the way AlphaGo was. This is the | AlphaZero-style step increment to NeuRD. As is my general | critique of their previous papers, they generally omit many | engineering details that prove to be very important. Here, they | admit that fine-tuning is vitally important (one of the 3 core | steps) but details are relegated to the supplementary materials. | It also opens up the question of whether this new "fine-tuned" | policy still guarantees the Nash equilibrium, which it obviously | does not, as some mixed strategies are going to have | sufficiently small probability.
I wish researchers would be more honest with "this | is a hack to get things to work on a computer because neural | networks have floating point inaccuracies". It doesn't ruin any | of the theory and no one is going to hold it against you. But it | causes all sorts of confusion when trying to reimplement. | algo_trader wrote: | > I'm still trying to grok and implement the paper, but I | studied AlphaGo/AlphaZero/MuZero during my PhD | | What is the SOTA on solving non-adversarial (single-player?) | POMDPs? Are those considered to be much simpler problems? | hervature wrote: | POMDPs are exactly how one formalizes imperfect information | games. This is where the concept of information sets comes | from. To answer your question, any two-player algorithm is | going to apply to single-player games, as it is trivial to | transform. For games like 2048, the "adversary" is simply the | opposite of your outcome. For games where you are trying to | maximize your score, this is the standard RL setting and any | of the Atari algorithms (including MuZero) can be used. | | In case you are wondering about cooperative multi-agent | games, I would check this group's publications: | https://www.cs.ox.ac.uk/people/publications/date/Shimon.Whit... | igorkraw wrote: | It strongly depends on what type of structure you can assume | and how expensive sampling is. DreamerV2, Agent57 on Atari, | and the generalist agent model trained on 600 tasks by | DeepMind might be worth looking into for different | approaches to POMDPs, but you can do much better if you | impose physics priors by e.g. using neural ODEs for the | latent-state modeling. | | POMDP just means "observations are not state" and that you | need to use a stateful policy to infer the state somehow, but | without further assumptions it's difficult to answer this | question. | TemplateRex wrote: | What I don't understand is why they don't try to make | inferences about the opponent's private state.
I get that the | full Bayesian update is intractable, but some sort of RNN or | LSTM should be able to produce pretty accurate estimates of | the opponent's private info. And with self-play, you can train | the deduction head of a NN by adding a KL-divergence between | inferred and ex-post observed pieces. That would both make you | guess better and also try to "jam" your opponent's inference | by randomizing your own piece distribution. | hervature wrote: | This is an interesting avenue for future research. The reason | why it is not as straightforward as you claim is that all | inference is going to depend on your perception of their | policy. That's why the Nash equilibrium is sought after | first. Because you should assume your opponent is perfect | until you start observing their suboptimal behavior that you | can exploit. Additionally, you would also have to handle the | meta part where the exploiting portion of the algorithm isn't | itself being exploited by the opponent. Somehow, you should | deviate slowly from the Nash equilibrium but revert quickly | if the opponent is abusing your new strategy. | TemplateRex wrote: | But their NN already outputs a policy conditional on public | and private info! Why not have a separate intermediate | branch in the NN that is fed with the current estimate of | private info (for both players) and outputs the policies | (again for both players) _given those info estimates_? | Wouldn't it be possible to learn from that? | hervature wrote: | First, the neural network is taking the history of | observations into account. We don't know what the NN has | learned, but the NN is probably making some inference on the | likelihood of opponent piece locations. They haven't | explicitly coded it to do that, but it is difficult to | imagine a human-level AI not doing this. | | Second, what you are suggesting is probably best done as | a secondary process outside of learning the Nash | equilibrium.
If you knew an opponent's policy, you would | need to recalculate your optimal counterplay for that | specific policy. This is completely orthogonal to the | goal of this paper, which is to learn the Nash equilibrium | through self-play alone. | mensetmanusman wrote: | Would it be interesting if they posted the approximate kWh of | energy required to train? | voidfunc wrote: | I haven't played Stratego since I lost my board when I was in | third grade and brought it to play during recess... | | Is there a good online version these days? | hirundo wrote: | When I was a kid I "won" a Stratego game with a non-move that my | friend claimed was against the rules. So he claimed the win. | Could I get an umpire's call here? | | The issue is that when considering my next move, I picked up the | bomb piece, thought for a few moments, put it back down, and | moved another piece. My friend then, assuming that I had just | given away that it was _not_ a bomb, attempted to capture it, and | lost the attacker. | | He claimed that it was illegal to pick up that piece and put it | down again, although he had no objection until he learned that | I'd tricked him. We had never previously announced or enforced a | touch-it-move-it rule. | | So did I win that game or did he? That's not a question machine | learning could answer. | Swenrekcah wrote: | You won. Deception is obviously a critical part of all warfare. | yborg wrote: | By that argument, the opponent rightfully won, because | negotiation is also a critical part of warfare. He got his | opponent to give away a tactical battlefield victory on the | basis of mutually agreed ground rules. | | Of course, dissatisfaction with such outcomes has often | resulted in further wars. | aidenn0 wrote: | Official ISF tournament rules[1] say: | | > Touching one of his own pieces does not oblige a player to | move it. | | They also say: | | > Psychology, bluff and misleading manoeuvres are considered | important aspects of Stratego.
Bluffing consists of all verbal | communication (talking) or non-verbal communication (acting, | mimic or feign) which is intended to mislead your opponent. | All forms of bluff are allowed at any time during the game, | unless prohibited by any other rule. Abuse will be considered | unsporting behaviour and can be penalized accordingly. | | So I think you won. | | 1: https://isfstratego.kleier.net/docs/rulreg/isfgamerules.pdf | toast0 wrote: | I agree | | > 5.2 Moving | | > Flag and Bombs are never moved (For the definition of | "move": see chapter 6). | | Chapter 6 shows the sequences of moves, and then Chapter 7 | says: | | > A move is made when: | | > a piece is released on another square than the starting | one, or | | > a player touches an opponent's piece with one of his own | pieces or with the hand in which his own piece is held. | | So if the piece was picked up and replaced on the starting | square, it was not moved, and that's fine. | | On the other hand, these rules incorporate some other rules | by reference, which include: | | > The bombs and the flag may never be moved and therefore | remain in the same place throughout the duration of the game | | A bomb hasn't exactly remained in the same place if it's been | picked up, has it? | TemplateRex wrote: | In actual live game play, none of the top players use such | kids' ploys as touching bombs pretending to be moveable | pieces. Actual game play wouldn't change if the one-touch- | move rule from chess or checkers were to be introduced. | | The main thing the rules guard against is accidentally | tripping over your opponent's or your own pieces. | Especially during time trouble, it sometimes happens that | you drop a piece. If your opponent reveals < 4 of your | pieces, you can exchange all of them with any other of your | pieces. With >= 4 "accidents", it's an automatic forfeit. | If you tip over your opponent's flag, he is entitled to 3 | swaps (may or may not be bombs).
| aidenn0 wrote: | I always wondered if the pieces could be redesigned to | have a strong bias towards landing "face down" when | bumped; I've definitely accidentally revealed pieces | before when making a quick move. | boringg wrote: | Hmm, that you picked up the piece might have been an infraction. | I constantly touch the bomb pieces, but I don't lift them off | the board, which would imply that they can move. | | Tough call - glad that you are still carrying this from your | childhood though. I would wager that you won, but through | borderline cheating. Still not sure TBH | milesskorpen wrote: | If you're playing seriously, I think if you touch a piece you | need to move it. | HWR_14 wrote: | You won because "He claimed that it was illegal to pick up that | piece and put it down again, although he had no objection until | he learned that I'd tricked him." | | I think "touch-move" is perfectly valid to enforce, but you | waive your right to do so once you move after that. If someone | touches a piece in chess, then moves another piece, you don't | get to play all the way until you're mated and then say there | was an error you should have won on. | | And if you touch an illegal-to-move piece, I would say the | penalty would probably be revealing the bomb (or that it is | immobile), not forfeiture of the game. | Imnimo wrote: | Wow! I did exactly the same thing as a kid. I thought I was | sooooo clever. ___________________________________________________________________ (page generated 2022-07-11 23:00 UTC)