[HN Gopher] Solving Wordle in 3.64 Guesses on Average, 99.4% of ...
       ___________________________________________________________________
        
       Solving Wordle in 3.64 Guesses on Average, 99.4% of the Time
        
       Author : tomlockwood
       Score  : 84 points
       Date   : 2022-01-23 20:42 UTC (2 hours ago)
        
 (HTM) web link (lockwood.dev)
 (TXT) w3m dump (lockwood.dev)
        
       | julianwachholz wrote:
       | Wordle is a perfect example of nerd sniping, I couldn't stop
       | playing once I started. But I wanted to challenge my friends to
       | some more difficult puzzles and wrote a tiny tool this weekend to
       | do just that!
       | 
       | You can create your own Wordle-like puzzles on https://word.rodeo
       | 
       | I've already received a ton of positive feedback from friends.
       | What are your thoughts?
        
         | depaulagu wrote:
          | It's not copying the URL to my clipboard; would it be possible
          | to expose the URL as text so it can be copied manually?
        
           | julianwachholz wrote:
            | My partner noticed this too, but after a couple of tries the
            | same button works again. Really hard to debug,
            | unfortunately. I just pushed a fix that greys out the link
            | if there's no URL; hopefully that helps?
        
         | itrulia wrote:
         | Love it
        
       | prezjordan wrote:
       | It's interesting to see the edge cases where the algorithm
       | struggles to find the last letter.
       | 
       | > shave: [slate: 20202, share: 22202, shame: 22202, shape: 22202,
       | shake: 22202, shade: 22202]
       | 
       | I guess it needs to find the right choice of [r, m, p, k, d, v]
       | to get "shave" (since 'v' is so rare it takes 6 guesses!). Trying
       | "marks" and "paved" would narrow that list down in 2 guesses (and
       | a third to actually try "shave")
       | 
       | Anyone code this up yet? (I haven't, I took a greedy approach too
       | [0]). Wonder how to generalize such a plan.
       | 
       | [0]: https://github.com/jdan/wordle.ml
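The two-probe idea above is easy to check mechanically. A minimal Python sketch, using the feedback encoding from the article (2 = green, 1 = yellow, 0 = grey); the probe words "marks" and "paved" are the ones suggested in the comment:

```python
from collections import Counter

def feedback(guess, target):
    """Wordle feedback: 2 = green, 1 = yellow, 0 = grey,
    with Wordle's repeated-letter rules (greens are marked first,
    then yellows consume the remaining letter pool)."""
    fb = [0] * len(guess)
    pool = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            fb[i] = 2
    for i, g in enumerate(guess):
        if fb[i] == 0 and pool[g]:
            fb[i] = 1
            pool[g] -= 1
    return tuple(fb)

candidates = ["shave", "share", "shame", "shape", "shake", "shade"]
# Two probes jointly covering the distinguishing letters [r, m, p, k, d, v]:
signatures = {w: (feedback("marks", w), feedback("paved", w))
              for w in candidates}
# All six signatures come out distinct, so two probes pin down the
# answer, plus a third guess to actually play it.
```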
        
         | tomlockwood wrote:
          | Yeah, I don't want to give the game away on my next program,
          | but it's along those lines, and there are some challenges to
          | making it work that I'm still figuring out haha.
        
         | muti wrote:
         | The solver is playing in "hard" mode, where any yellow/green
         | letters must be used in further guesses
        
       | jutaz wrote:
       | Shameless plug: I've also made a solver, written in Rust. Main
       | goal was to learn the language, 2nd the trie structure behind the
       | search: https://github.com/jutaz/wordle-search
        
       | RivieraKid wrote:
       | Why not simply choose a guess which narrows the set of possible
       | solutions the most?
       | 
       | This could be improved by looking one step further, but this
       | would probably require some kind of optimization / heuristic /
       | approximation to make it computationally feasible.
        
         | tomlockwood wrote:
         | Watch this space :D
        
         | Retric wrote:
          | It really depends on what you are trying to optimize. Fewest
          | guesses on average may not want a 50/50 split; it might prefer,
          | say, a 60/40 split where the 60% takes 2 guesses and the 40%
          | takes 3. But that loses to a 48/52 split which gives 2 guesses
          | 80% of the time, etc.
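The trade-off described above can be put in numbers. A toy calculation (assuming each branch of the split resolves in the stated total number of guesses, which is the comment's premise):

```python
def expected_guesses(splits):
    """splits: (probability, total guesses for that branch) pairs."""
    return sum(p * g for p, g in splits)

# A 60/40 split whose 60% branch finishes in 2 guesses and whose
# 40% branch finishes in 3 beats an even 50/50 split with the same
# guess counts:
uneven = expected_guesses([(0.6, 2), (0.4, 3)])
even = expected_guesses([(0.5, 2), (0.5, 3)])
assert uneven < even
```

So an even split is not automatically optimal: what matters is the expected guess count over the whole distribution, not the balance of any one split.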
        
       | thxg wrote:
       | This [1] solves it in 3.4212 average guesses 100% of the time and
       | claims optimality, which is 99% of the difficulty it seems. For
       | the limited 2671-word set of possible words, I think the question
       | is now settled (for the whole 12k-word set, I don't think anyone
       | tried anything).
       | 
       | It was posted here four days ago [2].
       | 
       | [1]
       | http://sonorouschocolate.com/notes/index.php?title=The_best_...
       | 
       | [2] https://news.ycombinator.com/item?id=30006724
        
         | thxg wrote:
         | Ok, I found this one [1] with 100% wins on the whole 12k-word
         | set. But it optimizes for worst-case, not average.
         | 
         | [1] https://www.poirrier.ca/notes/wordle/
        
       | Edman274 wrote:
        | If you right off the bat guess "bound" and positions 2, 3, 4,
        | and 5 all come up green, your goal then needs to be to use
        | words like 'morse' to rule out (in one fell swoop) words like
        | 'sound', 'mound' and 'round'. Then, if it's not ruled out, you
        | can choose words that just have m and s, or s and r.
        
         | muti wrote:
         | This is a valid strategy in easy (default) mode, but in "hard"
         | mode, you must use all yellow/green letters in subsequent
         | guesses
        
       | canjobear wrote:
       | Going for greens is a good heuristic, but the general solution is
       | to seek guesses that shrink the size of the set of compatible
       | words the most. https://langproc.substack.com/p/information-
       | theoretic-analys...
        
         | OJFord wrote:
         | Or, especially in first guess or so, shrink the set of letters
         | used by the remaining compatible words, which is subtly
         | different right?
        
         | npinsker wrote:
         | > From an information-theoretic perspective, the optimal
         | initial guess is the one which is maximally informative about
         | the target, given the resulting pattern.
         | 
         | This statement (which motivates the rest of the analysis) isn't
         | correct, because it doesn't take into account the highly
         | irregular search space of what you're actually allowed to
         | guess. Sets of possible words can vary a lot in terms of how
         | easily they can be further cut down: for example, if you're on
         | hard mode and your set of possible words is [BILLY, DILLY,
         | FILLY, HILLY, SILLY, WILLY], then you might be in trouble.
        
           | canjobear wrote:
           | Yeah, the analysis doesn't really work for hard mode as noted
           | in a footnote.
        
             | npinsker wrote:
             | I don't think it works for easy mode either. The
             | overwhelming majority of 6-word sets can be distinguished
             | in two guesses in easy mode (their EV is either 11/6 or 2),
             | while this set has an EV that's around 3.
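The 11/6 figure above comes from the case where a distinguishing first guess is itself one of the six candidates: you win in 1 guess with probability 1/6 and otherwise need exactly 2. A quick check of that arithmetic:

```python
from fractions import Fraction

# Distinguishing guess is one of the 6 candidates:
# 1 guess with probability 1/6, otherwise exactly 2.
ev_lucky = Fraction(1, 6) * 1 + Fraction(5, 6) * 2

# Distinguishing guess lies outside the candidate set:
# it always takes exactly 2 guesses.
ev_outside = Fraction(2)

assert ev_lucky == Fraction(11, 6)
```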
        
               | canjobear wrote:
               | I think you're right. Although a first guess might reduce
               | the set size a lot, it might lock you into a situation
               | where all the possible second guesses are useless.
        
         | zeroonetwothree wrote:
         | Using essentially this method I get that the best initial guess
         | is RAISE (there are some others that are very close). It seems
         | the author of this post didn't use the actual word list from
         | the game so got a different word.
        
           | tomlockwood wrote:
           | I used the actual wordlist, yoinked out of the javascript
           | with my totally elite dev-tools-in-chrome-hacking!
           | 
           | But yeah, given the frequency of letters, in _position_ , in
           | the target list, the answer is SLATE.
        
           | canjobear wrote:
           | Yeah when I ran my script using the real wordlist I got RAISE
           | too. (Just updated the article.)
        
         | kccqzy wrote:
         | That's exactly my approach.
         | 
         | I wrote a very simple O(N^3) algorithm to determine that: for
         | each possible actual word and the guessed word, figure out
         | whether or not a word is eliminated. Then calculate a score
         | based on how many words can be eliminated from a particular
         | guessed word, averaged over all possible actual words.
         | 
         | (I originally thought O(N^3) would be way too slow to be
         | useful. But it runs fast enough with a few thousand words.)
         | 
         | Like a sibling comment, I also found that the best initial
         | guess is RAISE, although when I tried a different word list, I
         | got different words like LARES (I don't think it's in the
         | Wordle list).
         | 
          | One thing that surprised me is that by mid-game (after 2
          | guesses or so), it can occasionally be better to guess a word
          | with repeated letters. I did not originally expect that,
          | because I thought repeated letters were a wasted opportunity
          | to test more letters, but it turns out Wordle's handling of
          | repeated letters can give you helpful information about the
          | number of occurrences of a letter. (For example, if you picked
          | a word with 2 a's, the first one can be green and the second
          | one black, essentially telling you there is exactly one letter
          | "a" in the word.)
        
       | LeoPanthera wrote:
       | See also Absurdle, an adversarial Wordle that is designed to be
       | as difficult as possible: https://qntm.org/files/wordle/
       | 
       | I won't spoil how it works, but click the "?" button on the page
       | if you want to know.
        
         | pxx wrote:
         | It's designed to be, but it isn't necessarily, as it uses the
         | size of the remaining solution set as a heuristic for how many
         | guesses are needed to split it. There are better, more
         | exhaustive searches of the solution space now.
        
       | vba616 wrote:
       | >so far what was taking 1GB of RAM in Python is taking, literally
       | 1MB in Rust
       | 
       | Is anybody hiring people to fix situations like this at work?
       | i.e. this script that was written in a hurry needs to be revised
       | to be 10^3 times more efficient, in space or time.
        
         | tomlockwood wrote:
         | As a data engineer, I sometimes run into problems like this,
         | but I generally find that 99.9% of them can be solved "well
         | enough" in python, or in a database. Moving code from Python to
         | Rust or similar has been required only a handful of times.
         | 
         | This specific task, the next program I'm working on, seems to
         | be a very "algo" based, comp sci type problem. I think most
         | companies want to say that they have these kind of problems,
         | but the reality often is, they need a batch script on a laptop.
         | 
         | So like, yeah, I have done things like this before in my work,
         | but often, solutions in Python are "good enough".
        
       | npinsker wrote:
       | I wrote a solver a few weeks ago that solves easy mode in 3.45
       | guesses and hard mode in 3.55 guesses. Rather than using
       | heuristics, it explored a large portion of the game tree (with a
       | little pruning) and I'm hopeful that it's optimal. Interestingly,
       | the EV-optimal hard mode strategy takes 3.52 guesses on average,
       | but requires 7 guesses a tiny fraction of the time, so I instead
       | exposed the best strategy that always requires 6 or fewer
       | guesses.
       | 
       | If you're interested, you can play with it here:
       | http://www.npinsker.me/puzzles/wordle/ and the source (written in
       | neophyte's Rust) is here:
       | https://gist.github.com/npinsker/a495784b9c6eacfe481d8e38963...
       | 
       | (edit: It's posted elsewhere in the thread, but this is actually
       | not optimal and someone else achieved better results!
       | http://sonorouschocolate.com/notes/index.php?title=The_best_...)
        
         | hgibbs wrote:
         | As a point of curiosity, what do you mean by optimal?
         | 
         | To me, I can see at least two (potentially different) cost
         | functions that a Wordle strategy can aim to minimise. Firstly,
         | you can try to minimise the expected number of guesses, and
         | secondly you can try to minimise the expected number of guesses
         | conditional on never losing the game. In principle you could
         | optimise for the first but not the second by allowing a small
         | list of hard words to take > 6 guesses on average.
         | 
         | Part of my point is the lack of an absolute definition of
          | optimal: strategies are only optimal up to the specified cost
         | function.
        
           | furyofantares wrote:
           | They specified that in the post, they were able to constrain
           | to a strategy that guarantees 6 or fewer and then optimize
           | for EV to get 3.55, but could get down to 3.52 by allowing
           | some small number of words to take more than 6.
           | 
           | I would be interested to know if 6 is the lowest you can
           | constrain it. Can you guarantee a solution in 5, and if so
           | what is the best EV with that constraint?
        
             | npinsker wrote:
             | Yes, you can -- I believe the EV was around 3.47 for that.
             | The "greedy" strategy (minimizing the worst-case size of
             | the largest set) also ends up using at most 5 words.
             | 
             | Perhaps unsurprisingly, you can't guarantee at most 4
             | words.
        
               | Sesse__ wrote:
               | You can get EV of 3.46393 with a guaranteed maximum of
               | five tries. (I don't know whether this bound is tight.)
        
           | hervature wrote:
           | Funnily enough, I think neither of those notions are the
           | correct thing to optimize. For me, you want to minimize the
           | maximum number of guesses. This leads to an equilibrium.
           | Otherwise, your opponent can just abuse your strategy.
        
             | pxx wrote:
             | Isn't that just #2, but changing the definition of losing
             | to be the least number of rounds where you can always win
             | (which for Wordle is 5)? These small changes to the
             | definition don't seem to change ev (or the code needed to
             | generate it) that much.
             | 
             | If you don't care about optimizing the EV at all, the
             | greedy tree already has the property you're interested in.
        
         | theolivenbaum wrote:
         | Nice! I was wondering almost the same thing this morning while
         | my wife was playing Wordle - ended up coding an assistant to
         | help you choose the optimum words.
         | 
         | If anyone wants to check it out, I published it here
         | https://curiosity.ai/wordle/ - source code is also available
         | here https://github.com/theolivenbaum/wordle-assist.
        
         | [deleted]
        
         | stavros wrote:
         | That looks great! It would be very interesting to see the set
         | of words "left" after a guess, too.
        
         | tomlockwood wrote:
         | Oh dang! I was guessing that the optimal approach would be
         | under 3.5 for easy mode, but was expecting it to be closer to
         | 3! Amazing work. My next go at the problem is in Rust, which I
         | am a total newcomer to. Thanks for sharing!
        
       | gbronner wrote:
       | The objective function is weighted average number of guesses,
       | with some penalty factor for >6 (aka losing), and optionally
       | different weights for getting it in 1,2,3,4,5,6
       | 
       | And you can solve it in a tree- at least for the lower levels
       | when the search space has been reduced
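That objective can be written down directly. A sketch with made-up numbers (the loss penalty and per-count weights are illustrative, not from the comment):

```python
def objective(counts, weights=None, lose_penalty=10.0):
    """counts[g] = number of games solved in g guesses; any count
    over 6 is a loss and gets the penalty weight. Lower is better."""
    if weights is None:
        # Default: weight each game by its raw guess count.
        weights = {g: g for g in range(1, 7)}
    games = sum(counts.values())
    total = sum(weights.get(g, lose_penalty) * n
                for g, n in counts.items())
    return total / games

# 90 games solved in 3 guesses, 9 in 5, and 1 loss (7+ guesses):
score = objective({3: 90, 5: 9, 7: 1})
```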
        
       | s_Hogg wrote:
       | This is really interesting, well done :)
       | 
        | As it stands, it seems like your algo will always use the same
        | first guess. Have you thought about making that configurable and
        | seeing what happens with e.g. the number of words where it gets
        | stuck and runs out of time?
        
         | tomlockwood wrote:
         | Get to work Mr. Hogg!
         | 
         | Yeah, next program will effectively do this but in an utterly
         | insane way. Shall we brunch to discuss soon?
        
       | hoten wrote:
       | Jake Archibald shared an implementation that aims to narrow the
       | search space the most with each guess.
       | 
       | https://twitter.com/jaffathecake/status/1483057877875101697?...
        
       | Grustaf wrote:
       | As many have mentioned, it seems much more efficient to focus on
       | eliminating letters, which means that the second word should not
       | be very green at all.
       | 
       | Also, since there's only one word a day, I think we can assume
       | that it's chosen manually, and it won't be some archaic anglo-
       | saxon tool for shoeing horses or something, it will be a
       | relatively common word.
        
         | speg wrote:
         | It says it is designed for hard mode, in which I believe you
         | must use your green (and maybe yellow?) letters.
        
         | tomlockwood wrote:
         | I think it's chosen based on a random number generated from the
         | date - but one of the most recent games had the answer "REBUS",
         | which I've never heard or read before. But yeah, the target
         | list has many more common words in it than the list of all
         | possible guesses.
        
         | yardstick wrote:
         | It's chosen from a sequential list of words in the source code.
         | Each word was vetted by the woman this game was originally made
         | for.
         | 
         | See the NY Times article:
         | http://web.archive.org/web/20220106042510/https://www.nytime...
        
       | bbasketball wrote:
       | If you like Wordle you should check out https://wordhoot.com/
        
       | anotheryou wrote:
       | Why optimize for greens? Better to rule out more first no matter
       | the position. I think it narrows the search space better.
       | 
        | If you want to go fancy, somehow calculate with bigrams. If you
        | guess c you don't have to guess k blindly too, etc.
        
         | canjobear wrote:
         | The generally optimal thing is to choose guesses to reduce the
         | set of possible targets as much as possible. This is the same
         | as choosing guesses with maximal entropy over the resulting
         | patterns (that is, the result you get back from the game should
         | be maximally informative). Because greens are rare, by
         | maximizing the probability of greens, you are approximately
         | maximizing the entropy of patterns, by distributing probability
         | mass among rare patterns. This is why going for greens is a
         | good heuristic.
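The entropy calculation described above can be sketched in a few lines, assuming equally likely candidate answers (function names are mine, not from the linked article; feedback uses the article's 2/1/0 green/yellow/grey encoding):

```python
import math
from collections import Counter

def feedback(guess, target):
    # 2 = green, 1 = yellow, 0 = grey, with Wordle's
    # repeated-letter rules (greens first, then yellows).
    fb = [0] * len(guess)
    pool = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            fb[i] = 2
    for i, g in enumerate(guess):
        if fb[i] == 0 and pool[g]:
            fb[i] = 1
            pool[g] -= 1
    return tuple(fb)

def pattern_entropy(guess, candidates):
    """Shannon entropy (bits) of the feedback-pattern distribution:
    higher means the game's response is more informative on average."""
    counts = Counter(feedback(guess, t) for t in candidates)
    n = len(candidates)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_guess(guesses, candidates):
    """Pick the guess whose feedback distribution has maximal entropy."""
    return max(guesses, key=lambda g: pattern_entropy(g, candidates))
```

Maximizing this entropy is equivalent to minimizing the expected size of the surviving candidate set (in the log domain), which is why it lines up with the "shrink the set the most" framing.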
        
           | ascar wrote:
           | This sounds like the optimal strategy for a single guess. But
           | shouldn't the optimal strategy for solving the game include
           | that there are multiple guesses? While some specific word
            | might provide the highest reduction in entropy for the first
           | guess, it might actually negatively impact the entropy
           | reduction of remaining possible words compared to another
           | first guess.
           | 
           | E.g. if RAISE is the optimal reduction in search space for
           | the given word list, what's the best 2nd guess for every
           | possible word? Now taking the average 2nd guesses into
           | account, is there a better first guess?
        
             | canjobear wrote:
             | Good point. Because the set of possible guesses is
             | constrained, the totally general solution needs to take
             | future guesses into account. There doesn't seem to be a way
             | to find the optimal guess other than searching the whole
             | game tree (which it looks like someone in another comment
             | did).
        
         | tomlockwood wrote:
         | > Why optimize for greens? Better to rule out more first no
         | matter the position.
         | 
         | I haven't yet experimentally tested this, but I hope to soon!
         | It may be that a green letter in position rules out more
         | possible targets than a grey letter.
        
           | anotheryou wrote:
           | I thought of yellows. Taking frequent letters over words with
           | sub-optimal letters but in the right positions.
           | 
            | But manually I'm nowhere close to 3.6 either :). Also often
            | too lazy.
        
             | tomlockwood wrote:
             | Yeah, I think my average is closer to 4, ugh! But the
             | program has shown me some strategies that seem interesting.
             | It always guesses "slate" first, and if that has no greys
             | it guesses "crony", and that one-two punch seems to do me
             | well fairly often.
        
               | anotheryou wrote:
               | interesting :) I go for "torah", "lines", ("ducky", but I
               | think the k is far from optimal)
               | 
               | apart from "h" and "i" it's the same letters (and I do
               | have c and y in my optional 3rd).
        
         | amluto wrote:
         | If you want to get really fancy, a full minimax solution is
         | probably feasible without massive resources. Pretend it's a two
         | player game with one player guessing and the other player
         | thinking of a potentially different word for each guess subject
         | to the constraint of being consistent with all previous
         | answers.
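A one-ply sketch of that adversarial framing (names are illustrative; a full minimax would recurse into each feedback bucket rather than just counting its size):

```python
from collections import Counter

def feedback(guess, target):
    # 2 = green, 1 = yellow, 0 = grey, with Wordle's
    # repeated-letter rules (greens first, then yellows).
    fb = [0] * len(guess)
    pool = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            fb[i] = 2
    for i, g in enumerate(guess):
        if fb[i] == 0 and pool[g]:
            fb[i] = 1
            pool[g] -= 1
    return tuple(fb)

def worst_bucket(guess, candidates):
    """The adversary answers with whichever feedback pattern keeps
    the most candidate words alive (staying consistent with all
    previous answers, as the comment describes)."""
    buckets = Counter(feedback(guess, t) for t in candidates)
    return max(buckets.values())

def minimax_guess(guesses, candidates):
    """One ply of minimax: minimize the adversary's biggest bucket."""
    return min(guesses, key=lambda g: worst_bucket(g, candidates))
```

Against the "shave/share/shame/shape/..." family from earlier in the thread, this prefers a probe like "marks" over guessing a candidate directly, since a candidate guess leaves the adversary a large surviving bucket.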
        
           | ummonk wrote:
           | Somebody had already tweeted about solving it and finding
           | that 3 guesses suffice, using alpha beta search (though he
           | didn't use the term "alpha beta" because he might have
           | reinvented it).
           | 
           | Unfortunately, I can't seem to dig up the tweet.
        
           | ford wrote:
           | There is a wordle parody (absurdle) [0] that does exactly
           | this.
           | 
           | [0] https://qntm.org/files/wordle/
        
             | pxx wrote:
             | No it does not. It uses a heuristic at every layer of the
             | tree. Specifically, instead of recursing down each subtree,
             | it chooses the set with the maximum number of remaining
             | elements, which isn't necessarily the set that is most
             | difficult to split.
        
           | sdwr wrote:
           | If you want to split hairs, minimax is optimal for worst-
           | case, but not for bringing down the average # of guesses.
        
       | CarVac wrote:
       | It looks like the losing cases are those where in hard mode you
       | can easily get 4 of the letters correct but you have more
       | potential words than guesses left.
        
         | tomlockwood wrote:
         | Yeah, it requires more analysis but I think counterintuitively
         | many of the games that go up to 5-6 are ones where the initial
         | guesses get a lot of greens, effectively making the remaining
         | guesses a set of 50/50s.
        
       | lottin wrote:
       | You can solve it with grep and /usr/share/dict/words but where's
       | the fun in that?
        
         | julianwachholz wrote:
         | Or just use `JSON.parse(localStorage.gameState).solution` but
         | that's not the point.
        
         | Phylter wrote:
          | The answers are all in its JavaScript source code, but what's
         | the fun in that?
        
       | matthewfelgate wrote:
       | Thanks for sharing.
       | 
       | I have a feeling that a perfect algorithm should be able to solve
       | all wordles in 4 guesses. (but I might be wrong)
       | 
       | The problem with a lot of approaches is they seem to locally
       | optimise for each guess rather than optimise over all the guesses
       | (if that makes sense?)
       | 
       | I get the feeling guessing the most frequent letters might not be
       | the best approach, because does it cut down the number of
       | possible words much?
       | 
       | My intuition tells me if you have the right first 3 guesses you
       | should be able to cut the dataset down to 1 possible answer every
       | time.
       | 
       | However I don't know how to program it.
        
         | shagie wrote:
         | A bit ago on HN there was a link to Evil Wordle -
         | https://swag.github.io/evil-wordle/ (
         | https://news.ycombinator.com/item?id=29864418 ) and there's
         | another variation of it at https://qntm.org/files/wordle/
         | 
         | On NPR ( https://www.npr.org/2022/01/23/1075168693/youve-heard-
         | of-wor... )
         | 
         | > QNTM: So with Wordle, in theory, you can win in one guess.
         | I've won in two guesses a couple of times. Absurdle - you
         | cannot beat it in less than four guesses. Four is the limit.
         | But generally just good luck - you're going to need it.
         | 
          | The worst case for Wordle and Evil Wordle or Absurdle should
          | be the same... it's just that evil and absurd are always the
          | worst case.
        
       ___________________________________________________________________
       (page generated 2022-01-23 23:00 UTC)