[HN Gopher] An AI system for solving crossword puzzles that outperforms the best humans
___________________________________________________________________

An AI system for solving crossword puzzles that outperforms the best humans

Author : DantesKite
Score  : 44 points
Date   : 2022-05-20 18:47 UTC (4 hours ago)

web link (twitter.com)

| zwieback wrote:
| My dad was a big crossword puzzler. I asked him whether, if you
| pick one of two possible answers to the first clue, it would be
| possible to solve the entire puzzle one way or the other. He sat
| down and created a series of puzzles with "themes", e.g.
| "north"/"south" or "Schiller"/"Goethe", where all the major words
| came from one theme or the other.
|
| Anyway, it would be interesting to see what the AI would do with
| this: would there be two hotspots in the solution space, one for
| each variant?

| mcherm wrote:
| Also famously the November 5, 1996 NYT puzzle, where a clue about
| the newly elected president could be answered either CLINTON or
| BOBDOLE, and all the crossing words had two solutions.
|
| If they trained the AI on the NYT archive, then they would have
| the results of testing it on this one.

| thom wrote:
| Note for Brits that this isn't cryptic (dare I say 'real')
| crosswords, but I assume it could be retooled for that.

| tialaramex wrote:
| American crosswords are different in two key ways, as I
| understand it:
|
| Firstly, all "serious" British crosswords are "cryptic", i.e.
| once you figure out what the answer is, it's apparent why it's
| the correct answer, but figuring out the answer from the clue
| involves lateral thinking and some skills learned from years of
| staring at such clues.
|
| e.g. Private Eye's crossword 726 (back in April), clue 23 down:
|
| "He finally gets to penetrate agreeable person (relatively) (5)"
|
| The correct answer is "Niece".
"Nice" can mean agreeable, the | final letter of "He" is E, and so by having the letter E | "penetrate" the word nice you produce "niece", a person who is | a relative. | | [ and yes, Private Eye is a satirical magazine, the crossword | clues are, likewise, intended to make you a little | uncomfortable while you laugh ] | | Secondly, British crosswords are arranged with black "dead" | squares between letters to produce more of a lattice, in which | many letters only take part in one word, as a result longer | answers are common | | e.g. same crossword, clue 26 across is | | "Figure on getting your teeth into our statistical revelations | (6,9)" | | The answer was "Number Crunching". | jen729w wrote: | Brit here. I woke up one morning - I was 15, so this was in | the 90s - with the word 'microdot' in my head. The first | thought, clear as anything, as if it was painted across the | inside of my eyes. Microdot! | | Puzzled, I didn't move and set about figuring out why. | Eventually I realised that I had solved, in my sleep, a | crossword clue that I had not even gone to bed thinking | about. I'd read it at my grandma's house earlier the previous | day. | | Tiny picture makes computer work on time (8) | | The brain is amazing. I'm not even any good at the cryptic | crossword! | dane-pgp wrote: | I'm reminded of an article I read about an AI that competed in | a crossword competition and one particularly difficult clue it | faced was "Apollo 11 and 12 [180 degrees]". I don't know if it | would be allowed as part of a cryptic crossword, but the number | of letters in the (words of the) answer were 8, 4. | | The answer to that clue is included here: | | https://www.uh.edu/engines/epi2783.htm | nilstycho wrote: | That would usually be considered an invalid cryptic clue. 
| jamespwilliams wrote:
| For cryptic crosswords I've found
| https://www.crosswordgenius.com/ impressive (once you get past
| the somewhat clunky UI).

| mnd999 wrote:
| Anyone else getting a bit bored with all these "AI does some
| super-specialised task better than humans after enormous amounts
| of training" stories? It's not very interesting anymore.
|
| Sure, it can do crosswords well, but the average human who does
| crosswords well can also do a zillion other things, and this type
| of AI is not getting us any closer to that.

| joshcryer wrote:
| But every specialized model like this is getting us closer to
| "doing a zillion other things." By that logic it is exactly one
| step closer. The general AI agent will be composed of many such
| models.

| DantesKite wrote:
| If you skim the paper, you'll realize that what's most
| interesting are the new techniques they developed to accomplish
| this, advancing the field of machine learning in the process.

| cinntaile wrote:
| Now automatically send in the answers to the various weekly
| magazine and newspaper competitions to get a passive prize
| income.

| ericwallace_ucb wrote:
| Hi, I am the first author of this paper and I am happy to answer
| any questions. You can find the technical paper here:
| https://arxiv.org/abs/2205.09665

| mikeryan wrote:
| Hey, this is cool. I do the NYT Crossword every day. A few
| questions:
|
| 1. You mention an 82% solve rate. The NYT puzzle gets "harder"
| each day, Monday through Saturday. Do you track the days
| separately? If so, I'd be curious how much of the unsolved 18%
| ends up on Fridays and Saturdays. (For anyone who doesn't know,
| the Sunday puzzle is outside of the M-Sat range since it's a
| bigger puzzle.)
|
| 2. Related to the above, Thursday puzzles usually have "tricks"
| (skipped letters and whatnot) in them or require a rebus
| (multiple letters in one space) - do you handle these at all?
|
| 3. Is this building an ongoing model and getting better at
| solving?
| Or did you have to seed it with a set of solved puzzles and
| clues?
|
| Sorry, I didn't have time to read the whole paper.

| nickatomlin wrote:
| Hi! I'm another author on this paper. To answer your questions:
|
| 1. Monday puzzles are the easiest for our model, and Thursdays
| are the most difficult. You can see a graph of day-by-day
| performance here:
| https://twitter.com/albertxu__/status/1527704535912787968
|
| 2. Our current system doesn't have any handling for rebuses or
| similar tricks, although Dr. Fill does. I think this is part of
| why Thursday is the hardest day for us, even though Saturday is
| usually considered the most difficult.
|
| 3. We trained it with 6.4M clues. As new crosswords get
| published, we could theoretically retrain our model with more
| data, but we aren't currently planning to do that.

| sp332 wrote:
| I don't suppose you gave more weight to more recent puzzles? Is
| there a time period or puzzle setter that was harder to solve
| because they favored an unusual clue type?

| avrionov wrote:
| Do you think your approach can be applied to other problems?

| Imnimo wrote:
| For handling cross-reference clues, do you think it would be
| feasible in the future to feed the QA model a representation of
| the partially-filled puzzle (perhaps only in the refinement step
| - hard to do for the first step, before you have any answers!),
| in order to give it a shot at answering clues that require
| looking at other answers?
|
| It feels like the challenge might be that most clues are not
| cross-referential, and even for those that are, most information
| in the puzzle is irrelevant - you only care about one answer
| among many, so it could be difficult to learn to find the
| information you need.
|
| But maybe this sort of thing would also be helpful for theme
| puzzles, where answers might be united by the theme even if their
| clues are not directly cross-referential, and could give enough
| signal to teach the model to look at the puzzle context?

| gardenfelder wrote:
| https://github.com/albertkx/berkeley-crossword-solver

___________________________________________________________________
(page generated 2022-05-20 23:00 UTC)
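[ The constraint structure behind the thread's solver discussion -
per-clue candidate answers that must agree on their crossing
letters - can be sketched with a toy backtracking fill. To be
clear, the paper's actual system is more sophisticated (a neural QA
model proposes scored candidates, and a refinement step reconciles
them); the slot coordinates and candidate lists below are made up
purely for illustration: ]

```python
# Slots: the grid cells each answer occupies; candidates: per-slot
# answer lists (in a real solver these would come from the QA model).
SLOTS = {
    "1A": [(0, 0), (0, 1), (0, 2)],
    "1D": [(0, 0), (1, 0), (2, 0)],
}
CANDIDATES = {
    "1A": ["dog", "cat"],
    "1D": ["cow"],
}

def solve(slots, candidates):
    order = list(slots)
    grid = {}  # (row, col) -> letter

    def fill(i):
        if i == len(order):
            return dict(grid)
        cells = slots[order[i]]
        for word in candidates[order[i]]:
            if len(word) != len(cells):
                continue
            # A word is consistent if it matches every already-filled cell.
            if all(grid.get(c, ch) == ch for c, ch in zip(cells, word)):
                placed = [c for c in cells if c not in grid]
                for c, ch in zip(cells, word):
                    grid[c] = ch
                solution = fill(i + 1)
                if solution is not None:
                    return solution
                for c in placed:  # backtrack: clear only our own cells
                    del grid[c]
        return None

    return fill(0)

solution = solve(SLOTS, CANDIDATES)
print("".join(solution[c] for c in SLOTS["1A"]),
      "".join(solution[c] for c in SLOTS["1D"]))  # cat cow
```

[ Here "dog" is tried first for 1A but conflicts with 1D's "cow" at
the shared cell (0, 0), forcing a backtrack to "cat" - the same
kind of crossing-letter conflict the refinement step discussed
above has to resolve, just without the probabilistic scoring. ]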