[HN Gopher] Ken Thompson's NFA regex patent (1971) ___________________________________________________________________ Ken Thompson's NFA regex patent (1971) Author : jonstewart Score : 62 points Date : 2022-11-11 20:44 UTC (2 hours ago) (HTM) web link (patents.google.com) (TXT) w3m dump (patents.google.com) | joedrago wrote: | I read an article a million years ago which I thought described | this mechanism in a super friendly way: | | https://perl.plover.com/Regex/article.html | | It uses the idea of "putting a penny down" on the graph and | moving it around, and I found the conversational tone of the | description to be very easy to grok. Assuming this is the exact | same mechanism (which the author cites), it might resonate for | some of you too. | jonstewart wrote: | ken also wrote a paper about his algorithm, published in CACM in | 1968: https://dl.acm.org/doi/pdf/10.1145/363347.363387 (PDF) | HellsMaddy wrote: | https://en.wikipedia.org/wiki/Thompson's_construction | Cupertino95014 wrote: | Does anyone know if this patent was ever asserted by AT&T? (Of | course, it's long expired.) | ape4 wrote: | Do current regex implementations use this algorithm? | bla3 wrote: | Most don't, but re2 and a few others do. The ones that do use | it don't have exponential runtime on malicious inputs and lack | a few features (back references, mostly). | https://swtch.com/~rsc/regexp/ is a great resource on this. | iron2disulfide wrote: | > Most don't | | Is this true? I'm pretty sure that most of the regex engines | I've used (grep, ripgrep, re2, hyperscan) use Thompson's | construction or at least some NFA-based algorithm (not | necessarily Thomopson's; Glushkov automata in particular are | used by hyperscan). | burntsushi wrote: | Yes, all of those are finite automata based. Although grep | usually uses the POSIX system regex library. GNU grep will | use its own finite automata based approach for a subset (a | large subset) of regexes. | | But most regex engines are backtracking based. Perl, PCRE | (used by PHP and probably others), Oniguruma/Onigmo (used | by Ruby), Javascript, Java, Python. That covers a lot of | ground. | | Plus, popular reference web sites like https://www.regular- | expressions.info/ are heavily biased toward regex engines | that have features typically only implemented by | backtracking engines. | masklinn wrote: | There's also postgres's which descends from spencer's | (Tcl Advanced Regular Expressions) and supports | backreferences but IME is very hard to catch into | catastrophic backtracking. | | Possibly because you need to use the feature to trigger | the NFA mode and the baseline test case for catastrophic | backtracking doesn't, it just assumes the engine will | backtrack. | klodolph wrote: | As a side note, when converting the NFA to a DFA, you get an | exponential number of states, since each DFA state | corresponds to a set of NFA states. | burntsushi wrote: | True, and that's why libraries like RE2 don't build full | DFAs. They use lazy DFAs. DFA states are compiled as | needed. At most one state is built per byte of input, so | this sidesteps the exponential time bound. | | In all implementations I know of, memory use is bounded. | When memory fills up, the cache of DFA states is cleared. | If this happens too many times, RE2 quits the lazy DFA and | falls back to a Pike VM. (Effectively the Thompson NFA | matcher, but with capture group support.) ___________________________________________________________________ (page generated 2022-11-11 23:00 UTC)