[HN Gopher] The Greatest Regex Trick Ever (2014)
       ___________________________________________________________________
        
       The Greatest Regex Trick Ever (2014)
        
       Author : signa11
       Score  : 249 points
       Date   : 2021-07-08 16:49 UTC (6 hours ago)
        
 (HTM) web link (rexegg.com)
 (TXT) w3m dump (rexegg.com)
        
       | capitalbreeze wrote:
       | This is awesome!! Well done "Tarzan"
        
       | ppierald wrote:
       | I used to just go ask Friedl.
        
       | Tipewryter wrote:
       | The solution...
       |     not_this|(but_this)
       | 
       | ... is interesting. But since it returns the match in a submatch
       | I would say the \K approach is better:
       | (?:not_this.*?)*\Kbut_this
       | 
       | Because usually when you try hard to accomplish something with a
       | regex, you do not have the luxury to say "And then please
       | disregard the match and look at the submatch instead".
        
         | lifthrasiir wrote:
         | That doesn't work. `(?:"Tarzan".*?)*\KTarzan` should behave
         | identically without `\K`, and it will match `"Tarzan" "Tarzan"`
         | because the ungreedy quantifier ? still allows backtracking (it
         | just changes the search order). You want the possessive
         | quantifier + instead; `not_this|(but_this)` is equivalent
         | because regexp engines will not look back into an already-matched
         | string.
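
A minimal sketch of the point above, using Python's `re` module. Python's `re` has no `\K`, so the lazy pattern is written without it; per the comment, that should not change where the match lands. The sample strings are made up for illustration.

```python
import re

# The lazy-quantifier idea, written without \K (unsupported by Python's re).
lazy = re.compile(r'(?:"Tarzan".*?)*Tarzan')

# The alternation trick: the quoted form is consumed and discarded,
# the bare form lands in capture group 1.
trick = re.compile(r'"Tarzan"|(Tarzan)')

quoted_only = '"Tarzan"'

# At offset 0 the attempt fails; at offset 1 the engine takes zero
# repetitions of the optional prefix and matches the Tarzan inside the quotes.
lazy_span = lazy.search(quoted_only).span()

# With the trick, the only match is the quoted form, so group 1 is unset.
trick_groups = [m.group(1) for m in trick.finditer(quoted_only)]

# In mixed input, only the bare occurrence populates group 1.
mixed = 'Tarzan and "Tarzan"'
bare_starts = [m.start(1) for m in trick.finditer(mixed) if m.group(1)]

print(lazy_span, trick_groups, bare_starts)
```

The lazy pattern still finds the quoted Tarzan because the search simply restarts one character later; the alternation avoids this by consuming the whole quoted form first.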
        
       | high_byte wrote:
       | it's nice. I'm way more dumbfounded by the prime thing though
        
         | rprenger wrote:
         | Me too. I had to look it up. This page has pretty good
         | breakdown:
         | 
         | https://itnext.io/a-wild-way-to-check-if-a-number-is-prime-u...
         | 
         | The main trick for me was you first have to convert the number
         | to unary, which was done outside of the regex.
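
As a rough illustration of the unary step, here is the well-known prime-testing regex wrapped in Python. The function name `is_prime` is made up for this sketch; the conversion to unary happens in ordinary code, outside the pattern.

```python
import re

# Matches unary strings that are NOT prime: either zero or one "1",
# or some block of two or more 1s repeated at least twice.
NOT_PRIME = re.compile(r'1?|(11+?)\1+')

def is_prime(n: int) -> bool:
    unary = "1" * n  # the conversion to unary is done outside the regex
    return NOT_PRIME.fullmatch(unary) is None

primes_below_30 = [n for n in range(30) if is_prime(n)]
print(primes_below_30)
```

The backreference does the work of trial division: a match means the string splits into equal blocks of length two or more.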
        
       | asah wrote:
       | speaking as an old regexp wizard from before perl5, this is
       | indeed a great trick, have an upvote.
       | 
       | sadly, this trick still requires a code comment to explain.
       | Python example:
       |     # match tarzan but not "tarzan"
       |     # see https://news.ycombinator.com/item?id=27774584
       |     if "tarzan" == re.search(r'"tarzan"|(tarzan)', myvar)[1]:
       |         ...
       | 
       | which in practice means it probably deserves a function:
       |     if re_search_but_exclude(r'tarzan', myvar, '"tarzan"'):
       |         ...
       | 
       | I don't recommend monkeypatching re, i.e. re.search_but_exclude =
       | ...
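
One possible shape for such a helper, as a sketch only. The name `re_search_but_exclude` and its argument order come from the comment above; the body is an assumption. It uses `finditer` rather than `re.search`, because `re.search` stops at the first match (which may be the excluded form) and subscripting a failed search raises `TypeError`.

```python
import re

def re_search_but_exclude(wanted, text, excluded):
    """Return the first match of `wanted` that is not part of an `excluded` match.

    Hypothetical helper: both patterns are regex source strings, and
    `wanted` is assumed to contain no capturing groups of its own.
    """
    combined = re.compile(f"(?:{excluded})|({wanted})")
    for m in combined.finditer(text):
        if m.group(1) is not None:  # group 1 is set only for the wanted form
            return m
    return None

hit = re_search_but_exclude(r'tarzan', 'say "tarzan" then tarzan', r'"tarzan"')
miss = re_search_but_exclude(r'tarzan', 'only "tarzan" here', r'"tarzan"')
print(hit.start() if hit else None, miss)
```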
        
         | dmurray wrote:
         | Is there a reason you have an r-string for the first arg but
         | not for the third one?
        
         | nytgop77 wrote:
         | A bit off topic, but the commented version was much clearer,
         | than the version with separate function. (full sentences are
         | very good at explaining things)
        
       | 123pie123 wrote:
       | of all the things ever invented in software, regex still amazes
       | me.
       | 
       | It's almost like nature, many simple rules coming together to
       | make extremely clever and fairly complex ideas
        
         | z3t4 wrote:
         | It took me over 15 years until I started to willingly use
         | RegExp, but now I can't live without it. It's like the curse of
         | knowledge, once you learn something you'll lose all empathy
         | and assume everyone else knows it too. It still surprises me
         | though, I've had bugs like my regex matching terminal color
         | sequences messing up the data if it was colored.
        
         | usrusr wrote:
         | It feels like something that was more discovered than invented,
         | something that would exist even if nobody knew of its
         | existence. I get the same feeling when listening to Pharrell
         | Williams' Happy.
        
       | imglorp wrote:
       | Is anyone having trouble reading the page? It renders as dark
       | gray on slightly darker green and is illegible.
        
       | dorianmariefr wrote:
       | Please don't
        
       | beders wrote:
       | Please don't use regular expressions to parse Dyck languages. It
       | doesn't work.
        
         | lifthrasiir wrote:
         | Regexp for _tokenization_ does work. This entire essay boils
         | down to the fact that you can always postprocess matches and in
         | this case that corresponds to tossing unwanted tokens out.
        
       | miloignis wrote:
       | I'm not sure if any regex library exposes this, but since regular
       | languages are closed over compliment and intersection you could
       | theoretically do something like match("....string..",
       | regex("Tarzan") - regex("\"Tarzan\"")), where the - operation is
       | shorthand for intersection with the compliment. Does anyone know
       | if any regex libraries expose these sorts of operations on the
       | regular expression/underlying DFA?
        
         | amenghra wrote:
         | Greenery (python3) lets you manipulate regular expressions and
         | do things like compute intersections:
         | https://github.com/qntm/greenery
        
           | miloignis wrote:
           | This is exactly the type of thing I was thinking of, and
           | seems quite fully featured - thank you!
        
         | codeflo wrote:
         | Unfortunately (or perhaps fortunately), "regexes" as commonly
         | implemented in programming languages are only loosely related
         | to regular expressions from automata theory. With all their
         | extensions, they can recognize much, much more than just
         | regular languages, and I don't think they're closed under
         | complement (though I'm not sure). However, most regex engines
         | have a feature called negative lookahead assertions, (?!do not
         | match), which would almost work in the way you suggest.
         | 
         | You have to be careful about inputs like this though: "Inside a
         | string"Tarzan"Again inside a string"
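
A small sketch of why that input is awkward, assuming Python's `re` and strings without escaped quotes. The third pattern, which consumes whole quoted strings before looking for the bare word, is an illustrative variation and not something proposed in the thread.

```python
import re

text = '"Inside a string"Tarzan"Again inside a string"'

lookaround = re.compile(r'(?<!")Tarzan(?!")')     # rejects any quote-adjacent Tarzan
naive_trick = re.compile(r'"Tarzan"|(Tarzan)')    # discards the literal "Tarzan"
string_aware = re.compile(r'"[^"]*"|(Tarzan)')    # discards whole quoted strings

def bare_matches(pattern, s):
    # Keep only matches where capture group 1 participated.
    return [m.group(1) for m in pattern.finditer(s) if m.group(1) is not None]

lookaround_hits = lookaround.findall(text)
naive_hits = bare_matches(naive_trick, text)
aware_hits = bare_matches(string_aware, text)

print(lookaround_hits, naive_hits, aware_hits)
```

Here the Tarzan sits between two strings, so both the lookarounds and the literal `"Tarzan"` alternative misread the neighbouring quotes; consuming complete strings left to right keeps track of which quotes open and which close.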
        
           | User23 wrote:
           | Yeah, a DFA that recognizes a regular language can easily be
           | implemented with O(n) worst case behavior.
           | 
           | My attitude is generally that one should use regexes for
           | matching regular languages and if one needs a stack or even
           | Turing completeness then handle that in code around the
           | regex.
        
         | contravariant wrote:
         | Wouldn't that end up just being the same as 'regex(Tarzan)'?
         | Those regexes can't match the same thing, they can only
         | overlap.
         | 
         | What you want is something like all matches of regex("Tarzan")
         | not contained in a match for regex("\"Tarzan\""), which is a
         | bit trickier. That would require something like:
         | 
         | regex("Tarzan") - all-substrings(regex("\"Tarzan\""))
         | 
         | and I'm not sure regular languages are closed over the "all-
         | substrings" operation. Actually I'm pretty sure they aren't.
        
         | layer8 wrote:
         | > compliment
         | 
         | I'll take that as a complement.
        
         | sixo wrote:
         | Not exactly that but take a look at
         | https://github.com/mtrencseni/rxe ("literate regex"). I found
         | this on HN and recall the comment thread being good but I can't
         | find it now.
        
           | sodality2 wrote:
           | This perhaps? second result on hn.algolia.com.
           | https://news.ycombinator.com/item?id=20646174
        
       | lifthrasiir wrote:
       | My biggest grief with regexp is that it is just compact code
       | disguised as something else. It is relatively common that you
       | want to scan a string with action code intermixed. There is a way
       | to do that with regexp (Perl (?{...}) etc. or PCRE callouts), but
       | it is always awkward to put code into a regexp. As a result we
       | typically end up with either complex code that really should
       | have used a regexp but couldn't, or a contorted regexp that
       | hinders understanding. The essay suggests `(*SKIP)(*FAIL)` at the
       | end, which is more evidence that code and regexps don't mix
       | well, so a regexp-only solution is somehow considered worthy.
        
       | [deleted]
        
       | 1970-01-01 wrote:
       | For me, the site rendered dark gray text on a dark gray
       | background and is a chore to read as-is. Outline.com fixed my
       | issue with it: https://outline.com/YSYgsp
        
         | nabilhat wrote:
         | I got curious and looked back in archive.org to this page's
         | initial release in 2014. The text background started out as
         | good old reliable background-color: #EEEEEE, which was later
         | replaced with
         | background: url("http://a.yu8.us/bg-tile-parch.gif")
         | 
         | ...because what could possibly go wrong? From the latest
         | comment at the end of the page, the author would like you to
         | know that the outcome is your problem, because you're using the
         | wrong browser:
         | 
         |  _June 20, 2021 - 15:02_
         | 
         |  _Subject: RE: Undoing whatever is hiding this page._
         | 
         |  _Hi Allen, try a different browser. There's no strange
         | shading on the page, your browser is deciding to display it in
         | a weird way. Regards, -Rex_
        
           | mmsc wrote:
           | Most likely using the HTTPS Everywhere addon. That website is
           | not available via HTTPS, and the user must visit the page
           | first to accept the 'risk' of using the http version.
        
             | nabilhat wrote:
             | Firefox also defaults to HTTPS nowadays. Lots of
             | content blockers block third party content too. Regardless,
             | if _literally anything_ goes wrong with the third party
             | dependency that the article's contrast depends on, the
             | best case scenario here is that the text falls back on the
             | body's background.
             | 
             | Interestingly, the author also appears to control yu8.us
             | 
             | Breaking one's own content by https-ing one site but not
             | another is a great example of why to not prop up a
             | website's basic legibility on a third party dependency,
             | even if it's one you own and control.
        
           | rentnorove wrote:
           | It's definitely nothing to do with the following string in
           | the response:
           | 
           | > Page copy protected against web site content infringement
           | by Copyscape
        
           | extra88 wrote:
           | Yes, the web author made the mistake of defining the
           | <article> background-color: #EEEEEE within a min-width 960px
           | media query. If the background image fails to load in a wider
           | window, there's still a readable contrast between text and
           | background, but on a phone or other narrow screen, the dark
           | background color set on the <body> is what's behind the
           | article text.
        
         | [deleted]
        
         | dang wrote:
         | "_Please don't complain about website formatting, back-button
         | breakage, and similar annoyances. They're too common to be
         | interesting. Exception: when the author is present. Then
         | friendly feedback might be helpful._"
         | 
         | (It's not that the annoyances aren't annoying, it's that
         | they're so common that they lead to repetitive offtopicness
         | that compounds into more boring threads.)
         | 
         | https://news.ycombinator.com/newsguidelines.html
        
         | [deleted]
        
         | metalliqaz wrote:
         | firefox shows it as black(ish) text on a light yellow
         | background. I think you must be blocking something
        
       | jrm4 wrote:
       | Part of me reads these things and I'm like "neat trick", but most
       | of the time they more-or-less prove to me that Regex is doomed to
       | a steady and slow decline.
       | 
       | It's just not a particularly good "interface" for the task it is
       | intended to achieve, a little more ability to be "verbose" at the
       | possible price of succinctness I think would go a long way. I'm
       | more-or-less waiting for the "blank" in: "blank" is to Python
       | what Regex is to Perl.
        
         | gota wrote:
         | I dream that we will have something like Copilot but
         | exclusively for regex and working marvelously
         | 
         | "Find every 2nd instance of a dollar amount that is not encased
         | in quotes" outputting <insert regex here> would be awesome
        
       | smnrchrds wrote:
       | > The Greatest Regex Trick Ever
       | 
       | was to convince programmers it didn't exist?
        
         | [deleted]
        
       | throwanem wrote:
       | The greatest regex trick ever is knowing when _not_ to use one.
        
         | IncRnd wrote:
         | I've seen several regexes in various code reviews that are used
         | to validate user input but do so in an exponential manner that
         | can be exploited for simple DoS attacks.
        
           | xtracto wrote:
           | Ooooh or worse, I once caught someone's "email matching"
           | RegEx code during a code review that was opening the door for
           | some nasty SQL Injection or XSS attacks (kind of like
           | validating if the text field _contained_ a valid email.. but
           | not if it was ONLY a valid email).
           | 
           | The problem with RegEx is its "obscurity". However, maybe
           | someone could write a nice testing tool that would throw
           | millions of known exploits into each regex it finds in your
           | code to see if it is vulnerable.
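
The "contains a valid email" versus "is only a valid email" distinction can be sketched in Python as the difference between `re.search` and `re.fullmatch`. The address pattern below is deliberately simplistic and purely illustrative; the function names and the payload are made up.

```python
import re

# Deliberately simplistic address pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def contains_email(value: str) -> bool:
    # Succeeds if an address appears ANYWHERE in the value (the risky check).
    return EMAIL.search(value) is not None

def is_only_email(value: str) -> bool:
    # Succeeds only if the ENTIRE value is an address.
    return EMAIL.fullmatch(value) is not None

payload = "bob@example.com'; DROP TABLE users; --"

print(contains_email(payload), is_only_email(payload))
print(is_only_email("bob@example.com"))
```

Anchoring the whole value only addresses this particular mistake; it is not a substitute for parameterized queries or output escaping.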
        
           | CyberDildonics wrote:
           | Like what? I've never thought about what regex features are
           | exponential.
        
             | llbeansandrice wrote:
             | From the same site:
             | https://www.rexegg.com/regex-explosive-quantifiers.html
        
             | throwanem wrote:
             | It's more a question of which ones _can't_ be. There are
             | some really nasty and not very obvious gotchas here;
             | https://regular-expressions.mobi/catastrophic.html has a
             | good dive into how, for example, backtracking combines with
             | incautious regex design to produce exponential behavior in
             | the length of input.
             | 
             | I don't have a hard and fast rule of my own about regex
             | complexity, but I do have a strong intuition over what's
             | now ca. 25 years of working with regexes dating back to
             | initial exposure in Perl 5 as a high schooler. That
             | intuition boils down more or less to the idea that, when a
             | regex grows too complex to comprehend at a glance, it's
             | time to start thinking hard about replacing it with a
             | proper parser, especially if it's operating over (as yet)
             | imperfectly sanitized user input.
             | 
             | Sure, it's maybe a little more work up front, at least
             | until you get good at writing small fast parsers - which
             | doesn't take long, in my experience at least; formal
             | training might make it easier still, but I've rarely felt
             | the lack. In exchange for that small investment, you gain
             | reliability and maintainability benefits throughout the
             | lifetime of the code. Much of that comes from the simple
             | source of no longer having to re-comprehend the hairball of
             | punctuation that is any complex regex, before being able to
             | modify it at all - something at which I was actually really
             | good, as recently as a decade or so ago. The expertise has
             | since expired through disuse, and that's given me no cause
             | for regret; the thing about being a regex expert is that
             | it's a really good skill for writing unreadable and subtly
             | dangerous code, and not a skill good for much of anything
             | else. Unreadable and subtly dangerous code was fine when I
             | was a kid doing my own solo projects for fun, where the
             | worst that'd happen is I might have to hit ^C. As an
             | engineer on a team of engineers building software for
             | production, it's not even something I would _want_ to be
             | good at doing.
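
A hedged sketch of the backtracking problem mentioned above, using a classic nested-quantifier pattern in Python's `re`. The patterns and the helper name `seconds_to_fail` are illustrative; timings vary by machine and interpreter version, so none are shown here, and the input sizes are kept small on purpose.

```python
import re
import time

nested = re.compile(r"(a+)+b")  # nested quantifiers: many ways to split a run of a's
flat = re.compile(r"a+b")       # accepts the same strings, with only one way to split

def seconds_to_fail(pattern, n):
    """Time a failing match against n copies of 'a' (there is no 'b' to find)."""
    subject = "a" * n
    start = time.perf_counter()
    result = pattern.match(subject)
    return result, time.perf_counter() - start

for n in (10, 14, 18):
    _, slow = seconds_to_fail(nested, n)
    _, fast = seconds_to_fail(flat, n)
    print(f"n={n:2d}  nested={slow:.4f}s  flat={fast:.6f}s")
```

In a backtracking engine the nested form tends to grow roughly exponentially with `n` on failing input, while the flat form stays close to linear; increasing `n` much further can make the nested case appear to hang.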
        
               | User23 wrote:
               | > That intuition boils down more or less to the idea
               | that, when a regex grows too complex to comprehend at a
               | glance, it's time to start thinking hard about replacing
               | it with a proper parser
               | 
               | You can get some surprisingly complex yet readable
               | regexes in Perl by using qr//x[1] and decomposing the
               | pieces into smaller qr//s that are then interpolated into
               | the final pattern, along with proper inline comments in
               | the regexes themselves.
               | 
               | [1] https://perldoc.perl.org/perlre#/x-and-/xx
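
For readers outside Perl, a rough analogue of the `qr//x` approach can be sketched in Python with `re.VERBOSE` and small named pieces interpolated into the final pattern. The date format and the names `YEAR`, `MONTH`, `DAY`, and `DATE` are made up for illustration.

```python
import re

# Small, separately readable pieces.
YEAR = r"[0-9]{4}"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12][0-9]|3[01])"

# Pieces interpolated into a commented, whitespace-insensitive pattern.
DATE = re.compile(rf"""
    (?P<year>  {YEAR}  ) -   # four-digit year
    (?P<month> {MONTH} ) -   # month 01-12
    (?P<day>   {DAY}   )     # day 01-31 (no per-month length check)
""", re.VERBOSE)

good = DATE.fullmatch("2021-07-08")
bad = DATE.fullmatch("2021-13-08")
print(good.groupdict() if good else None, bad)
```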
        
               | throwanem wrote:
               | You still have to reason about the whole thing, though.
               | This doesn't make that any easier, but I bet it makes it
               | _feel_ easier.
        
         | digitalsushi wrote:
         | The greatest regex /skill/ is knowing that a regex cannot
         | describe everything.
        
       | locallost wrote:
       | Very verbose writing for a very succinct regex.
        
       | kogus wrote:
       | This is a great trick. It says something about RegEx syntax that
       | matching a simple rule with a relatively clear expression is a
       | major accomplishment.
        
         | nytgop77 wrote:
         | Yup. Regex is not a silver bullet for "match stuff", and it is
         | the wrong(ish) tool for the following jobs:
         | 
         | - context sensitive matching
         | 
         | - matching with multi-char-exclusions
         | 
         | (regex is happiest when it's used to match "regular
         | language" things)
        
       | xrayarx wrote:
       | Long Page with practical regex advice for programmers, most
       | likely not useful for command line warriors
       | 
       | Lookbehind
       | 
       | Lookahead
       | 
       | Advanced handling of tags
       | 
       | Replace before matching
       | 
       | the best regex trick ever:
       | 
       | "Tarzan"|(Tarzan)
       | 
       | The whole site contains useful regex advice
        
       | jandrese wrote:
       | The more general tip is that a single regex isn't the only tool
       | you have. You don't have to get your final product in one step.
       | Almost every "disaster" regex comes from someone trying to do too
       | much at once.
       | 
       | One other solution would have been to run the regex twice, once
       | to pick up all instances of Tarzan, and a second on the results
       | of the first to filter out all instances of "Tarzan".
        
         | usrusr wrote:
         | A big source of trying to do too much is environments that
         | offer easy regex-based transformations defined as a pair of
         | regex and a single replacement string (that may contain
         | references to matching groups) and make other transformations
         | hard ("while find + rest"). When you have the option to provide
         | a "process match" closure instead of the replacement string the
         | lure of putting too much into a single regex almost collapses.
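
As an illustration of the "process match" closure idea, Python's `re.sub` accepts a function in place of a replacement string. The sketch below pairs it with the article's alternation pattern; the replacement word and sample text are made up.

```python
import re

PATTERN = re.compile(r'"Tarzan"|(Tarzan)')

def replace_bare(match):
    # Quoted occurrences leave group 1 unset; return them unchanged.
    if match.group(1) is None:
        return match.group(0)
    return "Jane"

text = 'Tarzan said "Tarzan" to Tarzan'
result = PATTERN.sub(replace_bare, text)
print(result)
```

The decision logic lives in ordinary code, so the regex itself can stay short.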
        
       | dang wrote:
       | One past thread:
       | 
       |  _The Greatest Regex Trick Ever (2014)_ -
       | https://news.ycombinator.com/item?id=10282121 - Sept 2015 (131
       | comments)
        
       | phl wrote:
       | As the examples in the article use xml, I just wanted to point
       | out that applying regex to xml has a lot of limitations and
       | should be avoided. See:
       | https://stackoverflow.com/questions/1732348/regex-match-open...
        
         | rascul wrote:
         | I was thinking about that great answer when I was reading the
         | article. Thanks for sharing it.
        
       | ComputerGuru wrote:
       | Very long build up to what is definitely a neat trick, although
       | without SKIP FAIL, it might cause explosive growth in memory
       | usage as it allocates space for the results you don't need
       | (unless you use a streaming regex option).
       | 
       | Speaking of lengthy: this site breaks the iOS Safari scroll bar!
       | It just disappears altogether (even when scrolling up or down to
       | make it show, like you have to these days to please the UX
       | designers in Palo Alto).
        
         | toxik wrote:
         | The scroll bar works but for some reason it gets rendered very
         | bright. Scroll all the way up to the black background in the
         | header and you'll see it.
        
       | tus89 wrote:
       | Clicking on a http:// link these days feels like I have been
       | tricked into clicking on a phishing link in an email.
       | 
       | Good trick though.
        
         | ComputerGuru wrote:
         | This is why any attempt to make plain http sites throw up
         | scare warnings is a horrible idea. The internet is littered
         | with old websites that contain a wealth of knowledge and
         | deserve to remain accessible.
         | 
         | Just make browsers go into "read only" mode where input cannot
         | be accepted on non-secure pages. But don't wall them out!
        
       | crazygringo wrote:
       | > _"Tarzan"|(Tarzan)_
       | 
       | OK that's pretty clever (I certainly never thought of putting a
       | capturing group _inside_ only _one_ side of an "or")...
       | 
       | ...but it doesn't seem particularly useful? It probably won't
       | work in most cases where this is just part of a larger
       | expression. You're usually using capturing groups in a particular
       | way for a good reason, and this would mess that up.
       | 
       | In contrast, the lookbehind+lookahead way is the "proper" and
       | intuitive way to write it, and works as part of any larger
       | expression.
       | 
       | So... +100 points for cleverness, but don't actually _use_ this
       | please. :)
        
         | RheingoldRiver wrote:
         | > In contrast, the lookbehind+lookahead way is the "proper" and
         | intuitive way to write it, and works as part of any larger
         | expression.
         | 
         | I would say, the "proper" way is to have a separate line of
         | code validating what's not there :)
        
           | crazygringo wrote:
           | I'm not following?
        
             | diarrhea wrote:
             | Not GP, but I'd go a very simple and verbose way, maybe
             | that's what they meant too. Match:
             | (.)Tarzan(.)
             | 
             | Then in an additional line of code assert
             | (Group 1 == Group 2) != "
             | 
             | This shifts the logic out of regex and into the surrounding
             | programming language context. That's arguably better, but
             | the resulting regex is extremely dull and unclever.
        
               | pimlottc wrote:
               | Don't forget to look out for matches at the boundaries of
               | the original string. I think it should be something like:
               | (^|.)Tarzan(.|$)
               | 
               | Though I'm not 100% sure offhand what the result in the
               | capturing groups would be.
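
To answer the open question about the capture groups, at least for Python's `re`, here is a small check. The helper name `groups_for` and the sample strings are made up; other engines may report unset groups differently.

```python
import re

pattern = re.compile(r'(^|.)Tarzan(.|$)')

def groups_for(s):
    m = pattern.search(s)
    return m.groups() if m else None

at_boundaries = groups_for('Tarzan')      # both groups match the empty string
quoted = groups_for('"Tarzan"')           # both groups capture the quote
in_sentence = groups_for('a Tarzan!')     # neighbours are captured as-is

# Because each match consumes its neighbouring characters, the second of
# two occurrences separated by a single character is not found.
adjacent_count = len(list(pattern.finditer('Tarzan Tarzan')))

print(at_boundaries, quoted, in_sentence, adjacent_count)
```

The last line is one reason lookarounds are often preferred for this job: they inspect neighbours without consuming them.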
        
               | RheingoldRiver wrote:
               | Yeah, that's more or less what I meant. Write a regex
               | (plus line of code) to make sure `Tarzan` appears. Then
               | write another regex and line of code to make sure
               | `"Tarzan"` doesn't appear.
               | 
               | Maybe at this point you aren't using regex even. Nice,
               | you solved two problems.
               | 
               | (I do appreciate regex and even use them a lot. But, I
               | use them enough to avoid them as much as possible.)
        
               | crazygringo wrote:
               | I mean, I guess if nobody on your team understands
               | regexes.
               | 
               | But generally, once you decide to use a regex in the
               | first place, you might as well put as much regular
               | everyday logic as you can in it. Otherwise you might as
               | well look for "Tarzan" with a dumb string search.
               | 
               | Lookbehinds and lookaheads aren't rocket science. And you
               | can always leave a comment about what they're doing if
               | you're worried other team members won't grok the syntax.
        
       | kristopolous wrote:
       | The ? syntax group has to be the most unmemorable of the bunch.
       | I've used it maybe over 1,000 times or so and I still have to
       | look up ?: Or ?! ?< or whatever else.
       | 
       | I used to have a laminated sheet on my wall at an office because
       | it was so terribly bad.
        
       | digitalsushi wrote:
       | Let me take these PhD level regex down to elementary school
       | awesome.
       | 
       | I have a process table and I want to grep it for the phrase
       | "banana":
       | 
       | ps auxww | grep banana
       | 
       | root 87 Jun21 0:26.78 /System/Library/CoreServices/FruitProcessor
       | --core=banana
       | 
       | mikec 456 450PM 0:00.00 grep banana
       | 
       | Argh! It also greps for the grep for banana! Annoying!
       | 
       | Well, I'm sure there's pgrep or some clever thing, but my
       | coworker showed me this and it took me a few minutes to realize
       | how it works:
       | 
       | ps auxww | grep [b]anana
       | 
       | root 87 Jun21 0:26.78 /System/Library/CoreServices/FruitProcessor
       | --core=banana
       | 
       | Doc Brown spoke to me: "You're just not thinking fourth
       | dimensionally!" Like Marty, I have a real problem with that. But
       | don't you see: [b]anana matches banana but it doesn't match 'grep
       | [b]anana' as a raw string. And so I get only the process I
       | wanted!
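
The same effect can be reproduced outside the shell. This sketch applies the pattern to two made-up process-table lines using Python's `re`: the character class `[b]` matches the letter b, but the literal text `[b]anana` in grep's own command line does not contain the string `banana`.

```python
import re

pattern = re.compile(r"[b]anana")

# Two simplified, made-up lines standing in for the ps output.
process_lines = [
    "/System/Library/CoreServices/FruitProcessor --core=banana",
    "grep [b]anana",
]

matching = [line for line in process_lines if pattern.search(line)]
print(matching)
```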
        
         | sandreas wrote:
         | This is really clever... I usually ended up with adding
         | | grep -v grep
         | 
         | like in
         |     ps auxww | grep banana | grep -v grep
        
         | sigg3 wrote:
         | _applause_
         | 
         | Never thought of that. Nice.
        
         | jackhalford wrote:
         | but what's wrong with pgrep -f though? I don't want to search
         | for clever trick every time I need to grep a process
        
         | stonewareslord wrote:
         | This almost always works, but it won't if the shell expands
         | your bracketed letter. See for example:
         |     $ echo [b]anana
         |     [b]anana
         |     $ touch banana
         |     $ echo [b]anana
         |     banana
         | 
         | You can escape the bracket and it will work:
         |     $ echo \[b]anana
         |     [b]anana
        
         | nick__m wrote:
         | I use pgrep -laf the-wanted-string
         | https://man7.org/linux/man-pages/man1/pgrep.1.html
         | 
         | But nice regex though
         | 
         | Edit : someone already posted that solution
         | https://news.ycombinator.com/item?id=27777901
        
       | Sniffnoy wrote:
       | I dunno, the "logic" solution seems like the obvious one to me;
       | if your boss really has that much trouble with propositional
       | logic that they don't immediately see why it works, well, that's
       | what code comments are for.
       | 
       | (...the trick is still cool, though; I can imagine other
       | situations where it would be more useful. However it does seem
       | like it potentially depends on the particular regex engine being
       | used, in contrast to the author's claim about it being totally
       | portable; yes, it'll compile on anything, but will it _work_?)
        
         | knodi123 wrote:
         | PCRE is a pretty well-defined standard, isn't it? And it's the
         | one used by most of the languages I've worked with, including
         | in MariaDB.
        
           | ComputerGuru wrote:
           | It doesn't even rely on PCRE, just core regex.
        
         | recursive wrote:
         | How could it not work? I've regularly relied on order of
         | matching, and never found an environment that didn't test
         | left-to-right for the `|` operator in regex.
        
           | bear8642 wrote:
           | > operator in regex.
           | 
           | regex is not regular expressions - if using NFA to match then
           | you're matching all alternates simultaneously.
           | 
           | Russ Cox has good pictures explaining idea in 'Regular
           | Expression Search Algorithms' section of
           | <https://swtch.com/~rsc/regexp/regexp1.html>
        
             | recursive wrote:
             | I'm talking about regex. Regex libraries in practical use
             | do not use NFA. I'm talking about actual code that's
             | written using normal languages. I'm familiar with the
             | difference between "regular expressions" as in "regular
             | languages".
        
               | burntsushi wrote:
               | Go's regexp package, Rust's regex crate and RE2 are
               | examples of regex engines that are very much in practical
               | use that use NFAs (among other things).
        
               | ivegotnoaccount wrote:
               | Lex/Flex, which I think we can agree is used by "actual
               | code that's written using normal languages", use DFAs,
               | both inside rules and between rules, and they do not try
               | '|' cases left to right (they probably could have if they
               | wanted, since there is a REJECT action that already forces
               | them to store the list of all the rules/texts that were
               | matched):
               | 
               |     a|ab { cout << "matched ab" << std::endl; }
               |     b    { cout << "matched b" << std::endl; }
               | 
               | if provided with "ab", will match the first rule with
               | "ab", and not the first with "a" then the second with
               | "b".
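
For contrast, a backtracking engine such as Python's `re` does try alternatives left to right. The sketch below shows that, plus a naive emulation of the longest-match rule; the helper `longest_alternative` is made up and is not how Lex or Flex work internally.

```python
import re

# Leftmost-first: the first alternative that succeeds wins.
first_alt = re.match(r"a|ab", "ab").group()
reordered = re.match(r"ab|a", "ab").group()

def longest_alternative(alternatives, text):
    """Try every alternative at the start of text and keep the longest match."""
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best

longest = longest_alternative(["a", "ab"], "ab")
print(first_alt, reordered, longest)
```

So the alternation trick relies on the order of alternatives only in leftmost-first engines; under longest-match semantics, the quoted form wins anyway whenever it is the longer match at that position.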
        
               | [deleted]
        
       | praptak wrote:
       | This trick may be thought of as a simplification of the
       | systematic approach to parsing stuff, that is the lexer-parser
       | division of responsibilities.
       | 
       | The lexer uses regexes but only for splitting the input stream of
       | characters into tokens. Identifiers, integers, operators,
       | strings, keywords, opening brackets and whatnot - each type of
       | token is defined by a regex. This part is hopefully deterministic
       | and simple, although the lexer matches regexes for all kinds of
       | tokens at once, which is why lexer generators are often used to
       | generate lexers.
       | 
       | The heavy lifting is done by the actual parser which tries to
       | combine the tokens into something that makes sense from the point
       | of the grammar.
       | 
       | So in this trick the sub-regexes between |'s define the tokens
       | (the lexer part) while the group mechanism selects the single
       | token that we want to keep (a very very simple parser).
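
The lexer view described above can be sketched with named groups in Python. The token names (`STRING`, `WORD`, `OTHER`) and helper functions are made up for this illustration, and quoted strings are assumed to contain no escaped quotes.

```python
import re

# Each alternative defines one kind of token (the "lexer" part).
TOKEN = re.compile(r'''
    (?P<STRING> "[^"]*" )   # double-quoted string, no escape handling
  | (?P<WORD>   \w+     )   # bare word
  | (?P<OTHER>  .       )   # anything else, one character at a time
''', re.VERBOSE | re.DOTALL)

def tokens(text):
    # lastgroup names the alternative that matched.
    for m in TOKEN.finditer(text):
        yield m.lastgroup, m.group()

def bare_words(text, wanted):
    # The "very simple parser": keep one kind of token, drop the rest.
    return [value for kind, value in tokens(text)
            if kind == "WORD" and value == wanted]

sample = 'Tarzan said "Tarzan" to Tarzan'
kinds = [kind for kind, _ in tokens('Tarzan "Tarzan"')]
found = bare_words(sample, "Tarzan")
print(kinds, found)
```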
        
       | xtracto wrote:
       | This site reminded me of the times when I interviewed candidates.
       | One of the interview problems was to write a function that would
       | validate if a given string was a valid IPv4 address (a la
       | 10.10.10.1).
       | 
       | Some of the candidates started by saying: "I know! I'll use a
       | Regular Expression", to which I replied: "Great! Now you have TWO
       | problems!"
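
For comparison, one way to answer that interview question without a regex, as a sketch. The function name `is_valid_ipv4` is made up, and the rules assumed here (exactly four decimal parts, each 0 to 255, no leading zeros) are one common reading of "valid dotted-quad", not the interviewer's stated criteria.

```python
def is_valid_ipv4(text: str) -> bool:
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Only plain ASCII digits; this also rejects empty parts and signs.
        if not (part.isascii() and part.isdigit()):
            return False
        # Assumed rule: no leading zeros such as "01".
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True

samples = ["10.10.10.1", "256.1.1.1", "1.2.3", "01.2.3.4", "1.2.3.4."]
results = [is_valid_ipv4(s) for s in samples]
print(results)
```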
        
       ___________________________________________________________________
       (page generated 2021-07-08 23:00 UTC)