[HN Gopher] Show HN: Regex Cheatsheet
       ___________________________________________________________________
        
       Show HN: Regex Cheatsheet
        
       Author : geongeorgek
       Score  : 361 points
       Date   : 2020-01-31 10:32 UTC (12 hours ago)
        
 (HTM) web link (ihateregex.io)
 (TXT) w3m dump (ihateregex.io)
        
       | Amarok wrote:
       | ^[a-z0-9_-]{3,15}$
       | 
       | The username reference doesn't match 16 characters as claimed
        
         | geongeorgek wrote:
         | It should match. The number 15 there means repeat x up to
         | 15 times, so 1+15=16.
         | 
         | looks good to me
        
           | aratauto wrote:
           | That is not correct. 15 is total maximum number of repeats
           | including the first one. Even the diagram on
           | https://ihateregex.io/expr/username correctly says that loop
           | can be taken between 2 and 14 times.
        
           | asicsp wrote:
           | where does the extra 1 come from? a{2,5} means match 'a' two
           | to five times
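
The quantifier question above is easy to settle empirically. What follows is a minimal sketch, assuming Python's `re` module (the site targets JavaScript, but `{m,n}` means "between m and n repetitions in total" in both). The helper name `is_valid_username` is made up for illustration.

```python
import re

# The username pattern quoted at the top of the thread.
USERNAME = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True if `name` is 3 to 15 characters drawn from [a-z0-9_-]."""
    return USERNAME.match(name) is not None


# Probe the boundaries: lengths 3 and 15 pass, 2 and 16 do not.
for length in (2, 3, 15, 16):
    print(length, is_valid_username("a" * length))
```

If 16 characters really should be accepted, the upper bound would have to be written as `{3,16}`.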
        
       | kitd wrote:
       | This is really cool!
       | 
       | 2 points:
       | 
       | 1. it fiddled with my back button which is a bit annoying
       | 
       | 2. a better email sample is
       | ^[^@]+@[^@]+\.[^@]+$
       | 
       | which removes the 2 ampersands problem.
        
         | bmn__ wrote:
         | That's not how the spec works. Compliant solution:
         | https://stackoverflow.com/a/1917982
        
         | geongeorgek wrote:
         | Thank you!
         | 
         | I think I know what's wrong with your back button. I will fix
         | it.
         | 
         | And for the regex. will try it out and see if I can add it.
        
         | laumars wrote:
         | Even that is wrong because you can have privately owned TLDs (I
         | forget what they're technically called) like .google
         | 
         | So sundar.pichai@google is technically a valid address (whether
         | .google has any MX records is another matter)
         | 
         | Regex shouldn't really be used for email addresses anyway
         | because the only reliable way to authenticate an email address
         | is to literally send an email to that address.
        
           | donalhunt wrote:
           | .google does not have any MX records
        
           | bduerst wrote:
           | AFAIK none of the TLDs allow for MX records on just the TLD
           | 
           | i.e. johndoe@com will never exist
        
             | bradbeattie wrote:
             | What about things like root@localhost?
        
         | skrebbel wrote:
         | You'll probably want to add \S to those character classes as
         | well, or it matches "it's an @ sign. not an ampersand."
        
           | anamexis wrote:
           | Escaped or quoted whitespace is allowed in the local part of
           | email addresses.
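
The objections in this subthread can be checked against kitd's pattern directly. This is a rough sketch, assuming Python's `re` module; the sample addresses and the helper name `looks_like_email` are illustrative and not taken from the site.

```python
import re

# kitd's suggested pattern: one "@", and at least one "." somewhere after it.
SIMPLE_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def looks_like_email(text: str) -> bool:
    """Return True if `text` has the shape local@domain.tld per the pattern."""
    return SIMPLE_EMAIL.match(text) is not None


samples = [
    "user@example.com",                   # ordinary address
    "user@@example.com",                  # two "@" signs
    "sundar.pichai@google",               # dotless domain, as laumars describes
    "it's an @ sign. not an ampersand.",  # skrebbel's counterexample
]
for sample in samples:
    print(repr(sample), looks_like_email(sample))
```

The pattern rejects the doubled "@" and the dotless domain, but it accepts skrebbel's sentence, which is why a pattern like this is at best a rough filter before sending a confirmation email.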
        
       | geongeorgek wrote:
       | I used to spend hours trying to craft the perfect expression for
       | my scraping projects not realizing that I don't really know
       | regex.
       | 
       | This tool is a cheat sheet that also explains the commonly used
       | expressions so that you understand it.
       | 
       | - There is a visual representation of the regular expression
       | (thanks to regexper)
       | 
       | - The application shows matching strings which you can play
       | around with
       | 
       | - Expressions can be edited and these are instantly validated
        
       | darau1 wrote:
       | Nobody pointed it out, but there's also https://regexr.com/
       | 
       | It's how I learned regex years ago, and I still use it today to
       | test/build more complex patterns.
        
         | imafish wrote:
         | I love regexr. Has been a constant tab in my browser for years
         | now.
        
         | smartmic wrote:
         | Here is my goto resource for checking regexpr with railroad
         | diagrams: https://regexper.com/
        
         | jve wrote:
         | Well, there is a whole list of useful regex links posted 5
         | months ago when someone posted url to RegExr.
         | 
         | Enjoy: https://news.ycombinator.com/item?id=20614847
        
         | strig wrote:
         | My go-to is https://regex101.com/
        
           | darau1 wrote:
           | Didn't know about this. Thanks!
        
             | chirss wrote:
             | We use it on slack and irc for debugging people's regular
             | expressions all the time. Being able to have 30 revisions
             | to a base regex to troubleshoot is fantastic.
             | 
             | Plus the quiz is awesome.
        
           | 52-6F-62 wrote:
           | I use the same as a default. It's been a great help.
        
           | bepvte wrote:
           | I love regex101. It uses webassembly for some of its engines
        
           | huseyinkeles wrote:
           | I've been using regex101 for many years and love it! The
           | debugger [0] that it has is amazing!
           | 
           | [0] - https://regex101.com/debugger
        
       | esaym wrote:
       | Either I'm a regex wizard and don't know it, or perhaps I think I
       | know something but know nothing at all but I've never complained
       | about using regex expressions. I use them all the time without
       | thought. Never quite figured out the need for a cheatsheet
       | either, your language of choice should have a good documentation
       | page for any specific supported syntax.
        
       | crispyambulance wrote:
       | I use regex a lot but deliberately keep it simple.
       | 
       | One thing that confounded me often was positive and negative
       | look-arounds. I always got the expressions mixed up, until I just
       | put the expressions into a table like this...
       |              |  look-behind  |  look-ahead
       |   ------------------------------------------
       |   positive   |  (?<=a)b      |  a(?=b)
       |   ------------------------------------------
       |   negative   |  (?<!a)b      |  a(?!b)
       | 
       | It's not hard, but for whatever reason my brain had trouble
       | remembering the usage because every time I looked it up, each of
       | those expressions was nested in a paragraph of explanation, and I
       | could not see the simple intuitive pattern.
       | 
       | Putting it into a simple visualization helps a lot.
       | 
       | Now, if I can find a similar mnemonic for backreferences !?
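
The four cells of the table can be tried on one string. This is a small sketch, assuming Python's `re` module, which supports fixed-width look-behind; the helper `starts` and the sample text are made up for illustration.

```python
import re


def starts(pattern: str, text: str) -> list:
    """Return the start index of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]


text = "ab cb ac"  # indices: a=0 b=1 ' '=2 c=3 b=4 ' '=5 a=6 c=7

print(starts(r"(?<=a)b", text))  # positive look-behind: "b" preceded by "a"
print(starts(r"(?<!a)b", text))  # negative look-behind: "b" not preceded by "a"
print(starts(r"a(?=b)", text))   # positive look-ahead: "a" followed by "b"
print(starts(r"a(?!b)", text))   # negative look-ahead: "a" not followed by "b"
```

In every case only the letter outside the parentheses is part of the match; the lookaround itself consumes nothing.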
        
         | wahern wrote:
         | Maybe it's easier to remember that lookbehinds are evil from an
         | implementation standpoint, and even in Perl have arbitrary
         | limitations. If you see lookbehinds, look away! If you see
         | lookaheads, go ahead.
        
           | lonelappde wrote:
           | Lookbehinds stay behind.
        
           | glangdale wrote:
           | Oddly, lookbehinds are evil only in a specific backtracking
           | world. We never got around to implementing arbitrary
           | lookarounds in Hyperscan (https://github.com/intel/hyperscan)
           | but if we had done something in the automata world to handle
           | lookaround, lookbehinds are _way_ easier than lookaheads.
           | 
           | To handle a lookbehind, you really only need to occasionally
           | 'AND' together some states (not an operation you would
           | normally do in a standard NFA whether Glushkov or Thompson).
           | To handle lookaheads... well, it gets ugly.
        
           | ygra wrote:
           | It's something I really like about .NET's regular
           | expressions. Lookbehind has no limitations and will just
           | match backwards with all features you can use in other parts.
           | 
           | So depending on the language or flavor you're working in,
           | running away isn't really necessary.
        
         | geongeorgek wrote:
         | This is really intended for beginners. but I can confirm more
         | content is coming soon <3
        
       | asicsp wrote:
       | neat site! clicking an example opens up a playground with live
       | update and explanation and railroad diagrams, similar to sites
       | like regex101[1] and regulex[2]
       | 
       | one suggestion would be to mention clearly which tool/language is
       | being used, regex has no unified standard.. based on "Cheatsheet
       | adapted" message at the bottom, I think it is for JavaScript. I
       | wrote a book on js regexp last year, and I have a post with a
       | cheatsheet too [3]
       | 
       | [1] https://regex101.com/
       | 
       | [2] https://jex.im/regulex
       | 
       | [3]
       | https://learnbyexample.github.io/cheatsheet/javascript/javas...
        
         | geongeorgek wrote:
         | Totally agreed! Right now I only support javascript. But for
         | everything shown there, it's pretty much the same for most
         | flavors
        
       | ape4 wrote:
       | The IPv6 regex is surprisingly complicated.
        
         | geongeorgek wrote:
         | Yeah. this is when you start to have 2 problems
        
       | robert_tweed wrote:
       | OK, these kinds of regex tools get posted quite often. I get it,
       | regex is very confusing at first. And some of these use-cases
       | result in rather complex expressions nobody should be forced to
       | write from scratch (you are still remembering to write unit tests
       | for them though, right?)
       | 
       | But as someone who actually knows [some flavours of] regex fairly
       | well, what I would _really_ like, is a reference that covers all
       | the subtle differences between the various regex engines, along
       | with community-managed documentation (perhaps wiki pages) of
       | which applications  & API versions use which flavour of regex.
       | 
       | For example, the other day I wanted to run a find on my NAS. I
       | needed to use a regex, but the Busybox version of find doesn't
       | support the iregex option, so all expressions are case-sensitive.
       | With some googling, I was able to find out that the default regex
       | type is Emacs, but I wasn't able to find either a good reference
       | for exactly what Emacs regex does and doesn't support, nor any
       | information about how to set the "i" flag. In the end I had to
       | manually convert every character into a class (like [aA] for "a")
       | which was tedious, but quicker than trying to find a better
       | solution or resorting to grep.
       | 
       | A related, annoyingly common pattern is that the documentation
       | for `find` states that `-regex` specifies a regex, but it does
       | not state _which_ flavour of regex. The documentation for certain
       | versions of `find`, which support alternative engines, notes that
       | the default is Emacs. From this I was able to infer (perhaps
       | wrongly) that the Busybox `find` uses Emacs-flavoured regex, but
       | ultimately I still had to resort to some trial-and-error. This
       | problem is all too common in API documentation.
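
The manual workaround described above (turning every letter into a class like `[aA]`) can be automated. This is a hedged sketch: the function name `to_case_insensitive` is hypothetical, it only handles literal text rather than arbitrary patterns, and the result is checked against Python's `re` here, not against Busybox `find`.

```python
import re


def to_case_insensitive(literal: str) -> str:
    """Expand literal text into a regex matching it in any letter case.

    ASCII letters become two-character classes such as [aA]; everything
    else is escaped so it is matched literally.
    """
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = to_case_insensitive("report.txt")
print(pattern)
print(bool(re.fullmatch(pattern, "RePort.TXT")))
```

Bracket classes and an escaped dot are about as portable as regex syntax gets, but whether a given `find` build accepts the output unchanged is something to verify by trial.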
        
         | mklein994 wrote:
         | I tend to go to https://www.regular-expressions.info when I
         | need to find out which features are supported between dialects.
         | Not always up-to-date, but has some good info.
        
         | geongeorgek wrote:
         | You're totally right. Right now this tool only supports the
         | javascript flavor of regex. That said, for all the simple
         | expressions shown there it's more or less the same for most
         | other engines. I guess that makes it okay.
        
         | 8bitsrule wrote:
         | By coincidence, I found this link a bit earlier today. It tries
         | to avoid flavors and exotic syntax.
         | 
         | https://rexegg.com/regex-quickstart.html
        
         | alexhutcheson wrote:
         | RE2 syntax[1] is a pretty good option to learn, because it's
         | mostly a "lowest common denominator" - if it works in RE2, it
         | should work in PCRE, Python, Javascript, etc. The reverse isn't
         | true - there is a bunch of syntax that RE2 doesn't support by
         | design, often to constrain performance bounds.
         | 
         | Emacs regexps are unfortunately their own weird beast - they
         | handle parentheses differently than other regexp engines,
         | because Emacs assumes that you'll be running regexps on Lisp
         | code a lot and want to easily match parentheses. The best
         | documentation on that syntax is (confusingly) in the Elisp
         | reference manual:
         | https://www.gnu.org/software/emacs/manual/html_node/elisp/Sy....
         | 
         | [1] https://github.com/google/re2/wiki/Syntax
        
         | chirss wrote:
         | regex101 does a good job at showing you what the selected
         | variant can do.
        
         | waz0wski wrote:
         | if you're on osx, the app Patterns is really good for testing
         | regex, and also has quick references for a variety of regex
         | 'engines' and also has decent matching explanations
         | 
         | https://krillapps.com/patterns/
        
         | justaj wrote:
         | Honestly, as a noob, this is one of the biggest reasons I have
         | such a hard time deciding to learn regex.
         | 
         | Python flavor would probably be different than PCRE, which is
         | probably different than JS flavor.
         | 
         | Even worse is that it might be too late to standardize all the
         | regex flavors because there is already _so much_ written in
         | different regex flavors that it just costs too much for them to
         | become obsolete in the future.
         | 
         | This is really demotivating.
        
           | new_guy wrote:
           | > Honestly, as a noob, this is one of the biggest reasons I
           | have such a hard time deciding to learn regex.
           | 
           | Clear your afternoon, and just learn it. Seriously, it takes
           | a couple of hours at best and then - BOOM - you're done for
           | the rest of your life.
        
             | absorber wrote:
             | > you're done for the rest of your life.
             | 
             | If that were so easy then I don't think many of these
             | cheatsheets would exist.
        
           | chirss wrote:
           | Honestly don't let this get you down, here's a learning plan
           | (use regex101 to learn)
           | 
           | 1) Learn PCRE regex. 2) Try regex golf or cross words to
           | learn PCRE regex. 3) Take the quiz on regex101.
           | 
           | Once you're done with all 3:
           | 
           | Learn the minor/major differences in the other languages.
           | There aren't many. For example this named capture group:
           | 
           | (?<somename>someregex)
           | 
           | Would look like this in a different language:
           | 
           | (?P<somename>someregex)
           | 
           | There are some differences in what each language can and
           | cannot do, like recursion, because someone thought it was a
           | great idea to make javascript awful at regex, but that's
           | beside the point. Regex is totally worth learning.
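
Python is one of the flavours that uses the `(?P<name>...)` spelling for named groups. This is a minimal sketch, assuming Python's `re` module; the date pattern and group names are invented for the example.

```python
import re

# Named groups, Python spelling: (?P<name>...)
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

match = DATE.search("released 2020-01")
print(match.group("year"), match.group("month"))

# The (?<name>...) spelling is rejected by Python's re at the time of
# writing; the try/except records what the running interpreter does.
try:
    re.compile(r"(?<year>\d{4})")
    angle_only_accepted = True
except re.error:
    angle_only_accepted = False
print(angle_only_accepted)
```

The capture semantics are the same in both spellings; only the syntax for naming the group differs.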
        
         | celeritascelery wrote:
         | The O'Reilly book "Mastering Regular Expressions" has a whole
         | section dedicated to it. As well as several tables. But it
         | would be nice to have an online version.
        
       | __tk__ wrote:
       | I'm loving the graphs which for the first time in years are
       | giving me an idea of what an expression is actually doing. Just
       | because the visualization is kept in a form that is easy to
       | understand with a programming background but can also be
       | translated to the expression itself in a straightforward manner.
        
         | noxToken wrote:
         | Graphs for these really hammer home the point that regular
         | expressions aren't magic. Parsers have so many abilities that
         | when starting out, my expressions were horribly inefficient and
         | missed many corner cases. Learning to graph them just like
         | automata immediately made things easier.
         | 
         | When green devs are having trouble with regular expressions
         | (and don't have a formal computer science background), I like
         | to give them a crash course in DFAs.
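
A crash course of that kind might start with a hand-written automaton. This is a toy sketch, assuming the expression `ab*c` as the running example (it is not from the thread). The transition table is a DFA for it, and `re.fullmatch` from Python's standard library is used only as a cross-check.

```python
import re

# DFA for ab*c.
# State 0: start; state 1: seen "a" and any number of "b"; state 2: accept.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}


def dfa_accepts(text: str) -> bool:
    """Run the DFA over `text`; a missing transition means rejection."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


for sample in ["ac", "abc", "abbbc", "a", "abca", "", "bc"]:
    print(repr(sample), dfa_accepts(sample), bool(re.fullmatch(r"ab*c", sample)))
```

Each arrow in a railroad diagram corresponds to one entry in a table like `TRANSITIONS`, which is part of why the diagrams make expressions feel less magical.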
        
         | leibnitz27 wrote:
         | I knocked up a silly dynamic regex grapher a while back as a
         | little teaching aid - mildly fun
         | 
         | https://www.benf.org/other/regexview/
        
         | geongeorgek wrote:
         | I can't take credit for the visualizations although
         | implementing it was a pain in the ass. It was originally
         | created by: https://regexper.com/
        
       | KenanSulayman wrote:
       | I don't understand why the Github repository lists regexper as
       | the source of the visual graph code but the frame only shows
       | iHateRegex as watermark?
       | 
       | If the only thing that is embedded in that frame was taken
       | entirely from a different project, that project should at least
       | be mentioned in the frame.
        
       | dan_hawkins wrote:
       | Is there a bug? In regexp for IPv4: https://ihateregex.io/expr/ip
       | expression ends with {3} but the diagram states "2 times" in
       | lower right - shouldn't it say "3 times"?
        
         | jve wrote:
         | I think it says "repeat 2" times. So basically you've already
         | gone through the group and then 2 more times.
         | 
         | Because if I specify x{0,3}, i have 2 paths - around x and thru
         | x + at most 2 more times
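
The relationship between the quantifier and the diagram label is simple arithmetic: the first pass is the straight track, so the loop count is one less than the repeat count. The sketch below assumes that reading of the diagrams, as described in this thread. `IPV4_SHAPE` is a simplified stand-in (it does not range-check octets) and is not the site's actual expression; `loop_backs` is a hypothetical helper.

```python
import re

# Simplified IPv4 shape: three "digits-dot" groups, then a final number.
IPV4_SHAPE = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")


def loop_backs(min_total: int, max_total: int) -> tuple:
    """Loop-back counts for x{min,max}, not counting the first pass."""
    return (max(min_total - 1, 0), max(max_total - 1, 0))


print(loop_backs(3, 3))    # {3}: one pass plus 2 loops
print(loop_backs(3, 15))   # {3,15}: one pass plus 2 to 14 loops
print(bool(IPV4_SHAPE.match("192.168.0.1")))
print(bool(IPV4_SHAPE.match("192.168.0")))
```

For `x{0,3}` the zero case shows up as a bypass track around `x` rather than as a loop count, which matches the two paths described above.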
        
           | geongeorgek wrote:
           | Yep you are right
        
       | sylvanaar wrote:
       | Nothing will ever beat RegexBuddy when it comes to Regex tools.
       | It is an entire IDE just for regex, and has been my not-so-secret
       | weapon for a decade or more.
        
       | StavrosK wrote:
       | I love regex and have no trouble reading them, but still love
       | this tool, great job. I especially like the railroad diagrams,
       | for those cases where I brainfarted on a regex and it's doing
       | something other than what I intended. Thanks for this.
        
         | geongeorgek wrote:
         | I'm glad you like the tool <3 It will have a lot more content
         | soon :)
        
           | chirss wrote:
           | If you want some help swing by #regex on efnet, happy to
           | help.
        
       | blauditore wrote:
       | Would be nice to have a regex for parsing HTML...
       | 
       |  _grabs popcorn_
        
         | chirss wrote:
         | boom. https://regex101.com/r/PxSY4U/1 technically it does parse
         | it. :P
        
         | arkh wrote:
         | With subroutines and recursive patterns I think you could do
         | something parsing valid HTML.
         | 
         | Your sanity won't be left intact tho.
        
           | asicsp wrote:
           | how about this "match "A B C" where A+B=C"[1] for sanity?
           | 
           | [1] http://www.drregex.com/2018/11/how-to-match-b-c-where-
           | abc-be...
        
         | geongeorgek wrote:
         | Haha..careful. someone might take this seriously
        
         | bmn__ wrote:
         | Easy with a sufficiently powerful engine:
         | https://stackoverflow.com/a/4234491
         | 
         | Relies on ?(DEFINE): http://p3rl.org/perlre#(DEFINE)
        
           | quickthrower2 wrote:
           | There is a good comment on that answer:
           | 
           | > To sum up: RegEx's are misnamed. I think it's a shame, but
           | it won't change. Compatible 'RegEx' engines are not allowed
           | to reject non-regular languages. They therefore cannot be
           | implemented correctly with only Finite State Machines. The
           | powerful concepts around computational classes do not apply.
           | Use of RegEx's does not ensure O(n) execution time. The
           | advantages of RegEx's are terse syntax and the implied domain
           | of character recognition. To me, this is a slow moving train
           | wreck, impossible to look away, but with horrible
           | consequences unfolding
        
       | rubyn00bie wrote:
       | Nice work on this!
       | 
       | Something subtle that I quite loved: the email regex is, IMHO,
       | close to perfect: \S+@\S+\.\S+
       | 
       | Because the "perfect" one is just absurd, and no one realizes
       | it's going to be so fucking absurd until they start getting
       | support cases and then go read something like this:
       | https://stackoverflow.com/a/201378/931209
       | 
       | > If you want to get fancy and pedantic, implement a complete
       | state engine. A regular expression can only act as a rudimentary
       | filter. The problem with regular expressions is that telling
       | someone that their perfectly valid e-mail address is invalid (a
       | false positive) because your regular expression can't handle it
       | is just rude and impolite from the user's perspective.
        
         | p4lindromica wrote:
         | Even this regexp has false positives.
         | 
         | The `ai` ccTLD ran their own mail server at the root, so an
         | address like `a@ai` was a valid email address.
         | 
         | They serve a website at the tld root: http://ai./
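
The dotless-domain case is quick to reproduce. This is a small sketch, assuming the pattern quoted above is read as `\S+@\S+\.\S+` and applied to the whole string (`re.fullmatch` in Python); the anchoring is an assumption, since the thread does not say how the site applies it.

```python
import re

# Non-space characters, "@", non-space characters, ".", non-space characters.
SITE_EMAIL = re.compile(r"\S+@\S+\.\S+")

for address in ("user@example.com", "a@ai", "root@localhost"):
    print(address, bool(SITE_EMAIL.fullmatch(address)))
```

Both dotless addresses are rejected because the pattern insists on a literal dot after the "@".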
        
       | superasn wrote:
       | Regex are quite simple and useful but my only issue is with those
       | recursive things. Like how do you match balanced brackets? I have
       | a regex (pcre) copy-pasted for it but for the life of me I don't
       | get it or maybe nod my head but instantly ununderstand it. I wish
       | there was a simple-to-understand doc that teaches me how I can
       | match something like:
       | 
       |     "(this is inside a bracket (and this is nested or (double nested)))"
       | 
       | P.S. I know token parsing is better for these things but still I
       | just want to learn the other thing too.
        
         | gizmo686 wrote:
         | Balanced parentheses are not a regular language, so it's
         | theoretically impossible to match them with regular
         | expressions.
         | 
         | In practice, most regexp implementations you see are more
         | powerful than regular expressions. For instance, .NET has a
         | balancing groups feature [0] for exactly this use case.
         | 
         | [0] https://regular-expressions.mobi/balancing.html?wlr=1
        
           | superasn wrote:
           | The regex I've copy-pasted is this:
           | 
           |     $str = "(this is inside a bracket (and this is nested or (double nested)))";
           |     do {
           |         preg_match_all('~\(((?:[^\(\)]++|(?R))*)\)~', $str, $matches);
           |         echo $str = $matches[1][0] ?? '', "\n";
           |     } while($str);
           | 
           | Outputs this [1]:
           | 
           |     > this is inside a bracket (and this is nested or (double nested))
           |     > and this is nested or (double nested)
           |     > double nested
           | 
           | You're right that there is more processing involved (e.g.
           | while loop) but I still don't understand this part
           | '~\(((?:[^\(\)]++|(?R))*)\)~'
           | 
           | [1] https://rextester.com/MEH86820
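
Reading the pattern piece by piece may help. The `~` characters are just PHP delimiters. `\(` and `\)` match literal parentheses. `[^\(\)]++` is a possessive run of non-parenthesis characters. `(?R)` recurses into the whole pattern, so it matches another complete balanced group. The `(?:...|...)*` repeats those two options, and the outer capturing group is what lands in `$matches[1]`. Python's `re` has no `(?R)`, so the sketch below swaps in an explicit depth counter to reproduce the same peeling behaviour; it is not how a regex engine implements recursion, and the function names are made up.

```python
def first_group(text: str):
    """Return the content of the first balanced (...) group, or None."""
    start = text.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                # Matching close found: return what lies between the pair.
                return text[start + 1:i]
    return None  # the first "(" is never closed


def peel(text: str) -> list:
    """Repeatedly unwrap the first group, like the do/while loop above."""
    layers = []
    inner = first_group(text)
    while inner:
        layers.append(inner)
        inner = first_group(inner)
    return layers


sample = "(this is inside a bracket (and this is nested or (double nested)))"
for layer in peel(sample):
    print(">", layer)
```

The counter plays the role that recursion plays in the regex: each "(" goes one level deeper, each ")" comes back up, and a group is complete when the depth returns to zero.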
        
         | chirss wrote:
         | Can you explain the problem further?
        
           | superasn wrote:
           | please see my reply to @gizmo686
        
             | chirss wrote:
             | I guess I don't understand. Mind throwing up an example
             | with multiple test strings on regex101.com ? I'd like to
             | take a look and see if I can make a regex which does what
             | you want.
             | 
             | So if you could write the examples there, and then a
             | description like you would tell your mom of what you want
             | I'll see what I can do.
        
       | mNovak wrote:
       | I always refer back to http://rexegg.com/ Not a tool as such, but
       | a good reference if you know how it works and just need to
       | refresh on syntax.
        
       | vzidex wrote:
       | Very cool! The site that worked best for me to learn regex was
       | https://regexcrossword.com/ - after solving my way through all of
       | them (I got really hooked when I discovered the site) I found I
       | was alright at regex.
        
         | geongeorgek wrote:
         | Thank you for sharing that. looks good
        
       | binarysneaker wrote:
       | These regexs are garbage. Others have suggested better sites for
       | learning how to construct regexs, and stackoverflow has plenty of
       | great examples.
        
         | geongeorgek wrote:
         | Why don't you link them with the comment
        
       | philshem wrote:
       | I have a secret hobby of answering python + regex questions on
       | stackoverflow with pure python.
        
         | geongeorgek wrote:
         | I'm gonna pretend I didn't read this
        
         | johnnylambada wrote:
         | Examples?
        
           | philshem wrote:
           | _secret_
        
       | samat wrote:
       | This is very neat, thank you!
        
       | Glench wrote:
       | Plug for Verbal Expressions (no affiliation), which has an
       | alternate way of compiling more human-readable regexes for a
       | dozen languages: http://verbalexpressions.github.io/
        
         | linusjs_ wrote:
         | I remember that library. A year after I made regexpbuilder
         | https://www.npmjs.com/package/regexpbuilder that library
         | suddenly appeared, and was basically a rip-off of the concept I
         | appear to have created (there was no such other library before
         | regexpbuilder), but is also fairly useless because it doesn't
         | look like it could represent more than about 10% of the
         | possible regular expressions. Yet there was no mention of my
         | library at all in the readme of verbal expressions.
        
         | geongeorgek wrote:
         | This looks nice
        
         | certifiedloud wrote:
         | A CLI version of this would be pretty useful to me.
        
       | xxsaculxx wrote:
       | Nice tool! I personally use https://regex101.com/ as I like the
       | explanations and quick reference.
        
       | adambowles wrote:
       | >/h.llo/ the '.' matches any one character other than a new line
       | character... matches 'hello', 'hallo' but not 'h llo'
       | 
       | in the cheatsheet is false. (https://regexr.com/4tc48)
       | 
       | `.` can match any character except linebreaks (including
       | whitespace)
        
         | jodrellblank wrote:
         | `.` "can" match any character including linebreaks if the regex
         | engine is in re.DOTALL mode (Python) or SingleLine Mode (.Net).
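
Both points can be confirmed in a few lines. This is a minimal sketch using Python's `re` module and its `re.DOTALL` flag; the variable names are illustrative.

```python
import re

PATTERN = r"h.llo"

# "." matches a space just like any other non-newline character.
space_match = re.fullmatch(PATTERN, "h llo")

# By default "." does not match a newline...
newline_default = re.fullmatch(PATTERN, "h\nllo")

# ...but with DOTALL it does.
newline_dotall = re.fullmatch(PATTERN, "h\nllo", re.DOTALL)

print(bool(space_match), bool(newline_default), bool(newline_dotall))
```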
        
       | mimixco wrote:
       | This is awesome! Thank you! I hate regex, too, but I love your
       | inline railroad diagramming tool.
        
         | geongeorgek wrote:
         | Haha thank you <3
        
       | axegon wrote:
       | This is awesome but.... I don't hate regex. Matter of fact, I
       | love regex.
        
       | kazinator wrote:
       | There is no way I would just plop that IPv6 regex into any
       | serious program. :)
        
       | hyperpape wrote:
       | Really nice idea.
       | 
       | I found that you can see your own regex with railroad diagram by
       | going to one of the prepopulated examples and editing it.
       | However, it wasn't clear to me that's the intended use of the
       | tool. It's either a little side-effect, or not super-
       | discoverable.
        
       | dana321 wrote:
       | One thing i've always missed from the Perl programming language
       | is the regex operators.
       | 
       | You could do:
       | 
       |     my $var='foo foo bar and more bar foo!!!';
       |     if($var=~/(foo|bar)/g){  # does the variable contain foo or bar?
       |         print "foo! $1 removing foo..\n";
       |         # remove our value..
       |         $var=~s/$1//g;
       |     }
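
For comparison, here is roughly the same logic without dedicated regex operators. This is a sketch in Python using the standard `re` module; the function name `remove_first_hit` is made up. One deliberate difference: the captured text is escaped before reuse, whereas the Perl version interpolates `$1` as a pattern.

```python
import re


def remove_first_hit(text: str):
    """Find the first "foo" or "bar", then delete every occurrence of it.

    Returns (found, cleaned_text); found is None when nothing matched.
    """
    match = re.search(r"(foo|bar)", text)
    if match is None:
        return None, text
    found = match.group(1)
    print(f"foo! {found} removing foo..")
    # Escape the captured text so it is removed as a literal string.
    return found, re.sub(re.escape(found), "", text)


found, cleaned = remove_first_hit("foo foo bar and more bar foo!!!")
print(repr(cleaned))
```

The match object has to be carried around explicitly, which is the extra ceremony the Perl operators avoid.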
        
         | radiac wrote:
         | So did I: https://github.com/radiac/python-perl/
        
       | lfglopes wrote:
       | I used to use this site http://txt2re.com which is now off the
       | grid, at least since yesterday. :(
       | 
       | Unlike most regex helpers, in this one you would start with the
       | text you want to filter/parse and then it would suggest you
       | possible extractions.
       | 
       | Do you know any alternatives?
        
       | olalonde wrote:
       | Thumbs up for the relatable domain name.
        
       ___________________________________________________________________
       (page generated 2020-01-31 23:00 UTC)