[HN Gopher] Show HN: Regex Cheatsheet ___________________________________________________________________ Show HN: Regex Cheatsheet Author : geongeorgek Score : 361 points Date : 2020-01-31 10:32 UTC (12 hours ago) (HTM) web link (ihateregex.io) (TXT) w3m dump (ihateregex.io) | Amarok wrote: | ^[a-z0-9_-]{3,15}$ | | The username reference doesn't match 16 characters as claimed | geongeorgek wrote: | I should match. the number 15 there means that repeat x up to | 15 times. so 1+15=16. | | looks good to me | aratauto wrote: | That is not correct. 15 is total maximum number of repeats | including the first one. Even the diagram on | https://ihateregex.io/expr/username correctly says that loop | can be taken between 2 and 14 times. | asicsp wrote: | where does the extra 1 come from? a{2,5} means match 'a' two | to five times | kitd wrote: | This is really cool! | | 2 points: | | 1. it fiddled with my back button which is a bit annoying | | 2. a better email sample is | ^[^@]+@[^@]+\.[^@]+$ | | which removes the 2 ampersands problem. | bmn__ wrote: | That's not how the spec works. Compliant solution: | https://stackoverflow.com/a/1917982 | geongeorgek wrote: | Thank you! | | I think I know what's wrong with your back button. I will fix | it. | | And for the regex. will try it out and see if I can add it. | laumars wrote: | Even that is wrong because you can have privately owned TLDs (I | forget what they're technically called) like .google | | So sundar.pichai@google is technically a valid address (whether | .google has any MX records is another matter) | | Regex shouldn't really be used for email addresses anyway | because the only reliable way to authenticate an email address | is to literally send an email to that address. | donalhunt wrote: | .google does not have any MX records | bduerst wrote: | AFAIK none of the TLDs allow for MX records on just the TLD | | i.e. johndoe@com will never exist | bradbeattie wrote: | What about things like root@localhost? 
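Both disputes above (what {3,15} counts, and what kitd's email pattern accepts) can be checked directly. This is a minimal Python sketch, assuming Python's `re` module treats these two patterns the same way the site's JavaScript engine does; bounded quantifiers and negated character classes behave the same in both flavors. The sample strings are made up for illustration.

```python
import re

# Username pattern quoted at the top of the thread.
USERNAME = re.compile(r"^[a-z0-9_-]{3,15}$")

# {3,15} counts total repetitions, so 3 to 15 characters match and 16 do not.
for length in (2, 3, 15, 16):
    candidate = "a" * length
    print(length, bool(USERNAME.match(candidate)))

# kitd's suggested email pattern.
EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

samples = [
    "user@example.com",      # ordinary address
    "a@@example.com",        # two @ signs
    "root@localhost",        # no dot after the @
    "sundar.pichai@google",  # dotless TLD
]
for address in samples:
    print(address, bool(EMAIL.match(address)))
```

Only the first sample address matches. The last two rejections are the cases laumars and bradbeattie raise: the pattern is stricter than what mail systems can actually deliver to.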
| skrebbel wrote: | You'll probably want to add \S to those character classes as | well, or it matches "it's an @ sign. not an ampersand." | anamexis wrote: | Escaped or quoted whitespace is allowed in the local part of | email addresses. | geongeorgek wrote: | I used to spend hours trying to craft the perfect expression for | my scraping projects not realizing that I don't really know | regex. | | This tool is a cheat sheet that also explains the commonly used | expressions so that you understand it. | | - There is a visual representation of the regular expression | (thanks to regexpr) | | - The application shows matching strings which you can play | around | | - Expressions can be edited and these are instantly validated | darau1 wrote: | Nobody pointed it out, but there's also https://regexr.com/ | | It's how I learned regex years ago, and I still use it today to | test/build more complex patterns. | imafish wrote: | I love regexr. Has been a constant tab in my browser for years | now. | [deleted] | smartmic wrote: | Here is my goto resource for checking regexpr with railroad | diagrams: https://regexper.com/ | jve wrote: | Well, there is a whole list of useful regex links posted 5 | months ago when someone posted url to RegExr. | | Enjoy: https://news.ycombinator.com/item?id=20614847 | strig wrote: | My go-to is https://regex101.com/ | darau1 wrote: | Didn't know about this. Thanks! | chirss wrote: | We use it on slack and irc for debugging people's regular | expressions all the time. Being able to have 30 revisions | to a base regex to troubleshoot is fantastic. | | Plus the quiz is awesome. | 52-6F-62 wrote: | I use the same as a default. It's been a great help. | bepvte wrote: | I love regex101. It uses webassembly for some of its engines | huseyinkeles wrote: | I've been using regex101 for many years and love it! The | debugger [0] that it has is amazing! 
| | [0] - https://regex101.com/debugger | esaym wrote: | Either I'm a regex wizard and don't know it, or perhaps I think I | know something but know nothing at all but I've never complained | about using regex expressions. I use them all the time without | thought. Never quite figured out the need for a cheatsheet | either, your language of choice should have a good documentation | page for any specific supported syntax. | crispyambulance wrote: | I use regex a lot but deliberately keep it simple. | | One thing that confounded me often was positive and negative | look-arounds. I always got the expressions mixed up, until I just | put the expressions into a table like this... |
|              | look-behind | look-ahead
|   ------------------------------------
|   positive   | (?<=a)b     | a(?=b)
|   ------------------------------------
|   negative   | (?<!a)b     | a(?!b)
| | It's not hard, but for whatever reason my brain had trouble | remembering the usage because every time I looked it up, each of | those expressions was nested in a paragraph of explanation, and I | could not see the simple intuitive pattern. | | Putting it into a simple visualization helps a lot. | | Now, if I can find a similar mnemonic for backreferences !? | wahern wrote: | Maybe it's easier to remember that lookbehinds are evil from an | implementation standpoint, and even in Perl have arbitrary | limitations. If you see lookbehinds, look away! If you see | lookaheads, go ahead. | lonelappde wrote: | Lookbehinds stay behind. | glangdale wrote: | Oddly, lookbehinds are evil only in a specific backtracking | world. We never got around to implementing arbitrary | lookarounds in Hyperscan (https://github.com/intel/hyperscan) | but if we had done something in the automata world to handle | lookaround, lookbehinds are _way_ easier than lookaheads. | | To handle a lookbehind, you really only need to occasionally | 'AND' together some states (not an operation you would | normally do in a standard NFA whether Glushkov or Thompson).
| To handle lookaheads... well, it gets ugly. | ygra wrote: | It's something I really like about .NET's regular | expressions. Lookbehind has no limitations and will just | match backwards with all features you can use in other parts. | | So depending on the language or flavor you're working in, | running away isn't really necessary. | geongeorgek wrote: | This is really intended for beginners. but I can confirm more | content is coming soon <3 | asicsp wrote: | neat site! clicking an example opens up a playground with live | update and explanation and railroad diagrams, similar to sites | like regex101[1] and regulex[2] | | one suggestion would be to mention clearly which tool/language is | being used, regex has no unified standard.. based on "Cheatsheet | adapted" message at the bottom, I think it is for JavaScript. I | wrote a book on js regexp last year, and I have post for | cheatsheet too [3] | | [1] https://regex101.com/ | | [2] https://jex.im/regulex | | [3] | https://learnbyexample.github.io/cheatsheet/javascript/javas... | geongeorgek wrote: | Totally agreed! Right now I only support javascript. But for | everything shown there, it's pretty much the same for most | flavors | ape4 wrote: | The IPv6 regex is surprisingly complicated. | geongeorgek wrote: | Yeah. this is when you start to have 2 problems | robert_tweed wrote: | OK, these kinds of regex tools get posted quite often. I get it, | regex is very confusing at first. And some of these use-cases | result in rather complex expressions nobody should be forced to | write from scratch (you are still remembering to write unit tests | for them though, right?) | | But as someone who actually knows [some flavours of] regex fairly | well, what I would _really_ like, is a reference that covers all | the subtle differences between the various regex engines, along | with community-managed documentation (perhaps wiki pages) of | which applications & API versions use which flavour of regex. 
| | For example, the other day I wanted to run a find on my NAS. I | needed to use a regex, but the Busybox version of find doesn't | support the iregex option, so all expressions are case-sensitive. | With some googling, I was able to find out that the default regex | type is Emacs, but I wasn't able to find either a good reference | for exactly what Emacs regex does and doesn't support, nor any | information about how to set the "i" flag. In the end I had to | manually convert every character into a class (like [aA] for "a") | which was tedious, but quicker than trying to find a better | solution or resorting to grep. | | A related, annoyingly common pattern is that the documentation | for `find` states that `--regex` specifies a regex, but it does | not state _which_ flavour of regex. The documentation for certain | versions of `find`, which support alternative engines, note that | the default is Emacs. From this I was able to infer (perhaps | wrongly) that the Busybox `find` uses Emacs-flavoured regex, but | ultimate I still had to resort to some trial-and-error. This | problem is all too common in API documentation. | mklein994 wrote: | I tend to go to https://www.regular-expressions.info when I | need to find out which features are supported between dialects. | Not always up-to-date, but has some good info. | geongeorgek wrote: | You're totally right. Right now this tool only supports the | javascript flavor of regex. That said, for all the simple | expressions shown there it's more or less the same for most | other engines. I guess that makes it okay. | 8bitsrule wrote: | By coincidence, I found this link a bit earlier today. It tries | to avoid flavors and exotic syntax. | | https://rexegg.com/regex-quickstart.html | alexhutcheson wrote: | RE2 syntax[1] is a pretty good option to learn, because it's | mostly a "lowest common denominator" - if it works in RE2, it | should work in PCRE, Python, Javascript, etc. 
The reverse isn't | true - there is a bunch of syntax that RE2 doesn't support by | design, often to constrain performance bounds. | | Emacs regexps are unfortunately their own weird beast - they | handle parentheses differently than other regexp engines, | because Emacs assumes that you'll be running regexps on Lisp | code a lot and want to easily match parentheses. The best | documentation on that syntax is (confusingly) in the Elisp | reference manual: https://www.gnu.org/software/emacs/manual/htm | l_node/elisp/Sy.... | | [1] https://github.com/google/re2/wiki/Syntax | chirss wrote: | regex101 does a good job at showing you what the selected | variant can do. | waz0wski wrote: | if you're on osx, the app Patterns is really good for testing | regex, and also has quick references for a variety of regex | 'engines' and also has decent matching explanations | | https://krillapps.com/patterns/ | justaj wrote: | Honestly, as a noob, this is one of the biggest reasons I have | such a hard time deciding to learn regex. | | Python flavor would probably be different than PCRE, which is | probably different than JS flavor. | | Even worse is that it might be too late to standardize all the | regex flavors because there is already _so much_ written in | different regex flavors that it just costs too much for them to | become obsolete in the future. | | This is really demotivating. | new_guy wrote: | > Honestly, as a noob, this is one of the biggest reasons I | have such a hard time deciding to learn regex. | | Clear your afternoon, and just learn it. Seriously, it takes | a couple of hours at best and then - BOOM - you're done for | the rest of your life. | absorber wrote: | > you're done for the rest of your life. | | If that were so easy then I don't think much of these | cheatsheets would exist. | chirss wrote: | Honestly don't let this get you down, here's a learning plan | (use regex101 to learn) | | 1) Learn PCRE regex. 
2) Try regex golf or cross words to | learn PCRE regex. 3) Take the quiz on regex101. | | Once you're done with all 3: | | Learn the minor/major differences in the other languages. | There aren't many. For example this named capture group: | | (?<somename>someregex) | | Would look like this in a different language: | | (?P<somename>someregex) | | There's some differences about what language can and cannot | do like recursion because someone thought it was a great idea | to make javascript awful at regex, but that's besides the | point. Regex is totally worth learning. | celeritascelery wrote: | The O'Riley book "mastering regular expressions" has a whole | section dedicated to it. As well as several tables. But it | would be nice to have an online version. | __tk__ wrote: | I'm loving the graphs which for the first time in years are | giving me an idea of what an expression is actually doing. Just | because the visualization is kept in a form that is easy to | understand with a programming background but can also be | translated to the expression itself in a straightforward manner. | noxToken wrote: | Graphs for these really hammer home the point that regular | expressions aren't magic. Parsers have so many abilities that | when starting out, my expressions were horribly inefficient and | missed many corner cases. Learning to graph them just like | automata immediately made things easier. | | When green devs are having trouble with regular expressions | (and don't have a formal computer science background), I like | to give them a crash course in DFAs. | leibnitz27 wrote: | I knocked up a silly dynamic regex grapher a while back as a | little teaching aid - mildly fun | | https://www.benf.org/other/regexview/ | geongeorgek wrote: | I can't take credit for the visualizations although | implementing it was a pain in the ass. 
It was originally | created by: https://regexper.com/ | [deleted] | KenanSulayman wrote: | I don't understand why the Github repository lists regexper as | the source of the visual graph code but the frame only shows | iHateRegex as watermark? | | If the only thing that is embedded in that frame was taken | entirely from a different project, that project should at least | be mentioned in the frame. | dan_hawkins wrote: | Is there a bug? In regexp for IPv4: https://ihateregex.io/expr/ip | expression ends with {3} but the diagram states "2 times" in | lower right - shouldn't it say "3 times"? | jve wrote: | I think it says "repeat 2" times. So basically you'v already | went through the group and then 2 more times. | | Because if I specify x{0,3}, i have 2 paths - around x and thru | x + at most 2 more times | geongeorgek wrote: | Yep you are right | sylvanaar wrote: | Nothing will ever beat RegexBuddy when it comes to Regex tools. | It is an entire IDE just for regex, and has been my not-so-secret | weapon for a decade or more. | StavrosK wrote: | I love regex and have no trouble reading them, but still love | this tool, great job. I especially like the railroad diagrams, | for those cases where I brainfarted on a regex and it's doing | something other than what I intended. Thanks for this. | geongeorgek wrote: | I'm glad you like the tool <3 It will have a lot more content | soon :) | chirss wrote: | If you want some help swing by #regex on efnet, happy to | help. | blauditore wrote: | Would be nice to have a regex for parsing HTML... | | _grabs popcorn_ | chirss wrote: | boom. https://regex101.com/r/PxSY4U/1 technically it does parse | it. :P | arkh wrote: | With subroutines and recursive patterns I think you could do | something parsing valid HTML. | | Your sanity won't be left intact tho. | asicsp wrote: | how about this "match "A B C" where A+B=C"[1] for sanity? | | [1] http://www.drregex.com/2018/11/how-to-match-b-c-where- | abc-be... 
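For readers wondering why HTML is the running joke here: a pattern without recursion cannot pair nested tags. The sketch below only shows that failure. It does not attempt arkh's recursive approach, because Python's built-in `re` module has no recursion or subroutine support. The markup strings are invented for illustration.

```python
import re

html = "<div>outer <div>inner</div> tail</div>"

# A lazy match stops at the first closing tag, cutting the outer element short.
lazy = re.search(r"<div>(.*?)</div>", html).group(1)
print(lazy)    # outer <div>inner

# A greedy match runs to the last closing tag, which over-captures
# as soon as two sibling elements appear.
siblings = "<div>a</div><div>b</div>"
greedy = re.search(r"<div>(.*)</div>", siblings).group(1)
print(greedy)  # a</div><div>b
```

Neither quantifier choice tracks nesting depth, which is the piece that recursive patterns (or a real parser) add.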
| geongeorgek wrote: | Haha..careful. someone might take this seriously | bmn__ wrote: | Easy with a sufficiently powerful engine: | https://stackoverflow.com/a/4234491 | | Relies on ?(DEFINE): http://p3rl.org/perlre#(DEFINE) | quickthrower2 wrote: | There is a good comment on that answer: | | > To sum up: RegEx's are misnamed. I think it's a shame, but | it won't change. Compatible 'RegEx' engines are not allowed | to reject non-regular languages. They therefore cannot be | implemented correctly with only Finte State Machines. The | powerful concepts around computational classes do not apply. | Use of RegEx's does not ensure O(n) execution time. The | advantages of RegEx's are terse syntax and the implied domain | of character recognition. To me, this is a slow moving train | wreck, impossible to look away, but with horrible | consequences unfolding | rubyn00bie wrote: | Nice work on this! | | Something subtle, but I quite loved the email regex is, IMHO, | close to perfect: \S+@\S+\\.\S+ | | Because the "perfect" one is just absurd, and no one realizes | it's going to be so fucking absurd until they start getting | support cases and then go read something like this: | https://stackoverflow.com/a/201378/931209 | | > If you want to get fancy and pedantic, implement a complete | state engine. A regular expression can only act as a rudimentary | filter. The problem with regular expressions is that telling | someone that their perfectly valid e-mail address is invalid (a | false positive) because your regular expression can't handle it | is just rude and impolite from the user's perspective. | p4lindromica wrote: | Even this regexp has false positives. | | The `ai` ccTLD ran their own mail server at the root, so an | address like `a@ai` was a valid email address. | | They serve a website at the tld root: http://ai./ | superasn wrote: | Regex are quite simple and useful but my only issue is with those | recursive things. Like how do you match balanced brackets? 
I have | a regex (pcre) copy-pasted for it but for the life of me I don't | get it or maybe nod my head but instantly ununderstand it. I wish | there was a simple to understand doc that teaches me how I can | match something like: "(this is inside a bracket (and this is nested or (double nested)))" | | P.S. I know token parsing is better for these things but still I | just want to learn the other thing too. | gizmo686 wrote: | Balanced parentheses are not a regular language, so it's | theoretically impossible to match them with regular | expressions. | | In practice, most regexp implementations you see are more | powerful than regular expressions. For instance, .net has a | balancing groups feature [0] for exactly this use case. | | [0] https://regular-expressions.mobi/balancing.html?wlr=1 | superasn wrote: | The regex I've copy-pasted is this: |
|     $str = "(this is inside a bracket (and this is nested or (double nested)))";
|     do {
|         preg_match_all('~\(((?:[^\(\)]++|(?R))*)\)~', $str, $matches);
|         echo $str = $matches[1][0] ?? '', "\n";
|     } while($str);
| | Outputs this [1]: |
|     > this is inside a bracket (and this is nested or (double nested))
|     > and this is nested or (double nested)
|     > double nested
| | You're right that there is more processing involved (e.g. | while loop) but I still don't understand this part | '~\(((?:[^\(\)]++|(?R))*)\)~' | | [1] https://rextester.com/MEH86820 | chirss wrote: | Can you explain the problem further? | superasn wrote: | please see my reply to @gizmo686 | chirss wrote: | I guess I don't understand. Mind throwing up an example | with multiple test strings on regex101.com ? I'd like to | take a look and see if I can make a regex which does what | you want. | | So if you could write the examples there, and then a | description like you would tell your mom of what you want | I'll see what I can do.
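The pattern superasn asks about reads more easily in pieces. The `~` characters are PHP's pattern delimiters. `\(` and `\)` match the outermost literal parentheses. Between them, `(?:[^\(\)]++|(?R))*` repeats two alternatives: `[^\(\)]++` is a possessive run of characters that are not parentheses, and `(?R)` recurses into the whole pattern, so it matches another complete balanced group. The capture group wrapped around that repetition holds the text inside the outer pair.

Python's built-in `re` has no `(?R)`, so the sketch below swaps in a different technique: a plain depth counter that tracks what the recursion tracks implicitly. The function names are hypothetical, and it only aims to reproduce the PHP loop on well-formed input; on unbalanced input it can behave differently from the regex.

```python
def first_balanced_group(text):
    """Return the text inside the first balanced (...) group, or None."""
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1      # contents begin after the opening paren
            depth += 1             # same job as entering (?R)
        elif ch == ")" and depth > 0:
            depth -= 1             # same job as returning from (?R)
            if depth == 0:
                return text[start:i]
    return None                    # no group, or the group never closed


def peel(text):
    """Collect each nesting layer, outermost first, like the PHP do/while."""
    layers = []
    inner = first_balanced_group(text)
    while inner is not None:
        layers.append(inner)
        inner = first_balanced_group(inner)
    return layers


sample = "(this is inside a bracket (and this is nested or (double nested)))"
for layer in peel(sample):
    print(">", layer)
```

For the sample string this prints the same three layers as the PHP output quoted above.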
| mNovak wrote: | I always refer back to http://rexegg.com/ Not a tool as such, but | a good reference if you know how it works and just need to | refresh on syntax. | vzidex wrote: | Very cool! The site that worked best for me to learn regex was | https://regexcrossword.com/ - after solving my way through all of | them (I got really hooked when I discovered the site) I found I | was alright at regex. | geongeorgek wrote: | Thank you for sharing that. looks good | binarysneaker wrote: | These regexs are garbage. Others have suggested better sites for | learning how to construct regexs, and stackoverflow has plenty of | great examples. | geongeorgek wrote: | Why don't you link them with the comment | philshem wrote: | I have a secret hobby of answering python + regex questions on | stackoverflow with pure python. | geongeorgek wrote: | I'm gonna pretend I didn't read this | johnnylambada wrote: | Examples? | philshem wrote: | _secret_ | samat wrote: | This is very neat, thank you! | Glench wrote: | Plug for Verbal Expressions (no affiliation), which has an | alternate way of compiling more human-readable regexes for a | dozen languages: http://verbalexpressions.github.io/ | linusjs_ wrote: | I remember that library. A year after I made regexpbuilder | https://www.npmjs.com/package/regexpbuilder that library | suddenly appeared, and was basically a rip-off of the concept I | appear to have created (there was no such other library before | regexpbuilder), but is also fairly useless because it doesn't | look like it could represent more than about 10% of the | possible regular expressions. Yet there was no mention of my | library at all in the readme of verbal expressions. | geongeorgek wrote: | This looks nice | certifiedloud wrote: | A CLI version of this would be pretty useful to me. | xxsaculxx wrote: | Nice tool! I personally use https://regex101.com/ as I like the | explanations and quick reference. | adambowles wrote: | >/h.llo/ the '.' 
matches any one character other than a new line | character... matches 'hello', 'hallo' but not 'h llo' | | in the cheatsheet is false. (https://regexr.com/4tc48) | | `.` can match any character except linebreaks (including | whitespace) | jodrellblank wrote: | `.` "can" match any character including linebreaks if the regex | engine is in re.DOTALL mode (Python) or SingleLine Mode (.Net). | mimixco wrote: | This is awesome! Thank you! I hate regex, too, but I love your | inline railroad diagramming tool. | geongeorgek wrote: | Haha thank you <3 | axegon wrote: | This is awesome but.... I don't hate regex. Matter of fact, I | love regex. | kazinator wrote: | There is no way I would just plop that IPv6 regex into any | serious program. :) | hyperpape wrote: | Really nice idea. | | I found that you can see your own regex with railroad diagram by | going to one of the prepopulated examples and editing it. | However, it wasn't clear to me that's the intended use of the | tool. It's either a little side-effect, or not super- | discoverable. | dana321 wrote: | One thing i've always missed from the Perl programming language | is the regex operators. | | You could do: |
|     my $var='foo foo bar and more bar foo!!!';
|     if($var=~/(foo|bar)/g){ # does the variable contain foo or bar?
|         print "foo! $1 removing foo..\n";
|         # remove our value..
|         $var=~s/$1//g;
|     }
| radiac wrote: | So did I: https://github.com/radiac/python-perl/ | lfglopes wrote: | I used to use this site http://txt2re.com which is now off the | grid, at the least since yesterday. :( | | Unlike most regex helpers, in this one you would start with the | text you want to filter/parse and then it would suggest you | possible extractions. | | Do you know any alternatives? | olalonde wrote: | Thumbs up for the relatable domain name. ___________________________________________________________________ (page generated 2020-01-31 23:00 UTC)