hngopher.com

       [HN Gopher] Java Verbal Expressions
       ___________________________________________________________________
        
       Java Verbal Expressions
        
       Author : victor106
       Score  : 220 points
       Date   : 2020-11-25 15:29 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | 6gvONxR4sf7o wrote:
       | regex suffers from the same problem that inlined code does.
       | 
       | For example, if you saw this in a code review, what would you
       | say:
       | 
       |     log_and_return(rank_by_time(compute_recommendations(get_data(client_id, date), find_nearest_neighbors(client_id))))
       | 
       | You'd tell them to create some intermediate variables. But when
       | it's a regex, apparently we're all fine with this:
       | 
       | /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\\+\$,\w]+@)?[A-Za-z0-9.-]+|
       | (?:www.|[-;:&=\\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\\+~%\/.\w-_]*)?\?
       | ?(?:[-\\+=&;%@.\w_]*)#?(?:[\w]*))?)/
        
       | TazeTSchnitzel wrote:
       | By using this "builder" syntax you gain:
       | 
       | * not having to distinguish special characters in the pattern
       | being matched from special characters part of regex syntax
       | 
       | * no ambiguity as to whether something is a digraph or not
       | 
       | * no escaping hell
       | 
       | * unambiguous human-readable names for all the regex features
       | used
       | 
       | * the ability to use whitespace to clearly separate different
       | parts of the regex
       | 
       | * the ability to comment parts of the regex
       | 
       | It sounds great to me. Have you ever tried making a regex
       | matching something with backslashes in it, and then you have to
       | put that regex inside a string literal? Have you ever had to
       | switch between different regex environments and not known which
       | symbols require escaping, or what is the correct way to write
       | something in a particular environment? I've had all these
       | problems.
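Roughly what the escaping complaint looks like in Java: to match one literal backslash, the regex needs \\, and the Java string literal doubles each of those again, so a single backslash costs four characters in source. A minimal sketch; the class name and the Windows-path example are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: extracts the user name from a Windows path.
final class BackslashDemo {
    // Regex syntax needs \\ to match one literal backslash, and the Java string
    // literal doubles each of those again, so one backslash costs four here.
    static final Pattern USER_DIR = Pattern.compile("C:\\\\Users\\\\(\\w+)");

    // Returns the captured user name, or null when the path does not match.
    static String userName(String path) {
        Matcher m = USER_DIR.matcher(path);
        return m.matches() ? m.group(1) : null;
    }
}
```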
        
         | TimTheTinker wrote:
         | Many of those gains can be had by using first-class and more
         | full-featured regexes, like those that are available in
         | other languages (Ruby, Perl):
         | 
         | - escaping hell isn't that much of a problem, since you're only
         | ever escaping something once (not like a regex in a string)
         | 
         | - several languages support separating regexes across several
         | lines
         | 
         | - regex commenting (including named groups) is a standard
         | feature in many languages, and that's besides using first-class
         | comments across multiple lines
         | 
         | I think you do have a point about digraphs (or homographs), but
         | unless I misunderstand, those would be a problem whether or not
         | the character(s) are part of a string vs. a first-class regex.
         | As for unambiguous human-readable names for regex features
         | used, tools like this (https://regexr.com/) are available and
         | very effective.
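For comparison, java.util.regex also supports named groups (added in Java 7), which covers part of what TimTheTinker describes. A minimal sketch; the class name and the date format are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: parse a YYYY-MM-DD date using named groups.
final class NamedGroupDemo {
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns {year, month, day}, or null when the input is not in that format.
    static int[] parse(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        // Groups are looked up by name instead of by position.
        return new int[] {
            Integer.parseInt(m.group("year")),
            Integer.parseInt(m.group("month")),
            Integer.parseInt(m.group("day"))
        };
    }
}
```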
         | 
         | I might prefer Java Verbal Expressions over java.util.regex,
         | but to me that's more of a knock on Java and its lack of
         | proper, first-class regexes than anything else.
        
       | jehna1 wrote:
       | Anyone looking for a non-Java implementation: This library has
       | been ported to 30+ languages, and you can find a list of them at
       | http://verbalexpressions.github.io/
        
       | yoz wrote:
       | Sure, it's much easier to read, especially when it comes to
       | finding and understanding a two-character diff in a 50-char
       | regex.
       | 
       | Sure, I get the benefits of type safety.
       | 
       | Sure, it'll save me time debugging when I accidentally create an
       | invalid regex.
       | 
       | But _what am I meant to do with all that time saved?_ Read a
       | book? Write more code? I don't get it. Let me waste the time on
       | a ludicrously arcane syntax where I spend half the time looking
       | at every bracket trying to understand if it's a control character
       | in that particular context, because the ego trip I get from
       | mastering this ridiculousness is HUGE!
       | 
       | (Yes, I understand regex syntax. I've been able to explain the
       | phrase "zero-width negative lookbehind assertion" for the past
       | twenty years. I inhaled the Friedl book and got utterly high on
       | the idea that the awesome power of regular expressions - which
       | are genuinely great in how they ease flexibility in accepting
       | input - is entwined with their completely inhuman syntax. But I
       | was wrong.)
        
         | grishka wrote:
         | I've never had any problems writing and debugging regular
         | expressions after I came across this: https://regex101.com
         | 
         | And since regexes are usually write-once, adding this
         | complexity on top of them serves no additional benefit. If
         | anything, it'd probably make it _harder_ for the next guy to
         | understand your code.
        
           | ziml77 wrote:
           | I bought RegexBuddy years ago and have loved it for
           | debugging. However it only runs on Windows. Found regex101
           | recently and I think it's a great alternative (though I
           | almost didn't check it out because the domain has SEO abuse
           | site vibes).
        
         | sixo wrote:
         | It's fun to write regex but it is absolutely miserable to read
         | it. This looks like an improvement.
        
           | szatkus wrote:
           | It is. For the most part, regexes like "\d+" are OK, but when
           | there is something more complicated I pull Verbal Expressions
           | into a project. So far, reactions in code review have been
           | mostly positive, or neutral at worst. If it were built into
           | the standard library I would probably use it instead of regex,
           | but adding a new dependency and interfacing it with libraries
           | that expect Java regex objects has its cost.
        
         | sergeykish wrote:
         | A regular expression defines a graph, so a graphical
         | representation looks like a better choice:
         | 
         | https://jex.im/regulex/#!flags=&re=%5E(%3F%3Ahttp)(%3F%3As)%...
         | 
         | Usage example -- CSS Syntax Module Level 3 documentation:
         | 
         | https://www.w3.org/TR/css-syntax-3/#string-token-diagram
         | 
         | and JSON specification:
         | 
         | https://www.json.org/json-en.html
         | 
         | I haven't found a visual editor, so I made a sample in quiver:
         | 
         | https://q.uiver.app/?q=WzAsMTEsWzIsM10sWzAsNl0sWzEsMCwiXiJdL...
        
           | wwright wrote:
           | Graphs may be more clear, but if we rule out visual editors,
           | IMO this approach is still a net positive.
        
         | maweki wrote:
         | > Sure, I get the benefits of type safety.
         | 
         | It seems not ;)
         | 
         | It's as if your Java compiler stopped warning you about
         | forgotten semicolons and instead errored out at runtime
         | when it reached the statement with the missing semicolon.
         | 
         | It's not your time saved. It's time saved not running the test
         | suite, for example. An uncompilable regex is a category of
         | errors that you can ban completely from your program, just as
         | Java bans syntax errors as a category of (runtime) errors. It's
         | time saved because no developer can break the regex in a way
         | that isn't a semantic error. It's time and mind saved not
         | thinking about a whole class of errors.
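To make maweki's point concrete with plain java.util.regex: a malformed pattern is a perfectly valid Java string, so javac accepts it and the error only appears when Pattern.compile runs and throws PatternSyntaxException. A minimal sketch with a hypothetical helper:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: reports whether a regex string compiles.
// javac cannot see inside the string, so this check happens at runtime.
final class RegexValidation {
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // e.getDescription() and e.getIndex() say what went wrong and where.
            return false;
        }
    }
}
```

Declaring patterns as static final fields is one common way to surface such errors when the class loads rather than deep inside a request, though it is still a runtime check rather than a compile-time one.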
        
           | lgeorget wrote:
           | Fortunately C++ won't suffer from this kind of problems since
           | there, you can make your regex builder a constexpr!
           | 
           | (lol)
        
        
           | yoz wrote:
           | Thank you for the clarification! To be clear: my post is
           | sarcastic, and I was trying to say that this library looks
           | like a significant usability improvement over traditional
           | regex syntax.
        
         | brown9-2 wrote:
         | Saving time is not just about getting to use it elsewhere - you
         | also save time fixing bugs and the harm they can cause.
        
         | pwdisswordfish4 wrote:
         | Verbose does not 'easier to read' make; especially when you
         | don't know whether 'anythingBut' means (?!...) or [^...].
         | 
         | Type safety is nice, sure, but it's a rather small benefit in
         | this case. It doesn't mean abandoning commonly-understood
         | syntax is worth it. Most regular expressions are short enough
         | to make errors visible with the naked (or IDE-assisted) eye.
         | 
         | This library at best looks like a crutch for a deficient
         | language (which Java admittedly is), and at worst an
         | unnecessary obfuscation layer.
        
           | deepsun wrote:
           | You don't like Java, I see.
           | 
           | This tool has little to do with Java, except that the author
           | decided to implement it in it. It's a regular expression
           | composer. You could implement it in any other general-purpose
           | language.
        
             | jehna1 wrote:
             | It already is. In 30+ of them. You can find them on:
             | http://verbalexpressions.github.io/
        
           | shawnz wrote:
           | Surprisingly it emits [^...]*
           | 
           | See: https://github.com/VerbalExpressions/JavaVerbalExpressions/b...
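The two readings pwdisswordfish4 mentions really do behave differently when the argument has more than one character. A sketch in plain java.util.regex, independent of the library (class name hypothetical): a negated class stops at any of the listed characters, while a tempered lookahead stops only where the whole string begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Two readings of "anything but abc", applied from the start of the input.
final class AnythingButDemo {
    // Negated character class: a run of characters that are not 'a', 'b' or 'c'.
    static final Pattern NOT_CHARS = Pattern.compile("[^abc]*");
    // Tempered lookahead: a run of characters that does not contain "abc".
    static final Pattern NOT_STRING = Pattern.compile("(?:(?!abc).)*");

    // Returns the prefix of the input that the pattern consumes.
    static String prefix(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }
}
```

If shawnz's reading of the source is right, the library's anythingBut corresponds to the first pattern.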
        
           | alisonkisk wrote:
           | The "deficient language" is "regex" not Java.
           | 
           | Escape codes and special characters for regex semantics are a
           | deficiency of the 1980s programming world, not Java.
        
             | romanoderoma wrote:
             | They can be hard to read, but I don't think they are
             | deficient; on the contrary, I think they are very elegant.
             | 
             | Stephen Cole Kleene was a brilliant mathematician, and when
             | he invented regular expressions in the 1950s, he
             | anticipated a lot of concepts that became popular in
             | computer science, such as recursion (which he also founded
             | as a branch of mathematics and computer science together
             | with Alonzo Church, Kurt Gödel and Alan Turing).
             | 
             | Java on the other hand has some deficiencies here and there
             | and it's not really a modern language free from old cruft
        
               | admax88q wrote:
               | > (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`
               | {|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-
               | \x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:
               | [a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[
               | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|
               | 2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-
               | \x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b
               | \x0c\x0e-\x7f])+)\\])
               | 
               | Such elegance.
               | 
               | Might as well write code in machine code while we're at
               | it.
        
               | Tainnor wrote:
               | That's like complaining that you can write ugly code in
               | any language. The problem isn't the regular expression,
               | it's that email addresses, while they technically may
               | form a regular language (not sure if they 100% do), are
               | an insanely complicated such language and not a very nice
               | one.
               | 
               | How would you write a specification for that language in
               | any other way that was _more_ elegant? Sure, you could
               | make it more verbose, but that wouldn't make it easier
               | to understand the whole of it, or why it is the way it
               | is.
        
               | admax88q wrote:
               | Almost every RFC writes their grammars in some form of
               | BNF not in regular expressions. RFC are written to be
               | understandable.
               | 
               | > Sure, you could make it more verbose, but that wouldn't
               | make it easier to understand the whole of it, or why it
               | is the way it is.
               | 
               | Absolutely it would. The way to understand a large thing
               | is to understand the smaller components and then put them
               | together. Regular expressions do not compose well.
        
               | Tainnor wrote:
               | > Regular expressions do not compose well.
               | 
               | That is patently untrue. Regular expressions compose
               | under a number of important mathematical operations, such
               | as union, intersection and concatenation. If your PL
               | supports string interpolation, it's trivial to compose
               | them in these ways (well ok, maybe not intersection).
               | Nobody says that your regex needs to be written as a
               | single string.
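Tainnor's composition argument can be sketched as two small Java helpers. The names seq and alt are hypothetical, and wrapping each part in a non-capturing group is one reasonable convention (it keeps an alternation inside one part from swallowing its neighbours), not something the thread specifies.

```java
import java.util.regex.Pattern;

// Hypothetical helpers: regexes composed by concatenation and union.
final class Compose {
    // Concatenation: each part is wrapped in (?:...) so that an alternation
    // inside one part, e.g. "a|b", cannot spill into the next part.
    static String seq(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append("(?:").append(p).append(')');
        }
        return sb.toString();
    }

    // Union (alternation) of the given parts.
    static String alt(String... parts) {
        return "(?:" + String.join("|", parts) + ")";
    }

    // Small named pieces, assembled into one pattern.
    static final String SCHEME = alt("http", "https");
    static final String HOST = "[a-z0-9.-]+";
    static final Pattern URL = Pattern.compile(seq(SCHEME, "://", HOST));
}
```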
        
               | ajuc wrote:
               | Elegant regexes are almost unheard of in real world no
               | matter if it's e-mail or anything else.
        
               | notreallytrue wrote:
               | Principia Mathematica wants a word in private
               | 
               | P.S. Do you realise how much harder it would be to
               | understand the same thing in machine language?
        
               | Yeroc wrote:
               | Is there any language 20+ years old without cruft?
        
               | notreallytrue wrote:
               | Haskell? (30 years old)
               | 
               | Where are the Lispers when you need them? :)
        
               | wwright wrote:
               | Haskell absolutely has waaaay too much cruft. Have you
               | read the 30 page articles recommending which extensions
               | to use? Have you ever seen MTL? Read any documentation
               | written by Edward Kmett?
        
           | colonwqbang wrote:
           | > Verbose does not 'easier to read' make
           | 
           | Java is founded on the opposite principle, I think.
        
           | dehrmann wrote:
           | Yeah; I'd rather have proper multiline strings in Java and a
           | regex documented with the COMMENTS flag set. What I don't
           | need is a regex builder. Or SQL builder, for that matter.
        
        
             | AlphaSite wrote:
             | I think Java has (or is getting) multiline strings now.
        
               | MHordecki wrote:
               | Java got them in the most recent version 15:
               | https://openjdk.java.net/jeps/378
        
               | dehrmann wrote:
               | This took them _way_ too long for how easy it is to add
               | and how many headaches it would prevent.
        
       | oweiler wrote:
       | I'm an average developer but never found regular expressions too
       | hard to write or even read.
        
         | theparanoid wrote:
         | Anything but the simplest regexes are tricky to correctly
         | write.
        
       | jacobwilliamroy wrote:
       | How do I learn regex? I get confused because it seems like maybe
       | there's more than one kind of regex floating around out there,
       | and since regex is made of lots of punctuation symbols, it's very
       | hard to search for things about it on the web. Is there a single
       | book I can read? A couple books? Does it depend on my runtime
       | environment?
        
       | gambler wrote:
       | Once you start thinking about it, it's mind boggling that we have
       | thousands of languages and yet most of them don't have built-in
       | facilities to construct and parse grammars (at least context-free
       | ones). Every single designer seems to think that _their_ language
       | is finally good enough and will not be used as a starting point
       | for another one.
        
         | throwaway_pdp09 wrote:
         | Because building in a parser is inappropriate; it isn't
         | generally worth it. You use a separate tool or framework, you
         | don't build it into a language.
        
         | rbonvall wrote:
         | Raku (Perl 6) has grammars as first-class citizens:
         | https://docs.raku.org/language/grammars#Creating_grammars
        
       | TimTheTinker wrote:
       | I am all for developer ergonomics, and I'm a fan of Ruby... but
       | the problems this library would add to a codebase/project seem
       | too big to be worth the benefits:
       | 
       | - non-standard syntax requiring its own documentation, which
       | developers would have to consult separately (even if they already
       | know regular expressions) to modify generated regular expressions
       | 
       | - removing the ability to test and validate regular expressions
       | independently of the codebase (say, in the terminal, a small
       | shell script, or using an online tool)
       | 
       | - a new rabbit hole to traverse when debugging a problem
       | 
       | - assuming the security risks associated with handing over regex-
       | building to a library built by someone else (even more so if the
       | regex is parsing private or protected data)
       | 
       | - adding a new dependency that may or may not be maintained in
       | the future
       | 
       | For those who would want to use this library, I would suggest
       | using a separate tool to build and/or understand regular
       | expressions. Here's one example, and I'm sure there are others:
       | https://regexr.com/
        
       | tasogare wrote:
       | I wrote a little class like this with a fluent API in C# to
       | generate regexes in a project that required big ones. It makes
       | working with regex super easy and super maintainable.
        
       | soco wrote:
       | Looks a bit abandoned, though, doesn't it? Otherwise I'd love it
       | for the safety and readability (while I'd still need to re-learn
       | everything I've forgotten since I last used regex half a year
       | ago).
        
       | swlkr wrote:
       | It's semi-related, but if you're into easier regex, have a look
       | at Janet's PEGs
       | 
       | https://janet-lang.org/docs/peg.html
        
       | jjevanoorschot wrote:
       | For everyone that doesn't see the point, take a look at the
       | example of parsing a long string [0]. The verbal expression is
       | _much_ easier to read than the regular expression.
       | 
       | [0]
       | https://github.com/VerbalExpressions/JavaVerbalExpressions/w...
        
         | pavon wrote:
         | I don't see it. The regex is mostly hard to read because they
         | formatted it poorly and put in a bunch of unnecessary non-
         | capturing groups. I find this to be just as easy (if not
         | easier) to read as their first example:
         | 
         |     String pattern = (
         |         "(\\d+)\t" +
         |         "(\\d+)\t" +
         |         "([0-1])\t" +
         |         "(http://localhost:20\\d{3})\t" +
         |         "([0-1])\t" +
         |         "(\\d+)\t" +
         |         "([0-1])\t" +
         |         "(\\d+)\t" +
         |         "(\\d+)\t" +
         |         "([0-1])\t" +
         |         "(\\d+)\t" +
         |         "(STR[0-2])"
         |     );
         | 
         | And this is just as easy to read as their second example:
         | 
         |     String num = "(\\d+)\t";
         |     String bool = "([0-1])\t";
         |     String url = "(http://localhost:20\\d{3})\t";
         |     String str = "(STR[0-2])";
         |     String pattern =
         |         num + num + bool + url + bool + num + bool + num + num + bool + num + str;
         | 
         | And yes, I do frequently split up my regexes like that to make
         | them more readable.
         | 
         | The only improvement I see is that you don't have messy
         | escaping in the url. That is genuinely nice. It motivates me to
         | start using a regEsc() function instead of doing it by hand.
         | However, I find "capt().endCapture()", and other verboseness to
         | be a step backwards.
         | 
         | Edit: Actually, from what I can tell, all the escaping was
         | unnecessary in this case as well. Updated examples without
         | unneeded escape characters.
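pavon's second example as compilable Java, for reference: in a Java string literal \d has to be written \\d, and Pattern.quote already plays the role of the proposed regEsc() helper (it wraps a literal in \Q...\E). A sketch; the class name and the sample field values in the usage check are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: pavon's composed tab-separated pattern.
final class TsvLine {
    static final String NUM = "(\\d+)\t";
    static final String BOOL = "([0-1])\t";
    // Pattern.quote escapes the literal URL prefix, like the proposed regEsc().
    static final String URL = "(" + Pattern.quote("http://localhost:20") + "\\d{3})\t";
    static final String STR = "(STR[0-2])";
    static final Pattern LINE = Pattern.compile(
            NUM + NUM + BOOL + URL + BOOL + NUM + BOOL + NUM + NUM + BOOL + NUM + STR);

    // Returns the 12 captured fields, or null when the line does not match.
    static String[] fields(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }
}
```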
        
       | flying_sheep wrote:
       | That really depends on how complicated the regular expression is.
       | To me this debate sounds like arguing assembly vs. C. We will
       | need some sort of abstraction to develop higher-level stuff in
       | case we need it.
        
       | jjice wrote:
       | While neat, I think that if you're a developer, you'd be better
       | off learning basic regular expressions instead so you can use
       | them in whatever language you'd like. Depending on this would
       | probably just make moving to a new code base that doesn't use
       | this a lot more confusing.
       | 
       | A normal regex with a comment above it explaining what it does
       | (for complex cases) always worked well for me.
        
         | nsxwolf wrote:
         | I can't learn them. I've tried for over 20 years and every time
         | I use them the knowledge is deleted from my brain immediately.
         | A library like this would be very helpful if it worked.
         | 
         | One problem is that I'm more likely to need regex almost
         | anywhere but Java code.
        
           | rhacker wrote:
           | At the bottom there's a list of other languages that support
           | the same API
        
         | TonyTrapp wrote:
         | Even as a developer you may have to assemble a regular
         | expression at runtime, at which point a library that can do it
         | for you may be much more handy than having to assemble the
         | string yourself.
         | 
         | And even if you know regex by heart - assembling it with
         | function calls can still be better / safer just like you
         | shouldn't insert SQL parameters by hand into your SQL query
         | strings.
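TonyTrapp's SQL analogy carries over fairly directly: when a pattern is assembled from runtime input, quoting each piece stops that input from being parsed as regex syntax, much as a bound parameter stops a value from being parsed as SQL. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical example: build a pattern at runtime from user-supplied terms.
final class KeywordMatcher {
    // Finds any of the given terms, each treated as literal text.
    // Note: an empty list yields "(?:)", which matches the empty string everywhere.
    static Pattern anyOf(List<String> terms) {
        String alternation = terms.stream()
                .map(Pattern::quote)          // escape each term, like a bound parameter
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + alternation + ")");
    }
}
```

Without Pattern.quote, a term such as "(todo" would make Pattern.compile throw, and "c++" would be read as a quantifier rather than as the literal text.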
        
         | spatx wrote:
         | I think there is value in both cases. I've seen many developers
         | that have struggled with regex even with all those hundreds of
         | tools to learn and to build/test regex. This could be useful to
         | them to start with, and they can learn regex according to their
         | time/needs. I see solutions like this as a choice, and the fact
         | that people are using these shows that there is value in having
         | that choice, even if it is not obvious to us at first glance.
        
         | patal wrote:
         | That does not work so well if you're working in a team. A
         | fairly complex regular expression is always hard to read.
         | 
         | We see this as early as in code review and as late as when you
         | find a production bug because expectations of the surrounding
         | code have changed.
         | 
         | For those reasons, we usually break regexes into parts anyway,
         | and name and comment the single parts. Using the library's
         | example, we might have:                 protocol =
         | "^(?:http)(?:s)?"       protocol_separator = "(?:\:\/\/)"
         | url = "(?:www\.)?(?:[^\ ]*)$"              regex = protocol +
         | protocol_separator + url
         | 
         | Which turns out to be in the direction of these Java Verbal
         | Expressions. I find the Verbal Expressions idea really
         | enlightening.
        
       | 1f60c wrote:
       | The HN title isn't very informative.
       | 
       | Maybe you could change it to something like:
       | 
       |     Java Verbal Expressions: a DSL for regular expressions
        
        
       | skocznymroczny wrote:
       | Looks interesting. I find that all my regexes are pretty much
       | write-only. When I come back to them a few months later, I can't
       | make much of them and it's easier for me to start from scratch.
       | Tools such as https://regex101.com/ are amazing though for
       | development of regexes and later trying to make sense of them.
        
       | jug wrote:
       | I'm not sure if this is great or crazy! I try to not be swayed by
       | the handpicked examples because this at least _feels_ like a
       | design that could get messy once you try to do the particularly
       | gnarly regexps that this library claims it was designed for. If
       | it's great, it should already have been done long ago, hmm...
        
       | bwestergard wrote:
       | This is a nice API. It seems to get right up to the edge of
       | becoming a parser combinator library.
       | 
       | Is it actually improving performance to use the regular
       | expressions internally to evaluate matches?
        
       | justin_oaks wrote:
       | This project would be better if it wasn't exactly a 1-to-1
       | mapping from words/methods to regular expressions. For example,
       | the regex "\d+" maps to the code "digits().oneOrMore()". That
       | doesn't read well in English because it's odd to have an
       | adjective after the noun (i.e. we say "red bird" not "bird red").
       | 
       | Also, a serious weakness in regex is they are "write only", or
       | hard to read. That's because they are compact and don't have
       | discernible sections that are then assembled together.
       | 
       | You can do that yourself in Java by assigning chunks of regex to
       | variables and then concatenating them together, but the regex
       | engine doesn't let you do that itself. You can't name sections of
       | the regex or insert comments into it.
       | 
       | The example
       | 
       |     ^(?:http)(?:s)?(?:\:\/\/)(?:www\.)?(?:[^\ ]*)$
       | 
       | could be better if it could be broken down into named pieces or
       | commented like this:
       | 
       |     ^
       |     (?:http)(?:s)?  # http or https
       |     (?:\:\/\/)      # ://
       |     (?:www\.)?      # optional www.
       |     (?:[^\ ]*)      # rest of URL (no spaces)
       |     $
        
         | dmarlow wrote:
         | I love your example of how it should be explained. This helps
         | people correlate the verbal aspects to the regex parts they
         | described. This ultimately reinforces and helps people learn
         | regex more deeply.
        
         | throwaway_pdp09 wrote:
         | I thought java regexes had comments?
         | 
         | https://docs.oracle.com/en/java/javase/11/docs/api/java.base...
        
           | justin_oaks wrote:
           | Huh, I didn't know that. I've read through a fair amount of
           | Java code with regexes and never seen anyone use comments.
           | Maybe it's because Java doesn't have proper multi-line string
           | support built into the language.
           | 
           | If you don't have multi-line support in the language then
           | you're more likely to put the comments outside the string:
           | 
           |     String regex =
           |           "^"
           |         + "(?:http)(?:s)?"  // http or https
           |         + "(?:\\:\\/\\/)"   // ://
           |         + "(?:www\\.)?"     // optional www.
           |         + "(?:[^\\ ]*)"     // rest of URL (no spaces)
           |         + "$";
        
             | throwaway_pdp09 wrote:
             | ... which has to be a better way of doing it (comments +
             | regexps in digestible chunks) than having a rather wordy
             | library.
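throwaway_pdp09 is right that java.util.regex supports comments: with Pattern.COMMENTS (or an inline (?x)), whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. Combined with the text blocks from JEP 378 mentioned above, justin_oaks's commented layout can be kept inside a single string. A sketch assuming Java 15 or later; it uses \S* for the last part, which excludes all whitespace rather than only spaces, to avoid relying on how a literal space inside a character class is treated in this mode.

```java
import java.util.regex.Pattern;

// Hypothetical class: the URL example written with the COMMENTS flag.
// Whitespace (including newlines) is ignored; '#' starts a comment.
final class CommentedUrl {
    static final Pattern URL = Pattern.compile("""
            ^
            (?:http)(?:s)?   # http or https
            ://              # scheme separator
            (?:www\\.)?      # optional www.
            \\S*             # rest of URL (no whitespace)
            $
            """, Pattern.COMMENTS);
}
```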
        
       | abhinai wrote:
       | Beautiful though a little verbose!
        
       | chrisbrandow wrote:
       | solve a problem with regex: now you have 2 problems.
       | 
       | well, now you have 3.
        
       | ebiester wrote:
       | I wrote one of these in 2002, back in college, after being
       | inspired by Icon and SNOBOL. From Wikipedia:
       | 
       |     s := "this is a string"
       |     s ? {                               # Establish string scanning environment
       |         while not pos(0) do {           # Test for end of string
       |             tab(many(' '))              # Skip past any blanks
       |             word := tab(upto(' ') | 0)  # the next word is up to the next blank -or- the end of the line
       |             write(word)                 # write the word
       |         }
       |     }
       | 
       | I really think we lost out when we went toward regular
       | expressions rather than SNOBOL/Icon syntax, but I don't think a
       | direct substitute is as much the issue.
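For readers who have not used Icon: the loop above moves a cursor through the subject string instead of matching one pattern against it. A rough Java analogue of that cursor-based style follows; the class is hypothetical and it makes no attempt at Icon's goal-directed evaluation (for example, it simply skips trailing blanks).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a rough, cursor-based analogue of the Icon loop.
final class WordScanner {
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        int pos = 0;                                   // scanning position
        while (pos < s.length()) {                     // while not pos(0)
            while (pos < s.length() && s.charAt(pos) == ' ') {
                pos++;                                 // tab(many(' ')): skip blanks
            }
            if (pos == s.length()) {
                break;                                 // only trailing blanks were left
            }
            int end = s.indexOf(' ', pos);             // upto(' ')
            if (end < 0) {
                end = s.length();                      // ... | 0: or the end of the string
            }
            out.add(s.substring(pos, end));            // word := tab(...)
            pos = end;
        }
        return out;
    }
}
```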
        
       | prabhatjha wrote:
       | This is a fantastic idea -- the kind you see and go why the heck
       | this was not done before. Such a huge time saver.
        
       | redmorphium wrote:
       | Reminds me of https://github.com/francisrstokes/super-expressive
        
         | PaulHoule wrote:
         | That kind of thing works even better in Java because the static
         | type system enforces it.
         | 
         | In particular, generic methods don't have the problem of type
         | erasure that affects generic classes, so many things you would
         | want to do with types "just work".
         | 
         | Almost everybody is afraid of it, but $ works just fine as an
         | identifier and can be used to make a DSL that looks like jQuery
         | in Java.
         | 
         | Maybe someday I will write a class like:
         | class("some.namespace.MyClass").method(...)
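PaulHoule's $ remark is easy to check: $ is a legal Java identifier character, so a static method named $ compiles (the JLS recommends reserving $ for generated code, but does not forbid it). A minimal sketch of a fluent, jQuery-style regex builder; every name here is hypothetical, and this is not the JavaVerbalExpressions API.

```java
import java.util.regex.Pattern;

// Hypothetical fluent builder with a "$" entry point.
final class Rx {
    private final StringBuilder sb = new StringBuilder();

    private Rx() {}

    // jQuery-style entry point; '$' is a legal Java identifier.
    static Rx $() {
        return new Rx();
    }

    // Appends a literal, escaped so it cannot be read as regex syntax.
    Rx text(String literal) {
        sb.append(Pattern.quote(literal));
        return this;
    }

    // Appends an optional literal.
    Rx optional(String literal) {
        sb.append("(?:").append(Pattern.quote(literal)).append(")?");
        return this;
    }

    // Appends one or more digits.
    Rx digits() {
        sb.append("\\d+");
        return this;
    }

    Pattern build() {
        return Pattern.compile(sb.toString());
    }
}
```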
        
       | antpls wrote:
       | That would definitely help code reviews and maintenance. Is there
       | anything similar for Python?
        
       | ajainy wrote:
       | Of course, as others pointed out, writing the expression directly
       | might be optimal, or every dev should learn about it.
       | 
       | BUT in my whole career, whenever I have had to use regex, I have
       | spent a couple of hours learning and testing. This kind of library
       | for Java opens doors to many other things (testability, a default
       | library using default methods, integration with streaming, etc.).
       | And as the community adds to it, it can be optimized internally;
       | all the end user needs to do is upgrade versions. It could even
       | be made part of the javax validation specs.
        
         | murkle wrote:
         | Another key point: makes the code readable!
        
       | dailygrind___ wrote:
       | I think Regex is too low-level and a problem worth abstracting.
       | It works fine with simple patterns but I don't really see how a
       | pattern like this:
       | 
       | /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\\+\$,\w]+@)?[A-Za-z0-9.-]+|
       | (?:www.|[-;:&=\\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\\+~%\/.\w-_]
       | _)?\??(?:[-\\+= &;%@.\w_]_)#?(?:[\w]*))?)/
       | 
       | (https://stackoverflow.com/questions/161738/what-is-the-best-...)
       | 
       | contributes to readability.
        
       | ridaj wrote:
       | Is the implied contention that
       | `regex().capt().digit().oneOrMore().endCapt().tab()` is easier to
       | read than `([0-9]+)\t`?
       | 
       | If so, maybe this isn't for everyone :)
        
       | miked85 wrote:
       | I feel like one would be much better off and more efficient by
       | just learning regular expressions.
        
         | hcarvalhoalves wrote:
         | The fact there are countless RegEx cheatsheets and pages like
         | https://regex101.com/ or https://regexr.com/ is evidence
         | RegExes are not intuitive or easy to remember. Composing plain-
         | english functions can be easier to remember, and editors can
         | provide auto-complete.
        
           | ed25519FUUU wrote:
           | The argument isn't that they're easy to use, it's that
           | they're widely used and widely available in virtually all
           | languages. You'll encounter them.
           | 
           | The time it takes to learn regular expressions will pay off
           | because you'll be reading them and writing them for your
           | whole career.
        
             | king_magic wrote:
             | Eh, not necessarily. 15 years into my career, I can count
             | the number of times I've needed regular expressions on two
             | fingers.
        
           | flatiron wrote:
           | IntelliJ at least has a built-in regex maker that you can
           | test against strings in the IDE. Pretty close to
           | autocomplete.
        
           | pwdisswordfish4 wrote:
           | It's only evidence that they have to be learned, like
           | everything else.
        
           | djeiasbsbo wrote:
           | I'd say the bigger issue is the different regex
           | implementations. If you use Java, Javascript and grep you
           | already have to know the peculiarities of each
           | implementation...
        
         | teknopurge wrote:
         | I love seeing new things and building things, but I also want
         | to understand why people would find value in this. Is it
         | because people are learning things differently and find this
         | easier to digest than regexes, or than native substring
         | tokenization/boolean primitives?
         | 
         | The new me is being less critical and more positive...
         | (smileyface.jpg)
        
         | nerdponx wrote:
         | I agree.
         | 
         | But if you want better readability and comments, Python's
         | "verbose" regex (?x) is a beautiful thing. You can usually also
         | just construct regular expressions incrementally by
         | concatenating strings or whatever your language supports.
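For comparison, a sketch of the same two ideas with `java.util.regex`: the `Pattern.COMMENTS` flag (inline form `(?x)`) ignores whitespace and treats `#` up to the end of a line as a comment, and a pattern can also be assembled from named string pieces. The date format and the class and field names are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VerboseRegex {
    // Pattern.COMMENTS: whitespace is ignored and '#' starts a comment that runs to the end of the line.
    static final Pattern ISO_DATE = Pattern.compile(
            "(?<year>\\d{4})   # four-digit year\n"
          + "-                 # literal separator\n"
          + "(?<month>\\d{2})  # two-digit month\n"
          + "-\n"
          + "(?<day>\\d{2})    # two-digit day\n",
            Pattern.COMMENTS);

    // The same flag can be switched on inline with (?x); the spaces below are then ignored.
    static final Pattern INLINE = Pattern.compile("(?x) \\d{4} - \\d{2} - \\d{2}");

    // Incremental construction: name the pieces, then concatenate them.
    static final String YEAR = "\\d{4}";
    static final String TWO_DIGITS = "\\d{2}";
    static final Pattern CONCATENATED = Pattern.compile(YEAR + "-" + TWO_DIGITS + "-" + TWO_DIGITS);

    // Returns the month field, or null if the input is not a date in this format.
    static String month(String input) {
        Matcher m = ISO_DATE.matcher(input);
        return m.matches() ? m.group("month") : null;
    }
}
```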
        
       | tekknolagi wrote:
       | A buddy of mine made Remake
       | (https://docs.rs/remake/0.1.0/remake/) with this kind of thing in
       | mind. It's a DSL for composing regular expressions in a readable
       | way.
        
       | throwsofaraway wrote:
       | Trying to simplify something that doesn't simplify inherently
       | isn't always a good idea. Regex is pretty close to the lowest
       | level of abstraction that is necessary to get the job done. It
       | could probably be improved on, but probably not by much.
       | 
       | Some commenters below said this Java syntax is a good idea,
       | citing the endless number of regex cheatsheets as a testament to
       | why regex is not simple enough and should be replaced. It's
       | almost silly that this is even an argument on HN. Take for
       | example quantum physics, there are lots of videos and guides that
       | try to explain how it works, in fact some of the smartest people
       | tried to explain it, even Richard Feynman. But he famously said
       | if you think you understand quantum mechanics you don't
       | understand quantum mechanics.
       | 
       | Some things cannot be reduced any further, this does not mean
       | those things are always simple in nature or somehow were designed
       | in a convoluted way on purpose.
       | 
       | At least when it comes to regex it's important to keep in mind
       | what Einstein said, "everything should be as simple as possible
       | but no simpler."
       | 
       | It's ironic that people apply reductionism to simplify regex,
       | something one could argue is itself a prime example of
       | reductionist design, yet they complain it's too abstract while
       | applying reductionism.
        
       | rendall wrote:
       | This project seems to be rediscovered every so often
       | 
       | https://news.ycombinator.com/from?site=github.com/verbalexpr...
        
       | chubot wrote:
       | Related: Oil has a regex syntax that composes and doesn't have
       | escaping problems:
       | 
       | https://www.oilshell.org/release/latest/doc/eggex.html
       | 
       | Direct link to example:
       | 
       | https://www.oilshell.org/release/latest/doc/eggex.html#examp...
       | 
       | A longer example:
       | 
       | http://www.oilshell.org/blog/2019/12/22.html#eggex
        
       | cutler wrote:
       | Maybe if Java left the Stone Age and fixed the need to escape
       | regex metacharacters this wouldn't be necessary.
        
       | cratermoon wrote:
       | This link has been posted 15 times on HN. The first time was over 7
       | years ago https://news.ycombinator.com/item?id=6200070
        
       | tomp wrote:
       | Is it just me or does this seem like a very bad idea? I mean it
       | _seems_ nicer but the reality is, if you don't know how Regexes
       | work, you won't understand the nuances of the "verbal" regex
       | either... Also, some optimisation maybe?
       | ^(?:http)(?:s)?(?:\:\/\/)(?:www\.)?(?:[^\ ]*)$
       | 
       | could be better written as
       | ^https?://(?:www[.])?[^ ]*$
       | 
       | Or am I missing something? In that case, I'll readily admit this
       | library is a good idea :)
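One way to check the simplification is to run both patterns over some sample inputs. This is a sketch assuming a few hand-picked test URLs; agreement on samples is evidence, not a proof that the two patterns are equivalent.

```java
import java.util.regex.Pattern;

final class UrlPatterns {
    // The generated pattern quoted in the comment above, with Java string escapes added.
    static final Pattern GENERATED =
            Pattern.compile("^(?:http)(?:s)?(?:\\:\\/\\/)(?:www\\.)?(?:[^\\ ]*)$");

    // The hand-simplified version.
    static final Pattern SIMPLIFIED = Pattern.compile("^https?://(?:www[.])?[^ ]*$");

    // True when both patterns give the same whole-input match result for every sample.
    static boolean agree(String... inputs) {
        for (String s : inputs) {
            if (GENERATED.matcher(s).matches() != SIMPLIFIED.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Since `matches()` must consume the whole input, the `^...$` anchors are redundant here; they are kept only to mirror the strings in the comment.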
        
         | wffurr wrote:
         | Now we have three problems...
        
         | dkarl wrote:
         | Sadly, the ability to mash autocomplete instead of looking at
         | the doc page for regular expressions will be a major selling
         | point.
        
           | Gaelan wrote:
           | Why sadly? Autocomplete does a ton for making APIs more
           | discoverable and easy to use.
        
             | dkarl wrote:
             | The intersection of debugging regexes and debugging code
             | written by someone cycling through autocomplete looking for
             | methods that sound right should not be real. It should be a
             | myth, a region of programmer hell, a scary story to tell
             | children about what will happen to them after they die if
             | they don't document their code. May Dijkstra strike down
             | anyone who succeeds in bringing this horrible idea to
             | production.
        
         | InfiniteRand wrote:
         | I think there's a certain use case for this, a moderate regex
         | user who's not an expert and not fully comfortable with regexes
         | but knows the basics, and who is in a project where they need
         | to heavily use regexes for a limited amount of time and where
         | they will need to maintain this code going forward.
         | 
         | If you use regexes a lot, you are better off learning regexes,
         | if you use regexes a little, this is a lot to learn to avoid
         | learning a little about regexes. But there is a moderate user
         | sweet spot where I could see this useful.
        
         | ajuc wrote:
         | I like it as a simplistic builder. Much easier to read,
         | autocompletes, and (I assume) handles escaping for you (because
         | it knows you put only raw data inside).
         | 
         | Just escaping alone is a big selling point for me.
        
         | toxik wrote:
         | In fact, why test for www at all? It is a subset of [^ ]*
         | anyway.
        
           | [deleted]
        
         | CapacitorSet wrote:
         | It seems that the cruft really boils down to using groups even
         | where there is no ?/*/+ quantifier.
        
         | simias wrote:
         | I think it's a great idea... if you already know regex.
         | Effectively it's just a different syntax for the same construct
         | after all, it doesn't simplify anything, it just makes it more
         | readable. Oh and it makes escaping a non-issue, which already
         | almost sells me on the idea completely, since it seems that 50%
         | of the time I spend writing regex is figuring out what needs
         | escaping and how.
         | 
         | Writing regexes is not much of an issue usually (although the
         | many dialects in common use are always a source of frustration)
         | but reading them is always a pain, for me at least. For quick
         | and dirty shell scripts or vim editing it's great, for stuff
         | that's supposed to be long lived and actively maintained in a
         | codebase I think this verbal approach is a great idea, at least
         | in theory.
         | 
         | Regarding the optimization of the intermediate result it should
         | only be a problem if you actually need to output these regexes
         | for other uses or if you need to compile many of them at
         | runtime with performance constraints. If your regexes are pre-
         | compiled then the resulting DFA should look the same as far as
         | I can tell.
         | 
         | If somebody makes a Rust crate with a similar concept I'll be
         | sure to try it out next time I have to write regexes in a
         | codebase.
        
           | dehrmann wrote:
           | > I think it's a great idea... if you already know regex
           | 
           | It's actually a bad idea in this case because regex is mostly
           | the same in every modern language, so if you know it, you
           | know it everywhere. What you don't know is this.
           | 
           | I agree with the common complaint that regex is effectively
           | write-only, but this is only half due to its terse syntax. A
           | pattern can be pretty complex on its own, and complex things
           | are hard to understand. Imagine what code matching the behavior
           | of a complex regex would look like.
        
             | simias wrote:
             | > It's actually a bad idea in this case because regex is
             | mostly the same in every modern language, so if you know
             | it, you know it everywhere. What you don't know is this.
             | 
             | I disagree, at least in my experience there are significant
             | differences between the regex engines I use
             | regularly. In no particular order: are parens and other
             | operators treated literally by default or do they need to
             | be escaped? Are character classes like '[:alpha:]'
             | understood, or do I need to write them explicitly?
             | Similarly, do I have access to \w \W \s and friends? Can I
             | use + to mean {1,} ? Can I use '?' to match 0 or 1 (common)
             | or do I have to use = (vim)? Or maybe just {0,1}? But then
             | should I escape the braces? Do I have recursion? Do I have
             | named captures?
             | 
             | Those are not theoretical concerns, that's stuff I
             | routinely end up getting wrong because I forget that this
             | one feature that works in pcre does not work in vim or
             | works differently in sed etc...
        
               | dehrmann wrote:
               | > are parens and other operators treated literally by
               | default or do they need to be escaped?
               | 
               | > Can I use + to mean {1,} ? Can I use '?' to match 0 or
               | 1 (common) or do I have to use = (vim)? Or maybe just
               | {0,1}? But then should I escape the braces?
               | 
               | I think that's just older tools like vi and sed. Perl,
               | Python, Java, and Javascript use a similar modern version
               | where + and ? work, and parentheses and braces don't need
               | to be escaped.
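A small sketch of how `java.util.regex` answers a few of the questions above. The method names are illustrative; the behaviour shown (no POSIX bracket expressions, `\p{Alpha}` instead, unescaped `?` and `+`) is standard Java.

```java
import java.util.regex.Pattern;

// A few Java-dialect facts relevant to the questions above (java.util.regex).
final class JavaDialect {
    // No POSIX bracket expressions: Java spells [[:alpha:]] as \p{Alpha} (US-ASCII only by default).
    static boolean allLetters(String s) {
        return Pattern.matches("\\p{Alpha}+", s);
    }

    // Pitfall: Java reads [[:alpha:]] as a union with the nested class [:alpha:],
    // i.e. only the characters ':', 'a', 'l', 'p' and 'h'.
    static boolean posixLookalike(String s) {
        return Pattern.matches("[[:alpha:]]+", s);
    }

    // As in Perl, Python and JavaScript: ? and + are unescaped quantifiers.
    static boolean colour(String s) {
        return Pattern.matches("colou?r", s);
    }
}
```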
        
             | lucb1e wrote:
             | > if you know it, you know it everywhere. What you don't
             | know is this.
             | 
             | Right, one language might have anythingBut(" ").endofline()
             | and the next language might have a different . operator
             | like anythingBut(" ")->endofline() or it might even require
             | nesting calls. None of these things are a significant
             | hurdle and if we standardize the names (endofline,
             | anythingBut, ...) then you can make the same argument. It's
             | a chicken and egg argument: just use regex because that
             | works everywhere -> it's not universally implemented -> it
             | won't work everywhere.
             | 
             | And aside from that, I have a similar experience to the
             | sibling comment: when using some command line tool that I
             | forgot (is it sed? Vim?) the default is that \( is a
             | capture group whereas in normal regex ( is a capture group.
             | Grep offers you three regex variants to choose from. I have
             | to look up regex syntax or do trial and error every time I
             | don't use a language that I use daily. And I don't know all
             | of regex to begin with, I just know everything I ever
             | needed but people posted examples here with (?:x) which I
             | don't know. I once read it and remembered it for a few days
             | I think... so anyway, consistent and descriptive method
             | names seem a lot easier especially when you consider
             | autocompleting IDEs.
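For reference, `(?:x)` is a non-capturing group: it groups `x` (so a quantifier can apply to all of it) without creating a numbered capture. A sketch in Java, with illustrative helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Groups {
    // Number of capturing groups in a pattern; (?:...) groups are not counted.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Group 1 of a whole-input match, or null if the input does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }
}
```

A repeated capturing group such as `(ab)+` keeps only its last iteration, which is one reason generated patterns often prefer `(?:...)` for plain grouping.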
        
           | hansjorg wrote:
           | Rust version of the same library:
           | 
           | https://github.com/VerbalExpressions/RustVerbalExpressions
           | 
           | Implementations for 36 different languages:
           | 
           | http://verbalexpressions.github.io/
        
         | pwdisswordfish4 wrote:
         | Well, there's at least one advantage: apparently this builder
         | library automatically escapes literal strings passed to it, so
         | you no longer need to worry about injection bugs if you
         | construct patterns dynamically (cf. parametrised queries versus
         | 'come on, just use mysql_real_escape_string, it's not that
         | hard').
         | 
         | I'm not sure this alone pulls its weight, though; most of the
         | time, regular expressions are fixed at compile time. And I'd
         | still prefer something that mostly preserves commonly-
         | understood pattern syntax. Having to guess whether
         | 'anythingBut' means (?!...) or [^...] is not encouraging.
         | 
         | (This was apparently ported from JavaScript, where it is even
         | more pointless: template literals can take care of the escaping
         | part without abandoning standard pattern syntax. But as far as
         | I know, Java has no equivalent feature.)
        
         | bjarneh wrote:
         | > Is it just me or does this seem like a very bad idea?
         | 
         | It's not just you. As you say this can only truly be used by
         | people who understand regular expressions; and they would most
         | likely prefer not to use this stuff.
         | 
         | It seems the whole IT industry is obsessed with helping us do
         | all sorts of things, even simple things, which in the end often
         | makes things more complex. Different query languages that
         | translate to SQL to help us out, which often create super-
         | complex SQL. All sorts of wrappers to avoid us having to deal
         | with all sorts of formats (JSON/XML..). Hopefully those
         | wrappers do something useful with those date-objects you know
         | you have in there somewhere...
        
           | marcinzm wrote:
           | >It's not just you. As you say this can only truly be used by
           | people who understand regular expressions; and they would
           | most likely prefer not to use this stuff.
           | 
           | I know regex and I hate writing it. It's unreadable and I
           | need to spend time remembering/googling/checking the exact
           | syntax. And, of course, the syntax differs from
           | implementation to implementation in subtle but important ways
           | (e.g. needing to double-escape in Python).
        
             | cutler wrote:
             | Perl and Ruby don't need to escape regex metacharacters so
             | why do Python and Java? It's just archaic.
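The "double escape" pain in Java comes from having no raw string literals: the compiler consumes one level of backslashes before the regex engine ever sees the pattern. A sketch (the class name and the path example are illustrative); text blocks (Java 15+) do not change this, since they still process escape sequences.

```java
import java.util.regex.Pattern;

final class Escaping {
    // The regex \d+ has to be written "\\d+" in a Java string literal.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // A regex matching ONE literal backslash is \\ , which becomes "\\\\" in Java source.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    // Splits a Windows-style path on backslashes.
    static String[] splitWindowsPath(String path) {
        return BACKSLASH.split(path);
    }
}
```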
        
           | wutbrodo wrote:
           | > It's not just you. As you say this can only truly be used
           | by people who understand regular expressions; and they would
           | most likely prefer not to use this stuff.
           | 
           | There's a niche where this might be useful, but by definition
           | it's small. I understand regexes a moderate amount, and can
           | construct arbitrarily complex ones when necessary. But I do
           | it just infrequently enough that it can be painful and
           | halting above a certain level of complexity, with lots of
           | testing and reference-checking. It'd be nice to use something
           | sane like this, and I think I fall squarely into the category
           | of "people who understand regexes but would prefer to use
           | stuff like this". Though as I said, this niche is almost by
           | definition small, and on top of that I can't remember the
           | last time I used Java.
           | 
           | Completely independently, in any non-trivial engineering
           | system, readability is important, and this helps a lot there.
        
           | pydry wrote:
           | A lot of IT is the parsing and mapping of one kind of
           | language (whether markup, DSL, Turing complete) on to
           | another.
           | 
           | Doing it right is a delicate balancing act of being just
           | powerful enough to express everything the user needs without
           | devolving into an unreadable or repetitive mess. Some people
           | manage to achieve neither.
        
           | dehrmann wrote:
           | > Different query languages that translate to SQL to help us
           | out
           | 
           | That and UI SQL builders. What I want is typeahead column
           | names, not a dropdown for the column, the operator, etc.
        
           | _jal wrote:
           | Yep, and SQL-builders are the first thing I thought of, too.
           | 
           | These tools are great for letting someone build something
           | they don't understand, and leave them completely adrift when
           | something goes wrong.
           | 
           | The next step is they bring this nonstandard thing to "the
           | expert", who has to figure out their tool before they can
           | figure out what's going wrong...
        
             | simias wrote:
             | I don't think SQL builders are a good comparison because:
             | 
             | - SQL can already be made fairly readable by default, it's
             | not just a long series of cryptic tokens. The main point of
             | SQL builders is not to make SQL more readable, it's to make
             | SQL approachable by people who don't know SQL.
             | 
             | - There can be several ways of achieving the same result in
             | SQL, with sometimes deep performance implications, so it's
             | really important to understand what is being executed and
             | in what order. Regular languages are much simpler and while
             | the string representation of the regex might end up longer
             | than the handcrafted equivalent, the runtime performance
             | should end up being the same since in the end it's all
             | deterministic finite automatons.
             | 
             | - SQL builders have to be at least a little bit opinionated
             | to be really useful, in general they make it easy to create
             | simple queries but can quickly become limiting for complex
             | queries, especially if you already know SQL. These "verbal
             | expressions" on the other hand can easily map 1:1 with raw
             | regex constructs, allowing somebody who already knows regex
             | to express exactly the same logic, just in a more verbose
             | and human readable way.
             | 
             | This verbose syntax operates at exactly the same level of
             | abstraction as normal regex, it's just a syntactical
             | transform effectively. It's like JSON vs. CBOR or something
             | like that.
        
               | _jal wrote:
               | > There can be several ways of achieving the same result
               | in SQL, with sometimes deep performance implications
               | 
               | Which is also very true of regexes, especially the more
               | feature-rich variants.
               | 
               | And the existence of variants was a large part of what I
               | was getting at.
               | 
               | > it's just a syntactical transform effectively
               | 
               | Yes, it is tooling that helps people do things they don't
               | understand.
        
         | lmilcin wrote:
         | No, you are not. These "verbal" expressions are nothing more
         | than a builder for actual expression. So you can't actually use
         | it without understanding regular expressions.
        
           | disgruntledphd2 wrote:
           | They're much easier to scan in a large codebase though, which
           | I suspect is the major advantage.
        
           | jariel wrote:
           | " These "verbal" expressions are nothing more than a builder
           | for actual expression."
           | 
           | It may be under the hood, but there's no reason for it to be.
           | 
           | There's nothing inherent in our regexes that would imply they
           | are 'the language' for that purpose, it just so happens we
           | really only have one commonly used one.
           | 
           | Like most things invented forever ago, there might be
           | opportunities for a 'cleaner, better way'.
        
             | lmilcin wrote:
             | Obviously, there might be occasions to improve.
             | 
             | But regular expressions seem quite well optimized from my
             | point of view.
             | 
             | Regular expressions are used for the exact same task regardless
             | of programming language -- using a single expression language
             | regardless of programming environment seems like a huge
             | advantage. It can be embedded in configuration file, as a
             | string in a database, on a web page or deep in backend
             | code, and it will still work the same.
             | 
             | The "Java Verbal Expressions" already have "Java" in the
             | name and so are a complete loss when it comes to portability.
             | 
             | Then comes the fact that "Java Verbal Expressions" are many
             | times more code than actual regular expressions. That isn't
             | easier to scan, it is much worse.
             | 
             | Regular expressions are very succinct and you can express a
             | lot in a single line of it. Comparable JVE-s would require
             | many lines and wouldn't be more readable for anybody other
             | than a person that doesn't know regexes at all.
        
       | lmilcin wrote:
       | It is a huge amount of code for a relatively simple expression.
       | 
       | I see not a single situation where this would actually look more
       | readable than a proper regex.
       | 
       | Unless... you don't want to learn regular expressions and then
       | you have two problems...
        
         | [deleted]
        
       | mrkeen wrote:
       | Even though I still use regexes in rare circumstances - e.g.
       | inside config files, parser combinators already do a much better
       | job than this (or regexes) when you are writing maintainable
       | code:                   warcEntry = do             header <-
       | warcHeader             crlf             body <- do
       | contentLength <- getContentLength header
       | compressionMode <- getCompressionMode header
       | warcbody contentLength compressionMode             crlf
       | crlf             return (WarcEntry header body)
       | 
       | If you accept crlf as "carriage-return-line-feed", the rest
       | basically reads as pseudocode. crlf could have just as easily
       | been written (string "\r\n") I guess.
       | 
       | Parser combinators can:
       | 
       | * call out to other parsing functions (e.g. warcHeader) - so you
       | can build your code out of testable units.
       | 
       | * bind results to variables and start using them during the
       | parse, e.g. warcHeader returns data containing contentLength and
       | compressionMode, which is then fed to the warcbody function so it
       | knows what to expect.
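A rough Java sketch of the second point: binding a parsed value and using it to decide what to parse next. All names here (`Parser`, `Parsed`, `bind`, `take`, `LengthPrefixed`) are hypothetical and this is not the Haskell library used in the snippet above. The record format (a decimal length, CRLF, that many characters, CRLF) is a simplified stand-in for a WARC entry. A classic regular expression cannot express "read N, then exactly N characters" for arbitrary N.

```java
import java.util.function.Function;

// Result of a successful parse: the value and the input position just after it.
final class Parsed<T> {
    final T value;
    final int next;

    Parsed(T value, int next) {
        this.value = value;
        this.next = next;
    }
}

// A parser returns null on failure (a minimal sketch: no error messages).
interface Parser<T> {
    Parsed<T> parse(String in, int pos);

    // Sequencing: run this parser, then let its result choose the next parser.
    default <U> Parser<U> bind(Function<T, Parser<U>> f) {
        return (in, pos) -> {
            Parsed<T> r = this.parse(in, pos);
            return r == null ? null : f.apply(r.value).parse(in, r.next);
        };
    }

    // Succeeds without consuming input.
    static <V> Parser<V> pure(V value) {
        return (in, pos) -> new Parsed<>(value, pos);
    }

    // Matches an exact string.
    static Parser<String> string(String s) {
        return (in, pos) -> in.startsWith(s, pos) ? new Parsed<>(s, pos + s.length()) : null;
    }

    // One or more ASCII digits as an int (no overflow handling in this sketch).
    static Parser<Integer> number() {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && in.charAt(end) >= '0' && in.charAt(end) <= '9') {
                end++;
            }
            return end == pos ? null : new Parsed<>(Integer.parseInt(in.substring(pos, end)), end);
        };
    }

    // Exactly n characters, whatever they are.
    static Parser<String> take(int n) {
        return (in, pos) -> pos + n <= in.length() ? new Parsed<>(in.substring(pos, pos + n), pos + n) : null;
    }
}

final class LengthPrefixed {
    static final Parser<String> CRLF = Parser.string("\r\n");

    // "<length>CRLF<body>CRLF": how much body to read is only known after the length is parsed.
    static final Parser<String> ENTRY =
            Parser.number().bind(len ->
            CRLF.bind(sep ->
            Parser.take(len).bind(body ->
            CRLF.bind(end -> Parser.pure(body)))));

    // Returns the body, or null unless the whole input is exactly one well-formed entry.
    static String parseEntry(String input) {
        Parsed<String> r = ENTRY.parse(input, 0);
        return (r == null || r.next != input.length()) ? null : r.value;
    }
}
```

Because the body is read by length rather than by scanning for a delimiter, it may itself contain CRLF sequences.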
        
       | pandemic_region wrote:
       | WHERE HAVE YOU BEEN ALL MY LIFE
        
       | surfsvammel wrote:
       | Unlike many others, I actually like this idea. I know regular
       | expressions, but many of my colleagues do not. They often have a
       | hard time understanding what a particular regex does, even though
       | I often document them step by step. Something like this would
       | make it more readable.
       | 
       | I do agree with others here, that it seems a bit rough around the
       | edges and some optimisation might be needed. But I think the idea
       | itself is sound.
        
         | maweki wrote:
         | Maybe you should look at visualizations like Regex Railroad
         | Diagrams. This is what helps me most.
        
       | zvrba wrote:
       | I limit my brain-time on constructing a regex to 5 minutes max.
       | If it takes me longer than that, I reach for a parser. Pick the
       | right tool for the job.
        
       | maweki wrote:
       | It's pretty verbose, but it is useful in the sense that you have
       | type-safety between character groups and the control characters.
       | It's neat that it only allows you to create valid Regexes (I hope
       | it does). At least you have static safety that your parentheses
       | for capture groups are properly closed.
       | 
       | This advantage is not explained. Not being able to construct
       | invalid regular expressions is a good static safety guarantee
       | that you don't get when you embed DSLs as strings.
       | 
       | Edit: This is the same reason why we would prefer jOOQ to
       | embedded String-SQL, if speed/dependencies are of no concern.
       | You're not allowed to construct invalid SQL as the java-type-
       | system gives you these guarantees when using an embedded DSL
       | instead of a String-DSL. This is very powerful, but of course
       | only works if the type system of the host language is powerful
       | enough.
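A sketch of that guarantee, assuming a hypothetical immutable `Rx` type (not the JavaVerbalExpressions API): a capture group is created by a single call that takes its contents, so there is no separate "close group" step to forget, and literal text always passes through `Pattern.quote`.

```java
import java.util.regex.Pattern;

// Hypothetical immutable regex fragment (NOT the JavaVerbalExpressions API).
final class Rx {
    private final String src;

    private Rx(String src) {
        this.src = src;
    }

    // Literal text is always quoted, so it can never be read as regex syntax.
    static Rx literal(String text) {
        return new Rx(Pattern.quote(text));
    }

    static Rx digit() {
        return new Rx("\\d");
    }

    // Opens and closes the group in one call: an unclosed group cannot be expressed.
    static Rx capture(Rx inner) {
        return new Rx("(" + inner.src + ")");
    }

    // Wraps the fragment in a non-capturing group so '+' applies to all of it.
    Rx oneOrMore() {
        return new Rx("(?:" + src + ")+");
    }

    Rx then(Rx next) {
        return new Rx(src + next.src);
    }

    Pattern compile() {
        return Pattern.compile(src);
    }

    @Override
    public String toString() {
        return src;
    }
}
```

For example, a digits-then-tab pattern like `(\d+)\t` would be written `Rx.capture(Rx.digit().oneOrMore()).then(Rx.literal("\t"))`.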
        
       | laszlokorte wrote:
       | is(4).equalTo(5.plus(eulers_constant.toThePowerOf(1.toImaginaryUnit()
       | .times(ratioBetweenCircumferenceOfACircleToItsDiameter))))
        
       | stickfigure wrote:
       | This is cool, but I'm disappointed to see the horrid builder
       | pattern show up again. Imagine you had to use StringBuilder every
       | time you wanted to manipulate a String?
       | 
       | Just make all fields final and combine the builder and 'working'
       | class into a single immutable object. Like String.
       | 
       | `build()` everywhere is syntactic noise, and you either lose
       | immutable safety (by passing around builders everywhere, as in
       | the examples) or composability (by passing around the 'sealed'
       | objects). Builders are an antipattern that should only be used in
       | cases where extreme performance is required.
        
         | x87678r wrote:
         | I wish you were interviewing me. This is the Java world you're
         | talking about and if you can't squeeze a dozen Gamma Design
         | Patterns into your code you aren't good enough.
        
         | noema wrote:
         | The main intent of Builder isn't performance, but to avoid a
         | combinatorial explosion of constructors for every possible set
         | of parameters.
        
           | lalaithion wrote:
           | So why have constructors for every possible set of
           | parameters?
           | 
           | Why
           | 
           |     VerbalExpression.regex()
           |         .startOfLine().then("http").maybe("s")
           |         .then("://")
           |         .maybe("www.").anythingBut(" ")
           |         .endOfLine()
           |         .build();
           | 
           | Instead of
           | 
           |     new VerbalExpression()
           |         .startOfLine().then("http").maybe("s")
           |         .then("://")
           |         .maybe("www.").anythingBut(" ")
           |         .endOfLine();
        
             | szatkus wrote:
             | Both are equally readable to me, but with the builder
             | pattern you have the ability to fork a builder. Cloning
             | objects in Java can be messy.
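A sketch of the immutable alternative, assuming a hypothetical `Verbal` class that mimics the method names in the chain quoted above: every call returns a new object, so a shared prefix can be "forked" just by reusing the reference, with no `build()` step and no cloning.

```java
import java.util.regex.Pattern;

// Hypothetical immutable chain using the method names from the example above; no build() step.
final class Verbal {
    private final String regex;

    Verbal() {
        this("");
    }

    private Verbal(String regex) {
        this.regex = regex;
    }

    // Every call returns a NEW object; the receiver is never modified.
    private Verbal append(String part) {
        return new Verbal(regex + part);
    }

    Verbal startOfLine() { return append("^"); }

    Verbal then(String text) { return append(Pattern.quote(text)); }

    Verbal maybe(String text) { return append("(?:" + Pattern.quote(text) + ")?"); }

    Verbal endOfLine() { return append("$"); }

    // Zero or more characters not in `chars`; non-alphanumerics are backslash-escaped inside [...].
    Verbal anythingBut(String chars) {
        if (chars.isEmpty()) {
            throw new IllegalArgumentException("chars must not be empty");
        }
        StringBuilder cls = new StringBuilder("[^");
        chars.codePoints().forEach(cp -> {
            if (!Character.isLetterOrDigit(cp)) {
                cls.append('\\');
            }
            cls.appendCodePoint(cp);
        });
        return append(cls.append("]*").toString());
    }

    boolean matches(String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }
}
```

Usage: `Verbal scheme = new Verbal().startOfLine().then("http").maybe("s").then("://");` can then be extended two different ways (`scheme.maybe("www.")...` and `scheme.then("localhost")...`) while `scheme` itself stays unchanged. The cost is one small allocation per call, which is usually negligible next to compiling the pattern.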
        
       | a_e_k wrote:
       | Emacs has had an Emacs Lisp version of this for a long time. It's
       | implemented as a macro so it can build the string regexp at
       | compile time.
       | 
       | https://www.gnu.org/software/emacs/manual/html_node/elisp/Rx...
        
       | quickthrower2 wrote:
       | Take 3 more steps in this direction and you can shed the regex
       | entirely and have parser combinators
        
       | kleiba wrote:
       | Looking at the example, my immediate reaction was that the main
       | advantage would be the `anything_but` method, relieving me from
       | the cumbersome construction of stuff like this:
       | (?:[^t]|t(?:[^r]|r(?:[^u]|u(?:[^m]|m[^p]))))
       | 
       | What a time-saver it would be to write
       | anything_but("trump")
       | 
       | Except, then you look at the source code and see this:
       | 
       |     public Builder anythingBut(final String pValue) {
       |         return this.add("(?:[^" + sanitize(pValue) + "]*)");
       |     }
       | 
       | Sad face :(
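To make the difference concrete: the quoted source wraps the argument in a negated character class, so `anythingBut("trump")` excludes the individual letters t, r, u, m and p rather than the word. A sketch comparing that shape with a tempered negative-lookahead pattern for "does not contain the substring", assuming `sanitize` only escapes special characters (the method names are illustrative):

```java
import java.util.regex.Pattern;

final class AnythingBut {
    // Shape of what the quoted source generates: a negated CHARACTER CLASS,
    // i.e. "none of the letters t, r, u, m, p", not "not the word trump".
    static boolean charClassVersion(String s) {
        return Pattern.matches("(?:[^trump]*)", s);
    }

    // "Does not contain the substring trump": a negative lookahead is checked
    // before consuming each character (a tempered greedy token).
    static boolean lookaheadVersion(String s) {
        return Pattern.matches("(?:(?!trump).)*", s);
    }
}
```

`.` does not match line terminators unless `Pattern.DOTALL` is set, so the lookahead version as written only handles single-line input.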
        
         | recursive wrote:
         | Why would you be constructing stuff like that? It consumes the
         | input string up until it differs. When is that useful in a
         | regex?
        
           | enricozb wrote:
           | How else would you write that you want to match all strings
           | that don't contain string X? If you were matching at a
           | specific position, you should use a negative lookahead
           | (?!xyz), but I think in some cases you might need the mess
           | above.
        
             | recursive wrote:
             | I can't imagine a case where the mess would be useful at
             | all.
             | 
             | Negative lookahead is the only way I can imagine this being
             | possibly useful.
             | 
             | I.e., "Give me a string that's not trump and has a vowel in
             | it".
             | 
             | Given "trunk", that mess above would match all of "trun".
             | What good is matching a prefix ever going to do?
        
       | aparsons wrote:
       | The example isn't a correct URL test regex (far from correct
       | actually - even though there are plenty of edge cases regular
       | regex strings tend to miss also)
        
         | jefftk wrote:
         | Their example is showing what the library can do, not trying to
         | determine which strings are URLs.
        
           | im3w1l wrote:
           | If you put it as a showcase, then people will use it.
        
       | cfv wrote:
       | It'd be absolutely bonkers if you could use this exact same DSL
       | to _generate_ valid strings
        
         | recursive wrote:
         | Well, since you can use the regex itself to generate valid
         | strings, it's certainly possible.
        
         | slifin wrote:
         | https://github.com/lambdaisland/regal
         | 
         | Is a regex DSL that will let you do that, wouldn't be surprised
         | to see others
        
         | m12k wrote:
         | You can - use this to generate a regex, then run that regex
         | through one of these libraries:
         | https://stackoverflow.com/a/22133/126183
        
       | s4n1ty wrote:
       | Wow, pretty sure I played with something like this for Python in
       | the 90s. People have been trying to replace regexps with
       | something more readable for a _long_ time.
       | 
       | This seems like a decent attempt, although the syntax for
       | captures looks a little clumsy.
        
       ___________________________________________________________________
       (page generated 2020-11-25 23:00 UTC)