[HN Gopher] Just Write the Parser
       ___________________________________________________________________
        
       Just Write the Parser
        
       Author : matt_d
       Score  : 106 points
       Date   : 2020-10-20 14:44 UTC (1 days ago)
        
 (HTM) web link (tiarkrompf.github.io)
 (TXT) w3m dump (tiarkrompf.github.io)
        
       | [deleted]
        
       | wrs wrote:
       | One reason to use a parser generator for a _new_ language is that
       | it will tell you that your language is inherently ambiguous or
       | otherwise broken in a way you didn't realize. A hand-coded parser
       | tends to just codify your bad assumptions.
       | 
       | It can actually be a good idea to maintain a YACC (or other)
       | grammar for your language just to run the tool as a kind of
       | "grammar linter", even if you're going to write the parser by
       | hand.
        
       | epistasis wrote:
       | There's been a ton of code that I've written that in retrospect,
       | I should never have written.
       | 
       | But I've never regretted when I wrote a parser. Maybe because it
       | takes such a large activation energy to get over the hump and
       | actually do it, that I only do it when absolutely necessary. But
       | it always seems easier and more useful than I thought before
       | doing it.
       | 
       | Now that this post has prompted that realization, I wonder if it
       | will stay true...
        
       | jtolmar wrote:
       | I'd highly recommend looking at a few toy parser projects, like
       | this and Crafting Interpreters (which are both recursive decent,
       | with very different styles), writing however much of a typical
       | parser you think makes sense "for fun," and then saving that
       | somewhere you can get at it.
       | 
       | Writing a parser from scratch is kind of an obnoxious hurdle, but
       | if you have some code you're already familiar with that you can
       | copy-paste in and modify to your purposes, that's a lot more
       | accessible.
       | 
       | (Also hats off to this author for making a parser so damn terse.)
        
       | Const-me wrote:
       | Did something similar a few times when I needed that.
       | 
       | Here's an open source example: https://github.com/Const-
       | me/vis_avs_dx/tree/master/avs_dx/Dx... It's mostly in C++, just a
       | few offline pieces in C# in T4 templates (they produce C++ code).
        
       | tiarkrompf wrote:
       | Author here - happy to answer questions, as always.
       | 
       | Thanks also for feedback on the format of the article. It's a bit
       | of an experiment on how to present dense information effectively.
       | Some more rationale here:
       | https://tiarkrompf.github.io/notes/?/octopus-notes/
        
         | musicale wrote:
         | What's the best way to handle unary "-" and other unary
         | operators?
         | 
         | I'd like to be able to write expressions like "-2^-4" or "a cos
         | b + a sin b".
        
         | LeonB wrote:
         | Octopus notes is an interesting idea. Thanks for the link on
         | it.
         | 
         | And I do like to write a lot of little parsers. I worked with
         | someone once who would somehow transform every business problem
         | into "we need to write a compiler". Sounds crazy but he
         | achieved brilliant results and has gone on to great things.
        
       | tofflos wrote:
       | > Enter an expression on the left and see the parse tree change!
       | 
       | I don't get it. The left input field is uneditable in Firefox,
       | Edge and Chrome. Navigation is broken across the board. Even
       | copying and pasting the quote above somehow included most of the
       | page even though I only selected that one sentence.
       | 
       | This looks very cool and I would love to try it out! :(
        
         | blcArmadillo wrote:
         | Yeah I think something is broken because at least in chrome
         | when you try to type in the left box the following exception
         | keeps being thrown in the console                 VM463:82
         | Uncaught TypeError: Cannot read property 'startContainer' of
         | undefined           at HTMLPreElement.eval (eval at run (eval
         | at runScriptElement (octopus-2.js:525)), <anonymous>:82:40)
        
           | tiarkrompf wrote:
           | I just pushed a fix for Chrome (apparently it doesn't report
           | affected selection ranges for input events). Hope it works
           | now!
        
             | tofflos wrote:
             | Thanks! It works for me now in Edge (Chromium).
        
             | LeonB wrote:
             | it works for me now on Chrome.
        
       | prezjordan wrote:
       | Aside (sorry!) but https://tiarkrompf.github.io/notes is really
       | cool and it's fun to see this format
        
         | jraph wrote:
         | Wow, thank you very much for this link!
         | 
         | This is very inspiring.
         | 
         | First, there is an impressive overlap with what I like and what
         | I have been doing. The notes on the piano, the parser, on
         | GraphViz and on React all hit very close if not exactly things
         | I've thought about or programmed.
         | 
         | Second, the means of presenting things is awesome too. I get
         | lost very easily so there are probably ways to improve but this
         | is very clever and interesting. The creativity and pedagogy
         | here is incredible at all levels.
         | 
         | I am very impressed.
         | 
         | I need to keep an eye on this. I wish there was an RSS feed.
         | I'll contact him.
        
           | tiarkrompf wrote:
           | Thanks - very glad to hear. RSS should be doable
        
         | mamcx wrote:
         | I like the bit about CPS, but wish it show how do the last part
         | without having the async machinery in the host language.
         | 
         | What I have a hard time to get is how implement fully delimited
         | continuations. All the examples I know reuse the machinery of
         | the host so is hard to translate (in my case, to Rust)
        
         | brundolf wrote:
         | The only annoying thing is that the scroll bar is in the middle
         | of my screen. I wish they'd max-widthed the text while leaving
         | the scrollable element full-width
        
           | ofrzeta wrote:
           | The other annoying thing is the manipulation of the browser
           | history :) In my opinition it should keet its hands off it.
        
         | awinter-py wrote:
         | those scroll-matched curly braces are something
        
       | eyelidlessness wrote:
       | Considering that something as simple and limited as JSON has been
       | a source of security vulnerabilities from ambiguities in the
       | spec, thanks but nope. Declarative definitions of a grammar help
       | identify those ambiguities... and if the grammar is defined
       | separately from its interpretation, it opens the door to
       | invisible ambiguities that can become another vector for
       | vulnerabilities.
       | 
       | Unless you have provable implementations (great if you do!),
       | declarative grammars meaningfully limit points of failure.
        
       | ben509 wrote:
       | > Why simpler is better and why you don't need a parser
       | generator.
       | 
       | As far as I can see, this isn't fully answered, unless the claim
       | is strictly limited to the question of need.
       | 
       | In my case, I certainly _want_ a parser generator. I 'm working
       | on a language, and I did a very early version using a hand-rolled
       | recursive descent parser.
       | 
       | Then I realized I wanted a syntax that was human friendly, so I
       | graduated to megaparsec. That worked, it's an elegant way to
       | describe a parser, but as a language gets complex you have to
       | work in implicit orderings (magic, frankly) to handle the limits
       | of recursive descent.
       | 
       | Finally, I realized I needed a spec and an implementation in sync
       | with that spec, so I went with BNFC[2], and it even generates
       | some pretty documentation[1] for me.
       | 
       | This is the classic debate over using a domain specific language
       | vs. a general purpose language, so there's no hard answer one way
       | or the other. For me, the deciding factor was I didn't want to
       | write, test, etc. my own parser, and I think there being a
       | canonical way for others to parse my language (in other languages
       | as well) is helpful.
       | 
       | [1]: https://tenet-lang.org/spec-syn.html
       | 
       | [2]: https://bnfc.digitalgrammars.com/
        
         | hardwaregeek wrote:
         | How do you handle good error messages or error recovery? I've
         | played around with parser generators but I never figured out
         | how to do either in a satisfactory fashion. Granted, my hand
         | written parser doesn't do error recovery very well either.
        
         | dfabulich wrote:
         | Right below the line you quote, the author makes three
         | arguments:
         | 
         | > * It's highly instructive, in a way that using a parser
         | generator is not. To quote Feynman: "What I cannot create, I do
         | not understand"
         | 
         | > * It's an important skill: most real-world compilers use
         | hand-written parsers because they provide more control over
         | error handling, significant whitespace, etc.
         | 
         | > * It's not actually difficult!
         | 
         | The exercises themselves are meant to justify these arguments.
         | 
         | I think the exercises conclusively demonstrate that #1 and #3
         | are true. If you can easily follow these exercises, and if, by
         | the end, you feel like you've learned something useful, then it
         | follows that hand-rolling a parser is instructive and not too
         | difficult.
         | 
         | I strongly agree with the author that if you're developing a
         | language for the first time, you shouldn't use a parser
         | generator at first. You should hand-roll a parser first, and
         | then, when you see the limitations of your hand-rolled parser,
         | adopt a parser generator, now with full understanding of what
         | the generator is doing for you (and not doing for you).
         | 
         | As for #2, the article demonstrates how to do error handling in
         | a manner that would be a head-scratcher in Yacc.
         | 
         | You claim here that it was easier to develop a "human friendly"
         | syntax with megaparsec than it was to do that in a hand-rolled
         | recursive-descent parser. That could be true, but that's not my
         | experience. To be convincing, you'd need to do what the author
         | did, (even though you say the author didn't): you'd need to
         | justify your argument with specific examples.
        
           | avmich wrote:
           | > > * It's highly instructive, in a way that using a parser
           | generator is not.
           | 
           | Top-down parsers are nice, but even better would be not to
           | have to write them each time you need to parse something.
           | Also top-down parsers don't handle (well) some of useful
           | grammar constructs, and it's tedious to remember always
           | formulate grammars in a certain way. Next, grammars are for
           | many people more convenient to work with than actual parser
           | code. So parser generators have their place.
           | 
           | > To quote Feynman: "What I cannot create, I do not
           | understand"
           | 
           | And this can be true for parser generators. After you've
           | figured how to do that you could be tempted to never use
           | external tools, and have a nice jump from, say, arbitrary CFG
           | to a generalized LR, while controlling all the parts in
           | between.
           | 
           | > * It's not actually difficult!
        
         | mannykannot wrote:
         | Once, I wanted a parser, and being the sort of person who
         | thinks there is value in seeing what other people have done
         | before, I took a look at parser generators.
         | 
         | The learning curve seemed long and steep, much more than seemed
         | necessary for my modest requirements, so I rolled my own.
         | 
         | I got a parser out of the exercise, but also an appreciation
         | for why the tools exist, and a new-found motivation for
         | learning how to use them.
        
       | neurotrace wrote:
       | I don't want to be that guy but I originally thought the CSS was
       | broken. Harsh black on harsh white, everything squeezed in to the
       | left third of my screen, and an unused scrollbar on the bottom.
       | 
       | That being said, I love the information. I don't care for the
       | workflow that comes with the most popular parser generators and I
       | inevitably want more control over my parser as I start adding
       | error handling, etc. It's easier to just write it by hand in the
       | first place
        
         | millstone wrote:
         | I think the design is great. Simple and easy to read.
        
           | neurotrace wrote:
           | Different strokes for different folks, I suppose. I found it
           | to be confusing and painful
        
           | Supermancho wrote:
           | I'm not quite sure why they went with the borked "half the
           | screen is empty" look. Ok, it's easier to read, but that
           | negative space is bad design, regardless.
        
         | contravariant wrote:
         | There are unused scrollbars everywhere. It's quite an
         | interesting example of scrollbar blindness [1].
         | 
         | [1]:https://news.ycombinator.com/item?id=24293421
        
       | cwp wrote:
       | If you're writing a compiler, sure, a hand-written parser is
       | probably better than a generated parser.
       | 
       | However, I'd also make the related claim that we should be much
       | more ready to write formal parsers for other use cases. Far too
       | often developers implement a "pile of regular expressions" where
       | a parser--even a generated parser-would be a better choice.
        
       | samatman wrote:
       | An alternate title: "Just Shotgun Parse Your Inputs".
       | 
       | This isn't bad advice _across the board_ , but it is bad advice
       | for many applications.
       | 
       | This is how you end up with context-sensitive formats, or ones
       | where the semantics of a particular particle require running the
       | parser, or worse. It exposes a rich surface area for exploitation
       | with adversarial inputs.
       | 
       | That said, the advantages cited at the front of the article are
       | all real: error recovery and decent error messages are unsolved
       | problems with existing parser generators, although ANTLR is
       | pretty good at it.
       | 
       | But, especially if you're developing the data format rather than
       | just implementing it, I strongly suggest using something like
       | Instaparse which can generate a parser directly from a grammar.
       | Once this is solidified, it's reasonable to hand-roll something,
       | but even there, I suggest using a nice parser combinator library,
       | like hammer or Nom.
       | 
       | This provides more discipline: as long as you're coloring inside
       | the lines, you'll end up with a parser which actually implements
       | your spec. When you start adding functions to provide context,
       | recover from errors, and provide meaningful feedback in the form
       | of error messages for unexpected inputs, you'll know what you're
       | doing: a classic recursive-descent parser can't provide the same
       | conceptual separation between the parsing logic and the logic to
       | support ergonomics or one-pass compiling or whatever add-ons
       | you're putting in there.
        
         | specialist wrote:
         | _" where the semantics of a particular particle require running
         | the parser, or worse"_
         | 
         | Yup.
         | 
         | Consider the much maligned (apache) httpd.conf and makefile.
         | Their code was the source of truth for correctness. No matter
         | horrible terrible their syntax, they would have been sufferable
         | if they had grammars.
         | 
         | Bespoke parsing of expressions benefits from having an implicit
         | grammar, like this OC. But threshold for needing explicit
         | grammars is pretty low.
         | 
         | Prime examples are all the data exchange formats. Descriptions
         | of JSON and CSV are short and naively reasonable, but
         | insufficient. As we've seen, any ambiguity that can happen will
         | happen.
        
       ___________________________________________________________________
       (page generated 2020-10-21 23:00 UTC)