[HN Gopher] Designing a Language Without a Parser
       ___________________________________________________________________
        
       Designing a Language Without a Parser
        
       Author : thunderseethe
       Score  : 54 points
       Date   : 2023-07-04 18:52 UTC (4 hours ago)
        
 (HTM) web link (thunderseethe.dev)
 (TXT) w3m dump (thunderseethe.dev)
        
       | ftomassetti wrote:
       | Honestly, designing a parser is easy: just start using ANTLR and
       | perhaps add an AST layer later. However, if you do not want to go
       | that route, I suggest looking into projectional editing, for
       | example JetBrains MPS or Freon by Jos Warmer.
        
       | wpietri wrote:
       | Ooh, I like this. Too many people start projects at the logical
       | beginning. But what you really want early on in a project is to
       | maximize speed of exploration of the interesting parts.
       | 
       | To me there's a clear analogy with startups. The naive conception
       | of starting a company is that you get a pile of money so you can
       | hire a bunch of people and create important infrastructure. But
       | with startups, you're trying something new, so the most efficient
       | use of time is to find the riskiest hypotheses and test them as
       | directly as possible. That often involves doing things that seem
       | wrong if you proceed in the "logical" way. E.g., I knew a
       | successful UGC company that didn't implement accounts and logins
       | until like 6 months in. But that was fine, because actual
       | accounts were not needed to figure out whether the business
       | worked.
        
       | porcoda wrote:
       | I like this. The focus on lexing/parsing for language
       | implementation often overshadows the fact that the bulk of the
       | work in any language effort is in the semantics and analysis
       | (e.g., everything after one has a parse tree). On the production
       | compiler I get paid to work on, front end work makes up at most
       | 10% of the team time and effort. Even on experimental language
       | projects that we occasionally play with, the front end is usually
       | given minimal attention - just enough to have a syntax that we
       | can use to start instantiating ASTs and doing the interesting
       | work. More often than not, I punt on a front end altogether and
       | just piggyback on some existing language (e.g., Haskell, ML, etc)
       | and basically explore the semantics or analysis questions I'm
       | interested in via a DSL-style embedding and ignore syntax
       | altogether.
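
A minimal sketch of the DSL-style embedding described above, written in
Python rather than Haskell or ML. The node types (Num, Var, Add, Mul)
and the helpers lift and evaluate are hypothetical names chosen for
illustration; they are not taken from the article or from any library.
Host-language operators build the AST directly, so no parser is needed
to start exploring semantics.

```python
from dataclasses import dataclass


class Expr:
    """Base class: overloaded operators build AST nodes instead of computing."""

    def __add__(self, other):
        return Add(self, lift(other))

    def __radd__(self, other):
        return Add(lift(other), self)

    def __mul__(self, other):
        return Mul(self, lift(other))

    def __rmul__(self, other):
        return Mul(lift(other), self)


@dataclass(frozen=True)
class Num(Expr):
    value: int


@dataclass(frozen=True)
class Var(Expr):
    name: str


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr


def lift(x):
    """Wrap plain Python ints as Num nodes; pass AST nodes through."""
    return x if isinstance(x, Expr) else Num(x)


def evaluate(expr, env):
    """A stand-in for the 'interesting work' done on the AST."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError(f"unknown node: {expr!r}")


x = Var("x")
tree = x * 2 + 1  # the host language's own parser builds the tree
```

Here evaluate(tree, {"x": 5}) walks the tree the host language built;
swapping evaluate for a type checker or code generator needs no change
to how programs are written.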
        
       | hardwaregeek wrote:
       | One option you could also try is to use an existing language's
       | syntax. Plenty of languages have high quality parser libraries by
       | now, like swc/rome/acorn for JavaScript, rustc_parse for Rust,
       | etc. Of course syntax is influenced by semantics, so you'll end up
       | wanting to remove or add syntax, but you could probably get
       | decently far before that ends up a problem.
        
         | User23 wrote:
         | > Of course syntax is influenced by semantics
         | 
         | Only to the extent that AST structure depends on syntax. For
         | something like s-expressions, defining new semantics never
         | requires new syntax, since arbitrary trees suffice to
         | syntactically express any AST.
        
           | dfox wrote:
           | One of my exploratory projects, back when I still thought that
           | an academic career made sense, involved replacing Smalltalk's
           | stack-based bytecode with incrementally transformed
           | S-expression trees. Naturally, the first thing I set out to
           | do was write an S-expression reader and writer in Smalltalk.
           | My advisor at the time told me that it was pointless busywork
           | and that I should just use ST literal syntax. It looked
           | somewhat ugly with all the # there, but saved a lot of time.
           | 
           | Well, 12 years later (i.e., early 2023) I realized that I don't
           | really have any kind of cool side project and started
           | implementing the same idea in C (with the added goal of the
           | VM being natively multithreaded with fine-grained locking
           | along the lines of JikesRVM and WebKit). Well, I have stub
           | implementations of classes needed for the AST representation
           | and S-expression reader and writer...
        
       | glonq wrote:
       | > _I can't tell you why I keep returning to this venture when
       | I've failed at it so many times._
       | 
       | Sorry if this sounds stupid or obvious, but with this kind of
       | thing I find that it's easier to cross the finish line if you
       | maintain humble goals. Focus on just getting a working end-to-end
       | MVP. Refine and enhance it down the road; don't get stuck trying
       | to make version 0 an awesomely praiseworthy effort.
        
       | zarathustreal wrote:
       | See also:
       | https://www.amazon.com/PROGRAM-PROOF-Samuel-Mimram/dp/B08C97...
        
         | Jtsummers wrote:
         | Also available for free from the author:
         | https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching...
         | [pdf]
         | 
         | Course page:
         | https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching...
        
       | djedr wrote:
       | This is one thing I designed Jevko[0][1] for.
       | 
       | If you have an idea for a format or a language and would like to
       | quickly start hacking on the layer above the syntax, Jevko is an
       | option.
       | 
       | It's meant to be even simpler and more hackable than S-expressions.
       | 
       | It gets you from a string to a tree in the fewest steps.
       | 
       | See here[2] if interested.
       | 
       | Happy hacking!
       | 
       | [0] https://jevko.org/
       | [1] https://djedr.github.io/posts/jevko-2022-02-22.html
       | [2] https://gist.github.com/djedr/151241f1a9a5bc627059dd9b23fc74...
        
         | Jtsummers wrote:
         | With the square brackets it has a bit of a Rebol feel to it.
         | Was that intentional or coincidental?
        
           | djedr wrote:
           | I suppose a bit of both.
           | 
           | I was more directly inspired by Lisps, but I do prefer the
           | original M-expressions and the syntactic choices that REBOL
           | and Red make.
           | 
           | I think placing the operator before the opening bracket
           | better emphasizes its special significance and can reduce
           | nesting for constructs like `f[x][y]` (vs. `((f x) y)` in
           | Lisps). Square brackets somehow seem more aesthetically
           | pleasing to me. And there is a practical reason to prefer
           | them, especially if your syntax uses only one kind of
           | brackets -- square brackets are the easiest to type on an
           | average keyboard.
           | 
           | So REBOL-like syntax is nicer. As were M-expressions. They
           | probably didn't catch on, because they were not minimal
           | enough, compared to S-expressions. And maybe because
           | S-expressions were fully implemented first.
        
       | 082349872349872 wrote:
       | While I agree designing syntax before you know your semantics is
       | unwise, please consider also that there are several actually
       | parserless languages extant: lisps, forths, APLs, smalltalks,
       | various Edinburgh languages, etc.
       | 
       | (Esterel --IIRC-- is the only language of which I'm aware that
       | explicitly has two syntaxes, one traditionally parser based and
       | one that, in principle, could be parserless)
        
         | jdougan wrote:
         | It wasn't compiled, but the way Smalltalk-72 did ?integrated?
         | parsing is worth understanding.
         | 
         | http://worrydream.com/EarlyHistoryOfSmalltalk/
        
         | zokier wrote:
         | I don't like the term parserless in this context. It's not like
         | you can just mmap a Lisp source file and cast it to the Ast
         | type from the article.
         | 
         | Parsing something might be trivial, but it's still parsing.
        
           | mostlylurks wrote:
           | "Not parsed ahead of time" might be the better qualification.
           | At least in some forths you can cease parsing the file (in
           | the "outer interpreter") at any point and perform any kind of
           | computation or IO that you've previously defined, and based
           | on that do whatever you want with the rest of the text of the
           | source file (or just the next couple of tokens if you want),
           | including parsing it manually in some other manner than what
           | the outer interpreter would do by default. I haven't gone
           | that deep in lisp, but I hear reader macros allow something
           | similar, though perhaps they might be more restrictive by
           | requiring a transformation into valid lisp trees / values,
           | whereas forth allows you to just do whatever, and if that
           | happens to have the side effect of adding new functions to
           | the dictionary, so be it.
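
A rough sketch of the "not parsed ahead of time" idea, assuming a toy
Forth-like language. It is written in Python and does not reproduce any
real Forth implementation. The function run and its helpers are
hypothetical. The point it illustrates is that the outer interpreter
only hands out one token at a time, and any word (here ":" and "(") may
take over and consume the following tokens however it likes.

```python
def run(source):
    """Interpret a tiny Forth-like language; returns (stack, output)."""
    tokens = source.split()
    pos = 0
    stack, output = [], []

    def next_token():
        # The only "parsing" primitive: hand out the next whitespace token.
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def execute(tok):
        # Known words run immediately; anything else must be an integer.
        if tok in words:
            words[tok]()
        else:
            stack.append(int(tok))

    def define():
        # ':' reads the new word's name and body from the token stream
        # itself, then adds the word to the dictionary.
        name = next_token()
        body = []
        while (tok := next_token()) != ";":
            body.append(tok)
        words[name] = lambda: [execute(t) for t in body]

    def skip_comment():
        # '(' discards tokens up to the closing ')'.
        while next_token() != ")":
            pass

    def binary(fn):
        def word():
            b, a = stack.pop(), stack.pop()
            stack.append(fn(a, b))
        return word

    words = {
        "+": binary(lambda a, b: a + b),
        "*": binary(lambda a, b: a * b),
        "dup": lambda: stack.append(stack[-1]),
        ".": lambda: output.append(stack.pop()),
        ":": define,
        "(": skip_comment,
    }

    # The outer interpreter: no grammar, just "take a token, run it".
    while pos < len(tokens):
        execute(next_token())
    return stack, output
```

For example, run(": square dup * ; ( a comment ) 3 square .") defines
square while the source is still being read and then uses it. Error
handling (unterminated definitions, stack underflow) is left out to
keep the sketch short.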
        
           | 082349872349872 wrote:
           | in that case, pretend I wrote "grammarless"
           | 
           | (in the examples above --which missed the prologs-- I'm
           | pretty sure the parsing is trivial on the order of "you can
           | see everything that handles 'parsing' without needing to
           | scroll a window", and in several of those examples it'd still
           | be true even if your windows were only 25 lines long)
        
             | whartung wrote:
             | > in that case, pretend I wrote "grammarless"
             | 
             | Gonna call you out on that as well. Forth, sure. Forth
             | is...Forth.
             | 
             | But Lisp is not grammarless. Smalltalk is not either. The
             | Lisp reader, notably Common Lisp, is a non-trivial piece of
             | code. The Common Lisp lambda list, as a language construct,
             | is not trivial either. (Dare I mention the CL LOOP macro?)
             | Just because everything is "a list, symbol, or constant"
             | does not necessarily make the parsing problem trivial.
             | 
             | It SEEMS simple, but then you get into it, and you find you
             | fall into the weeds.
        
       | kibwen wrote:
       | This is one of the cases where you actually _do_ want Lisp-style
       | s-expressions, because they don't need any real parsing;
       | functionally speaking, they are _already_ the AST. (This is why
       | you sometimes hear people saying that Lisp "doesn't have
       | syntax".)
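
To make the claim concrete, here is a minimal sketch of an s-expression
reader in Python. It assumes a deliberately tiny dialect: parentheses,
integers, and bare symbols only, with no strings, quoting, comments, or
reader macros. The function name read is an illustrative choice. The
result is a nested list that can be used as the AST directly.

```python
def read(source):
    """Turn one s-expression into nested Python lists, ints, and strings."""
    # Pad parentheses with spaces so str.split() can act as the tokenizer.
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read_expr())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the ')'
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        try:
            return int(tok)  # integer literal
        except ValueError:
            return tok  # anything else is a symbol

    expr = read_expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return expr
```

With this, read("(define (square x) (* x x))") yields
["define", ["square", "x"], ["*", "x", "x"]]. This is still parsing,
only a very small amount of it, and a full Lisp reader handles far more
than this sketch does.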
        
         | patrec wrote:
         | Exactly. And if you are paren-phobic, you could also have a
         | look at Postscript, Forth or even Smalltalk/Self -- the last
         | two are about the most minimal you can get with infix operators
         | (but no precedence) and "keyword-arguments" (sort of), well,
         | unless you want to go all-in on infix, and do APL.
        
         | fiddlerwoaroof wrote:
         | > This is why you sometimes hear people saying that Lisp
         | "doesn't have syntax".
         | 
         | The other reason is that a language like Common Lisp is defined
         | in terms of the data-structures used by the language and the
         | language has no unique text representation: the "default"
         | reader uses a slightly extended version of s-expressions, but
         | any data structure in the language can be evaluated and any
         | transform from text to data structures can be a textual syntax
         | for Common Lisp.
        
       | electroly wrote:
       | > However, once I start constructing a parser, progress slows to
       | a crawl
       | 
       | I can't really relate here. The parser is the easiest part of a
       | compiler; the work only increases from there. I feel like if you
       | ran out of steam at the parser, you never had enough steam to
       | write a whole compiler. I don't think removing the parser will
       | take you across the finish line if you otherwise were running out
       | of steam.
       | 
       | My advice is to write your language in vertical slices. Write the
       | parsing, semantic checking, and code generation for the simplest
       | features first and progressively add feature slices, rather than
       | trying to write the entire parser for a fully-baked language
       | before proceeding. Consider including "print" as a built-in
       | statement so you can print things out (and thus write tests)
       | before you have working expressions and function calls.
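
One way the first vertical slice might look, sketched in Python under
loud assumptions: the language has exactly one statement, "print"
followed by an integer literal, and the target is a made-up two-
instruction stack machine (PUSH, PRINT). All function and instruction
names here are hypothetical. Each stage is trivial, but all of them
exist end to end, so later features extend every stage a little.

```python
def parse(source):
    """Parsing slice: one 'print <arg>' statement per line."""
    program = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        keyword, _, arg = line.partition(" ")
        if keyword != "print":
            raise SyntaxError(f"line {lineno}: expected 'print'")
        program.append(("print", arg.strip(), lineno))
    return program


def check(program):
    """Semantic-checking slice: stop at the first error."""
    for _, arg, lineno in program:
        if not arg.isdigit():
            raise TypeError(f"line {lineno}: print expects an integer literal")


def codegen(program):
    """Code-generation slice: emit instructions for the toy stack machine."""
    code = []
    for _, arg, _ in program:
        code.append(("PUSH", int(arg)))
        code.append(("PRINT",))
    return code


def run(code):
    """Toy stack machine; returns printed lines so tests can inspect them."""
    stack, out = [], []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "PRINT":
            out.append(str(stack.pop()))
    return out


def compile_and_run(source):
    program = parse(source)
    check(program)
    return run(codegen(program))
```

Because "print" is built in, compile_and_run("print 1\nprint 42\n") can
be asserted against ["1", "42"] long before expressions or function
calls exist.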
        
         | tester756 wrote:
         | A parser for a "4-lulz language", or for something that will
         | also be used in IDEs, and so needs robust error recovery, can
         | perform partial updates, etc.?
         | 
         | I feel it's just that it is possible to say that work on the
         | parser side is completed
         | 
         | Meanwhile, optimizations? You can probably improve those
         | endlessly.
        
           | electroly wrote:
           | Talking about the same thing as the article: hobby/learning
           | languages. If you click the links to the author's projects
           | you can see they have yet to finish a compiler or really even
           | come close. Definitely not talking about sophisticated
           | production-grade languages here; OP is trying to complete
           | their first compiler _at all_. I think writing compilers is a
           | really neat and useful learning project but you have to be
           | smart about not biting off more than you can chew.
           | 
           | I would absolutely recommend not supporting incremental
           | compilation or error recovery in your first compiler. Just
           | stop everything at the first error. Save that for your second
           | hobby compiler, or better yet, the first commercial compiler
           | that you get paid to work on.
        
             | 082349872349872 wrote:
             | > _writing compilers is a really neat and useful learning
             | project but you have to be smart about not biting off more
             | than you can chew._
             | 
             | Writing compilers is an especially useful project when you
             | learn roughly how much you _can_ chew. (Algol 68 is a nice
             | example of a bunch of smart people, with relevant domain
             | expertise, biting off way too much for the machines of the
             | time.)
        
         | smasher164 wrote:
         | > I don't think removing the parser will take you across the
         | finish line if you otherwise were running out of steam.
         | 
         | Everyone's different. The problem with parsing isn't the
         | difficulty, but rather the potential for endless bikeshedding.
         | You're having to make a ton of opinionated decisions that in
         | turn produce more questions about your syntax. If your
         | personality is like mine in that it's a bit obsessive about
         | "completing" a phase, then parsing feels like an endless
         | quagmire. In comparison, AST -> type inference -> codegen feels
         | more structured and straightforward.
        
       | carterschonwald wrote:
       | I always start with the AST and just work forward and backward
       | from there.
        
       ___________________________________________________________________
       (page generated 2023-07-04 23:00 UTC)