[HN Gopher] Designing a Language Without a Parser
___________________________________________________________________
Designing a Language Without a Parser
Author : thunderseethe
Score : 54 points
Date : 2023-07-04 18:52 UTC (4 hours ago)
(HTM) web link (thunderseethe.dev)
(TXT) w3m dump (thunderseethe.dev)
| ftomassetti wrote:
| Honestly, designing a parser is easy: just start using ANTLR and
| perhaps add an AST layer later. However, if you do not want to go
| that route, I suggest looking into projectional editing, for example
| JetBrains MPS or Freon by Jos Warmer.
| wpietri wrote:
| Ooh, I like this. Too many people start projects at the logical
| beginning. But what you really want early on in a project is to
| maximize speed of exploration of the interesting parts.
|
| To me there's a clear analogy with startups. The naive conception
| of starting a company is that you get a pile of money so you can
| hire a bunch of people and create important infrastructure. But
| with startups, you're trying something new, so the most efficient
| use of time is to find the riskiest hypotheses and test them as
| directly as possible. That often involves doing things that seem
| wrong if you proceed in the "logical" way. E.g., I knew a
| successful UGC company that didn't implement accounts and logins
| until like 6 months in. But that was fine, because actual
| accounts were not needed to figure out whether the business
| worked.
| porcoda wrote:
| I like this. The focus on lexing/parsing for language
| implementation often overshadows the fact that the bulk of the
| work in any language effort is in the semantics and analysis
| (i.e., everything after one has a parse tree). On the production
| compiler I get paid to work on, front end work makes up at most
| 10% of the team time and effort.
| Even on experimental language
| projects that we occasionally play with, the front end is usually
| given minimal attention - just enough to have a syntax that we
| can use to start instantiating ASTs and doing the interesting
| work. More often than not, I punt on a front end altogether and
| just piggyback on some existing language (e.g., Haskell, ML, etc.)
| and basically explore the semantics or analysis questions I'm
| interested in via a DSL-style embedding and ignore syntax
| altogether.
| hardwaregeek wrote:
| One option you could also try is to use an existing language's
| syntax. Plenty of languages have high quality parser libraries by
| now, like swc/rome/acorn for JavaScript, rustc_parse for Rust,
| etc. Of course syntax is influenced by semantics, so you'll end up
| wanting to remove or add syntax, but you could probably get
| decently far before that ends up a problem.
| User23 wrote:
| > Of course syntax is influenced by semantics
|
| Only to the extent that AST structure depends on syntax. For
| something like s-expressions, defining new semantics never
| requires new syntax, since arbitrary trees suffice to
| syntactically express any AST.
| dfox wrote:
| One of my exploratory projects, while I still thought that
| an academic career made sense, involved replacing Smalltalk's
| stack based bytecode with incrementally transformed
| S-expression trees. Naturally the first thing I set out to
| do was writing an S-expression reader and writer in Smalltalk;
| my advisor at the time told me that it was pointless busywork
| and I should just use ST literal syntax. It looked somewhat
| ugly with all the # there, but saved a lot of time.
|
| Well, 12 years later (i.e. early 2023) I realized that I don't
| really have any kind of cool side project and started
| implementing the same idea in C (with the added goal of the
| VM being natively multithreaded with fine-grained locking
| along the lines of JikesRVM and WebKit).
| Well, I have stub
| implementations of classes needed for the AST representation
| and S-expression reader and writer...
| glonq wrote:
| > _I can't tell you why I keep returning to this venture when
| I've failed at it so many times._
|
| Sorry if this sounds stupid or obvious, but with this kind of
| thing I find that it's easier to cross the finish line if you
| maintain humble goals. Focus on just getting a working end-to-end
| MVP. Refine and enhance it down the road; don't get stuck trying
| to make version 0 an awesomely praiseworthy effort.
| zarathustreal wrote:
| See also:
| https://www.amazon.com/PROGRAM-PROOF-Samuel-Mimram/dp/B08C97...
| Jtsummers wrote:
| Also available for free from the author:
| https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching...
| [pdf]
|
| Course page:
| https://www.lix.polytechnique.fr/Labo/Samuel.Mimram/teaching...
| djedr wrote:
| This is one thing I designed Jevko[0][1] for.
|
| If you have an idea for a format or a language and would like to
| quickly start hacking on the layer above the syntax, Jevko is an
| option.
|
| It's meant to be even simpler and more hackable than S-expressions.
|
| It gets you from a string to a tree in the fewest steps.
|
| See here[2] if interested.
|
| Happy hacking!
|
| [0] https://jevko.org/
| [1] https://djedr.github.io/posts/jevko-2022-02-22.html
| [2] https://gist.github.com/djedr/151241f1a9a5bc627059dd9b23fc74...
| Jtsummers wrote:
| With the square brackets it has a bit of a Rebol feel to it.
| Was that intentional or coincidental?
| djedr wrote:
| I suppose a bit of both.
|
| I was more directly inspired by Lisps, but I do prefer the
| original M-expressions and the syntactic choices that REBOL
| and Red make.
|
| I think placing the operator before the opening bracket
| better emphasizes its special significance and can reduce
| nesting for constructs like `f[x][y]` (vs. `((f x) y)` in
| Lisps). Square brackets somehow seem more aesthetically
| pleasing to me.
| And there is a practical reason to prefer
| them, especially if your syntax uses only one kind of
| brackets -- square brackets are the easiest to type on an
| average keyboard.
|
| So REBOL-like syntax is nicer. As were M-expressions. They
| probably didn't catch on, because they were not minimal
| enough, compared to S-expressions. And maybe because
| S-expressions were fully implemented first.
| 082349872349872 wrote:
| While I agree designing syntax before you know your semantics is
| unwise, please consider also that there are several actually
| parserless languages extant: lisps, forths, APLs, smalltalks,
| various Edinburgh languages, etc.
|
| (Esterel --IIRC-- is the only language of which I'm aware that
| explicitly has two syntaxes, one traditionally parser based and
| one that, in principle, could be parserless)
| jdougan wrote:
| It wasn't compiled, but the way Smalltalk-72 did "integrated"
| parsing is worth understanding.
|
| http://worrydream.com/EarlyHistoryOfSmalltalk/
| zokier wrote:
| I don't like the term parserless in this context. It's not like
| you can just mmap a Lisp source file and cast it to the Ast
| type from the article.
|
| Parsing something might be trivial, but it's still parsing.
| mostlylurks wrote:
| "Not parsed ahead of time" might be the better qualification.
| At least in some forths you can cease parsing the file (in
| the "outer interpreter") at any point and perform any kind of
| computation or IO that you've previously defined, and based
| on that do whatever you want with the rest of the text of the
| source file (or just the next couple of tokens if you want),
| including parsing it manually in some other manner than what
| the outer interpreter would do by default.
| I haven't gone
| that deep in lisp, but I hear reader macros allow something
| similar, though perhaps they might be more restrictive by
| requiring a transformation into valid lisp trees / values,
| whereas forth allows you to just do whatever, and if that
| happens to have the side effect of adding new functions to
| the dictionary, so be it.
| 082349872349872 wrote:
| in that case, pretend I wrote "grammarless"
|
| (in the examples above --which missed the prologs-- I'm
| pretty sure the parsing is trivial on the order of "you can
| see everything that handles 'parsing' without needing to
| scroll a window", and in several of those examples it'd still
| be true even if your windows were only 25 lines long)
| whartung wrote:
| > in that case, pretend I wrote "grammarless"
|
| Gonna call you out on that as well. Forth, sure. Forth
| is...Forth.
|
| But Lisp is not grammarless. Smalltalk is not either. The
| Lisp reader, notably Common Lisp's, is a non-trivial piece of
| code. The Common Lisp lambda list, as a language construct,
| is not trivial either. (Dare I mention the CL LOOP macro?)
| Just because everything is "a list, symbol, or constant"
| does not necessarily make the parsing problem trivial.
|
| It SEEMS simple, but then you get into it, and you find you
| fall into the weeds.
| kibwen wrote:
| This is one of the cases where you actually _do_ want Lisp-style
| s-expressions, because they don't need any real parsing;
| functionally speaking, they are _already_ the AST. (This is why
| you sometimes hear people saying that Lisp "doesn't have
| syntax".)
| patrec wrote:
| Exactly. And if you are paren-phobic, you could also have a
| look at Postscript, Forth or even Smalltalk/Self -- the last
| two are about the most minimal you can get with infix operators
| (but no precedence) and "keyword-arguments" (sort of), well,
| unless you want to go all-in on infix, and do APL.
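To make the "s-expressions are already the AST" point above concrete, here is a minimal sketch of a reader for a toy dialect. It is an illustration under stated assumptions, not anything posted in the thread: the dialect has only parentheses, whitespace-separated symbols, and integers (no strings, comments, quote, or reader macros), and the names `tokenize`, `read_sexpr`, and `parse` are hypothetical. It fits in well under 25 lines of logic, but it also shows why the "trivial" claim holds only while the dialect stays this small.

```python
def tokenize(text):
    """Split source text into parens and whitespace-separated atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read_sexpr(tokens):
    """Consume one expression from the front of `tokens`; return it as a tree."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read_sexpr(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing paren
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)  # integer literal (toy assumption: no other literals)
    except ValueError:
        return token  # anything else is a symbol, kept as a plain string


def parse(text):
    """Read exactly one expression from `text` into nested lists."""
    tokens = tokenize(text)
    tree = read_sexpr(tokens)
    if tokens:
        raise SyntaxError("trailing input after expression")
    return tree


# The nested list is the tree the later compiler stages would walk.
tree = parse("(let x (add 1 2))")
```

The result is plain nested lists, so semantics work can start on `tree` right away. Strings, comments, dotted pairs, or reader macros would each add cases to `read_sexpr`, which is roughly where the "it SEEMS simple" caveat above starts to apply.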
| fiddlerwoaroof wrote:
| > This is why you sometimes hear people saying that Lisp
| "doesn't have syntax".
|
| The other reason is that a language like Common Lisp is defined
| in terms of the data-structures used by the language and the
| language has no unique text representation: the "default"
| reader uses a slightly extended version of s-expressions, but
| any data structure in the language can be evaluated and any
| transform from text to data structures can be a textual syntax
| for Common Lisp.
| electroly wrote:
| > However, once I start constructing a parser, progress slows to
| a crawl
|
| I can't really relate here. The parser is the easiest part of a
| compiler; the work only increases from there. I feel like if you
| ran out of steam at the parser, you never had enough steam to
| write a whole compiler. I don't think removing the parser will
| take you across the finish line if you otherwise were running out
| of steam.
|
| My advice is to write your language in vertical slices. Write the
| parsing, semantic checking, and code generation for the simplest
| features first and progressively add feature slices, rather than
| trying to write the entire parser for a fully-baked language
| before proceeding. Consider including "print" as a built-in
| statement so you can print things out (and thus write tests)
| before you have working expressions and function calls.
| tester756 wrote:
| Parser for "4-lulz language" or something that will also be
| used in IDEs, so has robust error recovery, can perform partial
| updates, etc?
|
| I feel it's just that it is possible to say that work on the
| parser side is completed
|
| meanwhile optimizations? you can probably endlessly improve
| stuff
| electroly wrote:
| Talking about the same thing as the article: hobby/learning
| languages. If you click the links to the author's projects
| you can see they have yet to finish a compiler or really even
| come close.
| Definitely not talking about sophisticated
| production-grade languages here; OP is trying to complete
| their first compiler _at all_. I think writing compilers is a
| really neat and useful learning project but you have to be
| smart about not biting off more than you can chew.
|
| I would absolutely recommend not supporting incremental
| compilation or error recovery in your first compiler. Just
| stop everything at the first error. Save that for your second
| hobby compiler, or better yet, the first commercial compiler
| that you get paid to work on.
| 082349872349872 wrote:
| > _writing compilers is a really neat and useful learning
| project but you have to be smart about not biting off more
| than you can chew._
|
| Writing compilers is an especially useful project when you
| learn roughly how much you _can_ chew. (Algol 68 is a nice
| example of a bunch of smart people, with relevant domain
| expertise, biting off way too much for the machines of the
| time.)
| smasher164 wrote:
| > I don't think removing the parser will take you across the
| finish line if you otherwise were running out of steam.
|
| Everyone's different. The problem with parsing isn't the
| difficulty, but rather the potential for endless bikeshedding.
| You're having to make a ton of opinionated decisions that in
| turn produce more questions about your syntax. If your
| personality is like mine in that it's a bit obsessive about
| "completing" a phase, then parsing feels like an endless
| quagmire. In comparison, AST -> type inference -> codegen feels
| more structured and straightforward.
| carterschonwald wrote:
| I always start with the AST and just work forward and backward
| from there.
___________________________________________________________________
(page generated 2023-07-04 23:00 UTC)