[HN Gopher] Incremental Parsing in Go ___________________________________________________________________ Incremental Parsing in Go Author : willdaly Score : 134 points Date : 2022-10-22 13:45 UTC (9 hours ago) (HTM) web link (dev-nonsense.com) (TXT) w3m dump (dev-nonsense.com) | diffxx wrote: | > The parsers produce a sequence to tokens, not a full syntax | tree. Writing a tokenizer is much easier than parsing full syntax | trees...Most other editors don't construct the full syntax tree. | | Syntax highlighting is nice, but fast semantic analysis is the | real holy grail. | | I have come to think that the best way to develop a new language | would be to implement a text editor in that language before | releasing the language. The editor should (ideally) have emacs | and vim key bindings, though it would surely at least have the | bindings that the language author uses. The compiler/interpreter | for the language would embedded in the editor. This would allow | for a much richer editing experience that goes beyond syntax | highlighting. Indeed, the source code would become like a living | document in the editor where they editor could display inline | information about both syntax _and_ semantics. The editor need | not be fancy. It could be written as a terminal application, kind | of like a language specific nano or vim. | | If the language/editor author is careful in how they design the | editor, all of the syntactical and semantic tooling should be | exportable into packages that can be consumed by other editors | with plugin systems/lsp like vscode or neovim. Then the rich | editing experience can be relatively easily exported to any other | text editor. Tool authors would also then be able to write static | analysis/linting/formatting/whatever tools on top of the semantic | tooling that the language supports. | | In some ways what I am describing is a more minimalist version of | what one would get out of an image/IDE based language like | smalltalk but the code representation would still remain as text | files. The editor then becomes like a REPL except rather than | being an ephemeral process, the REPL state is continually being | written to disk and is resumable at any time from any computer | capable of running the editor. | maxbond wrote: | I think it would be sufficient & more valuable to implement a | language server for your new language, which keeps you focused | on the parts related to your language rather than the struggles | of implementing an editor, and then you'll be able to drop into | VSCode, neovim, etc. | thechao wrote: | Does anyone have an _good_ intro tutorial for writing a | language server in, say, C for some simple language? I find | most of the docs a little too inscrutable to follow for just | a bit of dabbling. | Gibbon1 wrote: | I agree with the parent. The big advantage of what he's | suggesting is code objects exist as first class objects. They | don't have to creased by parsing a text file. Which means the | editor can directly manipulate them. | | If you've ever used CAD programs especially Schematic Capture | and PCB Layout programs you'll get the idea. Everything | displayed on the screen is an object in a database. | | Instead of a parser blindly parsing a text file and coming | across a struct definition, a struct is object created by the | programmer. Which means you can have hard links between the | objects that make up the program. And those can be directly | manipulated with manually or programmatically. | | The big advantage comes from maintenance and refactoring. | Change the name of a field? Happens in exactly one place in | the program. So instead of a diff with 1537 files changes you | have 'renamed struct fobar to struct foobar'. Change a | comment? Well it's just a changed comment. | maxbond wrote: | I'd like to see languages like that, but it's a massive | lift. Our systems of source control and CI/CD are built | around the assumption that languages are text files that | are somewhat line oriented. It's valuable for people to be | able to use different editors and tools, and text files are | the integration point that makes this work. | | Fallible parsers are a bit of a hack but they're a hack | that works and is widely deployed already, and I'm not | convinced the value of moving to a symbolic/binary | representation creates enough value to justify the risk | involved in the migration. That being said I'd love to see | it happen. We'd end the tabs and spaces debate once and for | all! | diffxx wrote: | If you write your own editor with language server features, | writing a language server implementation _should_ (dangerous | word) be relatively easy. By writing the editor in your | language, you prove that your language is actually useful for | a real world problem. You also have the flexibility to add | any functionality that you wish without contorting your | thinking to whatever may be imposed on you by the lsp model. | | It isn't necessarily clear to me that writing a language | server implementation is that much easier than writing an | editor. To be fair, I have personally done neither (though I | have written REPLs) so perhaps I am wildly off in my | estimation. | maxbond wrote: | I don't really disagree, but would point out that, for the | languages I hack on, they're DSLs that wouldn't be | appropriate to write an editor in. I agree that you should | be writing code in your language & solving problems with it | though, even before there's a working implementation. It | does really help hone your vision and focus on the problems | you're solving. | | I've not written a language server either, and I'm positive | it's one or two orders of magnitude more difficult than a | simple editor, but I think the time is better spent for | many projects because it gets you into the nitty gritty | mechanics of your language & sets you up for a good | developer experience. If your language specializes in | writing things like editors though, that would certainly | not be the case. | | Tangentially, my language development hot take is that YAML | is a really great platform to start with, so you can focus | on prototyping your runtime & semantics and kick the can on | parsing & syntax. Parsing takes up a lot of time and you | might not know if this language idea is worthwhile or not | yet. Additionally, YAML is pretty darn good and you may | never need to move off of it. | diffxx wrote: | > for the languages I hack on, they're DSLs that wouldn't | be appropriate to write an editor in | | Ha, I have been working on a DSL for writing DSLs, which | perhaps explains my perspective a bit. | maxbond wrote: | Sounds cool! If you have something public I'd love to see | it. Email is in my profile. | diffxx wrote: | It's not quite ready to share but hopefully soon :) | maxbond wrote: | Best of luck & happy hacking | xyzzy4747 wrote: | For max optimization, wouldn't it be better to create a Rust or C | library for parsing that Go links into? I personally don't see | the usefulness of trying to optimize Go itself too much as it's | handicapped by the runtime and garbage collection. | Thaxll wrote: | I've seen some real world example where Go was as fast or | faster than Rust for CPU / io intensive task. | | Go is a fast language even with a GC. | | https://github.com/boyter/scc/#performance | akira2501 wrote: | For maximum return on investment, wouldn't it better to focus | on something other than raw speed? I personally don't see the | usefulness of trying to make everything in Rust itself too much | as it's handicapped by it's compiler and lack of specification. | tester756 wrote: | C library for parsing? | | isn't it dangerous from security perspective? | xyzzy4747 wrote: | It's just an example. The options are really Rust (what I'd | prefer), C++, C, or perhaps something like Nim that compiles | to C. | | If you're trying to make an unoptimized parser, then use | whatever you want. | [deleted] | [deleted] | pharmakom wrote: | For some reason we insist that language parsers are implemented | in the language itself, even when the language isn't great for | parsers. | 37ef_ced3 wrote: | You're in for a big surprise. Try using the language. | | Spend some time using Go, and you will be impressed by its | performance. | | You'll wonder, "Were all those haters on Hacker News | misinformed?" | bugfix-66 wrote: | Correct, Go is fast, very close to C. | | And just like in C, if you want to avoid memory management | overhead you can use a slice of structs, integers instead of | pointers, and a freelist (if needed). For example, here is a | pointerless sparse bit vector: | | https://bugfix-66.com/7256e0772dc3b02d72abf15b171731c933fd44. | .. | | The article is storing parses in a balanced binary tree, like | a packrat memoizing parser. | | Here is the fastest balanced search tree in Go. It allocates | (and uses Go's garbage collector) but you can easily use a | slice of structs with integer index pointers and a freelist | instead: | | https://bugfix-66.com/c93e950965804eba90a34e0055985b1c42d5a1. | .. | | The above code will perform very similarly to C. | xyzzy4747 wrote: | If you're making something requiring CPU optimization as a | core feature, you might as well go with one of the fastest | languages instead of handicapping your project from Day 1. Go | is not considered one of the fastest. It's better for network | or filesystem logic that is I/O limited. | samatman wrote: | The optimization here is using incremental parsing, so that | changing parse state goes from O(n) to may-as-well-be-O(1). | It's probably linear with tree depth. | | Any language is fast enough to do this, certainly Go is. | Naive parser combinators written in slow languages can | tokenize six-figure LOC files fast enough that the user | won't notice. | jerf wrote: | This is kind of a test of how nuanced your understanding of | programming languages can be. | | Rust with a bit of effort put into optimization will be | faster than Go with a bit of effort put into optimization, | it is true. However, you need to double-check your | intuition for how big and how consequential the delta is, | because I'd guesstimate it as roughly a factor of two, | possibly a touch less. It is true that Rust does a | _crapton_ more "optimizations", but a _lot_ of those | optimizations have diminishing returns. | | A factor of 2 may still sound large, but in practice it | isn't as large as it sounds, because my qualification "a | bit of effort put into optimization" is not redundant. Go | with a bit of optimization will probably beat someone's | first draft of Rust. Go with a ton of careful optimization | will probably beat Rust with a bit of optimization. The raw | performance of the two are reasonably close, and smaller | than the improvements you can usually get with | optimization. So Rust's speed advantage, which is real, | generally only matters in cases where you're going to | optimize very heavily. | | Is this one of them? For that I can give a solid... Maybe! | There are times and places where parsing is something you | want optimized to within an inch of its life, certainly. | However... it isn't all the places, and some of your | intuitions may lead you astray if you're not careful; you | might think a heavy duty programming language would need a | great parser, but if it's going to chew on optimizations | for possibly literally 100x the time, it may matter a lot | less. | | In general, Rust is capable of going faster than Go (again, | I'd guestimate about a factor of 2 with isolate tasks where | it may be able to go event faster, but that only matters if | that's the bulk of your program), but Go is going to be | fast enough that that only matters in certain limited | places where you're willing to put some non-trivial effort | into performance in the first place. | | This is in contrast to a comparison between Go/Rust and | Python, where even casually written Go/Rust can outpace | optimized pure Python, even before we start talking about | how much better Go/Rust will be using multiple CPUs. This | is because Python is just that slow, and let's not even | talk about how slow Python can be if you don't write it to | be optimized and you start heavily using all its features | without realizing how expensive they are. From the point of | view of Python, Go and Rust have very similar performance | characteristics. (But then, of course, one must be careful | with Python because something like NumPy will blow your | socks off when it turns out to not really be Python at | all.) | | It's a rich, nuanced problem space and should not be | approached with sloganeering and "my language is better | than yours". | | My summary of Go is: It's a slow compiled language... but | it is still a compiled language, and it is faster than | pretty much everything that isn't, possible exception of | LuaJIT, and the delta between slowest compiled and fastest | compiled is only ~2-3x, which in the overall space of | programming language speed isn't actually that great. | kaba0 wrote: | Not sure if rust vs go would be the best example here. | Rust vs Java would be a better one -- go has a very | primitive GC in comparison, and java does optimize hot | loops to a higher degree, so a naive code base would be | very hard to beat in a lower level language. | richieartoul wrote: | I do a lot of "high throughput" stuff at work in both Go | and Java, and the Go stuff is usually faster by default. | | Java tends to win for really naive programs where the | author didn't bother caring about performance or | allocations at all, but if any care was put into it at | all Go usually wins in my experience. | | The trope that Go's GC is primitive in comparison to | Javas is not really accurate. You can't consider a | language's GC in isolation. | | Java's GC and JIT are extremely complex because the | language semantics are terrible for performance by | default. The "everything is an object" model made sense | when the language was designed and main memory access | times were roughly equal to a CPU cycles, but that's no | longer true by a factor 100 to 200 now. | | Go's GC makes different trade offs (low latency, | extremely high concurrency, tight integration with the | runtime and scheduler) because the language semantics are | much more sympathetic to modern hardware ("true" structs, | automatic escape analysis, etc), so it can. | kaba0 wrote: | Sure, Go can get away with more primitive GC exactly | because it has "value types", so less garbage is created. | But they are still much worse, lower latency only means | that they pause threads to get more breathing space if | they have been allocating too heavily, they are | absolutely not even close to the same league Java's low | latency ZGC does. | geodel wrote: | > they are absolutely not even close to the same league | Java's low latency ZGC does | | This is the kind of thing always offered without any | serious numbers extracted from real life or even | realistic test programs. | | So even if technically true in very narrow sense it is | more of _high performance car marketing_ with fancy | algorithm and data structure names. By the time GC are | used in end user programs with tons of libraries, | frameworks, design patterns and inefficient to implement | business rules those GCs show little difference that | fancy ads promised on _TV_. | richieartoul wrote: | It's really the usage of the word primitive that I'm | arguing with. Java's GC comes with a lot of additional | trade offs that Go's doesn't. | | For example, the fact that the Java GC is copying and | generational means that there is a LOT more overhead | introduced by write barriers. | | If you benchmark the rate at which the GCs can clean up | garbage, Java always wins, but the Java GC impairs you a | lot more in other situations that the Go one doesn't. | | It's trade offs, but the Go one makes much better trade | offs for modern hardware IMO. | kaba0 wrote: | Write barriers are a single local conditional on the fast | path, if I'm not mistaken. Also, since a JIT compiler is | in action, it may have a much wider range than every | object interaction. It's basically free on modern | computers with branch prediction. | | ZGC (the low-lat GC) does employ read barriers though | which will decrease throughput considerably, but latency | and throughput are almost universally opposites of each | other. | chrsig wrote: | Do you have benchmarks you can share? | kaba0 wrote: | Well, cross-language benchmarking is always hard, but for | purely testing the GC this is not too bad: | https://benchmarksgame- | team.pages.debian.net/benchmarksgame/... | | See how ahead Java is of any other managed language (and | it doesn't really make sense to do this benchmark with | non-GCd languages)? Though this is done with the G1 GC, | not the low-latency one - this default GC is more | throughput oriented with a max pause time target goal. | Also note how Java does use multiple times more memory, | as it "postpones running GC when it knows it still will | have enough time to collect all of it without running out | of the target goal" - this is also the reason why java is | quite ahead on "energy efficiency" reports as well. And | also, GCs work better with more heap space. | geodel wrote: | > this is also the reason why java is quite ahead on | "energy efficiency" reports as well. | | Very soon businesses would be asking for "dollar | efficiency" also. I think going by effort on Java and | their frameworks vendors to pack more instances of Java | process/pods on a VM, it is already been asked by tech | savvy customers. | | So that old _fact_ that on sever side programing | customers only care for raw throughput and not on machine | size because _RAM /CPU/disk is cheap_ is not working well | in cloud based deployments where now each of these | matter. | kaba0 wrote: | To be honest, I really don't get this microservice/cloud | hype, stackoverflow (which let's be honest will be bigger | than that 34th startup) runs on a single (very beefy | though) dedicated server machine. | | I pay like 5 dollars a month for a VM with very low | params, but even that will happily run anything. | Especially that the DB likely can't be shared the bus | factor will remain 1. | fsdjkflsjfsoij wrote: | Java requires a much more advanced GC and JIT because | Java programs tend to allocate a lot more and have | extremely bad memory layout when you're not restricting | yourself to primitives. Project Valhalla's value types | will significantly improve the situation. Relying so | heavily on the JIT also has other problems especially in | programs that have widely varying execution paths. | kaba0 wrote: | Surely, that's the incentive part for why the team spent | many many work hours improving their GC - just because | the JVM typically depends more on a good GC doesn't make | it any less useful - long running processes do make | significant use of automatic memory management. | | Also, Java's GCs are moving/compacting GCs, so while the | immediate memory representation is indeed inefficient, | again, for long running processes Java will place often | together-used objects physically close to each other, and | will defragment the heap. But Valhalla can't come soon | enough, I agree. | | > Relying so heavily on the JIT also has other problems | especially in programs that have widely varying execution | paths | | Has it? I would think that an AOT program would have a | worse time with widely varying execution paths, while a | JIT compiler is free to reoptimise based on changing | application state. | geodel wrote: | > just because the JVM typically depends more on a good | GC doesn't make it any less useful - | | I mean it feels like personal choice. Do I _praise_ the | spouse when they bring whole kitchen down while making a | dish and cleaning up quickly afterwards? Or do I take it | as "Well, you made mess so it was basic expectation from | you to clean up fast for later use" | kaba0 wrote: | I would wager that most applications have plenty of | object lifetimes that are not regular at all -- a web | server with some stateful sessions for example. So your | analog doesn't really make sense -- go can't avoid these | situations at all and will significantly perform worse in | these cases. | foldr wrote: | You'd generally expect Rust and Go to perform about the | same for CPU bound workloads. Rust has access to more | advanced codegen and optimizations via LLVM, but Go's | garbage collector will often be faster than refcounting (or | whatever manual memory management technique your Rust code | ends up using). This is especially so given that the GC | runs on a separate thread without any additional effort on | your part, making it almost 'free' in the context of | parsers (which tend to be single threaded). | | A real world example of this is esbuild. The author posted | on HN that his initial Rust implementation was actually | somewhat slower than the subsequent Go version. | super_flanker wrote: | > Go's garbage collector will often be faster than | refcounting (or whatever manual memory management | technique your Rust code ends up using) | | I'm not supporting the argument that everything should be | written in Rust (or whatever) for good performance. | However blanket statement like this is not true; micro- | benchmarks are often misleading. There are many factors | which affect the performance and they come with | tradeoffs, so you can choose what options favor you most. | At the end, objectively Rust offers more ways to optimize | your program. | dymk wrote: | This is a premature optimization, and keeping everything in | the same language has benefits like greatly simplified | tooling and building | xyzzy4747 wrote: | It's not a premature optimization - it's deciding the | maximum that the parser can be optimized in the future. | Choosing Go sets a lower ceiling. | | > Keeping everything in the same language has benefits | like greatly simplified tooling and building | | Surely there are other Go libraries that incorporate C, | C++, or Rust? Also if both parsers existed and were | equally easy to set up, and you were planning on doing a | ton of parsing, it would make sense to go with the faster | one. | dymk wrote: | It absolutely is a premature optimization. If it's fast | enough, then it's fast enough. The author hasn't | indicated that the current Go implementation is hitting a | ceiling imposed by the language yet. | | If you'd like to, you can provide some real-world | examples - or even microbenchmarks - showing that Go is | so much slower than <your choice here> that it's going to | make a difference. | | > Also if both parsers existed and were equally easy to | set up | | They're not equally easy to set up. Language interop is a | pain in the pass. | Jtsummers wrote: | Look at the current Makefile: | | https://github.com/aretext/aretext/blob/main/Makefile | | Build is literally a `go build ...` and install is `go | install`. Adding any other language to the mix would make | this a polyglot project and not be "equally easy to set | up". The other question is, do both parsers exist? In | this write-up they point to tree-sitter as a possibility | which is a JS program that produces C code. This would be | viable, but here's the author's take: | | > I considered integrating tree-sitter, an incremental | parsing library with parsers for many existing languages. | However, running JavaScript to generate parsers and | linking to a C library would have greatly complicated the | build process. Today, aretext can be built on almost any | platform using a single go install command. I've had | users install aretext on ARM laptops, FreeBSD servers, | Chromebooks, and Android phones. _To maintain | portability, I wanted a pure Go implementation._ | | So this wasn't some casual decision, but something they | at least considered long enough to describe here. | | And the parsing library itself is only around 1200 lines | total (comments, blanks, and code). The parsers for each | language add a lot more, of course, but should be roughly | equivalent given the same library and interface. I | imagine that if this project really takes off and | performance becomes a real problem they can do the | rewrite at that point. Right now, the code works, seems | to work fast enough for its author and primary users, and | it's trivial to install on any platform supported by Go. | So yes, it would have been a premature optimization to | complicate the build process, probably reduce the number | of supported platforms (or greatly increase the effort to | support the same number of platforms), just to have a | slightly faster parser. | rollcat wrote: | One of Go's primary goals has always been compilation speed. | | Go started out in C, and was later (post-1.0) incrementally | rewritten to be self-hosting. One of the authors (Ken Thompson) | is also one of the co-creators of C. I would argue these guys | know what they are doing. | kaba0 wrote: | I don't know, not implementing generics when it was pretty | obviously needed was a huge oversight, so I'm not sure. | | Also, the reason for compiler bootstrapping is more of a | "beauty thing", then practicality. It would definitely be | faster in a low-level language, but I doubt it would matter | as an end user. | fredrikholm wrote: | You aren't sure if _Ken Thompson_ knows what he 's doing? | kaba0 wrote: | As a software architect? Absolutely. Programming language | designer? Not sure, neither C or Go are good languages in | my personal opinion. | | EDIT: I meant to write that I think very highly of him as | an architect/developer. | sjansen wrote: | Experience has shown that often "worse is better". Go | does an amazing job of balancing complexity and power. I | haven't seen a "better" language that isn't either | slower, harder to become productive, or both. | | https://en.wikipedia.org/wiki/Worse_is_better | kcartlidge wrote: | > _not implementing generics when it was pretty obviously | needed was a huge oversight_ | | I _get_ the desire for generics. I do a lot of C# and have | used generics for a very many years. Yet I 've been writing | Go for around 6 or 7 years and other than in the beginning | (when I was new to it) I haven't found myself missing them | at all. | | In other words, for many people the lack of generics comes | across as an oversight. For others, including myself | (again, a heavy generics user in C#) that really isn't the | case. I write Go in the style of Go and it just hasn't been | an issue. | | Blanket statements are rarely true. YMMV. | kaba0 wrote: | Well, it is not a blanket statement, it's just the | generic truth (pun not intended) based on decades of | evolution of programming languages and a relatively | expensive mistake for Java, which would have been a | perfect opportunity to learn from. | | Sure, it is seldom missed as an end user, but as a | library user it is essential. That's why map and the like | had to be hard coded into the language, and why | concurrent versions couldn't be implemented for a long | time in the language, the same way it was done for Java | forever. | kcartlidge wrote: | > _Well, it is not a blanket statement, it's just the | generic truth_ | | A bit contradictory, really, but that's just semantics I | suppose. | | More importantly even if you could say 99% of devs agree | (and you can't because they don't) that still doesn't | make it an _oversight_. | | If they'd neglected to add generics because it _wasn 't | considered_, that's an oversight. If it was neglected | because in the opinion of the creators of the language it | _wasn 't needed for the purposes they created it for_, | that isn't an oversight but a thought-through engineering | decision. | | Of course you're free to disagree with that decision, but | an oversight it was not. | kaba0 wrote: | Fair enough, I may not have used the correct word, but it | is still a typical "told you" situation, both during | development, after go's initial appearance and ever since | until it finally was decided that it should be indeed | implemented. | kcartlidge wrote: | True. | | Whilst I haven't missed it in Go myself, enough other | people say they _do_ that its inclusion was inevitable. | Which means it probably should have gone in sooner. | chrsig wrote: | There's a big misconception that the creators of go | didn't _want_ generics. They 've stated a number of times | that they didn't have a design that they all thought was | adequate. | | After several years and attempts at a good enough | proposal, Ian Lance Taylor put one out that was able to | cross the finish line, and now we have generics. | kcartlidge wrote: | > _They 've stated a number of times that they didn't | have a design that they all thought was adequate_ | | You know what, with all the 'discussions' in recent years | about _whether_ Go should have generics I 'd actually | lost track of that amongst all the noise. Which is | irritating, as I do now remember the early conversations | about it. | | Thank you for the reminder. | sesm wrote: | This article uses the word 'rune' extensively. From the context I | assume it means 'lexem' or 'token' (i.e. the unit the | lexer/scanner produces and feeds to parser). But then the article | uses the word 'token' to mean the output of a parser ('keyword | token'), while the usual terminology is that parser output is | called a 'parse tree'. | | So, in this terminology, the parser consumes 'runes' and outputs | 'tokens', while the usual terminology is that parser consumes | 'tokens'/'lexems' and outputs 'parse tree'. | pgwhalen wrote: | Rune is a type alias in go, which more or less maps to the more | common words "character" or "code point". | | https://go.dev/blog/strings | sesm wrote: | Thanks! I don't know Go, so this terminology was surprising | to me. | sjansen wrote: | In Go, `rune` is an alias for `int32` and is used to indicate | the value is a Unicode "code point". | | For characters in the ASCII range, that means it's just a | character encoded using more bits. If you need to worry about | the full Unicode range then it's important to understand | Unicode Normalization Forms. | | https://go.dev/blog/strings | | https://en.wikipedia.org/wiki/Unicode_equivalence | tester756 wrote: | Difference in complexity of IDE's parser and Compiler's parser | feels like order of magnitude | | IDE's you want to be very fast, so you use techniques like | partial tree reparse and now when I think about it, then you also | may need to update other places | | like you use type that is defined somewhere below | | and at first parse that type definition doesn't compile, so type | usage above shows an error | | and when you change type definition so it compiles, then if you | only update that part of the tree, then previous would still | scream about the error | | it's really tricky | | the theory behind how to deal with all of this problems seems to | be easy | | but when you actually get to the coding, then you have to be | really thoughtful, careful and experienced in order to get the | modeling of right | | https://learn.microsoft.com/en-us/shows/seth-juarez/anders-h... | | ________ | | I have some experience with simpler and more complex parsers | | and for me no other type of software requires this much careful | thought as parsers do if you want to address all those things | like | | correctness, speed and recovery on broken code fragments, good | error messages, good code, maybe concurrency | binwiederhier wrote: | Interesting read. Thank you for sharing. I always found parsers | fascinating and mystical. It seems like these parser functions | (which i think are analogous to what Rob Pike calls state | functions) are a common way to do parsing, though i know very | little about it. I especially found the combinators intriguing, | though I don't care much for the functional programming syntax in | a language like Go. | | Anyway, thanks for sharing. | | Tangentially, I wrote a little mini parser [0] of my own for my | side project. It is inspired by Rob Pike's talk on parsers [1]. | It doesn't use state functions, but instead just uses the call | stack to keep track of where we are. | | [0] | https://github.com/binwiederhier/ntfy/blob/main/server/actio... | | [1] https://www.youtube.com/watch?v=HxaD_trXwRE and | https://go.dev/src/text/template/parse/lex.go | skohan wrote: | Yeah parsers are fun! We did a recursive descent parser for a | toy language in uni and I think it was one of the most | illuminating and fun projects we did at school. | | Lately I've been working on a tool to make it easy to implement | a parser, and I end up using it for everything, because DSL's | are so nice to work with. | [deleted] | throwaway290 wrote: | Off-topic but are there any aretext users? How does it fare? | pharmakom wrote: | > a successful parse consumes at least one rune | | This avoids infinite loops? | Jtsummers wrote: | Yes, there would be two outcomes for an attempted parsing. | Either it succeeds and makes progress (and eventually | terminates) or it fails and consumes nothing (and terminates | because eventually you run out of parsers to try). | hk__2 wrote: | If you want a hard/interesting parsing challenge, try Clojure's | #_ reader macro. It's a powerful construct that allows you to | comment the next form. If you're not used to Clojure, it's like | writing #_ in front of anything --a function, an array, a | keyword, etc-- to comment it, even if it's on multiple lines. | | For example: #_ (defn foo [x y] | (println x y)) | | This is equivalent to commenting the three lines. Things become | even harder when you learn that these thing "swallow" the next | form and can be used anywhere: (let [a #_ b 43] | #_ #_ hello (H N) (print a)) | | The code above is equivalent to the following: | (let [a 43] (print a)) | | All the rest is comments. | | The hard/interesting bit is that to tokenize you must construct a | syntax tree in order to correctly parse the next form, but in | order to construct a syntax tree you first need to tokenize the | code. | derek8bai wrote: | cool stuff ___________________________________________________________________ (page generated 2022-10-22 23:00 UTC)