[HN Gopher] Hacking the Go compiler to add a new keyword
___________________________________________________________________

Author : todsacerdoti
Score  : 100 points
Date   : 2021-12-08 17:01 UTC
Link   : avi.im

not-my-account wrote:
| A great video on a similar topic is George Hotz adding "fore" loops to clang, which run the body 4 times per iteration.
|
| https://m.youtube.com/watch?v=ee1bXLDN60U

eatonphil wrote:
| On the topic, there was another good post recently on hacking a new operator into the Go compiler. Yes, they rebuild the ^ operator, but it's still a very illustrative example of hacking on the Go project!
|
| https://medium.com/trendyol-tech/contributing-the-go-compile...

Maksadbek wrote:
| There is a similar article here: https://eli.thegreenplace.net/2019/go-compiler-internals-add..., where the author adds the `until` keyword to the Go compiler.

avinassh wrote:
| Hey! Author of the submitted post here. I did refer to Eli's articles and they were incredibly helpful. I have mentioned them in my post too.
|
| This was the only post I could find on the internet that talked about Go compiler internals.

Maksadbek wrote:
| Yup, I didn't read the beginning of your post and missed the reference. I guess my comment is redundant since you already added a reference.

samhw wrote:
| As someone who once did the same thing in an attempt to strengthen Go's type system, I offer you my endless sympathy...

SilasX wrote:
| Semi-related: the nand2tetris course teaches you enough that you can add a keyword to its (admittedly toy) system. The course involves implementing the entire toolchain, including the compiler for the high-level language[1], which emits VM code, and the translator from VM code into assembly.
|
| By the time you've implemented a compiler that "just works" for the language, you notice that it produces really inefficient code for the cases where all you need to do is increment a variable by 1, even though the hardware has an opcode for that! To get nice, general compilation across all use cases, you've programmed the compiler to implement any case of "add X to variable" via the VM commands "push X's value onto the stack, push the variable's value onto the stack, call add, pop the stack into the variable".
|
| So, I figured I could add "inc" as a keyword. You have the compiler recognize that keyword and translate "inc <var>" into a VM instruction, and then tell the translator how to turn that VM instruction into something that makes use of the opcode.
|
| (Alternatively, you can have it just recognize when it's doing something of the form "<variable> = <variable> + 1", but that's trickier once you've written the whole VM emission step as a single-pass operation.)
|
| I know, it's pretty basic stuff from the standpoint of a professional compiler programmer, but it's pretty neat to be able to make an addition to the language like that!
|
| [1] It uses "Jack", a Java-like language free of syntactic sugar.

lxe wrote:
| > The hash method considers the token's first and second characters and the length.
|
| Quite a kludgey optimization for the token hash.

Leszek wrote:
| V8 developer here (I also happened to implement perfect hashing in V8's tokenizer): perfect hashing is a very common compiler optimisation and, as the sibling comment says, is worth it for the improvement in scanner speed at runtime. If you do end up adding new keywords (which is ~never for anything that wants to preserve backwards compatibility), you just recalculate your perfect hash with gperf, by hand, or however you like.

tptacek wrote:
| The language isn't extensible, so these are changes that happen very rarely; the compiler is simply optimized for the actual task it has.
| What would be weird would be expending any real effort (or, worse, runtime cost) on an engineering case we know is never going to happen.

benhoyt wrote:
| Yes, and specifically in this case, no keywords have been added to the Go language at all since its 1.0 release (I just checked the 1.0.1 spec and there are still 25 keywords). Even the addition of generics, coming soon in 1.18, will add no new keywords (though it will add a new token, "~").

VWWHFSfQ wrote:
| I find it disconcerting that the Go compiler is such a mess that it took this much effort just to alias a new keyword. I know the Ruby internals are famously very nasty, but I'm surprised Go is this bad.

throwaway894345 wrote:
| What's a language that allows for easily adding new keywords? What tradeoffs were involved in facilitating that property? Is this property compatible with Go's goals?

nikanj wrote:
| Many languages are easy and fast to extend: namely, toy languages that are not really going anywhere. A dead-simple, easy-to-extend compiler is a few thousand LOC, and it produces completely crap code for all platforms.
|
| If you want your language to actually have some real-world usage, you need real-world performance numbers. That tends to lead to compiler codebases a few orders of magnitude larger, and much gnarlier to extend.

adgjlsfhk1 wrote:
| Also, good languages don't have many keywords, so they aren't optimized for adding them.

wk_end wrote:
| In this context (I see you, Lispers), the difficulty of adding a new keyword to the compiler is less a property of the language than a property of how the compiler is implemented.
|
| Specifically, the weird stuff the author encountered, like:
|
| * Generating the token list by parsing the comments of a source file
|
| * Only parsing up to a hard-coded token instead of all of the known tokens (?!)
|
| * Using a hacky token hashing mechanism that only looks at the first two characters of the token
|
| has nothing to do with Go-the-language.

preseinger wrote:
| A language and its principal compiler are, I think, not as decoupled as you're implying.

philosopher1234 wrote:
| Why does this mean it's a mess? Why should the codebase be optimized for adding new keywords, when that happens maybe once a decade? Your comment seems overly negative.

benhoyt wrote:
| Indeed. In fact, even with the release of generics in Go 1.18, which is coming out in early 2022, exactly a decade after Go 1.0, there will be no new keywords. So it won't even have happened once in a decade. :-)

johnisgood wrote:
| > Other than Eli's post, there are no documentation or articles on Go compiler internals. How does someone get started working on them? How do they navigate and find all these intricacies without spending hours? Maybe Google has some internal documentation on the Go compiler.
|
| It would be nice to have more information out there on the internals of the Go compiler. Perhaps there is.
|
| I found stuff like:
|
| - https://github.com/emluque/golang-internals-resources
|
| - https://www.altoros.com/blog/golang-internals-part-1-main-co...
|
| - https://github.com/teh-cmc/go-internals
|
| But yeah, Eli's articles[1] are pretty good.
|
| [1] https://eli.thegreenplace.net/2019/go-compiler-internals-add...

avinassh wrote:
| Thank you for these links! I think I have seen the first two at some point, but they weren't helpful. The `go-internals` repo looks great, and I will check it out.
|
| I am also curious about the daily development cycle of a regular Go contributor. How do they make changes? How do they run quick tests before running the whole test suite, etc.?

melony wrote:
| The Go compiler is about as straightforward as it can get.
| Just read the source:
|
| https://github.com/golang/go

avinassh wrote:
| I don't think I would have figured out how to add a new token if not for Eli's post. This comment [0] perfectly explains the quirks I ran into.
|
| As an exercise, can you help me figure out how to add a token, and discover these quirks, just from the source?
|
| On second thought, figuring it out from the source alone might have been possible if you spent hours on it. But don't you think it should also have some comments to help navigate?
|
| [0] - https://news.ycombinator.com/item?id=29489113

londons_explore wrote:
| Google has an _amazing_ code search tool. You can try it out here[1]. It generally makes browsing source code much quicker and easier, which in turn makes understanding the structure of huge codebases much easier.
|
| With that tool, I prefer to just dive into the source in most cases rather than read documentation, especially when there is a good chance the documentation is wrong or outdated.
|
| [1]: https://source.chromium.org

preseinger wrote:
| Code explains what and how, but not why. Why is necessary for building a robust mental model of any system, and can only be provided by documentation (or other humans).

londons_explore wrote:
| Google's source code files frequently have 50+ lines of comments at the top of the file to explain the why...

johnisgood wrote:
| > https://source.chromium.org
|
| Wow, it is pretty cool! Is there open source software similar to this? It reminds me of https://elixir.bootlin.com/. I really want something like these two.
|
| Currently checking out https://github.com/bootlin/elixir.

ferdowsi wrote:
| I clicked into the Go GitHub repo and found documentation pretty easily. The compiler code itself is well documented, and Go's code navigation tooling also helps with learning.
|
| https://github.com/golang/go/tree/master/src/cmd/compile

avinassh wrote:
| I did run into this.
| The linked page is high-level documentation with very few details specific to the codebase.
|
| Take the example of adding a new token. You have to run go generate to generate the token strings, but nowhere in the docs or in the code is it mentioned what exactly 'stringer' is or how to install it.

_wldu wrote:
| This is a great demo. The old C compiler backdoor, but in Go:
|
| https://github.com/yeokm1/reflections-on-trusting-trust-go
|
| The GopherCon Singapore (2018) video is a really great summary (20 mins). He modifies the compiler and inserts the backdoor during the presentation:
|
| https://www.youtube.com/watch?v=T82JttlJf60&list=PLq2Nv-Sh8E...

rodmena wrote:
| > How do they navigate and find all these intricacies without spending hours?
|
| Technical documentation of internals is not important for corporate-built languages, which is a shame.

fefe23 wrote:
| Why do we link to some dude applying a HOWTO instead of the HOWTO?

quotemstr wrote:
| If you do perfect hashing, you should make it infallible: retry map creation with tweaked hash functions until it works.

didip wrote:
| I wonder if the author has heard of https://github.com/goplus/gop
|
| He'd have fun reverse engineering it.

avinassh wrote:
| I have seen it on HN, but I hadn't looked closely. Thanks for linking again!
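The perfect-hashing discussion in this thread (a hash over a token's first two characters and its length, plus quotemstr's suggestion to retry with a tweaked hash on collision) can be sketched in a few lines of Go. This is a minimal, hypothetical sketch, not the Go compiler's actual scanner code: the `keywords` slice, `hash`, `buildTable`, `isKeyword`, the 64-slot table and the `shift` parameter are all illustrative assumptions.

```go
// Hypothetical sketch of a perfect-hash keyword table in the style discussed
// in the thread. Names and parameters are illustrative, not the Go compiler's.
package main

import "fmt"

// keywords holds Go's 25 keywords (unchanged since Go 1.0, per the thread).
var keywords = []string{
	"break", "case", "chan", "const", "continue", "default", "defer",
	"else", "fallthrough", "for", "func", "go", "goto", "if", "import",
	"interface", "map", "package", "range", "return", "select", "struct",
	"switch", "type", "var",
}

// tableSize must be a power of two so that masking with tableSize-1
// keeps every hash in range.
const tableSize = 64

// hash mixes only the first two bytes and the length of s, so it is cheap
// to compute. shift is a hypothetical knob that buildTable varies when it
// hits a collision. It assumes len(s) >= 2, true for every Go keyword.
func hash(s string, shift uint) uint {
	return (uint(s[0])<<shift ^ uint(s[1]) + uint(len(s))) & (tableSize - 1)
}

// buildTable retries with different shifts until every keyword gets its own
// slot (a perfect hash for this keyword set), following the "retry with a
// tweaked hash" idea. ok is false if no shift in the tried range works.
func buildTable() (table [tableSize]string, shift uint, ok bool) {
	for s := uint(0); s < 8; s++ {
		table = [tableSize]string{}
		ok = true
		for _, kw := range keywords {
			h := hash(kw, s)
			if table[h] != "" {
				ok = false // collision: try the next shift
				break
			}
			table[h] = kw
		}
		if ok {
			return table, s, true
		}
	}
	return [tableSize]string{}, 0, false
}

// isKeyword costs one hash, one array index and one string compare.
func isKeyword(table *[tableSize]string, shift uint, s string) bool {
	if len(s) < 2 {
		return false // no Go keyword is shorter than two bytes
	}
	return table[hash(s, shift)] == s
}

func main() {
	table, shift, ok := buildTable()
	if !ok {
		panic("no perfect hash found: widen the search or grow the table")
	}
	for _, w := range []string{"func", "goto", "until", "x"} {
		fmt.Printf("%-5s keyword=%v\n", w, isKeyword(&table, shift, w))
	}
}
```

Running it should report `keyword=true` for `func` and `goto` and `keyword=false` for `until` and `x`. Appending a new keyword such as `until` to `keywords` forces the table to be rebuilt: the search either finds a collision-free shift or `main` fails loudly at startup, a small-scale version of the recalculation Leszek describes. Iterating over an explicit slice, rather than a hard-coded token range, also sidesteps the "only parsing up to a hard-coded token" surprise wk_end lists.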