[HN Gopher] Hacking the Go compiler to add a new keyword
       ___________________________________________________________________
        
       Hacking the Go compiler to add a new keyword
        
       Author : todsacerdoti
       Score  : 100 points
       Date   : 2021-12-08 17:01 UTC (5 hours ago)
        
 (HTM) web link (avi.im)
 (TXT) w3m dump (avi.im)
        
       | not-my-account wrote:
       | A great video with a similar topic is George Hotz adding "fore"
       | loops to clang, which runs the body 4 times per loop.
       | 
       | https://m.youtube.com/watch?v=ee1bXLDN60U
        
       | eatonphil wrote:
       | On the topic, there was another good post recently on hacking in
       | a new operator to the Go compiler. Yes they rebuild the ^
       | operator but it's still very illustrative of hacking on the Go
       | project!
       | 
       | https://medium.com/trendyol-tech/contributing-the-go-compile...
        
       | Maksadbek wrote:
       | There is a similar article here
       | https://eli.thegreenplace.net/2019/go-compiler-internals-add...,
       | where the author adds the `until` keyword to Go compiler.
        
         | avinassh wrote:
         | hey! author of the submitted post here. I did refer to Eli's
         | articles and they were incredibly helpful. I have mentioned
         | them in my post too.
         | 
         | This was the only post I could find on the internet which talked
         | about Go compiler internals.
        
           | Maksadbek wrote:
           | Yup, didn't read the beginning of your post and missed the
           | reference. I guess my comment is redundant since you already
           | added a reference.
        
           | samhw wrote:
           | As someone who once did the same thing to attempt to
           | strengthen the type system in Go, I offer you my endless
           | sympathy...
        
       | SilasX wrote:
       | Semi-related: in the nand2tetris course, they teach you enough so
       | they can add a keyword in their (admittedly toy) system -- the
       | course involves implementing the entire toolchain, including the
       | high-level language[1] compiler, which emits VM code, and the VM
       | code's translator into assembly.
       | 
       | By the time you've implemented a compiler that "just works" for
       | the language, you notice that you have really inefficient code
       | for those times when all you need to do is increment a variable
       | by 1, given that the hardware has an opcode for that! In order to
       | have nice, general compilation across all use cases, you've
       | programmed the compiler to implement any case of "add X to
       | variable" via the VM commands for "push X's value onto stack,
       | push variable's value onto stack, call add, pop stack into variable".
       | 
       | So, I figured I could add "inc" as a keyword. You have the
       | compiler recognize that keyword and translate "inc <var>" into a
       | VM instruction, and then tell the translator how to turn that VM
       | instruction into something that makes use of the opcode.
       | 
       | (Alternatively, you can have it just recognize when it's doing
       | something of the form "<variable> = <variable> + 1", but that's
       | trickier once you've written the whole VM emission step as a
       | single-pass operation.)
       | 
       | I know, pretty basic stuff from the standpoint of a professional
       | compiler programmer, but pretty neat to be able to make an
       | addition to the language like that!
       | 
       | [1] It uses "Jack", a syntactic-sugar-free Java-like language
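
A minimal Go sketch of the "inc" idea described above, assuming a toy stack VM. The instruction names (`push`, `add`, `pop`, `inc`), the helpers `compileStatement` and `translateInc`, and the Hack-style assembly are all hypothetical stand-ins, not the course's actual toolchain.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compileStatement emits code for a toy stack VM. It understands two
// statement forms: "<var> = <var> + <n>" and the added "inc <var>".
// All instruction names here are hypothetical.
func compileStatement(stmt string) ([]string, error) {
	f := strings.Fields(stmt)
	switch {
	case len(f) == 2 && f[0] == "inc":
		// New keyword: one VM instruction instead of four.
		return []string{"inc " + f[1]}, nil
	case len(f) == 5 && f[1] == "=" && f[3] == "+" && f[0] == f[2]:
		n, err := strconv.Atoi(f[4])
		if err != nil {
			return nil, fmt.Errorf("bad constant %q: %w", f[4], err)
		}
		// Generic path: push both operands, add, pop the result back.
		return []string{
			"push " + f[0],
			"push constant " + strconv.Itoa(n),
			"add",
			"pop " + f[0],
		}, nil
	}
	return nil, fmt.Errorf("unsupported statement: %q", stmt)
}

// translateInc lowers the hypothetical "inc <var>" VM instruction to two
// Hack-style assembly lines: address the variable, then increment it in place.
func translateInc(vmInstr string) ([]string, error) {
	f := strings.Fields(vmInstr)
	if len(f) != 2 || f[0] != "inc" {
		return nil, fmt.Errorf("not an inc instruction: %q", vmInstr)
	}
	return []string{"@" + f[1], "M=M+1"}, nil
}

func main() {
	for _, stmt := range []string{"x = x + 1", "inc x"} {
		vm, err := compileStatement(stmt)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-10s -> %s\n", stmt, strings.Join(vm, "; "))
	}
}
```

The generic path emits four VM instructions and the keyword path emits one, which is the saving the comment describes. Recognizing "x = x + 1" automatically would need the compiler to look at the whole statement before emitting anything.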
        
       | lxe wrote:
       | > The hash method considers the token's first and second
       | characters and the length.
       | 
       | Quite a kludgey optimization for the token hash
        
         | Leszek wrote:
         | V8 developer here (who happened to also implement perfect
         | hashing in V8's tokenizer) - perfect hashing is a very common
         | compiler optimisation, and as the sibling comment says, is
         | worth it for the runtime scanner speed improvement. If you do
         | end up adding new keywords (which is ~never for anything that
         | wants to preserve backwards compat), then you just recalculate
         | your perfect hash with gperf or by hand or however.
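
To make the quoted description concrete, here is a small Go sketch of a keyword table keyed on the first two characters and the length. The shift of 4 and the 64-slot table are assumptions picked for this sketch and are not quoted from the compiler source; `buildTable` and `isKeyword` are hypothetical helper names.

```go
package main

import "fmt"

// keywords lists Go's 25 reserved words.
var keywords = []string{
	"break", "case", "chan", "const", "continue", "default", "defer",
	"else", "fallthrough", "for", "func", "go", "goto", "if", "import",
	"interface", "map", "package", "range", "return", "select", "struct",
	"switch", "type", "var",
}

// tableSize must be a power of two so that "& (tableSize - 1)" acts as a modulus.
const tableSize = 1 << 6

// hash mixes the first two bytes and the length of s.
// It assumes len(s) >= 2, which holds for every Go keyword.
func hash(s string) uint {
	return ((uint(s[0])<<4 ^ uint(s[1])) + uint(len(s))) & (tableSize - 1)
}

// buildTable places each keyword in its slot and reports an error on the
// first collision, i.e. when the hash is not perfect for this keyword set.
func buildTable() ([tableSize]string, error) {
	var table [tableSize]string
	for _, kw := range keywords {
		h := hash(kw)
		if table[h] != "" {
			return table, fmt.Errorf("imperfect hash: %q collides with %q", kw, table[h])
		}
		table[h] = kw
	}
	return table, nil
}

// isKeyword needs one hash and one string comparison per identifier.
func isKeyword(table *[tableSize]string, s string) bool {
	return len(s) >= 2 && table[hash(s)] == s
}

func main() {
	table, err := buildTable()
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, word := range []string{"func", "until", "x"} {
		fmt.Printf("%-6s keyword=%v\n", word, isKeyword(&table, word))
	}
}
```

Adding a word to `keywords` can make `buildTable` fail, which is the point at which the constants would have to be recalculated.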
        
         | tptacek wrote:
         | The language isn't extensible, so these are changes that happen
         | very rarely; the compiler is simply optimized for the actual
         | task it has. What would be weird would be expending any real
         | effort --- or, worse, runtime cost --- for an engineering case
         | we know is never going to happen.
        
           | benhoyt wrote:
           | Yes, and specifically in this case no keywords have been
           | added to the Go language at all since its 1.0 release (I just
           | checked the 1.0.1 spec and there are still 25 keywords). Even
           | the addition of generics coming soon in 1.18 will add no new
           | keywords (though it will add a new token, "~").
        
       | VWWHFSfQ wrote:
       | I find it disconcerting that the go compiler is such a mess that
       | it took this much effort just to alias a new keyword. I know the
       | Ruby internals are famously very nasty but I'm surprised go is
       | this bad.
        
         | throwaway894345 wrote:
         | What's a language that allows for easily adding new keywords?
         | What tradeoffs were involved in facilitating that property? Is
         | this property compatible with Go's goals?
        
           | nikanj wrote:
           | Many languages are easy and fast to extend. Namely toy
           | languages that are not really going anywhere. A dead-simple,
           | easy-to-extend compiler is a few thousand LOC - and produces
           | completely crap code for all platforms.
           | 
           | If you want your language to actually have some real-world
           | usage, you need real-world performance numbers. Which tends
           | to lead to compiler codebases a few orders of magnitude
           | larger, and much gnarlier to extend.
        
             | adgjlsfhk1 wrote:
             | Also, good languages don't have many keywords, so they
             | aren't optimized for adding them.
        
           | wk_end wrote:
           | In this context (I see you Lispers) the difficulty of adding
           | a new keyword to the compiler is less a property of the
           | language than a property of how the compiler's implemented.
           | 
           | Specifically, the weird stuff the author encountered like:
           | 
           | * Generating the token list by parsing the comments of a
           | source file
           | 
           | * only parsing up to a hard-coded token instead of all of the
           | known tokens (?!)
           | 
           | * using a hacky token hashing mechanism that only looks at
           | the first two characters of the token
           | 
           | have nothing to do with Go-the-language.
        
             | preseinger wrote:
             | A language and its principal compiler are I think not so
             | decoupled as you're implying.
        
         | philosopher1234 wrote:
         | Why does this mean it's a mess? Why should the codebase be
         | optimized for adding new keywords, when that happens maybe once
         | a decade? Your comment seems overly negative.
        
           | benhoyt wrote:
           | Indeed. In fact, even with the release of generics in Go
           | 1.18, which is coming out in early 2022 exactly a decade
           | after Go 1.0, there will be no new keywords. So it won't even
           | have happened once in a decade. :-)
        
       | johnisgood wrote:
       | > Other than Eli's post, there are no documentation or articles
       | on Go compiler internals. How does someone get started working on
       | them? How do they navigate and find all these intricacies without
       | spending hours? Maybe Google has some internal documentation on
       | the Go compiler.
       | 
       | It would be nice to have more information out there on the
       | internals of the Go compiler. Perhaps there is.
       | 
       | I found stuff like:
       | 
       | - https://github.com/emluque/golang-internals-resources
       | 
       | - https://www.altoros.com/blog/golang-internals-part-1-main-co...
       | 
       | - https://github.com/teh-cmc/go-internals
       | 
       | But yeah, Eli's articles[1] are pretty good.
       | 
       | [1] https://eli.thegreenplace.net/2019/go-compiler-internals-add...
        
         | avinassh wrote:
         | Thank you for these links! I think I have seen the first two at
         | some point, but they weren't helpful. The `go-internals` looks
         | great, and I will check them out.
         | 
         | I am also curious about the daily development cycle by a
         | regular Go contributor. How do they make changes, how do they
         | do quick tests before running the whole test suite, etc.?
        
         | melony wrote:
         | The Go compiler is about as straightforward as it can get. Just
         | read the source:
         | 
         | https://github.com/golang/go
        
           | avinassh wrote:
           | I don't think I would have figured out how one adds a new
           | token if not for Eli's post. This comment [0] perfectly
           | explains the quirks I ran into.
           | 
           | As an exercise, can you help me figure out how to add a token
           | just from the source and discover these quirks?
           | 
           | On second thought, reading from the source and figuring it
           | out could have been possible if you spent hours. But don't
           | you think it should also have some comments to navigate?
           | 
           | [0] - https://news.ycombinator.com/item?id=29489113
        
             | londons_explore wrote:
             | Google has an _amazing_ code search tool. You can try it
             | out here[1]. That generally makes browsing source code much
             | quicker and easier, which in turn makes understanding the
             | structure of huge codebases much easier.
             | 
             | With that tool, I prefer to just dive into the source in
             | most cases rather than read documentation, especially when
             | there is a good chance the documentation is wrong/outdated.
             | 
             | [1]: https://source.chromium.org
        
               | preseinger wrote:
               | Code explains what and how, but not why. Why is necessary
               | for building a robust mental model of any system, and can
               | only be provided by documentation (or other humans).
        
               | londons_explore wrote:
               | Google's source code files frequently have 50+ lines of
               | comments at the top of the file to explain the why...
        
               | johnisgood wrote:
               | > https://source.chromium.org
               | 
               | Wow, it is pretty cool! Is there any open source software
               | that is similar to this? It reminds me of
               | https://elixir.bootlin.com/. I really want something like
               | these two.
               | 
               | Currently checking out https://github.com/bootlin/elixir.
        
         | ferdowsi wrote:
         | I clicked into the Go Github repo and found documentation
         | pretty easily. The compiler code itself is well documented, and
         | Go's code navigation tooling itself helps learning.
         | 
         | https://github.com/golang/go/tree/master/src/cmd/compile
        
           | avinassh wrote:
           | I did run into this. The page linked to is a high level
           | documentation with very few details which are specific to the
           | codebase.
           | 
           | Take the example of adding a new token. You have to run go
           | generate to generate token strings. But nowhere in the docs
           | or in the code is it mentioned what exactly 'stringer' is or
           | how to install it.
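
For context on the "stringer" step: stringer is a code generator from golang.org/x/tools, installable with `go install golang.org/x/tools/cmd/stringer@latest`, that writes a `String` method for an integer constant type. The Go sketch below uses a hypothetical `Token` type with made-up constants. Its `String` method is a hand-written stand-in for generated output, and the directive appears only inside a comment. How the compiler's own token file invokes the tool is not shown in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// Token is a hypothetical token type for this sketch.
//
// In a real package, a directive along the lines of
//
//	go:generate stringer -type=Token -linecomment
//
// (written as a "//go:generate" comment) would make "go generate" write the
// String method to a separate file. With -linecomment, the text of each
// constant's trailing comment becomes its printed name.
type Token int

const (
	Break Token = iota // break
	Until              // until
	EOF                // EOF
)

// tokenNames mirrors the line comments above.
var tokenNames = [...]string{"break", "until", "EOF"}

// String is a hand-written stand-in for what a generator would produce.
func (t Token) String() string {
	if t < 0 || int(t) >= len(tokenNames) {
		return "Token(" + strconv.Itoa(int(t)) + ")"
	}
	return tokenNames[t]
}

func main() {
	fmt.Println(Break, Until, EOF, Token(99))
}
```

If the generator were actually run on this file, the hand-written `String` method would have to be removed first to avoid a duplicate definition.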
        
         | _wldu wrote:
         | This is a great demo. The old C compiler backdoor, but in Go:
         | 
         | https://github.com/yeokm1/reflections-on-trusting-trust-go
         | 
         | The GopherCon Singapore (2018) video is a really great summary
         | (20 mins). He modifies the compiler and inserts the backdoor
         | during the presentation:
         | 
         | https://www.youtube.com/watch?v=T82JttlJf60&list=PLq2Nv-Sh8E...
        
       | rodmena wrote:
       | > How do they navigate and find all these intricacies without
       | spending hours?
       | 
       | Technical documentation of internals is not important for
       | corporate built languages -- which is a shame.
        
       | fefe23 wrote:
       | Why do we link to some dude applying a HOWTO instead of the
       | HOWTO?
        
       | quotemstr wrote:
       | If you do perfect hashing, you should make it infallible ---
       | retry map creation with tweaked hash functions until it works.
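
One way to read the retry suggestion, sketched in Go: search a small family of hash functions until one is collision-free for the key set. The family here (a shift applied to the first byte, plus a power-of-two table size) and the names `params`, `collisionFree` and `findPerfect` are assumptions for illustration. This family cannot separate two keys that share their first two bytes and length, so the search is bounded and can still report failure.

```go
package main

import "fmt"

// keywords lists Go's 25 reserved words; every one has at least two bytes.
var keywords = []string{
	"break", "case", "chan", "const", "continue", "default", "defer",
	"else", "fallthrough", "for", "func", "go", "goto", "if", "import",
	"interface", "map", "package", "range", "return", "select", "struct",
	"switch", "type", "var",
}

// params holds the tweakable knobs of the hash family.
type params struct {
	shift uint // left shift applied to the first byte
	size  uint // table size, always a power of two
}

// hash assumes len(s) >= 2.
func (p params) hash(s string) uint {
	return ((uint(s[0])<<p.shift ^ uint(s[1])) + uint(len(s))) & (p.size - 1)
}

// collisionFree reports whether p maps every key to its own slot.
func collisionFree(p params, keys []string) bool {
	seen := make([]bool, p.size)
	for _, k := range keys {
		h := p.hash(k)
		if seen[h] {
			return false
		}
		seen[h] = true
	}
	return true
}

// findPerfect tries shifts 0..7 at each table size, doubling the size
// after every failed round until maxSize is exceeded.
func findPerfect(keys []string, maxSize uint) (params, bool) {
	size := uint(1)
	for size < uint(len(keys)) {
		size <<= 1
	}
	for ; size <= maxSize; size <<= 1 {
		for shift := uint(0); shift < 8; shift++ {
			p := params{shift: shift, size: size}
			if collisionFree(p, keys) {
				return p, true
			}
		}
	}
	return params{}, false
}

func main() {
	p, ok := findPerfect(keywords, 1024)
	fmt.Printf("found=%v shift=%d size=%d\n", ok, p.shift, p.size)
}
```

Running the search at build time means a newly added keyword triggers a fresh search, at the cost of a larger table if only a bigger size works.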
        
       | didip wrote:
       | I wonder if the author has heard of https://github.com/goplus/gop
       | 
       | He'd have fun reverse engineering it.
        
         | avinassh wrote:
         | I have seen it on HN, but I hadn't looked closely. Thanks
         | for linking again!
        
       ___________________________________________________________________
       (page generated 2021-12-08 23:00 UTC)