[HN Gopher] The Road to the JIT
___________________________________________________________________

The Road to the JIT

Author : lelf
Score  : 154 points
Date   : 2020-12-01 14:09 UTC (8 hours ago)

web link (blog.erlang.org)

  | paulgdp wrote:
  | For people interested in this subject, this recent HN post
  | might also be relevant:
  |
  | https://news.ycombinator.com/item?id=25253070
  |
  | My interpretation is that the end goal seems similar, but this
  | project starts with the Rust language instead.
  |
  | tiffanyh wrote:
  | > "Dialyzer was only about 10 percent slower with the JIT than
  | > with HiPE"
  |
  | This is the first time I've seen AsmJIT compared to HiPE.
  | Interesting.
  |
  | jolux wrote:
  | Is there currently a plan to make the JIT the default
  | implementation?
  |
  | di4na wrote:
  | Yes, it will be the default on all supported platforms as of
  | OTP 24.
  |
  | didibus wrote:
  | Oh, so is Erlang BEAM VM code currently only running
  | interpreted?
  |
  | cpeterso wrote:
  | The article says: "The modern BEAM only has the interpreter."
  |
  | dnautics wrote:
  | It's interpreted bytecode. The bytecode is a highly optimized,
  | constrained language that looks a whole lot like machine
  | language. It would not be unreasonable to build hardware that
  | runs it, if you wanted to. I guess it would not be unreasonable
  | to say it's interpreted in the same way that running
  | ARM-compiled code on qemu on an x86 is interpreted.
  |
  | didibus wrote:
  | OK, I see. Follow-up question: isn't that the same as Python
  | then? Or at least Python compiled to .pyc?
  |
  | And would implementing the BEAM bytecode on the Graal polyglot
  | layer be a good idea then, allowing it to leverage the JVM JIT?
  |
  | dasyatidprime wrote:
  | QEMU in fact does dynamic translation, which is more like how
  | JIT works in bytecode languages; this is why it's much faster
  | than some earlier machine emulator software. (I _think_ e.g.
  | Bochs didn't have dynamic translation back when QEMU was
  | coming into play, but the Web suggests that there's at least
  | discussion around that feature, so maybe that's not true
  | anymore.)
  |
  | The BEAM is currently a bytecode interpreter. The article
  | describes it as using threaded code, which, if you look at the
  | C source, is true in a sense on platforms where the C compiler
  | provides the GNU-style extension of taking the address of
  | labels for dynamic `goto`
  | (https://stenmans.org/happi_blog/?p=194 agrees with my skim of
  | `beam_emu.c`) -- though if that's the only use, as it seems, I
  | might find that an edge case of the term, since I associate
  | "threaded code" with the style used in Forth, where
  | "user-level" subroutines can be pointed to directly. If
  | computed `goto` isn't available, then it uses a conventional
  | switch/case loop, repeatedly dispatching on the result of
  | fetching the next bytecode instruction pointer value.
  |
  | dnautics wrote:
  | Ah sorry, indeed. Bochs is a better analogy.
  |
  | Though that brings up a question... I wonder if you could
  | write a QEMU backend for the BEAM, and start working towards
  | dynamic translation that way?
  |
  | ngrilly wrote:
  | Erlang (the language, the runtime, and the OTP framework) is a
  | fascinating alternative in the design space for programming
  | languages. Always happy to read about it, even if I don't use
  | it for my projects.
  |
  | brightball wrote:
  | Agreed. After reading enough about it, I've become convinced
  | that there are only two different platforms in the world:
  | OTP/BEAM and everything else.
  |
  | Every other language that I've studied seems to just cope with
  | the same problems in slightly different ways.
  |
  | macintux wrote:
  | Erlang/OTP/BEAM are so unusual within the computing world:
  | highly opinionated components that, thanks to their
  | constraints, are keenly complementary and optimized for each
  | other.
  |
  | No kitchen-sink language here: play by their rules or choose a
  | different environment.
  | joisig wrote:
  | I listened to a very interesting podcast episode [1] on the
  | Thinking Elixir Podcast on this subject recently, an interview
  | with John Hogberg and Lukas Larsson. Not covered in this blog
  | post but discussed in some detail in the podcast is that,
  | because you get standard debugging symbols from the JIT, you
  | will now be able to use gdb and prof and similar tools when
  | working with the BEAM.
  |
  | [1]
  | https://podcasts.google.com/?feed=aHR0cHM6Ly90aGlua2luZ2VsaX...
  |
  | kasperni wrote:
  | Seeing how many billions have been poured into JIT runtimes
  | such as CLR/JVM/V8, I'm questioning whether it is realistic to
  | create a production-ready JIT as a side project.
  |
  | bjoli wrote:
  | A tracing JIT is no small feat. You could go simpler and have
  | a template JIT, which has a lot less complexity. Guile went
  | with that, and the speedup from the already quite fast 2.2 to
  | 3.0 was significant.
  |
  | Andy has been working at Igalia on the JS engines of Chrome
  | and Firefox, which makes me believe it might not be easily
  | reached by mere mortals, but looking at the source, it is
  | quite easy to follow, even though I would not trust myself to
  | make any significant changes.
  |
  | artemonster wrote:
  | LuaJIT?
  |
  | [deleted]
  |
  | MaxBarraclough wrote:
  | You beat me to it.
  |
  | If you're aiming to compete with HotSpot and .NET then you'll
  | need to invest millions, but not all JITs are this ambitious.
  | GNU Lightning is another example of a JIT with few people
  | behind it.
  |
  | hinkley wrote:
  | The BEAM is mostly concerned with correctness and horizontal
  | scalability. I think a little vertical scalability from a JIT
  | raises the throughput of the system without really changing
  | how it works or what you use it for. If anything, maybe you
  | write less stuff in native code.
  |
  | A great deal of research has been published in the last 25
  | years, and some of it invalidates earlier wisdom due to
  | changes in processor design.
  | Just following this trail and applying the 80/20 rule could
  | get a lot done for a little effort. And a simple JIT has half
  | a prayer of being correct.
  |
  | brazzy wrote:
  | What's "production ready"?
  |
  | Works, is stable, and is faster than an interpreter? Sure,
  | that is achievable for a motivated and skilled developer
  | working on their own. At least for a reasonably simple
  | language, maybe not for a beast like C++.
  |
  | Competitive with one of those bigcorp-funded ones? Nope.
  |
  | toast0 wrote:
  | It's important to note that the goals of this BEAM JIT are a
  | lot more modest than those of other JITs.
  |
  | There's no goal of heroic optimization; the optimization goal
  | is really just to remove the overhead of interpretation.
  |
  | This removes the need to apply the JIT only to some code:
  | because it's fairly simple, it's fast enough to apply when the
  | code is loaded, and so all code is JITed to native as it's
  | loaded. Or you're on an unsupported platform and all code is
  | interpreted.
  |
  | Because it's all or nothing, testing the OTP release should
  | uncover any JIT bugs; you won't have the hard-to-track bugs
  | sometimes seen in other systems, where a function's
  | correctness depends on whether or not it was JITed, and that
  | depends on runtime state. That won't mean no JIT bugs, of
  | course, but they should be easier to track down.
  |
  | whizzter wrote:
  | It's really a situation of the 90/10 rule: you'll get the
  | first 90% of the benefits from 10% of the work, and then it's
  | many, many incremental changes that get progressively more
  | expensive.
  |
  | Also, in the case of Erlang, the language model will reward
  | these first steps even more than most subsequent
  | optimizations, because much Erlang code in general doesn't
  | have many tight loops of the kind that are prominent in
  | benchmarks, where a good register allocator would provide huge
  | wins.
  | More on the 90/10 rule: we need to remember that these
  | expensive JITs are very complicated, with optimization levels
  | and interpreters combined with tons of GC options, whereas
  | here they explicitly just dropped JIT-interpreter
  | cross-calling to simplify the design, along with a more
  | straightforward internal memory model with fewer complicated
  | edge cases.
  |
  | fiddlerwoaroof wrote:
  | It sounds to me like "JIT" here just means "not a batch
  | compiler"? This seems to me to be more like the way Common
  | Lisp compilers work than what I think of as JITs.
  |
  | fiddlerwoaroof wrote:
  | I guess a difference is that Lisp implementations generally
  | cache the generated code between runs.
  |
  | toast0 wrote:
  | I'm not super familiar with Lisps, but what the JIT is doing
  | here is transforming bytecode into native code, which is what
  | other JITs do.
  |
  | What it's not doing, but is commonly done in other JITs, is
  | any sort of runtime profiling and choosing which modules to
  | transform; all modules are transformed when loaded.
  |
  | As described in the article, previous JIT attempts with BEAM
  | did have that functionality, and they did not meet the project
  | goals: profiling cost too much, the compilation step was too
  | expensive, and mixing modes between interpreted modules and
  | native modules added too much complexity.
  |
  | I haven't looked at the code behind this, but from the
  | articles, I haven't seen anything that would preclude running
  | the bytecode-to-native-code transformation ahead of time (or
  | caching the just-in-time transformation for future use); it's
  | just not part of the implementation as of now.
  |
  | azhenley wrote:
  | Putting multiple Erlang functions into a single C function to
  | avoid using C's stack sounds awesome but also horrifying.
  |
  | chrisseaton wrote:
  | > Putting multiple Erlang functions into a single C function
  | > to avoid using C's stack sounds awesome but also horrifying.
  |
  | Why's that horrifying?
  | Isn't inlining a super-basic and well-understood optimization
  | that any serious compiler would be doing?
  |
  | azhenley wrote:
  | No, this isn't inlining. They're managing their own call stack
  | and relying on undefined behavior. See the quote from the
  | article:
  |
  | > BEAM/C generated a single C function for each Erlang module.
  | > Local calls within the module were made by explicitly
  | > pushing the return address to the Erlang stack followed by a
  | > goto to the label of the called function. (Strictly
  | > speaking, the calling function stores the return address to
  | > a BEAM register and the called function pushes that register
  | > to the stack.)
  |
  | > Calls to other modules were done similarly by using the GCC
  | > extension that makes it possible to take the address of a
  | > label and later jump to it. Thus an external call was made
  | > by pushing the return address to the stack followed by a
  | > goto to the address of a label in another C function.
  |
  | > Isn't that undefined behavior?
  |
  | > Yes, it is undefined behavior even in GCC.
  |
  | jpcooper wrote:
  | Why is it undefined behaviour?
  |
  | IainIreland wrote:
  | You aren't allowed to goto a label in a different function.
  |
  | From the C standard:
  |
  | > The identifier in a goto statement shall name a label
  | > located somewhere in the enclosing function.
  |
  | `Shall` is a term of art here:
  |
  | > If a "shall" or "shall not" requirement that appears outside
  | > of a constraint is violated, the behavior is undefined.
  |
  | (Sections 6.8.6.1.1 and 4.2, respectively:
  | http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
  |
  | jpcooper wrote:
  | Appreciated.
  |
  | chrisseaton wrote:
  | Ah sorry, I see.
  |
  | shiny wrote:
  | To anyone just reading the comments: this refers to the old
  | BEAM/C compiler, not the new JIT.
___________________________________________________________________
(page generated 2020-12-01 23:01 UTC)