[HN Gopher] A possible new back end for Rust ___________________________________________________________________ A possible new back end for Rust Author : obl Score : 431 points Date : 2020-04-21 13:39 UTC (9 hours ago) (HTM) web link (jason-williams.co.uk) (TXT) w3m dump (jason-williams.co.uk) | amelius wrote: | Does it support JIT compilation, i.e. specialization at runtime? | steveklabnik wrote: | Cranelift has a JIT, but I am not sure what the status of it is | as a rustc backend. | dwheeler wrote: | One cool advantage of having multiple compilers for a language is | that you can use one as a check on the other. | | For example, if you're worried that one of the compilers might be | malicious, you can use the other compiler to check on it: | https://dwheeler.com/trusting-trust | | Even if you're not worried about malicious compilers, you can | take generated code, compile it with multiple compilers, feed | the results the same inputs, and see when the outputs differ. | This has been used as a fuzzing technique to detect subtle | errors in compilers. | gbrown_ wrote: | > For example, if you're worried that one of the compilers | might be malicious, you can use the other compiler to check on | it: https://dwheeler.com/trusting-trust | | This still requires the use of a trusted compiler | though. Comparing two compilers arbitrarily shows if there is | _consensus_ , it does not give guarantees about _correctness_. | | From the link: In the DDC technique, source | code is compiled twice: once with a second (trusted) | compiler (using the source code of the compiler's parent), and | then the compiler source code is compiled using the | result of the first compilation. If the result is bit- | for-bit identical with the untrusted executable, then | the source code accurately represents the executable. | pitaj wrote: | Please don't quote with code blocks. Makes reading on mobile | very difficult. 
| | The quote reformatted: | | > In the DDC technique, source code is compiled twice: once | with a second (trusted) compiler (using the source code of | the compiler's parent), and then the compiler source code is | compiled using the result of the first compilation. If the | result is bit-for-bit identical with the untrusted | executable, then the source code accurately represents the | executable. | dwheeler wrote: | First, I forgot to disclose: I am the author of | https://dwheeler.com/trusting-trust . | | As discussed in detail in that dissertation, if you are using | diverse double compiling to look for malicious compilers, the | trusted compiler does not have to be perfect or even | non-malicious. The trusted compiler could be malicious itself. | The only thing you're trusting is that the trusted compiler | does not have the same triggers or payloads as the compiler | it is testing. The diverse double compiling check merely | determines whether or not the source code matches the | executable given certain assumptions. The compiler could | still be malicious, but at that point the maliciousness would | be revealed in its source code, which makes the revelation of | any malicious code much, much easier. | | You're absolutely right about the general case merely showing | consistency, not correctness. I completely agree. But that | still is useful. If two compilers agree on something, there | is a decent chance that their behavior is correct. If two | compilers disagree on something, perhaps that is an area | where the spec allows disagreement, but if that is not the | case then at least one of the compilers is wrong. The check | by itself won't tell you which one is wrong, but at least it | will tell you where to look. In a lot of compiler bugs, | having some sample code that causes the problem is the key | first step. | et2o wrote: | Sounds fascinating. Are there real-world examples of | malicious compilers? 
| dwheeler wrote: | Yes, there was a malicious compiler system for Apple iOS | that was released in China a few years back and subverted | a large number of mobile applications, including apps | used in the US and Europe. There was also a subverted | Delphi compiler a number of years back, though I don't | think the subversion was dangerous; it was more like a | test case. And of course, Ken Thompson demonstrated the | attack in the 1980s. There may be others, but I remember | those offhand. | philsnow wrote: | IIRC this was feasible because people in China are behind | the GFW which throttles/blocks the mac app store, so most | people download from in-country caches, which circumvents | a lot / all of the app signing that Apple uses. | 411111111111111 wrote: | I read a story about a compiler adding malware to the | compiled binary once. | | They kept getting owned until they supposedly found a | pretty dumb hack which just appended the backdoor to the | final compilation on the build server... | | No clue if it was just a story though, as I personally | haven't experienced anything like that before. | gryfft wrote: | I don't think this is what you're looking for, but Coding | Machines[1] is a great little story in which the Ken | Thompson hack[2] plays a role. | | [1] https://www.teamten.com/lawrence/writings/coding-machines/ | | [2] https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html | dwheeler wrote: | Yes, that's right, that's another story about a subverted | compiler. I don't have any way to verify it, but I have | no reason to doubt the story. It is quite possible, and | not even that difficult to do if you want to be that | malicious. I don't have a URL for it, maybe someone else | can provide that. | gbrown_ wrote: | Ha, I didn't even notice the username! I agree consensus | (or lack thereof) is a useful property to demonstrate. I | think I may have been a bit of a pedant in my prior | comment. | kohtatsu wrote: | Neat. 
Reduces attacks to conspiracies. | steveklabnik wrote: | Yep! This is a very good property, and part of why mrustc is a | big deal. | jlebar wrote: | If there are any rust people here, you've probably considered | that you can speed up your debug llvm builds by enabling some | optimizations. SimplifyCFG comes to mind, but, like, you can | experiment. I presume the reason you haven't is that you want | to preserve debug info, and llvm isn't great at that when | optimizations are on. | the8472 wrote: | You can customize the debug profile or create an intermediate | profile between release and debug in your Cargo.toml. Debug | info and optimization levels can be configured separately. | | If by speed up you mean compile times and not runtime behavior | then there's also some unstable compiler flag that allows | adding specific llvm passes. | Koshkin wrote: | I've been wondering lately whether modern compilers should all | use C as the intermediate language (or whether some | language-specific optimization opportunities would be lost if | they did). | edwintorok wrote: | The semantics of C aren't very well defined; there is a lot of | ambiguity in the form of undefined and implementation-defined | behaviour. This ambiguity is often needed to build an efficient | optimizing compiler. | | When you have a higher-level language with more accurately | defined semantics, running it all through C would risk | introducing undefined behaviour. | | With an IR you can control and define the semantics more | closely to what your language needs. | jfkebwjsbx wrote: | > When you have a higher level language with more accurately | defined semantics, running it all through C would risk | introducing undefined behaviour. | | No, it wouldn't. When you target C you need to write a proper | backend for its abstract machine, rather than naively | rewriting code, of course. | | The C abstract machine is a fine IR, especially the later | editions of the standard. 
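To make the dispute above concrete, here is a small Rust sketch (my own illustration, not from the thread) of operations a C backend would have to emulate with helper functions rather than translate naively — each is fully defined in Rust, while the obvious one-to-one C counterpart is undefined, implementation-defined, or simply missing:

```rust
// Every operation below has defined behaviour in Rust; the naive
// C translation does not.
fn main() {
    // Signed overflow is UB in C; Rust offers explicit wrapping
    // and checked variants.
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);
    assert_eq!(i32::MAX.checked_add(1), None);

    // Right-shifting a negative value is implementation-defined in C;
    // Rust guarantees an arithmetic shift for signed integers.
    assert_eq!((-8i32) >> 1, -4);

    // Count-trailing-zeros and byteswap need compiler builtins in C;
    // in Rust they are ordinary methods.
    assert_eq!(8u32.trailing_zeros(), 3);
    assert_eq!(0x1234u16.swap_bytes(), 0x3412);

    println!("all operations fully defined");
}
```

A C backend for a language with these guarantees must either emit helper functions with the right semantics or lean on non-portable compiler builtins, which is exactly the trade-off debated in this subthread.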
| ansible wrote: | I've got to wonder if any of the existing intermediate | representations would be appropriate for other programming | languages. | steveklabnik wrote: | This is true to varying degrees; you could say that LLVM-IR | and Java bytecode are two examples of this in action. | sambe wrote: | Doesn't Nim do that? | nimmer wrote: | Yes, Nim uses C and GCC, and this gave it very fast compile | times and similar performance. It also runs on most devices | supported by GCC. | mhh__ wrote: | Just because you shouldn't doesn't mean you can't. | | (C itself is not specified very thoroughly but _C_ - a C | implementation - is, in the sense that it only does one thing | for a given line of code) | Ididntdothis wrote: | That's how C++ started out, but as far as I know this had lots | of limitations in terms of optimization, so they started writing | native C++ compilers. | mratsim wrote: | I would be surprised if optimization was the actual goal. C++ | is not faster than C. | pjmlp wrote: | Not only does C++ provide features that straight C | optimizers won't be able to match, like templates and | constexpr; C++ shares a common subset with C, and the libc of | all major C compilers is actually written in C++ with extern | "C" entry points nowadays. | | This "C is faster than C++" is a bit dated by now. | Ididntdothis wrote: | It's not faster but it's also not much slower. I think some | features of C++ would be hard to optimize by a compiler | that doesn't understand them, so a C++-to-C compiler may | produce slow or bloated code. | stephencanon wrote: | If you're trying to implement a language with substantially | different semantics from C (e.g. a substantially different | memory model, or without UB), the semantics of C make it really | unsuitable as an IR. 
| | You can't use C's casts (undef for out-of-range float -> int | conversions, for example), arithmetic (undef for signed | overflow), or shift operators (implementation-defined behavior | for signed right shifts, undefined behavior for left shifts | into the sign bit or shift counts not in [0, n)). You can work | around these by defining functions with the semantics that your | language needs, but they get gross pretty quickly (they are | both much more verbose and more error-prone than having an IR | with the semantics you really want, and they require optimizer | heroics to reassemble them into the instructions you really | want to generate). Alternatively, you can use intrinsics or | compiler builtins, but then you're effectively locking yourself | to a single backend anyway, and might as well use its IR. | | The issues around memory models (especially aliasing, but also | support for unaligned access, dynamic layouts, etc.) are worse. | | Even LLVM IR is too tightly coupled to the semantics of C and | C++ to be easily usable as a really generic IR for arbitrary | languages (Rust, Swift, and the new Fortran front end have all | had some struggles with this, and they're more C-like than most | languages). C is much worse in this regard. | raphlinus wrote: | I agree 99.44%. | | The behavior of shift operations on signed integers will be | fixed in C++20 and C2x, as part of the effort to require two's | complement representation. It is a massive potential source | of UB in currently standardized C and C++. | | All the other problems listed remain. | | [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090... 
| | [2]: http://eel.is/c++draft/expr.shift#2 | stephencanon wrote: | Even after C2x is finalized, people will be using C | compilers that don't conform to C++20 and C2x for at least | another decade, so you'll forgive me if I don't hold my | breath =) | jcranmer wrote: | C is a pretty lousy intermediate language: | | * It's missing several useful operators, such as classic bit | manipulation (count trailing zeros, byteswap), or even 8- and | 16-bit arithmetic. Checked arithmetic is another useful one | that's not present (or even really possible in C's ABI). | | * Signed integer overflow is UB. | | * Utterly no support for SIMD types. | | * Proper IEEE 754 floating-point control is kind of spotty, | although it tends to be as bad or worse in most other | languages. | | * ABI control is poor. You can't come up with any way to return | multiple register values, for example. | | * Anything that's not a vanilla function isn't supported. No | uparg function support (required for Pascal), multiple entry | points (required for Fortran), or zero-cost exception handling | (required for C++). Hell, even computed goto isn't actually | supported. | | And all of this is assuming you have strong control over how | you expect implementation-defined behavior (e.g., sizeof(int)) | to work. | StreamBright wrote: | Just out of curiosity, what would be a great intermediate | language to transpile to (as an intermediate language)? | mhh__ wrote: | Firstly the programming language itself, then something | like LLVM IR. This is an answer to a slightly different | question, but rewriting into the same language (i.e. C++ to | C++, then LLVM) can make debugging much simpler, and | implementing features and specific optimizations much more | feasible if you don't have control over the backend. | | IRs should be terse, simple and dumb. I'm not sure any | "real" programming language fits that. | ufo wrote: | In practice I think it is hard to beat C in this regard. 
| You can go pretty far if you are willing to adapt to its | quirks! And while C doesn't always allow for the best | optimization (such as returning multiple values via | registers), the workarounds often are still pretty fast. | | On a more theoretical level there has been some research on | what a better intermediate language would look like. One | project I found interesting was Mu VM, which offers some | niceties for compiling languages with a garbage collector. | | https://microvm.github.io/ | jcranmer wrote: | Of the ones I'm familiar with, LLVM IR is probably the best, | although it has other issues of its own (in particular, | floating point is done even worse than C). I'm not aware of | any language which is going to beat a retargetable | compiler's processor-agnostic IR. | | But even the "better C" languages tend to not really | attempt to expand C structurally. The changes amount to | fixing the egregious semantics (fixed-size types, no | int-promotion, defined signed overflow, etc.), adding vector | types and other operators, maybe tweaking the ABI a little | bit, and adding a whole lot of syntactic sugar. And those | languages that explore beyond C's limited structural | repertoire do so at the cost of C's specificity. | | That said, ever since the last time someone asked me this | kind of question, I've been trying to design a portable | assembly language. | StreamBright wrote: | Interesting. I need to look into LLVM IR a bit more to | understand this subject better. | | >> I've been trying to design a portable assembly | language. | | Couldn't something like Forth fulfill this role? | aw1621107 wrote: | > in particular, floating point is done even worse than C | | Do you mind expanding on this or pointing me to places | where I can read more? | jcranmer wrote: | There is a hidden floating-point environment that | affects, and is affected by, every single floating-point | instruction. 
Predominantly, this is rounding mode | control, sticky bits, and exception control (does | overflow cause a SIGFPE?), although most processors have | some form of flushing denormals or treating them as 0s, | which isn't in IEEE 754. | | LLVM's floating point instructions assume that there is | no floating point environment [1]. And there's no real | facility to indicate that floating point instructions | might be affected. To remedy this, they've been working | on adding constrained floating point intrinsics. | | [1] More specifically, that the environment is set up to | the default rounding mode (round-nearest), all exceptions | are masked, and no one will ever care about sticky bits. | [deleted] | epage wrote: | While a C backend is great for compatibility, is it a | sufficient IL to express everything? For example, Rust makes | some extra aliasing guarantees that could offer greater | optimizations (currently not fully being used due to bugs in | the LLVM backend), and I'm unsure whether C or C extensions | can express them yet. | bronson wrote: | cfront demonstrated how this is a bad idea. And it was for C++, | about as C-friendly as you can get. | bregma wrote: | Modern compilers generally have a language-specific front end | that generates an intermediate representation of the program | logic, which is then transformed into an abstract | representation (such as a static single assignment tree) for | optimization. That is then transformed into an abstract machine | description language, which gets further transformed by the | back end into concrete machine instructions or assembly code. | | Outside of the language-specific front end, compilers generally | have no knowledge of the programming language itself. There is | no technical advantage to transforming Rust into C when it | comes to the middle and back ends, which form the bulk of the | compiler. | | There are no language-specific optimization opportunities. 
| There are, of course, restrictions on what you can do in some | languages that eliminate optimization opportunities, but you're | not suddenly going to be able to take advantage of those | opportunities by transforming your code into a language that | lacks the restrictions, because then you change the semantics | of your code. | jfkebwjsbx wrote: | > There is no technical advantage to transforming Rust into C | | There is a key one: the ability to use any C compiler out | there (including proprietary ones). This allows you to target | all platforms out there. | pjmlp wrote: | A dumb interpreter for the IR as a bootstrapping stage is a | better alternative. | | Plus very few platforms have support for only C and nothing | else, unless we are speaking about esoteric embedded CPUs. | steveklabnik wrote: | We have an interpreter for MIR. It isn't fast enough. | pjmlp wrote: | I clearly mentioned "as a bootstrapping stage", for nothing | else. | steveklabnik wrote: | I see, I think I misinterpreted you. Sorry! | steveklabnik wrote: | It's a sad thing that you've been downvoted for posting a | thought. Others have already said the drawbacks of this idea, | but there are also pros. | mratsim wrote: | Indeed. Time-to-market is an obvious one, and esoteric | platform support, and maybe also debuggability. | Myrmornis wrote: | Wouldn't there be surprises, or cognitive dissonance, from | using very different paths for debug versus release builds? | | On a small project, personally I use --release sometimes during | development because the compile time doesn't matter that much | and the resulting executable is much faster: if I don't use | --release I can get a misleading sense of UX during development. | steveklabnik wrote: | This already happens a bunch, even with the current setups. | It's very natural if you come from a compiled language, and not | if you don't. The first step of someone saying "hey why is Rust | slow?" is five people replying "did you use --release". 
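For reference, the profile tuning the8472 mentions upthread looks roughly like this in Cargo.toml — a sketch using standard Cargo profile keys, with illustrative values:

```toml
# Sketch: separate optimization level and debug info per profile.
[profile.dev]
opt-level = 1   # some optimization in debug builds, still quick to compile
debug = true    # keep full debug info

[profile.release]
debug = true    # optionally carry debug info into optimized builds too
```

This narrows the debug/release gap discussed here: dev builds get a little optimization without losing debug info, and release builds can keep symbols for profiling.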
| Leherenn wrote: | It's funny, because I do the exact opposite. | | As a developer I usually have a pretty powerful machine, and | I've found that debug mode is a good way to approximate slow | computers, and something that is unbearably slow in debug will | bother some users later on. | runevault wrote: | This is an interesting idea, but I guess my one question is | how much does the slowness of debug relate to HOW it will be | slow in release? Since release optimizations can do pretty | radical things to the assembly generated, it feels like it | wouldn't really be apples to apples. | crad wrote: | While it appears that cg_clif is faster to compile, does it | provide any performance benefit compared to cg_llvm? Are the | compiled binaries as fast as llvm-compiled binaries? If not, | is the use case for development purposes only? | __s wrote: | Correct, cranelift is meant for faster development build cycles | | https://github.com/bytecodealliance/wasmtime/blob/da02c913cc... | wscott wrote: | From the article, it is pretty clear that the resulting code is | not as optimized as the LLVM backend's. I didn't see any claims | of how much slower it would be, but clearly that will vary | greatly. Fast to compile is still really handy while | developing. | liquidify wrote: | >> "That's Bjorn3, he decided to experiment in this area whilst | on a summer vacation, and a year & half later single-handedly | (bar a couple of PRs) achieved a working Cranelift frontend." | | Is this guy human? This is amazing, and this guy should be | given an award. | andrewprock wrote: | The thing that struck me most about the article was this quote | from the Rust Survey (2019): | | "Compiling development builds at least as fast as Go would be | table stakes for us to consider Rust" | | Go was designed from the ground up to have super fast compile | times. In fact, there are some significant language issues | related to that design decision. 
| | Using one of the primary design goals that shaped Go's | language structure as "table stakes" is almost certainly going | to require a lot of effort, with some serious unintended | consequences. | | Improving compilation times sounds good. Aiming high is good. But | reaching "best of breed" performance is a major initiative. | pjmlp wrote: | If you mean generics, D, Delphi, Ada and plenty of other | languages prove you can have them and still be pretty fast. | andrewprock wrote: | I mean interface{} | | https://golang.org/doc/effective_go.html#interfaces_and_type. | .. | The_rationalist wrote: | If I remember correctly, Mozilla had layoffs a few months ago | and the developer(s) of Cranelift were among them. | | So is anybody currently paid to develop this backend? Without | human resources I fail to see how this would keep up with truly | supporting Rust. | | As an aside, while the goal of faster build times is an important | one, for completeness' sake I must note that the rustc | developers' aim of being backend-agnostic (an ideal) comes at | the cost of preventing rustc from adopting most llvm attributes, | and this fact works to C++'s advantage. | korpiq wrote: | This feels welcome to me. I tend to think a language needs | multiple independent implementations that only share the same | source language spec, in order to really separate a clear spec | from the quirks of any particular implementation. | | I find Rust (the spec, though also the implementation) quite safe | and practical (a balance). It deserves some independent | implementations to secure a long and stable future. | | On the other hand, I want to use it on non-ARM embedded | platforms, where current cross-compilation through C produces | unusably big binaries. I dream this might increase hope for that, | too, eventually. | thesuperbigfrog wrote: | >> I find Rust (the spec, though also the implementation) quite | safe and practical (a balance). 
It deserves some independent | implementations to secure a long and stable future. | | Where is the Rust spec? Unless something happened really | quickly that I was not aware of, there is only the | implementation. | steveklabnik wrote: | https://doc.rust-lang.org/stable/reference/ is the closest | thing we have. It is not yet complete. | thesuperbigfrog wrote: | Thank you! I look forward to the day when there is a spec, | but I was surprised to see it mentioned and was wondering | if I missed something big. | pizlonator wrote: | This is really great. The world needs more diverse compiler tech. | The llvm monoculture is constraining what kind of compiler | research folks do to just the things that are practical to do in | llvm. | | I particularly suspect that if something like Cranelift gets | evolved more, then it will eventually reach throughput parity with | llvm, likely without actually implementing all of the | optimizations that llvm has. It shouldn't be assumed that just | because llvm has an optimization, that optimization is | profitable anywhere but llvm, or at all. | | Final thought, someone should try this with B3. | https://webkit.org/docs/b3/ | swagonomixxx wrote: | Devil's advocate: more diverse compiler tech will mean a more | fragmented community and a larger probability of divergence | across implementations. | | People think the C compiler community is dominated by GCC and | Clang, and it is, but there are literally 1000s of | implementations out there in the wild. Most are necessary, | because we need code generated for some obscure processor | architecture that's completely proprietary, but you can create | that "backend" in LLVM itself - it's a new target architecture | instead of e.g. x86. | | The great thing about LLVM is that it's effectively the | quickest (and probably the best) way to generate machine code | without putting in too much effort, for a language. 
| Whether | that language be a research language or an existing industry | language (say, C), that kind of establishment is hugely | valuable. | | A great example of a good monoculture is the Go monoculture. | Sure, there's gccgo, but the proportion of people using that | vs. the reference implementation is minimal, and that reduced | fragmentation is actually a good thing for practitioners (which | most engineers are, not PL researchers). | dmix wrote: | What about the use case described in the blog post, where one | is used for fast debugging/dev builds but you use LLVM for | production releases? Basically: why not both? | | I'm guessing this could create some divergence in terms of | what is supported by the compiler, but I'm curious how much | that would matter in reality - for day-to-day serious project | development. I'm not familiar with language dev at the | compiler level, so I'm curious to hear if that's practical or | sane. | pizlonator wrote: | LLVM is absolutely not the least effort for generating | machine code. In many settings, it takes a fraction of the | effort of integrating llvm to create a template compiler that | goes straight to machine code. In many other cases, your best | bet is to have your compiler emit C and then feed that to a C | compiler of your choice. | | It's good to have divergence. Competition is good. Otherwise | people stop trying new things. | bluGill wrote: | Depends on your goals. Writing a front end, optimizer and | backend quickly gets to be more work. I can write a c++ | compiler in a few months. It won't be good, and to make it | good would be many many years of work. If I write an llvm | backend it might take a little longer (I doubt it), but I | automatically get all the optimizations llvm has plus a | good front end that doesn't have bugs in obscure corner | cases. (not claiming llvm is perfect but there will be fewer | bugs) | pizlonator wrote: | Still not as good as emitting C code in most cases? 
C | code gets optimized using either llvm or any other | optimizer, so it's a more portable compile target. | msla wrote: | When you emit C, you're limited by C, at least if you | want to emit C as opposed to inline assembly wrapped in | C. For example, it's harder to have a function return | more than one value in C than it is in most | architectures, you can't do things with processor flags | (on architectures which have them), you're at the mercy | of the C compiler's optimizer as to vectorization and | loop unrolling, you can't always preserve semantic | information in the source code even when a "reasonable" | compiler would be able to use it to improve the machine | code... | | LLVM was created to _replace_ emitting C, by providing | programmers a way to turn source code into a | representation that is lower-level than C without having | to write the whole optimization and assembly code | generation pipeline. | Gibbon1 wrote: | > it's harder to have a function return more than one | value in C than it is in most architectures | | Biggest issue is the cultural aversion to returning | structs and tagged unions. | a1369209993 wrote: | And it's not even _hard_ , just ugly. Which is much less | of a problem for a compiler IR. | ufo wrote: | I think we can all agree that LLVM IR is a more powerful | compilation target than C. However, what Pizlo was saying | is that generating C can be simpler than generating LLVM | IR. A bunch of printfs can get you very far. | pjmlp wrote: | LLVM, while a very successful project, isn't a new idea. | | IBM had several LLVM-like projects during the 70's, and | that is how their surviving IBM i and z/OS work anyway, | with language environments that AOT-compile at installation | time. | | Likewise there were projects like the Amsterdam Compiler Kit | among others during the early 80's. | MauranKilom wrote: | > I can write a c++ compiler in a few months. | | Not that it substantially detracts from your point, but | I strongly doubt this. 
Or did you mean a heavily | restricted subset of C++? A C++ front end alone is so | complex to build that these guys make a living off of | licensing their front end code: https://www.edg.com/ | | (Fun fact: Microsoft rebuilt IntelliSense for C++ on the | EDG front end. Yes, that Microsoft with the MSVC | compiler. See | https://devblogs.microsoft.com/cppblog/rebuilding-intellisen... | and https://old.reddit.com/r/cpp/comments/bdt8ep/does_msvc_still...) | | Even without compatibility cruft, you're looking at | multiple 100k LOC if their code base is anything to go | by. That's man-years, not man-months... | cfv wrote: | I still remember when Clang bringing LLVM along was seen as SO | OUT THERE, and I'm just mentioning it because I find it weird to | be old enough to see fads in system languages come and start to | go. | | Just curious, do you have any examples of these "limitations" | you speak of? Sounds like a very interesting read. | fnord123 wrote: | LLVM's MCJIT library is 17MB. If you have a language that you | want to JIT and you thought you could embed your language | like lua (<100k) or Python (used to be ~250k but now <3M), | you're looking at almost 20MB out of the gate. Not ideal! | | Also if you want to use llvm as a backend for your project | and expect to build llvm as part of a vendored package: the | llvm libraries with debug symbols on my machine were about | 3GB. Also not ideal. | Someone wrote: | As an example, WebKit had an LLVM-based JavaScript optimizer | in 2014 (https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/), | but dropped it for another one in 2016 | (https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...) | | In broad strokes, LLVM chooses to optimize for generating | good code for statically compiled code more than for, for | example, memory usage, compilation speed, or ability to | dynamically change compiled code. 
That doesn't make it | optimal for JavaScript, a language that's highly dynamic and | often is used in cases where compilation time can easily | dwarf execution time. | pizlonator wrote: | Worth noting that B3's biggest win was higher peak | throughput. It generated better code than llvm. It achieved | that by having an IR that lets us be a lot more precise | about things that were important to our front end compiler. | | It's not even about what language you're compiling. It's | about the IR that goes into llvm or whatever you would use | instead of llvm. If that IR generally does C-like things | and can only describe types and aliasing to the level of | fidelity that C can (i.e. structured assembly with crude | hacks that let you sometimes pretend that you have a super | janky abstract machine), then llvm is great. Otherwise it's | a missed opportunity. | pizlonator wrote: | Llvm makes some questionable choices about how to do SSA, | alias analysis, register allocation, and instruction | selection. Also it goes all in on UB optimizations even when | experience from other compilers shows that it's not really | needed. Maybe those choices are really fundamental and there | is no escaping them to get peak perf - but you're not going | to know for sure until folks try alternatives. Those | alternatives likely require building something totally new | from scratch, because we are talking about things that are | fundamental to llvm even if they aren't fundamental to | compilers in general. | temac wrote: | I dislike UB, but I do so at the language level. By the time | LLVM is reached, UB can only continue to be removed, never | added (from a global point of view; applying general as-if | rules, a compiler can always generate its own boilerplate in | which it knows something cannot happen, then maybe later | leverage "UB" to e.g. trim impossible paths that really are | impossible in this case -- at least barring other | language-level "real" UB). 
So are there | really any drawbacks to internal exploitation of "UB" (maybe | we should call it otherwise then) if for example the source | language had none? | enos_feedler wrote: | We are also seeing MLIR emerging as a compiler framework and | LLVM being a dialect of that. This is happening within the LLVM | project itself. From this point, it may be easier to write | compilers without bringing in all of LLVM with it. | pjmlp wrote: | This is also why I think it was great that Maxime eventually | graduated into GraalVM. | | Another tool for compiler research using modern approaches with | type safe languages. | fluffything wrote: | Isn't GraalVM completely tied to LLVM bitcode, and therefore | has all the same problems that LLVM has ? | iamrecursion wrote: | Not at all. There's an LLVM bitcode interpreter built on | top of GraalVM, but the VM itself is heavily reliant on the | internals of OpenJDK. | anp wrote: | Isn't that just the (nee sulong) llvm frontend? IIUC | GraalVM is deeply dependent on OpenJDK internals. | pizlonator wrote: | Not even remotely. | The_rationalist wrote: | Who is Maxime? | edwintorok wrote: | Worth mentioning other alternative small backends: | http://c9x.me/compile/ | bluejekyll wrote: | While the GP doesn't state this as an advantage, the Rust | community would benefit from a fully Rust toolchain. | bluGill wrote: | Why? Other than to prove it can be done, what is the point? | | If rust were a huge community, okay, but face it, they are | not. It is better therefore to focus their efforts where | they can make a difference. A new x where the existing ones | are just fine (this includes well maintained) is a waste of | resources. | | There are many possible good answers to the above question. | However I'm not sure they apply, and worse I believe they | will split resources that could be used to make something | else better.
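Picking up temac's question above about a backend exploiting "UB" that the source language never exposed: Rust surfaces exactly this mechanism as `std::hint::unreachable_unchecked`, which hands the optimizer a "this path is impossible" fact it may trim code with. A minimal sketch; the function and its caller contract are hypothetical, for illustration only:

```rust
// Sketch of the point temac makes above: a frontend that has
// proven a path impossible can mark it as such, letting the
// backend exploit "UB" to trim it, even if the surface language
// exposes no UB to the programmer.
fn digit_to_ascii(d: u8) -> u8 {
    if d >= 10 {
        // SAFETY: by the (hypothetical) caller contract, d < 10,
        // so this branch is provably dead; saying so lets the
        // optimizer delete the comparison entirely.
        unsafe { std::hint::unreachable_unchecked() }
    }
    b'0' + d
}

fn main() {
    assert_eq!(digit_to_ascii(7), b'7');
}
```

Here the "UB" exists only internally, on a path the contract already rules out, which is the sense in which the backend can still profit from it even when the source language had none.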
| fluffything wrote: | Cranelift - the compiler toolchain being discussed in | this post (previously known as Cretonne) - is actually | completely written in Rust, being developed (obviously) | by Rust programmers who are members of the Rust | community. Its development started at Mozilla, which | still employs some of its developers to work on it full- | time. | | So the claim that the Rust community is not big enough | to achieve this is wrong, since they have already done | it. | | The reason they are doing it is that LLVM is not fine: | it is super _super_ slow. People want Rust to compile | instantaneously, and are willing to pay people full time | to work on that. | | D, for example, compiles much faster than C and C++, and | does this by having its own backend for unoptimized | builds. I don't know how big the D community is, so I | can't compare its size to the Rust community, but they | did it, and it paid off for them big time, so I don't see | why it wouldn't pay off for Rust as well. | int_19h wrote: | DMD inherited the backend from DMC++, which was the end | of a long line of optimizing C and C++ compilers going | back over a decade before the earliest D alphas. | bluGill wrote: | I didn't claim rust isn't big enough to do it. (that may | well be true given the large effort that went into llvm | over many years to make it a good optimizer - this is a | different debate though and I'm not sure if it is true) | | What I said was rust is better off focusing on problems | that are not solved well by other people. A fast modern | web browser (with whatever features are lacking) for | example. | jfkebwjsbx wrote: | > LLVM is not fine: it is super _super_ slow | | Source? LLVM is fast for what it does. | | What people usually complain about is rustc being slow | overall, not the LLVM passes.
| chc wrote: | This is true to some degree -- Rust does more work than | most programming languages, and that work will always | take some time -- but the Cranelift backend is also | measurably faster than the LLVM one. | kibwen wrote: | _> What people usually complain about is rustc being slow | overall, not the LLVM passes._ | | The LLVM phases are usually the dominating factor in Rust | compile times (the other big single contender is the | linking phase). However, when the Rust developers point | this out, they are also careful to mention that this may | be due to the rustc frontend generating suboptimal IR as | input to LLVM; we can both acknowledge that LLVM is often | the bottleneck for Rust compilation while also not | framing it as a failure on LLVM's part (though at the | same time it is uncontroversial to state that LLVM does | err on the side of superior codegen versus minimal | compilation time, hence the niche that alternative | compilers like Cranelift seek to fill). | sambe wrote: | Why phrase it as "other than to prove it can be done" if | you already know there are good answers? I think the | following obviously do apply: | | 1) much easier for Rust community to contribute to the | compiler from end-to-end. | | 2) lower coordination cost with LLVM giving complete, | Rust-focussed control over code generation/optimisation. | Think about e.g. fixing noalias. | | 3) lower maintenance cost for LLVM integration/fork. | | It's also obvious that this needs to be weighed against | the loss of LLVM accumulated technology and contributors. | This is easy to underestimate (although I think 2)/3) are | also easy to underestimate). | bluGill wrote: | Because I don't think the possible good answers apply. | | Sure it is harder to contribute to the backend, but does | it matter? I've been doing c++ for years and never looked | at the backend. | | I'll grant lower coordination costs. 
However I believe | they do not outweigh the advantages of the other | llvm contributions. | | If they need to fork llvm that is a problem. Either merge | it back in and be done (with some tests so whatever they | need is not broken), or else there is a compelling reason | why llvm won't work with their changes. | aseipp wrote: | Yes, it does matter, because LLVM is an incredibly | complex piece of software. And when you work on a | compiler, it turns out you'll have to work on the | backend. When I worked on a compiler day in and day out, | there were _single files_ in LLVM that were bigger than | our entire in-house compilation backend put together. | Which do you think is more appealing to debug? When a bug | in code generation causes compiled programs to segfault, | it is not necessarily easy to debug if you aren't | intimately familiar with the project, and this fact is | compounded when you consider not everyone hacking your | compiler is also a C++ programmer, knows LLVM's | architecture, and so on. It is literally hundreds of | thousands of lines of C++. The trigger test case is | probably a massive IR program emitted by some | toolchain written in a completely foreign language, _for_ | a foreign language. Playing the game of "recover the | black box from the crash site" is not always fun. | | You can file bug reports, but not every part of the | project is going to receive the same level of attention | or care from core developers, and not everyone has the | same priority. For example the Glasgow Haskell Compiler | had to post-process LLVM generated assembly _for years_ | because we lacked the ability to attach data directly | next to functions in an object file (i.e. at an offset | directly preceding the function). Not doing this resulted | in serious, meaningful performance drops. That was only | fixed because GHC developers, not any other LLVM users, | fixed it after finding the situation untenable after so | long.
But it required feature design, coordination, and | care like anything else and did not happen immediately. | On the other hand the post-processing stuff was a huge | hack and broke in somewhat strange ways. We had other | priorities. In the end GHC, LLVM, and LLVM users | benefitted, but it was not exactly ideal or easy, | necessarily. | | On the other hand, "normal" code generation bugs like | register misallocation or whatever, caused by extreme | cases, were occasionally fixed by upstream developers, or | patches were merged quickly. But absolutely none of this | was as simple as you think. LLVM is largely a toolchain | designed for a C compiler, and things like this show. | Rust has similarly stressed LLVM in interesting ways. | Good luck if your language has interesting aliasing | semantics! (I gave up on trying to integrate LLVM plugins | into our build system so that the code generator could | better understand e.g. that stack and heap registers never | alias. That would have resulted in better code, but I | gave up because it turns out writing and distributing | plugins for random LLVM versions your users want to use | isn't fun or easy, which is a direct result of LLVM's | fast-moving release policy -- and it is objectively | better to generate _worse code_ if it's more reliable to | do so, without question.) | | Finally, LLVM's compilation time issues are very real. | Almost every project that uses LLVM in my experience ends | up having to either A) just accept the fact LLVM will | probably eat up a non-negligible amount of the | compilation time, or B) spend a lot of time | tuning the pass sets and finding the right set of passes | that work based on your design and architecture (e.g. | earlier passes outside of LLVM, in your own IR, might | make later passes not very worth it). This isn't exactly | LLVM's fault, basically, but it's worth keeping in mind.
| Even for GHC, a language with heavy "frontend | complexity", you might suspect type checking or whatever | would dwarf stuff -- but the LLVM backend measurably | increased build times on large projects. | | > Either merge it back in and be done | | It's weird how you think coordination costs aren't a big | deal and then immediately say afterwards "just merge it | back in and be done". Yeah, that's how it works, | definitely. You just email the patch and it gets | accepted, every time. Just "merge it back in". Going to | go out on a limb and say you've never actually done this | kind of work before? For the record, Rust has maintained | various levels of LLVM patches for years at this point. | They may or may not maintain various ones now, but I | wouldn't be surprised if they still did. Ebbs and flows. | | I'm not saying LLVM isn't a good project, or that it is | not worth using. It's a great project! If you're writing | a compiler, you should think about it seriously. If I were | writing a statically typed language it'd be my first | choice unless my needs were extreme or exotic. But if you | think the people working on this Rust backend are somehow | unaware of what they're dealing with, or what problems | they deal with, I'm going to go out on a limb and suggest | that: they actually do understand the problem domain | much, much better than you. | | Based on my own experience, I strongly suspect this | backend will not only be profitable in terms of | compilation time, which is a _serious_ and meaningful | metric for users, but will also be more easily understood | and grokked by the core developers. And Cranelift itself | will benefit, which will extend into other Rust projects. | pizlonator wrote: | Historically writing a compiler in the language that | you're promoting is a good way to really understand the | limitations of your language.
| | I think this works so well because language designers | tend to understand compilers better than they understand | other software. | cpeterso wrote: | I heard Niklaus Wirth would only allow new compiler | optimizations (in his compilers for Pascal, Oberon, | Modula-2) that proved themselves by speeding up the | compiler itself. | aseipp wrote: | This is the mandatory rule for Chez Scheme, which was | only broken once when their entire backend was rewritten, | and also (from what I have heard) a large guiding principle | for the C# compiler at Microsoft. | | It's extreme but it's a good idea because it treats | compilation time like an actual budget, which it is. You | can't just add things endlessly. But it's not easy to | achieve in practice. | pizlonator wrote: | Hahaha that sounds excessive! | | JavaScriptCore does it differently: many of our | benchmarks are either interpreters or compilers written | in JavaScript. | | One of those benchmarks, Air, is just the stack slot | coloring phase of JSC's FTL JIT (that JIT has >90 phases) | rewritten in JavaScript instead of C++. It runs like 50x | slower in JS than C++ even in JSC, which wins on that | test. So, probably it won't be possible to write a JS VM | in JS anytime soon. I mean, surely it'll be possible, but | it'll also be hilariously shitty. | [deleted] | ddevault wrote: | I use qbe, it's great. Here's a mostly feature-complete C11 | compiler based on qbe: | | https://git.sr.ht/~mcf/cproc | hawski wrote: | I see that cproc is under quite heavy development, but qbe | had its last commit at the end of November. Is it considered | feature complete? I heard about it some months ago and was | quite interested in QBE, but it did not enjoy a high tempo of | changes. That may be considered an advantage; I know too | little to judge. | ddevault wrote: | It's complete enough to compile C11 programs - to me, | that's as good of a benchmark as anything.
The main thing | qbe is missing for cproc's purposes is inline assembly | and VLAs. DWARF support would also be nice, but no one | seems to care enough to do the work yet. | dbcurtis wrote: | Are you saying that qbe does not generate _any_ debug | information, or just not DWARF format? | Rochus wrote: | Now it's called monoculture? Rather strange. But anyway: in | your terms you're just replacing an LLVM monoculture with a | Rust monoculture, aren't you? | yjftsjthsd-h wrote: | Yes, it's a monoculture when the majority of all compiler | work/research is happening on one compiler chain. (I feel | like GCC is still competitive enough to keep up some | competition, but Clang does have a _lot_ of backing.) And | yes, if we made a rust replacement and that somehow eclipsed | all other compiler suites it would be a monoculture and be | bad, but that's unlikely and creating an alternative to the | most popular option reduces monoculture issues by adding more | options. | mratsim wrote: | LLVM has a library and modular approach which makes it | easier for people to contribute just in their area of | expertise instead of having to find their way in the | hundreds of thousands of lines of GCC. | Rochus wrote: | So if we join forces and create a reusable compiler backend | so not every compiler writer has to implement the same | optimizers and code generators over and over again, then | this is bad because it's a monoculture? How strange is | that? | | To me, it sounds more like political propaganda from a few | idealists who want to justify why - instead of | participating in a joint project - they want to develop | everything themselves from scratch in their favourite | technology. For this there is, nota bene, also a common | term: "Not invented here" syndrome.
| yjftsjthsd-h wrote: | > So if we join forces and create a reusable compiler | backend so not every compiler writer has to implement the | same optimizers and code generators over and over again, | then this is bad because it's a monoculture? How strange is | that? | | Why is that strange? You now have a diverse set of | frontends and a monoculture on the backend. A world with | Chrome, Chromium, Edge, Brave, and the Yandex browser is | still a browser engine monoculture. | VHRanger wrote: | Note that the LLVM monoculture came about because of how much | of a pain GCC is to work with. | | And GCC being a pain to work with is a deliberate decision by | Stallman to avoid his baby being expanded upon by | corporations. | wahern wrote: | That sentiment is about 10 years out-of-date. Today, GCC | supports modules better than clang/LLVM, and has moved to a | minimal C++ coding standard. And time has proven that clang | and LLVM are no less a moving target than GCC--it turns out | that simply writing things in C++ with OOP doesn't | automatically guarantee API compatibility while preserving | the ability to hack on the implementation. | est31 wrote: | GCC can't do runtime retargeting. This is a major drawback | because suddenly you need your distro to think about your | pet niche target. Suddenly you need your build system to | choose the correct linker instead of just being able to use | lld. I'm a big fan of GNU and the GPL, but clang is much | better in this regard. | pjmlp wrote: | Having a GCC monoculture wouldn't be much better. | yarrel wrote: | "Expanded upon" is a funny way of saying "incorporated into | systems that removed their users' freedom". | ddavis wrote: | It's unfair to rms to say that. He would be happy for | corporations to use and contribute to any project associated | with the GNU project (like GCC), if everyone wanted to play | along in GPL land (which of course isn't reality). | CJefferson wrote: | Unfortunately not.
For years people wanted gcc to output a | nice parse tree for C++, which would have been plenty | useful for open source text editors, but this was banned by | RMS as it would also be useful for closed source systems. | phkahler wrote: | Not sure, but I think his concern was more about opaque | steps being introduced in the compiler and becoming | something people depend on. A weak analogy might be nVidia | drivers on Linux: imagine a new arch where part of the | toolchain is a closed blob. | | It turns out that hasn't happened yet with LLVM, and | allowing such things under the LGPL may have worked. | craftinator wrote: | I agree with this point of view; the three E's, Embrace, | Extend, Extinguish, are already rampant in the compiler | industry, and I see his choices as sacrificing ease of | use for more transparency. | JoshTriplett wrote: | This was a legitimate concern, which helped in the days | GCC originated in, and hurt later on. Parsing C and C++ | well and doing intermediate code generation and | optimization was the hard part; taking the result and | generating code for a target architecture might well have | been proprietary for many architectures if it had been | allowed to be, in the era when UNIX and other OS vendors | were fighting against each other for every scrap of | differentiation. And frontends for some languages would | have been proprietary, as well. | | LLVM's extensible architecture was its most critical | property; its permissive license is an unfortunate side | effect of a rewrite. | | If GCC had come up with the "GCC Runtime Library | Exception" way back in the day, and provided a modular | architecture, half the innovation happening around LLVM | might have happened around GCC instead. (Might, not | "would have"; we can only speculate on what alternate | history might have occurred.)
| yjftsjthsd-h wrote: | I think it's partially fair; gcc, in order to make it | impossible to add proprietary add-ons, deliberately has a | non-modular architecture, which makes it hard even for open | source extensions to exist. | benibela wrote: | There are non-llvm compilers. | | FreePascal, for example, has its own x86, arm, mips, sparc | and powerpc backends. | WalterBright wrote: | The D programming language has 3 compilers, one with LLVM (LDC), | one with GCC (GDC), and one with the Digital Mars back end (DMD). | | It's great to have all three, as they each have different | characteristics in terms of speed, generated code, debug support, | platform support, etc. Supporting these three also helps maintain | proper semantic separation of code gen from front end. | tyrion wrote: | Thanks for the nice article! Hoping the author reads the | comments, I would like to leave some hopefully useful feedback. | | It would greatly improve the reading experience of your blog if | you could make the footnotes/references clickable. | | For example when you say: | | > I've taken the chart from the 2016 MIR blog post[3] | | I have to scroll to the end of the page to find the blog post | (and then scroll back to resume reading). If [3] were clickable | it would be great. It would be even better if [MIR blog post] | were an actual link itself. | mttyng wrote: | This is awesome. It doesn't even seem that long ago when Boa was | started! Man, time flies and people do great things. Kudos to the | author and co-contributors for what Boa has become. | gok wrote: | Novel compiler backends are a super cool idea, but I don't think | it's going to help Rust compile speeds as much as this post | suggests. The complexity of Rust's type system puts a pretty high | lower bound on compile times because of work the front end needs | to do. Plain C compiles quickly even with an LLVM backend, for | example.
| pjmlp wrote: | Haskell, OCaml, SML, Idris also compile quite fast, with | complex type systems. | | Their secret? Multiple backends with different kinds of | optimizations. | | You don't need to compile for the ultimate release performance | when in the middle of the compile-debug-edit cycle. | isatty wrote: | From my (limited) experience, Haskell does not compile fast, | especially if you're doing something that needs lenses. | pjmlp wrote: | It surely does. Haskell is not a one-compiler language: not | only does it have multiple implementations (though I concede | almost everyone only cares about GHC), there are also | interpreters and a REPL experience. | | You don't need to compile your program in one go using | GHC's LLVM backend; many times a GHCi session is more than | enough. | steveklabnik wrote: | While the type system does add to compile times, profiling | generally doesn't show that it's the current limiting factor | for compile times. Additionally, tools like rust-analyzer will | give you type errors pretty much instantaneously, though of | course that work is not finished. | | Also of note, this blog post isn't speculation; they posted | numbers from actually doing it. | gok wrote: | A 30% speedup is nothing to sneeze at, but it's not putting | Rust within spitting distance of Go or C for similar amounts | of code. | steveklabnik wrote: | Absolutely. I don't see where anyone is claiming that. | dtolnay wrote: | Rustc normally spends way more time in LLVM than in the | frontend. Rust parsing and type checking are very fast in | comparison to LLVM's codegen. | | Here is a chart from last September showing where the time goes | in compiling a large Rust codebase (rustc itself): | | https://gistpreview.github.io/?74d799739504232991c49607d5ce7... | | (Scroll down to the large horizontal bars once dependencies | have been built.) (Sorry if GitHub is down at the moment; try | later if it doesn't load.)
| | The blue part of each bar is time in the frontend, the purple | part is time in LLVM. The largest bar (rustc) spans 105 seconds | in LLVM out of 140 total, or 75% in LLVM. Many of the subcrates | are even more dominated by LLVM time, for example look at | rustc_metadata or rustc_traits where >95% of compile time is | spent in LLVM. | tick_tock_tick wrote: | Rust is famous for throwing garbage IR at LLVM and hoping it | cleans it all up. They've made a lot of progress but | comparing the timing is very misleading when the work is | intentionally offloaded to LLVM. | int_19h wrote: | Isn't that kind of the point of having a relatively high- | level backend - to avoid the need for every front-end to do | the same tedious optimizations? | nitwit005 wrote: | You do have to have some sort of balance. With a large | enough input, any program will become slow. | dtolnay wrote: | My comment is in response to "super cool idea, but I don't | think it's going to help Rust compile speeds". Even a | compiler that emits garbage IR from the frontend would get | 20x faster with a magic instant backend if 95% of time is | currently spent in the backend. | | A magic instant backend is unrealistic, so Rust will need | to move some of the current backend work to frontend work | for things that can be done more efficiently on the | frontend. But the fact remains that there is an opportunity | for big improvement from a much faster backend. | aw1621107 wrote: | I think you might be surprised just how much time is spent in | codegen- and optimization-related code. | | For example, a bit over 75% of the time needed to compile the | regex crate can be attributed to codegen- and optimization- | related events, with a bit over 64% of that time spent in LLVM- | related events specifically [0]. Granted, I'm not certain | whether this is a release or debug build, but it does show that | there is room for significant wins by switching backends. 
| | As for why C can compile quickly with an LLVM backend while | Rust can't, I'm not sure. I've read in the past that rustc | generates pretty bad LLVM IR to pass to the backend, and it | takes time for LLVM to sort through that, but there are | probably some other factors in there too. | | [0]: https://blog.rust-lang.org/inside-rust/2020/02/25/intro- | rust... | edflsafoiewq wrote: | I wonder how much of it is just code style. A simple for-loop | in C is probably going to be an iterator blob in Rust with an | order of magnitude more code for the backend to chew through. | Rusky wrote: | Part of it may be code style, but another part comes from | just how much LLVM IR the frontend generates for just about | any code style. | | Part of the reason LLVM runs so much faster on C than on | Rust is that Clang is smarter about generating less/better | IR from the start, so LLVM's optimizer has less of a hole | to dig itself out of. | steveklabnik wrote: | We can compare them! https://godbolt.org/z/-Fzuqs | | (My screen is small so it's tough for me to read these | results, to be honest...) | edflsafoiewq wrote: | I believe that is the IR _after_ the optimizer has chewed | through it. | steveklabnik wrote: | Oh duh. Should be obvious that you can drop the -O flags, | at least. | edflsafoiewq wrote: | Yes, which gives about 40 lines of IR for C, and about | 1300 for Rust (Godbolt truncates it at 500). | Waterluvian wrote: | When writing a language like Rust, is the biggest challenge | simply deciding what Rust's features and behaviors should be? And | implementing the syntax and Rust -> LLVM compiler is really just | a chore for the individuals who are super familiar with the | implementation of these languages? Or is the technical | implementation also genuinely challenging and non-obvious? | kevinmgranger wrote: | The concept of lifetime management is relatively novel and | uncharted territory, if I understand correctly. There's only | some prior art.
So implementing that must have been an | adventure and a half. | | And while I'm sure the folks who work on these languages are | wonderfully intelligent people, let's dispel this notion that | you need to be a super genius to implement a compiler or | something like that! | | It seems magical, like one of the hardest things you could | program -- but take a look through Crafting Interpreters, if you | will: http://craftinginterpreters.com/ | | "Nothing is particularly hard if you break it down into small | jobs." - Henry Ford | Waterluvian wrote: | I walked through the Java portion of Crafting Interpreters | and indeed, they can be much simpler than you might imagine | in your early years. I was more just paying a compliment than | suggesting that maintainers are superhuman. Everyone who | ships productive code is a wizard and you can be too! | | I modified my original question to avoid a potential | distraction from what I want to talk about. Thanks! | xscott wrote: | Do you have any links for the prior art? I'm sincerely | interested, thank you if you do. I'd also be interested in | any articles (or even blog posts) that describe Rust's | process for that from a compiler writer's point of view. | tadfisher wrote: | A language like Rust aims for "zero-cost abstractions", which | means the features and behaviors of the language _must_ be | evaluated in the context of the implementation. | Waterluvian wrote: | Let me attempt to unpack this into language I think I | understand: | | It's up to Rust's compiler to verify all the contracts made, | because in order for the binary to be "zero-cost", none of | those checks are being done at runtime. If you looked at the | output assembly, you would see what looks like very carefully | written code that shows no explicit signs of protecting | itself. I.e., there are no swaths of boilerplate assembly doing | borrow checking, out of bounds checking, etc. | eximius wrote: | That might be a stronger way of putting it than I would.
| | The canonical example is iterator chains with complex logic | compiling down into vectorised and unrolled loops. Powerful | logical abstractions are used by the compiler to generate | code that does what it says to do without the runtime cost | of closures or whatever other logical but not mechanically | necessary things you have. | | Iterators can't go out of bounds, so the compiler can elide | those checks. There are still _some_ runtime costs at the | intersection of safety, ergonomics, and performance: bounds | checking, overflow checking. But they have escape hatches | and are relatively rare in the language. Most things do | compile out. | steveklabnik wrote: | You're not wrong, exactly, but I'm not 100% sure this is | right. The way I would put it is this: Rust has made | certain commitments about performance. This means that | language changes have to be made in the context of how they | are implemented, because releasing a language feature that | causes a significant performance degradation would make it | not a good fit by definition. | steveklabnik wrote: | First of all, deciding features and behaviors is _not_ simple. | :) | | There are a number of technical implementation challenges in | the compiler. | | It is a large project, and Rust's got a really intense | stability policy. | | The compiler was bootstrapped very early, when the rate of | change of the language itself was still "multiple things per | day." This introduced significant architectural debt. | | There have been multiple projects that have re-written massive | parts of the compiler, and more ongoing. For example, non- | lexical lifetimes required inventing an entire additional | intermediate language, re-writing the compiler to use it, and | making sure that everything kept working while doing so. | | More recently, the compiler is being re-done from a | classic, multiple-pass architecture to a more Roslyn-like, | "query-based" one.
Again, this is being done entirely "in-flight", | while keeping a project that's used by a _lot_ of folks stable. | The rust-analyzer project has made this even more interesting; | a "librarification" strategy is underway to make the | compiler more modular. | | For some numbers on this kind of thing, | https://twitter.com/steveklabnik/status/1211667962379276288/... | and | https://twitter.com/steveklabnik/status/1211717308143587334/... | j88439h84 wrote: | > Rust's got a really intense stability policy. | | I know the code won't stop running, but I wonder how soon it | stops being idiomatic. If it's not idiomatic, it's harder to | maintain due to unfamiliar style and structure. Does Rust | have measures to deal with this issue? | [deleted] | steveklabnik wrote: | I think the closest thing is enforced rustfmt. I don't hack | on the compiler though, so maybe there's some stuff that | the team does that they don't broadcast super widely. | j88439h84 wrote: | I don't mean the code of the Rust compiler, I mean code | written in Rust becomes unidiomatic as the idioms change. | How fast does that happen, is it a problem, is it being | addressed? | steveklabnik wrote: | Okay, so hilariously, I thought you meant that at first, | but re-read your comment, and said "oh, but we're talking | about the compiler's maintenance and I mentioned how | often the language is changing, so I must have | misunderstood." Should have stuck with my gut! | | It is not a problem. A lot of processing steps go on | before the meat of the work gets done; many new idioms | end up boiling away entirely as part of this process. | Like, the borrow checker doesn't even know about loops; | by the time the code gets there, it's all been turned | into plain old gotos. The further you get into the | compiler, the simpler the language gets, and | everything is defined in terms of sugar of the next IR | down.
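The desugaring steveklabnik describes above can be sketched in user code. This is a hand-written approximation, not actual compiler output; the real lowering continues past this form, all the way down to MIR basic blocks and gotos:

```rust
// A `for` loop is surface sugar: before the borrow checker runs,
// it has been lowered to an explicit iterator driven by a plain
// `loop` + `match`, roughly like the second function below.
fn sum_for(v: &[i32]) -> i32 {
    let mut total = 0;
    for x in v {
        total += x;
    }
    total
}

// Hand-written approximation of the desugared form (sketch only;
// the real desugaring goes through `IntoIterator::into_iter`).
fn sum_desugared(v: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = v.iter();
    loop {
        match iter.next() {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}

fn main() {
    assert_eq!(sum_for(&[1, 2, 3]), sum_desugared(&[1, 2, 3]));
}
```

Both functions behave identically; later compiler stages only ever see something shaped like the second one, which is why new surface idioms tend not to complicate the compiler's core.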
| ansible wrote: | As it happens, Steve can provide some metrics for this: | | https://words.steveklabnik.com/how-often-does-rust-change | | https://www.reddit.com/r/rust/comments/fz8mwm/how_often_d | oes... | steveklabnik wrote: | I don't think this analysis really captures idiom | questions, so while it's related, I'm not sure it's the | right thing here :) | mcqueenjordan wrote: | Honestly, it's probably the perfect time to dive in, now | that async/await has dropped. | | During my time in rust, the major changes in idiomatic | code have been around Results/Errors, async/futures, and | a few macros and syntactic sugar goodies have evolved. | None of these evolutions were problematic to migrate to, | and all of them were moving in the right direction, IMO. | Waterluvian wrote: | Using the word "simply" is a quirk of mine. I'm very much | trying to express that I think it's by far the hardest part. | Thanks for your response! | steveklabnik wrote: | It's all good :) You're welcome! ___________________________________________________________________ (page generated 2020-04-21 23:00 UTC)