[HN Gopher] How Rust 1.64 became faster on Windows
___________________________________________________________________
How Rust 1.64 became faster on Windows
Author : todsacerdoti
Score  : 160 points
Date   : 2022-10-23 13:43 UTC (9 hours ago)
(HTM) web link (tomaszs2.medium.com)
(TXT) w3m dump (tomaszs2.medium.com)
| mgaunard wrote:
| wongarsu wrote:
| Great to see how huge the benefit of profile-guided optimization
| is. I feel it's one of the more underappreciated techniques. Rust
| adding support for it on Windows, and showcasing what a big
| improvement it makes on the compiler, is pretty big (in addition
| to just having a faster compiler)
| mhh__ wrote:
| I did PGO builds for the D compiler; it was about 10% to 30% even
| on some benchmarks.
| 
| The subtle win is the space savings: no more Jackson Pollock
| inlining.
| londons_explore wrote:
| Unfortunately, many projects never benefit from PGO, because
| there is quite a lot of complexity involved in setting up a
| profiling workload, storing the profile somewhere, and using it
| for future builds.
| 
| I'd like compiler writers to embed a 'default profile' into the
| compiler, which uses data from as much opensource code as they
| can find all over github etc.
| 
| This default profile will improve the performance of lots of
| libraries that everyone uses, and will probably still help
| closed source code (since it will probably be written in a
| similar style to opensource code).
| hinkley wrote:
| JITs do PGO all the time. It's their bread and butter.
| pjmlp wrote:
| Nowadays they also save PGO across executions, so that they
| don't always start from zero.
| 
| The most modern ones, that is (Java, .NET, Android).
| andrewaylett wrote:
| The "default" profile "for PGO" is the compiler on its own --
| folk put a _lot_ of effort into making sure it will generally
| compile arbitrary code well. And a big part of that is lots
| of people running lots of open source code and measuring how
| well it performs.
| 
| The difficulty with "as much open source code as they can
| find" is that we need to execute the code to make a profile.
| And unless we're running the code under real-world
| conditions, there's no guarantee that we'll generate a useful
| profile. So we need to be a little careful about which code
| we look at from a performance perspective. Even when we have
| a profile, it's a count of branches taken for the specific
| code that was compiled, and it's not normally applicable to
| either a different version of the compiler or any input
| that's not identical to the input used for profiling. With
| link-time optimisations, even a "common" profile for library
| code isn't necessarily going to be useful: which bits of a
| library we'll try to inline will vary according to the code
| that's calling it.
| pca006132 wrote:
| I think you can already build shared libraries with PGO,
| although this doesn't really work with header-only libraries
| for C++...
| darksaints wrote:
| I think a cool project to work on would be model-based ML-
| generated profiles that take a set of parameters like:
| 
| * application type (e.g. client, server, batch process,
| parser, etc.)
| * target architecture, vendor, model, etc.
| * target resources like RAM, HD types, network interfaces, etc.
| 
| I would think you could get very close to an actual PGO level
| of performance with just a handful of parameters and a lot of
| data.
| branko_d wrote:
| Perhaps a better approach would be some sort of per-library
| profile?
| londons_explore wrote:
| If you can make it be zero effort for developers, that's a
| good plan... But if it involves even a minor effort from
| the developer, then most developers probably won't bother.
| 
| I'm imagining for example a 'profile server', which anyone
| can upload profiler data to, and that the compiler queries
| to get profile data for any given file it wants to compile.
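[Editor's note: the generate/run/merge/use cycle being discussed above is documented in the rustc book; a minimal sketch for a single-file program follows. File names, the workload, and the profile directory are illustrative, and `llvm-profdata` must come from an LLVM matching the one rustc was built against.]

```shell
# 1. Build an instrumented binary that writes raw profile counters on exit.
rustc -O -Cprofile-generate=/tmp/pgo-data main.rs -o app

# 2. Run a representative workload -- picking this is the hard part
#    the thread is describing.
./app typical_input.txt

# 3. Merge the raw counters into a single profile.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild, letting the optimizer use the measured branch counts.
rustc -O -Cprofile-use=/tmp/pgo-data/merged.profdata main.rs -o app
```

[With Cargo the same flags are typically passed via RUSTFLAGS rather than invoking rustc directly.]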
| pjmlp wrote:
| That is the beauty of modern JITs with feedback PGO data: it
| can be saved across execution sessions, and with time the data
| will grow towards an optimal data point.
| bruce343434 wrote:
| > I'd like compiler writers to embed a 'default profile' into
| the compiler, which uses data from as much opensource code as
| they can find all over github etc.
| 
| What would be the point? The whole thing about PGO is that it
| measures which paths of _your_ code are "hot".
| tsavola wrote:
| Consider error handling paths.
| tialaramex wrote:
| Rust will already end up optimising out the error
| handling that _can't_ happen because Infallible is an
| Empty Type (it makes no sense to emit code for an Empty
| Type because no values of this type can exist, so during
| monomorphization this code evaporates)
| 
| (e.g. trying to convert a 16-bit unsigned integer into a
| 32-bit signed integer can't fail, that always works so
| its error type is Infallible, whereas trying to convert a
| 32-bit _signed_ integer into an unsigned one clearly
| fails for some values, that's a
| core::num::TryFromIntError you need to handle)
| 
| So we're left only with errors which _don't_ happen. But
| who says? On my workload maybe the profile image file
| doesn't exist 0% of the time since I'm actually making
| the image files, so of course they exist, but in _your_
| workload the user gets to specify the filename and so
| they type it wrong about 0.1% of the time, and in
| somebody else's workload the hostile adversary spews
| nonsense filename values like "../../../../../etc/passwd"
| to try to exploit bugs in some PHP code from 15 years
| ago, so they see almost 10% errors. What would we learn
| from a "general profile"? Nothing useful.
| hinkley wrote:
| Or a perennial favorite of mine:
| 
| $ process Some Image Name.png
| 
| Could not find file "Some"
| 
| $ process "Some Image Name.png"
| 
| Done.
| a1369209993 wrote:
| > Some Image Name.png
| 
| ... Urg, _that_.
| 
| If I ever implement a bespoke file system format, it is
| going to be encoding-level impossible to represent file
| names with spaces. Not FAT-style[0] "the spec says to
| replace that with an underscore" or something, but more
| "the on-disk character encoding does not contain any
| sequence of bits that represents space".
| 
| 0: (non-ex-)FAT stores filenames in all caps, but the
| data on disk is ASCII, so you can just write lowercase
| letters in the physical directory entries. (I've seen at
| least one FAT implementation that actually uses that to
| 'support' lowercase filenames.)
| dhosek wrote:
| Meh, I remember the move from DOS 3.3 to ProDOS back in
| the Apple //e days, and the loss of spaces in filenames
| was something that seemed a regression to me.
| dhosek wrote:
| I'd rather see a ban on non-Unicode strings as file
| paths. ^&*# Windows.
| [deleted]
| hinkley wrote:
| And all this time I've been blaming Windows for bringing
| us white spaces in file names.
| londons_explore wrote:
| Lots of _your_ code is library code that everybody uses...
| 
| And lots of your code has similar hot paths to everyone
| else's code. It turns out that `for x in pixels { }` is
| probably going to be a hot loop... But `for x in
| serial_ports { }` probably isn't a hot loop...
| tnh wrote:
| Agree, this difficulty is the biggest obstacle to PGO's
| success. A language/ecosystem that works out how to integrate
| this as smoothly as testing would have a sizeable performance
| boost in practice.
| 
| The default profile is a nice hack. We do this by default for
| C++ builds at [company], and it works great. Teams that care
| can build a custom profile which performs better, but most
| don't.
| 
| > I'd like compiler writers to embed a 'default profile' into
| the compiler, which uses data from as much opensource code as
| they can find all over github etc.
| 
| Working out how to build, let alone profile, all that code is
| no joke.
And the result will be large, and maybe not have that
| much overlap with the average program. As a sibling points
| out, maybe using ML to recognize patterns instead of concrete
| code would help?
| 
| I'd settle for profiling of the standard library. In an
| ecosystem like Rust, per-crate default profiles that you
| could stitch together would be amazing.
| dijit wrote:
| What is the common consensus on benchmarking test suites in Rust?
| 
| From what I understood: Criterion is the gold standard, but
| there is also a built-in benchmark suite, which is only
| supported on nightly.
| 
| What's the difference?
| dochtman wrote:
| There's the bencher crate as well, which provides a similar API
| to nightly through macros that work on stable. On one of the
| projects I maintain we reverted from criterion to bencher
| because the criterion results sometimes made no sense.
| throwup wrote:
| Criterion is still the gold standard.
| 
| Pros for Criterion over the stdlib:
| https://github.com/bheisler/criterion.rs#features
| 
| Downsides of Criterion:
| https://bheisler.github.io/criterion.rs/book/user_guide/know...
| dijit wrote:
| Thanks throwup!
| superjan wrote:
| TLDR: Profile-guided optimization was not supported on Windows.
| That has been enabled now. So this only helps you if you are
| compiling on Windows and want to go the extra mile of running PGO
| builds.
| itamarst wrote:
| They're shipping PGO builds of the Rust compiler, so for faster
| compilation you don't have to do anything ("Windows builds now
| use profile-guided optimization, providing 10-20% improvements
| to compiler performance", per Rust's release notes:
| https://github.com/rust-lang/rust/blob/master/RELEASES.md).
| bogeholm wrote:
| After enabling PGO, that was used to compile `rustc` itself.
| 
| So compiling Rust code on Windows is faster with 1.64 than
| 1.63.
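[Editor's note: tialaramex's point upthread about Infallible vs. TryFromIntError can be seen directly in std's TryFrom impls; a minimal sketch with arbitrary values follows.]

```rust
use std::convert::{Infallible, TryFrom};

fn main() {
    // u16 -> i32 can never fail: the error type is Infallible, an
    // empty enum, so the Err arm is dead code that evaporates during
    // monomorphization -- no profile needed to remove it.
    let widened: Result<i32, Infallible> = i32::try_from(65_535_u16);
    assert_eq!(widened, Ok(65_535));

    // i32 -> u32 genuinely fails for negative values: the error type
    // is core::num::TryFromIntError and the branch is real code that
    // a profile could, in principle, weight as hot or cold.
    assert!(u32::try_from(-1_i32).is_err());
    assert_eq!(u32::try_from(7_i32), Ok(7_u32));
}
```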
| mastax wrote:
| Tldr: PGO
| 
| https://github.com/rust-lang/rust/pull/96978/
| [deleted]
| SimonV1235 wrote:
| hinoki wrote:
| I thought PGO instrumentation works on basic blocks, and the
| inlining, outlining, and register allocation optimisations are
| all done on LLVM's IR. So everything can happen in the backend.
| 
| What sort of work is OS-specific, or language-specific?
| 
| I've used PGO before, but I'm not familiar with the details.
| Someone wrote:
| Computing the profile wasn't possible on Windows.
| 
| FTA: _But there is one problem: PGO was up until now available
| only on Linux._
| 
| I think they couldn't use a profile generated on Linux because
| of differences in ABI and standard library.
| 
| I also think generating a profile is OS-dependent because you
| want it to not have much of a performance impact.
| eventhorizonpl wrote:
| It's great to see every speedup :)
| [deleted]
| aliqot wrote:
| I think your comments are being suppressed; they're all removed
| after you post. You might check on that.
| nalllar wrote:
| https://news.ycombinator.com/newsfaq.html
| 
| Look at the [dead] section
| aliqot wrote:
| A single flagged comment and we've declared someone
| irredeemable. Amazing.
| 
| https://news.ycombinator.com/item?id=31738035
| xyzzy123 wrote:
| Look closer.
| sp332 wrote:
| I don't think that's right, because there are 6 non-dead
| comments after that one. But very new accounts are likely
| to be banned after a single flag, to avoid ban evasion.
| aliqot wrote:
| Those have been vouched for
| jeff-davis wrote:
| For databases, I'm reluctant to rely on PGO (profile-guided
| optimization) because the workloads are so varied. There's a risk
| of over-fitting to the profiled workloads at the expense of
| others.
| 
| Though there may be a lot of opportunity with some database
| _subsystems_ that have a more consistent usage pattern.
| 
| Edit: also, PGO is closely related to JIT techniques, which are
| based on current runtime information rather than profiles
| generated a long time ago on a workload that may or may not be
| representative.
| vlovich123 wrote:
| I think in practice enabling PGO will be a net gain even if
| it's suboptimal on some workloads (i.e. you should still see
| some performance gain across the board even if the specific
| workload isn't profiled). The reason is that it's using the
| profiles to make decisions in lieu of heuristics, which should
| be a win even for non-profiled workloads, because heuristics
| are essentially just general-case profiles (i.e. tuned to a
| bunch of OSS software out there). I'm unaware of any research
| showing PGO being worse than not doing it even if your profile
| isn't the workload (you'd probably have to try to specially
| build such a situation, and it's unlikely to come up in
| practice).
| 
| Have you actually seen otherwise?
| Filligree wrote:
| Throughput isn't everything, and improving OPS at the
| expense of tail latency can be a problem. Depends on your
| specific workload, but it isn't something I'd enable by
| default.
| summerlight wrote:
| PGO will still likely improve general performance even with
| biased workloads, because in many cases we want compilers to
| focus on optimizing happy paths, but this is not always well
| executed even when it's pretty trivial for human eyes.
| sorz wrote:
| Is it possible that the profile is over-fitting to the benchmark
| tests?
| varajelle wrote:
| Yes, that's likely. But the idea is that even if that's the
| case, it is still better than no PGO.
| 
| Edit: I'd like to add that if the 10-20% mentioned is measured
| on the benchmark that was used to do the PGO, then that figure
| might indeed not be representative of the real gain.
| Tuna-Fish wrote:
| Their main benchmark test is compiling every publicly released
| crate on crates.io. This is also their main regression test.
| 
| If you manage to overfit against that, it's still probably an
| amazing general-purpose solution.
| tyingq wrote:
| https://archive.ph/5nvje
| unnouinceput wrote:
| xeonmc wrote:
| Oh, I just assumed that the article was written by a French
| speaker and shrugged.
| DeathArrow wrote:
| wizardman wrote:
| evilduck wrote:
| Why does this flame bait have to get squeezed in everywhere?
| Thaxll wrote:
| There is no viable alternative to Electron that is truly
| multi-platform.
| simplotek wrote:
| > There is no viable alternative to Electron that is truly
| multi-platform.
| 
| Qt?
| pixl97 wrote:
| They did say viable.
| simplotek wrote:
| Since when is the likes of Electron more viable than Qt?
| pixl97 wrote:
| Since there are at least an order of magnitude more
| programmers that are going to write JS over Qt.
| fuzzy2 wrote:
| I'm all for native UIs, but if you _must_ go cross-platform,
| it's often not worth it. Why bother with the specifics of
| Windows, Linux, macOS, Android, iOS? Just build one mediocre UI
| to rule them all.
| 
| Electron does not have to be slow, just as web applications in
| general do not have to be slow, multi-megabyte monstrosities.
| 
| I bet you would be surprised if you knew where even
| specialized, sort-of-embedded UI is moving. Hint: it's not
| native.
| pas wrote:
| Nowadays it's possible to opt for a best-of-both-worlds
| situation: https://tauri.app/ or
| https://github.com/sciter-sdk/rust-sciter
| Thaxll wrote:
| Electron uses Chrome everywhere; Tauri uses the system
| webview, which is very different and vastly inferior.
| zeta0134 wrote:
| This really depends on your definition of inferior; I'd
| consider the lower resource usage a good tradeoff for fewer
| bleeding-edge features on most days.
| geodel wrote:
| On the third hand are people who use VSCode to write Rust code
| and _claim_ tools based on Electron are the way to go.
| [deleted] | simplotek wrote: | > On the other hand there are guys who don't waste much time | and pick Electron for their app, for a 1000% performance | degradation. | | Do hypothetical 1000% performance gains matter if perceived | performance is already within acceptable limits? | | Wasting time gold plating solutions with no meaningful tradeoff | is a negative trait, not a positive one. | pixl97 wrote: | Individually or in bulk? | | For an individual, no. | | For the gigawatt of power you just wasted nationwide, yes. | golergka wrote: | Being able to use the same code on all three major desktop | platforms as well as the web is worth it many times over. ___________________________________________________________________ (page generated 2022-10-23 23:01 UTC)