[HN Gopher] How Rust 1.64 became faster on Windows
       How Rust 1.64 became faster on Windows
       Author : todsacerdoti
       Score  : 160 points
       Date   : 2022-10-23 13:43 UTC (9 hours ago)
 (HTM) web link (tomaszs2.medium.com)
 (TXT) w3m dump (tomaszs2.medium.com)
       | wongarsu wrote:
       | Great to see how huge the benefit of profile-guided optimization
       | is. I feel it's one of the more underappreciated techniques. Rust
       | adding support for it on Windows, and showcasing how much it
       | improves the compiler, is a big deal (in addition to just
       | having a faster compiler).
         | mhh__ wrote:
         | I did PGO builds for the D compiler; the speedup was about 10%,
         | and even 30% on some benchmarks.
         | The subtle win is the space savings: no more Jackson Pollock
         | inlining.
         | londons_explore wrote:
         | Unfortunately, many projects never benefit from PGO, because
         | there is quite a lot of complexity involved in setting up a
         | profiling workload, storing the profile somewhere, and using it
         | for future builds.
         | I'd like compiler writers to embed a 'default profile' into the
         | compiler, which uses data from as much opensource code as they
         | can find all over github etc.
         | This default profile will improve the performance of lots of
         | libraries that everyone uses, and will probably still help
         | closed source code (since it will probably be written in a
         | similar style to opensource code).
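For readers who have not set this up, the three steps named above (a profiling workload, a stored profile, a rebuild that uses it) look roughly like this for a single-file Rust program. This is a minimal sketch under assumptions: the file name `main.rs`, the directory `/tmp/pgo-data`, and the toy `classify`/`checksum` workload are made up for illustration. The `-Cprofile-generate` and `-Cprofile-use` flags and `llvm-profdata merge` are the ones described in the rustc book's PGO chapter.

```rust
// Hypothetical toy workload; the build steps are in the comments below.
//
// 1. Build an instrumented binary:
//      rustc -O -Cprofile-generate=/tmp/pgo-data main.rs
// 2. Run it on a representative workload (writes .profraw files):
//      ./main
// 3. Merge the raw profiles into one file:
//      llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
// 4. Rebuild using the stored profile:
//      rustc -O -Cprofile-use=/tmp/pgo-data/merged.profdata main.rs

/// Classifies a byte; for text input the ASCII-letter branch is the hot one.
fn classify(b: u8) -> u32 {
    if b.is_ascii_alphabetic() {
        1
    } else if b.is_ascii_digit() {
        2
    } else {
        3
    }
}

/// Sums the class codes over an input buffer.
fn checksum(input: &[u8]) -> u32 {
    input.iter().map(|&b| classify(b)).sum()
}

fn main() {
    // Stand-in for a "representative workload": mostly letters, few digits.
    let text = b"profile guided optimization 2022".repeat(100_000);
    println!("checksum = {}", checksum(&text));
}
```

The merged profile only describes the inputs that step 2 actually exercised, which is where most of the setup effort goes.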
           | hinkley wrote:
           | JITs do PGO all the time. It's their bread and butter.
             | pjmlp wrote:
             | Nowadays they also save PGO across executions, so that they
             | don't always start from zero.
             | The most modern ones that is (Java, .NET, Android).
           | andrewaylett wrote:
           | The "default" profile "for PGO" is the compiler on its own --
           | folk put a _lot_ of effort into making sure it will generally
           | compile arbitrary code well. And a big part of that is lots
           | of people running lots of open source code and measuring how
           | well it performs.
           | The difficulty with "as much open source code as they can
           | find" is that we need to execute the code to make a profile.
           | And unless we're running the code under real-world
           | conditions, there's no guarantee that we'll generate a useful
           | profile. So we need to be a little careful about which code
           | we look at from a performance perspective. Even when we have
           | a profile, it's a count of branches taken for the specific
           | code that was compiled, and it's not normally applicable to
           | either a different version of the compiler or any input
           | that's not identical to the input used for profiling. With
           | link-time optimisations, even a "common" profile for library
           | code isn't necessarily going to be useful: which bits of a
           | library we'll try to inline will vary according to the code
           | that's calling it.
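To make "a count of branches taken" concrete, here is a hand-rolled sketch of the idea. It is not how the compiler instruments code (real instrumentation is inserted automatically); the `BranchProfile` struct, the `open` function, and the file names are hypothetical. It only shows that the same code yields different counts on different workloads.

```rust
/// Hand-written stand-in for the counters an instrumented build would record.
#[derive(Debug, Default, PartialEq)]
struct BranchProfile {
    ok: u64,
    err: u64,
}

/// Hypothetical operation: "opening" succeeds if the name is in `existing`.
fn open(name: &str, existing: &[&str], profile: &mut BranchProfile) -> Result<(), String> {
    if existing.contains(&name) {
        profile.ok += 1;
        Ok(())
    } else {
        profile.err += 1;
        Err(format!("could not find file {name:?}"))
    }
}

/// Runs one workload and returns the branch counts it produced.
fn run(workload: &[&str], existing: &[&str]) -> BranchProfile {
    let mut profile = BranchProfile::default();
    for name in workload {
        // The result is ignored on purpose; only the counts matter here.
        let _ = open(name, existing, &mut profile);
    }
    profile
}

fn main() {
    let existing = ["a.png", "b.png"];
    // A batch job that only reads files it created itself.
    let batch_job = ["a.png", "b.png", "a.png", "b.png"];
    // User-supplied names, some mistyped or hostile.
    let user_input = ["a.png", "Some", "b.png", "../../etc/passwd"];
    println!("batch job:  {:?}", run(&batch_job, &existing));
    println!("user input: {:?}", run(&user_input, &existing));
}
```

A profile taken from the batch job would say the error branch is never taken, which is the wrong hint for the second workload.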
           | pca006132 wrote:
           | I think you can already build shared libraries with PGO,
           | although this doesn't really work with header only libraries
           | for C++...
           | darksaints wrote:
           | I think a cool project to work on would be model-based,
           | ML-generated profiles that take a set of parameters like:
           | * application type (e.g. client, server, batch process,
           |   parser, etc.)
           | * target architecture, vendor, model, etc.
           | * target resources like RAM, HD types, network interfaces,
           |   etc.
           | I would think you could get very close to an actual PGO level
           | of performance with just a handful of parameters and a lot of
           | data.
           | branko_d wrote:
           | Perhaps a better approach would be some sort of per-library
           | profile?
             | londons_explore wrote:
             | If you can make it be zero effort for developers, that's a
             | good plan... But if it involves even a minor effort from
             | the developer, then most developers probably won't bother.
             | I'm imagining for example a 'profile server', which anyone
             | can upload profiler data to, and that the compiler queries
             | to get profile data for any given file it wants to compile.
           | pjmlp wrote:
           | That is the beauty of modern JITs with feedback PGO data: it
           | can be saved across execution sessions, and with time the data
           | will grow towards an optimal data point.
           | bruce343434 wrote:
           | > I'd like compiler writers to embed a 'default profile' into
           | the compiler, which uses data from as much opensource code as
           | they can find all over github etc.
           | What would be the point? The whole thing about PGO is that it
           | measures which paths of _your_ code are "hot".
             | tsavola wrote:
             | Consider error handling paths.
               | tialaramex wrote:
               | Rust will already end up optimising out the error
               | handling that _can't_ happen because Infallible is an
               | Empty Type (it makes no sense to emit code for an Empty
               | Type because no values of this type can exist, so during
               | monomorphization this code evaporates)
               | (e.g. trying to convert a 16-bit unsigned integer into a
               | 32-bit signed integer can't fail, that always works so
               | its error type is Infallible, whereas trying to convert a
               | 32-bit _signed_ integer into an unsigned one clearly
               | fails for some values, that's a
               | core::num::TryFromIntError you need to handle)
               | So we're left only with errors which _don't_ happen. But
               | who says? On my workload maybe the profile image file
               | doesn't exist 0% of the time since I'm actually making
               | the image files, so of course they exist, but in _your_
               | workload the user gets to specify the filename and so
               | they type it wrong about 0.1% of the time, and in
               | somebody else's workload the hostile adversary spews
               | nonsense filename values like "../../../../../etc/passwd"
               | to try to exploit bugs in some PHP code from 15 years
               | ago, so they see almost 10% errors. What would we learn
               | from a "general profile"? Nothing useful.
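The two conversions described above can be checked directly against the standard library. A small sketch: the function names `widen` and `to_unsigned` are made up for illustration, while `Infallible`, `TryFromIntError`, and `try_from` are the real std items.

```rust
use std::convert::{Infallible, TryFrom};
use std::num::TryFromIntError;

/// u16 -> i32 always fits, so the `TryFrom` error type is `Infallible`.
fn widen(x: u16) -> Result<i32, Infallible> {
    i32::try_from(x)
}

/// i32 -> u32 fails for negative values, so the error is `TryFromIntError`.
fn to_unsigned(x: i32) -> Result<u32, TryFromIntError> {
    u32::try_from(x)
}

fn main() {
    // `Infallible` has no values, so an empty match covers the `Err` arm.
    let wide = match widen(65_535) {
        Ok(v) => v,
        Err(never) => match never {},
    };
    println!("{wide}");

    // This error is a real runtime possibility and has to be handled.
    match to_unsigned(-1) {
        Ok(v) => println!("{v}"),
        Err(e) => println!("conversion failed: {e}"),
    }
}
```

The empty `match never {}` compiles because the type has no variants, which is the property that lets the error path be dropped entirely.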
               | hinkley wrote:
               | Or a perennial favorite of mine:
               | $ process Some Image Name.png
               | Could not find file "Some"
               | $ process "Some Image Name.png"
               | Done.
               | a1369209993 wrote:
               | > Some Image Name.png
               | ... Urg, _that_.
               | If I ever implement a bespoke file system format, it is
               | going to be encoding-level impossible to represent file
               | names with spaces. Not FAT-style[0] "the spec says to
               | replace that with an underscore" or something, but more
               | "the on-disk character encoding does not contain any
               | sequence of bits that represents space".
               | 0: (non-ex-)FAT stores filenames in all caps, but the
               | data on disk is ASCII, so you can just write lowercase
               | letters in the physical directory entries. (I've seen at
               | least one FAT implementation that actually uses that to
               | 'support' lowercase filenames.)
               | dhosek wrote:
               | Meh, I remember the move from DOS 3.3 to ProDOS back in
               | the Apple //e days and the loss of spaces in filenames
               | was something that seemed a regression to me.
               | dhosek wrote:
               | I'd rather see a ban on non-Unicode strings as file
               | paths. ^&*# Windows.
               | hinkley wrote:
               | And all this time I've been blaming Windows for bringing
               | us white spaces in file names.
             | londons_explore wrote:
             | Lots of _your_ code is library code that everybody uses...
             | And lots of your code has similar hot paths to everyone
             | else's code. It turns out that `for x in pixels { }` is
             | probably going to be a hot loop... But `for x in
             | serial_ports { }` probably isn't a hot loop...
           | tnh wrote:
           | Agree this difficulty is the biggest obstacle to PGO's
           | success. A language/ecosystem that works out how to integrate
           | this as smoothly as testing would have a sizeable performance
           | boost in practice.
           | The default profile is a nice hack. We do this by default for
           | C++ builds at [company], it works great. Teams that care can
           | build a custom profile which performs better, but most don't.
           | > I'd like compiler writers to embed a 'default profile' into
           | the compiler, which uses data from as much opensource code as
           | they can find all over github etc.
           | Working out how to build, let alone profile all that code is
           | no joke. And the result will be large, and may not overlap
           | that much with the average program. As a sibling points
           | out, maybe using ML to recognize patterns instead of concrete
           | code would help?
           | I'd settle for profiling of the standard library. In an
           | ecosystem like Rust, per-crate default profiles that you
           | could stitch together would be amazing.
       | dijit wrote:
       | What is the common consensus on benchmarking test suites in Rust?
       | From what I understood, Criterion was the gold standard; there
       | is also a built-in benchmark harness, but that is only supported
       | on nightly.
       | What's the difference?
         | dochtman wrote:
         | There's the bencher crate as well, which provides a similar API
         | to nightly through macros that work on stable. On one of the
         | projects I maintain we reverted from criterion to bencher
         | because the criterion results sometimes made no sense.
         | throwup wrote:
         | Criterion is still the gold standard.
         | Pros for Criterion over the stdlib:
         | https://github.com/bheisler/criterion.rs#features
         | Downsides of Criterion:
         | https://bheisler.github.io/criterion.rs/book/user_guide/know...
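For context on what any of these harnesses do at their core, here is a deliberately naive, std-only sketch. It is not the API of Criterion, bencher, or the nightly `test` crate; the `bench` and `sum_to` functions are hypothetical. Criterion adds statistical analysis on top of this basic loop, while this sketch only reports a median.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `warmup` times untimed, then times `samples` runs and
/// returns the median duration.
fn bench<T>(warmup: usize, samples: usize, mut f: impl FnMut() -> T) -> Duration {
    assert!(samples > 0, "need at least one sample");
    for _ in 0..warmup {
        black_box(f());
    }
    let mut times: Vec<Duration> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            // `black_box` discourages the optimizer from deleting the work.
            black_box(f());
            start.elapsed()
        })
        .collect();
    times.sort();
    times[times.len() / 2]
}

/// Toy function under test.
fn sum_to(n: u64) -> u64 {
    (0..=n).sum()
}

fn main() {
    let median = bench(10, 101, || sum_to(black_box(1_000_000)));
    println!("median: {median:?}");
}
```

A single median hides noise from the OS scheduler and CPU frequency changes, which is one reason results from simple harnesses can look inconsistent between runs.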
           | dijit wrote:
           | Thanks throwup!
       | superjan wrote:
       | TLDR: Profile-guided optimization was not supported on Windows.
       | That has been enabled now. So this only helps you if you are
       | compiling on Windows and want to go the extra mile of running PGO
       | builds.
         | itamarst wrote:
         | They're shipping PGO builds of the Rust compiler, so for faster
         | compilation you don't have to do anything ("Windows builds now
         | use profile-guided optimization, providing 10-20% improvements
         | to compiler performance", per Rust's release notes:
         | https://github.com/rust-lang/rust/blob/master/RELEASES.md).
         | bogeholm wrote:
         | After enabling PGO, that was used to compile `rustc` itself.
         | So compiling Rust code on Windows is faster with 1.64 than
         | 1.63.
       | mastax wrote:
       | Tldr: PGO
       | https://github.com/rust-lang/rust/pull/96978/
       | hinoki wrote:
       | I thought PGO instrumentation works on basic blocks, and the
       | inlining, outlining, and register allocation optimisations are
       | all done on llvm's IR. So everything can happen in the backend.
       | What sort of work is OS specific, or language specific?
       | I've used PGO before, but I'm not familiar with the details.
         | Someone wrote:
         | Computing the profile wasn't possible on Windows.
         | FTA: _But there is one problem: PGO was up until now available
         | only on Linux._
         | I think they couldn't use a profile generated on Linux because
         | of differences in ABI and standard library.
         | I also think generating a profile is OS dependent because you
         | want it to not have much of a performance impact.
       | eventhorizonpl wrote:
       | It's great to see every speed up :)
         | aliqot wrote:
         | I think your comments are being suppressed, they're all removed
         | after you post. You might check on that.
           | nalllar wrote:
           | https://news.ycombinator.com/newsfaq.html
           | Look at the [dead] section
             | aliqot wrote:
             | A single flagged comment and we've declared someone
             | irredeemable. Amazing.
             | https://news.ycombinator.com/item?id=31738035
               | xyzzy123 wrote:
               | Look closer.
               | sp332 wrote:
               | I don't think that's right, because there are 6 non-dead
               | comments after that one. But very new accounts are likely
               | to be banned after a single flag, to avoid ban evasion.
               | aliqot wrote:
               | Those have been vouched for
       | jeff-davis wrote:
       | For databases, I'm reluctant to rely on PGO (profile-guided
       | optimization) because the workloads are so varied. There's a risk
       | of over-fitting to the profiled workloads at the expense of
       | others.
       | Though there may be a lot of opportunity with some database
       | _subsystems_ that have a more consistent usage pattern.
       | Edit: also, PGO is closely related to JIT techniques, which are
       | based on current runtime information rather than profiles
       | generated a long time ago on a workload that may or may not be
       | representative.
         | vlovich123 wrote:
         | I think in practice enabling PGO will be a net gain even if
         | it's suboptimal on some workloads (ie you should still see some
         | performance gain across the board even if the specific workload
         | isn't profiled). The reason is that it's using the profiles to
         | make decisions in lieu of heuristics which should be a win even
         | for non profiled workloads because heuristics are essentially
         | just general case profiles (ie tuned to a bunch of OSS software
         | out there). I'm unaware of any research showing PGO being worse
         | than not doing it even if your profile isn't the workload
         | (you'd probably have to try to specially build such a situation,
         | and it's unlikely to come up in practice).
         | Have you actually seen otherwise?
           | Filligree wrote:
           | Throughput isn't everything, and improving OPS at the
           | expense of tail latency can be a problem. Depends on your
           | specific workload, but it isn't something I'd enable by
           | default.
         | summerlight wrote:
         | PGO will still likely improve general performance even with
         | biased workloads because in many cases we want compilers to
         | focus on optimizing happy paths, but this is not always well
         | executed even when it's pretty trivial for human eyes.
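Without a profile, the happy-path information has to be supplied by hand. The sketch below uses Rust's `#[cold]` attribute to mark an error path as unlikely. The `parse_digits` and `report_bad_byte` functions are hypothetical, and whether the hint changes the generated code depends on the compiler version and optimization level.

```rust
/// Rarely taken error path; `#[cold]` hints that calls to it are unlikely.
#[cold]
#[inline(never)]
fn report_bad_byte(pos: usize, byte: u8) -> String {
    format!("unexpected byte {byte:#04x} at offset {pos}")
}

/// Parses ASCII decimal digits. Overflow wraps silently to keep the
/// sketch short; a real parser would report it.
fn parse_digits(input: &[u8]) -> Result<u64, String> {
    let mut value: u64 = 0;
    for (pos, &byte) in input.iter().enumerate() {
        if !byte.is_ascii_digit() {
            // Cold path: formatting the message lives in a separate function.
            return Err(report_bad_byte(pos, byte));
        }
        // Happy path: accumulate the next digit.
        value = value.wrapping_mul(10).wrapping_add(u64::from(byte - b'0'));
    }
    Ok(value)
}

fn main() {
    println!("{:?}", parse_digits(b"2022"));
    println!("{:?}", parse_digits(b"20x2"));
}
```

A profile makes the same kind of statement, but from measured counts rather than from the programmer's guess.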
       | sorz wrote:
       | Is it possible that the profile is over-fitting to the benchmark
       | tests?
         | varajelle wrote:
         | Yes that's likely. But the idea is that even if that's the
         | case, it is still better than no PGO.
         | Edit: I'd like to add that if the 10-20% mentioned is measured
         | on the benchmark that was used to do the pgo, then that figure
         | might indeed not be representative of the real gain.
         | Tuna-Fish wrote:
         | Their main benchmark test is compiling every publicly released
         | crate on crates.io. This is also their main regression test.
         | If you manage to overfit against that, it's still probably an
         | amazing general purpose solution.
       | tyingq wrote:
       | https://archive.ph/5nvje
       | unnouinceput wrote:
         | xeonmc wrote:
         | Oh, I just assumed that the article was written by a French
         | person and shrugged.
       | DeathArrow wrote:
         | evilduck wrote:
         | Why does this flame bait have to get squeezed in everywhere?
         | Thaxll wrote:
         | There is no viable alternative to Electron that is truly
         | multi-platform.
           | simplotek wrote:
           | > There is no viable alternative to Electron that is truly
           | multi-platform.
           | Qt?
             | pixl97 wrote:
             | They did say viable.
               | simplotek wrote:
               | Since when are the likes of Electron more viable than Qt?
               | pixl97 wrote:
               | Since there are at least an order of magnitude more
               | programmers that are going to write JS over Qt.
         | fuzzy2 wrote:
         | I'm all for native UIs, but if you _must_ go cross-platform,
         | it's often not worth it. Why bother with the specifics of
         | Windows, Linux, macOS, Android, iOS. Just build one mediocre UI
         | to rule them all.
         | Electron does not have to be slow, just how web applications in
         | general do not have to be slow, multi-megabyte monstrosities.
         | I bet you would be surprised if you knew where even
         | specialized, sort-of-embedded UI is moving. Hint: it's not
         | native.
         | pas wrote:
         | Nowadays it's possible to opt for a best of both worlds
         | situation: https://tauri.app/ or
         | https://github.com/sciter-sdk/rust-sciter
           | Thaxll wrote:
           | Electron uses chrome everywhere, Tauri uses the system
           | webview which is very different and vastly inferior.
             | zeta0134 wrote:
             | This really depends on your definition of inferior; I'd
             | consider the lower resource usage a good tradeoff for fewer
             | bleeding edge features on most days
         | geodel wrote:
         | On the third hand are people who use VS Code to write Rust code
         | and _claim_ tools based on Electron are the way to go.
         | simplotek wrote:
         | > On the other hand there are guys who don't waste much time
         | and pick Electron for their app, for a 1000% performance
         | degradation.
         | Do hypothetical 1000% performance gains matter if perceived
         | performance is already within acceptable limits?
         | Wasting time gold plating solutions with no meaningful tradeoff
         | is a negative trait, not a positive one.
           | pixl97 wrote:
           | Individually or in bulk?
           | For an individual, no.
           | For the gigawatt of power you just wasted nationwide, yes.
         | golergka wrote:
         | Being able to use the same code on all three major desktop
         | platforms as well as the web is worth it many times over.
       (page generated 2022-10-23 23:01 UTC)