[HN Gopher] PartialExecuter: Reducing WebAssembly size by explor...
       ___________________________________________________________________
        
       PartialExecuter: Reducing WebAssembly size by exploring all
       executions in LLVM
        
        WebAssembly is commonly used as part of web applications, and
        minimizing its size is especially important.  As part of the
        latest release of Cheerp, our C++ to WebAssembly/JavaScript
        compiler, we have introduced a powerful new LLVM optimization
        that aggressively reduces WebAssembly output size at compile
        time.  We have named this optimization 'PartialExecuter'; the
        key idea behind it is taking advantage of known function
        parameters to find inner code blocks that can never be
        executed.  Such blocks can then be completely removed from the
        compiled output, significantly reducing its size.  What makes
        this pass more powerful than typical Dead Code Elimination is
        its ability to reason over all the possible executions that
        the code can take, while being robust to memory stores and
        side-effects.  Moreover, PartialExecuter can even reason over
        loads, as long as they refer to read-only memory.  This latter
        capability is especially useful for dropping code from complex
        functions whose behavior depends on input strings (e.g.
        printf).  We think this work may be of interest to the HN
        community, and we welcome feedback and questions.  In-depth
        blog post:
       https://leaningtech.com/reducing-webassembly-size-by-explori...
        
       Author : apignotti
       Score  : 212 points
       Date   : 2022-03-15 15:57 UTC (7 hours ago)
        
 (HTM) web link (leaningtech.com)
 (TXT) w3m dump (leaningtech.com)
        
       | titzer wrote:
       | Nice work. I love simple and "conservative brute force"
       | techniques like this--brute force in that you run all the example
       | call sites in the interpreter, conservative in that it bails out
       | when things get hard (e.g. bail out if a single basic block is
       | executed too many times).
       | 
       | I am wondering if this technique could be hybridized with
       | abstract interpretation or partial evaluation, which are known
       | techniques in compilers. In essence, this would become a type of
       | concolic execution.
        
         | carlopi wrote:
          | The step forward that I believe worked well here is that,
          | given some actual paths (which, being simply sequences of
          | Instructions, can be fed to an interpreter), they are
          | merged or forked while entering or exiting SCCs. This
          | keeps the number of actual visits low while exploring a
          | set of paths large enough to prove that all possible
          | executions are covered.
         | 
          | On the second part, I have to do some thinking and come
          | back, thanks!
        
       | jjice wrote:
       | Optimizations like this are incredible to me. In my compiler
       | class in college, I was smitten by how different optimizations
       | were performed, and we only ever worked with pretty basic
       | constant propagation and dead code elimination. The work here is
       | incredible. My respect to the entire team.
        
         | carlopi wrote:
         | > My respect to the entire team.
         | 
         | Same!
        
       | densh wrote:
        | Thanks for sharing, we ended up designing something very
        | similar in Scala Native [1]. The use case we had for
        | exploring all possible code paths was to reduce the
        | overhead of virtual call dispatch (since in practice
        | virtual calls very often have very few targets), which is
        | extremely dominant and prohibitive in JVM languages unless
        | optimized away. I hope to see your work upstream in LLVM.
       | 
       | [1]: https://scala-
       | native.readthedocs.io/en/latest/blog/interflow...
        
         | carlopi wrote:
          | Devirtualization is always plenty of fun + has lots of
          | potential. Added to the list of articles to read.
        
       | turminal wrote:
        | Are these new techniques only applicable to WebAssembly? If
        | so, why?
        
         | eklitzke wrote:
         | GCC and Clang aren't super aggressive with DCE (dead code
         | elimination) because most of the time the complex cases for DCE
         | have essentially no impact on run time performance, and may be
         | expensive to implement (i.e. significantly increase compile
         | times).
         | 
          | For example, suppose you have a library that is configured
          | with some options struct that contains a bunch of
          | flags/settings. The behavior of the library changes based
          | on these flags. The compiler will see a bunch of branches
          | like "if (opts.foo) { ... }" and will generate code for
          | all of these branches. Now let's say you statically
          | compile this library, and in practice your code only ever
          | has one set of options enabled. In principle the compiler
          | could figure out which branches can be eliminated based on
          | the single instantiation of the opts struct in your code,
          | and eliminate dead branches. But in practice neither Clang
          | nor GCC will actually do this kind of DCE even at -O3
          | because it's simply not worth it. By the way, this kind of
          | example is exactly the kind of DCE that the blog post is
          | talking about and could be removed by the new DCE pass
          | implemented by the author.
         | 
         | How big of an impact would this kind of DCE make on
         | performance? Well the compiled binary size will be a bit
         | smaller, which is kind of nice. But in practice this will have
         | almost no impact on performance. Loading and mapping an ELF
         | file is practically instantaneous even on huge executables. The
         | branch predictor will predict all of the options branches that
         | are hit repeatedly at close to 100%. Eliminating the branch
         | entirely is in theory better than having a branch with a 100%
         | hit rate, but hard to demonstrate in real world benchmarks for
         | all but the most critical code paths. If there are large pieces
         | of code in the executable that are unused they'll be mapped but
         | won't even be page faulted during program execution. There are
         | some kind of hand wavy arguments you can make about the extra
         | code wasting space in the icache but again it would probably be
         | difficult to actually demonstrate the impact even in
         | microbenchmarks.
         | 
          | This isn't to say that there are _no_ benefits to more
          | expensive DCE passes. But they're generally extremely
          | meager,
         | so it's not worth increasing compile times for most
         | applications. Wasm is an exception because compiled assets need
         | to be transferred over the network and apparently it takes
         | longer to load wasm code than it does to map an ELF executable.
         | It's also worth noting that Clang and GCC do a lot of other
         | types of simpler DCE, and these simpler DCE passes can be
         | critical for performance, so I'm not trying to suggest that DCE
         | entirely is worthless; just that the type of DCE presented here
         | is less useful for traditional compilation.
        
         | carlopi wrote:
          | WebAssembly implies static linking (malloc / printf & all
          | are part of the shipped module) and code size matters
          | since it directly affects users (there might be delays in
          | downloading big payloads). Both factors played a role in
          | deciding to put work into optimizations like this.
          | 
          | There are some general gains to be had with this
          | optimization, and probably the same ideas have been around
          | for a while, but putting them together in this way was
          | helped by thinking about WebAssembly-specific constraints.
        
           | sroussey wrote:
           | Code size always matters, and while wasm may be an extreme
           | case, hopefully this kind of benefit can contribute to
           | shrinking other kinds of code targets.
        
           | pjmlp wrote:
           | Not necessarily, shared libraries are on the WebAssembly
           | roadmap.
        
             | carlopi wrote:
             | Yes, I wanted to be more nuanced but oversimplified. Thanks
             | for mentioning this!
        
         | benoitberaud wrote:
         | At least, I hope these could be applied to other VMs like the
         | JVM. I don't see any reason why it can't be done, but I'm not
         | an expert at all.
         | 
         | From my understanding, the main issue is mostly that it is very
         | complex / risky to do these optimizations, and the reward is
         | deemed low for the JVM.
         | 
          | The main JVM target I can think of where it could probably
          | be profitable is the Android subsystem, but why would you
          | care to reduce OS / app size when phones getting bloated
          | is very good for your business? Since you also sell
          | phones, it's quite useful to push users to renew their
          | phones on a regular basis.
        
       | Hercuros wrote:
       | How much does this optimization save in terms of executable size
       | on realistic examples (i.e. a full-sized code base)?
       | 
       | I would naively expect that for most functions you know too
       | little about their arguments or the state in which they will be
       | executed to be able to tell that certain branches will definitely
       | not be taken. Especially since any function that has some mildly
       | interesting side-effects (like loading and storing from a pointer
       | or a class/struct field, or even _calling into any function that
       | does that_) would seem to foil the analysis and turn large chunks
       | of code into "unknown reachability".
       | 
       | Rephrasing, I would expect the set of basic blocks that can be
       | removed just by essentially repeated constant propagation of
       | statically-known function arguments to be pretty small. But I
       | would be happy to be proved wrong if that intuition is not right.
       | 
       | Printf is somewhat of an exception, since the format strings are
       | usually known at compile time, meaning that a lot of the control
       | flow can indeed be deduced by compile-time evaluation, but for
       | most functions I would not expect that.
        
       | gavinray wrote:
       | This company has an x86-to-WASM compiler that lets you execute
       | arbitrary binaries in the browser.
       | 
       | Also a JVM to WASM transpiler.
       | 
       | There's a fantastic Meetup presentation given by one of them
       | where they show running a C++ multiplayer game with both client
       | AND server running in a browser, using WebRTC as a networking
       | polyfill. Really mindblowing:
       | 
       | https://youtu.be/7JUs4c99-mo?t=167
        
         | zeusk wrote:
         | At what point does the browser become an "os", what's next?
         | Chrome hypervisor?
        
           | tentacleuno wrote:
            | Not sure what you mean by hypervisor, but Microsoft
            | _does_ have... Microsoft Defender Application Guard, I
            | believe it's called. It's a totally sandboxed version
            | of Edge.
        
           | runeks wrote:
           | > At what point does the browser become an "os", what's next?
           | 
           | When it no longer requires an OS to run.
        
           | dmitrygr wrote:
           | It happened about a decade ago.
        
           | sva_ wrote:
           | I'd be much more willing to execute some binary in my browser
           | than directly on my OS. Sure, you could do a VM or so, but a
           | browser tab is spun up much more quickly.
           | 
           | The absence of friction of just executing something quickly,
           | and being able to share it with a single text string, is what
           | makes this appealing to people, I reckon. I dunno why people
           | seemingly get offended by this.
        
         | fsiefken wrote:
         | What would the impact be on battery life compared to native
         | execution? Can one get it just as power efficient as native
         | execution, only slower? For computationally less intensive
         | applications browser portability might be an excellent
         | tradeoff.
         | 
         | It gives a new meaning to the slogan "write once, run
         | everywhere".
        
           | eklitzke wrote:
           | Generally things that run faster are more power efficient,
           | because the CPU has to run for less time to do the same work.
           | There might be a few odd exceptions here and there, e.g.
           | certain AVX instructions can use a lot of power, but they're
           | not common.
        
             | mhh__ wrote:
             | Those AVX instructions use more power at least in part
             | because they massively better utilize the silicon, so
             | you're making a tradeoff between speed and efficiency as
             | per usual.
        
               | deathanatos wrote:
                | Sorry, but to be pedantic for a second: do they
                | consume more _energy_, which is really what I care
                | about for battery life? (Let's assume I'm committed
                | to asking the machine to do whatever work it is
                | that I asked it to do, so the required computation
                | is constant, for discussion.) My understanding of
                | AVX is that it consumes more _power_, yes -- but
                | the point the parent makes is a good one: if it
                | finishes faster, where does the energy use come
                | out? More power over less time can mean less
                | energy. (Or more. Depends on the additional power
                | vs. the time saved...)
        
         | mysterydip wrote:
         | > running a C++ multiplayer game with both client AND server
         | running in a browser, using WebRTC as a networking polyfill
         | 
         | Ok, _now_ you have my attention. Been waiting for that
         | possibility for a while!
        
           | yuri91 wrote:
           | If you are interested, here is the article I wrote about that
           | particular project: https://medium.com/leaningtech/porting-a-
           | c-multiplayer-game-...
        
         | unnouinceput wrote:
         | This sounds like ActiveX. Security issues related to ActiveX
         | are solved or not? Because if they are not, then it's just a
         | reskin of ActiveX and malware authors gonna have a field day.
        
           | apignotti wrote:
           | There is a very big difference: ActiveX was native code that
           | was literally running on your system with full access to
           | system calls.
           | 
           | CheerpX is a Virtual Machine environment, it JIT compiles
           | Wasm code from x86 binaries and it is fully sandboxed by the
           | browser. It _cannot_ access your system even if it tried.
        
             | runeks wrote:
             | > It _cannot_ access your system even if it tried.
             | 
             | That's the intent, at least. I'm sure we'll get to see
             | exploits that manage to do exactly this.
        
               | comex wrote:
               | Perhaps, but if so, that would be a bug in the browser's
               | WebAssembly implementation, not CheerpX.
        
               | jfoutz wrote:
                | CheerpX looks amazing. Not blaming that project in
                | any way.
                | 
                | But rowhammer is still a thing.
                | 
                | There's a whole stack of abstractions that all _may
                | be_ vulnerable. I'm sure CheerpX is very good, but
                | there's no way to _know_ that all the dependencies,
                | from the toolchain used to build all the way down
                | to the running environment, are actually bug free.
                | 
                | As a first line of investigation, I'd suspect
                | CheerpX, just because so many eyes look at browser
                | sandboxes. _shrug_ Your mileage may vary.
        
               | saagarjha wrote:
               | You can Rowhammer from JavaScript already, so this really
               | has nothing to do with CheerpX.
        
               | jfoutz wrote:
               | Very true. Let me try to restate.
               | 
               | I don't know how to prove the absence of a thing. I can
               | only prove existence.
               | 
               | I was trying to highlight that every layer of abstraction
               | has vulnerabilities all the way down to the hardware
               | level.
               | 
               | I'm perfectly willing to accept that CheerpX has no known
               | vulnerabilities.
        
             | pjmlp wrote:
             | It can still be exploited the Applets/Flash way, by forcing
             | the internal memory to become corrupted and with it change
             | its behaviour.
        
               | sfink wrote:
               | Right, you can corrupt memory and thereby alter behavior,
               | but that memory is a managed array. You can't corrupt
               | anything outside of the application's own memory. As you
               | say, you can change behavior, and thereby do whatever you
               | want with whatever the wasm application is allowed to
               | access. Which is a very limited set of things.
               | 
               | Likening it to Java applets or Flash is deceptive -- yes,
               | you can still hack them and exploit vulnerabilities. But
               | the scope of what you can do with such vulnerabilities is
               | wildly, dramatically different. Even when sandboxed,
               | Flash has an enormously wider attack surface to play
               | with. WebAssembly has barely anything.
               | 
               | It's like the difference between patching a leak in your
               | roof with a sponge vs tar paper. In theory, water _could_
               | find a path through the tar paper.
        
               | pjmlp wrote:
               | Imagine a WASM module used to control security
               | authentication in the browser, or controlling IoT devices
               | in a factory, now that it is fashionable to run WASM
               | outside of the browser.
        
               | sfink wrote:
               | Right, I agree that wasm being used to control nuclear
               | launches is not fundamentally better than native
               | sandboxed C++ code in terms of preventing unwanted
               | nuclear launches.
               | 
               | But wasm being used to control the brightness pattern of
               | a blinky light on the console of the machine that
               | controls nuclear launches? That _is_ fundamentally safer
               | than native sandboxed C++ code being used to control the
               | blinky light.
               | 
               | I'm guessing we don't actually disagree on anything here
               | -- I also feel like people are making unwarranted
               | assumptions that wasm gives you more safety than it
               | actually does. (It reminds me of another incorrect
               | assumption that seems to get made a lot, that running
               | unsafe code in a VM means you don't need to worry that
               | it'll escape to the host or other VMs on that host.)
        
               | nynx wrote:
               | It literally cannot.
        
               | pjmlp wrote:
               | I advise some courses in pentesting.
        
               | codr7 wrote:
               | Assuming the VM can't be exploited/escaped somehow, which
               | would be a first.
        
               | MikeHolman wrote:
               | Wasm has the same security risk as executing JavaScript
               | in your browser, except with less risk of XSS type
               | security issues because wasm modules are better
               | encapsulated.
        
           | pjmlp wrote:
           | That is the thing with fake security advertising of
           | WebAssembly.
           | 
           | You can just as easily compile heartbleed into WebAssembly
           | and call it a day.
           | 
           | Yes it is sandboxed, but hardly any different than running an
           | OS process with sandboxing lock down regarding which OS
           | syscalls are allowed.
           | 
              | Think of it this way: just because an OS container
              | cannot exploit its host (minus hypervisor bugs), that
              | doesn't mean that what is inside isn't subject to
              | security flaws.
        
             | the_duke wrote:
             | You keep repeating this, but it's simply not true.
             | 
              | The syscall surface is zero by default. WASI of
              | course expands that by a lot, but sane WASI
              | implementations require restrictive whitelisting of
              | available resources like specific files or
              | directories that can be accessed (wasmtime-wasi does
              | this well, for example). The whole WASI spec is
              | aiming for capability-based security, including
              | things like only providing access to pre-opened
              | sockets.
             | 
             | All production deployments of server side Wasm I have seen
             | have very tight scoping.
             | 
             | Sandboxing on Linux, in contrast, is a complicated mess,
             | with a myriad of partial and leaky solutions (SELinux,
             | Apparmor, Docker, Snap, Flatpak, ...) .
             | 
             | Having a good cross platform toolkit is completely out of
             | reach.
             | 
             | The only OS that has comparable sandboxing that is
             | practical is Fuchsia.
        
               | nix0n wrote:
               | Is this true of server side only, or is there a practical
               | solution for locking down JS within the browser?
        
               | the_duke wrote:
               | Wasm has no access to any browser APIs by default, so
               | yes.
               | 
               | You can expose the required functionality via JavaScript
               | bridges.
               | 
                | The current Rust (wasm-bindgen) and C/C++
                | (emscripten) tooling exposes the whole browser API
                | by default though, which isn't ideal.
        
               | pjmlp wrote:
                | And will keep repeating ad infinitum given the fake
                | security sales pitch of WebAssembly; for God's
                | sake, even the standard has a security section
                | mentioning possible issues not addressed.
                | 
                | There is no magic in WASM, unless the only thing it
                | does is warming up a CPU.
        
               | the_duke wrote:
               | I'm happy to have a technical discussion, but your
               | comments always stop at "Wasm bad", without countering
               | the arguments.
        
               | kaba0 wrote:
               | I would be interested in the previously mentioned claims
               | like "wasm can't protect against rowhammer" --- while
               | perhaps not an easily exploited vulnerability, and likely
               | every current sandbox tech is vulnerable, is there
               | anything specific to wasm that would help here?
               | 
               | Also, how does wasm relate to traditional sandboxes that
               | "just" hijack system calls? Is there a difference?
        
           | jchw wrote:
           | HN is doing its typical "well, technically," thing in the
           | replies here, so I'll give a better answer: Yes, the problems
           | that plagued ActiveX are gone now. WebAssembly hosts still
           | have vulnerabilities from time to time, but they are not
           | literally by-design like they were with ActiveX. They're
           | typically complicated and to do similar things to what
           | ActiveX modules could do _completely ordinarily_ would
           | require breaking many sandbox layers.
           | 
           | The discussion also is breaking into an unrelated branch
           | about security _within_ WebAssembly, so it's worth pointing
           | out that the security model of WebAssembly is that it should
           | prevent the host machine from being attacked by code running
           | in the sandbox, not that it makes code running in the sandbox
           | any more secure. It doesn't make code in the sandbox not
           | susceptible to most classes of errors, it just isolates those
           | errors to runtime errors and not exploitable bugs that can
           | escalate privilege.
           | 
           | ActiveX, of course, did not deal with either security model,
           | instead shrugging it off and just assuming that code signing
           | would be a good enough deterrent to malware.
           | 
            | (Of course, the problems with this are manifold,
            | including the fact that just because a module is
            | trustworthy and not malicious does not mean it should
            | be exposed to all users, and the fact that even if you
            | _trust_ the company behind the module, that doesn't
            | make the module any more guaranteed to be secure.)
        
         | apignotti wrote:
         | > This company has an x86-to-WASM compiler that lets you
         | execute arbitrary binaries in the browser.
         | 
         | Direct link to our latest demo in case anybody would like to
         | see this tech in action: https://webvm.io
        
       | api wrote:
       | Please mainstream this in general. It would save a decent amount
       | of RAM if it became a standard part of LLVM and would probably
       | improve overall performance due to cache effects. Dead code
       | sitting in RAM would be bad for cache locality.
        
         | mhh__ wrote:
          | It could save RAM, but presumably if applied wrong you
          | could waste RAM by having multiple copies of the code. So
          | you'd probably want to profile-guide this.
        
           | api wrote:
            | Dead code removal and deduplication are orthogonal,
            | aren't they? You can already turn off
            | verbosity-increasing things with -Os etc.
        
       | jhgb wrote:
       | > We have named this optimization 'PartialExecuter', the key idea
       | behind it being taking advantage of known function parameters to
       | find inner code blocks that cannot ever be possibly executed.
       | 
       | I'm wondering, what exactly are the differences of this approach
       | from "classical" partial evaluation? This seems to be a special
       | case of it, unless I missed something.
        
       | rowanG077 wrote:
        | Have you guys looked at the literature on supercompilation?
        | This seems to be a special case of it.
        
         | fulafel wrote:
          | All analytical solutions to optimization problems can be
          | seen as special cases of brute force search, I guess. But
          | SC is impractically slow for anything except sequences a
          | few instructions long.
         | 
         | edit: actually I was thinking of superoptimizers, I guess it's
         | a different concept.
        
           | rowanG077 wrote:
           | Supercompilation is not brute force search.
        
           | titzer wrote:
           | Superoptimizers have gotten a _lot_ better in the past few
           | years. E.g. have a look at [1].
           | 
           | [1] https://arxiv.org/abs/1211.0557
        
         | moralestapia wrote:
         | Supercompilation came to my mind as well!
         | 
         | Wasm + SC would be a killer deal.
        
         | carlopi wrote:
         | Hi, author of the post here, I am not familiar with this
         | concept, do you have any pointers?
        
           | rowanG077 wrote:
           | The idea is to evaluate the program at compile time and
           | obtain a trace of the program that allows you to reconstruct
           | a semantically equivalent program which has some desirable
           | properties. In your case (and most cases in literature) it's
           | done for optimization. Partial evaluation, which is a subset
           | of supercompilation, is similar to what you are doing. See
           | the wiki page:
           | https://en.wikipedia.org/wiki/Partial_evaluation
           | 
           | Some literature:
           | 
           | https://dl.acm.org/doi/10.1145/5956.5957
           | 
           | https://ndmitchell.com/downloads/paper-
           | rethinking_supercompi...
        
       | sabr wrote:
       | Annotated article:
       | https://smort.io/5e621c00-3c95-4807-b0be-488947a34d8a
       | 
       | For the first image to render correctly, please change the theme
       | to light mode
        
         | mkl wrote:
         | So you highlighted a few sentences? What's the point, other
         | than advertising your project?
        
           | sabr wrote:
           | I made highlights as well as formatting changes that I found
           | insightful. Sharing it so that others can benefit from it too
           | :)
        
       | JoshTriplett wrote:
       | This looks great! Please consider upstreaming this into LLVM; it
       | would benefit many other users of LLVM. I'd love to see this used
       | in Rust, for instance.
       | 
       | What kind of compilation performance do you see for how long this
       | pass takes?
       | 
       | Do you apply this to all functions, or to all functions with
       | certain properties, or to functions tagged some particular way?
        
         | masklinn wrote:
         | > This looks great! Please consider upstreaming this into LLVM;
         | it would benefit many other users of LLVM. I'd love to see this
         | used in Rust, for instance.
         | 
         | Or in straight C++ for that matter.
         | 
          | Though possibly the optimisation pass relies on specific
          | properties of the _target_ which don't exist for
          | non-wasm? I didn't see anything in the writeup but that
          | doesn't mean they don't exist.
        
           | carlopi wrote:
           | PartialExecuter happens at the LLVM's IR level, and in theory
           | it's fully generic. Then has been only partially tested
           | outside the Cheerp-pipeline, so I would expect it to rely on
           | some implicit assumptions on the kind of legalizations or
           | lowering that are done before-hand.
           | 
           | Target-wise, all information is kept encoded in the IR, so
           | that part is easier (the only strange thing that might
           | happens is bumping some alignment to 8, but I expect every
           | target to be able to handle the prescribed IR alignemtn)
        
         | carlopi wrote:
         | Thanks a lot!
         | 
          | I also believe it would be cool to upstream this; we will
          | have to sit down and do some planning.
         | 
          | Compilation time could improve (I was actually working on
          | this today), but it's already in line with other
          | optimizations, taking < 10% of the time spent doing
          | optimizations on a big codebase we use as a benchmark.
          | 
          | Currently it's applied to all functions, since its runtime
          | is anyhow roughly linear in the number of instructions a
          | function has. But for more costly versions of this pass
          | (which we have on paper but have yet to implement), some
          | logic to filter functions in advance could be used.
        
           | saagarjha wrote:
          | My understanding is that CheerpX, by simulating a full
          | userspace, can optimize all the way through system libraries,
           | which are typically very general and have lots of code that
           | is "dead" to your application. How would this approach fail
           | for applications where the code tends to be more specific to
           | the task?
        
           | JoshTriplett wrote:
           | When upstreaming it, it might make sense to give it two
           | mechanisms: either process _every_ function, or process only
           | functions labeled to use it. Then, a frontend can experiment
           | with things like automatically detecting which functions
           | would benefit from it, using language-specific mechanisms.
           | Or, worst-case, frontends can provide attributes to tag
           | functions explicitly, and libraries can tag functions known
           | to benefit from this.
        
         | syrusakbary wrote:
          | I'd love it if this pass were submitted upstream as well, if
          | possible. I think a lot of ecosystems could benefit from the
          | awesome work you did!
        
       | richdougherty wrote:
       | I love all these techniques where you can find static info about
       | a program by 'running' it at compile time.
       | 
       | Normally you run a program at runtime (obviously) at which point
       | you have the full environment and inputs, so you can run the
       | program fully.
       | 
       | However... you can also kind of "run" a program at compile time.
       | You can do this using some known and some unknown values in the
       | source code, so you can "partially" run it.
       | 
       | https://en.wikipedia.org/wiki/Partial_evaluation
       | 
       | Or you can use abstract/pretend values instead of real values, so
       | you can "abstractly" interpret it.
       | 
       | https://en.wikipedia.org/wiki/Abstract_interpretation
       | 
       | Running at compile time lets you learn things about the program
       | at compile time, allowing advanced optimisations and error
       | checking.
       | 
       | You might realise that some code can never execute, so you can
       | remove it (as in the project above). Or you can learn that some
       | code is incorrect and give an error at compile time...
        
       | landr0id wrote:
       | This is sweet! This is actually a very similar approach to how I
       | deobfuscate Python bytecode:
       | https://github.com/landaire/unfuck/blob/bfa164b4e261deffeb37...
       | 
       | My code is pretty messy, but I take the same exact approach of
       | taking known function parameters, interpreting the instructions,
       | and removing any condition and the instructions which built its
       | arguments if it evaluates to a constant value. Even called it
       | partial execution as well :p
        
         | carlopi wrote:
          | Congrats, I'll check it out later & take inspiration!
        
       | hackcasual wrote:
       | How does this differ from existing LLVM interprocedural
       | optimizations? Particularly value propagation sounds like it
       | would handle a lot of these cases.
       | 
       | > You might have heard of dead-code elimination, an LLVM
       | optimization pass that removes code proven as unreachable. I was
       | actually interested in less-obvious situations. In particular
       | blocks that are reachable on the control flow graph, but not
       | when considering wider execution invariants.
       | 
       | I've got a trivial example here:
       | https://gcc.godbolt.org/z/7EnPG5WM6 (noinline is added to foo
       | to demonstrate that Clang is actually changing the number of
       | arguments foo takes, and has inlined the arguments from bar and
       | baz).
       | 
       | > Can we use information on the call-sites and the corresponding
       | format strings to prove that some code paths, for example
       | formatting for floating point numbers, are actually never taken?
       | 
       | Seems to already have that effect in Clang. Using Emscripten
       | 3.1.3, compiling two different main functions, one with just
       | 
       |     printf("Hello world %d\n", argc);
       | 
       | and one with
       | 
       |     printf("Hello world %d %f\n", argc, volatileFloatArg);
       | 
       | For the single int arg, the top 3 symbols by size are
       | 
       |     47.7%  2.65Ki  printf_core
       |      7.4%     421  [section Data]
       |      6.6%     375  main
       | 
       | And for the one that added a float arg,
       | 
       |     34.5%  3.04Ki  fmt_fp
       |     28.8%  2.54Ki  printf_core
       |      5.1%     464  [section Data]
       | 
       | So when a float argument wasn't passed into printf, it did not
       | include fmt_fp, a delegate that handles floating point for
       | printf.
       | 
       | One issue that can complicate this analysis is linking multiple
       | libraries that reference standard library (or functions in other
       | libraries). If you're linking together object code, without more
       | IR context, LLVM is going to have to be more conservative. So I
       | think if you link in a WASM object code file that references
       | printf, it won't be able to perform all the IPO that will allow
       | it to trim the CFG.
        
       | [deleted]
        
       | andrewbarba wrote:
       | Is this something that could be exposed as a generic CLI tool to
       | replace something like wasm-opt from Binaryen?
        
         | carlopi wrote:
          | A subset of this could possibly be applied at the wasm-opt
          | level, but consider that at the LLVM IR level there is more
          | information to be leveraged (in particular, PartialExecuter
          | uses information on what memory ranges are read-only).
         | 
          | In general, the whole concept behind Cheerp (a C++ to
          | WebAssembly + JavaScript compiler) is to do as much as
          | possible at the LLVM IR level, JavaScript output included,
          | since it's easier and more powerful to do code
          | transformations there.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-03-15 23:00 UTC)