[HN Gopher] Tell HN: We are trying to get tail calls into the We...
       ___________________________________________________________________
        
       Tell HN: We are trying to get tail calls into the WebAssembly
       standard
        
       WebAssembly is a modern bytecode supported by all browsers and
       designed to be a compiler target for a wide variety of programming
       languages.  To effectively support some forms of Functional
       Programming support for tail-calls has been proposed as an
       extension to the WebAssembly standard.  This proposal has reached
       Phase3 of the standardization process years ago, but has since
       stalled.  Phase3 is known as "the implementation phase" and the
       prerequisite for advancing the proposal to Phase4 is to have
       support in two different browser engines. V8/Chrome support has
       been available for a long time, so another engine is required.  To
       unblock this situation we have contributed full support for
       WebAssembly Tail Calls to JavaScript/WebKit/Safari. The PR is
       available here:  https://github.com/WebKit/WebKit/pull/2065  An in-
       depth article about the challenges of implementing this feature is
       also available. This is intended both as documentation for our
       contribution, but also as a general explainer about how tails calls
       actually work, with a particular focus on stack space management.
       https://leaningtech.com/fantastic-tail-calls-and-how-to-impl...
        
       Author : apignotti
       Score  : 404 points
       Date   : 2022-07-12 13:14 UTC (9 hours ago)
        
       | dannyobrien wrote:
       | Thanks for doing this!
        
       | neilv wrote:
       | I have to resist the urge to implement Scheme in this, right now,
       | but I hope someone else has a chance to try this out for Scheme
       | soon, and help inform the standard before things finalized.
        
       | [deleted]
        
       | egberts1 wrote:
       | "proposal to move WASM towards a control-flow model friendlier to
       | tail-call optimization, which would bring it more in line with
       | physical hardware."
       | 
       | perhaps it is nigh time to bring the CPU hardware closer to the
       | current WASM design.
       | 
       | Might simplify a lot of issues related to pipelining, cache-
       | busting, and Spectre-like mitigations, not to mention crazy
       | varied breakout of legacy microcode to subfunctions (Intel, I am
       | looking at you as well).
       | 
       | But what do I know, I just play with Unicorn engine.
        
       | lloydatkinson wrote:
       | Please don't let apple block this proposal like they did for
       | browsers. We don't have tail calls because apple decided they
       | couldn't be bothered to implement it.
        
         | Jtsummers wrote:
         | Safari is the mainstream browser with proper tail calls
         | implemented. Was there some historical point where Apple
         | blocked it and so others followed along but then Apple turned
         | around and did it anyways?
        
       | j-pb wrote:
       | Sorry if I miss something obvious, but how is this not solvable
       | by the compiler?
       | 
       | I'm a huge functional programming evangelist, but high-level
       | stuff like this does not belong in a low level language bytecode
       | like WASM.
       | 
       | Wasm should only care about two things: Security and Performance.
       | 
       | With the standard blowing up like crazy we'll get neither. Worse,
       | we'll cemenent the current duopoly of browser engines, because
       | we'll make it (again) so complex that no-one can create an
       | alternative.
       | 
       | We shouldn't have GC, Exceptions or Tail calls in WASM, as long
       | as the compiler can provide them.
       | 
       | What we should have is multi memory support and memory read
       | permissions, because without them WASM lacks basic security
       | primitives.[1]
       | 
       | Reference types maybe. But I've yet to see a convincing argument
       | as to why they are absolutely necessary.
       | 
       | Everything else (as long as it can be done by the compiler) is
       | just bloat.
       | 
       | 1.
       | https://www.usenix.org/conference/usenixsecurity20/presentat...
        
         | kragen wrote:
         | Your concern with burgeoning standards reinforcing the browser
         | duopoly is well-founded, but unfortunately you're sticking your
         | stake in on the wrong side of the dragon here.
         | 
         | It sounds like you're confused about how Wasm works, and you
         | imagine that it's like a normal assembly language, with calls
         | and returns implemented by the compiler using instructions that
         | push and pop a stack in memory. You're probably right that
         | that's how it _should_ have been designed.
         | 
         | But that is not how Wasm works. The stack is not in Wasm's
         | "linear memory". (This makes it much easier to compile access
         | to local variables efficiently.) Calls, returns, and in a sense
         | exceptions in Wasm are provided by the virtual machine at a
         | fundamental level in a way that makes it impossible for a
         | compiler targeting Wasm to implement tail calls, call/cc, or
         | cooperative multithreading, except by compiling your entire
         | program into a single function, which ruins the performance of
         | existing Wasm implementations.
         | 
         | But, as Scheme has demonstrated, once the virtual machine
         | supports tail calls, a compiler can implement exceptions and
         | cooperative multithreading by way of compiling to continuation-
         | passing style. So this is not the beginning of a long series of
         | extensions; it's Lambda the Ultimate Goto.
        
           | light_hue_1 wrote:
           | > But, as Scheme has demonstrated, once the virtual machine
           | supports tail calls, a compiler can implement exceptions and
           | cooperative multithreading by way of compiling to
           | continuation-passing style. So this is not the beginning of a
           | long series of extensions; it's Lambda the Ultimate Goto.
           | 
           | Well to an extent. CPS can be extremely slow.
           | 
           | Scheme shows that continuations will also be needed down the
           | road. But yes, that's about it.
        
             | kragen wrote:
             | In theory CPS is very fast indeed, but of course our hero
             | can easily collapse attempting to cross the chasm of the
             | Sufficiently Smart Compiler. I suspect Chicken using CPS
             | for everything is the main reason
             | http://canonical.org/~kragen/urscheme, which doesn't
             | implement call/cc, is so much faster than Chicken.
        
         | miloignis wrote:
         | Throwing in my 2 cents to agree with apignotti - as someone who
         | has implemented a compiler for a functional language that emits
         | wasm, it is not possible to solve this performantly in the
         | compiler.
         | 
         | Because wasm only has structured control flow & no way for user
         | code to modify the wasm stack, there isn't any good way to tail
         | call between multiple independent functions, particularly
         | dynamic function calls. Simple tail recursion of a function
         | calling itself is simple enough and can be implemented in the
         | compiler (and I did), but that's it, and not enough to
         | correctly implement Scheme, for instance.
         | 
         | I also want the wasm standard to remain as simple as possible,
         | but this isn't a very complex addition and it is required for
         | reasonable performance for both functional languages as well as
         | C++20 features like coroutines, as mentioned by the article.
        
           | emmelaich wrote:
           | Can you please clarify? apignotti wrote
           | 
           | > _Tail-calls is fundamentally something that the compiler
           | _cannot_ solve._
           | 
           | You write
           | 
           | > _it is not possible to solve this performantly in the
           | compiler_
           | 
           | So is it fundamental or not? Apologies if the question seems
           | direct or rude.
        
             | convolvatron wrote:
             | 'call' leaves return state on the stack, so if you use it
             | to implement 'jump'. that has to be cleaned up. a
             | trampoline is kinda the only choice if you are forced to
             | run on a stack. go ahead and use 'call' as much as you
             | like, but as the useless return addresses (and often
             | locals) pile up on the stack, at some point you just return
             | all the way back (or do a longjmp equivalent if your
             | environment supports it) and start over again.
             | 
             | this is measurably worse than just using jump, and as
             | another posted pointed out, can introduce an O(n) term that
             | doesn't need to exist. (edit: nevermind - if you are going
             | through the forward direction n times then it doesn't
             | change complexity to do a little more n work on the way
             | out)
        
             | User23 wrote:
             | > So is it fundamental or not?
             | 
             | Obviously a Turing-complete system can simulate any other
             | Turing-complete system. For example you could write an
             | emulation of any system you like in WASM, and in the
             | emulator you could have tail call optimization. Thus from
             | context it should be apparent that we're talking about
             | efficient implementations.
        
             | miloignis wrote:
             | No worries - it depends on what your constraints and
             | context are. It's fundamentally something a wasm-emitting
             | compiler cannot solve _performantly_ in general. You can
             | use trampolines, as the sibling comments mention, but you
             | 'll take a heavy performance hit.
             | 
             | If we stretch things too far we'll end up in the "wasm is
             | Turing-complete and thus can run anything" type of
             | territory - if you're doing something performance
             | intensive, say x86 virtualization like I believe apignotti
             | is at Leaning Tech, I think it's fair to describe the
             | problem as unsolvable by the compiler.
        
             | thayne wrote:
             | It might be possible for a compiler to generate code that
             | effectively does trampolining. But that would definitely
             | have a performance cost.
        
         | jhgb wrote:
         | > I'm a huge functional programming evangelist, but high-level
         | stuff like this does not belong in a low level language
         | bytecode like WASM.
         | 
         | A tail call is a jump. Jumps are not "high-level stuff".
         | They're very simple instructions. If WASM can't do a jump, it
         | is actually WASM that you can't call "a low level bytecode".
        
           | Spivak wrote:
           | WASM doesn't have jumps [1]. And it's not as low-level as you
           | might expect. But even still you're assuming that low level
           | means a specific model of computation a la PDP-11 that's as
           | fictional as any other.
           | 
           | https://webassembly.github.io/spec/core/syntax/instructions..
           | ..
        
             | msla wrote:
             | WASM isn't particularly low-level, in that it's a
             | compiler's IR, with a few concomitant restrictions on how
             | control flow is allowed to work. Here's a piece on those
             | restrictions, and how they make implementation more
             | difficult:
             | 
             | http://troubles.md/why-do-we-need-the-relooper-algorithm-
             | aga...
             | 
             | > WebAssembly isn't designed as a front-end language for
             | general programming though, so what's the problem? Well,
             | WebAssembly does have some constraints, specifically it
             | must produce code that is valid. WebAssembly is a stack
             | machine, and you can't jump to just any label since that
             | label might point to code that pops too many values off the
             | stack, or pops the wrong types off the stack, or pushes too
             | many values onto the stack.
             | 
             | Also, to summarize another part, when a compiler targeting
             | WASM sees control flow WASM can't directly express, it has
             | to implement it in a loop-and-switch form, something called
             | the Relooper algorithm; therefore, compilers taking WASM
             | down to native code have to understand that, and undo it
             | back into a performant form. This adds up to a lot of work
             | most VMs don't require, hence the proposal to move WASM
             | towards a control-flow model friendlier to tail-call
             | optimization, which would bring it more in line with
             | physical hardware.
        
             | jhgb wrote:
             | I never expected WASM to be low-level to begin with, so I
             | don't believe it could ever possibly be "as low level as I
             | might expect". Something like NaCl (the original one)
             | perhaps could have been called "low level".
             | 
             | > But even still you're assuming that low level means a
             | specific model of computation a la PDP-11 that's as
             | fictional as any other.
             | 
             | I don't see how jumps are "fictional". Plenty of CPUs have
             | them. Even actual stack CPUs, for that matter.
        
               | shadowgovt wrote:
               | Jump as a concrete implementation is easy to understand
               | (it's just math on a program counter).
               | 
               | But code is both a mechanical implementation and an
               | abstract, often mathematical, concept. In the concept
               | space, jump is as abstract as call, variable assignment,
               | operator evaluation, etc. A Von Neumann machine is but
               | one way to implement an abstract mathematical /
               | conceptual machine.
               | 
               | In the mathematical space, there's no "high level" vs.
               | "low level," there's just "features a language has" vs.
               | "features it does not." There are other abstractions that
               | are actually harder / impossible to do correctly with
               | jump in the language (stack unwinding comes to mind,
               | which is why C++ solves the exceptions vs. setjmp /
               | longjmp dichotomy by... Not solving it, and a program
               | that uses both will just do something undefined. C++
               | without setjmp / longjmp would be a language with fewer
               | undefined behaviors).
        
               | jhgb wrote:
               | Possibly, but when _I_ talk about  "low-level stuff", I
               | usually imagine very concrete things that are suitable
               | for physical hardware implementation. Things like what
               | the simplest RISC-V chip is designed to do. Unlimited
               | jumps definitely are one of those things. Notions of
               | stack frames or block structured languages (limitations
               | on jumps) etc. definitely aren't among them. So I
               | couldn't possibly ever consider something like WASM to be
               | "low-level", because it places constraints on your code
               | that no reasonable physical CPU would ever enforce.
               | 
               | [EDIT, from a deleted comment of yours:
               | 
               | > The RISC-V chip is in the same family as the PDP-11
               | architecture. The Turing machine itself requires no JMP
               | instruction, so it is possible to build a CPU without
               | one. Has anyone done so? In general no, because the
               | PDP-11 had profound impact on the world of physical
               | computing. But one does see it from time to time; GLSL,
               | for example, has no `goto` instruction because the
               | graphics card shader hardware it was built to operate on
               | does its magic by running exactly the same instructions
               | on multiple data; jump would imply the ability for one of
               | the instruction flows to be different from the others in
               | its processing cell, which the hardware cannot support
               | while maintaining the throughput it must to accomplish
               | its task.
               | 
               | I get what you're trying to say here, but sort of a
               | guiding principle for me here for many years has been
               | what I call "the principle of minimal total complexity".
               | While you can keep making the CPU simpler in many cases,
               | if that causes your program to become excessively long in
               | the sense that the total size of the description of your
               | program AND the device it's running on starts actually
               | increasing, that's a place you don't want to design
               | yourself into in practice. In case of sequential
               | machines, I don't consider removal of features that makes
               | programs unnecessarily long "making the machine even more
               | low-level", that's just "making the machine dumber" to
               | me. Feel free to look at the Oberon system (both HW and
               | SW) to see what I have in mind by that -- that seems to
               | be about as minimal a complete system as I can imagine in
               | practice.]
        
               | [deleted]
        
               | shadowgovt wrote:
               | To be clear, I deleted that comment because it was
               | inaccurate. GLSL has break, continue, and return, which
               | are flow-control statements. It excludes goto, and my
               | explanation for why it excludes goto is probably
               | incorrect (but I don't have time today to rabbit-hole on
               | why goto was excluded or why the SIMD architecture can
               | support break and continue just fine while excluding
               | goto).
               | 
               | > In case of sequential machines, I don't consider
               | removal of features that makes programs unnecessarily
               | long "making the machine even more low-level", that's
               | just "making the machine dumber"
               | 
               | You may be interested to consider how incredibly complex
               | the modern x86 architecture is to implement because it
               | supports sequential program execution as a core invariant
               | principle. As a result, modern computers (which strive to
               | be faster than a PDP-11) have to do a massive amount of
               | work to parallelize that sequential code because parallel
               | execution is the only frontier of fast computation that
               | remains. They literally rewrite the opcodes on the fly
               | into something that can be SIMD'd and do branch
               | prediction, where code is run speculatively just to
               | discard the result. All to support the idea that
               | deterministic flow control should be _possible_ in 2022.
               | It 's a brilliant fantasy the chipset manufacturers have
               | constructed for us so we don't have to reframe our
               | thinking.
               | 
               | I think I get what you're saying, but in modern times
               | calling jump "low level" (or, for that matter, calling
               | branching in general low level) is a very "Do you think
               | that's air your breathing?" kind of position. I'm not
               | aware of anyone seriously considering approaching the
               | challenge of all this implementation complexity by
               | throwing out fundamental assumptions of our PDP-
               | descendant instruction sets and saying "Here's a new
               | machine code, it's designed for parallel execution, the
               | first assumption you must throw away is the order in
               | which any of these instructions are executed."
               | 
               | But I suspect we're getting very close to that day.
        
               | jhgb wrote:
               | Maybe it was inaccurate but I got what you were trying to
               | say -- that "incoherent execution" is difficult on SIMD
               | machines.
               | 
               | > You may be interested to consider how incredibly
               | complex the modern x86 architecture is to implement
               | because it supports sequential program execution as a
               | core invariant principle. As a result, modern computers
               | (which strive to be faster than a PDP-11) have to do a
               | massive amount of work to parallelize that sequential
               | code because parallel execution is the only frontier of
               | fast computation that remains.
               | 
               | I'm aware what recent CPUs do with ISA instructions.
               | _That_ flies very badly in the face of the minimum
               | complexity principle as well. The amount of physical
               | resources dedicated these days to making your legacy code
               | run just slightly faster is exceptionally wasteful.
               | 
               | > but in modern times calling jump "low level" (or, for
               | that matter, calling branching in general low level) is a
               | very "Do you think that's air your breathing?" kind of
               | position
               | 
               | Well, it's low-level for the kind of minimum complexity
               | system that I'd consider ideal. Not necessarily for the
               | abominations forced on us as an accident of history.
        
               | shadowgovt wrote:
               | You might be interested in a fascinating game called
               | TIS-100 by Zachtronics. The game is a lot of things, but
               | a core premise is that it imagines a computer from a time
               | approximately parallel to the PDP-11 that took the form
               | of small compute components that were connected to each
               | other instead of a monolithic central processor. It's a
               | fun game, and it sort of raises the question of which is
               | the "abomination[s] forced on us as an accident of
               | history." Because when you look around at most of the
               | biological world, you see heavily distributed systems
               | with some centralization, but computers (man-made things
               | that they are) are heavily centralized, clock-locked,
               | deterministic... And energy-intensive. And slow.
               | 
               | History is arbitrary but not random, and it's fun to
               | think about how things might have been different if the
               | first machines started embarrassingly parallel with
               | follow-up work to consolidate the data instead of
               | embarrassingly centralized with us now in the era of how
               | to make the monoliths fast. It's interesting to think
               | about what's "ideal" about a machine that supports
               | arbitrary jumps (and the global address space that
               | demands, and the sequential execution necessary to
               | prevent decoherence, and the memory protection demanded
               | because some addresses are not executable code and should
               | never be executed, etc., etc.).
        
               | jhgb wrote:
               | > and it sort of raises the question of which is the
               | "abomination[s] forced on us as an accident of history."
               | 
               | What I meant by that is the legacy of AMD64 having
               | thousands of instructions, many of them with arbitrary
               | opcode encoding, half-assed SIMD ISA instead of a proper
               | vector ISA, and the ability to emulate an 8086, all
               | purely for reasons of backwards compatibility. If you
               | started designing a computing ecosystem completely from
               | scratch, surely you wouldn't end up with an AMD64-based
               | IBM PC descendant as your best idea you could come up
               | with?
        
               | saagarjha wrote:
               | This would be kind of a sad direction to go in if we
               | didn't have any constraints, since we already have new
               | architectures going in this direction anyways. When you
               | take the crust of the ISA off though all processors look
               | similar and that's where the interesting bits lie.
        
               | munificent wrote:
               | _> Jump as a concrete implementation is easy to
               | understand (it 's just math on a program counter)._
               | 
               | This doesn't take into account what happens to memory on
               | the stack if you jump into or out of the scope of a local
               | variable. You can say that memory and initialization is
               | not the implementation's problem and rely on the compiler
               | to emit instructions that correctly adjust or initialize
               | the stack before and after jumps. Then, yes, you get a
               | very simple jump implementation.
               | 
               | But you also get an architecture that must either do
               | stack analysis before it can safely run code, or you get
               | an unsafe architecture. Those are both valid choices (the
               | JVM does the former and native architectures do the
               | latter), but there are real trade-offs with them.
               | 
               | A third way, that WASM does, is to say that control flow
               | isn't simple, but that you get safe stack management for
               | free with it.
        
               | Spivak wrote:
               | I feel like you're ascribing some judgement to the word
               | fictional, it just means that machines, real or virtual,
               | present programs a model of computation that is an
               | abstraction over the implementation. One such abstraction
               | is the notion of a program that is a block of
               | instructions executed in sequence with a program counter,
               | registers, memory, a stack. But that's not the only
               | abstraction you could imagine. The JVM for example
               | eschews manual memory management. And WASM eschews the
               | concept of a program counter, and arbitrary jumps, and
               | instead only allows structured control flow.
        
               | jhgb wrote:
               | I feel like you're using the word "fictional" in a sense
               | that renders all models of computation "fictional" to the
               | same extent and hence renders the word "fictional" itself
               | meaningless/uninformative. Personally I feel like there
               | _should_ be some distinction between features depending
               | on how much supporting machinery you need to physically
               | implement them. To me at least,  "low-level features" are
               | those that require the smallest amount of physical
               | machinery to implement them. Things like bitwise
               | operations, indirect addressing, arbitrary IP
               | adjustments, etc. Definitely _not_ things like enforcing
               | stack discipline, enforcing block structure and such.
               | (Just so that you know what I mean when _I_ say  "low-
               | level". Opinions on that may differ very wildly, of
               | course, and it's perfectly possible that other people use
               | that term differently.)
               | 
               | I'm not saying that "fictional" things are bad. Tail
               | calls themselves are "fictional" to me in the sense that
               | all tail calls are jumps but not all jumps are tail
               | calls, and the distinction between a tail call and a
               | general jump is too difficult to make for a reasonably
               | sized physical machine and best left to the compiler. But
               | that is not me being judgmental against tail calls, of
               | course -- I _love_ tail calls and consider their lack a
               | huge design failure wherever they 're absent.
        
           | ilaksh wrote:
           | I don't think web assembly is low level at all. Actually the
           | name is confusing because it's really not much like an
           | assembly language.
        
         | gumby wrote:
         | Webassembly doesn't have a goto.
         | 
         | There are some special cased scoped branches but if you read
         | the spec you'll see they are so specific that you can't branch
         | back to the head.
         | 
         | (ex compiler dev)
        
         | brrrrrm wrote:
         | Edit: this is wrong.
         | 
         | For posterity my original comment was:
         | 
         | "From what I understand, tail calls can always (?) be lowered
         | to while loops[1], which are expressible in WASM.
         | 
         | 1.
         | https://en.wikipedia.org/wiki/Tail_call#Relation_to_the_whil...
        
           | mibsl wrote:
           | Only tail recursion, not tail calls in general.
        
           | apignotti wrote:
           | This is possible, and trivial, when self-recursing: A -> A
           | 
           | If you have an A -> B, or A -> [indirect] call, that is not
           | the case.
        
             | marcosdumay wrote:
             | When you have a set of functions that may recourse into
             | each other, you can convert them into a iterative one by
             | doing a `while cond {switch function_selector {..}}` block.
             | 
             | It's an ugly piece of code, but because of parsers, there
             | is a huge amount of know how on it.
        
               | light_hue_1 wrote:
               | You really can't.
               | 
               | For example, what if those functions are in different
               | modules or libraries? Tail calls are still supposed to
               | work.
        
               | marcosdumay wrote:
               | If they are statically linked, it's not a problem, and
               | since AFAIK wasm doesn't support dynamic linking, the
               | only problem is on crossing security boundaries, that
               | will be probably kept unsolved anyway. (IMO, this one is
               | better left unsolved, it's to complicated a problem to
               | require for interoperability.)
        
             | throwaway17_17 wrote:
             | I am genuinely asking, is your position that a compiler
             | cannot convert:
             | 
             | g(): a = 1+1; b = 2+a; print(b); return f()
             | 
             | into code that does not allocate stack space and just
             | reuses the frame allocated for g()?
        
               | SamReidHughes wrote:
               | It depends on the calling convention used -- whether it
               | is caller clean-up or callee clean-up. With caller clean-
               | up, the stack pointer is left below the argument list
               | when the function returns. With callee clean-up, it gets
               | popped above the argument list. You can perform tail
               | calls naturally if it's callee clean-up. (With caller
               | clean-up, a compiler might hypothetically pull off a tail
               | call if the callee's argument list takes less memory, but
               | it's not something you could add as a language feature.)
        
               | throwaway17_17 wrote:
               | Thanks, that explains the trouble I was having. I was
               | thinking about it without any consideration to calling-
               | convention.
        
         | munificent wrote:
         | _> Reference types maybe. But I 've yet to see a convincing
         | argument as to why they are absolutely necessary._
         | 
         | Are you referring to WASM GC here (or managed memory in
         | general)?
         | 
         | If so, I think the main compelling reason is interop. Without
         | GC in underlying architecture, any managed language targeting
         | WASM must include its own runtime and GC. If you then want to
         | write a polyglot program using multiple languages, you now have
         | a single WASM executable containing multiple garbage collectors
         | each managing their own objects. If you end up with cycles
         | between those different managed heaps, you can end up in a
         | situation where the GCs are unable to reclaim memory because
         | they don't understand cross GC cycles.
         | 
         | This is why Chrome did Oilpan [1]: so that they could more
         | easily handle cycles between the DOM, JS heap, and
         | (aspirationally at the time) Dart VM heap.
         | 
         | A more general user-value proposition argument is that putting
         | the GC in the WASM implementation means any improvements made
         | to it are amortized across all managed languages that target
         | WASM. Also, it makes applications written in managed languages
         | and compiled to WASM smaller because they don't have to ship a
         | runtime and GC with the app.
         | 
         | [1]:
         | https://chromium.googlesource.com/v8/v8/+/main/include/cppgc...
        
           | smolder wrote:
           | I'm likely in the minority in my thinking, but I don't think
           | targeting wasm with managed languages is such a great use
           | case for wasm anyway, and so it wasn't worth building in GC
           | to support. For the times you want to, something light like
           | reference counting should be easy enough.
        
             | munificent wrote:
             | I'm biased since I work on a managed language that targets
             | the web, but I strongly believe that it _is_ a great use
             | case for WASM.
             | 
             | Most programs targeting WASM are client-side apps with rich
             | user experiences and I believe all but a small fraction of
             | users would be better served implementing rich UIs in
             | managed languages. Doing UI work in C++ (which I have done
             | plenty of) is a special exercise is pain for almost no
             | upside. GC is great.
        
               | smolder wrote:
               | JavaScript is already a GC'd language effective at UI, so
               | it doesn't seem to me like there's a strong need for wasm
               | to fill the same role. Rather, it seems to me like wasm
               | is better suited to accelerating processing-heavy tasks
               | those rich web apps depend on, like custom
               | [de]compression, ML models, data processing, image
               | processing or simulation of some kind.
        
               | discreteevent wrote:
               | I went back to that pain a few years ago after many years
               | away from it. After a few months I got right back in the
               | swing of things. Spending my days arranging and solving
               | little memory ownership problems.I got to thinking: "This
               | is programming". And I didn't really mind it.
               | 
               | But every so often I had to be honest and remind myself
               | of what I had noticed at the beginning: Thinking about
               | and managing memory is a colossal waste of time, even
               | with smart pointers and the like - The machine can do
               | that and has been able to do it efficiently for most
               | things for a long time.
        
         | mibsl wrote:
         | Compilers can easily lower tail recursion into loops, but not
         | general tail calls between arbitrary (possibly unknown,
         | indirect) functions.
         | 
         | The best they can do are trampolines, which come with a high
         | performance cost.
        
         | [deleted]
        
         | apignotti wrote:
         | Tail-calls is fundamentally something that the compiler
         | _cannot_ solve. Trust me, if there was a way we would have
         | avoided ourselves all this work.
         | 
         | The issue is that to avoid stack blow-up you need the engine to
         | recycle stack frames. You might argue that this could happen
         | implicitly in the VM, with all the calls in tail position being
         | automatically converted to tail-calls. The problem with this is
         | that in WebAssembly (and JavaScript) the call stack is
         | observable via the .stack property of thrown exceptions.
         | 
         | Since an implicit conversion would be observable engines cannot
         | simply optimize the problem away, and that is the reason why
         | new opcodes (return_call/return_call_indirect) are actually
         | required at the WASM level.
         | 
         | For the specific case of direct calls (return_call opcode) the
         | compiler could solve the problem by inlining, with some luck.
         | But for the case of return_call_indirect there is no other
         | possible solution.
        
           | keithwinstein wrote:
           | Heya,
           | 
           | (1) Thank you for implementing this in JSC!! I hope they take
           | it, it makes it into Safari, and the tail-call proposal
           | advances.
           | 
           | (2) I don't think you are exactly right about the call stack
           | being observable via thrown exceptions. There's no formal
           | spec for the v3 exceptions proposal yet, but in the documents
           | and tests, there's nothing that would change in WebAssembly
           | core to make the call stack observable. (There's no ".stack
           | property" that would be added to Wasm itself.) It's true that
           | the proposal amends the JS API (but only the JS API) to
           | describe a traceStack=true option; from Wasm's perspective I
           | understand that's just an ordinary exception that happens to
           | include an externref value (just like any other value) to
           | which Wasm attaches no special significance. The Web-based
           | engines can attach an informative stack trace if they want,
           | but there's no requirement preventing frames from having been
           | optimized out. The non-Web engines won't have to think about
           | this.
           | 
           | (3) I think the real reason that a Wasm engine can't
           | _implicitly_ make tail calls proper is that the spec tests
           | forbid it, basically because they didn 't want the
           | implementation base to fragment by having some engines
           | perform an optimization that changes the space complexity of
           | a program, which some programs would have started to depend
           | on. (The spec tests say: "Implementations are required to
           | have every call consume some abstract resource towards
           | exhausting some abstract finite limit, such that infinitely
           | recursive test cases reliably trap in finite time. This is
           | because otherwise applications could come to depend on it on
           | those implementations and be incompatible with
           | implementations that don't do it (or don't do it under the
           | same circumstances.)")
           | 
           | But the issue is much weaker than "call stack is observable"
           | -- it's more like "infinite recursion must trap eventually,
           | but it can be nondeterministic when."
           | 
           | More discussion here:
           | https://github.com/WebAssembly/spec/issues/150
        
             | dataflow wrote:
             | > Implementations are required to have every call consume
             | some abstract resource towards exhausting some abstract
             | finite limit
             | 
             | If an implementation really wanted to, they could get
             | around this by incrementing a "function call counter" that
             | traps at, say, 2^64, rendering it effectively moot.
             | 
             | I feel like these kinds of situations are where being
             | practical might make more sense than being mathematically
             | precise. Something like "each function call must consume at
             | least N bits of memory" or something concrete like that. Or
             | heck, "implementations may not perform tail-call
             | optimization" or even "implementations must be able to
             | reconstruct the full logical call stack at any point".
        
               | wyager wrote:
               | > each function call must consume at least N bits of
               | memory
               | 
               | This is going to be a problem for any long-running
               | recursive program. Why would we mandate this?
        
           | j-pb wrote:
           | Fair enough. Sounds more like an issue with .stack being an
           | observable property though.
           | 
           | I've long suspected exceptions making it into WASM being it's
           | million dollar mistake, and this further confirms it.
        
             | miloignis wrote:
             | I haven't messed with .stack, but on first impression I
             | agree with you that it seems like a mistake.
             | 
             | Even if it wasn't observable though, I think guaranteed
             | tail-call elimination via either a new opcode or having it
             | be a required property of regular calls in the spec (we
             | already missed that ship, of course) is important. Without
             | it, proper compilation and execution of some languages on
             | wasm would depend on an otherwise invisible property of the
             | engine - that is, tail call elimination isn't just an
             | optimization, it's a critical aspect of language semantics
             | that needs to be guaranteed to be relied on.
        
               | coliveira wrote:
               | If it is a mistake depends on the point of view. What
               | seems to be more important: exceptions of tail-call
               | elimination? Most modern languages use exceptions in one
               | way or another, conversely very few use the later
               | feature. From the point of view of existing software it
               | is way more important to optimize VMs for exception
               | handling than for tail call elimination.
        
               | kragen wrote:
               | It's more important for exception handling to be
               | _implementable_ than for TCE to be implementable, yes.
               | But it 's more important for TCE to be _provided by the
               | virtual machine_ (or, alternatively, for call and return
               | to be manipulations of user-level state, as on
               | traditional CPUs, rather than magic privileged operations
               | as in Wasm) because you can implement exception handling
               | with TCE but you can 't implement TCE with exception
               | handling.
        
               | miloignis wrote:
               | Well wasm doesn't have exceptions right now either! The
               | exceptions proposal is also in phase 3, like tail-calls.
               | Right now wasm just supports traps, which can't be caught
               | or inspected from within wasm code, and don't actually
               | need to record stack frames (but the JavaScript host does
               | in web browsers). I'm not sure what benefit exposing the
               | wasm part of the stack for traps gives the JavaScript
               | host side except perhaps easing debugging? (though that
               | is important, I think there are other valid solutions for
               | that)
               | 
               | To be clear, I do want wasm to support the major language
               | use cases, and I think implementing exception support is
               | a good (though tricky, see the exceptions proposal) idea.
        
               | j-pb wrote:
               | After reading the discussion, I think I'm much more in
               | favour of adding tail calls, than adding exceptions.
               | 
               | Actually I wonder if one couldn't adapt the tail-call
               | mechanism slightly to also serve as an exception
               | mechanism. I think you could do both by allowing `br` to
               | jump to an arbitrary label previously established. An
               | exception then simply being a "tail-call" that unwinds
               | more than one stack frame, and calls into the "catch".
               | 
               | Not sure how well JITable this would be, but I think an
               | arbitrary backjump and call should be mostly fine in
               | terms of control flow?
        
               | bordercases wrote:
        
               | pmontra wrote:
               | Maybe I'm naive and I surely don't know anything about
               | the WASM spec, but aren't CPUs implementing both
               | exceptions and tail call elimination as gotos (some jump
               | machine code instruction) ? Exceptions might have to pop
               | some frame, TCE doesn't.
        
               | dunham wrote:
               | Wasm has functions - despite the name, it is higher level
               | than assembly. Your code defines functions and WASM
               | manages the calling conventions. There are no jumps
               | between functions. (And it sounds like it mandates that
               | tail calls not be implemented between functions.)
               | 
               | But I think if you put everything in one function with a
               | giant switch statement (or just jumps between labels),
               | you might be able to emulate tail calls.
        
           | anfilt wrote:
           | What? Compilers handle tail calls all the time even in
           | languages like C and higher level functional languages.
           | 
           | Its just a jmp? Does the wasm VM not have a jmp
           | instruction?!?
        
             | Jtsummers wrote:
             | https://www.w3.org/TR/wasm-core-1/#control-
             | instructions%E2%9...
             | 
             | Those are the list of control instructions. WASM seems to
             | be a bit of a misnomer as it is not an assembly language in
             | the more conventional sense. It is a structured language
             | and provides a limited goto in the form of branches (in the
             | section I linked to) which are constrained to a particular
             | scope.
             | 
             | If you compile your whole program so it fits within a
             | single function, then yes you can use this to do universal
             | tail call elimination. Otherwise you are restricted to
             | doing TCE only on auto-recursive functions and maybe
             | mutually recursive functions if you have a single entry
             | point (of the set of functions) and decide to optimize by
             | moving all of them into one function.
             | 
             | Otherwise, function calls are performed using one of the
             | two call instructions which (presently) implement a
             | behavior more like conventional call stack/stack frames.
             | This proposal would add a second pair of call instructions
             | that a compiler can emit which the WASM runtime would then
             | optimize (by not generating new stack frames).
        
           | jjtheblunt wrote:
           | > Tail-calls is fundamentally something the the compiler
           | _cannot_ solve.
           | 
           | How does the famous 1977 Guy Steele paper on compilers
           | optimizing tail calls not apply?
           | 
           | https://dl.acm.org/doi/10.1145/800179.810196
        
             | convolvatron wrote:
             | this paper espouses the use of jump+arguments instead of
             | call. Wasm has no control transfer instruction that doesn't
             | use the stack, nor does it have writable access to the
             | stack pointer.
        
             | taeric wrote:
             | My guess is that this is assuming that the compiler can
             | rewrite a return into a jump. That is, these compilers have
             | control over the stack on the machine.
        
               | jjtheblunt wrote:
               | compilers always have control over the stack on the
               | machine: they generate the code which realizes the notion
               | of a stack (and happily usually have dedicated registers
               | to use for that generated code)
        
               | taeric wrote:
               | Not something compiling to WASM. Nor something compiling
               | to JVM Bytecode.
        
               | jjtheblunt wrote:
               | If WASM and JVM Bytecode have a goto, then the compiler
               | can use it. Anyway, if you've never read that paper i
               | linked to above, i do believe you'll find it mind-
               | blowingly cool stuff. not kidding.
        
               | taeric wrote:
               | They don't. That is literally the thing.
               | 
               | Edit: I should say, they don't have anything equivalent
               | that can jump out of the method you are in.
               | 
               | Edit2: This is a bit more clear if you consider what it
               | means for JVM bytecode to have a "return" set of
               | instructions. Why does the bytecode need a return, if
               | that is all managed by code that the compiler should
               | handle anyway? You can look at the instructions here: htt
               | ps://en.wikipedia.org/wiki/List_of_Java_bytecode_instruct
               | .... Note that it is the "jsr" instructions that let you
               | do the equivalent of a long jump, and those specifically
               | manipulate the stack. There is a "goto", but it is not
               | valid to have that jump outside of the subroutine you are
               | in.
        
               | Jtsummers wrote:
               | WASM's version of a goto is a branch, it is constrained
               | to a limited scope. This creates a problem for general
               | tail call elimination but not for auto-recursive
               | functions.
               | 
               | Auto-recursive functions can switch to using a branch or
               | using another loop construct, and the compiler can
               | generate that WASM code. The control structures available
               | are described in the link below. Mutually recursive
               | functions where only one is meant as the entry point
               | could be similarly converted into WASM instructions but
               | where the additional functions are, essentially,
               | eliminated and inlined into the original caller (this may
               | not be a good general solution, though). But general tail
               | calls cannot be converted into gotos as you might with a
               | more conventional assembly language since WASM branches
               | cannot go to arbitrary instruction addresses.
               | 
               | The problem is that in a conventional assembly language,
               | if you do TCE then the goto would go to (har har) the
               | same (or nearly the same) point as a regular function
               | call (maybe it skips the first few instructions depending
               | on the calling conventions involved). With TCE and WASM
               | you'd have to, essentially, inline the called function,
               | or a variant of it, in order to be able to branch to the
               | start of it. You _cannot_ branch into a different
               | function or to the start of a different function, you
               | have to call the other function and then you have a
               | single entry point. Which, in WASM as currently
               | specified, means you have to use the current calling
               | conventions which does not permit TCE.
               | 
               | https://www.w3.org/TR/wasm-core-1/#control-
               | instructions%E2%9...
        
           | nynx wrote:
           | .stack definitely seems like a mistake in the context of
           | Webassembly. I'd be interested in seeing the justification
           | for it.
        
             | aseipp wrote:
             | Because people want a call stack when an exception is
             | thrown, so they can print it out, just the same way they do
             | in JavaScript and every other language.
             | 
             | EDIT: Also as another commenter mentioned, this is a
             | property exposed by the VM engine to the host environment,
             | not something directly observable from within WebAssembly
             | itself.
        
           | OskarS wrote:
           | I don't quite get your point here. Suppose I write a compiler
           | to compile Scheme to WASM. I would _have_ write it so that it
           | does proper tail calls, otherwise it wouldn 't be Scheme.
           | What's stopping me? Would the browsers not run the code?
           | Like, yeah, it would change .stack, but who cares?
           | 
           | Same thing applies, I think, to any other language compiled
           | to WASM. C/C++ compilers regularly inline huge amounts of the
           | code when optimizations are turned on, as well as do tail-
           | call optimization. I haven't tried, but I would assume that
           | they do that for WASM just as they do for x86 or ARM or
           | whatever other build target I choose. As long as my users are
           | ok with this (and they presumably are, otherwise they
           | wouldn't turn optimizations), what's the problem, exactly?
        
             | marshray wrote:
             | > yeah, it would change .stack, but who cares?
             | 
             | You might enjoy participating in an interoperable
             | standardization process sometime.
        
               | OskarS wrote:
               | I don't mean to minimize anybody's work, I'm genuinely
               | curious about why it can't be done. Like, what's stopping
               | me from making a compiler that optimizes tail calls?
               | 
               | For instance: the C and C++ standards have very strict
               | rules for how to handle floating point math, and
               | compilers aren't allowed to deviate from that according
               | to the standard. Which turns off all sorts of cool
               | optimizations you can do. But of course, all modern
               | compilers implement some version of "-ffast-math" which
               | turns off those rules and allows for the optimizations.
               | It's no longer standard C/C++, but that doesn't mean that
               | switch can't exist. The compiler is a computer program,
               | it can output anything it wants, regardless of what the
               | standard says. Nobody is going to go to jail because you
               | turned on -ffast-math. The code will still run just fine.
               | 
               | So, my question is, why can't you write a compiler with
               | an option that's like "I don't particularly care that
               | .stack changes, do the tail call optimization". A -ffast-
               | math, but for tail calls. Is there a technical reason why
               | you can't do this?
        
               | light_hue_1 wrote:
               | > So, my question is, why can't you write a compiler with
               | an option that's like "I don't particularly care that
               | .stack changes, do the tail call optimization". A -ffast-
               | math, but for tail calls. Is there a technical reason why
               | you can't do this?
               | 
               | Because the stack in web assembly is only observable. It
               | is implicit. You cannot modify it yourself. There are
               | simply no instructions for this. The WASM stack is not
               | present on the WASM heap, like it is for your physical
               | machine.
               | 
               | You simply cannot express tail calls with the instruction
               | set given to you by WASM right now. No hacks are
               | possible.
        
             | kragen wrote:
             | > _I don 't quite get your point here. Suppose I write a
             | compiler to compile Scheme to WASM. I would have write it
             | so that it does proper tail calls, otherwise it wouldn't be
             | Scheme. What's stopping me?_
             | 
             | Stack overflows are stopping you, if you implement Scheme
             | calls as Wasm calls.
             | 
             | In Scheme, tail call elimination is not merely an optional
             | "optimization", so the compiler can't opt to not do it when
             | it's inconvenient.
        
           | nu11ptr wrote:
           | > Tail-calls is fundamentally something that the compiler
           | _cannot_ solve
           | 
           | Possibly a stupid question as I haven't given this much
           | thought, but I thought tail call elimination could be used to
           | convert recursive calls in tail position into loops. Could a
           | compiler not do this (like Scala does, for example)?
        
             | lmm wrote:
             | Scala has a weak version that handles self-tail-calls only.
             | It causes significant trouble for e.g. iteratee/streaming
             | libraries in Scala (like FS2) where you always have to use
             | a trampoline.
        
             | samatman wrote:
             | Tail-call elimination applies to any function which calls
             | another function as the last action.
             | 
             | Turning this into a loop is only possible when the function
             | calls itself, not when it calls another function, and not
             | (naively) when two functions call each other in the tail
             | position.
        
               | nu11ptr wrote:
               | Understood, but in practice how often does that really
               | happen? (I'm aware of the contrived 'odd'/'even' example
               | always given, but in the real world, I never had anything
               | like that). Even when I wrote more functional code I
               | rarely (if ever?) did that. 98-99% of the time it was the
               | same function calling itself. Is your experience
               | different?
        
               | dmkolobov wrote:
               | If you're implementing a compiler for a language like
               | Haskell, then everything is compiled into mutually-
               | recursive tail calls.
        
               | dwohnitmok wrote:
               | Happens a fair bit of time when parsing deeply nested
               | structures (e.g. parsing deeply nested JSON structures)
               | using mutually recursive parsers. I've generally resorted
               | to either trampolining or explicit stacks.
        
               | kragen wrote:
               | This is usually a case where TCE doesn't help, in my
               | experience, because most of the calls to sub-parsers
               | aren't in tail position. This is especially unpleasant
               | when you get bug reports of mysterious segfaults that
               | turn out to be stack overflows...
        
               | jhgb wrote:
               | For example pretty much any higher-order function that
               | wants to do some decision-making and then hand over
               | execution to one of several functions provided will want
               | to hand over execution by means of a tail call, so that
               | it doesn't unnecessarily change stack complexity for
               | whatever algorithm is being run around that piece of
               | code.
        
               | User23 wrote:
               | It's useful anytime you can benefit from the performance
               | advantages of replacing a full fledged function call with
               | little more than a jump. One classic example is
               | implementing an efficient VM.
        
               | lmm wrote:
               | Iteratees are immensely useful in practice and they
               | involve doing this all the time (basically you do stream
               | processing with a source and a sink that are mutually
               | recursive - the source emits an element by calling the
               | sink to process it, the sink processes it and then
               | recurses into the source for the next element. It feels a
               | bit forced if you describe it like that, but it gives you
               | a model wiht lots of natural structure - it's the only
               | approach to stream processing I've found where you can
               | have a _value_ that represents a stage in the middle of
               | processing a stream (not just a 1:1 function but
               | something that does grouping or suspension or the like)
               | in a natural way, that you can then test standalone
               | etc.).
        
               | lilyball wrote:
               | State machines can often be implemented with a set of
               | functions that each handle a single state and then tail-
               | call into the function for the next state.
        
               | kevin_thibedeau wrote:
               | This is just a fancy form of spaghetti code. Don't do it
               | unless you actually need the performance.
        
               | marshray wrote:
               | State machines are known for a predilection to use `goto`
               | extensively, but that does not make them 'spaghetti
               | code'.
               | 
               | A state machine is, in fact, a well-understood method of
               | applying structure to spaghetti.
        
               | kevin_thibedeau wrote:
               | The point is that tail calls obscure the transitions
               | between states by spreading it everywhere.
        
               | marshray wrote:
               | I'm not too clear on how one would implement a state
               | machine without "spreading the transitions everywhere".
               | Can you share a link to some example code showing the
               | format you prefer?
               | 
               | Typically, we'd see the code that implements the logic
               | for each state gathered together. Since this is where the
               | outgoing transitions get decided, the transitions end up
               | being distributed across their originating states.
        
               | eloff wrote:
               | That doesn't compute for me. You'd need to give an
               | example for me to understand you.
        
               | kragen wrote:
               | In languages like Scheme, Haskell, OCaml, and Erlang it
               | happens all the time, often when the user isn't aware of
               | it; even when these languages have special looping
               | constructs they are often implemented in terms of tail
               | recursion, sometimes mutual tail recursion between
               | several anonymous subroutines.
               | 
               | In languages like Scala, Java, C#, and Swift, it
               | basically never happens.
        
               | light_hue_1 wrote:
               | > In languages like Scala, Java, C#, and Swift, it
               | basically never happens.
               | 
               | Scalaz and cats use tail calls extensively and wouldn't
               | be possible without them.
        
               | kragen wrote:
               | How does that work on the JVM?
        
               | bitwalker wrote:
               | One example would be in a parser, as various parsing
               | functions would be invoked recursively until end of
               | input, or until some non-matching input is encountered.
               | In pretty much any scenario where you would use recursive
               | functions, it is quite common to have mutual recursion
               | between multiple functions. Not always of course, which
               | is where the tail call elimination optimization is
               | typically applied, but not being able to freely use
               | mutual recursion to arbitrary depth turns out to be quite
               | an annoying limitation.
        
               | merb wrote:
               | > 98-99% of the time it was the same function calling
               | itself
               | 
               | well if the same function is in the "tail" it would still
               | apply a tail call elimination.
               | 
               | But a lot of the time tail call eliminiation is only
               | needed in some extreme cases. it can eliminate stack
               | overflow exceptions by a big amount. C#'s new
               | System.Text.Json would greatly benefit with tail call
               | ops, because it often has a call chain that is recursive
               | and does not call itself (its mostly like ReadObject,
               | ReadObjectAsync, ReadObject, ReadBla, ReadObject, etc.
               | and if the object has a ref to itself system.text.json
               | often blows up.
               | 
               | In Java and Scala this thing can btw. also happen (and
               | I've already seen it) however at least you can configure
               | the stack size there and thus most often eliminate the
               | problem somewhat.
               | 
               | it's an edge case, but one that would be cool to fix,
               | since it makes code often more reasonable especially
               | parsers.
        
               | retrac wrote:
               | In functional languages, recursive data structures are
               | often most naturally walked with recursive functions.
               | Languages like Haskell and ML take this to its natural
               | conclusion and use it systematically. For example, a
               | linked list. Here's how the Haskell "last" function, to
               | get the last item in a linked list, is implemented in the
               | standard library:                   last [x]
               | =  x         last (_:xs)             =  last xs
               | last []                 =  errorEmptyList "last"
               | 
               | If given a list with a single item, return that item. If
               | given a list with two or more items, call itself with the
               | list minus its head. If given an empty list, throw an
               | error.
        
               | nu11ptr wrote:
               | This is the scenario I am used to using where "last" is
               | called in tail position calling itself, but I see from
               | the other answers there are definitely valid use cases
               | for non-same function recursion as well
        
               | dunham wrote:
               | You can turn it into a loop with a case statement inside
               | if you collect the connected component of tail called
               | functions and make those the cases. But it's gonna be a
               | big function.
               | 
               | You can split it up a bit if you use a trampoline, but
               | then you lose some efficiency.
        
         | kamaal wrote:
         | I think its Rich Hickey who said something to the effect that
         | tail calls are so fundamental that the underlying platform
         | should be providing them.
        
           | convolvatron wrote:
           | tail calls are so fundamental that its trivial to build a
           | call-push, return-pop stack calling protocol on top of them,
           | but not the converse
        
             | kaba0 wrote:
             | Could you point me to some further info on this?
        
               | convolvatron wrote:
               | sorry, its kind of ... basic?
               | 
               | the kind of generalized tail call I'm talking about is
               | generally referred to as a continuation. its easiest to
               | think of it as a jump. so you can imagine that if we have
               | jump and a stack register, we can push+jump and thats the
               | same as 'call'. we can pop+jump and thats the same as
               | 'return'. since 'call' and 'return' have side effects, we
               | cant really use them to replace jump.
               | 
               | so thats really straightforward, but things do get a bit
               | screwy. if we can support arbitrary jumps, and the frame
               | storage (arguments + locals) cant necessarily go on a
               | stack because our frames might have arbitrary liftimes,
               | so we cant reclaim them in stack order.
               | 
               | so in a scheme for example we just pull in a gc and track
               | references to these closures and dust our hands off with
               | a smirk. if that doesn't work for you then you have to
               | adopt some kind of framework that lets you know
               | explicitly when these can be released. reference counting
               | is a poor choice here because these references between
               | frames can easily be cyclic.
        
         | marcosdumay wrote:
         | The problem here is that for security, your interpreted code
         | can not do whatever it wishes on your memory. And for
         | performances, you can't validate every small memory operation
         | your code does.
         | 
         | The result is a fairly high-level VM that doesn't export enough
         | power for your program to implement GC or exceptions without
         | sacrificing a lot of performance. Tail calls is a different
         | matter, and there is a cost-benefit analysis for it.
        
         | [deleted]
        
         | wruza wrote:
         | _We shouldn 't have GC, Exceptions or Tail calls in WASM, as
         | long as the compiler can provide them._
         | 
         | I'm not a compiler guy, but spent my time with runtimes
         | closely, and can say that befriending GCs/RCs, exceptions,
         | continuations, etc between them was my least favorite part.
         | 
         | That said, wasm is already a potential target for _different_
         | runtimes built with no common vm in mind, so interop headaches
         | are imminent either way.
        
         | ufo wrote:
         | Wasm is too high level to implement your own tail calls. The
         | WASM virtual machine handles the call stack & function calling
         | convention, instead of being something that the code itself is
         | responsible for. This means that the compiler can't implement
         | tail calls; WASM doesn't allow a jump instruction in one
         | function to jump into a different function.
         | 
         | There are a bunch of reasons why WASM was designed to be higher
         | level than native assembly. One is that they want it to be more
         | compact to save bandwidth when downloading via the internet.
         | Another is that they want to make it faster to compile down to
         | native code; one way they do this is that the control flow is
         | more structured, for example it forbids irreducible control
         | flow graphs. There is also the matter of safety. WASM defines
         | several validation constraints that are checked before the
         | bytecode is run e.g.: the target of a jump instruction must be
         | listed in labels list. If WASM were too low level, it wouldn't
         | be possible to do the safety checks they want.
        
           | MikeHolman wrote:
           | The compiler absolutely can implement tail calls, I don't
           | know why this keeps getting thrown around. Adding a high-
           | level directive in the spec doesn't enable the compiler to do
           | anything, it just enforces it. The only thing preventing it
           | is browser vendors wanting the .stack property to stay well
           | behaved, but that isn't required by the spec and certainly
           | isn't relevant for non-browser targets.
        
             | wtetzner wrote:
             | I think they're referring to a compiler that is targeting
             | WASM, not a WASM-to-machine-code compiler.
        
             | light_hue_1 wrote:
             | This is simply false.
             | 
             | The compiler cannot implement tail calls correctly as it
             | stands. You do not have access to modify the WASM stack and
             | it's not present on the heap like it is for normal
             | programs.
             | 
             | No compiler tricks can enable tail calls in WASM at the
             | moment (with the exception of trampolines which always work
             | and are _absurdly_ slow).
        
       | labrador wrote:
       | I'm using Blazor (C#) WebAssembly and I'm really wishing it could
       | do DOM manipulation. My favorite tool for that is Dart, so I'm
       | working on marrying C# and Dart for my client solutions.
        
         | traviswt wrote:
         | Genuine question: is the goal to get something that is not
         | achievable with JS/TS, or is the goal to simply avoid JS/TS?
        
           | labrador wrote:
           | I really dislike JavaScript for a number of reasons. Dart
           | gives me more distance from it than TypeScript. Of course, I
           | can't avoid it, but I don't have to deal with many annoyances
           | such as which 'this' is this? Should I put 'this' into a var
           | called 'self' to be safe? That's just one example of how
           | JavaScript and I don't get along. I understand some people
           | have brains that think this way, but mine doesn't
        
             | kkdaemas wrote:
             | Surely it's easier to stick to JavaScript The Good Parts
             | than to introduce a whole new layer in the form of Blazor?
        
               | labrador wrote:
               | Blazor is not equal to JavaScript. Blazor is Microsoft's
               | answer to React and Flutter. So far of the three, I like
               | Blazor, but not the server version that uses SignalR to
               | communicate with the client because it locks you in.
               | Blazor Webassembly is awesome, but just needs a good DOM
               | manipulation library. With Blazor Wasm on the client, I
               | can use anything on the back end.
        
             | MuffinFlavored wrote:
             | > I really dislike JavaScript for a number of reasons.
             | 
             | Might I suggest being less sensitive/that your reasons are
             | overblown?
        
               | labrador wrote:
               | I don't like liver and horse radish either. Should I eat
               | that because a lot of people like it?
        
               | 202206241203 wrote:
               | Yes, let's keep using JavaScript for all eternity now.
               | It's COBOL of this century.
        
             | FlorianRappl wrote:
             | The comment reads like you used JavaScript 10 years ago.
             | For instance, just use arrow functions and this remains
             | untouched.
        
               | labrador wrote:
               | I know but that's just bizarre. One function has it's own
               | this and another doesn't so if my arrow function gets big
               | and I want to make it a regular function I may have to
               | rewrite it. Ugh.
               | 
               | Edit: I'm thinking of cognitive load too. I like to
               | eliminate load I don't need. Keeping track of this is not
               | something I want to do with my life
               | 
               | https://drpicox.medium.com/reducing-programmers-
               | cognitive-ov...
        
               | chmod775 wrote:
               | > so if my arrow function gets big and I want to make it
               | a regular function I may have to rewrite it.
               | 
               | There's absolutely no reason to make an arrow function a
               | regular function just because it goes past a certain
               | number of lines.
               | 
               | It really seems like you should become more familiar
               | before forming an opinion. There's valid criticism to be
               | made, but those aren't it.
        
               | labrador wrote:
               | I sometimes refactor code for purely aesthetic reasons.
               | To each his own.
               | 
               | http://www.synergeticapplications.com/ergonomics.htm
               | 
               | By the way, I feel like I'm at a bar and you're negging
               | me
        
             | atwood22 wrote:
             | What do you like about Dart over TypeScript? The type
             | system in TypeScript is far superior to Dart's. Other than
             | that, the languages are more-or-less the same.
        
               | labrador wrote:
               | Actually, I'm taking another look at TypeScript today. I
               | liked it well enough when I last investigated it in 2016,
               | but the code I wrote then was just a veneer over jQuery.
               | I'm sure things have improved dramatically, which I am
               | about to find out. Frankly, a Turing complete type system
               | [1] seems like over-kill to me and a complication I don't
               | need.
               | 
               | Still, Dart had everything I wanted in a front end
               | language in 2013. I think JS devs were short sighted to
               | reject it out of hand. It's a shame it never caught on by
               | itself and not as a just a Flutter language. I read that
               | Google is using it in Fuchsia so there is still a chance
               | Dart will be big in the future!
               | 
               | [1] TypeScript and Turing Completeness:
               | https://itnext.io/typescript-and-turing-completeness-
               | ba8ded8...
        
               | atwood22 wrote:
               | I highly recommend TypeScript. I don't really like the JS
               | runtime, but, purely from a language point of view, TS is
               | my favorite.
               | 
               | Some cool features:
               | 
               | 1) Type unions
               | 
               | interface A { a: string; }
               | 
               | interface B { b: string; }
               | 
               | type C = A | B;
               | 
               | const c: C = { a: 'a' };
               | 
               | 2) Type assertions if ('a' in c) { /* compiler knows c is
               | of type A here */ }
               | 
               | function isA(c: A): c is A { return 'a' in C } //
               | compiler knows c is A if this returns true
               | 
               | 3) Nice helper types interface A { ... }
               | 
               | ReadOnly<A> // Like A, but all properties are read-only
               | 
               | Partial<A> // Like A, but all properties are optional
               | 
               | Omit<A, 'someProp'> // Like A, but without 'someProp'
        
               | shadowgovt wrote:
               | TBH, TypeScript's type system really impressed me. It's
               | the strongest type system of any language I regularly use
               | (I haven't had time to unpack Rust yet, and I learned
               | enough Haskell to decide Haskell didn't help me solve
               | problems I had).
        
               | wruza wrote:
               | It's an expressive type system, but ime it allows
               | developers to go crazy on type interdependencies and
               | general entanglement, so you can't just go to the
               | "header" and quickly figure out what your method or a
               | return value really is, despite TS has structural typing.
               | 
               | E.g. look at this: https://github.com/telegraf/telegraf/b
               | lob/v4/src/telegram-ty...
        
               | atwood22 wrote:
               | That's definitely overwhelming. After reading it for 5
               | minutes though, there is a type called Telegram and it
               | has a field called 'Opts' that has a lot of different
               | fields. MakeExtra lets you take a type from 'Opts' and
               | exclude certain parameters. Probably do something like
               | "you can set the chat message contents, but you can't set
               | the chat id"
        
               | labrador wrote:
               | Interesting. I'll learn more today as I rewrite one of my
               | Dart components into TypeScript. But I also dislike Node
               | and Npm so I'll try Deno first.
        
         | lmm wrote:
         | What's C# giving you that Dart isn't, OOI? My impression is
         | they're fairly similar languages. (Does Dart not have a
         | WebAssembly backend?)
        
         | asabla wrote:
         | For someone about to go knees deep into Blazor very soon.
         | 
         | Do you have any gotchas which isn't really mentioned on MS
         | documentation site? primarily related to performance, since it
         | will be one of my main concerns.
        
           | worble wrote:
           | If one of your main concerns is performance, then all I can
           | say is don't use Blazor. WASM will never be able to compete
           | with regular JS so long as it has to go through an interop
           | layer, plus it has to send down and parse an entire
           | interpreter before it can even start looking at your actual
           | UI and business logic. Server Side has to wait for a server
           | response for every single action you take, and any small
           | drops can make the entire thing sluggish, or even buggy.
           | 
           | That's not to say it isn't cool and can't work for some
           | projects where say, development speed is more important, but
           | performant it just aint.
        
         | vbezhenar wrote:
         | What's wrong with JS bindings? Surely DOM manipulation is slow
         | enough for WASM/JS overhead not being noticeable.
        
           | labrador wrote:
           | You still need to go through a port with Blazor, but you can
           | minimize those calls for the most part. DOM manipulation was
           | slow as recently as a few years ago, but improvements to
           | Blink have made it blazingly fast. I agree with Svelte that
           | we no longer need a virtual DOM
           | 
           | https://svelte.dev/blog/virtual-dom-is-pure-overhead
           | 
           | Edit: I didn't answer your question. There's nothing wrong
           | with JS bindings. Both Blazor and Dart use them, which could
           | get awkward going from Blazor -> Dart (as JS) -> JavaScript
           | lib. I'm considering TypeScript again because no port is
           | needed to call JS libs.
        
       | Zamicol wrote:
       | I'm waiting for constant time WASM.
       | https://github.com/WebAssembly/constant-time/blob/main/propo...
        
       | jolux wrote:
       | Tangential but what's the status of garbage collection and DOM
       | manipulation in WASM? Are we ever getting those? I understand
       | it's a high-value technology without them, but I'm interested in
       | writing full apps in say, OCaml (so I'm glad to hear that WASM is
       | getting TCE!).
        
         | azakai wrote:
         | GC is making a lot of progress. There are VM and toolchain
         | prototypes. You can compile Java and Dart to wasm on those
         | today and it generally works and is pretty fast. (There is also
         | a Kotlin prototype but I have less information about it.) Most
         | of the big spec questions have also been resolved.
         | 
         | DOM manipulation hasn't changed - you still need to call into
         | JS to do those. Ideas like WebIDL bindings have been proposed
         | over the years but haven't shown enough benefit. JS is better
         | for DOM-heavy code, while wasm excels at computation-heavy
         | code. But you can write bindings from wasm (maybe someone
         | already has for OCaml?), which is what toolchains do today -
         | not as fast as JS, but often good enough.
        
           | titzer wrote:
           | I will add to this that we are in a healthy design loop that
           | is tightening in on what I feel is a reasonable final design
           | that is implemented in at least one high-performance engine,
           | V8. AFAIK there are Igalia folks trying to get a parallel
           | implementation in JSC to meet the 2-engine bar.
           | 
           | GC will not directly make DOM manipulation easier, but it
           | will make it easier to integrate DOM references into a
           | managed language that compiles to WASM, since it obviates the
           | need for indirections through tables. This feature alone is
           | majorly opens up capabilities for WASM on the Web!
        
           | marcosdumay wrote:
           | > JS is better for DOM-heavy code
           | 
           | If you only look at performance, yeah. Performance isn't the
           | only thing we get from WASM, and it would be nice to get the
           | other benefits without having to sacrifice it.
           | 
           | (That said, the marshaling needed for DOM access isn't that
           | relevant, I wouldn't say this is a high-priority problem and
           | would prefer people to focus on GC instead, like they are
           | doing.)
        
         | Kaze404 wrote:
         | Have you looked into ReasonML by any chance?
        
           | k__ wrote:
           | I lost track of it since the Reason/ReScript split.
        
         | jillesvangurp wrote:
         | Kotlin 1.7.0 shipped with a WASM compiler recently that
         | currently requires some experimental flags in Chrome to enable
         | features related to garbage collection, threads, and memory
         | management. So, it looks like this is progressing nicely. Once
         | this lands, I imagine there will be a whole range of languages
         | that will be able to make good use of this.
         | 
         | As for DOM manipulation, that seems to work well enough but I'm
         | sure there are further improvements coming.
        
         | wtetzner wrote:
         | While I'm also looking forward to TCE and GC for WASM, you can
         | build full apps in OCaml now via js_of_ocaml.
        
       | doublerabbit wrote:
       | ELI5: Tail-calls?
        
         | Jtsummers wrote:
         | A tail call is a function call occurring in the final position
         | of a function:                 void foo(...) {         ...
         | bar(x,y,z); // <= a tail call       }
         | 
         | If the function has a return value (vice void like above):
         | int foo(...) {         ...         return bar(x,y,z); // <=
         | still a tail call       }
         | 
         | In the way most languages are compiled, function calls generate
         | a new entry in the call stack (a stack frame). This is
         | necessary for all non-tail calls in order to handle the
         | bookkeeping around what to do when the call finishes, how does
         | the caller resume.
         | 
         | With tail calls, that additional stack frame has no real value
         | (outside, maybe, debugging information to give you a stack
         | trace but traces can be collected other ways). Tail call
         | elimination (or tail call optimization) will reuse the current
         | stack frame rather than construct a new one. This reduces the
         | memory overhead (you aren't constructing unnecessary stack
         | frames) and gives some performance improvement (less
         | bookkeeping overhead, and the function call becomes a simple
         | jump). These two functions can, in principle, get compiled to
         | the same thing if you have TCE:                 uint
         | factorial(uint n) {         uint acc = 1;         for(; n > 0;
         | n--) {           acc *= n;         }         return acc;
         | }       uint factorial(uint n, uint acc = 1) {         if (n ==
         | 0) return acc;         return factorial(n - 1, acc * n);
         | }
         | 
         | But while that's a recursive example, tail calls and tail call
         | elimination (TCE) aren't _just_ for recursive cases. My first
         | two examples, though they are just sketches, show a non-
         | recursive example of tail calls. With full TCE (it isn 't
         | uncommon to have TCE only apply to self-recursive cases) those
         | examples would also have tail call elimination performed.
        
         | qsort wrote:
         | https://www.youtube.com/watch?v=-PX0BV9hGZY
        
       | titzer wrote:
       | Thank you for this!
        
       | continuational wrote:
       | Tail calls are super important for non-C like control flow, and
       | it's great that it might be added.
       | 
       | But what I really think wasm should focus on is to get near
       | native performance (say, <50% overhead). Until that happens, the
       | whole endeavor seems pointless to me.
        
         | azakai wrote:
         | Estimates vary (because benchmarks and use cases vary), but
         | it's generally faster than 50%. That number was seen here,
         | 
         | https://www.usenix.org/system/files/atc19-jangda.pdf
         | 
         | (I assume that's what you refer to?) It's a good measurement,
         | but it's from 2019, and it's just on 2 wasm engines. There are
         | other estimates, like here:
         | 
         | https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html
         | 
         | That tries to measure the fundamental overhead of wasm's
         | sandboxing as opposed to a specific wasm VM, and it finds just
         | 14%.
        
       | miloignis wrote:
       | This is wonderful news! Thank you so much for doing this, and
       | doing it right - this is much better than my idea of making a bad
       | MVP web browser to provide second implementations to push through
       | proposals when other vendors drag their feet.
       | 
       | As an author of a functional compiler that targets wasm, this is
       | a dream come true - yall do really cool work.
        
       | Animats wrote:
       | I want threads more than tail calls. WASM has processes with
       | limited shared memory, but not real threads. This is a headache
       | for porting high-performance games to the web.
        
       | convolvatron wrote:
       | I'm working on a relational stream processor and this is pretty
       | critical to making it work well. a runtime standard with support
       | for network connections would be nice too, but lets not get
       | greedy
        
         | apignotti wrote:
         | > support for network connections
         | 
         | I wish daily for that as well, but stay tuned: we _might_ have
         | found a half-satisfactory solution for that ;-)
        
           | convolvatron wrote:
           | pointer to a draft?
        
       | jacobmischka wrote:
       | Neat! This proposal caused me a lot of headaches, mechanizing its
       | specification was the primary contribution of my Master's thesis
       | a couple years ago[1]. I forgot until rereading it just now, but
       | doing so caught a typo in the proposal specification[2], my
       | extremely minor contribution to advancing WebAssembly.
       | 
       | Glad to see it finally moving forward after stalling for so long!
       | Excellent work!
       | 
       | [1]: https://github.com/jacobmischka/uwm-masters-
       | thesis/releases/... [2]: https://github.com/WebAssembly/tail-
       | call/issues/10
        
       | carterschonwald wrote:
       | How can I help this?!
       | 
       | I've been wanting tailcalls to be in wasm for years! I consider
       | it a blocker for all sorts of uses I'd be interested in
        
       | andsoitis wrote:
       | > tail-calls has been proposed as an extension to the WebAssembly
       | standard.
       | 
       | Do you know why it wasn't in the standard to begin with?
       | 
       | Even ECMAScript 6 mandates PTC (proper tail call) - article from
       | 2016 on Webkit.org no less -
       | https://webkit.org/blog/6240/ecmascript-6-proper-tail-calls-...
        
         | apignotti wrote:
         | I am unsure, but I can make a guess. The first release of Wasm
         | has been always considered a Minimum Viable Product, which a
         | strong emphasis on Minimum. Pretty much only features already
         | part of the legacy asm.js "standard" made the cut.
         | 
         | The fact that WebKit had some level of support for tail calls
         | on the JS side is exactly the reason we choose it as the right
         | platform to invest in.
        
         | pygy_ wrote:
         | The standard mandates it, and the V8 team implemented it,
         | shipped it behind a flag, then unshipped based on reasons that
         | initially seemed and ultimately were proven fuddy, when WebKit
         | shipped PTC, and the world didn't fall down.
         | 
         | The reason actual why it was withdrawn is that it would have
         | required expensive changes to Microsoft's Chakra (the calling
         | conventions were incompatible).
         | 
         | Then Edge died... and Google didn't add it back. Go figure.
         | 
         | Firefox also has problems implementing cross-realm tail calls.
         | That could be spec'ed around, but there's no will to do so.
        
           | silon42 wrote:
           | If I understand what cross-realm means (code security), then
           | I'm not sure that can be done at all.
        
             | pygy_ wrote:
             | Firefox can't eliminate tail call across realms. The spec
             | could make an exception for that scenario
        
               | dwheeler wrote:
               | That seems reasonable. Tail call elimination, limited to
               | within a realm, still seems pretty useful.
        
           | leoc wrote:
           | These absolute bozos. A Web Assembler that can't do jumps.
           | What a show.
        
             | pygy_ wrote:
             | My comment is about ES6 proper tail calls. WASM is another
             | story
        
               | leoc wrote:
               | Graaagh, sorry.
        
         | brrrrrm wrote:
         | How is ECMAScript 6 (high level language) related to
         | WebAssembly (low level target)?
        
       | AtNightWeCode wrote:
       | You can use F# with WASM. I know F# convert recursive tail calls
       | to loops but I understand this is another case. But shouldn't F#
       | have the same problem?
        
         | Jtsummers wrote:
         | I have not been following WASM generally, but my understanding
         | after reading this:
         | 
         | Unlike regular machine language, WASM does not allow you to
         | control the call semantics. That is, if you emit x86 machine
         | instructions you can handle TCE entirely in your own compiler,
         | the machine itself does not care.
         | 
         | WASM has special call instructions, these perform basically a
         | conventional stack frame/call stack approach and you can't
         | short circuit it. The closest you can get, which works fine for
         | auto-recursive functions and maybe some detectable mutually
         | recursive functions, is to convert tail calls into loops within
         | the compiled WASM output. So recursive tail calls work fine in
         | WASM if your compiler doesn't turn them into _call_ s, but
         | leaves them as a self-implemented (by the compiler) loop
         | structure using whatever appropriate WASM instructions are
         | available. This is fine for auto-recursive functions, but puts
         | the burden at implementing TCE for them on the compiler writer.
         | Which means you'll get it for some languages that compile to
         | WASM, but not all. And only for the limited cases they support
         | (probably restricted to auto-recursive and some mutual
         | recursive circumstances).
         | 
         | What this proposal introduces is a new version of the _call_
         | instruction that would be used in tail call positions. Then the
         | WASM interpreters would do the work of actually combining the
         | stack frames internally. Detecting that a call is a tail call
         | is actually pretty straightforward for a compiler. Given that
         | it is a tail call, it can emit (or not) the new call
         | instruction and will get TCE  "for free". It doesn't have to do
         | a code transformation from recursive to looping, and it can be
         | applied more broadly than just recursive circumstances.
        
       | codedokode wrote:
       | Recursion indeed can be useful when working with trees or graphs.
       | However, I don't like using recursion instead of a loop, because
       | you make reader to unroll your code in their head to understand
       | what it really does. If you need to add two arrays of numbers,
       | use loop or array addition, but don't use recursion as a
       | replacement for a loop.
       | 
       | Sadly the article uses a poor example, writing a useless
       | factorial function. I have never needed such a function in
       | production code, and if I needed, I would write it using a loop.
       | Using bad examples like this might create an impression that
       | recursion is not useful in real code (which is wrong). It would
       | be better if you have used an example that looks like real code
       | and that would become less readable without recursion.
        
         | RHSeeger wrote:
         | > If you need to add two arrays of numbers, use loop or array
         | addition
         | 
         | There are plenty of algorithms that make more sense when
         | expressed using recursion. Iterating over a list of numbers
         | generally isn't one of them. But walking a tree is a good
         | example.
        
           | vbezhenar wrote:
           | Those algorithms usually use non-tail recursion. Most tail
           | recursion uses usually is trivial to rewrite in a loops.
        
           | kmeisthax wrote:
           | More specifically: if you don't want to recursively iterate a
           | tree, you have to hold a list of prior nodes to return to
           | anyway... which is no different from weaving that information
           | into the stack. Non-recursive iteration in this case
           | literally requires you to emulate the recursion.
        
         | jagger27 wrote:
         | Readability is really not a relevant factor for WebAssembly.
         | Having tail-call recursion support is critical for the
         | languages that _compile_ to Wasm that use tail-calls
         | themselves.
        
       | 202206241203 wrote:
       | Why not focus on features that would unlock better integration of
       | actual mainstream languages - Java, C#, Python etc.?
       | 
       | Let functional programmers discuss monoids on their endofunctor
       | forums or something.
        
         | yuri91 wrote:
         | Actually even C++ can benefit from this.
         | 
         | In LLVM C++ coroutines compile down to functions with the
         | "musttail" attribute, that right now is not possible to express
         | in Wasm [1].
         | 
         | It is also useful to efficiently implement stuff like
         | interpreters, where you use it as a way to jump to the code of
         | the next instruction directly (Wasm has only structured control
         | flow, but you can see a tail call as a sort of jump/goto
         | instruction)
         | 
         | [1]: https://discourse.llvm.org/t/supporting-coroutines-
         | without-t...
        
         | azakai wrote:
         | It isn't one or the other. Work on Wasm GC is ongoing and
         | separate from this.
        
         | Jtsummers wrote:
         | Recursion has been a part of programming and programming
         | languages for a very long time, and not just the functional
         | ones. See ALGOL for recursion from very early on.
        
       ___________________________________________________________________
       (page generated 2022-07-12 23:00 UTC)