hngopher.com

       [HN Gopher] Metal shader converter and the missing device-scoped...
       ___________________________________________________________________
        
       Metal shader converter and the missing device-scoped barrier
        
       Author : raphlinus
       Score  : 25 points
       Date   : 2023-06-12 18:40 UTC (1 days ago)
        
 (HTM) web link (raphlinus.github.io)
 (TXT) w3m dump (raphlinus.github.io)
        
       | tedunangst wrote:
       | So how [well] does MoltenVK work? The prevailing attitude I've
       | seen is basically "just target vulkan for everything because it
       | just works" but I'm not sure how much experience is reflected in
       | such claims.
        
         | raphlinus wrote:
         | If you're doing advanced compute work (including lock-free data
         | structures), then it's best effort.
         | 
         | https://github.com/linebender/vello/issues/42 is an issue from
         | when Vello (then piet-gpu) had a single-pass prefix sum
         | algorithm. Looking back, I'm fairly confident that it's a
         | shader translation issue and that it wouldn't work with
         | MoltenVK either, but we stopped investigating when we moved to
         | a more robustly portable approach.
        
       | bronxbomber92 wrote:
       | I believe this post is referring to device-scoped _memory_
       | barriers - also sometimes called fences - as opposed to
       | _execution_ barriers.
       | 
       | The former being a mechanism to ensure memory accesses follow a
       | well defined order (e.g. it'd be bad if the memory accesses
       | executed inside a critical section could be reordered before or
       | after the lock and unlock calls).
       | 
       | The latter being a mechanism that ensures all threads (within
       | some scope, perhaps all threads running on the "device") reach
       | the same point in the program before any are allowed to proceed.
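
A minimal CPU-side sketch of the distinction drawn above, in Rust. This is an analogy only: it uses host threads and `std::sync::atomic`, not a GPU shader language, and the function names `fence_message_passing` and `execution_barrier` are hypothetical. The first function orders memory accesses without making any thread wait at a rendezvous point; the second makes every thread wait until all have arrived.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Memory barrier (fence): constrains the ORDER of memory accesses.
/// Returns the payload the consumer read after observing the flag.
fn fence_message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release fence: the write to `data` is ordered before the flag store.
            fence(Ordering::Release);
            ready.store(true, Ordering::Relaxed);
        })
    };

    // Consumer: spin until the flag is set. Nothing forces the two threads
    // to meet at a common program point; only the ordering is constrained.
    while !ready.load(Ordering::Relaxed) {
        std::hint::spin_loop();
    }
    // Acquire fence: the read of `data` is ordered after the flag load.
    fence(Ordering::Acquire);
    let seen = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    seen
}

/// Execution barrier: no thread proceeds until all `n` threads have arrived.
/// Returns the arrival count each thread observed after the barrier.
fn execution_barrier(n: usize) -> Vec<u32> {
    let arrived = Arc::new(AtomicU32::new(0));
    let barrier = Arc::new(Barrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let (arrived, barrier) = (Arc::clone(&arrived), Arc::clone(&barrier));
            thread::spawn(move || {
                arrived.fetch_add(1, Ordering::SeqCst);
                barrier.wait(); // blocks until all n threads reach this line
                arrived.load(Ordering::SeqCst)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `fence_message_passing()` yields 42 because the release/acquire fence pair forbids the reordering that would let the consumer see the flag without the payload. Calling `execution_barrier(4)` yields the value 4 for every thread, since none of them can pass the barrier before all four have incremented the counter.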
        
         | raphlinus wrote:
         | That's correct, it's the _memory scope_ that I expect to be
         | device-scoped. GPUs tend not to have execution barriers in the
         | shader language beyond workgroup scope; generally the next
         | coarser granularity for synchronization is a separate dispatch.
         | However, single-pass prefix sum algorithms, including decoupled
         | look-back, can function just fine with device-scoped memory
         | barriers, and do not require execution barriers with coarser
         | scope than workgroup.
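
A rough sketch of why decoupled look-back needs only memory ordering, written as a CPU simulation in Rust. Everything here is an assumption made for illustration: host threads stand in for workgroups, release/acquire atomics stand in for device-scoped memory barriers, and the names `PartitionState` and `prefix_sum_lookback` are hypothetical. It is not Vello's shader code. Each worker claims partitions in order, publishes its local sum behind a flag, then walks backwards over its predecessors until it finds a published inclusive prefix. There is no barrier across workers.

```rust
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::atomic::{AtomicU32, AtomicUsize};
use std::thread;

const NOT_READY: u32 = 0;
const AGGREGATE: u32 = 1; // only this partition's local sum is published
const PREFIX: u32 = 2; // inclusive prefix through this partition is published

struct PartitionState {
    flag: AtomicU32,
    aggregate: AtomicU32,
    inclusive: AtomicU32,
}

/// Inclusive prefix sum (wrapping) in a single pass over the partitions.
fn prefix_sum_lookback(input: &[u32], part_size: usize, workers: usize) -> Vec<u32> {
    assert!(part_size > 0 && workers > 0);
    let n_parts = (input.len() + part_size - 1) / part_size;
    let state: Vec<PartitionState> = (0..n_parts)
        .map(|_| PartitionState {
            flag: AtomicU32::new(NOT_READY),
            aggregate: AtomicU32::new(0),
            inclusive: AtomicU32::new(0),
        })
        .collect();
    let out: Vec<AtomicU32> = (0..input.len()).map(|_| AtomicU32::new(0)).collect();
    let next = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim partitions in increasing order, so every predecessor
                // is already finished or currently being processed.
                let p = next.fetch_add(1, Relaxed);
                if p >= n_parts {
                    break;
                }
                let start = p * part_size;
                let chunk = &input[start..(start + part_size).min(input.len())];
                let local = chunk.iter().fold(0u32, |a, &x| a.wrapping_add(x));
                let st = &state[p];
                if p == 0 {
                    st.inclusive.store(local, Relaxed);
                    st.flag.store(PREFIX, Release);
                } else {
                    // Release: the value is visible before the flag is.
                    st.aggregate.store(local, Relaxed);
                    st.flag.store(AGGREGATE, Release);
                }
                // Look back over predecessors; spin only on unpublished ones.
                let mut exclusive = 0u32;
                let mut i = p;
                while i > 0 {
                    let prev = &state[i - 1];
                    match prev.flag.load(Acquire) {
                        PREFIX => {
                            exclusive = exclusive.wrapping_add(prev.inclusive.load(Relaxed));
                            break;
                        }
                        AGGREGATE => {
                            exclusive = exclusive.wrapping_add(prev.aggregate.load(Relaxed));
                            i -= 1;
                        }
                        _ => std::hint::spin_loop(),
                    }
                }
                if p > 0 {
                    st.inclusive.store(exclusive.wrapping_add(local), Relaxed);
                    st.flag.store(PREFIX, Release);
                }
                // Write this partition's results.
                let mut running = exclusive;
                for (k, &x) in chunk.iter().enumerate() {
                    running = running.wrapping_add(x);
                    out[start + k].store(running, Relaxed);
                }
            });
        }
    });
    out.into_iter().map(AtomicU32::into_inner).collect()
}
```

For the input 1 through 10 with `part_size` 3 and 4 workers, the result is 1, 3, 6, 10, 15, 21, 28, 36, 45, 55. If the flag store could become visible before the value store, a later partition could read a stale aggregate, which is the failure mode a missing device-scoped barrier would allow on a GPU.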
        
       | Animats wrote:
       | Apple having to Think Different means we need about two more
       | layers in portable games.
        
       | richdodd wrote:
       | Does the M1/M2 use ARM designs in the GPU as well as the CPU? If
       | so, it might be possible to work out what could be implemented by
       | looking at the [arm docs](https://developer.arm.com/documentation
       | /102203/0100/Valhall-...).
        
         | richdodd wrote:
         | Hmm, OK, according to the documentation they designed the GPU
         | themselves, so there's no public information on it.
        
         | nicoburns wrote:
         | No, they have a custom GPU design originally derived from
         | Imagination Technologies PowerVR GPUs.
        
         | raphlinus wrote:
         | The most complete documentation is in the applegpu repo[1] by
         | dougallj, which shows a great deal of recent activity (including
         | by alyssarosenzweig). Last I checked, the documentation of barrier
         | instructions wasn't complete enough to tell whether these
         | device-scoped barriers are possible. (Note: on RDNA2, they're
         | accomplished by DLC and GLC flags on memory accesses, combined
         | with cache flush instructions such as S_GL1_INV).
         | 
         | There's also a lot of great material, accessibly written, on
         | Alyssa's blog[2], see in particular the posts titled
         | "Dissecting the Apple M1 GPU, part ${I}".
         | 
         | [1]: https://github.com/dougallj/applegpu
         | 
         | [2]: https://rosenzweig.io/
        
         | DeRock wrote:
         | Apple doesn't use ARM IP for either, and hasn't for many years.
        
       ___________________________________________________________________
       (page generated 2023-06-13 23:01 UTC)