[HN Gopher] Auto-vectorization for the masses (2011)
       ___________________________________________________________________
        
       Auto-vectorization for the masses (2011)
        
       Author : lelf
       Score  : 26 points
       Date   : 2020-02-15 05:46 UTC (17 hours ago)
        
 (HTM) web link (leiradel.github.io)
 (TXT) w3m dump (leiradel.github.io)
        
       | epistasis wrote:
       | Very interesting and useful to see.
       | 
       | And in an entirely different approach to vectorization for the masses: I
       | do wish that it was easier to access vectorization through BLAS,
       | a library that is well supported across nearly all languages,
       | gets massively optimized, but is hard to install correctly.
        
         | chewxy wrote:
         | Good news is that the Gonum team has been working on an
         | optimized pure Go version of BLAS. It's at parity with Netlib
         | BLAS for some of the important functions (GEMV, GEMV, etc.).
         | 
         | Why is this good news? Go is a very easy to use language, and
         | it favours using compile targets, leading it to be available
         | across different platforms. To install, one simply does `go get
         | gonum.org/v1/gonum`
        
           | jedbrown wrote:
           | Netlib BLAS is a very low bar [1], and not at all how one
           | should go about writing a performance portable BLAS. BLIS
           | (https://github.com/flame/blis/) is a much better approach,
           | and underlies vendor implementations on AMD
           | (https://developer.amd.com/amd-aocl/blas-library/) and many
           | embedded systems.
           | 
           | [1] GEMV is entirely limited by memory bandwidth, thus quite
           | uninteresting from a vectorization standpoint. Maybe you
           | meant GEMM?
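
To make the memory-bandwidth point in [1] concrete, here is a rough C sketch that estimates arithmetic intensity (flops per byte moved) for GEMV and GEMM. The function names and the traffic model are assumptions for illustration, not something from the thread: square n-by-n operands, 8-byte doubles, and every operand moved through memory exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated flops per byte for y = A*x, with A an n-by-n matrix of doubles.
 * Flops: 2*n*n (one multiply and one add per matrix element).
 * Bytes: A (n*n elements) plus x and y (n elements each), 8 bytes apiece.
 * Assumption: each operand is read or written exactly once. */
double gemv_intensity(size_t n) {
    assert(n > 0);
    double flops = 2.0 * n * n;
    double bytes = 8.0 * ((double)n * n + 2.0 * n);
    return flops / bytes;
}

/* Estimated flops per byte for C = A*B, all three matrices n-by-n doubles.
 * Flops: 2*n*n*n.
 * Bytes: A, B and C (n*n elements each), 8 bytes apiece.
 * Assumption: ideal blocking, so each matrix moves through memory once. */
double gemm_intensity(size_t n) {
    assert(n > 0);
    double flops = 2.0 * n * n * n;
    double bytes = 8.0 * 3.0 * n * n;
    return flops / bytes;
}
```

Under this model GEMV stays below 0.25 flops per byte for any n, because each matrix element is loaded for a single multiply-add. GEMM grows as n/12, which is roughly 83 flops per byte at n = 1000, so it is the routine where vectorization and blocking actually pay off.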
        
       | marklacey wrote:
       | I only barely skimmed the post and the follow-on posts so this is
       | less about that and more about autovectorizers.
       | 
       | Autovectorization is the wrong approach for data-parallelization.
       | You don't want to rely on a brittle unpredictable code
       | transformation for performance in this case. You want to bake it
       | into the programming model.
       | 
       | ispc uses this approach and it results in performance
       | predictability to a large degree. You can imagine other
       | approaches as well, like explicitly data-parallel loops, or a
       | declarative approach.
       | 
       | Most of these (and the GPU data-parallel models) rely to a very
       | large extent on the programmer to manage data dependencies to
       | ensure correctness.
        
         | llukas wrote:
         | Just for the record: you rely on performance tests to guarantee
         | performance, nothing else.
        
         | tom_mellior wrote:
         | > You don't want to rely on a brittle unpredictable code
         | transformation for performance in this case.
         | 
         | That's somewhat true, but much of the unpredictability could be
         | removed if compilers provided annotations saying "I expect this
         | loop to be vectorized" where the compiler would be forced to
         | report an error if it didn't manage to do it.
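
As a hedged illustration of the annotation idea above, OpenMP's `#pragma omp simd` lets the programmer mark a loop as intended for vectorization. The `saxpy` function below is a hypothetical example, not code from the article. The pragma is a request rather than a guarantee, and it only takes effect when OpenMP SIMD support is enabled (for example `-fopenmp-simd` on GCC or Clang). Otherwise it is ignored and the loop still computes the same result.

```c
#include <stddef.h>

/* saxpy: y[i] += a * x[i] for i in [0, n).
 * `restrict` is the programmer's promise that x and y do not overlap,
 * which removes the aliasing question the compiler would otherwise
 * have to resolve before vectorizing. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    /* Explicit request to vectorize; the programmer takes responsibility
     * for the absence of loop-carried dependencies. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

To check what the compiler actually did, GCC can report loops it failed to vectorize with `-fopt-info-vec-missed`, and Clang offers `-Rpass-missed=loop-vectorize`. These are diagnostics rather than hard errors, so they only approximate the "fail the build if not vectorized" behaviour described above.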
        
       | rsp1984 wrote:
       | This has been done by Intel: https://ispc.github.io
        
         | tom_ wrote:
         | More about ispc, from Matt Pharr:
         | https://pharr.org/matt/blog/2018/04/30/ispc-all.html - includes
         | some discussion of Intel's corporate culture. Interesting
         | throughout.
        
       | tom_mellior wrote:
       | So... skimming this post and its successors, I didn't see any
       | actual examples of generated vector code, especially not examples
       | that GCC can't do although they are supposedly "easy". And no
       | benchmarks. Did I miss anything or did this project really die
       | before it got to vectorization (or anything more interesting than
       | constant folding)?
        
       ___________________________________________________________________
       (page generated 2020-02-15 23:00 UTC)