[HN Gopher] Bit twiddling with Arm Neon: beating SSE movemasks, ...
       ___________________________________________________________________
        
       Bit twiddling with Arm Neon: beating SSE movemasks, counting bits
       and more
        
       Author : danlark
       Score  : 29 points
       Date   : 2022-08-29 19:56 UTC (3 hours ago)
        
 (HTM) web link (community.arm.com)
 (TXT) w3m dump (community.arm.com)
        
       | zX41ZdbW wrote:
       | It improves string comparison and sorting in ClickHouse by 15%:
       | https://github.com/ClickHouse/ClickHouse/pull/38093
        
       | alas44 wrote:
       | Really interesting, thanks for sharing.
       | 
       | The article also reports a 10-20% improvement (I guess in
       | instructions per cycle) on some string functions in glibc:
       | https://sourceware.org/git/?p=glibc.git;a=commit;h=3c9980698...
        
       | olliej wrote:
       | This is a really interesting article. I was expecting something
       | obviously biased and/or a marketing horror by virtue of it being
       | on arm.com.
       | 
       | It's actually an interesting breakdown of ways NEON differs from
       | SSE, and of how a "direct" translation may well be suboptimal.
       | Their first example illustrates this well. SSE has an instruction
       | (PMOVMSKB, _mm_movemask_epi8) that takes the top bit of each byte
       | in a register and packs those bits into a 16-bit mask. You can do
       | something similar in NEON, but the performance is apparently
       | terrible. NEON does, however, have a shift-right-and-narrow
       | instruction (SHRN) that packs four bits from each byte into a
       | 64-bit value, and you can go from that to the masking behaviour
       | you were presumably trying for originally, but much faster.
       | 
       | The other examples and case studies are similarly interesting.
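A minimal sketch of the trick described above, assuming it is the SHRN-based approach from the linked article. NEON intrinsics only compile for Arm targets, so the helpers below (cmpeq_u8, movemask_emul, shrn_nibble_mask_emul and first_match_index are hypothetical names, not library APIs) emulate in portable C what the instructions do on a little-endian core. The corresponding intrinsic sequence is quoted in a comment. The inputs are assumed to be compare results, i.e. every byte is 0x00 or 0xFF.

```c
#include <stdint.h>

/* Hypothetical helper: byte-wise equality compare, like vceqq_u8 or
   _mm_cmpeq_epi8. Each output byte is 0xFF where a[i] == needle, else 0x00. */
static void cmpeq_u8(const uint8_t a[16], uint8_t needle, uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = (a[i] == needle) ? 0xFF : 0x00;
}

/* Scalar emulation of SSE _mm_movemask_epi8: bit i = top bit of byte i,
   giving a 16-bit mask with 1 bit per byte. */
static uint16_t movemask_emul(const uint8_t v[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        m |= (uint16_t)((v[i] >> 7) << i);
    return m;
}

/* Scalar emulation of the NEON sequence on a little-endian core:
     uint8x8_t res  = vshrn_n_u16(vreinterpretq_u16_u8(eq), 4);
     uint64_t  mask = vget_lane_u64(vreinterpret_u64_u8(res), 0);
   Each 16-bit lane is shifted right by 4 and narrowed to its low 8 bits,
   so every input byte contributes 4 bits: byte i lands in bits [4i, 4i+3]. */
static uint64_t shrn_nibble_mask_emul(const uint8_t v[16]) {
    uint64_t out = 0;
    for (int j = 0; j < 8; j++) {
        /* reinterpret bytes 2j and 2j+1 as one little-endian u16 lane */
        uint16_t lane = (uint16_t)(v[2 * j] | (v[2 * j + 1] << 8));
        uint8_t narrowed = (uint8_t)(lane >> 4); /* shift right 4, narrow */
        out |= (uint64_t)narrowed << (8 * j);
    }
    return out;
}

/* Index of the first matching byte, or -1 if none: count trailing zero
   bits of the nibble mask and divide by 4 (4 bits per byte). */
static int first_match_index(uint64_t nibble_mask) {
    if (nibble_mask == 0)
        return -1;
    int tz = 0;
    while ((nibble_mask & 1u) == 0) {
        nibble_mask >>= 1;
        tz++;
    }
    return tz >> 2;
}
```

For example, searching "hello, neon!" for 'n' sets bytes 7 and 10 of the compare result, so the movemask-style result is 0x0480 while the nibble mask is 0x00000F00F0000000, and first_match_index returns 7. On real hardware the trailing-zero loop would normally be a single instruction (e.g. __builtin_ctzll in GCC/Clang). The trade-off, as the article frames it, is that each byte now occupies 4 bits instead of 1, so indices must be divided by 4, in exchange for a much cheaper mask computation.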
        
       ___________________________________________________________________
       (page generated 2022-08-29 23:00 UTC)