[HN Gopher] Bit twiddling with Arm Neon: beating SSE movemasks, ...
___________________________________________________________________

Bit twiddling with Arm Neon: beating SSE movemasks, counting bits
and more

Author : danlark
Score  : 29 points
Date   : 2022-08-29 19:56 UTC (3 hours ago)

(HTM) web link (community.arm.com)
(TXT) w3m dump (community.arm.com)

| zX41ZdbW wrote:
| It improves string comparison and sorting in ClickHouse by 15%:
| https://github.com/ClickHouse/ClickHouse/pull/38093

| alas44 wrote:
| Really interesting, thanks for sharing.
|
| Also from the article: a 10-20% improvement (I guess in
| Instructions Per Cycle) on some str methods in glibc
| https://sourceware.org/git/?p=glibc.git;a=commit;h=3c9980698...

| olliej wrote:
| This is a really interesting article. I was expecting some
| obviously biased and/or marketing horror by virtue of it being on
| arm.com.
|
| It's actually an interesting breakdown of the ways NEON differs
| from SSE, and how a "direct" translation may well be suboptimal.
| Their first example is really illustrative of this. SSE has an
| instruction that pulls the top (I think?) bit of each byte in a
| register and creates a 16-bit mask from those. You can do
| something similar in NEON, but the perf is apparently terrible.
| But NEON has an instruction that packs some bits from each byte
| into a 64-bit value, and you can go from that to the masking
| behaviour you were presumably trying for originally, but much
| faster.
|
| The other examples and case studies are similarly interesting.

___________________________________________________________________
(page generated 2022-08-29 23:00 UTC)
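A minimal sketch of the packing trick olliej describes, assuming it refers to a narrowing right shift by 4 (the vshrn_n_u16 intrinsic), which is my reading of the comment and not something the thread spells out. The helper names eq_nibble_mask and first_mismatch are hypothetical, chosen only for illustration. On AArch64 the intrinsics are used directly; elsewhere a scalar loop emulates the same lane arithmetic so the example still runs.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: compare 16 bytes and return a 64-bit mask with
 * 4 bits per input byte. Nibble i is 0xF when a[i] == b[i], else 0x0. */
static uint64_t eq_nibble_mask(const uint8_t *a, const uint8_t *b) {
#if defined(__aarch64__)
    /* Each byte of eq is 0xFF where the inputs match, 0x00 otherwise. */
    uint8x16_t eq = vceqq_u8(vld1q_u8(a), vld1q_u8(b));
    /* View as 8 x 16-bit lanes, shift right by 4, keep the low 8 bits. */
    uint8x8_t packed = vshrn_n_u16(vreinterpretq_u16_u8(eq), 4);
    return vget_lane_u64(vreinterpret_u64_u8(packed), 0);
#else
    /* Scalar emulation of the same steps (assumption: little-endian
     * lane layout, as on AArch64). */
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint16_t lo = (a[2 * lane] == b[2 * lane]) ? 0xFF : 0x00;
        uint16_t hi = (a[2 * lane + 1] == b[2 * lane + 1]) ? 0xFF : 0x00;
        uint16_t v = (uint16_t)(lo | (hi << 8));
        uint8_t narrowed = (uint8_t)(v >> 4); /* shift, then narrow */
        mask |= (uint64_t)narrowed << (8 * lane);
    }
    return mask;
#endif
}

/* Hypothetical helper: index of the first differing byte, or -1 if all
 * 16 bytes match. Each byte owns 4 mask bits, so the byte index is the
 * position of the lowest cleared bit divided by 4. */
static int first_mismatch(const uint8_t *a, const uint8_t *b) {
    uint64_t diff = ~eq_nibble_mask(a, b);
    if (diff == 0) {
        return -1;
    }
    int bit = 0;
    while (((diff >> bit) & 1) == 0) {
        bit++;
    }
    return bit / 4;
}
```

The resulting mask is four times wider than the SSE one, so code ported from a movemask usually only needs the bit index divided by 4. A real implementation would likely replace the bit-scan loop with a count-trailing-zeros instruction.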