[HN Gopher] M1 Icestorm cores can still perform well
       ___________________________________________________________________
        
       M1 Icestorm cores can still perform well
        
       Author : ingve
       Score  : 51 points
       Date   : 2021-09-01 08:02 UTC (1 hour ago)
        
 (HTM) web link (eclecticlight.co)
 (TXT) w3m dump (eclecticlight.co)
        
       | simondotau wrote:
       | TLDR: Based on a single simple synthetic benchmark, the low
       | performance "Icestorm" cores were shown to be as much as 52%--or
       | as little as 18%--of the performance of the primary "Firestorm"
       | cores. Highly efficient assembly showed the least performance
       | drop whereas complex "idiomatic" Swift code showed the greatest
       | performance drop.
       | 
       | However the Icestorm cores also use substantially less energy so
       | they are an efficiency win regardless. Plus they take up
       | significantly less physical space, which is a large cost saving
       | for the SoC part.
        
         | Filligree wrote:
         | How significantly less, I wonder?
         | 
         | For my workloads it'd be an overall win to have more cores at
         | that speed. The more the better; I'd cap out at maybe a
         | hundred or so.
         | 
         | Obviously Firestorm is better, but a hundred-core desktop CPU
         | at present seems... unlikely.
        
           | maccard wrote:
           | AMD [0] would like a word. It's 64 cores, but 128 threads
           | with hyperthreading.
           | 
           | [0] https://www.amd.com/en/products/cpu/amd-ryzen-
           | threadripper-3...
        
             | Filligree wrote:
             | So, not a hundred-core processor yet.
             | 
             | I can't use hyperthreading. It does give a 60% speed boost,
             | but it's also disabled in production so...
        
         | OskarS wrote:
         | > Highly efficient assembly showed the least performance drop
         | whereas complex "idiomatic" Swift code showed the greatest
         | performance drop.
         | 
         | I wonder what this means. The efficient assembly probably uses
         | fewer instructions overall, with more of them being vector and
         | floating-point operations, while the "idiomatic" Swift probably
         | executes a larger number of instructions that aren't doing
         | heavy calculation. Does that imply that the high-performance
         | cores do much deeper pipelining, but the number of
         | floating-point units or whatever is probably pretty similar
         | across both types?
        
           | simondotau wrote:
           | My initial guess is that it's because Icestorm cores have less
           | L1 and L2 cache, resulting in more frequent cache misses in
           | complex loops. I'm by no means an expert in any of this, so I
           | really have no place hypothesising.
           | 
           | Firestorm has 128KB L1 data cache per core and 12MB shared L2.
           | 
           | Icestorm has 64KB L1 data cache per core and 4MB shared L2.
        
       | webmobdev wrote:
       | big.LITTLE Processing: Defining the Future of SoC Architecture -
       | https://www.samsung.com/semiconductor/minisite/exynos/newsro...
       | 
       | With this CPU design some cores are optimised for performance (at
       | the expense of using more power) while some cores are optimised
       | for efficiency (using the least power at the expense of computing
       | performance). This makes sense for laptops and smartphones, as it
       | can save power and thus run longer when being powered by
       | batteries. But (in my opinion) not for desktop PCs, where most
       | people care more about computing performance than saving a few
       | watts.
        
         | Synaesthesia wrote:
         | Most of the time your PC isn't working hard, and it makes sense
         | to use lower power cores to perform basic tasks.
        
         | simondotau wrote:
         | I'm not sure that you could make a case for this not making
         | sense in a desktop computer, as everything is ultimately a
         | trade-off.
         | 
         | It's fairly clear that the Icestorm cores represent a gain not
         | only in performance per watt, but also in performance per die
         | area. The four Icestorm cores and their support infrastructure
         | take up about the same physical space as one Firestorm core
         | with its support infrastructure.
         | 
         | I doubt that an M1 with five Firestorm cores would perform as
         | well as the eight cores we did get.
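
The area trade-off in this comment can be put into rough numbers. This is a minimal sketch, not a measurement. It assumes the figures quoted in the thread (one Icestorm core delivers between 18% and 52% of a Firestorm core on the benchmark, and four Icestorm cores occupy about the area of one Firestorm core) and it assumes work scales perfectly across cores, which real workloads rarely do. The function name and the "Firestorm-equivalent" unit are invented for illustration.

```python
def firestorm_equivalents(p_cores, e_cores, e_ratio):
    """Aggregate throughput in units of one Firestorm core.

    Assumes perfect scaling across cores (an idealisation).
    """
    return p_cores + e_cores * e_ratio

# Icestorm-to-Firestorm performance ratios quoted in the thread's TLDR
LOW, HIGH = 0.18, 0.52

# The M1 as shipped: 4 Firestorm + 4 Icestorm
actual_low = firestorm_equivalents(4, 4, LOW)
actual_high = firestorm_equivalents(4, 4, HIGH)

# Hypothetical chip of similar area: 5 Firestorm, no Icestorm
hypothetical = firestorm_equivalents(5, 0, LOW)

print(f"4 Firestorm + 4 Icestorm: {actual_low:.2f} to {actual_high:.2f}")
print(f"5 Firestorm, no Icestorm: {hypothetical:.2f}")
```

Under these assumptions the 4+4 layout only beats five Firestorm cores on raw throughput when the ratio is above 0.25, so the throughput case depends on the workload. The energy argument is separate and is not modelled here.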
        
         | m_eiman wrote:
         | Saving watts means lowering fan RPM, meaning less noise. And
         | that's a big priority for many.
        
           | n1000 wrote:
           | Also, aren't desktop CPUs constrained by thermal load at some
           | point or can we use ever bigger coolers? Personally, I find
           | it almost obscene that my desktop PC consumes roughly as much
           | as a good old incandescent lightbulb (60+W) _while idling_.
           | My laptop uses as much under full load.
        
             | Synaesthesia wrote:
             | Your PC uses 60 W idling? Is that with screen? It's not too
             | much in that case. CPUs and GPUs have gotten a lot better
             | at idle power consumption, and PSUs are also quite
             | efficient these days.
        
           | Roritharr wrote:
           | Not only that, I'd prefer to have a small amount of RAM and
           | CPU running 24/7 for always-on features that I'd love to have
           | my PC doing.
           | 
           | I don't like having to run a Pi for some stuff just because I
           | don't want my huge tower running all the time. It would be
           | really neat if it could run at anything between 5 and 600 W,
           | though I'm not sure the PSUs would be able to offer that range.
        
       | shantara wrote:
       | I didn't realize map-reduce was so much slower than a regular
       | looped multiplication, regardless of the hardware the code was
       | running on.
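
For readers who have not seen the two styles side by side, here is a minimal sketch. It is written in Python rather than the article's Swift, so it says nothing about the M1 or about how well Swift's optimiser handles either form. It only shows the shape of the comparison: an explicit accumulate loop versus a map-then-reduce pipeline computing the same dot product. The function names are invented for illustration.

```python
from functools import reduce
from operator import add, mul

def dot_loop(a, b):
    """Dot product with an explicit loop and a running total."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_map_reduce(a, b):
    """Same result: map element pairs to products, then reduce with addition."""
    return reduce(add, map(mul, a, b), 0.0)

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(dot_loop(a, b), dot_map_reduce(a, b))  # prints "70.0 70.0"
```

Whether the pipeline form ends up slower depends on how well the compiler or runtime fuses the map and reduce steps and removes the per-element function calls.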
        
       | codetrotter wrote:
       | If the author is here and able to do so, I'd much appreciate it
       | if they would share the complete code for the benchmarking as a
       | whole, so that others may use it for benchmarking other code in
       | the same way :)
        
       ___________________________________________________________________
       (page generated 2021-09-01 10:00 UTC)