hngopher.com

       [HN Gopher] Ampere Altra 80-core ARM CPU
       ___________________________________________________________________
        
       Ampere Altra 80-core ARM CPU
        
       Author : cameron_b
       Score  : 130 points
       Date   : 2020-03-03 15:14 UTC (7 hours ago)
        
 (HTM) web link (www.servethehome.com)
 (TXT) w3m dump (www.servethehome.com)
        
       | cameron_b wrote:
       | 192 PCIe Gen4 lanes in 2P platforms - looks like they're
       | optimizing for next-gen storage bandwidth or potentially GPU /
       | TPU integrations. This could be interesting from a company that's
       | been busy working on their B-to-B sales, hopefully solving for
       | problems that cloud platform providers actually have.
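
For a sense of scale, here is a back-of-the-envelope sketch of what 192 PCIe Gen4 lanes amount to. It uses the standard Gen4 figures (16 GT/s per lane, 128b/130b encoding) and gives a theoretical per-direction ceiling before packet overhead, not a measured number for Altra. The function name is made up for this illustration.

```python
GT_PER_S = 16.0        # PCIe Gen4 raw signalling rate per lane, in GT/s
ENCODING = 128 / 130   # 128b/130b line encoding efficiency

def pcie4_gbytes_per_s(lanes):
    """Theoretical one-direction bandwidth in GB/s, before packet overhead."""
    return lanes * GT_PER_S * ENCODING / 8  # divide by 8: bits -> bytes

x16_slot = pcie4_gbytes_per_s(16)     # one x16 device
platform = pcie4_gbytes_per_s(192)    # the 2P lane count from the comment

print(f"x16 link:  {x16_slot:.1f} GB/s per direction")
print(f"192 lanes: {platform:.1f} GB/s per direction")
```

Real throughput will be lower once protocol overhead and device limits are taken into account.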
        
       | eoerl wrote:
       | the key issue when compared to Epyc is that this is mono-die, and
       | not much faster (even with metrics straight from Ampere). Mono-
       | die means that the die is huge, the yield is low, it's probably
       | pretty expensive to produce (and the reason why they went for
       | 32MB cache, well below Arm's recommendations, core count is a
       | bigger seller than cache it seems). Unless they get massively
       | better performance (they don't), this has no chance vs a multi-
       | die solution which has a much better yield. Intel is cornered in
       | a similar situation right now. The same applies to Graviton, this
       | stands absolutely no chance in the long run.
       | 
       | Not saying that the future has to be multi-die, but if it is not,
       | then it has to be way faster than the cheaper-to-manufacture
       | competition.
        
         | baybal2 wrote:
         | ARM cores are tens of times smaller than X86, plus very likely
         | they fab it on a more mature node
        
           | floatboth wrote:
           | No, no way the Neoverse N1 is _tens_ of times smaller than
           | Zen 2, maybe a little bit smaller. And both are on TSMC's
           | latest process.
        
         | ksec wrote:
         | >the key issue when compared to Epyc is that this is mono-die,
         | 
         | This die-cost metric is way overblown, and the narrative is too
         | narrowly focused, especially for ARM servers, where the unit-cost
         | dynamics of ARM IP and the much higher margins on server CPUs
         | reduce the BOM benefits of multi-die. And the same definitely
         | does not apply to Graviton, where Amazon owns the whole stack.
        
           | eoerl wrote:
           | Amazon owns nothing, not the ARM IP nor the manufacturing
           | chain (TSMC or Samsung most probably), in that field it's not
           | a big player. It owns what it does with Graviton, that's
           | pretty much it.
           | 
           | Otherwise, yield obviously counts: it is what stands in the way
           | of this CPU having more cache or 160 cores, so it has to count
           | for something. The multiple tiers in every CPU manufacturer's
           | lineup are also a consequence of yield, so it's very much not
           | a minor element of the equation.
        
         | msandford wrote:
         | Huge die doesn't really matter if you have the ability to
         | suffer defects on it and still turn out a quality product.
         | 
         | If they put 100 cores on every die but only activate 80 of them
         | then that means they can tolerate absolutely HORRIBLE per-
         | processor yields and still make chips that work. Their yields
         | could actually be BETTER than with chiplets because they can
         | afford so many problems.
         | 
         | Not saying that this is true, BTW, just that it's theoretically
         | and practically possible.
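
The spare-core argument above can be sketched with a toy binomial model. Everything here is an assumption made for illustration: cores are treated as failing independently, defects outside the cores are ignored, and the 0.98 per-core survival probability is invented, not Ampere or TSMC data. Nothing in the thread confirms that Altra actually carries spare cores.

```python
import math

def sellable_fraction(total_cores, needed_cores, p_core_good):
    """Probability that a die has at least `needed_cores` working cores.

    Toy model (an assumption, not vendor data): every core works
    independently with probability `p_core_good`, and defects that
    land outside the cores are ignored.
    """
    p_bad = 1.0 - p_core_good
    return sum(
        math.comb(total_cores, k) * p_core_good**k * p_bad**(total_cores - k)
        for k in range(needed_cores, total_cores + 1)
    )

# Hypothetical per-core survival probability, chosen only for illustration.
P_CORE_GOOD = 0.98

no_spares = sellable_fraction(80, 80, P_CORE_GOOD)     # all 80 must work
with_spares = sellable_fraction(100, 80, P_CORE_GOOD)  # any 80 of 100 will do

print(f"80 of 80 cores good:  {no_spares:.3f}")
print(f"80 of 100 cores good: {with_spares:.3f}")
```

The model leaves out the cost side: a die with 100 cores is larger, so fewer fit on a wafer, which is the trade-off the parent comment is pointing at.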
        
       | imtringued wrote:
       | How long until they shut it down? ARM vendors are infamous for
       | quitting before releasing their products.
        
       | jnwatson wrote:
       | "What we will note is that Ampere de-rated both the AMD EPYC 7742
       | and Xeon Platinum 8280 results by 16.5% and 24% respectively.
       | This was done to adjust for using GCC versus AOCC2.0 and ICC
       | 19.0.1.144. Ampere disclosed this, and it is a big impact. Arm
       | servers tend to use GCC as the compiler while there are more
       | optimized compilers out there for AMD and Intel."
       | 
       | If I read this right, they reduce their competitors' benchmarks
       | because they have better compilers? Can anyone justify this?
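
To make the arithmetic concrete, here is a small sketch of what those derating factors do to a score. It assumes "de-rated by X%" means the published score was multiplied by (1 - X/100), which the quote does not spell out. The published score of 100 is a placeholder, not a real SPEC result, and the function names are made up for this example.

```python
def derate(published_score, percent):
    """Scale a published score down by `percent`.

    Assumption: "de-rated by X%" means multiplying by (1 - X/100).
    """
    return published_score * (1.0 - percent / 100.0)

def implied_compiler_gain(percent):
    """Speedup over GCC that the vendor compiler would need to provide
    for the derated number to equal a true GCC result."""
    return 1.0 / (1.0 - percent / 100.0)

# Derating percentages as quoted from the article.
DERATING = {"AMD EPYC 7742": 16.5, "Xeon Platinum 8280": 24.0}

# Hypothetical published score, normalized to 100 for both chips.
PUBLISHED = 100.0

for chip, pct in DERATING.items():
    print(f"{chip}: {PUBLISHED:.1f} -> {derate(PUBLISHED, pct):.1f} "
          f"(implies vendor compiler is {implied_compiler_gain(pct):.2f}x GCC)")
```

Whether the vendor compilers really deliver gains of that size on these chips is exactly what a direct GCC run on all three platforms would show.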
        
         | p1necone wrote:
         | Very little software is actually compiled with AOCC and ICC.
         | Really Intel and AMD are being dishonest by publishing
         | benchmarks that don't match reality. Of course it's different
         | if you're compiling everything yourself, then those benchmarks
         | might be relevant.
        
           | adev_ wrote:
           | > It's different if you're compiling everything yourself,
           | then those benchmarks might be relevant.
           | 
           | And even if you do it's irrelevant. Most common
           | large/important frameworks won't compile with proprietary
           | compilers.
           | 
           | Doing benchmarks with ICC, XLC or others is hypocritical and
           | often does not reflect anything useful.
           | 
           | Only the HPC world can afford to recompile everything with
           | proprietary compilers and justify the manpower to do so. And
           | even so, they already moved most compute-intensive kernels
           | to GPGPU with CUDA a long time ago.
        
             | pierrebai wrote:
             | No, the correct thing to do is to publish multiple columns
             | showing performance under the different compilers. Large
             | companies do use the specific compiler that gives them
             | better performance. I know of many big software packages
             | compiled with ICC. Then there is the lack of talk about
             | MSVC, a standard compiler that is used extensively.
             | 
             | In benchmarking you have two choices: publish the real
             | numbers or not. Which option you choose marks you as honest
             | or not.
             | 
             | You can argue about why the numbers for your product are
             | lower in the discussion section of your report. Not in an
             | asterisk.
        
             | pjmlp wrote:
             | IBM, Intel and PGI reign on HPC.
        
         | wmf wrote:
         | If you assume that most real customers use a regular compiler
         | like GCC or Clang then benchmarks using tuned compilers like
         | AOCC (never heard of it before today) or ICC are
         | unrepresentative. However, the proper way to make such a
         | comparison would be to run benchmarks using GCC on all the
         | chips, not to apply magic derating factors. Shame on Ampere for
         | such voodoo benchmarketing.
         | 
         | Oh, and don't miss the 3.0 vs. 3.3 GHz.
        
         | Patrick-STH wrote:
         | GCC is the open-source compiler that is used all over. Not
         | everyone running an AMD EPYC system is using AOCC2.0. Likewise,
         | not all code that runs on Intel Xeons is compiled with ICC.
         | Arm has focused efforts on getting optimizations in GCC
         | because it is so popular.
         | 
         | Generally, that is why we prefer to publish "compiler
         | optimized" as best-case performance as well as "GCC" as more of
         | the least common denominator. Both sets of data points are
         | important.
         | 
         | Official SPECint published numbers will not use GCC because the
         | organizations that submit them always want to see the best
         | performance. Ampere used a scaling factor off of published
         | numbers.
         | 
         | If you want to see the impact, we have some numbers from my
         | ThunderX2 review:
         | https://www.servethehome.com/cavium-thunderx2-review-benchma...
         | 
         | You can see the impact clearly there even though that was from
         | a few years ago. Cray has a better performing compiler for
         | ThunderX2 but we did not get to use it due to licensing
         | restrictions.
         | 
         | I hope that helps. The bigger need is for more data since this
         | is one view of performance. There are other needs as well such
         | as FP performance.
        
           | ksec wrote:
           | Nice to know both Patrick and Ian ( Anandtech ) are on HN :)
           | 
           | When could we expect a review on Altra?
        
             | Patrick-STH wrote:
             | Ha! Sometimes it surprises people that we know each other
             | and hang out a bit when we are in the same town.
             | 
             | On an Altra review. Great question. I have been bringing it
             | up for some time and live 15 minutes from their
             | headquarters. The invitation is open on our end.
        
       | m0zg wrote:
       | Looks like a company by a bunch of ex-Intel people. How are they
       | doing on Spectre/Meltdown and other bugs caused by the culture of
       | cutting corners?
        
         | inputError wrote:
         | It's run by the same lady that made Intel buy McAfee in 2010
         | for several billion. Years after it was already a dumpster
         | fire. Yeah, I'm staying away from their products.
        
         | wmf wrote:
         | The N1 core was designed by Arm Austin. It includes "The traps
         | for EL1 and EL0 cache controls, PSTATE SSBS (Speculative Store
         | Bypass Safe) bit that supports software mitigation for Spectre
         | Variant 4, and the speculation barriers (CSDB, SSBB, PSSBB)
         | instructions..."
        
       | tmikaeld wrote:
       | I'm curious how this will perform vs AMD's Epyc line in terms of
       | performance per Watt on different workloads.
        
         | drewg123 wrote:
         | It depends on the workload. We tried the Ampere eMAG, and what
         | killed it for us was that TLS performance was nowhere near
         | modern x86-64 CPUs (Intel or AMD).
        
           | cameron_b wrote:
           | did you use them on Packet?
        
           | Rebelgecko wrote:
           | Was that heavily cipher dependent? I wouldn't be surprised if
           | Chacha20 performed much better than AES w/o any hardware
           | acceleration (other than SIMD instructions)
        
             | drewg123 wrote:
             | Ah, that's interesting. We don't use chacha20. This was
             | AES-GCM
        
               | Rebelgecko wrote:
               | The situation is probably better now that ARMv8 has some
               | crypto-specific instructions, but AES-GCM on older ARMs
               | performed awfully without the instructions specifically
               | for doing AES and Galois field multiplication
        
           | magicalhippo wrote:
           | As a n00b, is that down to poor or lacking encryption
           | hardware support ala AES-NI? Or something else?
        
             | drewg123 wrote:
             | Yes.
        
               | floatboth wrote:
               | You did make sure the AES instructions were used, right?
               | I wouldn't be surprised if the AES unit on the eMAG is
               | relatively slow -- other units seem to be as well, e.g.
               | in my silly CRC32 benchmark, the Arm Cortex-A72 did 1kb
               | in 79 ns at just 2.0 GHz while the eMAG did it in 103 ns
               | at much faster clocks (3.0 or 3.3 GHz).
               | 
               | I suspect they might have reused these HW blocks from the
               | old Applied Micro X-Gene :D
               | 
               | But now on the new product, it's all Arm Neoverse cores,
               | it's gonna be great.
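
The CRC32 numbers above are easier to compare once normalized by clock speed. This sketch assumes "1kb" means 1024 bytes, which the comment does not state, and evaluates the eMAG at both quoted clocks because the comment leaves open which one applied. The function name is made up for this example.

```python
def bytes_per_cycle(n_bytes, time_ns, clock_ghz):
    """Throughput normalized by clock: bytes processed per CPU cycle.

    time_ns * clock_ghz is the cycle count, since 1 GHz is 1 cycle per ns.
    """
    return n_bytes / (time_ns * clock_ghz)

# Assumption: "1kb" in the comment means 1024 bytes.
BUF_BYTES = 1024

a72 = bytes_per_cycle(BUF_BYTES, 79, 2.0)          # Cortex-A72 at 2.0 GHz
emag_low = bytes_per_cycle(BUF_BYTES, 103, 3.0)    # eMAG, if it ran at 3.0 GHz
emag_high = bytes_per_cycle(BUF_BYTES, 103, 3.3)   # eMAG, if it ran at 3.3 GHz

print(f"Cortex-A72:      {a72:.2f} bytes/cycle")
print(f"eMAG at 3.0 GHz: {emag_low:.2f} bytes/cycle")
print(f"eMAG at 3.3 GHz: {emag_high:.2f} bytes/cycle")
```

Under these assumptions the A72 gets through roughly twice as many bytes per cycle as the eMAG, which fits the suspicion that the eMAG's unit is slow.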
        
       | mmoez wrote:
       | Growing sick of the trend of naming companies after famous
       | scientists...
        
         | monocasa wrote:
         | You know that's a unit of electricity as well, right?
        
           | new_realist wrote:
           | A unit of electricity named after a famous scientist.
        
             | monocasa wrote:
             | And?
        
               | znpy wrote:
               | and he/she is growing sick of that, apparently
        
               | DannyB2 wrote:
               | A "he" apparently...
               | https://en.wikipedia.org/wiki/Andr%C3%A9-Marie_Amp%C3%A8re
               | 
               | He would be spinning in his grave. Which would generate
               | an AC current.
        
               | posterboy wrote:
               | He would be well grounded, though, having no effective
               | voltage. Bad joke, although it has potential.
        
               | Symmetry wrote:
               | Just as good as using famous authors I suppose
               | 
               | http://dresdencodak.com/2010/06/03/dark-science-01/
        
         | myself248 wrote:
         | Just as soon as my Kardashian is charged up, I'll swing by your
         | place to drop off the new samples of the 128-core Bieber X.
        
         | derision wrote:
         | Why? What should they name it after instead?
        
           | posterboy wrote:
           | innovation, not copying anyone?
        
             | derision wrote:
             | There's billions of products in existence. I don't think
             | each one having an "innovative" name provides any value at
             | all. And copying is the highest form of flattery. If I had
             | spent my life researching electricity and found out the
             | most innovative company in the electric vehicle and battery
             | industry was named after me I would feel quite proud
        
             | [deleted]
        
       | eecc wrote:
       | What's the point of this wall of text without some hardware
       | porn?
       | 
       | ;)
        
       | rwmj wrote:
       | For context this is an evolution of the Applied Micro X-gene (I
       | believe this is the 3rd generation). The 1st gen was the famous
       | Mustang, one of the first Aarch64 chips generally available that
       | ran Linux. I still have one in my loft somewhere.
       | 
       | Edit: I should note that if you used the X-gene 1 it was very
       | slow, albeit a reliable workhorse for early 64-bit ARM Linux
       | development. These newer chips have far better performance.
        
         | cameron_b wrote:
         | The meandering paths that these different processor families
         | take are interesting on their own. eMAG and the ThunderX2 are
         | expensive to develop but so promising that the product seems to
         | find a new gear even when one company runs out of steam
         | developing it. That or they find a new C-Suite group that has
         | the same opinion as me.
        
         | wmf wrote:
         | Ampere eMAG was X-Gene 3 but Altra appears to have little or no
         | X-Gene IP in it.
        
         | ksec wrote:
         | Thank you. There were so many ARM server startups, mergers and
         | acquisitions that I sort of lost count. I think we are left
         | with Ampere and one from Marvell.
         | 
         | I sort of recall Applied Micro was doing POWER as well. Is
         | that still the case with Ampere?
        
           | floatboth wrote:
           | Yes, seems like Ampere is the big public server player now.
           | 
           | Marvell bought Cavium which has the ThunderX line. (ThunderX2
           | being a rather HPC-oriented chip, I think there's a
           | supercomputer already built with it.) Marvell also makes
           | networking-gear-oriented smaller chips (e.g. Armada 8k), one
           | of which is in my little ARM Desktop (MACCHIATObin) :)
           | 
           | NXP (Layerscape) and Mellanox (BlueField) also make network-
           | oriented chips that have around 24 Cortex-A72 cores. NXP's is
           | in SolidRun's newer workstation product.
           | 
           | Meanwhile Amazon bought Annapurna Labs and they make the
           | Graviton (2) for the AWS cloud. This isn't something you can
           | touch physically but it's going to have the biggest impact of
           | all things. This is the real confirmation that Arm servers
           | are legit and the x86/amd64 monopoly is over.
           | 
           | There's also Huawei HiSilicon's Taishan/Kunpeng stuff, which
           | you apparently can buy if you're a serious business, and now
           | it's also available in the public Huawei Cloud, though only
           | for the Chinese region it seems??
           | 
           | Oh and Fujitsu is making some epic chip with HBM2 memory and
           | the new Scalable Vector Extensions. But that's only available
           | if you're making supercomputers.
           | 
           | And Nuvia is going to be a thing eventually.. they have not
           | announced anything yet, we have no idea which ISA they are
           | even going to use (could be RISC-V or POWER or SPARC for all
           | we know) but a prominent UEFI/ACPI-on-Arm person is now their
           | VP of Software and is still referring to the Arm ecosystem as
           | "we"
           | https://twitter.com/jonmasters/status/1234734345350369281 :)
           | 
           | And yeah.. press F to pay respects for Qualcomm Centriq and
           | AMD Seattle.
        
             | ksec wrote:
             | >And Nuvia is going to be a thing eventually..
             | 
             | According to TechCrunch, they have confirmed it will be
             | built on top of ARM.
             | 
             | Edit: That is assuming they sort out their lawsuit with
             | Apple.
             | 
             | [1]
             | https://techcrunch.com/2019/11/15/three-of-apple-and-googles...
        
         | eb0la wrote:
         | I see Ampere eMag is supported by OpenBSD
         | (https://www.openbsd.org/arm64.html) :-)
         | 
         | It had "only" 32 cores. I still find that a lot.
         | 
         | I guess it would be easy to port OpenBSD to the Altra since it
         | boots from UEFI.
        
           | floatboth wrote:
           | Yes. There is no "porting" with new SBSA/SBBR systems -- only
           | fixing bugs and/or adding quirks :)
           | 
           | On FreeBSD for the eMAG, we've had to:
           | 
           | - ignore a wrong value for UART access width
           | https://svnweb.freebsd.org/base?view=revision&revision=34622...
           | (IIRC Ampere did fix the value in the newer FW revisions)
           | 
           | - restore another register after calling EFI runtime services
           | https://svnweb.freebsd.org/base?view=revision&revision=34699...
           | 
           | - fix some PCIe things we were doing wrong
           | https://svnweb.freebsd.org/base?view=revision&revision=34792...
           | https://svnweb.freebsd.org/base?view=revision&revision=34793...
           | 
           | - fix some memory map things
           | https://svnweb.freebsd.org/base?view=revision&revision=34958...
        
       | wmf wrote:
       | STH has the slides and some analysis:
       | https://www.servethehome.com/ampere-altra-80-arm-cores-for-c...
        
         | ksec wrote:
         | Yes, was about to post that, much better than the PR. Anandtech
         | also has its take on it as well.
         | 
         | https://www.anandtech.com/show/15575/amperes-altra-80-core-n...
        
         | dang wrote:
         | Ok, we've changed to that from
         | https://amperecomputing.com/ampere-altra-industrys-first-80-....
         | Thanks!
        
           | Patrick-STH wrote:
           | Hi dang. I know you did not have to do that but I just wanted
           | to say from the STH team we appreciate it. Have a great day.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2020-03-03 23:00 UTC)