[HN Gopher] Ampere Altra 80-core ARM CPU
___________________________________________________________________
Ampere Altra 80-core ARM CPU

Author : cameron_b
Score  : 130 points
Date   : 2020-03-03 15:14 UTC (7 hours ago)

(HTM) web link (www.servethehome.com)
(TXT) w3m dump (www.servethehome.com)

| cameron_b wrote:
| 192 PCIe Gen4 lanes in 2P platforms - looks like they're optimizing for next-gen storage bandwidth or potentially GPU / TPU integrations. This could be interesting from a company that's been busy working on their B-to-B sales, hopefully solving for problems that cloud platform providers actually have.

| eoerl wrote:
| The key issue when compared to Epyc is that this is mono-die, and not much faster (even with metrics straight from Ampere). Mono-die means that the die is huge, the yield is low, and it's probably pretty expensive to produce (and the reason why they went for 32MB cache, well below Arm's recommendations; core count is a bigger seller than cache, it seems). Unless they get massively better performance (they don't), this has no chance vs a multi-die solution, which has a much better yield. Intel is cornered in a similar situation right now. The same applies to Graviton; this stands absolutely no chance in the long run.
|
| Not saying that the future has to be multi-die, but if it is not, then it has to be way faster than the cheaper-to-manufacture competition.

| baybal2 wrote:
| ARM cores are tens of times smaller than X86, plus very likely they fab it on a more mature node

| floatboth wrote:
| No, no way the Neoverse N1 is _tens_ of times smaller than Zen 2, maybe a little bit smaller. And both are on TSMC's latest process.

| ksec wrote:
| >the key issue when compared to Epyc is that this is mono-die,
|
| This die cost metric is way overblown and its narrative is too narrowly focused.
| Especially on ARM Server, where unit cost dynamics with ARM IP, along with much higher margin on server CPUs, lower the multi-die BOM benefits. And the same definitely does not apply to Graviton, for which Amazon owns the whole stack.

| eoerl wrote:
| Amazon owns nothing, not the ARM IP nor the manufacturing chain (TSMC or Samsung most probably); in that field it's not a big player. It owns what it does with Graviton, that's pretty much it.
|
| Else yield obviously counts; that's what stands in the way of this CPU having more cache or 160 cores, for what it's worth, so it has to count for something obviously. The multiple tiers in every CPU manufacturer's lineup are also a consequence of yield, so it's very much not a minor element of the equation

| msandford wrote:
| Huge die doesn't really matter if you have the ability to suffer defects on it and still turn out a quality product.
|
| If they put 100 cores on every die but only activate 80 of them then that means they can tolerate absolutely HORRIBLE per-processor yields and still make chips that work. Their yields could actually be BETTER than with chiplets because they can afford so many problems.
|
| Not saying that this is true, BTW, just that it's theoretically and practically possible.

| imtringued wrote:
| How long until they shut it down? ARM vendors are infamous for quitting before releasing their products.

| jnwatson wrote:
| "What we will note is that Ampere de-rated both the AMD EPYC 7742 and Xeon Platinum 8280 results by 16.5% and 24% respectively. This was done to adjust for using GCC versus AOCC2.0 and ICC 19.0.1.144. Ampere disclosed this, and it is a big impact. Arm servers tend to use GCC as the compiler while there are more optimized compilers out there for AMD and Intel."
|
| If I read this right, they reduce their competitors' benchmarks because they have better compilers? Can anyone justify this?
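
msandford's spare-core argument earlier in the thread can be put into rough numbers. The following is only a sketch under loud assumptions: every core fails independently, the per-core yield of 90% is a made-up figure, the 100-core die is msandford's own hypothetical, and defects outside the cores (cache, mesh, I/O) are ignored. The function name `prob_sellable` is invented for this illustration.

```python
from math import comb


def prob_sellable(total_cores: int, needed_cores: int, core_yield: float) -> float:
    """Probability that at least `needed_cores` of `total_cores` work.

    Assumes each core is defect-free independently with probability
    `core_yield` (a simplification; real defects cluster and also hit
    non-core logic).
    """
    return sum(
        comb(total_cores, k)
        * core_yield**k
        * (1.0 - core_yield) ** (total_cores - k)
        for k in range(needed_cores, total_cores + 1)
    )


# Hypothetical per-core yield, chosen only to make the contrast visible.
CORE_YIELD = 0.90

# Die with exactly 80 cores: all 80 must be good.
no_spares = prob_sellable(80, 80, CORE_YIELD)

# Die with 100 cores: any 80 good cores make a sellable part.
with_spares = prob_sellable(100, 80, CORE_YIELD)

print(f"80 of 80 good:  {no_spares:.6f}")
print(f"80 of 100 good: {with_spares:.6f}")
```

Under these assumptions the die with spare cores is sellable far more often, which is the shape of msandford's point. The sketch leaves out the cost of the extra silicon for the 20 spare cores, so it does not settle the mono-die versus chiplet question either way.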
| p1necone wrote:
| Very little software is actually compiled with AOCC and ICC. Really Intel and AMD are being dishonest by publishing benchmarks that don't match reality. Of course it's different if you're compiling everything yourself, then those benchmarks might be relevant.

| adev_ wrote:
| > It's different if you're compiling everything yourself, then those benchmarks might be relevant.
|
| And even if you do it's irrelevant. Most common large/important frameworks won't compile with proprietary compilers.
|
| Doing benchmarks with ICC, XLC or others is hypocritical and often does not reflect anything useful.
|
| Only the HPC world can afford to recompile everything with proprietary compilers and justify the manpower to do so. And even so, they already moved most compute-intensive kernels to GPGPU with CUDA a long time ago.

| pierrebai wrote:
| No, the correct thing to do is to publish multiple columns showing performance under the different compilers. Large companies do use the specific compiler that will give them better performance. I know a lot of big software compiled with ICC. Then, there is the lack of talk about MSVC. That's one standard compiler used extensively.
|
| In benchmarking you have two choices: publish the real numbers or not. Which option you choose marks you as honest or not.
|
| You can argue about why the numbers for your product are lower in the discussion section of your report. Not in an asterisk.

| pjmlp wrote:
| IBM, Intel and PGI reign in HPC.

| wmf wrote:
| If you assume that most real customers use a regular compiler like GCC or Clang then benchmarks using tuned compilers like AOCC (never heard of it before today) or ICC are unrepresentative. However, the proper way to make such a comparison would be to run benchmarks using GCC on all the chips, not to apply magic derating factors. Shame on Ampere for such voodoo benchmarketing.
|
| Oh, and don't miss the 3.0 vs. 3.3 GHz.
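
For readers unsure what the de-rating quoted by jnwatson amounts to, here is a minimal sketch. It assumes "de-rated by X%" means multiplying the published score by (1 - X/100); the article excerpt does not spell out the formula, so that reading is an assumption. The score of 100.0 is a placeholder, not a real SPEC result, and the function name `derate` is invented here.

```python
def derate(published_score: float, percent: float) -> float:
    """Scale a published score down by `percent` percent.

    Assumption: de-rating is a plain multiplicative haircut,
    score * (1 - percent / 100).
    """
    return published_score * (1.0 - percent / 100.0)


# Placeholder score; only the percentages come from the quoted article.
PLACEHOLDER_SCORE = 100.0

epyc_7742 = derate(PLACEHOLDER_SCORE, 16.5)  # AOCC 2.0 result adjusted toward GCC
xeon_8280 = derate(PLACEHOLDER_SCORE, 24.0)  # ICC result adjusted toward GCC

print(f"EPYC 7742 keeps {epyc_7742:.1f}% of its published score")
print(f"Xeon 8280 keeps {xeon_8280:.1f}% of its published score")
```

This is the adjustment wmf objects to: it rescales vendor-submitted numbers instead of measuring all three chips with the same compiler.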
| Patrick-STH wrote:
| GCC is the open-source compiler that is used all over. People are not using AOCC2.0 on every AMD EPYC system. Likewise, people do not only use ICC to compile the code that runs on Intel Xeons. Arm has focused efforts on getting optimizations in GCC because it is so popular.
|
| Generally, that is why we prefer to publish "compiler optimized" as best-case performance as well as "GCC" as more of the lowest common denominator. Both sets of data points are important.
|
| Official SPECint published numbers will not use GCC because the organizations that submit them always want to see the best performance. Ampere used a scaling factor off of published numbers.
|
| If you want to see the impact, we have some numbers from my ThunderX2 review: https://www.servethehome.com/cavium-thunderx2-review-benchma...
|
| You can see the impact clearly there even though that was from a few years ago. Cray has a better performing compiler for ThunderX2 but we did not get to use it due to licensing restrictions.
|
| I hope that helps. The bigger need is for more data since this is one view of performance. There are other needs as well such as FP performance.

| ksec wrote:
| Nice to know both Patrick and Ian ( Anandtech ) are on HN :)
|
| When could we expect a review on Altra?

| Patrick-STH wrote:
| Ha! Sometimes it surprises people that we know each other and hang out a bit when we are in the same town.
|
| On an Altra review. Great question. I have been bringing it up for some time and live 15 minutes from their headquarters. The invitation is open on our end.

| m0zg wrote:
| Looks like a company by a bunch of ex-Intel people. How are they doing on Spectre/Meltdown and other bugs caused by the culture of cutting corners?

| inputError wrote:
| It's run by the same lady that made Intel buy McAfee in 2010 for several billion. Years after it was already a dumpster fire. Yeah, I'm staying away from their products.
| wmf wrote:
| The N1 core was designed by Arm Austin. It includes "The traps for EL1 and EL0 cache controls, PSTATE SSBS (Speculative Store Bypass Safe) bit that supports software mitigation for Spectre Variant 4, and the speculation barriers (CSDB, SSBB, PSSBB) instructions..."

| tmikaeld wrote:
| I'm curious how this will perform vs AMD's Epyc line in terms of performance per Watt on different workloads.

| drewg123 wrote:
| It depends on the workload. We tried the Ampere eMAG, and what killed it for us was that TLS performance was nowhere near modern x86-64 CPUs (Intel or AMD)

| cameron_b wrote:
| did you use them on Packet?

| Rebelgecko wrote:
| Was that heavily cipher dependent? I wouldn't be surprised if Chacha20 performed much better than AES w/o any hardware acceleration (other than SIMD instructions)

| drewg123 wrote:
| Ah, that's interesting. We don't use chacha20. This was AES-GCM

| Rebelgecko wrote:
| The situation is probably better now that ARMv8 has some crypto-specific instructions, but AES-GCM on older ARMs performed awfully without the instructions specifically for doing AES and Galois field multiplication

| magicalhippo wrote:
| As a n00b, is that down to poor or lacking encryption hardware support ala AES-NI? Or something else?

| drewg123 wrote:
| Yes.

| floatboth wrote:
| You did make sure the AES instructions were used, right? I wouldn't be surprised if the AES unit on the eMAG is relatively slow -- other units seem to be as well, e.g. in my silly CRC32 benchmark, the Arm Cortex-A72 did 1kb in 79 ns at just 2.0 GHz while the eMAG did it in 103 ns at much faster clocks (3.0 or 3.3 GHz).
|
| I suspect they might have reused these HW blocks from the old Applied Micro X-Gene :D
|
| But now on the new product, it's all Arm Neoverse cores, it's gonna be great.

| mmoez wrote:
| Growing sick of the trend of naming companies after famous scientists...
| monocasa wrote:
| You know that's a unit of electricity as well, right?

| new_realist wrote:
| A unit of electricity named after a famous scientist.

| monocasa wrote:
| And?

| znpy wrote:
| and he/she is growing sick of that, apparently

| DannyB2 wrote:
| A "he" apparently... https://en.wikipedia.org/wiki/Andr%C3%A9-Marie_Amp%C3%A8re
|
| He would be spinning in his grave. Which would generate an AC current.

| posterboy wrote:
| He would be well grounded though, having no effective voltage. Bad joke, although it has potential.

| Symmetry wrote:
| Just as good as using famous authors I suppose
|
| http://dresdencodak.com/2010/06/03/dark-science-01/

| myself248 wrote:
| Just as soon as my Kardashian is charged up, I'll swing by your place to drop off the new samples of the 128-core Bieber X.

| derision wrote:
| Why? What should they name it after instead?

| posterboy wrote:
| innovation, not copying anyone?

| derision wrote:
| There are billions of products in existence. I don't think each one having an "innovative" name provides any value at all. And copying is the highest form of flattery. If I had spent my life researching electricity and found out the most innovative company in the electric vehicle and battery industry was named after me I would feel quite proud

| [deleted]

| eecc wrote:
| What's the point of this wall of text without some hardware-porn?
|
| ;)

| rwmj wrote:
| For context this is an evolution of the Applied Micro X-gene (I believe this is the 3rd generation). The 1st gen was the famous Mustang, one of the first Aarch64 chips generally available that ran Linux. I still have one in my loft somewhere.
|
| Edit: I should note that if you used the X-gene 1 it was very slow, albeit a reliable workhorse for early 64-bit ARM Linux development. These newer chips have far better performance.

| cameron_b wrote:
| The meandering paths that these different processor families take are interesting on their own.
| eMAG and the ThunderX2 are expensive to develop but so promising that the product seems to find a new gear even when one company runs out of steam developing it. That or they find a new C-Suite group that has the same opinion as me.

| wmf wrote:
| Ampere eMAG was X-Gene 3 but Altra appears to have little or no X-Gene IP in it.

| ksec wrote:
| Thank you, there were so many ARM server startups, mergers and acquisitions that I sort of lost count. I think we are left with Ampere and one from Marvell.
|
| I sort of recall Applied Micro were doing POWER as well, is that still the case with Ampere?

| floatboth wrote:
| Yes, seems like Ampere is the big public server player now.
|
| Marvell bought Cavium which has the ThunderX line. (ThunderX2 being a rather HPC-oriented chip, I think there's a supercomputer already built with it.) Marvell also makes networking-gear-oriented smaller chips (e.g. Armada 8k), one of which is in my little ARM Desktop (MACCHIATObin) :)
|
| NXP (Layerscape) and Mellanox (BlueField) also make network-oriented chips that have around 24 Cortex-A72 cores. NXP's is in SolidRun's newer workstation product.
|
| Meanwhile Amazon bought Annapurna Labs and they make the Graviton (2) for the AWS cloud. This isn't something you can touch physically but it's going to have the biggest impact of all things. This is the real confirmation that Arm servers are legit and the x86/amd64 monopoly is over.
|
| There's also Huawei HiSilicon's Taishan/Kunpeng stuff, which you apparently can buy if you're a serious business, but now it's available in the public Huawei Cloud, but only for the Chinese region it seems??
|
| Oh and Fujitsu is making some epic chip with HBM2 memory and the new Scalable Vector Extensions. But that's only available if you're making supercomputers.
|
| And Nuvia is going to be a thing eventually..
| they have not announced anything yet, we have no idea which ISA they are even going to use (could be RISC-V or POWER or SPARC for all we know) but a prominent UEFI/ACPI-on-Arm person is now their VP of Software and is still referring to the Arm ecosystem as "we" https://twitter.com/jonmasters/status/1234734345350369281 :)
|
| And yeah.. press F to pay respects for Qualcomm Centriq and AMD Seattle.

| ksec wrote:
| >And Nuvia is going to be a thing eventually..
|
| According to techcrunch they have confirmed it will be built on top of ARM.
|
| Edit: That is assuming they sort out their lawsuit with Apple.
|
| [1] https://techcrunch.com/2019/11/15/three-of-apple-and-googles...

| eb0la wrote:
| I see Ampere eMAG is supported by OpenBSD (https://www.openbsd.org/arm64.html) :-)
|
| It had "only" 32 cores. I still find it a lot.
|
| I guess it would be easy to port OpenBSD to the Altra since it boots from UEFI.

| floatboth wrote:
| Yes. There is no "porting" with new SBSA/SBBR systems -- only fixing bugs and/or adding quirks :)
|
| On FreeBSD for the eMAG, we've had to:
|
| - ignore a wrong value for UART access width https://svnweb.freebsd.org/base?view=revision&revision=34622... (IIRC Ampere did fix the value in the newer FW revisions)
|
| - restore another register after calling EFI runtime services https://svnweb.freebsd.org/base?view=revision&revision=34699...
|
| - fix some PCIe things we were doing wrong https://svnweb.freebsd.org/base?view=revision&revision=34792... https://svnweb.freebsd.org/base?view=revision&revision=34793...
|
| - fix some memory map things https://svnweb.freebsd.org/base?view=revision&revision=34958...

| wmf wrote:
| STH has the slides and some analysis: https://www.servethehome.com/ampere-altra-80-arm-cores-for-c...

| ksec wrote:
| Yes, was about to post that, much better than the PR. Anandtech has its take on it as well.
|
| https://www.anandtech.com/show/15575/amperes-altra-80-core-n...
| dang wrote:
| Ok, we've changed to that from https://amperecomputing.com/ampere-altra-industrys-first-80-.... Thanks!

| Patrick-STH wrote:
| Hi dang. I know you did not have to do that but I just wanted to say from the STH team we appreciate it. Have a great day.

| [deleted]
___________________________________________________________________
(page generated 2020-03-03 23:00 UTC)