[HN Gopher] Apple M1 Ultra
___________________________________________________________________
Apple M1 Ultra
Author : davidbarker
Score  : 546 points
Date   : 2022-03-08 19:00 UTC (3 hours ago)
(HTM) web link (www.apple.com)
(TXT) w3m dump (www.apple.com)

| mwcampbell wrote:
| I wonder how long it will be until a CPU as capable as this one will be the required baseline for ordinary apps. Is there any hope that the upgrade treadmill will stop any time soon?

| FredPret wrote:
| One day we'll be up to our eyeballs in computronium, and it will never be enough.
| Thinking is a superpower even better than being the first species to develop sight.
| See also "The Last Question" by Asimov.

| bufferoverflow wrote:
| There will always be apps that use all the power you throw at them. Raytracing scales pretty much linearly with the core count. Compression does too.
| But your everyday apps like your browser have been fast for at least a decade.

| xedrac wrote:
| While I'm not on the Apple train, I love how they are pushing AMD, Intel and NVidia out of complacency. No more of these tiny incremental improvements to milk the industry. Bring your BEST to the table or get left behind!

| 3836293648 wrote:
| None of those three were anywhere near complacent before Apple released the M1. A few years ago, before Zen, absolutely, but now it's actually very competitive. But more competition doesn't hurt.

| dartharva wrote:
| This can also backfire on consumers if the competitors decide to keep up with their user-hostile practices as well: locked-down walled gardens, zero customisability/upgradability for hardware, low repairability, low interoperability with set standards to "distinguish" their products, planned obsolescence, etc.

| olliej wrote:
| I find the slow drip of M1 extensions to be kind of ehn - like the tick part of the old Intel cycle, only in this case it's literally just gluing more of the same cores together (obviously work is involved, but not at the level of an architecture rev).
| (edit: calm down people, I recognize it's impressive, but it's just not as fun an announcement as an architecture rev, which I was hoping for after a year :D )

| teilo wrote:
| It's literally not.

| outworlder wrote:
| If 'gluing cores together' were this simple, every random desktop CPU would have 20 cores. That's not the case.

| Dylan16807 wrote:
| You're forgetting price. If simplicity were the main concern, every random desktop CPU _would_ have 16+ cores right now.

| jjtheblunt wrote:
| What? The unified memory growth IS an architecture rev.

| soheil wrote:
| Milk?

| mulmen wrote:
| milk (verb), sense 2: to draw something from as if by milking; such as (b): to draw or coerce profit or advantage from illicitly or to an extreme degree : exploit. "milk the joke for all it's worth"
| https://www.merriam-webster.com/dictionary/milk

| jonplackett wrote:
| Makes you wonder how many they can glue together. Looking at you, Mac Pro.

| zitterbewegung wrote:
| Probably at least four together with LPDDR5X to get up to 1.5 TB and forty cores.

| amelius wrote:
| Let's hope they don't also push their lock-in business models onto others.

| pphysch wrote:
| > This enables M1 Ultra to behave and be recognized by software as one chip, so developers don't need to rewrite code to take advantage of its performance. There's never been anything like it.
| Since when did the average developer care about how many sockets a mobo has...?
| Surely you still have to carefully pin processes and reason about memory access patterns if you want maximum performance.

| KolenCh wrote:
| Any application people use to justify buying a more-than-one-socket machine needs this. E.g. simulation software often used in industry (though the one I have in mind is Windows-only).
| Anyway, the point they make is this: if you claim doubled performance, but only a select few programs, as you observed, would be optimized to take advantage of the extra performance, then it is mostly useless to the average consumer. So their point is made exactly with your observation in mind: all your software benefits from it.
| But actually their statement is obviously wrong for people in the business - this is still NUMA, and your software should be NUMA-aware to really squeeze out the last bit of performance. It just degrades more gracefully for non-optimized code.

| KaiserPro wrote:
| I think it's more the case that OSX hasn't really had SMP/NUMA for a long time.
| My understanding was that the dustbin was designed with one big processor because SMP/NUMA was a massive pain in the arse for the kernel devs at the time, so it was easier to just drop it and not worry.

| jra101 wrote:
| They are referring to the GPU part of the chip. There are two separate GPU complexes on the die, but from the software point of view it is a single large GPU.

| grork wrote:
| I thought that to extract peak performance out of NUMA-based systems, you had to get down-and-dirty with memory access & locality to ensure you don't cross sockets for data that's stored in RAM attached to other CPUs.
| Or am I out of date on NUMA systems?

| pphysch wrote:
| The big dies these days (M1 included) have non-uniform memory access baked in because they distribute the memory caches. If you want maximum performance, you will certainly want to be aware of which "performance core" you're running on.

| otherjason wrote:
| This is what they were referring to. To get optimum performance out of NUMA systems, you need to be careful about memory allocation and usage to maximize the proportion of your accesses that are local to the NUMA domain where the code is running. Apple's answer here is essentially "we made the link between NUMA domains have such high bandwidth, you don't even have to think about this."

| adfgadfgaery wrote:
| This line is nonsense and you can safely ignore it. There have been multi-chip modules that act like a single socket for many years. In particular, pretty much every current AMD CPU works that way. I guarantee you that for the M1 Ultra, just like every CPU before it, the abstraction will be leaky. Programmers will still care about the interconnect when eking out the last few percent of performance.
| Remember the Pentium D? Unfortunately, I used to own one.

| kllrnohj wrote:
| The existing AMD CPUs aren't _quite_ like that. Technically they are all UMA, not NUMA - the L3 cache is distributed, but they are all behind a single memory controller with consistent latencies to all cores. But the 1st-gen Threadripper was absolutely like that: straight up 2+ CPUs connected via Infinity Fabric, pretending to be a single CPU. So is that 56-core Xeon that Intel was bragging about for a while there, until the 64-core Epycs & Threadrippers embarrassed the hell out of it.
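For readers who want to experiment with the NUMA pinning discussed in this subthread: on Linux, locality control is exposed through libnuma. A minimal sketch, assuming a multi-node machine and compiling with -lnuma:

    #include <numa.h>      /* libnuma NUMA policy API */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (numa_available() < 0) {      /* kernel or machine without NUMA */
            fprintf(stderr, "NUMA not available\n");
            return 1;
        }
        int node = 0;
        size_t len = 64UL << 20;         /* 64 MiB */
        /* Allocate physical pages on one specific node... */
        char *buf = numa_alloc_onnode(len, node);
        if (!buf) return 1;
        /* ...and keep the current thread on that node, so accesses stay
           local instead of crossing the socket interconnect. */
        numa_run_on_node(node);
        memset(buf, 0xA5, len);          /* pages are touched locally */
        numa_free(buf, len);
        return 0;
    }

Apple's pitch is essentially that the die-to-die link is fast enough that skipping this kind of tuning costs little; the posters here are pointing out that the last few percent will still care.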
| yuuko11 wrote:
| Not sure what is required of a dev, but as an example, Adobe Premiere Pro doesn't take any advantage of >1 CPU, at least on Windows. https://www.pugetsystems.com/labs/articles/Should-you-use-a-...

| masklinn wrote:
| It's probably not the "average developer" either, but some of the big-box software still has per-socket licensing, or had until recently anyway.

| w0mbat wrote:
| Article is 5 years old.

| AdrianB1 wrote:
| CPU as in core or socket? These days most CPUs are "many-CPU-cores-in-1-socket", and having X CPU cores over 1 or 2 sockets makes a small difference, but software does not care about sockets.

| aidenn0 wrote:
| Plenty of enterprise software is licensed on a per-socket basis.

| pphysch wrote:
| And if they read this press release they will probably try to switch to per-core licensing.

| excerionsforte wrote:
| They say this is the M1 Ultra benchmark: https://browser.geekbench.com/v5/cpu/13330272 Wow.

| lend000 wrote:
| Are the neural engines actually used for anything, to anyone's knowledge?
| Edit: Apparently in iPhones, they are used for FaceID.

| can16358p wrote:
| Probably object tracking in videos will be the best use of them.
| Or, there will be some new form of video generation (like the ones generating video from Deep Dream etc., but something aimed at studio production) using ML that wasn't practically usable before.
| It opens many doors, but it will take at least many months, if not years, to see some new "kind" of software emerge that efficiently makes use of them.

| daggersandscars wrote:
| Adobe uses them in Lightroom / Photoshop for some functions.
| https://www.digitalcameraworld.com/news/apple-m1-chip-makes-...

| acchow wrote:
| I thought it's also used to activate Siri by voice without any CPU usage.

| bm-rf wrote:
| Would something like huggingface transformers ever be able to support this? Or is it best to just use the GPU?

| sharikous wrote:
| CoreML models may run on them at macOS's discretion. If you manage to get your neural network into CoreML's format, you may use it.

| jazzyjackson wrote:
| Adobe Photoshop, Premiere etc. make use of it for scene detection, content-aware fill, "neural filters" and so on.

| poyu wrote:
| This[1] neural filter?
| [1] https://www.youtube.com/watch?v=hq8DgpgtSQQ

| speed_spread wrote:
| It's to build a giant distributed Aleph in which a preserved, digitized Steve Jobs can live once again.

| xiphias2 wrote:
| I think the most important use of the neural engines so far is for the internal camera postprocessing. Better camera postprocessing is the reason people buy new iPhones.

| piyh wrote:
| Throw in translation, on-device image labeling, stuff like on-body walking/biking detection, and voice recognition.

| sercand wrote:
| The Neural Engine may be used by CoreML models. I don't know if it can be used with Apple's BNNS library [1]. You can use it with TensorFlow Lite's CoreML delegate as well [2]. And some have tried to reverse-engineer it and used it for model training [3].
| [1] https://developer.apple.com/documentation/accelerate/bnns
| [2] https://www.tensorflow.org/lite/performance/coreml_delegate
| [3] https://github.com/geohot/tinygrad#ane-support-broken

| vmception wrote:
| > M1 Ultra can be configured with up to 128GB of high-bandwidth, low-latency unified memory
| Nice! Good enough to run a Solana node!
| I was slightly annoyed that the M1 Max's 64GB of RAM put it just under the system requirements, at that premium price.
| But I don't have any other theoretical use case for that many resources.

| xiphias2 wrote:
| Didn't AMD do something similar, putting 2 CPU chips together with cache in between? What's the difference here in packaging technology? (Maybe there is no shared cache here.)

| diamondlovesyou wrote:
| Yes. AMD has had integrated CPU+GPU+_cache-coherent HBM_ for a while. You can't buy these parts as a consumer though. And they're probably priced north of $20k each at volume, with the usual healthy enterprise-quality margins.

| paulpan wrote:
| I think you're referring to AMD's 3D V-Cache, which is already out in their Epyc "Milan-X" lineup and the forthcoming Ryzen 5800X3D. https://www.amd.com/en/campaigns/3d-v-cache
| Whereas AMD's solution is focused on increasing the cache size (hence the 3D stacking), Apple here seems to be connecting the 2 M1 Max chips more tightly. It's actually more reminiscent of AMD's Infinity Fabric interconnect architecture. https://en.wikichip.org/wiki/amd/infinity_fabric
| The interesting part of this M1 Ultra is that Apple opted to connect 2 existing chips rather than design a new one altogether. Very likely the reason is cost - this M1 Ultra will be a low-volume part, as will future iterations of it. The other approach would've been to design a motherboard that sockets 2 chips, which it seems would've been cheaper/faster than this - albeit at the expense of performance. But they've designed a new "socket" anyway due to this new chip's much bigger footprint.

| calaphos wrote:
| They have been shipping multi-die CPUs for quite a while, but the interconnect is closer to a PCIe connection (slower, longer range, fewer contacts).
| Intel's upcoming Sapphire Rapids server CPUs are extremely similar, with wide connections between two close dies. Cross-sectional bandwidth is in the same order of magnitude there.

| adfgadfgaery wrote:
| AMD is currently shipping high-end CPUs built with up to nine dies. Their ordinary desktop parts have up to three. They are not built with "cache in between". There is one special I/O die, but it does not contain any cache. Each compute die contains its own cache.

| MBCook wrote:
| This doesn't seem to be two chips connected in a standard SMP configuration, or with a shared cache between them. Apple claims there are something like 10,000 connection points.
| It _sounds_ like this operates as if it were one giant physical chip, not two separate processors that can talk very fast.
| I can't wait to see benchmarks.

| haneefmubarak wrote:
| Modern SMP systems have NUMA behavior mostly not because of a lack of bandwidth but because of latency. At the speeds modern hardware operates at, the combination of distance, SerDes, and other transmission factors results in high latencies when you cross dies - this can't be ameliorated by massively increasing bandwidth via parallel lanes. For context, some server chips which have all the cores on a single die exhibit NUMA behavior purely because there are too many cores to all be physically close to each other geometrically (IIRC the first time I saw this was on an 18-core Xeon, with cores that were themselves a good bit smaller than these).
| It's probably best to think of this chip as an extremely fast double-socket SMP where the two sockets have much lower latency than normal.
| Software written with that in mind, or multiple programs operating fully independently of each other, will be able to take massive advantage of this, but most parallel code written for single-socket systems will see reduced gains or even potential losses, depending on its parallelism model.

| msoad wrote:
| At what point will Apple put those chips in their servers or sell server chips? It only makes sense for them to take this architecture to cloud deployments.

| ghostly_s wrote:
| Unfortunately I still don't think the market has much interest in energy-efficient servers. But maybe the energy-sector crunch created by Putin's war will precipitate some change here...

| FredPret wrote:
| Energy is probably the biggest bill for a data centre.
| Lower TDP = lower electric bills and lower air-conditioning bills. Win-win.

| lambda_dn wrote:
| They could be secretly working on their own cloud platform, with their data centres having a choice between M1, Pro, Max, and Ultra instances. $$$$

| memco wrote:
| Don't they already offer Xcode builds as a service? That presumably is using Mac servers, so it wouldn't be totally out of the blue to have more Mac SaaS.

| jdgoesmarching wrote:
| For a company betting so heavily on "services," it would be borderline incompetence if they weren't working on this. Even just for internal use it would still be a better investment than the stupid car.

| stjohnswarts wrote:
| It's going to be a couple of years. The guys who bought those power workstations and servers will be very peeved if it happens too quickly.

| arcticbull wrote:
| They likely won't revisit the Xserve, IMO. No reason to. They can't sell them at a premium compared to peers, and it's outside their area of expertise.

| jazzyjackson wrote:
| I don't know, the performance per watt has a big effect on data centers, both in power budget and HVAC for cooling.

| em500 wrote:
| > They can't sell them at a premium compared to peers
| Intel is believed to have pretty good margins on their server CPUs.
| > and it's outside their area of expertise.
| That's what people used to say about Apple doing CPUs in-house.

| npunt wrote:
| FWIW, Apple's been helping define CPU specs in-house since the 90s. They were part of an alliance with Motorola and IBM to make PowerPC, and bought a substantial part of ARM and did a joint venture to make the Newton's CPU. And they've done a bunch of architecture jumps, from 6502 to 68k to PPC to Intel to A-series.
| Folks who said CPUs weren't their core expertise (I assume back in 2010 or before, prior to the A4) missed out on just how involved they've historically been, what it takes to get involved, the role of fabs and off-the-shelf IP in gradually building expertise, and what benefits were possible when building silicon and software toward a common purpose.

| nicoburns wrote:
| I doubt they'll go after the web server market. But I wonder if they might go after the sort of rendering farms that animation studios like Pixar use. Those guys are willing to pay silly money for hardware, and are a market Apple has a long history with.

| xyst wrote:
| It makes sense, although what I am concerned about is the cost. Apple isn't exactly known for providing services at or near cost.

| adfgadfgaery wrote:
| It doesn't make much sense to me. The M1 is designed to have memory in the same package as the processor. This leads to reduced latency and increased bandwidth.
| Moving to off-package memory might totally destroy its performance, and there is an upper limit on how much memory can go in the package.
| The M1 Ultra is already a little light on memory for its price and processing power; it would have much too little memory for a cloud host.

| xiphias2 wrote:
| As more developers move to the ARM architecture by buying MacBooks (I did it last year for the first time in my life), the ARM cloud will grow very fast, and Apple needs growth, so they can't afford not to do it within a few years (they are probably already thinking of it with the M2 architecture). Regarding the exact timeline: I don't know :)

| andrewxdiamond wrote:
| They'd have to go all-in on supporting third-party OSes like Linux first. Sure, there are projects to bring Linux to the M1, but enterprises that buy commercial server hardware will demand first-party support.

| ksubedi wrote:
| Knowing Apple, their version of "cloud" servers would probably be some sort of SDK that lets developers build applications on top of their hardware/software stack, and charge per usage. Kind of like Firebase, but with Apple's stack.

| xiphias2 wrote:
| It will be a hard business decision for them, as at this point it's extremely hard to compete with Amazon, Google and Microsoft. Maybe they will buy up some cloud services provider, we'll see.

| tylerjd wrote:
| The major Linux providers already offer first-party-supported Linux on AWS. Both RHEL and Ubuntu instances offer support contracts from their respective companies, as well as Amazon Linux from AWS themselves. It is already here and a big force there. You can provision ElastiCache and RDS Graviton instances too.

| superkuh wrote:
| It sounds like the chip is fast, but I wonder if, like other M1 products, the computers built with it will be fairly restricted, like a console, in terms of the hardware they're able to use (not being able to boot off external HDDs, problems with Thunderbolt 3 compatibility in peripherals, having to use abstraction layers to run most of the software world indirectly or rely on porting projects dedicated to the M1, etc.).

| aaomidi wrote:
| > having to use abstraction layers to run most of the software world indirectly
| Nearly everything I use daily is built for the M1 now.
| https://isapplesiliconready.com/
| And honestly, if it's not, it's a good indication that it's time to move away from that product, as they don't care about a huge segment of their users.

| drcongo wrote:
| Dropbox and Signal are the only two I ever use, and yeah, the lack of interest in porting to the M1 from both of those companies is increasing my lack of interest in their apps.

| nintendo1889 wrote:
| That page says that both of those apps are supported.

| cersa8 wrote:
| Maybe this is the same marketing speak as we've seen with the 1600-nit peak and 1000-nit sustained brightness claims for the new mini-LED displays, which later became 500 nits for SDR content, when ambient temperature allows. [0]
| I want to see proper benchmarks before getting too excited.
| [0] https://www.notebookcheck.net/The-new-MacBook-Pro-14-only-ma...

| jjuuaann wrote:

| Geee wrote:
| This is insane. They claim that its GPU performance tops the RTX 3090 while using 200W less power. I happen to have this GPU in my PC, and not only did it cost over $3,000, it's also very power-hungry and loud.
| Currently, you need this kind of GPU performance for high-resolution VR gaming at 90 fps, but it's just barely enough. This means that the GPU will run very loudly and heat up the room, and running games like HL Alyx on max settings is still not possible.
| It seems that Apple might be the only company who can deliver a proper VR experience. I can't wait to see what they've been cooking up.

| EugeneOZ wrote:
| But it's still impossible to replace an RTX 3090 with this new Mac Studio, because games just will not run on macOS.

| idonotknowwhy wrote:
| Maybe Valve can port Proton to the Mac.

| gameswithgo wrote:
| Unless it has its own GDDR6, it won't be anything like a 3090 for games.

| TheKarateKid wrote:
| What we're seeing right now with Apple's M1 chips for desktop computing is on the same level of revolutionary as what the original iPhone did for mobile phones.
| The advancements in such a short period of time in the amount of computing power, low power usage, size, and heat output of these chips are unbelievable and game-changing.

| bradmcgo wrote:
| It really feels like this is all in the name of their AR/VR efforts. The killer device, as far as I can think, would be a simple headset that packs the capabilities of a full-blown workstation. Apple Silicon seems like it could totally be on that track in some way.

| outworlder wrote:
| Too bad that historically Apple has not given any attention to Mac gaming.

| bobbylarrybobby wrote:
| Part of Apple's historic MO has been to not invest in areas they don't see themselves having a competitive advantage in. Now that they can make gaming happen with a very low wattage budget, they may well try to enter that space in earnest.

| kllrnohj wrote:
| The gaming performance of the existing M1 GPUs is, well, crap (like, far behind the other laptop competition, to say nothing of desktop GPUs). The Ultra probably isn't changing that, since it's very unlikely to be a hardware problem and instead a software ecosystem & incentives problem.

| altairprime wrote:
| The difference between an Apple TV and a Mac Mini is essentially how powerful its Apple silicon is, whether it runs tvOS or macOS, and whether it has HDMI out or not.
| The Studio is a more compact form factor than any modern 4K gaming console. If they chose to ship something in that form factor with tvOS, HDMI, and an M1 Max/Ultra, it would be a very competitive console on the market -- _if_ game developers could be persuaded to implement for it.
| How would it compare to the Xbox Series X and PS5? That's a comparison I expect to see someday at WWDC, once they're ready. And once a game is ported to Metal on _any_ Apple silicon OS, it's a simple exercise to port it to all the rest: macOS, tvOS, iPadOS, and (someday, presumably) vrOS.
| Is today's announcement enough to compel large developers like EA and Bungie to port their games to Metal? I don't know. But Apple has two advantages with their hardware that Windows can't counter: the ability to boot into a signed/sealed OS (including macOS!), load a signed/sealed app, attest this cryptographically to a server, and lock other programs out of reading a game's memory or display. This would end software-only online cheating in a way that PCs can't compete with today. It would also reduce the number of GPUs necessary to support to one (Apple Metal 2), which drastically decreases the complexity of testing and deployment of game code.
| I look forward to Apple deciding to play ball with gaming someday.

| neetdeth wrote:
| This all makes sense, and in that context it's unfortunate that Apple's relationship with the largest game tools company, Epic, is... strained, to say the least.
| They could always choose to remedy that with a generous buyout offer.

| miohtama wrote:
| Apple now has too much money and is running out of core business areas. Expect more investing in non-Apple areas like gaming, cars, etc.
| Though every video game company on the planet hates them because of App Store terms.

| jen20 wrote:
| > Expect more investing in non-Apple areas like gaming, cars, etc.
| I remember people saying this about phones in 2006.

| nr2x wrote:
| The first half of the event was old wine in new bottles - I reckon that's the main growth area they are squeezing.

| lazyeye wrote:
| I'm surprised Apple found time outside of focusing on growing the Chinese economy to work on this.
| https://www.theguardian.com/technology/2021/dec/07/apple-chi...

| gigatexal wrote:
| I used to be one of the biggest Apple fanboys/apologists, but I've since put Linux on my MacBook Pro from 2013 and built a Linux workstation, and I rarely use my 2020 MacBook Pro anymore. I say this because I yawned at the over-the-top Apple marketing. The products are interesting, sure, but I wasn't blown away. It's mostly the prices. The hardware is just far above what I can afford these days -- my MacBook Pro from 2013 is so well made it still works now, and I'm sure a two- or three-thousand-dollar MacBook bought today would last just as long, but it's just too much. Though I am saving for one for multimedia work, probably a used M1 MacBook Air.

| ChuckMcM wrote:
| In some ways I wish this processor were available from a CPU chip seller. As a compute engine it gets a lot "right" (in my opinion) and would be fun to hack on.
| That said, the idea that USB-C/Thunderbolt is the new PCIe bus has some merit. I have yet to find someone who makes a peripheral card cage that is fed by USB-C/TB, but there are of course standalone GPUs.

| [deleted]

| Dylan16807 wrote:
| > That said, the idea that USB-C/Thunderbolt is the new PCIe bus has some merit. I have yet to find someone who makes a peripheral card cage that is fed by USB-C/TB, but there are of course standalone GPUs.
| I hope we get closer to that long-standing dream over the next few years.
| But right now you can see laptop manufacturers so desperate to avoid Thunderbolt bottlenecks that they make their own custom PCIe ports.
| For the longest time, Thunderbolt ports were artificially limited to less than 3 lanes of PCIe 3.0 bandwidth, and even now the max is 4 lanes.

| dheera wrote:
| > USB C/Thunderbolt is the new PCIe bus
| Oh please hell no.
| I have to unplug and replug my USB-C camera at least once a day because it gets de-enumerated very randomly. Using the best cables I can get my hands on.
| File transfers to/from USB-C hard drives suddenly stop mid-transfer and corrupt the file system.
| Don't ask me why, I'm just reporting my experiences. This is the reality of my life that UX researchers don't see, because they haven't sent me an e-mail and surveyed me.
| Never had such problems with PCIe.

| delusional wrote:
| You have a very exotic configuration if you plugged your webcam and thumb drives into PCIe slots.
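To put rough numbers on the lane discussion above: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, and Thunderbolt 3/4 advertises 40 Gbit/s but commonly caps PCIe tunneling near 32 Gbit/s. A small sanity-check program (the figures are nominal spec values, not measurements):

    #include <stdio.h>

    int main(void) {
        double gts = 8.0;                  /* PCIe 3.0: 8 GT/s per lane */
        double eff = 128.0 / 130.0;        /* 128b/130b encoding efficiency */
        double lane = gts * eff / 8.0;     /* ~0.985 GB/s per lane */
        for (int lanes = 1; lanes <= 4; lanes++)
            printf("PCIe 3.0 x%d: %.2f GB/s\n", lanes, lanes * lane);
        /* Thunderbolt 3/4 PCIe tunneling, commonly capped at 32 Gbit/s:
           about the same ballpark as x4 above. */
        printf("TB PCIe tunnel:  %.2f GB/s\n", 32.0 / 8.0);
        return 0;
    }

Even a "full" 4-lane Thunderbolt port offers a fraction of a desktop x16 slot's ~16 GB/s, which is why external GPU enclosures bottleneck.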
| icelancer wrote:
| My USB-C dongle (AMD processor, so not Thunderbolt) has PD plugged into it permanently and is my "docking station" for the office. I have to cycle its power (unplug/replug PD) to get the DisplayPort monitor connected to it to work, on top of the fact that there are other issues with it, especially with external drives, as you also reported.
| So, I'm in total agreement.

| droopyEyelids wrote:
| Friendly reminder that USB-C is a form factor, and Thunderbolt is the actual transfer protocol.
| Sounds like you're listing the common complaints with USB 3-over-USB-C peripherals, which are not a suitable replacement for PCIe. Thunderbolt is something different, more powerful & more reliable.

| jamesfmilne wrote:
| https://www.sonnettech.com/product/thunderbolt/pcie-card-exp...

| ChuckMcM wrote:
| Thanks! Of course they are a bit GPU-centric, but the idea is there.
| Very interesting stuff. I wonder both whether the Zynq UltraScale RFSoC PCIe card would work in that chassis and whether I could get register-level access out of macOS.

| jcadam wrote:
| Well, more reasonable than a Mac Pro, price-wise. Might have to consider this when the time comes to replace my Ryzen 9 rig.

| alberth wrote:
| I'm cross-posting a question I had from the Mac Studio thread (currently unanswered).
| ----
| Mac Pro scale-up?
| How is this going to scale up to a Mac Pro, especially as regards RAM?
| The Ultra caps at 128 GB of RAM (which isn't much for video editing, especially given that the GPU uses the system RAM). Today's Mac Pro goes up to 1.5TB (and has dedicated video RAM above this).
| If the Mac Pro is, say, 4 Ultras stacked together - that means the new Mac Pro will be capped at 512GB of RAM. Would Apple stack 12 Ultras together to get to 1.5TB of RAM? Seems unlikely.

| dagmx wrote:
| A few points to make...
| - The shared CPU+GPU RAM doesn't necessarily mean the GPU has to eat up system RAM when in use, because it can share addressing. So whereas the current Mac Pro would require two copies of data (CPU+GPU), the new Mac Studio can have one. Theoretically.
| - They do have very significant video decoder blocks. That means you may use less RAM than you would otherwise, since you can keep frames compressed in flight.

| arcticbull wrote:
| Also, the memory model is quite different - with the ultra-fast SSD and ultra-fast on-die RAM. You can get away with significantly less RAM for the same tasks, not just because of de-duplication but because data comes in so quickly from the SSD that paging isn't nearly the hit it is on, say, an Intel-based Mac.
| I'd expect it to work more like a game console, streaming in content from the SSD to working memory on the fly, processing it with the CPU and video decode blocks, and insta-sharing it with the GPU via the common address space.
| All that is to say: where you needed 1.5TB of RAM on a Xeon, the architectural changes on Apple Silicon likely mean you can get away with far less and still wind up performing better.
| The "GHz myth" is dead; long live the "GB myth."

| fpoling wrote:
| Another thing to consider is memory compression. If Apple added dedicated hardware for that, it can effectively double the total memory with minimal performance hit.

| rocqua wrote:
| Memory compression only works in certain scenarios. It requires your memory to actually have low entropy.

| masklinn wrote:
| > ultra-fast on-die RAM
| The RAM is not on die.
| It's just soldered on top of the SoC package.
| > All that is to say: where you needed 1.5TB of RAM on a Xeon, the architectural changes on Apple Silicon likely mean you can get away with far less and still wind up performing better.
| No, it does not. You might save a bit, but most of what you save is the _transfers_, because moving data from the CPU to the GPU is just sending a pointer through the graphics API, instead of needing to actually copy the data over to the GPU's memory. In the latter case, unless you still need it afterwards, you can then drop the buffer from the CPU.
| You do have some gains as you move buffer _ownership_ back and forth instead of needing a copy in each physical memory, but if you needed 1.5TB physical before... you won't really need much less after. You'll probably save a fraction, possibly even a large one, but not "2/3rds" large; that's just not sensible.

| samatman wrote:
| This went by so fast I'm not sure I heard it right, but I believe the announcer said the Ultra is the last in the M1 lineup.
| They just can't ship a Mac Pro without expansion in the normal sense; my guess is that the M2 will combine the unified memory architecture with expansion busses.
| Which sounds gnarly, and I don't blame them for punting on that for the first generation of M-class processors.

| skunkworker wrote:
| This is what I've been thinking as well: an M2 in a Mac Pro with 128/256GB soldered and up to 2TB of 8-channel DDR5-6400 expandable, with a tiered memory cache.

| cehrlich wrote:
| I think some of this can be guessed from the SoC codenames.
| https://en.wikipedia.org/wiki/List_of_Apple_codenames
| M1 Max is Jade C-Die => 64GB
| M1 Ultra is Jade 2C-Die => 128GB
| There is a still-unreleased SoC called Jade 4C-Die => 256GB
| So I think that's the most we'll see this generation, unless they somehow add (much slower) slotted RAM.
| If they were to double the max RAM on the M2 Pro/Max (Rhodes Chop / Rhodes 1C), which doesn't seem unreasonable, that would mean 512GB of RAM on the 4C-Die version, which would be enough for _most_ Mac Pro users.
| Perhaps Apple is thinking that anyone who needs more than half a terabyte of RAM should just offload the work to some other computer somewhere else for the time being.
| I do think it's a shame that in some ways the absolute high end will be worse than before, but I also wonder how many 1.5TB Mac Pros they actually sold.

| rowanG077 wrote:
| How is slotted RAM slower? 6400MHz DIMMs exist. This would match the specs of the RAM on the M1 Max. Even octa-channel has been done before, so the memory bus would have the exact same width, latency and clock frequency.

| fastball wrote:
| The memory bandwidth of the M1 Max is 400 GB/s with 64GB of RAM, whereas the memory bandwidth of Corsair's 6400MHz DDR5 32GB RAM module is 51GB/s per stick, or 102GB/s for the M1 Max equivalent.

| rowanG077 wrote:
| 51GB/s * 8 (octa-channel, not dual-channel as you are calculating) is 408 GB/s - basically the same as the M1 Max. It's not fair to use an off-the-shelf product, since even if the RAM is slotted, Apple wouldn't use an off-the-shelf product.
| Whether they use slotted RAM or not has nothing to do with performance. It's a design choice. For the mobile processors it makes total sense to save space. But for the Mac Pro they might as well use slotted RAM. Unless they go for HBM, which does offer superior performance.
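The channel arithmetic in this exchange is easy to verify: peak DRAM bandwidth is just transfer rate times bus width. A worked check (nominal spec figures, not measured numbers):

    #include <stdio.h>

    int main(void) {
        double mts = 6400e6;          /* DDR5/LPDDR5-6400: transfers per second */
        double bytes = 8;             /* one 64-bit channel moves 8 bytes per transfer */
        double per_channel = mts * bytes / 1e9;
        printf("per 64-bit channel: %.1f GB/s\n", per_channel);      /* 51.2 */
        printf("8 channels:         %.1f GB/s\n", per_channel * 8);  /* 409.6 */
        /* The M1 Max's advertised 400 GB/s is the same arithmetic:
           a 512-bit LPDDR5-6400 bus, i.e. 64 bytes x 6.4 GT/s. */
        printf("512-bit LPDDR5:     %.1f GB/s\n", 6.4 * 64);         /* 409.6 */
        return 0;
    }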
| rocqua wrote:
| Is 8-channel RAM doable? Are there downsides? If it's doable without downsides, why don't high-end x86 processors have it?

| rowanG077 wrote:
| High-end x86 does have it - the Threadripper 3995WX, for example.

| my123 wrote:
| Note that those are overclocked, out-of-spec configurations today.
| https://ark.intel.com/content/www/us/en/ark/products/134599/...
| 4800 MT/s is the actual maximum spec; anything beyond that is OC.

| mnholt wrote:
| Agreed, I think they will use the 4C config to debut the M2 and make a splash. They said in the keynote that the M1 Ultra completes the M1 family. The timing works out well for a November launch, given the two-year Apple silicon transition timeline they gave themselves. Not sure what they are going to call it and whether it will be A15- or A16-based.
| A16 would give great performance, and I think it's safe for them to have a two-year iteration time on laptops/desktops vs one year for phones/tablets.

| can16358p wrote:
| I think they will unveil the M2, which can probably at least double the M1 series' 64GB max RAM to 128GB.
| Then, in the highest configuration, I think they could actually put 6 or more top-specced M2s into the Mac Pro.

| bpicolo wrote:
| Does the Mac Studio potentially replace the Mac Pro concept? It seems targeted at exactly the audience that Mac Pros targeted (ridiculous amounts of video simul-editing).

| sharikous wrote:
| The presenter very explicitly said they are not done and they will replace the Mac Pro.
| But yes, I see a lot of folks replacing current Mac Pros with Studios.

| zitterbewegung wrote:
| No, this looks like a modular replacement for the iMac Pro. If it were to replace the Mac Pro they wouldn't have said at the end of the event that "the Mac Pro will have to wait until next time".

| alberth wrote:
| To me, this seems to have killed the iMac Pro, not the Mac Pro.

| Asmod4n wrote:
| The Mac Pro will have replaceable RAM. It will use the RAM soldered onto the CPU as cache.
| You'll most likely also be able to buy dedicated GPU/ML-booster add-on cards and the like for it.
| It's the most likely thing to happen, or they won't release another Mac Pro.

| rowanG077 wrote:
| Why would they use soldered RAM as cache? It's not like it's faster than replaceable RAM. Unless they go HBM2, but I doubt that.

| fpoling wrote:
| The bandwidth of the soldered RAM is much higher, which makes it much faster for code that accesses a lot of RAM, like video editors.

| [deleted]

| vimy wrote:
| The Pro is most likely going to have RAM and PCIe slots.

| ostenning wrote:
| I read "Apple unveils MK ULTRA".

| yurishimo wrote:
| Low-key, what if this was planned to change Google results to "Did you mean M1 Ultra?" when searching for the experiment? The CIA is using all that money for something consumers can use now!
| /takes off foil hat

| [deleted]

| 1980phipsi wrote:
| Can we trust the performance measurements that are listed?

| Lramseyer wrote:
| Yes, but assume that they're cherry-picked. Don't get me wrong, these numbers are impressive, but it claims that its GPU is faster than the highest-end discrete GPU (RTX 3090) while being unclear on what benchmark was used. It's important to keep in mind that their GPUs are not architected with gaming in mind, whereas the 3090 definitely is. So it's not unreasonable to find some metrics where their GPU performs better.

| pantalaimon wrote:
| The very same dual-chiplet design marcan predicted - nice!
| klelatti wrote:
| I wonder if the max 128GB of graphics memory opens up some applications that would not have been viable before?

| freemint wrote:
| Not really. Thanks to the NVLINK GPU interconnect we have 320GB systems. https://www.deltacomputer.com/nvidia-dgx-a100-320gb-3ys-edu....
| There are even some with 640GB. This is at a different price point though.

| klelatti wrote:
| Not on a single GPU though, and it's 40x the cost!

| freemint wrote:
| Rewriting a CUDA application to use NVLINK is a lot easier than rewriting it for Apple's GPU.

| 314 wrote:
| 640GB should be enough for anyone.

| manquer wrote:
| Perhaps until VR becomes more mainstream and gets higher frame rates, resolutions etc.
| Rendering in VR takes a lot of memory at higher resolutions.

| yisonPylkita wrote:
| Bill said it 40 years ago and here we are, two prefixes more of memory later. I wonder if in the next 40 years we'll get to 640 petabytes.

| nintendo1889 wrote:
| Does it mine bitcoin well?

| judge2020 wrote:
| Not Bitcoin, but Ethereum:
| https://9to5mac.com/2021/11/10/m1-pro-macbook-pro-cryptocurr...
| M1 Pro -> 5.8 MH/s, with a 17W draw, means $12.82 a month profit. I don't imagine the M1 Ultra is too much better, maybe 20 MH/s at the absolute most, but we'll see. It definitely won't be as economical as 3070 or 3080 FE cards at current profitability levels.

| vmception wrote:
| And that's at $0.10 per kWh; many residences are higher, but professional operations are closer to $0.03 per kWh, or sometimes even zero or negative.
| Also note that the mining calculator they used assumes 2 Ether per block paid to miners.
| In Ethereum it can be _much much_ higher because people pay to use that blockchain. Mining can be insanely profitable and I'm not aware of any calculator that shows it. Everyone is operating on bad data. A cursory look right now shows the latest blocks having 2.52 Ether in them, which is 26% greater yield.
| Block 14348267 a few minutes ago had 4.83 Ether - 140% greater yield.
| There have been prolonged periods of time, weeks and months, where block rewards were 6-9 Ether.
| Miners were raking it all in while the calculators said "2 Ether".
| All this to say it could probably make $20-30 a month.

| cosmotic wrote:
| I'm sure an ASIC would best the M1.

| willcipriano wrote:
| Is it actually more powerful than a top-of-the-line Threadripper[0], or is that not a "personal computer" CPU by this definition? I feel like 64 cores would beat 20 on some workloads even if the 20 were way faster in single-core performance.
| [0] https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-3...

| jbellis wrote:
| Next-gen Threadripper Pro was also announced today: https://www.tomshardware.com/news/amd-details-ryzen-threadri...

| paulmd wrote:
| Bit of a wet fart though; even Charlie D thinks it's too little, too late. OEM-only (and only on the WRX80 socket), no V-cache, worse product support.
| https://semiaccurate.com/2022/03/08/amd-finally-launches-thr...
| The niche for high clocks was arguable with the 2nd-gen products, but now you are forgoing V-cache, which also improves per-thread performance, so Epyc is relatively speaking even more attractive. And if you take Threadripper you have artificial memory limits, half the memory channels, half the PCIe lanes, etc., plus in some cases it's _more_ expensive than the Epyc chips. It is a lot to pay (not just in cash) just for higher clocks that your 64C workloads probably don't even care about.
| AMD moved into rent-seeking mode even before Zen3 came out. Zen2 Threadripper clearly beats anything Intel can muster in the segment (unless they wanted to do the W-3175X seriously and not as a limited-release thing with $2000 motherboards), and thus AMD had no reason to actually update this segment when they could just coast. Even with this release, they are not refreshing the "mainstream" TRX40 platform, but only doing a limited release for the OEM-only WRX80 platform.
| It was obvious what direction things were headed when they forced a socket change and then cranked all the Threadripper 3000 prices (some even to higher levels than single-socket Epyc "P" SKUs). They have to stay competitive in servers, so those prices are aggressive, but Intel doesn't have anything to compete with Threadripper, so AMD will coast and raise prices.
| And while Milan-X isn't cheap - and I doubt these WRX80 chips are going to be cheap either - it would be unsurprising if they're back in the position of Threadripper being more expensive for a chip that's locked down and cut down. And being OEM-only, you can't shop around or build it yourself; it's take it or leave it.

| dljsjr wrote:
| Apple's ARM chips can process a metric ton of ops per cycle due to the architecture of the chip: https://news.ycombinator.com/item?id=25257932

| zamadatix wrote:
| But the answer to the question is still "no".

| klelatti wrote:
| Only if the only thing you compare is CPU performance - adding a big GPU on-die adds a certain amount of 'power' by any measure.

| [deleted]

| gjsman-1000 wrote:
| Doesn't have to, though. A Threadripper 3990X uses barrels of electricity, generates plenty of heat, comes with no GPU, has worse single-threaded performance, and still costs $4000 by itself, without any of the parts needed to make it actually work.

| Nition wrote:
| The question is in relation to Apple's claim that it's "the world's most powerful and capable chip for a personal computer".

| gzer0 wrote:
| It might also be reasonable to say that the Threadripper is a workstation chip, not a chip for personal computers.
| Edit: even AMD themselves call their Threadripper lineup workstation chips, not personal.
| https://www.amd.com/en/processors/workstation

| kllrnohj wrote:
| Threadripper _Pro_ is the workstation chip. Regular Threadripper (non-Pro) was not aimed at workstations; it was aimed at the "HEDT" market. Strictly speaking that's considered a consumer market (albeit for the enthusiasts of enthusiasts).

| 2OEH8eoCRo0 wrote:
| I'd call them personal chips. When I think of non-personal chips I think IBM POWER or Ampere Altra.

| gjsman-1000 wrote:
| Depends on how you define "capable". Remember, they specify that it is the most powerful and capable _chip_, not necessarily the complete system.
| There's no other chip that has the power of an RTX 3090 and more power than an i9-12900K in it - after all, Threadripper doesn't have a lick of graphics power at all. This chip can do 18 8K video streams at once, at which Threadripper would get demolished.
| I'm content with giving them the chip crown. Full system? Debatable.

| dathinab wrote:
| Though you would need to compare it to the coming Threadripper 5000WX(?), or better, the soon-coming Ryzen 7000 CPUs (which seem to have integrated graphics).
| I mean, they are all CPUs coming out this year, as far as I know.

| zamadatix wrote:
| It's a fantastic chip, but that wasn't the question.
| I love my M1 Max and I love my Threadripper workstation; each has its own strengths and that's alright.

| guelo wrote:
| It's bad for competition that only Apple gets to use TSMC's 5nm process. Though what's really bad is that Intel and Samsung haven't been able to compete with TSMC.

| paulmd wrote:
| AMD will be on TSMC N5P next year, which will give them node parity with Apple (who will be releasing the A15 on N5P this year), and actually a small node lead over the current N5-based A14 products. So we will get to test the "it's all just node lead guys, nothing wrong with x86!!!" theory.
| Don't worry though, there will still be room to move the goalposts with "uhhh, but, Apple is designing for high IPC and low clocks, it's totally different and x86 could do it if they wanted to but, uhhh, they don't!".
| (I'm personally of the somewhat-controversial opinion that x86 can't really be scaled in the same super-wide-core/super-deep-reorder-buffer fashion that ARM opens up, and the IPC gap will persist as a result. The gap is _very wide_, higher than 3x in floating-point benchmarks; it isn't something that's going to be easy to close.)

| adgjlsfhk1 wrote:
| There is a third variable: Apple is putting RAM much closer to the CPU than AMD. This has the advantage that you get lower latency (and slightly higher bandwidth), but the downside that you're currently limited to 128GB of RAM, compared to 2TB for Threadripper (4TB for Epyc). AMD's 3D cache that they're launching in a few months will be interesting, since it lets the L3 go up a ton.

| Macha wrote:
| We've already seen x86 draw even, with Intel's 12th gen: https://www.youtube.com/watch?v=X0bsjUMz3EM

| alwillis wrote:
| It doesn't support your argument when we're talking about a massive processor like a Threadripper vs. an M1 Ultra.
| The performance per watt isn't in the same universe, and that matters.

| wyattpeak wrote:
| The article claims that the chip is "the world's most powerful and capable chip for a personal computer". It's reasonable to ask whether it genuinely is faster than another available chip; it's not an implicit argument that it's not powerful.

| adfgadfgaery wrote:
| The M1 Ultra is by a very wide margin the bigger of the two. According to Tom's Hardware [1], top-of-the-line Epycs have 39.54 billion transistors. That is about a third of the 117 billion in the M1 Ultra. Apple builds bigger than anyone else, thanks largely to their access to TSMC's best process.
| The M1 Ultra is a workstation part. It goes in machines that start at $4,000. The competition is Xeons, Epycs, and Threadrippers.

| bpye wrote:
| That's not really a fair comparison. Apple's chip spends most of that on the GPU, and the neural engine takes a chunk too. Threadripper is only a CPU.

| forrestthewoods wrote:
| > The performance per watt isn't in the same universe and that matters.
| I couldn't give less of a shit about performance-per-watt. The ONLY metric I care about is performance-per-dollar.
| A Mac Studio and a Threadripper are both boxes that sit on/under my desk. I don't work from a laptop. I don't care about energy usage. I don't even really care about noise. My Threadripper is fine. I would not trade less power for less noise.

| ricardobeat wrote:
| The vast majority of developers today have a laptop as their main machine. Performance-per-watt is absolutely crucial there.

| hu3 wrote:
| This is what some folks miss.
| One hour of my time is more expensive than an entire month of a computer's electricity bill.
| Some people just want tasks to run as fast as possible, regardless of power consumption or portability.
| Life's short and time is finite.
| Every second adds up for repetitive tasks.

| ghshephard wrote:
| The power is only relevant because it makes the machine quiet in a compact form. If you've got a bit of space, then a water-cooled system accomplishes a lot of the same thing. For some people there is an aesthetic element.
| Power does make a big difference in data centers though - it's often the case that you run out of power before you run out of rack space.
| Where power for a computer might make a difference could be in power-constrained (solar/off-grid) scenarios.
| I don't know if I've ever heard anyone make an argument based on $$$.

| altcognito wrote:
| The only reason I've ever cared about watts is that, generally speaking, 120-watt and 180-watt processors require more complicated cooling solutions. That's less true today than it ever was. Cases are designed for things like liquid cooling, and they tend to be pretty silent. The processors stay cool and are pretty reliable.
| I personally stick to the lower-wattage ones because I don't generally need high-end stuff, so I think Apple is going in the right direction here, but it should be noted that Intel has also started down the path of high-performance and efficiency cores already. AMD will find itself there too if it turns out that for home use we just don't need a ton of cores, but instead a small group of fast cores surrounded by a bunch of specialist cores.

| Dylan16807 wrote:
| Air coolers can handle 300 watts without any complexity. Just a big block of fins on heat pipes.

| paulmd wrote:
| Wattage doesn't really tell you how difficult it is to cool a part anymore. 11th-gen Intel is really easy to cool despite readily going to 200W+. Zen3 is hard to cool even at 60W.
| Thermal density plays a huge role: the size of the chips is going down faster than the wattage, so thermal density is going up every generation even if you keep the same number of transistors. And everyone is still putting more transistors on their chips as they shrink.
| Going forward this is only going to get more complicated - I am very interested to see how the 5800X3D does in terms of thermals with a cache die over the top of the CCD (compute die). But anyway, that style of thing seems to be the future - NVIDIA is also rumored to be using a cache die over the top of their Ada/Lovelace architecture. And obviously 60W direct to the IHS is easier to cool than 60W that has to be pulled through a cache die in the middle.

| chaostheory wrote:
| It doesn't matter. Speaking as an Apple cult member, IMO Threadripper is a better value if you're not using the machine for personal use.

| marcan_42 wrote:
| My 1st-gen 16-core Threadripper is _barely_ faster than an M1 Pro/Max at kernel builds, so a 64-core TR3 should handily double the M1 Ultra's performance.
| But you know, I'm still happy to double my current build perf in a small box I can stick in my closet. Ordered one :-)

| mhh__ wrote:
| How many threads are actually getting utilized in those kernel builds? I don't work on the kernel enough to have intuition here, but people make wildly optimistic assumptions about how compilation stresses processors.
| Also, 1st-gen Threadrippers are getting on a bit now, surely. It's a ~6-year-old microarchitecture.

| nextos wrote:
| Yes, it'd be interesting to see this comparison made with current AMD CPUs and a full build that costs approximately the same.
| I am curious whether there is a real performance difference.
| I do lots of computing on high-end workstations. Intel builds used to be extremely expensive if you required ECC; they used that to price-discriminate. Recent AMD offerings helped enormously. I wonder whether these M1 offerings are a significant improvement in terms of performance, making it worthwhile to cope with the hassle of switching architectures.

| manmal wrote:
| I wouldn't automatically expect a linear decrease in compile time with growing core count. That would have to be tried.

| gtvwill wrote:

| Thaxll wrote:
| Well, compare that to a $400 CPU like the 5900X; the first M1 is slower than it and costs 2x the price.

| [deleted]

| cehrlich wrote:
| Seems like for things that are: 1. perfectly parallel, and 2. not accelerated by some of the other stuff that's on the Apple Silicon SoCs... it will be a toss-up.
| The Threadripper 3990X gets about 25k in Geekbench multicore [1].
| The M1 Max gets about 12.5k in Geekbench multicore, so pretty much exactly half [2].
| Obviously different tasks will have _vastly_ different performance profiles. For example, it's likely that the M1 Ultra will blow the Threadripper out of the water for video stuff, whereas Threadripper is likely to win certain types of compiling.
| There's also the upcoming 5995WX, which will be even faster: [3]
| [1] https://browser.geekbench.com/processors/amd-ryzen-threadrip...
| [2] https://browser.geekbench.com/v5/cpu/search?utf8=%E2%9C%93&q...
| [3] https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-p...

| Teknoman117 wrote:
| Something is seriously fishy about those Geekbench results.
| The 24-core scores 20k, the 32-core scores 22.3k, and the 64-core scores 25k. Something isn't scaling there.

| e4e78a06 wrote:
| Many GB5 (and real-world) tasks are memory-bandwidth bottlenecked, which greatly favors the M1 Max because it has over double a Threadripper's memory bandwidth.

| Teknoman117 wrote:
| Sort of. The CPU complex of the M1 Max can achieve ~200 GB/s; you can only hit the 400 GB/s mark by getting the GPU involved.
| At the same time, the Threadrippers also have a gargantuan amount of cache that can be accessed at several hundred gigabytes per second per core. Obviously not as nice as being able to hit DRAM at that speed.

| e4e78a06 wrote:
| That cache is not uniform-time access. It costs over 100ns to cross the IO die to access another die's L3 - almost as much as going to main memory. In practice you have to treat it as 8 separate 32 MB L3 caches.
| Also, not everything fits into cache.

| mrtksn wrote:
| Probably it's the thermals that don't scale. The more cores, the lower the peak performance per core.

| enneff wrote:
| Yeah, it's the real-world tasks that Geekbench tries to simulate that don't tend to scale linearly with processor count. A lot of software does not take good advantage of multiple cores.

| fivea wrote:
| > A lot of software does not take good advantage of multiple cores.
| It sounds pointless to come up with synthetic benchmarks which emulate software that is not able to fully use the hardware, and then use said synthetic benchmarks to evaluate the hardware's performance.
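The non-linear scaling Teknoman117 points out is what Amdahl's law predicts whenever part of each task is serial. A small illustration; the 5% serial fraction here is an assumed, illustrative number, not one measured from Geekbench:

    #include <stdio.h>

    /* Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
       where p is the parallelizable fraction of the work. */
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        double p = 0.95;                 /* assumed: 95% parallel, 5% serial */
        int cores[] = { 24, 32, 64 };
        for (int i = 0; i < 3; i++)
            printf("%2d cores: %4.1fx over one core\n",
                   cores[i], speedup(p, cores[i]));
        /* 64 vs 24 cores: ~1.4x, not the 2.7x that core count alone
           suggests - the same shape as the 20k -> 25k Geekbench jump. */
        printf("64c / 24c ratio: %.2fx\n", speedup(p, 64) / speedup(p, 24));
        return 0;
    }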
| BobbyJo wrote:
| It has a very specific point: communicating performance to people who don't know hardware.
| Most consumers are software-aware, not hardware-aware. They care what they _will_ use the hardware for, not what they _can_ use it for. To that end, benchmarks that correlate with their experience are more useful than a tuned BLAS implementation.

| Teknoman117 wrote:
| That's certainly true. But if that's your workload, you shouldn't be buying a 64-core CPU...
| I use a few 32- and 64-core machines for build servers and file servers, and while the 64-core EPYCs are not twice as fast as the 32-core ones due to lower overall frequency, they're 70% or so faster in most of the things I throw at them.

| brigade wrote:
| Does Geekbench actually attempt to simulate that in their multi-core score? And how?
| I was under the impression that all of their multi-core tests were "run N independent copies of the single-threaded test", just like SPECrate does.

| kllrnohj wrote:
| Geekbench is extremely sensitive to the OS. The same CPU on Windows & Linux scores _wildly_ differently on Geekbench. For example, the 3990X regularly hits 35k multicore on Geekbench when run on Linux: https://browser.geekbench.com/v5/cpu/11237183

| gjsman-1000 wrote:
| Also of note is that half of the Mac Studio's case is dedicated to cooling. Up to this point, all M1 Max benchmarks have been from laptops, while all Threadripper benchmarks are from desktops. The M1 Max in the Mac Studio will probably perform better than expected.

| tacLog wrote:
| This is sound logic and will probably be the case, but I wonder if this effect will be smaller than what we have seen in the past, because of the reduced TDP of the M1 processors in general.
| Maybe the cooling and power delivery difference between laptop form factors and PC form factors will matter less with these new ARM-based chips.

| runako wrote:
| Having not seen benchmarks, I would imagine that the claimed memory bandwidth of ~800 GB/s vs Threadripper's claimed ~166 GB/s would make a significant difference for a number of real-world workloads.

| paulmd wrote:
| Someone will probably chime in and correct me (such is the way of the internet - Cunningham's Law in action), but I don't think the CPU itself can access all 800 GB/s? I think someone in one of the previous M1 Pro/Max threads mentioned that several of the memory channels on the Pro/Max are dedicated to the GPU. So you can't just get an 800 GB/s Postgres server here.
| You could still write OpenCL kernels, of course. Doesn't mean you _can't_ use it, but I'm not sure if it's all just accessible to CPU-side code.
| (or maybe it is? it's still a damn fast piece of hardware either way)

| runako wrote:
| Fascinating!
| Linking this [1] because TIL that the memory bandwidth number is more about the SoC as a whole. The discussion in the article is interesting because they are actively trying to saturate the memory bandwidth. Maybe the huge bandwidth is a relevant factor for the real-world uses of a machine called "Studio" that retails for over $3,000, but not as much for people running Postgres?
| 1 - https://www.anandtech.com/show/17024/apple-m1-max-performanc...

| crest wrote:
| On an M1 Max MacBook Pro, the CPU (8P+2E) cores peak at a combined ~240GB/s; the rest of the advertised 400GB/s memory bandwidth is only usable by the other bus masters, e.g. the GPU, NPU, video encode/decode blocks, etc.
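Figures like crest's ~240GB/s come from STREAM-style microbenchmarks: move a lot of memory and divide bytes by seconds. A minimal single-threaded sketch; one thread typically cannot saturate a wide memory fabric (real runs use many threads), so treat the result as a floor rather than the machine's peak:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        size_t n = 1UL << 27;                 /* 2^27 doubles = 1 GiB per array */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        if (!a || !b) return 1;
        for (size_t i = 0; i < n; i++) a[i] = 1.0;   /* touch pages up front */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < n; i++) b[i] = a[i];  /* "copy": read + write */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double gb = 2.0 * n * sizeof(double) / 1e9;  /* bytes read + written */
        printf("copy: %.1f GB/s\n", gb / secs);
        free(a); free(b);
        return 0;
    }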
| paulmd wrote:
| So now the follow-on question I really wanted to ask: if
| the CPU can't access all the memory channels, does that
| mean it can only address a fraction of the total memory
| as CPU memory? Or is it a situation where all the
| channels go into a controller/bus, but the CPU link out
| of the controller is only wide enough to handle a
| fraction of the bandwidth?
| brigade wrote:
| It's more akin to how on Intel, each core's L2 has some
| maximum bandwidth to the LLC, and can't individually saturate
| the total bandwidth available on the ring bus. But Intel
| doesn't have the LLC <-> RAM bandwidth for that to be
| generally noticeable.
| kiratp wrote:
| My workstation has a 3990X.
|
| Our "world" build is slightly faster on my M1 Max.
|
| https://twitter.com/kiratpandya/status/1457438725680480257
|
| The 3990X runs a bit faster on the initial compile stage but
| the linking is single-threaded and the M1 Max catches up at
| that point. I expect the M1 Ultra to crush the 3990X on compile
| time.
| howinteresting wrote:
| Try mold.
| petecooper wrote:
| >Try mold
|
| Curiosity got the better of me:
|
| https://github.com/rui314/mold
| kiratp wrote:
| We plan to move to it once macOS support lands (for the
| laptops).
| fivea wrote:
| > The 3990X runs a bit faster on the initial compile stage
| but the linking is single-threaded and the M1 Max catches up
| at that point.
|
| Isn't linking IO-bound?
| codeflo wrote:
| For a clean build and a reasonably specced machine, all the
| intermediate artifacts will still be in the cache during
| linking.
| kiratp wrote:
| Exposing my limited understanding of that level of the
| computing stack - it is, but Apple seems to have very, very
| good caching strategies - filesystem and L1/2/3.
|
| https://llvm.org/devmtg/2017-10/slides/Ueyama-lld.pdf
|
| There is a breakdown in those slides discussing what parts
| of lld are single-threaded and hard to parallelize, so I
| suspect single-thread performance plays a big role too. I
| generally observe one core pegged during linking.
| fivea wrote:
| > Exposing my limited understanding of that level of the
| computing stack - it is, but Apple seems to have very, very
| good caching strategies - filesystem and L1/2/3.
|
| That would mean that these comparisons between
| Threadripper and the M1 Ultra do not reflect CPU
| performance but instead showcase whatever choice of SSD
| they've been using.
| nicoburns wrote:
| https://github.com/rui314/mold would suggest otherwise.
| Massive speedups by multithreading the linker. I think
| traditional linkers just aren't highly optimised.
| gjsman-1000 wrote:
| It is worth noting that, at least according to Apple's graphs,
| it has slightly more graphics performance than an RTX 3090.
|
| So, even if it doesn't quite beat Threadripper in the CPU
| department - it will absolutely _annihilate_ Threadripper in
| anything graphics-related.
|
| For this reason, I don't actually have a problem with Apple
| calling it the fastest. Yes, Threadripper might be marginally
| faster in real-world work that uses the CPU, but for other
| tasks like video editing and graphics, it won't be anywhere
| near close.
| komuher wrote:
| It won't be even close to an RTX 3090; looking at the M1 Max
| and assuming the same scaling, at best it will be close to
| 3070 performance.
|
| We all need to take Apple's claims with a grain of salt, as
| they are always cherry-picked, so I won't be surprised if it
| doesn't even reach 3070 performance in real usage.
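Back-of-envelope support for that "closer to a 3070" estimate, using
published peak FP32 throughput. Peak TFLOPS is a crude proxy for
delivered performance, so treat this as a sanity check, not a
benchmark:

    m1_max_tflops = 10.4                 # Apple's figure, 32-core M1 Max GPU
    m1_ultra_tflops = 2 * m1_max_tflops  # assumes perfect 2x scaling
    rtx_3070_tflops = 20.3               # Nvidia's published peak FP32 figures
    rtx_3090_tflops = 35.6

    print(m1_ultra_tflops)                    # 20.8 -- right at 3070 level
    print(m1_ultra_tflops / rtx_3090_tflops)  # ~0.58 of a 3090, on paper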
| Teknoman117 wrote:
| I'm obviously going to reserve judgement until people can get
| their hands on them. Apple makes good stuff but their keynote
| slides are typically heavily cherry-picked ("our video
| performance numbers compare our dedicated ASIC to software
| encoding on a different architecture, even though competing
| ASICs exist" kinds of things).
| gtvwill wrote:
| randyrand wrote:
| These CPU names are terrible. When did Apple get bad at naming
| things?
| mrcwinn wrote:
| Pro, Max, Ultra.
|
| The board has been set. M1 Endgame is nearly ready.
| Damogran6 wrote:
| With UltraFusion... UltraFabric next to stack them all
| vertically.
| jmull wrote:
| Superman could've kicked the crap out of Ultraman, just FYI.
| [deleted]
| neycoda wrote:
| I really wish popular companies would focus more on software
| optimization than hardware renovations. It's really sad to see
| how fast products die due to increasingly bloated and
| overly-complex software.
| zwaps wrote:
| So then, when will common ML frameworks work on Apple? I guess
| compiled TensorFlow works with some plugins or whatever, where
| afaik performance is still subpar. Apple emphasizes that they
| have this many tensor cores... but unfortunately, to use them
| one has to roll one's own framework in, what, Swift or
| something. I am sure it gets better soon.
| contingencies wrote:
| Desktop marketing seems to be getting desperate. Few people use
| the apps they show in the Mac Studio benchmarks. Fewer still
| care that their chips use less power... if they did, they would
| stay on their phones.
| etchalon wrote:
| The "few people" who use those apps are the people Apple is
| selling these systems to.
| Spooky23 wrote:
| Large businesses do. I bet in some places you could get
| grant/loan incentives to replace a PC fleet with these things.
|
| Back in the Pentium 4 days, iirc I was able to get almost $250k
| in grants and $1.5M in subsidized loans to do an accelerated
| refresh of a PC fleet and small datacenter, all through a
| utility's peak load reduction program.
| contingencies wrote:
| I don't deny such things happen, but it's illogical. If you
| have 20 people on desktops, a small fraction of them will use
| more energy microwaving lunch, making coffee or on air
| conditioning than they will save in aggregate from this nominal
| reduction in power draw.
| AdrianB1 wrote:
| Microwaving lunch: 2 minutes at 800W; desktop with monitor:
| 8 hours at 150W, that is 45 times higher. Similar for
| coffee; no simple math for AC. If you can reduce 150W to
| 80W, it is both significant and achievable - this is what
| my desktop usually draws.
| Spooky23 wrote:
| At 20, yes, it's a waste of time. At 2000, reducing power
| consumption by 30% may yield $100k annually.
| [deleted]
| yoloyoloyoloa wrote:
| MKUltra was a better chip
| [deleted]
| Mikeb85 wrote:
| Ugh, really wish that a non-Apple vendor could make an ARM chip
| of this calibre. Jealous, but can't bring myself to use a
| proprietary OS and get locked into Apple.
| pantalaimon wrote:
| There is always Asahi Linux
| kristianp wrote:
| I wonder what clock rate the Studio runs these chips at with
| the extra cooling. Frustrating that the marketing materials
| don't mention that.
| top_sigrid wrote:
| This is a little-discussed question, but one of the most
| interesting unknowns, I think.
|
| The Pro and Max come only in laptops, so the cooling difference
| should be quite significant, but there is also more chip and an
| interconnect to cool.
Really looking forward to the in-depth analysis of this.
| Synaesthesia wrote:
| Apple have some serious chip design abilities. Imagine if they
| entered the server market; with this architecture it could be
| very successful.
| throwawayboise wrote:
| They tried that before and flopped.
|
| The server market is different. Companies buy servers from the
| low bidder. Apple has never really played in that market.
| flatiron wrote:
| People care about performance per watt now. So they could
| compete. The real question is if they would support Linux. In
| our containerized world I can't see their servers getting
| super big running macOS.
| greenknight wrote:
| The reason they dominate at PPW is that they are on
| TSMC's 5nm process. No one else has made a CPU on this
| process yet. AMD is scheduled for later this year (they
| are currently using 7nm).
|
| It will be interesting to see the difference in performance
| and performance per watt when both companies are on the
| same node.
| flatiron wrote:
| ARM, I believe, helps a bit as well.
| stjohnswarts wrote:
| This is pretty true. While people who buy racks consider
| vertical improvements, they tend to think laterally, about how
| easy it is to expand (aka how cheap is the +1 server).
| alwillis wrote:
| That was before people cared about performance per watt.
|
| Besides, for some use cases, these Mac Studios will be racked
| and in data centers as is.
| renewiltord wrote:
| Haha, bloody hell, what a monster of a chip. I find my M1 Max
| already remarkably fast. The change is so huge. It's like in the
| old days when you'd get a new computer and it felt like it could
| do things before you could think of doing them.
|
| But surely the GPU things can't be real? The GPU in the M1 Ultra
| beats the top-of-the-line Nvidia? That's nuts.
| acchow wrote:
| > The GPU in the M1 Ultra beats the top-of-the-line Nvidia?
| That's nuts.
|
| We don't know yet. Apple is benchmarking against workstation
| graphics cards:
|
| "production 2.5GHz 28-core Intel Xeon W-based Mac Pro systems
| with 384GB of RAM and AMD Radeon Pro W6900X graphics with 32GB
| of GDDR6"
| sercand wrote:
| > Highest-end discrete GPU performance data tested from Core
| i9-12900K with DDR5 memory and GeForce RTX 3090.
|
| From the linked article. Apple is comparing against the RTX
| 3090.
| lastdong wrote:
| Nvidia 3090 - I wonder what "relative performance" equates to.
|
| Can't wait for the (real-world) reviews to be published.
| lastdong wrote:
| Just to add: in any case Apple is solving a big problem
| related to limited GPU memory, which is quite cool.
|
| Hopefully AMD, Nvidia and others can follow the trend.
| bkyiuuMbF wrote:
| > But surely the GPU things can't be real? The GPU in the M1
| Ultra beats the top-of-the-line Nvidia?
|
| Dubious. https://www.pcgamer.com/apple-m1-max-nvidia-rtx-3080-perform...
| xsmasher wrote:
| Just for clarity, that article is about the M1 Pro and the M1
| Max chips from October.
| brutal_boi wrote:
| From the article:
|
| > Apple even says its new GPU is a match for Nvidia's RTX
| 3080 mobile chip, though you'll have to take Apple's word for
| it on that one. We've also reached out to Nvidia to see what
| it might have to say on the matter.
|
| > RTX 3080 mobile chip
|
| > mobile chip
|
| There's a 50%[1] (!) difference between the mobile and
| non-mobile versions of the chip. So that's hardly a deal
| breaker.
|
| [1] https://www.videocardbenchmark.net/high_end_gpus.html
| gowld wrote:
| The "mobile" scam in GPUs is terrible.
Nvidia flat-out lies
| about mobile performance by giving misleading product names
| (the same as the desktop names).
| LegitShady wrote:
| It's beyond that. The same chip might have several TDPs
| and drastic performance differences between models, such
| that a high-TDP 3070 mobile is faster than a low-TDP
| 3080. You end up having to get benchmarks for each
| particular laptop configuration.
| Omniusaspirer wrote:
| Based on Anandtech benchmarks the M1 Max GPU is basically on
| par with a mobile 3080, which a quick search tells me is about
| 60% as fast as a desktop 3080. Not unreasonable to believe 2 of
| them combined will outperform a 3090 - with nearly 128 GB of
| VRAM to boot.
|
| Even more incredible, Anandtech reports the M1 Max GPU block
| maxing out at 43W in their testing. So a ~90W GPU in the M1
| Ultra is trading blows with a 350+ watt 3090.
|
| 1) https://www.anandtech.com/show/17024/apple-m1-max-performanc...
| cassac wrote:
| What on earth are you talking about? That link shows it's not
| even half as fast as the 3060, let alone the 3080.
|
| In Borderlands it got 24 FPS while the 3080 got 52 FPS. How
| is that on par?
| squeaky-clean wrote:
| If you buy a Mac for gaming, you're going to have a bad
| time. Look at the GFXBench 5.0 benchmark. The first graph
| on the page.
| [deleted]
| Omniusaspirer wrote:
| Gaming benchmarks are completely irrelevant when discussing
| the actual raw power of the GPU. As the other commenter
| said - look at the actual GPU benchmark in the first graph.
|
| Legacy games written for x86 CPUs obviously are going to
| perform poorly. I recommend you actually read the review
| and don't just scroll to the worst gaming benchmark you can
| find.
| cassac wrote:
| There are only two real gaming benchmarks and they are
| both real bad for the M1. In Tomb Raider it fares even
| worse at 4K than it does in Borderlands.
|
| It's a great chip but it doesn't trade blows with
| anything Nvidia puts out, especially at comparable price
| points.
|
| Maybe you buy things to run benchmarks. I buy them to run
| the software I own. For games they come up short on fps
| and high on price. That is the inverse of what I'm
| looking for.
| Omniusaspirer wrote:
| If your interest is purely in playing unoptimized games
| coded for different architectures then absolutely there are
| better options.
|
| However, if your workloads are in a more professional
| domain, as mine are, then it's entirely fair to say this
| chip is trading blows with Nvidia's best at lower prices.
| Don't forget this is an entire SoC and not just a GPU;
| power savings aren't irrelevant either if you actually
| work your hardware consistently as I do.
| mhh__ wrote:
| > Gaming benchmarks are completely irrelevant when
| discussing the actual raw power of the GPU.
|
| Maybe, but the "raw power" is useless if it can't be
| exploited.
|
| > Legacy games written for x86 CPUs obviously are going
| to perform poorly.
|
| Not if they're GPU-bound. Even native performance isn't
| that impressive.
| Omniusaspirer wrote:
| If the power is substantial enough it will get exploited
| eventually. Hopefully, even if Metal ports don't occur, the
| eventual Asahi-adjacent open source drivers will open the
| gaming doors.
| Macha wrote:
| GPU scaling is absolutely not linear in that way. Nvidia gave
| up on that in recent generations as, without software support
| to match, you had situations where double 1080s were 95% as
| fast as one 1080, with worse frame times.
|
| Might be nice for e.g.
ML, where you can effectively treat
| them as entirely independent GPUs, but for games I would be
| surprised if this matches a high-end GPU.
| vimy wrote:
| macOS will see it as one GPU.
| teilo wrote:
| Given that it's basically double the performance of the Max,
| with massive memory bandwidth, it seems reasonable to me. But
| Apple always fudges things a bit. Like, which Nvidia exactly is
| this being compared to, and under what workload exactly?
| make3 wrote:
| The problem on Mac is the super tiny game selection.
| Thaxll wrote:
| > But surely the GPU things can't be real? The GPU in the M1
| Ultra beats the top-of-the-line Nvidia? That's nuts.
|
| People that game on Mac know it's a lie; GPUs for gaming on Mac
| are vastly slower than recent graphics cards.
| [deleted]
| jlouis wrote:
| Insane claims require insane evidence. We don't have that
| here.
|
| For some workloads I would not be surprised at all. But for all
| workloads, ...
| [deleted]
| thfuran wrote:
| That thing has four times as many transistors as a 3090.
| maronato wrote:
| Although true, transistor count is only tangentially related
| to performance.
| mhh__ wrote:
| Cache size is _very_ related to performance.
| thfuran wrote:
| It's certainly not the sole determinant of performance, but
| given two reasonably solid designs, one with a vastly
| larger transistor budget and a major node advantage to boot,
| I know which one I'd pick as the likely winner.
| davrosthedalek wrote:
| That counts memory, right?
| wmf wrote:
| No, but it does include cache; the M1 Ultra should have
| 96MB of cache (>6B transistors) while Nvidia GPUs have
| relatively little cache. 128GB of DRAM has 1 trillion
| transistors.
| swyx wrote:
| Was surprised to learn that the CPUs and GPUs on the M1x chips
| are essentially a single unit, and for the M1 Ultra they
| basically slapped two M1 Maxes together.
|
| In traditional PC building, the CPU is quite distinct from the
| GPU. Can anyone ELI5 what the benefits are of having the CPU
| closely integrated with the GPU like the M1 has? Seems a bit
| unwieldy, but I don't know anything about computer
| architecture.
| Koshkin wrote:
| I remember how AMD 3DNow! and Intel MMX were meant to render
| GPUs obsolete.
| tediousdemise wrote:
| How does Apple manage to blow every other computing OEM out of
| the water? What's in the secret sauce of their company?
|
| Is it great leadership? Top-tier engineering talent? Lots of
| money? I simply don't understand.
| KaiserPro wrote:
| Marketing and a forgiving audience.
|
| You have to remember that since the 2014 Retina, Apple's
| offerings have been a bit crap.
|
| This is a return to form (and a good one at that) but it's not
| worthy of hero worship. They've done a good job turning things
| around, which is very hard.
| yurishimo wrote:
| I think it's mostly engineering and the cash to make things
| happen. You heard it today in the presentation: since they
| launched M1, sales have skyrocketed for Apple computers.
|
| Hopefully leadership is really looking hard at this trend and
| adjusting future offerings accordingly. Consumers WANT machines
| with high performance and great I/O, and they're willing to pay
| for them.
|
| With Apple, Intel, and AMD really stepping up the last couple
| of years, I think the next decade of personal computing is
| going to be really exciting!
| stalfosknight wrote:
| Put simply, it is vertical integration paired with management
| that is adept at playing the long game.
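Rough arithmetic behind the transistor-count exchange upthread (a
sketch: it assumes textbook 6-transistor SRAM cells and 1-transistor
DRAM cells, ignoring tags, ECC, and peripheral logic):

    cache_bytes = 96 * 2**20                # the ~96 MB of on-die SRAM cache
    sram_transistors = cache_bytes * 8 * 6  # ~4.8e9; ">6B" with overhead added
    dram_bytes = 128 * 2**30                # 128 GB of DRAM (not on the logic die)
    dram_transistors = dram_bytes * 8       # ~1.1e12, the "1 trillion" figure

    print(f"{sram_transistors:.1e} SRAM, {dram_transistors:.1e} DRAM")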
| lotsofpulp wrote:
| Willingness to risk a ton of capital over many years on
| developing hardware.
| amilios wrote:
| D) all of the above?
| ThrowawayR2 wrote:
| If I had to guess, their secret sauce is that 1) they're paying
| lots of money to be on a chip fabrication node ahead of both
| AMD and Intel, 2) since their chip design is in-house, they
| don't have to pay the fat profit margin Intel and AMD want for
| their high-end processors and can therefore include what is
| effectively a more expensive processor in their systems for the
| same price, and 3) their engineering team is as good as
| AMD's/Intel's. Note that the first two have more to do with
| economics than engineering.
| [deleted]
| forgotmyoldacc wrote:
| Apple isn't an OEM? They don't sell products that are marketed
| by another company.
| xyst wrote:
| "M1 Ultra Pro Max", wen?
|
| Naming scheme aside, this is great!
| iskander wrote:
| So little memory?
|
| The now-outdated Mac Pro goes up to 1.5TB; only 128GB is
| available here.
| masklinn wrote:
| Their design is basically gated by the number of memory
| controllers: 1 MC tops out at 16GB (M1), 2 MCs for 32 (Pro), 4
| MCs for 64 (Max), and I guess 8 MCs for 128 (Ultra, which is
| apparently two Maxes stapled together).
|
| Hopefully the next gen will provide more capable and flexible
| memory controllers, both so they can scale the top end for a
| full Pro-scale offering, and so there is more memory
| flexibility at the lower end, e.g. the ability to get an M1
| with 32+GB RAM, or a Pro with 64+.
| bengale wrote:
| They mentioned the Mac Pro replacement is still to come.
| fulafel wrote:
| "Apple's innovative packaging architecture that interconnects
| the die of two M1 Max chips to create a system on a chip (SoC)"
|
| Did they get their terminology confused? Later it says "By
| connecting two M1 Max die with our UltraFusion packaging
| architecture [...]", which also sounds like it's an MCM and not
| a SoC.
| crazypython wrote:
| The Intel 12th-gen i9 is 11% better at single-core and 42%
| slower at multi-core. https://www.cpu-monkey.com/en/compare_cpu-intel_core_i9_1290...
|
| For most non-parallel tasks, my guess is the Intel 12900K will
| win on performance.
|
| Intel's next generation will have 50% more cores and beat this
| chip at multithreading.
| teilo wrote:
| And overnight, Intel's Ice Lake is again way behind.
| pjmlp wrote:
| 80% of the desktop market and 100% of cloud deployments won't
| care.
| brailsafe wrote:
| Would 80% of "the desktop market"--whatever that means--care
| about Ice Lake to begin with, or any high-end chip at all?
| pjmlp wrote:
| What they definitely won't care about is Apple hardware at
| Apple prices, especially outside first-world countries.
| manquer wrote:
| Desktop users will care about the noise of the fans.
|
| Data centers are also pretty conscious of power consumption;
| more power means more cooling infra required and a higher
| energy bill. While it is not the top priority, it certainly is
| a significant factor in decision making.
| fastball wrote:
| I dunno, I think cloud is starting to think more and more
| about power consumption of the chips used, where Apple
| Silicon blows the competition out of the water.
| pjmlp wrote:
| Let us know when Apple starts a cloud business.
| Thaxll wrote:
| PC users don't pay $4k for a computer; on PC you can get 2x the
| speed for half the price.
| samatman wrote:
| Link me to the $2K computer that's twice as fast as the M1
| Ultra. Take all the time you need.
| squeaky-clean wrote:
| > Take all the time you need
|
| Check your replies in 10 years and I'll be able to list a
| dozen ;P
|
| But sarcasm aside, yeah, this chip looks insane.
| Thaxll wrote:
| It took less than 6 months to have a faster AMD/Intel CPU
| than the M1 back then. Apple's charts are showing
| performance/watt, which for a desktop PC is kind of
| irrelevant. In pure speed AMD/Intel are faster or will be
| very soon.
|
| For graphics cards I won't try to argue, because fps on Mac
| in games is far inferior to an average modern card. It's not
| even in the same league.
| ishansharma wrote:
| May I ask for the $2000 desktop configuration with 2x the
| speed?
|
| Of course Apple chips won't work well for gaming, but what
| other benchmarks will this $2000 desktop win?
| joshstrange wrote:
| > PC user don't pay 4k for a computer
|
| I'm almost certain that's not true, especially for machines
| that would compete with the Studio.
|
| > on PC you can get 2x the speed for 2x less the price.
|
| Citation needed. This hasn't been true for a long time as far
| as I can tell.
| [deleted]
| maronato wrote:
| This hasn't been true since the M1's release.
| Thaxll wrote:
| A CPU like the 5900X is better than the M1 and costs $400.
| yurishimo wrote:
| That's one part of the equation though, not to mention
| it's a desktop chip. I can get a laptop with an M1 Max
| with hours of battery life running full tilt.
|
| Your $400 CPU needs at least another $1000 in parts just
| to boot (and those aren't even the parts you likely want
| to pair with it).
|
| Your cost comparison is silly. Nobody compares singular
| CPUs to entire machines.
| mhh__ wrote:
| If you're going to be smug, why not use a recent Intel chip?
| adfgadfgaery wrote:
| Ice Lake shipped in 2019. The current generation is Alder Lake,
| which is slightly ahead of the M1 in single-threaded
| performance according to most benchmarks.
| teilo wrote:
| My bad. I meant Alder Lake.
| sharikous wrote:
| And massively behind in terms of power consumption.
| adfgadfgaery wrote:
| Yes, certainly. I don't think that's relevant in this case,
| though. Why would anyone care if their workstation CPU
| draws 60W or 200W? It's easy to cool in either case and the
| power consumption is trivial.
|
| M1 is clearly the best design on the market for mobile
| devices and is merely _very good_ for desktops. Let's keep
| the enthusiasm realistic.
| manquer wrote:
| Higher power means more cooling, which usually means more
| noise. A lot of people find value in quieter machines.
| secondcoming wrote:
| > Why would anyone care if their workstation CPU draws 60W
| or 200W?
|
| I care. I work from home and my main power sink is my
| desktop. Considering the soaring energy prices these days,
| I really do care about what my usage is.
| abletonlive wrote:
| You say it's easy to cool, but that's actually not the
| case for anybody that cares about noise. Any music
| studio is going to happily take the 60W over 200W, because
| they record and monitor music and need the quietest
| machine possible in the room.
|
| Unsurprisingly, it's called Mac _Studio_, as in music
| studio, or art studio, or what-have-you studio, where
| these things matter.
|
| This is a machine aimed at content creators.
| mhh__ wrote:
| It's behind M1, but it's worth pointing out that Alder Lake
| is not a power hog on "normal" workloads, i.e. gaming, when
| compared to its other x86 competitors. It only starts
| cooking itself on extremely heavy workloads.
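For scale, the kind of arithmetic behind caring about a 60W vs. 200W
CPU (the duty cycle and tariffs below are illustrative assumptions,
not measurements):

    delta_watts = 200 - 60            # the CPU power gap under discussion
    hours = 8 * 250                   # an 8-hour workday, ~250 days/yr
    kwh = delta_watts * hours / 1000  # ~280 kWh/yr
    for price in (0.12, 0.35):        # assumed $/kWh: cheap vs. expensive power
        print(f"${kwh * price:,.0f}/yr at ${price}/kWh")
    # ~$34/yr to ~$98/yr for one box -- small on its own, though noise
    # and cooling scale with those same watts.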
| didip wrote:
| The power leveling in this chip's naming scheme can rival Dragon
| Ball Z.
| amne wrote:
| How many CUDA cores? It's over ninethousaaaaa... oh wait,
| nevermind!
| Jetrel wrote:
| I for one am holding out for the ULTRA GIGA chips.
| willis936 wrote:
| My first thought was "does it include an LSD subscription?".
| ccwilson10 wrote:
| I wish HN was like this more often
| 0xbadcafebee wrote:
| Be the comments you want to see in the HN
| et-al wrote:
| I don't.
|
| There's already Reddit if you want to crack puns and farm
| karma. Let's try to keep the signal:noise ratio higher here.
| rpmisms wrote:
| I appreciate that it's infrequent. Sure, it's fun to blow off
| some steam and have a laugh, but that's fundamentally not
| what this place is about. Confining it to Apple release
| threads makes it more of a purge scenario.
| technocratius wrote:
| I really hope it won't. Let's cherish the high-quality
| comments of HN. Once this comment section becomes a karma-fed
| race to the bottom driven by who can make the most memeable
| jokes, it will never recover. Case in point: Reddit.
| stjohnswarts wrote:
| the M1-O9000
| t_mann wrote:
| Would be interesting to get more info on the neural engine. On
| one hand, I find it fascinating that major manufacturers are now
| putting neural architectures into mainstream hardware.
|
| On the other hand, I wonder what exactly it can do. To what
| degree are you tied into a specific neural architecture (e.g.
| recurrent vs convolutional)? What APIs are available for
| training it, if it's even meant to be used that way (not just by
| Apple-provided features like FaceID)?
| zitterbewegung wrote:
| It's a general-purpose accelerator. You have coremltools[1] to
| convert your trained model into its format, or you can make
| your own using CreateML[2].
|
| [1] https://coremltools.readme.io/docs
|
| [2] https://developer.apple.com/machine-learning/create-ml/
| zozbot234 wrote:
| Typical "neural engines" are intended for real-time network
| inference, not training. Training is highly parallel and
| benefits more from GPU-like vector processing.
| mlajtos wrote:
| Apple is pushing for training & fine-tuning on the devices
| too.
|
| https://developer.apple.com/documentation/coreml/model_custo...
| slimsag wrote:
| They're doing quite a lot of work here:
|
| https://developer.apple.com/machine-learning/
|
| https://developer.apple.com/machine-learning/create-ml/
|
| https://developer.apple.com/documentation/createml
| slmjkdbtl wrote:
| Curious about the naming here. Terms like "pro", "pro max",
| "max" and "ultra" (hopefully there's no "pro ultra" or "ultra
| max" in the future) are very confusing, and it's hard to know
| which one is more powerful than which, or if it's even a
| power-level relationship. Is this on purpose, or is it just bad
| naming? Is there an example of good naming for this kind of
| situation?
| polyrand wrote:
| I think the GPU claims are interesting. According to the graph's
| footer, the M1 Ultra was compared to an RTX 3090. If the
| performance/wattage claims are correct, I'm wondering if the Mac
| Studio could become an "affordable" personal machine learning
| workstation (which also won't make the electricity bill
| skyrocket).
|
| If PyTorch becomes stable and easy to use on Apple Silicon
| [0][1], it could be an appealing choice.
|
| [0]: https://github.com/pytorch/pytorch/issues/47702#issuecomment...
| [1]: https://nod.ai/pytorch-m1-max-gpu/
| whoisburbansky wrote:
| A cursory look gives you a ~$3500 price tag for a gaming PC with
| a 3090 [1], vs. at least $4k for a Mac Studio with an M1 Ultra.
| Roughly the same ballpark, but I wouldn't call the M1 Ultra
| more affordable given those numbers.
|
| 1. https://techguided.com/best-rtx-3090-gaming-pc/#:~:text=With....
| BugsJustFindMe wrote:
| > _Cursory look gives you a ~$3500 price tag for a gaming PC
| with a 3090_
|
| That $3500 is for a DIY build. So, sure, you can always save
| on labor and hassle, but prebuilt 3090 rigs commonly cost
| over $4k. And if you don't want to buy from Amazon, because of
| their notorious history of mixing components from different
| suppliers and reselling used returns, oof, good luck even
| getting one.
| airstrike wrote:
| You mean I get to save _AND_ have fun building my own PC?
| _joel wrote:
| FSVO fun if you use Newegg
| unicornfinder wrote:
| Not to mention if you build your own PC you can upgrade
| the parts as and when, unlike with the new Mac, where
| you'll eventually just be replacing the whole thing.
| ricardobeat wrote:
| Since the context here is using these machines for work,
| a mid-level engineer will easily cost an extra $1000 in
| his own time to put that together :)
| sudosysgen wrote:
| Prebuilt 3090 builds can often be found for less than the
| cost of the corresponding parts.
| gjsman-1000 wrote:
| They are also absolutely massive and probably much more
| expensive long-term because of the massively increased
| electricity usage.
| kllrnohj wrote:
| Unless you're running a farm of these, the power cost
| difference is going to be largely unnoticeable. Even
| in a country with very expensive power, you're talking a
| ~$0.10 USD per hour premium to have a 3090 at full bore.
| And that's assuming the M1 Ultra manages to achieve the
| same performance as the 3090, which is going to be
| extremely workload-dependent going off of the existing M1
| GPU results.
| FridgeSeal wrote:
| Hahaha, good luck getting your hands on a 30xx-series card,
| though.
|
| Here in Australia, 3090s go for close to 3k on their own.
| dmz73 wrote:
| And the cheapest Mac Studio with M1 Ultra is A$6000, so yes....
|
|     20-Core CPU
|     48-Core GPU
|     32-Core Neural Engine
|     64GB unified memory
|     1TB SSD storage
|     Front: Two Thunderbolt 4 ports, one SDXC card slot
|     Back: Four Thunderbolt 4 ports, two USB-A ports, one HDMI
|     port, one 10Gb Ethernet port, one 3.5-mm headphone jack
|
| A$6,099.00
| sorry_outta_gas wrote:
| We've been buying tons of 3090s at work for about $1.6k-2k
| USD without too much trouble
| nightfly wrote:
| > tons
| BitwiseFool wrote:
| You better not be working at a mining facility.... /s?
| alasdair_ wrote:
| Note the label on the y-axis. "Relative performance" from
| "0-200" seems like marketing bullshit to me.
|
| "M1 Ultra has a 64-core GPU, delivering faster performance than
| the highest-end PC GPU available, while using 200 fewer watts
| of power."
|
| Note that they say "faster performance", not "more performance".
| What does "faster" mean? Who knows!
| savant_penguin wrote:
| And hopefully not make you deaf with their buzzing fans
| kllrnohj wrote:
| The GPU claims on the M1 Pro & Max were, let's say,
| cherry-picked, to put it nicely. The M1 Ultra claims already
| look suspicious, since the GPU graph tops out at ~120W & the
| CPU graph tops out at ~60W, yet the Mac Studio is rated for
| 370W continuous power draw.
|
| Since you mention ML specifically, looking at some benchmarks
| out there (like https://tlkh.dev/benchmarking-the-apple-m1-max#heading-gpu
| & https://wandb.ai/tcapelle/apple_m1_pro/reports/Deep-Learning...),
| even if the M1 Ultra is 2x the performance of the M1 Max (so
| perfect scaling), it would still be _far_ behind the 3090. Like
| completely-different-ballpark behind. But of course there is
| that price & power gap, and the primary strength of the M1
| GPUs really seems to be the very large effective VRAM amount.
| So if your working set doesn't fit in an RTX GPU of
| your desired budget, then the M1 is a good option. If, however,
| you're not VRAM-limited, then Nvidia still offers far more
| performance.
|
| Well, assuming you can actually buy any of these, anyway. The
| M1 Ultra might win "by default" by simply being purchasable at
| all, unlike pretty much every other GPU :/
| brigade wrote:
| 100W of that is probably for the USB ports; afaik TB4 ports
| are required to support 15W, and I don't think there's been a
| Mac that didn't support full power simultaneously across all
| ports. (if that's even allowed?)
| joshspankit wrote:
| Watching the keynote, I was almost thinking that Nvidia missed
| the boat when they chose not to sign whatever they had to in
| order to make OSX drivers.
|
| Thank you for recalibrating me to actual reality and not
| Apple Reality (tm)
| Macha wrote:
| An M1 Ultra is $2000 incrementally over an M1 Max, so there is
| no price gap, even with the inflated prices 3090s actually go
| for today.
| apohn wrote:
| The 3090 also can do fp16 and the M1 series only supports
| fp32, so the M1 series of chips basically needs more RAM for
| the same batch sizes. So it isn't an oranges-to-oranges
| comparison.
|
| Back when that M1 Max vs 3090 blog post was released, I ran
| those same tests on the M1 Pro (16GB), Google Colab Pro, and
| the free GPUs (RTX4000, RTX5000) on the Paperspace Pro plan.
|
| To make a long story short, I don't think buying any M1 chip
| makes sense if your primary purpose is Deep Learning. If you
| are just learning or playing around with DL, Colab Pro and
| the M1 Max provide similar performance. But Colab Pro is
| ~$10/month, and upgrading any laptop to the M1 Max is at least
| $600.
|
| The "free" RTX5000 on Paperspace Pro (~$8/month) is much
| faster (especially with fp16 and XLA) than the M1 Max and Colab
| Pro, albeit the RTX5000 isn't always available. The free
| RTX4000 is also faster than the M1 Max, albeit you need to use
| smaller batch sizes due to 8GB of VRAM.
|
| If you assume that the M1 Ultra doubles the performance of the
| M1 Max, in similar fashion to how the M1 Max seems to double
| the GPU performance of the M1 Pro, it still doesn't make sense
| from a cost perspective. If you are a serious DL practitioner,
| putting that money towards cloud resources or a 3090 makes a
| lot more sense than buying the M1 Ultra.
| Koshkin wrote:
| For some definitions of "affordable."
| forgotmyoldacc wrote:
| Neural Engine cores are not accessible to third-party
| developers, so it'll be severely constrained for practical
| purposes. Currently the M1 Max is no match for even a
| last-generation mid-tier Nvidia GPU.
| viktorcode wrote:
| They are accessible to third-party developers, only they have
| to use CoreML.
| komuher wrote:
| xD
| LegitShady wrote:
| I always take such claims with a grain of salt anyway. It's
| usually based on one specific benchmark.
I always wait for better benchmarks instead of trusting the marketing.
| pathartl wrote:
| Even if their claims are accurate, it usually has the asterisk
| of *Only with Apple Metal 2. I honestly cannot understand why
| Apple decided they needed to write their own graphics API when
| the rest of the world is working hard to get away from the
| biggest proprietary graphics API.
| moralestapia wrote:
| RIP Intel
| MangoCoffee wrote:
| Not yet. Intel Alder Lake has mostly positive reviews.
| cube2222 wrote:
| Looks like all the people saying "just start fusing those M1
| CPUs into bigger ones" were right; that's basically what they
| did here (fused two M1 Maxes together).
|
| And since the presenter mentioned the Mac Pro would come on
| another day, I wonder if they'll just do 4x M1 Max for that.
| iSnow wrote:
| > I wonder if they'll just do 4x M1 Max for that.
|
| They'll be running out of names for that thing. M1 Ultra II
| would be lame, so M1 Extreme? M1 Steve?
| BurningFrog wrote:
| "M1 More" would show Apple is fun again!
| tiernano wrote:
| M1 Max Pro... :P
| ceejayoz wrote:
| "iPhone 14 Pro Max, powered by the M1 Max Pro".
| bee_rider wrote:
| It seems kind of strange to have the "A" line go from
| smartphones to... iPads, and then have the "M" line go all
| the way from thin-and-lights to proper workstations. Maybe they
| need a new letter. Call it the C1 -- "C" for compute, but
| also for Cupertino.
| gordon_freeman wrote:
| M1 Hyper or M1 Ludicrous :)
| jazzyjackson wrote:
| M1 Houndstooth
| bacro wrote:
| M1 God
| rootusrootus wrote:
| > M1 Steve
|
| That would be the funniest thing Apple has done in years. I
| totally support the idea.
| Isamu wrote:
| Pro < Max < Ultra < Ne Plus Ultra < Steve
| jhgb wrote:
| And Steve < Woz, perhaps?
| sdenton4 wrote:
| Just need to increment the 1 instead... Eventually moving
| into using letters instead of numbers, until we end up with
| the MK-ULTRA chip.
| Roboprog wrote:
| I suspect they will have a different naming convention
| after they get to M4.
|
| There might be some hesitance about installing an M5. You
| should stay out of the way if the machine learning core needs
| more power.
|
| I guess by the time they get to M5, anyone old enough to
| get the reference will have retired.
| jckahn wrote:
| That would really be a trip!
| NoSorryCannot wrote:
| M1 Magnum XL
| ceva wrote:
| Epic M1 fit good
| concinds wrote:
| I like "X1" way more than "M2 Extreme".
| stretchwithme wrote:
| I like M1 Steve, as it can honor two people.
| rpmisms wrote:
| Steve would be awesome, but a deal with Tesla to use "Plaid"
| would be perfection.
| KerrAvon wrote:
| I would think they could just go to Mel Brooks instead of
| dealing with Tesla.
| TylerE wrote:
| Never happen.
|
| Elon is somewhat toxic these days...
| bobsil1 wrote:
| M1 Plaid
| bacro wrote:
| M1 Greta in 2030 (when it is "carbon neutral")
| randomdata wrote:
| iM1 Pro.
| tsuru wrote:
| I'm pretty sure all their messaging is preparing us for "M1
| Outrageous"
| chaosharmonic wrote:
| Super M1 Turbo HD Remix
| theyeenzbeanz wrote:
| M1 Ludicrous the IV
| gonzo wrote:
| Maximum Plaid
| mhb wrote:
| Just get the guys who came up with the new name for the
| iPhone SE working on it. Oh, wait.
| MangoCoffee wrote:
| It's a chiplet design.
Whenever people ask what we're going to do after 1nm... well,
| we can combine two chips into one.
| ksec wrote:
| > Looks like all the people saying "just start fusing those M1
| CPU's into bigger ones" were right,
|
| Well, they were only correct because Apple managed to hide a
| whole section of the die image. (Which is actually genius.)
| Otherwise it wouldn't have made any sense.
|
| Likely to be using CoWoS from TSMC [1], since the bandwidth
| numbers fit. But this needs further confirmation.
|
| [1] https://en.wikichip.org/wiki/tsmc/cowos
| kodah wrote:
| I've been using a Vega-M for some time, which I think follows
| this model. It's really great.
| kzrdude wrote:
| Throwing more silicon at it, like this, sounds extremely
| expensive or price-inefficient.
|
| It's at least two separate chips combined together. That makes
| more sense, and mitigates the problem.
| 2OEH8eoCRo0 wrote:
| Right. Still riding gains from the node shrink and on-package
| memory.
|
| Could AMD/Intel follow suit and package memory as an additional
| layer of cache? I worry that we are being dazzled by the
| performance at the cost of more integration and less freedom.
| scns wrote:
| The next CPU coming from AMD will be the 5800X3D with 96MB of
| cache. They stack 64MB of L3 on top. Rumours say it comes out
| the 20th of April.
|
| edit: typo + stacking + rumoured date
| zitterbewegung wrote:
| They might have to make the unified memory more dense to get to
| the 1.5 TB max of RAM on the machine (also since this would be
| originally shared with a GPU). Maybe they could stack the RAM
| on the SoC or just get the RAM at a lower process node.
| paulmd wrote:
| The M1 Max/Ultra is already an extremely dense design for that
| approach; it's really almost as dense as you can make it.
| There's packages stacked on top, and around, etc. I guess you
| could put more memory on the backside but that's not going to
| do more than double it, assuming it even has the pinout for
| that (let's say you could run it in clamshell mode like GDDR -
| no idea if that's actually possible, but just hypothetically).
|
| The thing is they're at 128GB, which is way, way far from
| 1.5TB. You're not going to find a way to get 12x the memory
| while still doing the embedded memory packages.
|
| Maybe I'll be pleasantly surprised, but it seems like they're
| either going to switch to (R/LR)DIMMs for the Mac Pro or else
| it's going to be a "down" generation. And to be fair that's
| fine; they'll be making Intel Mac Pros for a while longer
| (just like with the other product segments). They don't have
| to have _every single_ metric be better; they can put out
| something that only does 256GB or 512GB or whatever and that
| would be fine for a lot of people.
| my123 wrote:
| > You're not going to find a way to get 12x the memory
| while still doing the embedded memory packages.
|
| https://www.anandtech.com/show/17058/samsung-announces-lpddr...
|
| > It's also possible to allow for 64GB memory modules of a
| single package, which would correspond to 32 dies.
|
| It is possible, and I guess that NVIDIA's Grace server CPU
| will use those massive-capacity LPDDR5X modules too.
|
| The M1 Ultra has 8 memory packages today, and Apple could
| also use 32-bit-wide ones (instead of 64-bit) if they want
| more chips.
| stjohnswarts wrote:
| You don't just "fuse" two chips together willy-nilly. That was
| designed into the architecture from the beginning for future
| implementation.
| wilg wrote:
| https://hypercritical.co/2021/05/21/images/city-of-chiplets....
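Quick capacity arithmetic for the memory discussion above. The package
counts come from the thread (8 memory packages on the Ultra; 64GB
LPDDR5X parts announced by Samsung), not from an Apple roadmap:

    packages = 8
    today_gb = 16                   # 8 x 16 GB packages = the 128 GB ceiling
    announced_gb = 64               # Samsung's announced high-density LPDDR5X

    print(packages * today_gb)      # 128 GB, the current M1 Ultra max
    print(packages * announced_gb)  # 512 GB -- still well short of the
                                    # Intel Mac Pro's 1.5 TB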
| kasperni wrote:
| > I wonder if they'll just do 4x M1 Max for that.
|
| Unlikely; the M1 Ultra is the last chip in the M1 family
| according to Apple [1]:
|
| "M1 Ultra completes the M1 family as the world's most powerful
| and capable chip for a personal computer."
|
| [1] https://www.apple.com/newsroom/2022/03/apple-unveils-m1-ultr...
| johnmaguire wrote:
| In a previous Apple press release[1] they said:
|
| > The Mac is now one year into its two-year transition to
| Apple silicon, and M1 Pro and M1 Max represent another huge
| step forward. These are the most powerful and capable chips
| Apple has ever created, and together with M1, they form a
| family of chips that lead the industry in performance, custom
| technologies, and power efficiency.
|
| I think it is just as likely that they mean "completes the
| family [as it stands today]" as they do "completes the family
| [permanently]."
|
| [1] https://www.apple.com/newsroom/2021/10/introducing-m1-pro-an...
|
| edit: This comment around SoC code names is worth a look too:
| https://news.ycombinator.com/item?id=30605713
| paulmd wrote:
| That doesn't necessarily rule out more powerful iterations
| that also launch under the M1 Ultra branding, though.
|
| (edit: per a sibling comment, if the internals like the IRQ
| controller only really scale to 2 chiplets, that pretty much
| would rule it out though.)
| Aaargh20318 wrote:
| Probably not on the same design as the current M1 series, at
| least not for the Mac Pro. The current x86 Mac Pro supports
| up to 1.5TB of RAM. I don't think they will be able to
| match that using a SoC with integrated RAM. There will
| probably be a different CPU design for the Pro with an
| external memory bus.
| nicoburns wrote:
| They also said that the Mac Pro is still yet to transition.
| So they'll have to come up with something for that. My
| suspicion is that it won't be M-branded. Perhaps P1 for Pro?
| MuffinFlavored wrote:
| What is the M2 really going to be, difference-wise?
| 1123581321 wrote:
| ~15-20% faster if releases start this year, plus whatever
| optimizations were learned from M1 in wide release, such as
| perhaps tuning the silicon allocation given the various
| systems. If next year, M2 or M3 (get it) will use Taiwan
| Semi's so-called 3nm, which should be a significant jump,
| just like 7 to 5nm several years ago for the phones and iPads.
| masklinn wrote:
| Hopefully one of the changes of the M2 design will be a
| better decorrelation of RAM and core count.
|
| They'd need that anyway for a Mac Pro replacement (128GB
| wouldn't cut it for everyone), but even for smaller
| configs it's frustrating being limited to 16GB on the M1
| and 32 on the Pro. Just because I need more RAM doesn't
| mean I want the extra size and heat or whatever.
| bouncing wrote:
| For my purposes, the biggest drawback of using an SoC is
| being constrained to just the unified memory.
|
| Since I run a lot of memory-intensive tasks but few CPU-
| or GPU-bound tasks, a regular M1 with way more memory
| would be ideal.
| bpye wrote:
| I doubt there will be much learned after actually
| shipping M1. Developing silicon takes a long time. I
| wouldn't be surprised if the design was more or less
| fixed by the time the M1 released.
| marcan_42 wrote:
| I've been saying 4x M1 Max is not a thing and never will be a
| thing ever since the week I got my M1 Max and saw that the
| IRQ controller was only instantiated to support 2 dies, but
| everyone kept parroting that nonsense the Bloomberg reporter
| said about a 4-die version regardless...
|
| Turns out I was right.
|
| The Mac Pro chip will be a different thing/die.
| bee_rider wrote:
| Could they do a multi-socket board for the Mac Pro?
| pathartl wrote:
| They would never do that
| restlake wrote:
| They have done this previously for dual-socket Xeons.
| Historical precedent doesn't necessarily hold here, but
| in fact it's been done on the "cheese graters" before.
| snowwrestler wrote:
| Plus they are running out of M1 superlatives. They'll have
| to go to M2 to avoid launching M1 Plaid.
| aneutron wrote:
| Bloomberg brought the Supermicro hit pieces. I personally
| can't take them seriously anymore. Not after the second
| article with zero fact-checking and a sad attempt at an
| irrelevant die shot. And their word is certainly irrelevant
| against one of the people who are working (and succeeding) at
| running Linux on M1.
| wdurden wrote:
| Ahhh, reminiscent of the G4 Desktop Supercomputer...
|
| https://www.deseret.com/1999/9/1/19463524/apple-unveils-g4-d...
|
| I kinda believe 'em this time, but time will tell.
| ur-whale wrote:
| Where be the Linux distro that can run on an M1 (Ultra or
| otherwise)?
|
| Without having to be a kernel hacker, that is.
| neogodless wrote:
| This is the one that has that as a core goal:
|
| https://asahilinux.org/
| jedberg wrote:
| Given how low the power consumption is for the power you get, I
| wonder if we'll see a new push for Mac servers. In an age where
| reducing power consumption in the datacenter is an advantage, it
| seems like it would make a lot of sense.
| nonameiguess wrote:
| > M1 Ultra features an extraordinarily powerful 20-core CPU with
| 16 high-performance cores and four high-efficiency cores. It
| delivers 90 percent higher multi-threaded performance than the
| fastest available 16-core PC desktop chip in the same power
| envelope.
|
| Maybe not a _huge_ caveat, as 16-core chips in the same power
| envelope probably cover most of what an average PC user is going
| to have, but there are 64-core Threadrippers out there available
| for a PC (putting aside that it's entirely possible to put a
| server motherboard, and thus a server chip, in a desktop PC
| case).
| hoistbypetard wrote:
| Is that Threadripper in anything like the "same power
| envelope"?
| ollien wrote:
| If I'm reading the graph in the press release right, the M1
| Ultra will have a TDP of 60W, right? A 3990X has a TDP of 280W.
| I know TDP != power draw, and that everyone calculates TDP
| differently, but looking purely at orders of magnitude, no,
| it's not even close.
| [deleted]
| eloff wrote:
| "In the same power envelope" is a pretty big caveat. Desktop
| chips aren't very optimized for power consumption.
|
| I'd like to see the actual performance comparison.
| adfgadfgaery wrote:
| That line is blatantly dishonest, but not for the reasons you
| pointed out. While the i9-12900K is a 16-core processor, it
| uses Intel's version of big.LITTLE. Eight of its 16 cores are
| relatively low-performance 'E' cores. This means it has only
| half the performance cores of the M1 Ultra, yet it achieves 3/4
| of the performance by Apple's own graphic.
|
| Alder Lake has been repeatedly shown to outperform M1 core-for-
| core. The M1 Ultra is just way bigger. (And way more power
| efficient, which is a tremendous achievement for laptops but
| irrelevant for desktops.)
| j_d_b wrote:
| The M1 is the most powerful chip ever, yet it still can't
| handle two monitors.
| neogodless wrote:
| This is not relevant to the Apple M1 Ultra.
|
| From the Mac Studio technical specifications:
|
| > Simultaneously supports up to five displays:
|
| > Support for up to four Pro Display XDRs (6K resolution at
| 60Hz and over a billion colors) over USB-C and one 4K display
| (4K resolution at 60Hz and over a billion colors) over HDMI
___________________________________________________________________
(page generated 2022-03-08 23:00 UTC)