[HN Gopher] Modern CPUs have a backstage cast ___________________________________________________________________ Modern CPUs have a backstage cast Author : hlandau Score : 122 points Date : 2023-05-30 17:16 UTC (5 hours ago) (HTM) web link (www.devever.net) (TXT) w3m dump (www.devever.net) | chasil wrote: | "...this is interesting is because POWER9 is basically the first | time the public got a real view of how sophisticated the | backstage cast actually is of a modern server CPU." | | Not quite correct; the OpenSPARC T1 and T2 were publicly released | and available by 2008. | | https://www.oracle.com/servers/technologies/opensparc.html | | "Large parts of this process are handled by vendor-supplied | mystery firmware blobs, which may as well be boxes with "???" | written in them." | | The maintainers of the me_cleaner script likely have the clearest | view of what is known. | | https://github.com/corna/me_cleaner | hlandau wrote: | >Not quite correct; the OpenSPARC T1 and T2 were publicly | released and available by 2008. | | Points for mentioning this! But things have come a long way | since 2008. You can get Intel ME-less machines from the 2008 | era. Not sure if OpenSPARC T2 has any management cores. | | >The maintainers of the me_cleaner script likely have the | clearest view of what is known. | | Yep, absolutely. Much of what we know is thanks to the efforts | of researchers like these. See also the talks on finding the | 'Red Unlock' mode of modern Intel CPUs. | jakeogh wrote: | Microcode access on Atom | https://www.youtube.com/watch?v=5Pq1FmxS6H8 | kccqzy wrote: | > It's responsible for initialising the chip and getting it out | of bed enough to the point where at least one of the main cores | can run using cache-as-RAM mode | | The somewhat surprising but true implication is that on boot, the | CPU is initialized before the RAM is initialized.
So there is a | window of time during boot when the main core on the CPU is | running instructions that cannot access the RAM. Even on | register-starved x86 it is possible to write code without using | RAM, but it certainly seems more convenient to me to treat the | cache as RAM. | | Documentation for a special compiler that compiles to code that | doesn't use RAM: | https://github.com/wt/coreboot/blob/master/util/romcc/romcc.... | notacoward wrote: | I got some exposure to this at SiCortex, where we had our own | MIPS-based processors and so had to do many of these things in | software. There was one ColdFire (embedded 68K) processor | running uClinux per board, plus 27 of our own. This "Module | Service Processor" would boot first, fetch a boot image from | the one-per-system "System Service Processor" (pretty generic | PC), load that _via JTAG_ into each node's cache, then finally | set each one loose to do things like memory registration and | interconnect setup. This all set the stage for the actual Linux | boot, which itself involved two stages with a switch_root in | between. My very first assignment was to work on some of that | MSP-to-node stuff, then later I had to dive into memory | registration at least twice, even though both were pretty far | from my real specialty. Small company, y'know. | | This kind of low-level work is significantly more complicated | than even most kernel developers realize - hence the need for | articles like the OP's. Ditto for anything on large (more than | single-board) systems. The intersection of the two was, | frankly, a bit exhausting. Just keeping track of all the moving | parts and their respective states induced a cognitive load that | made debugging other already-hard problems that much more | difficult. My hat's off to anyone who has kept on doing that | stuff longer than I did, or who has to do it in an environment | where vendors are keeping so many secrets.
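To make the cache-as-RAM point in this subthread concrete, here is a toy Python model. It is a sketch under stated assumptions, not how real firmware is written: actual x86 firmware typically sets CAR up by programming cache-control registers (such as MTRRs) from assembly or register-only C, which is why tools like romcc exist. The window base and size (CAR_BASE, CAR_SIZE) and the ToyMachine and DramNotReady names below are all hypothetical. What the model does show is the ordering kccqzy describes: before memory init only a small cache-backed window is usable, and once DRAM is up the early data is copied out and the window is retired, a common pattern in firmware such as coreboot, though details vary by platform.

```python
"""Toy model of cache-as-RAM (CAR) during early boot.

Hypothetical setup for illustration only: one address window
[CAR_BASE, CAR_BASE + CAR_SIZE) is backed purely by cache lines, and every
other address faults until the (simulated) memory controller is initialised.
"""

CAR_BASE = 0x0008_0000   # hypothetical window base address
CAR_SIZE = 256 * 1024    # hypothetical: 256 KiB of cache used as scratch RAM


class DramNotReady(Exception):
    """Raised when code touches an address outside the CAR window before DRAM is up."""


class ToyMachine:
    def __init__(self):
        self.car = {}            # address -> value; lives only "in the cache"
        self.dram = {}           # address -> value; usable only after init_dram()
        self.dram_ready = False

    def _in_car(self, addr):
        return CAR_BASE <= addr < CAR_BASE + CAR_SIZE

    def write(self, addr, value):
        if self.dram_ready:
            self.dram[addr] = value
        elif self._in_car(addr):
            self.car[addr] = value
        else:
            raise DramNotReady(hex(addr))

    def read(self, addr):
        if self.dram_ready:
            return self.dram.get(addr, 0)
        if self._in_car(addr):
            return self.car.get(addr, 0)
        raise DramNotReady(hex(addr))

    def init_dram(self):
        """Stand-in for memory training: DRAM comes up, then the early
        stack/data is copied out of the cache and the CAR window is torn down."""
        self.dram_ready = True
        self.dram.update(self.car)
        self.car.clear()


if __name__ == "__main__":
    m = ToyMachine()
    m.write(CAR_BASE + 0x10, 0x42)       # early code: only the CAR window works
    try:
        m.write(0x1000_0000, 1)          # outside the window, DRAM not trained yet
    except DramNotReady as e:
        print("fault before memory init:", e)
    m.init_dram()
    print(hex(m.read(CAR_BASE + 0x10)))  # early data survived the migration
```

Running it reports a fault for the out-of-window write and then reads the early value back from DRAM. Modelling the window as a dict keeps the sketch short; a real CAR region is a fixed physical range whose cache lines must not be evicted, since there is no memory behind them to write back to.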
| lgg wrote: | This stuff is certainly pretty rarefied these days. I | remember when the PPC970 came out people were shocked at how | difficult it was to bootstrap. IBM didn't really care as | POWER4 (from which it was derived) was not a merchant chip | and they had management processors (and very high margins) in | all their machines to handle it. Apple was the launch partner | and even back then had a lot of in-house expertise doing this | sort of work. Everyone else who tried to use it was in for | some real pain and most of them gave up. The guys doing the | eval boards with support from IBM literally posted this: | https://web.archive.org/web/20060715134515/http://www.970eva... | | TL;DR, the last line is "Once all of the above is completed, | the processor will be able to successfully fetch instructions | from a boot source. You are now effectively at the same point | you would have been 5 months ago, had this been a standard | 750 bringup... Board bringup from this point should be very | straightforward and follow established methods." | notacoward wrote: | That's an amazing document. Practically every sentence, | though tersely stated, hints at hours (or worse) of | experimentation and head-scratching. The "would have been 5 | months ago" bit at the end is remarkably restrained. I'm | certain I would have quit (or worse) by that point. Respect | and condolences to whoever did this. | jacquesm wrote: | I had to do some of this while bringing up a 486 to run my | own kernel. Very frustrating, to the point that I had the | reset switch of the machine wired to a sustain pedal just so | that I didn't have to dive under the desk all the time. | intelVISA wrote: | CAR is a gem. It's great for lite OSes in hostile envs. | derefr wrote: | Funnily enough, a modern CPU doing CAR still has more memory | than a PC from the 1980s. Presuming you statically recompiled | them, you could run entire SNES games from a modern CPU's | cache!
| | (And that being said, now I'm wondering whether you could | force eviction and retention into L3 cache on demand, to | achieve something like memory bank switching...) | JonathonW wrote: | SNES games? You could comfortably run Windows 95 within the | L3 cache on many recent Intel processors (the one in my | 2019-era MBP has 16 MB of L3 onboard; current-generation | processors go even bigger and Windows 95 only needs 4 MB). | | It's not really clear to me from the limited bits of info | that I've read whether or not L3 is guaranteed to be | accessible when doing CAR, but, if it is, you've got enough | memory available to do a lot of stuff. (And even the L2 | cache is starting to get pretty big on the higher-end | current-gen chips.) | derefr wrote: | Well, keep in mind that in the sort of state the computer | is in when doing CAR, you don't get to talk to storage | devices; nor do you get the benefit of having some kind | of ROM on the bus. I know Windows 95 is happy to run from | 4MB RAM _with access to a hard drive_; but how much | memory would W95 need for a "bootable live-CD | environment" where the disk image must be resident (if | compressed) in memory along with all work RAM? | | (This is why I compared to the SNES: if you have to map | the SNES's RAM _and_ [every bank of] the game's ROM, | then you're looking at 4-16MB depending on the game. The | SNES is pretty much the newest console whose games would | entirely fit, I think.) | dfox wrote: | I think that the only thing that prevents you from | ignoring the memory controller and initializing the rest | of the x86 board while still remaining in CAR mode is the | sheer ridiculousness of doing that.
As for whether you | have a memory-mapped ROM available I'm not exactly sure, | but the high-level model of what x86 firmware does seems | to imply that the hardware maps a part of the SPI flash at | the address range where there was a ROM chip on the | 8086/286/386 PCs (the actual address ranges are | different). | justsomehnguy wrote: | About 35 MB, if memory serves me right. Win98 could be stripped to around 50 MB | without a loss of functionality. | | NB L3 is unified most of the time, but with L2 you | still need to distinguish between data/code. | Arrath wrote: | Man, now I want to see this. | hlandau wrote: | In fact, the largest POWER9 CPUs have up to 110MB of | L3... and Zen 4's L3 apparently maxes out at 384MB(!!). | dist-epoch wrote: | A modern motherboard can update its BIOS from a USB stick | WITHOUT a CPU or memory installed. | | Think about that. The motherboard "knows" how to read a FAT file | system from a USB mass storage device, verify its digital | signature and flash it with no main CPU or memory. | wmf wrote: | I assume this is done with a microcontroller on the board. | sebazzz wrote: | And also born out of necessity, given that many Intel and AMD | boards can't be booted with a too-new CPU if the BIOS doesn't | know about it - not even for flashing a new BIOS - so you | needed to borrow an old CPU just for the sake of upgrading | the BIOS. | ls612 wrote: | It was originally to solve the issue where if you lost | power while flashing your BIOS you'd brick the system | irrevocably. Now even if the BIOS is corrupt and the system | won't boot you can reflash a known-good firmware with stock | settings to get back up and running. | chasil wrote: | The ARC processor was formerly in the northbridge of the | chipset. | | Intel has since replaced this with an 80486 in modern | designs; perhaps it is also implemented in the northbridge.
| | https://en.wikipedia.org/wiki/ARC_(processor) | wmf wrote: | I think you're talking about the ME but I don't think the | ME is responsible for "BIOS" flashing. I think it must be a | separate microcontroller. This is kind of the point of the | original blog post: don't go looking for "the | microcontroller" because there isn't just one; there are | many. | Simplicitas wrote: | Doesn't mention the special core reserved for the NSA and other | national security agencies :-) | travisgriggs wrote: | I miss these kinds of articles on the net. Is anyone else | reminded of the CPU Praxis articles that were part of Ars | Technica's early rise to popularity? I really miss those. This | article is, of course, much shorter, but still, I miss that sort | of content on the internet. | JdeBP wrote: | As the author of https://superuser.com/a/347115/38062 and | https://superuser.com/a/345333/38062, you have my sympathy about | the "pack of lies" involving real mode and several wrong | combinations of selector and offset. | JdeBP wrote: | It's also worth adding that none of this is new. There's always | been a reason that the "C" in "CPU" has stood for "central". | The idea that there are other, non-central, processors around | the place goes back a long time. | | Four particular ones come to mind: | | * The DPT range of SCSI host bus adapter cards, many years ago, | had a full-blown MC680x0 processor on the card. | | * Connor Krukosky famously installed a mainframe in his | basement whose console front-end processor was a PC | running OS/2. | | * PC/AT keyboards had on-board microcontrollers running | programs. | | * And of course who can forget the BBC Micro's Tube? | | It's the short period in history where people thought that | computers came with only one processor that is the real oddity. | (-: | jacquesm wrote: | When a Tube coprocessor was connected, it served as the CPU; | otherwise the CPU was the one in the BBC | Micro itself.
With the Tube CPUs connected (68K, Z80, 65C02, | 32016 and more) the BBC processor served as the I/O processor. | | The elegant and well-adhered-to OS calls made this a | straightforward process: if your program ran on the BBC | standalone, it would work across the Tube for the 65(C)02, but | for other coprocessors you had to, at a minimum, recompile and | probably rewrite quite a bit of your code. | | https://sites.google.com/site/jamesskingdom/Home/computers-e... | | In a typical PC there are more than 10 actual processors in the | various peripheral and controller chips, and then there is | the management engine (a full-blown computer in its own | right) or equivalent, and almost every peripheral will usually | have one or more processors as well. | hlandau wrote: | Had to reverse-engineer a real mode PCI option ROM once... that | was extremely unpleasant [1]. And then of course there's | "Unreal Mode". | | Moreover Intel is just this week actually finally proposing | removing real mode. [2] I'm a bit worried for what this means | for emulation of old 16-bit Windows and DOS software under Wine | (one of the great ironies is that Wine can still run Win16 | programs on an x64 host OS when Windows can't) - though I | suspect the performance requirements of such software are so low | by modern standards that emulating such programs wouldn't pose | any challenge. | | [1] https://www.devever.net/~hl/ortega [2] | https://www.phoronix.com/news/Intel-X86-S-64-bit-Only | JdeBP wrote: | See https://news.ycombinator.com/item?id=36074093 for a more | significant worry. Emulating a CPU is not affected as much as | code that would otherwise have still run on the bare | hardware. | shrubble wrote: | A while ago I bought some older AMD 8350 systems, which | apparently are the last without a PSP, the platform security | processor. | | I did this as a sort of 'just in case' setup, and was planning to put | OpenSolaris on it and run things under Zones or LX zones and to | run it as a backup server.
Fast enough to get some work done and | possibly more secure if the PSP is ever used/broken | maliciously... | jacquesm wrote: | That may well end up being a very prescient move. Be prepared | to be labeled a tinfoil-hat type until then, but I definitely | think you are wise to take the precaution. | buildbot wrote: | "Turtles all the way down": modern CPUs are so complex you need | simpler ones to abstract them! Very cool breakdown of how POWER9 | does this. | giuliomagnifico wrote: | I understood nothing (as a sysadmin) but this looks like a very | interesting article for those who can understand it. | bicolao wrote: | I think you can see a modern CPU as a network. There are some | beefy servers doing all the heavy lifting, which is what | outsiders see. But there are also a few smaller servers here and | there monitoring the system (or even responsible for powering | on the entire network). | hlandau wrote: | Author here. This is very much the case for a computer system | as a whole also. Basically a network of cooperating | microprocessors, including in I/O peripherals etc. | | PCIe in particular is literally a packet-switched computer | network - it has a physical layer, data link layer, and a | transaction layer which is basically packet switched. There | are even proprietary solutions for tunnelling PCIe over | Ethernet. | di4na wrote: | And you have smaller ones that basically PXE-boot the bigger | ones and manage the power, cooling, etc. It is datacenters | all the way down. | | As someone who used to do embedded work, there is a reason I | felt most at home in Erlang and Elixir. | | Their share-nothing, message-passing processes | were really close to how it looks to build and code for an | embedded platform. | p_l wrote: | To make it even funnier - Digital's last Alpha CPU, EV7, | which was essentially the ancestor of AMD K8 (which finally | brought "mesh" networking to mainstream PCs), actually had an | IP-based internal management network!
| | Each EV7 computer had, instead of a normal BMC, a bigger | management node connected to a 10 Mbit Ethernet hub (twisted-pair | Ethernet, fortunately :P), and this network was then | connected to things like I/O boards, power control, system | boards... including to each individual EV7 CPU. Each component so | connected had a small CPU with Ethernet that was | responsible for interfacing its specific component to the | network, and when the system booted, part of the process involved | prodding the CPUs over Ethernet to put them into the | appropriate halt state from which they could start booting. | wmf wrote: | Much of the openness of POWER7/8/9 was _encouraged_ by Google, which | wanted to have control over all the firmware, even the secret | firmware. I think Google is also auditing PSP/ME source code but | the public only sees the audit results. ___________________________________________________________________ (page generated 2023-05-30 23:00 UTC)