[HN Gopher] Modern CPUs have a backstage cast
       ___________________________________________________________________
        
       Modern CPUs have a backstage cast
        
       Author : hlandau
       Score  : 122 points
       Date   : 2023-05-30 17:16 UTC (5 hours ago)
        
 (HTM) web link (www.devever.net)
 (TXT) w3m dump (www.devever.net)
        
       | chasil wrote:
       | "...this is interesting is because POWER9 is basically the first
       | time the public got a real view of how sophisticated the
       | backstage cast actually is of a modern server CPU."
       | 
       | Not quite correct; the OpenSPARC T1 and T2 were publicly released
       | and available by 2008.
       | 
       | https://www.oracle.com/servers/technologies/opensparc.html
       | 
       | "Large parts of this process are handled by vendor-supplied
       | mystery firmware blobs, which may as well be boxes with "???"
       | written in them."
       | 
       | The maintainers of the me_cleaner script likely have the clearest
       | view of what is known.
       | 
       | https://github.com/corna/me_cleaner
        
         | hlandau wrote:
         | >Not quite correct; the OpenSPARC T1 and T2 were publicly
         | released and available by 2008.
         | 
         | Points for mentioning this! But things have come a long way
         | since 2008. You can get Intel ME-less machines from the 2008
         | era. Not sure if OpenSPARC T2 has any management cores.
         | 
         | >The maintainers of the me_cleaner script likely have the
         | clearest view of what is known.
         | 
         | Yep, absolutely. Much of what we know is thanks to the efforts
         | of researchers like these. See also the talks on finding the
         | 'Red Unlock' mode of modern Intel CPUs.
        
           | jakeogh wrote:
           | Microcode access on Atom
           | https://www.youtube.com/watch?v=5Pq1FmxS6H8
        
       | kccqzy wrote:
       | > It's responsible for initialising the chip and getting it out
       | of bed enough to the point where at least one of the main cores
       | can run using cache-as-RAM mode
       | 
       | The somewhat surprising but true implication is that on boot, the
       | CPU is initialized before the RAM is initialized. So there is a
       | window of time during boot when the main core on the CPU is
       | running instructions that cannot access the RAM. Even on
       | register-starved x86 it is possible to write code without using
       | RAM, but it certainly seems more convenient to me to treat the
       | cache as RAM.
       | 
       | Documentation for a special compiler that compiles to code that
       | doesn't use RAM:
       | https://github.com/wt/coreboot/blob/master/util/romcc/romcc....
        
         | notacoward wrote:
         | I got some exposure to this at SiCortex, where we had our own
         | MIPS-based processors and so had to do many of these things in
         | software. There was one ColdFire (embedded 68K) processor
         | running uClinux per board, plus 27 of our own. This "Module
         | Service Processor" would boot first, fetch a boot image from
         | the one-per-system "System Service Processor" (pretty generic
         | PC), load that _via JTAG_ into each node's cache, then finally
         | set each one loose to do things like memory registration and
         | interconnect setup. This all set the stage for the actual Linux
         | boot, which itself involved two stages with a switch_root in
         | between. My very first assignment was to work on some of that
         | MSP-to-node stuff, then later I had to dive into memory
         | registration at least twice, even though both were pretty far
         | from my real specialty. Small company, y'know.
         | 
         | This kind of low-level work is significantly more complicated
         | than even most kernel developers realize - hence the need for
         | articles like OP. Ditto for anything on large (more than
         | single-board) systems. The intersection of the two was,
         | frankly, a bit exhausting. Just keeping track of all the moving
         | parts and their respective states induced a cognitive load that
         | made debugging other already-hard problems that much more
         | difficult. My hat's off for anyone who has kept on doing that
         | stuff longer than I did, or who has to do it in an environment
         | where vendors are keeping so many secrets.
        
           | lgg wrote:
           | This stuff is certainly pretty rarified these days. I
           | remember when the PPC970 came out people were shocked how
           | difficult it was to bootstrap. IBM didn't really care as
           | POWER4 (from which it was derived) was not a merchant chip
           | and they had management processors (and very high margins) in
           | all their machines to handle it. Apple was the launch partner
           | and even back then had a lot of in house expertise doing this
           | sort of work. Everyone else who tried to use it was in for
           | some real pain and most of them gave up. The guys doing the
           | eval boards with support from IBM literally posted this:
           | https://web.archive.org/web/20060715134515/http://www.970eva...
           | 
           | TL;DR, the last line is "Once all of the above is completed,
           | the processor will be able to successfully fetch instructions
           | from a boot source. You are now effectively at the same point
           | you would have been 5 months ago, had this been a standard
           | 750 bringup... Board bringup from this point should be very
           | straightforward and follow established methods."
        
             | notacoward wrote:
             | That's an amazing document. Practically every sentence,
             | though tersely stated, hints at hours (or worse) of
             | experimentation and head-scratching. The "would have been 5
             | months ago" bit at the end is remarkably restrained. I'm
             | certain I would have quit (or worse) by that point. Respect
             | and condolences to whoever did this.
        
           | jacquesm wrote:
           | I had to do some of this while bringing up a 486 to run my
           | own kernel. Very frustrating, to the point that I had the
           | reset switch of the machine wired to a sustain pedal just so
           | that I didn't have to dive under the desk all the time.
        
         | intelVISA wrote:
         | CAR is a gem. It's great for lite OSes in hostile envs.
        
           | derefr wrote:
           | Funny enough, a modern CPU doing CAR still has more memory
           | than a PC from the 1980s. Presuming you statically recompiled
           | them, you could run entire SNES games from a modern CPU's
           | cache!
           | 
           | (And that being said, now I'm wondering whether you could
           | force eviction and retainment into L3 cache on demand, to
           | achieve something like memory bank switching...)
        
             | JonathonW wrote:
             | SNES games? You could comfortably run Windows 95 within the
             | L3 cache on many recent Intel processors (the one in my
             | 2019-era MBP has 16 MB of L3 onboard; current generation
             | processors go even bigger and Windows 95 only needs 4 MB).
             | 
             | It's not really clear to me from the limited bits of info
             | that I've read whether or not L3 is guaranteed to be
             | accessible when doing CAR, but, if it is, you've got enough
             | memory available to do a lot of stuff. (And even the L2
             | cache is starting to get pretty big on the higher-end
             | current-gen chips.)
        
               | derefr wrote:
               | Well, keep in mind that in the sort of state the computer
               | is in when doing CAR, you don't get to talk to storage
               | devices; nor do you get the benefit of having some kind
               | of ROM on the bus. I know Windows 95 is happy to run from
               | 4MB RAM _with access to a hard drive_; but how much
               | memory would W95 need for a "bootable live-CD
               | environment" where the disk image must be resident (if
               | compressed) in memory along with all work RAM?
               | 
               | (This is why I compared to the SNES: if you have to map
               | the SNES's RAM _and_ [every bank of] the game's ROM,
               | then you're looking at 4-16MB depending on the game. The
               | SNES is pretty much the newest console whose games would
               | entirely fit, I think.)
        
               | dfox wrote:
               | I think that the only thing that prevents you from
               | ignoring the memory controller and initializing the rest
               | of x86 board while still remaining in the CAR mode is the
               | sheer ridiculousness of doing that. As for whether you
               | have a memory-mapped ROM available I'm not exactly sure,
               | but the high-level model of what x86 firmware does seems
               | to imply that the hardware maps a part of SPI Flash at
               | the address range where there was a ROM chip on the
               | 8086/286/386 PCs (the actual address ranges are
               | different).
        
               | justsomehnguy wrote:
               | About 35MB IMSMR. Win98 could be stripped to around 50MB
               | without a loss of functionality.
               | 
               | NB L3 is unified most of the time, but with L2 you
               | still need to distinguish between data/code.
        
               | Arrath wrote:
               | Man now I want to see this.
        
               | hlandau wrote:
               | In fact, the largest POWER9 CPUs have up to 110MB of
               | L3... and Zen 4's L3 apparently maxes out at 384MB(!!).
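               | 
               | A back-of-the-envelope sketch of the comparison in this
               | subthread, using only the rough figures quoted above. It
               | assumes the whole L3 would be usable as cache-as-RAM, which
               | (as noted upthread) is not established, and it ignores all
               | overheads. The names are made up for illustration.

```python
# Working-set estimates quoted in this thread, in MB (rough figures, not measurements).
workloads_mb = {
    "SNES game, RAM plus every ROM bank (upper estimate)": 16,
    "Windows 95 with its disk image resident": 35,
    "stripped-down Windows 98": 50,
}

# L3 cache sizes quoted in this thread, in MB.
l3_sizes_mb = {
    "2019-era MBP CPU": 16,
    "largest POWER9": 110,
    "largest Zen 4": 384,
}


def fits(workload_mb, cache_mb):
    """True if the whole working set could sit in the cache, ignoring overheads."""
    return workload_mb <= cache_mb


def report():
    """Return (cpu, workload, fits?) for every combination of the two tables."""
    rows = []
    for cpu, cache in l3_sizes_mb.items():
        for name, size in workloads_mb.items():
            rows.append((cpu, name, fits(size, cache)))
    return rows


if __name__ == "__main__":
    for cpu, name, ok in report():
        print(f"{cpu:>18}: {name} -> {'fits' if ok else 'does not fit'}")
```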
        
       | dist-epoch wrote:
       | A modern motherboard can update its BIOS from a USB stick
       | WITHOUT a CPU or memory installed.
       | 
       | Think about that. The motherboard "knows" how to read a FAT file
       | system from a USB mass storage device, verify its digital
       | signature and flash it with no main CPU or memory.
        
         | wmf wrote:
         | I assume this is done with a microcontroller on the board.
        
           | sebazzz wrote:
           | And also born out of necessity, given that many Intel and AMD
           | boards can't be booted with a too new CPU if the BIOS doesn't
           | know about it - not even for flashing a new BIOS - so you
           | needed to borrow an old CPU just for the sake of upgrading
           | the BIOS.
        
             | ls612 wrote:
             | It was originally to solve the issue where if you lost
             | power flashing your bios you'd brick the system
             | irrevocably. Now even if the bios is corrupt and the system
             | won't boot you can reflash a known good firmware with stock
             | settings to get back up and running.
        
           | chasil wrote:
           | The ARC processor was formerly in the northbridge of the
           | chipset.
           | 
           | Intel has since replaced this with an 80486 in modern
           | designs; perhaps it also is implemented in the northbridge.
           | 
           | https://en.wikipedia.org/wiki/ARC_(processor)
        
             | wmf wrote:
             | I think you're talking about the ME but I don't think the
             | ME is responsible for "BIOS" flashing. I think it must be a
             | separate microcontroller. This is kind of the point of the
             | original blog post: don't go looking for "the
             | microcontroller" because there isn't just one; there are
             | many.
        
       | Simplicitas wrote:
       | Doesn't mention the special core reserved for the NSA and other
       | national security agencies :-)
        
       | travisgriggs wrote:
       | I miss these kinds of articles on the net. Is anyone else
       | reminded of the CPU Praxis articles that were part of Ars
       | Technica's early rise to popularity? I really miss those. This
       | article is, of course, much shorter, but still, I miss that sort
       | of content on the internet.
        
       | JdeBP wrote:
       | As the author of https://superuser.com/a/347115/38062 and
       | https://superuser.com/a/345333/38062, you have my sympathy about
       | the "pack of lies" involving real mode and several wrong
       | combinations of selector and offset.
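       | 
       | For anyone who has not met the arithmetic, a minimal sketch.
       | In real mode the address is normally selector * 16 + offset,
       | so several selector:offset pairs alias the same byte. At
       | reset on a 386 or later, though, the hidden CS base is
       | 0xFFFF0000 rather than 16 times the visible selector 0xF000,
       | so the first fetch comes from 0xFFFFFFF0. The function names
       | are illustrative only, and no 20-bit wraparound is modelled.

```python
def real_mode_address(selector, offset):
    """Ordinary real-mode translation: the segment base is selector * 16."""
    return (selector << 4) + offset


def reset_fetch_address(cs_base=0xFFFF0000, ip=0xFFF0):
    """First instruction fetch after reset on a 386 or later.

    The hidden CS base is loaded with 0xFFFF0000 at reset; it is not
    derived from the visible selector value (0xF000) until CS is reloaded.
    """
    return cs_base + ip


if __name__ == "__main__":
    # Two different selector:offset pairs that name the same physical byte.
    print(hex(real_mode_address(0xF000, 0xFFF0)))
    print(hex(real_mode_address(0xFFFF, 0x0000)))
    # Where the processor actually fetches from straight after reset.
    print(hex(reset_fetch_address()))
```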
        
         | JdeBP wrote:
         | It's also worth adding that none of this is new. There's always
         | been a reason that the "C" in "CPU" has stood for "central".
         | The idea that there are other, non-central, processors around
         | the place goes back a long time.
         | 
         | Four particular ones come to mind:
         | 
         | * The DPT range of SCSI host bus adapter cards, many years ago,
         | had a full-blown MC680x0 processor on the card.
         | 
         | * Connor Krukosky famously installed a mainframe in his
         | basement, with a console front-end processor that was a PC
         | running OS/2.
         | 
         | * PC/AT keyboards had on-board microcontrollers running
         | programs.
         | 
         | * And of course who can forget the BBC Micro's Tube?
         | 
         | It's the short period in history where people thought that
         | computers came with only one processor that is the real oddity.
         | (-:
        
           | jacquesm wrote:
           | The Tube used the processor in the Tube as the CPU when it
           | was connected but otherwise the CPU was the CPU in the BBC
           | Micro itself. With the Tube CPUs connected (68K, Z80, 65C02,
           | 32016 and more) the BBC processor served as I/O processor.
           | 
           | The elegant and well adhered to OS calls made this a
           | straightforward process, if your program ran on the BBC
           | standalone it would work across the Tube for the 65(C)02, but
           | for other coprocessors you had to at a minimum recompile and
           | probably rewrite quite a bit of your code.
           | 
           | https://sites.google.com/site/jamesskingdom/Home/computers-e...
           | 
           | In a typical PC there are > 10 actual processors in the
           | various peripheral and controller chips, and then there is
           | the management engine (a full blown computer in its own
           | right) or equivalent and usually almost every peripheral will
           | have one or more processors as well.
        
         | hlandau wrote:
         | Had to reverse engineer a real mode PCI option ROM once... that
         | was extremely unpleasant [1]. And then of course there's
         | "Unreal Mode".
         | 
         | Moreover Intel is just this week actually finally proposing
         | removing real mode. [2] I'm a bit worried for what this means
         | for emulation of old 16-bit Windows and DOS software under Wine
         | (one of the great ironies that Wine can still run Win16
         | programs on an x64 host OS when Windows can't) - though I
         | suspect the performance requirements of such software are so low
         | by modern standards that emulating such programs wouldn't pose
         | any challenge.
         | 
         | [1] https://www.devever.net/~hl/ortega
         | [2] https://www.phoronix.com/news/Intel-X86-S-64-bit-Only
        
           | JdeBP wrote:
           | See https://news.ycombinator.com/item?id=36074093 for a more
           | significant worry. Emulating a CPU is not affected as much as
           | code that would otherwise have still run on the bare
           | hardware.
        
       | shrubble wrote:
       | A while ago I bought some older AMD 8350 systems, which
       | apparently are the last without a PSP, the platform security
       | processor.
       | 
       | I did this as a sort of 'just in case' setup, was planning to put
       | OpenSolaris on it and run things under Zones or LX zones and to
       | run it as a backup server. Fast enough to get some work done and
       | possibly more secure if the PSP is ever used/broken
       | maliciously...
        
         | jacquesm wrote:
         | That may well end up being a very prescient move. Be prepared
         | to be labeled a tinfoil hat type until then, but I definitely
         | think you are wise to take a precaution.
        
       | buildbot wrote:
       | "Turtles all the way down" Modern CPUs are so complex you need
       | simpler ones to abstract them! Very cool breakdown of how POWER9
       | does this.
        
       | giuliomagnifico wrote:
       | I understood nothing (as a sysadmin) but this looks like a very
       | interesting article for those who can understand it.
        
         | bicolao wrote:
         | I think you can see a modern CPU as a network. There are some
         | beefy servers doing all the heavy lifting which is what the
         | outsiders see. But there are also a few smaller servers here and
         | there monitoring the system (or even responsible for powering
         | on the entire network).
        
           | hlandau wrote:
           | Author here. This is very much the case for a computer system
           | as a whole also. Basically a network of cooperating
           | microprocessors, including in I/O peripherals etc.
           | 
           | PCIe in particular is literally a packet-switched computer
           | network - it has a physical layer, data link layer, and a
           | transaction layer which is basically packet switched. There
           | are even proprietary solutions for tunnelling PCIe over
           | Ethernet.
        
             | di4na wrote:
             | And you have smaller ones that basically PXE boot the bigger
             | ones and manage the power, cooling, etc. It is datacenters
             | all the way down.
             | 
             | As someone that used to do embedded, there is a reason I
             | felt most at home in Erlang and Elixir.
             | 
             | Their processes, which share nothing and use message
             | passing, were really close to how it looks to build and code
             | for an embedded platform.
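             | 
             | A rough sketch of that style, in Python rather than Erlang.
             | A hypothetical "service processor" keeps its state private
             | and is driven only by messages on queues. Python threads
             | do share an address space, so this only imitates the
             | share-nothing discipline; the message names are invented.

```python
import queue
import threading


def service_processor(inbox, outbox):
    """A small management node: private state, reachable only via messages."""
    powered = False  # owned by this node alone, never touched from outside
    while True:
        msg = inbox.get()
        if msg == "power_on":
            powered = True
            outbox.put("powered")
        elif msg == "status":
            outbox.put("on" if powered else "off")
        elif msg == "stop":
            outbox.put("stopped")
            return


def run_demo():
    """Send a fixed sequence of requests and collect one reply for each."""
    to_sp, from_sp = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=service_processor, args=(to_sp, from_sp))
    worker.start()
    replies = []
    for msg in ("status", "power_on", "status", "stop"):
        to_sp.put(msg)
        replies.append(from_sp.get())  # wait for the reply before sending more
    worker.join()
    return replies


if __name__ == "__main__":
    print(run_demo())
```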
        
             | p_l wrote:
             | To make it even funnier - Digital's last Alpha CPU, EV7,
             | which was essentially the ancestor of AMD K8 (which finally
             | brought "mesh" networking to mainstream PCs), actually had an
             | IP-based internal management network!
             | 
             | Each EV7 computer had, instead of a normal BMC, a bigger
             | management node connected to a 10MBit Ethernet hub
             | (twisted-pair Ethernet, fortunately :P), and this network was then
             | connected to things like I/O boards, power control, system
             | boards... including to each individual EV7 CPU. Each so
             | connected component had a small CPU with ethernet that was
             | responsible for interfacing their specific component to the
             | network, and when the system booted part of it involved
             | prodding the CPUs over ethernet to put them into
             | appropriate halt state from which they could start booting.
        
       | wmf wrote:
       | Much of the openness of Power7/8/9 was _encouraged_ by Google who
       | wanted to have control over all the firmware, even the secret
       | firmware. I think Google is also auditing PSP/ME source code but
       | the public only sees the audit results.
        
       ___________________________________________________________________
       (page generated 2023-05-30 23:00 UTC)