[HN Gopher] Intel and AMD Contemplate Different Replacements for...
       ___________________________________________________________________
        
       Intel and AMD Contemplate Different Replacements for x86 Interrupt
       Handling
        
       Author : eklitzke
       Score  : 140 points
       Date   : 2021-06-04 18:10 UTC (4 hours ago)
        
 (HTM) web link (www.eejournal.com)
 (TXT) w3m dump (www.eejournal.com)
        
       | korethr wrote:
       | Somewhat off topic from the main thread of the article, but I
       | have always wondered about the multiple privilege levels. What's
        | the expected/intended use for them? The only thing I can think of is
       | separating out hardware drivers (assuming ring 1 can still
       | directly read/write I/O ports or memory addresses mapped to
       | hardware) so they can't crash the kernel should the drivers or
       | hardware turn out to be faulty. But I don't think I've ever heard
       | of such a design being used in practice. It seems everyone throws
       | their driver code into ring 0 with the rest of the kernel, and if
       | the driver or hardware faults and takes the kernel with it, too
       | bad, so sad. Ground RESET and start over.
       | 
       | What I find myself wondering is _why_? It seems like a good idea
       | on paper, at least. Is it just a hangover from other CPU
        | architectures that only had privileged/unprivileged modes, and
       | programmers just ended up sticking with what they were already
       | familiar and comfortable with? Was there some painful gotcha
       | about multiple privilege modes that made them impractical to use,
       | like the time overhead of switching privilege levels made it
       | impossible to meet some hardware deadline? Silicon-level bugs?
       | Something else?
        
         | monocasa wrote:
          | It works with the call gates. You could have had a sort of
          | nested microkernel (if it weren't such a mess), with each ring
          | able to include the less privileged rings' address spaces. And
          | it's not just a free-for-all: the kernel really is just the
          | control plane, and can set up all sorts of per-process
          | descriptor tables (the LDTs).
         | 
         | So you'd have a tiny kernel at ring 0 which could R/W
         | everything but wasn't responsible for much.
         | 
         | Under that you'd have drivers at ring 1 that can't see the
         | kernel or other drivers, but can R/W user code at rings 2 and
         | 3.
         | 
         | Under that you'd have system daemons at ring 2 that can R/W
         | regular programs but not other daemons at the same level, nor
         | drivers or the kernel.
         | 
          | And then under that you'd have regular processes at ring 3 that
          | have generally the same semantics as today's processes.
         | 
          | Each process of any ring can export a syscall-like table
          | through the call gates, so user code could directly invoke
          | drivers or daemons without going through the kernel at all.
          | Basically IPC with about the same overhead as a C++ virtual
          | method call.
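          | 
          | (For the curious, the 32-bit call-gate descriptor is roughly
          | the sketch below; a gate with DPL 3 installed in the GDT or an
          | LDT is what lets ring-3 code far-call straight into a more
          | privileged segment. Field layout from memory, so double check
          | the SDM.)
          | 
          |   #include <stdint.h>
          |   
          |   /* Rough sketch of a 32-bit x86 call-gate descriptor.
          |      Installing one with DPL=3 lets user code far-call through
          |      it into a more privileged code segment: the near-free IPC
          |      described above. */
          |   struct call_gate {
          |       uint16_t offset_low;      /* entry point, bits 15:0   */
          |       uint16_t selector;        /* target code segment      */
          |       uint8_t  param_count : 5; /* dwords copied to stack   */
          |       uint8_t  reserved    : 3;
          |       uint8_t  type        : 4; /* 0xC = 32-bit call gate   */
          |       uint8_t  s           : 1; /* 0 = system descriptor    */
          |       uint8_t  dpl         : 2; /* 3 = callable from ring 3 */
          |       uint8_t  present     : 1;
          |       uint16_t offset_high;     /* entry point, bits 31:16  */
          |   } __attribute__((packed));
          |   
          |   /* User code then enters with a far call through the gate's
          |      selector (e.g. "lcall $SEL, $0") and lands in the driver
          |      or daemon at its declared entry point. */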
         | 
          | So what happened? The underlying semantics didn't exactly match
          | the OSs anyone wanted to build (particularly the OSs anyone
          | cared about in the late '80s and early '90s). And you can
          | enforce similar semantics entirely in software anyway, with the
          | exception of the cheap IPC.
        
         | dmitrygr wrote:
         | I think OS/2 used every feature x86 had, including call gates
         | and at least 3 privilege levels (0, 2, 3). That is why OS/2 is
         | such a good test for any aspiring x86 emulator developer.
        
       | vlovich123 wrote:
        | Given the multi-core, NUMA and Spectre/Meltdown reality we're
        | living in, and the clear benefits of the io_uring approach, why
        | not just have dedicated core(s) to handle "interrupts" which are
        | nothing more than entries in a shared memory table?
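        | 
        | (Hand-waving, but the shared table could be as simple as the
        | sketch below: producers append "interrupt" records and a
        | dedicated core spins on them, io_uring style. Every name here is
        | invented purely to illustrate the shape of the idea.)
        | 
        |   #include <stdatomic.h>
        |   #include <stdint.h>
        |   
        |   /* Hypothetical shared-memory "interrupt" ring: devices and
        |      other cores publish entries, one dedicated core polls and
        |      dispatches instead of taking vectored interrupts. */
        |   struct irq_entry { uint32_t source; uint64_t payload; };
        |   
        |   #define RING_SIZE 1024
        |   struct irq_ring {
        |       _Atomic uint64_t head;           /* next slot to publish */
        |       _Atomic uint64_t tail;           /* next slot to consume */
        |       struct irq_entry slots[RING_SIZE];
        |   };
        |   
        |   void handle_irq(uint32_t source, uint64_t payload); /* made up */
        |   
        |   /* Runs forever on the dedicated core. */
        |   void irq_poll_loop(struct irq_ring *ring)
        |   {
        |       uint64_t tail = 0;
        |       for (;;) {
        |           uint64_t head = atomic_load(&ring->head);
        |           while (tail != head) {
        |               struct irq_entry e = ring->slots[tail % RING_SIZE];
        |               handle_irq(e.source, e.payload);
        |               tail++;
        |           }
        |           atomic_store(&ring->tail, tail);
        |       }
        |   }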
        
         | devit wrote:
          | There must be a way to alter a core's instruction pointer from
          | another core or from hardware, both to support killing
          | processes that run untrusted machine code and to support
          | pre-emptive multithreading without compilers having to add a
          | check for preemption on all backward branches and calls.
          | 
          | These features are well worth the hassle of providing this
          | capability (known as "IPIs"), and once you have it, hardware
          | interrupts become pretty much free to support using the same
          | capability. The OS/user can then decide whether to dedicate a
          | core to them, pin them to a core, load-balance them among all
          | cores, or disable the interrupts and poll instead.
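          | 
          | (To make the alternative concrete: without IPIs you end up
          | with cooperative preemption, roughly like the sketch below,
          | where the compiler inserts the check at every loop back-edge
          | and call site. Names are made up.)
          | 
          |   #include <stdatomic.h>
          |   
          |   /* Set from another core when the scheduler wants this
          |      thread off the CPU. */
          |   _Atomic int preempt_requested;
          |   
          |   void scheduler_yield(void);  /* hypothetical runtime call */
          |   
          |   /* The check the compiler would have to emit at every
          |      backward branch and call -- exactly the overhead an IPI
          |      lets you skip. */
          |   static inline void preempt_check(void)
          |   {
          |       if (atomic_load(&preempt_requested))
          |           scheduler_yield();
          |   }
          |   
          |   void hot_loop(int n)
          |   {
          |       for (int i = 0; i < n; i++) {
          |           /* ... real work ... */
          |           preempt_check();  /* inserted at the back-edge */
          |       }
          |   }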
        
           | vlovich123 wrote:
           | I was thinking rather than mucking with instruction pointers
           | you would just send a message back to the other CPU saying
           | "pause & switch to context X". Technically an interrupt but
           | one that can be handled internally within the CPU.
        
         | [deleted]
        
         | electricshampo1 wrote:
         | This is essentially the approach taken by
         | 
         | https://www.dpdk.org/ (network) and https://spdk.io/ (storage)
         | 
          | Anything trying to squeeze performance out of I/O-intensive
          | work should switch to this model (context permitting, of
          | course).
        
         | knz42 wrote:
         | This is the approach proposed in
         | http://doi.org/10.1109/TPDS.2015.2492542
         | 
         | (preprint:
         | https://science.raphael.poss.name/pub/poss.15.tpds.pdf )
        
           | DSingularity wrote:
           | And old systems like Corey OS.
        
         | eklitzke wrote:
         | This approach works for I/O devices (and for things like
         | network cards the kernel will typically poll them anyway), but
         | I/O isn't the only thing that generates interrupts. For
         | instance, a processor fault (e.g. divide by zero) should be
         | handled immediately and synchronously since the CPU core
         | generating the fault can't do any useful work until the fault
         | is handled.
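          | 
          | (You can see the synchronous part from userspace: on Linux/x86
          | the divide faults into the kernel and comes back as a SIGFPE
          | before the next instruction runs. A minimal demo:)
          | 
          |   #include <signal.h>
          |   #include <unistd.h>
          |   
          |   static void on_fpe(int sig) {
          |       (void)sig;
          |       static const char msg[] = "SIGFPE, synchronously\n";
          |       write(STDERR_FILENO, msg, sizeof msg - 1);
          |       _exit(1);  /* returning would re-run the faulting divide */
          |   }
          |   
          |   int main(void) {
          |       signal(SIGFPE, on_fpe);
          |       volatile int zero = 0;
          |       return 1 / zero;  /* #DE -> kernel -> SIGFPE, immediately */
          |   }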
        
           | vlovich123 wrote:
           | Is that actually true? Wouldn't this imply you could launch a
           | DOS attack against cloud providers just generating divisions
           | by zero?
        
             | bonzini wrote:
             | You would only attack yourself. The CPU time you're paying
             | for would be spent processing division by zero exceptions.
        
               | vlovich123 wrote:
               | Then the CPU isn't stopping and is moving on doing other
               | work. Meaning the divide by zero could be processed by a
               | background CPU and doesn't require immediate handling.
               | Same for page faults.
        
               | bananabreakfast wrote:
               | Incorrect. The "CPU" is stopping and handling the fault.
               | There is no background CPU from your perspective. In a
               | cloud provider you are always in a virtual environment
               | and using a vCPU which is constantly being preempted by
               | the hypervisor.
               | 
               | You cannot DOS a hypervisor just by bogging down your
               | virtualized kernel.
        
               | bonzini wrote:
               | Your instance can still be preempted by the hypervisor.
        
             | imtringued wrote:
              | A site that renders HTML is more computationally expensive
              | to serve than one that does nothing but divide by zero.
        
           | justincormack wrote:
            | Faults for divide by zero are a terrible legacy thing; Arm
            | etc. do not do this. You test for zero if you want to;
            | otherwise you just get a value.
        
             | DblPlusUngood wrote:
             | A better example: a page fault for a non-present page.
        
               | rwmj wrote:
               | At university we designed an architecture[1] where you
               | had to test for page not present yourself. It was all
               | about seeing if we could make a simpler architecture
               | where all interrupts could be handled synchronously, so
               | you'd never have to save and restore the pipeline. Also
               | division by zero didn't trap - you had to check before
               | dividing. IIRC the conclusion was it was possible but
               | somewhat tedious to write a compiler for[2], plus you had
               | to have a trusted compiler which is a difficult sell.
               | 
               | [1] But sadly didn't implement it in silicon! FPGAs were
               | much more primitive back then.
               | 
               | [2] TCG in modern qemu has similar concerns in that they
               | also need to worry about when code crosses page
               | boundaries, and they also have a kind of "trusted"
               | compiler (in as much as everything must go through TCG).
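                | 
                | (Roughly, the compiler had to turn every access into
                | something like the sketch below, which is where the
                | tedium came from. All the helper names are invented.)
                | 
                |   #include <stdint.h>
                |   
                |   /* Invented stand-ins for the architecture's
                |      software page-table primitives. */
                |   typedef struct { int present; uintptr_t frame; } pte_t;
                |   pte_t lookup_pte(uintptr_t vaddr);
                |   pte_t os_page_in(uintptr_t vaddr); /* no trap */
                |   
                |   /* What the trusted compiler emits instead of a
                |      plain load: "not present" is tested explicitly
                |      rather than caught by a fault. */
                |   uint32_t checked_load32(uintptr_t vaddr)
                |   {
                |       pte_t pte = lookup_pte(vaddr);
                |       if (!pte.present)
                |           pte = os_page_in(vaddr);
                |       return *(volatile uint32_t *)
                |              (pte.frame | (vaddr & 0xFFF));
                |   }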
        
               | warkdarrior wrote:
               | Interesting. So what happens if the program does not test
               | for the page? Doesn't the processor have to handle that
               | as an exception of sorts?
        
               | vlovich123 wrote:
               | Could be handled by having the CPU switch to a different
               | process while the kernel CPU faults the data in.
        
               | ben509 wrote:
               | The CPU doesn't know what processes are, that's handled
               | by the OS. So there still needs to be a fault.
        
               | vlovich123 wrote:
               | You're thinking about computer architecture as designed
               | today. There's no reason there isn't a common data
               | structure defined that the CPU can use to select a backup
                | process, much like how it uses page table data structures in
               | main memory to resolve TLB misses.
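                | 
                | (Purely hypothetical, but something like the layout
                | below: a control register points at an array the
                | hardware can walk to pick another runnable context,
                | just as CR3 lets it walk page tables on a TLB miss.)
                | 
                |   #include <stdint.h>
                |   
                |   #define CTX_VALID    (1u << 0)
                |   #define CTX_RUNNABLE (1u << 1)
                |   
                |   /* Imagined hardware-walkable context slot: enough
                |      state for the core to resume a different process
                |      while the faulting one waits on the kernel. */
                |   struct hw_context {
                |       uint64_t rip;    /* where to resume        */
                |       uint64_t rsp;    /* stack pointer          */
                |       uint64_t cr3;    /* address-space root     */
                |       uint64_t flags;  /* CTX_VALID|CTX_RUNNABLE */
                |   };
                |   /* An imagined control register would hold the base
                |      and length of the array, analogous to CR3. */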
        
               | pjc50 wrote:
               | Just to make it explicit for the people having trouble,
               | the mechanism for switching processes in a pre-emptive
               | multitasking system _is_ interrupts.
        
               | DblPlusUngood wrote:
               | In some cases, yes (if there are other runnable threads
               | on this CPU's queue).
        
             | eklitzke wrote:
             | That was just an example, there are many other things the
             | CPU can do that will generate a fault (for example, trying
             | to execute an illegal instruction).
        
             | mschuster91 wrote:
             | Yes, but this one is so hard baked into everything that it
             | would kill any form of backwards compatibility.
        
           | bonzini wrote:
           | Or interprocessor interrupts for flushing the TLB,
           | terminating the scheduling quantum, or anything else.
        
         | bostonsre wrote:
         | Would guess it would be complicated and difficult to update the
         | kernel to support something like that. Not sure Linus would
         | entertain PRs for custom boards that do something like that.
         | Would think it would need to be an industry wide push for that.
         | But just speculation..
        
           | vlovich123 wrote:
           | We're talking about the CPU architecture here, not custom
           | one-off ARM boards. Think x86 or ARM not a Qualcomm SoC.
           | 
            | And yes, of course Linus' opinion would be needed.
        
       | bogomipz wrote:
       | The author states:
       | 
       | >'The processor nominally maintained four separate stacks (one
       | for each privilege level), plus a possible "shadow stack" for the
       | operating system or hypervisor.'
       | 
       | Can someone elaborate on what the "shadow stack" is and what it's
       | for exactly? This is the first time I've heard this nomenclature.
        
       | ChuckMcM wrote:
        | Argh, why do authors write stuff like this -- _" It is, not to
       | put too fine a point on it, a creaking old bit of wheezing
       | ironmongery that, had the gods of microprocessor architecture
       | been more generous, would have been smote into oblivion long
       | ago."_
       | 
       | Just because a technology is "old" doesn't mean it is useless, or
       | needs to be replaced. I'm all in favor of fixing problems, and
       | refactoring to improve flow and remove inefficiencies. I am _not_
        | a fan of re-inventing the wheel because gee, we've had this
        | particular wheel for 50 years and it's doing fine but hey let's
       | reimagine it anyway.
       | 
        | That said, the kink in x86 architecture was put there by "IBM PC
        | Compatibility" and a Windows/Intel monopoly that went on way too
        | long. But even knowing _why_ the thing has these weird artifacts
        | (it just means the engineers were working under constraints you
        | don't understand) doesn't give you license to dismiss what
        | they've done as needing to be "wiped away."
       | 
       | We are in a period where enthusiasts can design, build, and
       | operate a completely bespoke ISA and micro-architecture with
       | dense low cost FPGAs. Maybe they don't run at multi-GHz speeds
       | but if you want to contribute positively to the question of
       | computer architecture, there has never been a better time. You
       | don't even have to build the whole thing! You can just add it
       | into an existing architecture and compare how you do against it.
       | 
       | Want to do flow control colored register allocation for
       | speculative instruction retirement? You can build the entire
       | execution unit in an FPGA and throw instructions at it to your
        | heart's content and provide analysis of the results.
       | 
       | Okay, enough ranting. I want AARCH64 to win so we can reset the
       | problem set back to a smaller number of workarounds, but I think
       | the creativity of people trying to advance the x86 architecture
        | given the constraints is not something to be belittled; it is to
        | be admired.
        
         | gumby wrote:
         | Also "smitten" is the passive past participle of "smite", not
         | "smote", which is the active. This bothered me through the
         | whole article.
         | 
         | I suppose the author is not a native English speaker. Like the
         | person who titled a film "honey I shrunk the kids". Makes me
         | wince to even type it.
        
           | Dylan16807 wrote:
           | I propose that they are quite familiar with English and want
           | to avoid the love-related connotations of "smitten".
        
           | Scene_Cast2 wrote:
           | It might be there to put a playful twist on things. Kind of
           | like "shook" (slang) or "woke" (when it first appeared).
        
             | failwhaleshark wrote:
             | _And I woke from a terrible dream_
             | 
              |  _So I caught up my pal Jack Daniel's_
             | 
             |  _And his partner Jimmy Beam_
             | 
             | https://www.independent.co.uk/news/uk/home-news/woke-
             | meaning...
             | 
              |  _AC/DC - You Shook Me All Night Long (Official Video)_
             | 
             | https://youtu.be/Lo2qQmj0_h4
             | 
             | (Rock n' roll stole the best vernacular.)
        
               | gumby wrote:
               | Those are both correct ("conventional" to descriptivists
               | like me) uses of the respective terms.
        
           | thanatos519 wrote:
           | Ah, "not a native English speaker", usual written as
           | "American".
           | 
           | </troll>
        
         | coldtea wrote:
         | > _Just because a technology is "old" doesn't mean it is
         | useless, or needs to be replaced._
         | 
          | Sure. But if on top of being old it is "a creaking wheezing
          | ironmongery" that "had the gods of microprocessor architecture
          | been more generous, would have been smote into oblivion long
          | ago", then it does need to be replaced.
          | 
          | And the article mentions reasons why (they don't stop at mere
          | age), and Intel/AMD share them.
        
         | nwmcsween wrote:
         | > Just because a technology is "old" doesn't mean it is
         | useless, or needs to be replaced. I'm all in favor of fixing
         | problems, and refactoring to improve flow and remove
         | inefficiencies. I am not a fan of re-inventing the wheel
         | because gee, we've had this particular wheel for 50 years and
         | its doing fine but hey let's reimagine it anyway.
         | 
         | Can't get promotions if you don't NIH a slow half broken
         | postgres
        
           | edoceo wrote:
           | > slow half broken postgres
           | 
            | Don't bring Postgres into this. It's fast and closer to zero-
            | broken than to half-broken. ;)
        
             | dkersten wrote:
              | I think you misunderstood the comment. It wasn't calling
              | Postgres slow and half-broken; it was taking a jab at
              | home-brewed (NIH, Not Invented Here,
              | https://en.wikipedia.org/wiki/Not_invented_here) databases,
              | calling them slow, half-baked copies of Postgres, and
              | implying that their authors should have just used Postgres,
              | but that doing so wouldn't get them promotions.
        
         | justicezyx wrote:
          | I suspect that if you walk across the aisle to an engineer on
          | another team and describe an obscure legacy issue to them,
          | there's a decent chance they'd say something like "how could it
          | be?", "was the original engineer dumb/incompetent?", "did the
          | original team get reorged?", etc....
          | 
          | Not saying the tech journalist is better in any sense. But
          | let's be honest, there is no reason they should be doing
          | better...
        
         | failwhaleshark wrote:
         | Sssh! "Old = bad" means job security for millions of engineers.
         | We need yet another security attack surface, I mean "improved
         | interrupt handling."
         | 
          | I don't understand why people, engineers of all people, fall
          | for the Dunning-Kruger/NIH/ageism-in-all-teh-things consumerism
          | fallacy that everything that came before is dumb, unusable, or
          | can _always_ be done better.
         | 
         | Code magically rusts after being exposed to air for 9 months,
         | donja know? If it's not trivially-edited every 2 months, it's a
         | "dead" project.
        
           | lazide wrote:
           | Part of it I think is that for many people, it's more fun to
           | build something than to maintain something. It's also easier
           | to write code than it is to read it (most of the time).
           | 
           | So why not do the fun and easy thing? Especially if they
           | aren't the one writing the checks!
        
         | lazide wrote:
         | Well, no one gets promoted/a raise by writing 'and everything
         | is actually fine' right?
         | 
         | On the engineering side, similar more often than not. You get
         | promoted by solving the 'big problem'. The really enterprising
         | (and ones you need to watch) often figure out how to make the
         | big problem the one they are trying/well situated to solve -
         | even if it isn't really a problem.
        
           | Sebb767 wrote:
           | > Well, no one gets promoted/a raise by writing 'and
           | everything is actually fine' right?
           | 
            | Well, that's actually not true. There are quite a few life
            | coaches and the like making their living on positive writing.
            | And there are quite a few writers, like Scott Alexander, who,
            | while not being all positive, definitely don't need or want
            | to paint an overly dark or dramatic picture.
           | 
           | In the more conventional news sector, on the other hand, this
           | is probably true.
        
         | herpderperator wrote:
         | > we've had this particular wheel for 50 years and its doing
         | fine but hey let's reimagine it anyway
         | 
         | How is it doing fine? Apple is doing laps over Intel because --
         | it seems -- of their choice to use ARM. Would they have been
         | able to design an x86/amd64 CPU just as good?
        
           | yyyk wrote:
           | >Apple is doing laps over Intel because -- it seems -- of
           | their choice to use ARM.
           | 
            | AMD processors are essentially equivalent to the M1 in
            | performance while still keeping x86, so probably yes (there
            | is an ARM advantage in instruction decoding, but judging by
            | the performance differences, it's probably small).
           | 
           | Apple's advantage is mostly that Apple (via TSMC) can use a
           | smaller processor node than Intel and optimize the entire Mac
           | software stack for their processors.
        
             | Someone wrote:
             | > and optimize the entire Mac software stack for their
             | processors.
             | 
             | And vice versa.
        
         | cycomanic wrote:
          | I would argue the author is not really saying "old" is "bad",
          | but rather that we have been piling more and more cruft onto
          | the old, so it has now become a significant engineering
          | exercise to work around the idiosyncrasies of the old system
          | every time you want to do something else.
         | 
          | To use your wheel analogy: it's sort of like starting with the
          | wheel of a horse cart and adding bits and pieces to that same
          | wheel to make it somehow work as the landing-gear wheel for a
          | jumbo jet. At some point it might be a good idea to simply
          | design a new wheel.
        
       | gwbas1c wrote:
       | Why continue with x86? Given how popular ARM is, why not just
       | join the trend?
        
         | ronsor wrote:
          | x86 is not going anywhere, for performance computing, backwards
          | compatibility, and a plethora of other reasons.
        
       | young_unixer wrote:
       | Linus' opinion:
       | https://www.realworldtech.com/forum/?threadid=200812&curpost...
        
         | bryanlarsen wrote:
         | tldr: AMD is "fix the spec bugs". Intel is "replace with better
         | approach". Linus: do both, please!
        
         | bogomipz wrote:
         | Thanks for posting this link. I was curious about a, b and d:
         | 
         | >"(a) IDT itself is a horrible nasty format and you shouldn't
         | have to parse memory in odd ways to handle exceptions. It was
         | fundamentally bad from the 80286 beginnings, it got a tiny bit
         | harder to parse for 32-bit, and it arguably got much worse in
         | x86-64."
         | 
         | What is it about IDT that requires parsing memory in odd ways?
         | What is odd about it?
         | 
         | >"(b) %rsp not being restored properly by return-to-user mode."
         | 
         | Does anyone know why this is? Is this a historical accident or
         | something else?
         | 
         | >"(d) several bad exception nesting problems (NMI, machine
         | checks and STI-shadow handling at the very least)" Is this one
         | these two exceptions are nested together or is this an issue
         | when either one of these is present in the interrupt chain? Is
         | there any good documentation on this?
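          | 
          | (For (a), part of what makes it odd is visible just from the
          | 64-bit gate layout: the handler address ends up scattered
          | across three non-adjacent fields. Field names below are mine;
          | double check the layout against the SDM.)
          | 
          |   #include <stdint.h>
          |   
          |   /* 64-bit IDT gate descriptor, roughly per the SDM. */
          |   struct idt_gate64 {
          |       uint16_t offset_low;   /* handler bits 15:0       */
          |       uint16_t selector;     /* kernel code segment     */
          |       uint8_t  ist;          /* IST index in bits 2:0   */
          |       uint8_t  type_attr;    /* gate type, DPL, present */
          |       uint16_t offset_mid;   /* handler bits 31:16      */
          |       uint32_t offset_high;  /* handler bits 63:32      */
          |       uint32_t reserved;
          |   } __attribute__((packed));
          |   
          |   /* Recovering the entry point means stitching the pieces
          |      back together: */
          |   static inline uint64_t gate_target(const struct idt_gate64 *g)
          |   {
          |       return (uint64_t)g->offset_low
          |            | ((uint64_t)g->offset_mid  << 16)
          |            | ((uint64_t)g->offset_high << 32);
          |   }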
        
       | PopePompus wrote:
       | Since cloud servers are a bigger market than users who want to
       | run an old copy of VisiCalc, why doesn't either Intel or AMD
        | produce a processor line that has none of the old 16- and 32-bit
        | architectures (and long-forgotten vector extensions) implemented
        | in silicon? Why not just make a clean (or as clean as possible)
        | 64-bit x86 processor?
        
         | jandrewrogers wrote:
         | Intel did this with the cores for the Xeon Phi. While it was
         | x86 compatible, they removed a bunch of the legacy modes.
        
         | th3typh00n wrote:
         | Because the number of transistors used for that functionality
         | is absolutely negligible, so removing it has virtually no
         | benefit.
        
           | Symmetry wrote:
           | The number of transistors sure. The engineer time to design
           | new features that don't interfere with old features is high.
           | The verification time to make sure every combination of
           | features plays sensibly together is extremely high. To the
           | extent that Intel and AMD are limited by the costs of
           | employing and organizing large numbers of engineers it's a
           | big deal. Though that's also the reason they'll never make a
           | second, simplified, core.
        
             | failwhaleshark wrote:
             | It's never going to happen. The ISA is a hardware contract.
        
               | PopePompus wrote:
               | When things get to the point where AMD is considering
               | making nonmaskable interrupts maskable (as the article
               | states), maybe it's time to invoke "force majeure".
        
           | PopePompus wrote:
            | Even so, doesn't having a more complex instruction set,
            | festooned with archaic features needed by very few users,
            | increase the attack surface for hacking exploits and increase
            | the likelihood of bugs being present? Isn't it a bad thing
            | that the full boot process is understood in depth by only a
            | tiny fraction of the people programming for x86 systems (I'm
            | certainly not one of them)?
        
             | jcranmer wrote:
             | Not really. Basically none of these features can be used
             | outside of the kernel anyways, which means that the
             | attacker already has _far_ more powerful capabilities they
             | can employ.
        
           | failwhaleshark wrote:
           | Yep. Benefit = The cost of minimalism at every expense - the
           | cost of incompatibility (in zillions): breaking things that
           | cannot be rebuilt, breaking every compiler, breaking every
           | debugger, breaking every disassembler, adding more feature
            | flags, and it's no longer the Intel 64 / IA-32 ISA. Hardware
           | != software.
        
         | lizknope wrote:
         | You mean like the Intel i860?
         | 
         | https://en.wikipedia.org/wiki/Intel_i860
         | 
         | Or the Intel Itanium?
         | 
         | https://en.wikipedia.org/wiki/Itanium
         | 
         | Or the AMD Am29000?
         | 
         | https://en.wikipedia.org/wiki/AMD_Am29000
         | 
         | Or the AMD K12 which was a 64-bit ARM?
         | 
         | https://www.anandtech.com/show/7990/amd-announces-k12-core-c...
         | 
         | All of these things were either rejected by the market or
         | didn't even make it to the market.
         | 
          | Binary compatibility is one of the major reasons, if not the
          | major reason, that x86 has hung around so long. In the 1980s
          | and '90s x86 was slower than the RISC workstation competitors,
          | but Intel and AMD really took the performance crown around
          | 2000.
        
           | fredoralive wrote:
           | I think he's suggesting something more like the 80376, an
           | obscure embedded 386 that booted straight into protected
           | mode. So you'd have an x86-64 CPU that boots straight into
           | Long Mode and thus could remove stuff like real mode and
            | virtual 8086 mode. AFAIK with UEFI it's the boot firmware
            | that handles switching to 32/64-bit mode, not the OS loader
            | or kernel, so it would be transparent to the OS and programs.
           | 
            | But in order not to break a lot of stuff on desktop Windows
            | (and ancient unmaintained custom software on corporate
            | servers) you'd still have to implement the "32-bit software
            | on 64-bit OS" support. That probably means you don't actually
            | simplify the CPU much.
           | 
           | Of course some x86 extensions do get dropped occasionally,
           | but only things like AMD 3DNow (I guess AMD market share
           | meant few used it anyway) and that Intel transactional memory
           | thing that was just broken.
        
           | defaultname wrote:
           | Binary compatibility kept x86 dominant, coupled with
           | competing platforms not offering enough of a performance or
           | price benefit to make them worth the trouble.
           | 
           | That formula has completely changed. With the tremendous
           | improvement in compilers, and the agility of development
            | teams, the move has been long underway. People are firing up
            | their Graviton2 instances at a blistering pace, and my Mac
            | runs on Apple Silicon (already, just months in, with zero x86
            | apps -- I thought Rosetta2 would be the life vest, but
            | everyone transitioned so quickly I could do fine without it).
           | 
           | It's a very different world.
        
             | ben509 wrote:
             | GHC[1] is almost there >_<
             | 
             | [1]: https://www.haskell.org/ghc/blog/20210309-apple-m1-sto
             | ry.htm...
        
           | monocasa wrote:
            | No, there are a lot of PC-isms that can be removed while
            | still allowing for x86 cores. User code doesn't care about PC
            | compat really anymore (see the PS4 Linux port for the
            | specifics of a non-PC x86 platform that runs regular x86 user
            | code like Steam, albeit one arguably worse designed than the
            | PC somehow). Cleaning up ring 0 in a way that ring 3 code
           | can't tell the difference with a vaguely modern kernel could
           | be a huge win.
        
       | raverbashing wrote:
       | > Rather than use the IDT to locate the entry point of each
       | handler, processor hardware will simply calculate an offset from
       | a fixed base address
       | 
        | So, wasn't the 8086 like this? Or at least some microprocessors
        | jump to $BASE + OFFSET, to a slot where more or less one JMP
        | fits.
        
         | sounds wrote:
         | I have no idea how Intel's proposal handles this, but the 8086
         | jump was to a fixed location. i.e. BASE was always 0.
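          | 
          | (Sketching the difference: on the 8086, vector N indexes a
          | table of 4-byte far pointers at physical address 0, so the
          | hardware does a memory read; the article's proposal instead
          | computes the entry point from a base, no table read at all.
          | The stride below is illustrative, not the actual spec.)
          | 
          |   #include <stdint.h>
          |   
          |   /* 8086: interrupt vector table at physical 0, entry N is
          |      a far pointer (IP then CS). */
          |   struct ivt_entry { uint16_t ip, cs; };
          |   #define IVT ((volatile struct ivt_entry *)0)
          |   
          |   /* Proposed style: the handler address is computed, not
          |      fetched -- each vector gets a fixed-size slot holding a
          |      stub or a jump. */
          |   static inline uint64_t handler_for(uint64_t base,
          |                                      unsigned vector,
          |                                      unsigned stride)
          |   {
          |       return base + (uint64_t)vector * stride;
          |   }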
        
       ___________________________________________________________________
       (page generated 2021-06-04 23:00 UTC)