[HN Gopher] Undocumented CPU behavior: analyzing undocumented op...
       ___________________________________________________________________
        
       Undocumented CPU behavior: analyzing undocumented opcodes on Intel
       x86-64 (2018) [pdf]
        
       Author : luu
       Score  : 170 points
       Date   : 2020-03-08 10:16 UTC (12 hours ago)
        
 (HTM) web link (www.cattius.com)
 (TXT) w3m dump (www.cattius.com)
        
       | stopads wrote:
       | When I learned to program (before 9/11) there was a big emphasis
       | on assembly language and using low level interfaces to
       | communicate with other hardware. The idea was that everyone
       | studying computer science should understand every aspect of the
       | CPU down to the register and operation level, and then be able to
       | design logic gates to replicate that functionality if needed.
       | 
       | Now we have CPUs that are fundamentally undocumented, unknowable,
       | and untranslatable. The entire infrastructure of the network, the
       | telecoms, and the cpu design itself has all been subverted to the
       | needs of the national security complex or corporate advertising.
       | 
       | I'm not sure what computer science even means anymore. Everything
       | I learned is completely useless.
        
         | dfox wrote:
         | The set of ~8 undocumented but well-known i386 instructions
          | predates not only 9/11 but even the 486.
        
         | tmotwu wrote:
          | x86 was designed long before 9/11, so "now" is no different
          | from the past. We all know the reputation x86 has for poor
          | documentation, and for poor design in general. Claiming that
          | older ISAs were better documented and easier to understand
          | will not get you very far.
         | 
         | Most if not all top computer science / computer engineering
         | programs in the states teach digital logic design, x86 /
         | x86-64, computer architecture, compilers, communication
         | networks in very fundamental detail as required courses. The
         | emphasis is still there.
        
         | leggomylibro wrote:
         | I'm with you up to the last paragraph.
         | 
         | It's not useless. FPGAs have plummeted in cost and there are
         | now open-source toolchains for some of them. There's also a
         | Free commercial-grade ISA that you can use in your personal
         | designs (RISC-V). These days, it is not expensive to design
          | your own well-understood computer which can run machine code
         | generated by commercial-grade compilation toolchains such as
         | GCC. Even hardware production is getting cheapER with shared
         | wafer runs like MOSIS, although custom silicon is still out of
         | reach for hobbyists.
         | 
         | Chin up, buddy. The US is not the entire world, and the
         | pendulum of our generations' zeitgeist can still swing back
         | towards the ideals of liberty and equality of access which the
         | mavens of computing once stood for. You can already buy ARM
         | application processors from vendors other than Intel/AMD, and I
         | would be surprised if we lived in a world where every new
         | computer comes with "management engine" spyware in its CPU for
         | much longer.
        
           | zadokshi wrote:
           | I love your optimism, I'm not sure if I can see a path
           | towards the public voting for a government that would make
            | the necessary adjustments to rein in the ability of
           | government powers to influence "management engine" code.
        
       | h2odragon wrote:
       | https://en.wikipedia.org/wiki/LOADALL
       | 
        | You could almost emulate an MMU on a 286
        
       | Jahak wrote:
       | nice document
        
       | saagarjha wrote:
       | Associated GitHub repository with more information:
       | https://github.com/cattius/opcodetester
       | 
       | I know next to nothing about processors at this level, but I
       | wonder if it would be possible for a skilled engineer to try to
       | find these instructions by scrutinizing the actual physical
       | instruction decoder on the chip and/or inspect the processor's
       | microcode. Are these things possible to do? If they are, is it
       | feasible to reverse engineer them?
        
         | jfkebwjsbx wrote:
         | Theoretically possible, practically impossible.
         | 
         | For two reasons: the equipment needed (extremely expensive) and
         | the complexity of the task (transistors are placed by software,
         | not humans anymore).
        
           | saagarjha wrote:
           | What kind of complexity are we looking at, roughly? Surely
           | someone with deep pockets and the necessary expertise would
           | be interested in trying to find these kinds of things, no?
        
             | jfkebwjsbx wrote:
             | I am no EE, but even if you had the tools it is harder than
             | reverse engineering any software.
             | 
             | At least with software you can go step by step and inspect
             | the memory at the very least.
             | 
             | With hardware you would have to take into account the
             | entire chip state and routing.
        
             | alxlaz wrote:
             | A modern CPU has a transistor count in the billions/low
             | tens of billions. I haven't really thought about it but I'm
             | tempted to say that looking at the decoder stage(s) alone
             | won't do. Undocumented operation doesn't have to be in the
             | form of an entire undocumented instruction. You could
             | design the device so that the "right thing" would happen
             | simply by scheduling the right instruction, with the right
             | arguments, under the right conditions (the right execution
             | unit, the right amount of pipeline clog etc.). The whole
             | thing is significantly more complex than the "fetch-decode-
             | execute" diagrams would have you believe -- execution isn't
             | strictly sequential, executing exactly the same instruction
             | won't cause the exact same transistors to "fire" each time
              | etc.
             | 
             | So the level of complexity is pretty daunting. IMHO if you
             | want to find out undocumented behaviour that was
             | deliberately introduced, you're better off looking at other
             | methods, no matter how deep your pockets are.
        
               | dmitrygr wrote:
               | > A modern CPU has a transistor count in the billions/low
               | tens of billions
               | 
               | A large percentage of that is simply 6T/cell SRAM
               | L1/L2/etc cache though
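
To put the 6T-SRAM point in rough numbers, here is a back-of-the-envelope sketch. The 32 MiB cache size is an illustrative assumption, not a figure for any particular CPU, and the count covers only the data cells (one 6-transistor cell per bit), ignoring tags, ECC, and peripheral logic.

```python
def sram_data_transistors(cache_bytes: int, transistors_per_cell: int = 6) -> int:
    """Transistors in the data array alone: one SRAM cell per stored bit."""
    return cache_bytes * 8 * transistors_per_cell


# Hypothetical total cache size, chosen only for illustration.
cache_bytes = 32 * 1024 * 1024  # 32 MiB
count = sram_data_transistors(cache_bytes)
print(f"{count:,} transistors (~{count / 1e9:.2f} billion)")
# prints "1,610,612,736 transistors (~1.61 billion)"
```

Under these assumptions the cache data array alone accounts for well over a billion transistors, which is consistent with the claim that a large share of the headline count is regular memory rather than logic.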
        
               | alxlaz wrote:
               | Certainly. But even those can be routed so that the WR
               | signal for a particular address also doubles as half of
               | the AND input which causes a read from another range to
               | always return zero, for example. (That's not an
               | "undocumented opcode", of course, but it can be used
               | maliciously). It's certainly not easy to do this kind of
               | meaningful obfuscation though, especially between
               | different blocks, since different blocks are usually in
               | different clock domains, too.
               | 
               | Edit: sorry, my neurons got all jumbled and I was
               | thinking of a far more general case, i.e. undocumented
               | behaviour in general, not just undocumented instructions.
               | Indeed, only a relatively small subset of all these
               | transistors is relevant in terms of undocumented
               | instructions specifically.
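
The hidden-coupling idea in the comment above can be illustrated with a toy software model. This is only a sketch: the class name, the trigger address, and the masked range are all hypothetical, and a latch stands in for the unspecified "other half" of the AND input. It does not describe any real hardware.

```python
class ToyMemory:
    """Toy byte-addressable memory with an undocumented side effect."""

    TRIGGER_ADDR = 0x10             # hypothetical address whose write arms the mask
    MASKED_RANGE = range(0x80, 0x90)  # hypothetical range that then reads as zero

    def __init__(self, size: int = 256) -> None:
        self.cells = bytearray(size)
        self.mask_armed = False

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFF
        # The write strobe for one address doubles as a hidden control signal.
        if addr == self.TRIGGER_ADDR:
            self.mask_armed = True

    def read(self, addr: int) -> int:
        # Once armed, reads from the masked range always return zero,
        # even though the stored data is still there.
        if self.mask_armed and addr in self.MASKED_RANGE:
            return 0
        return self.cells[addr]


mem = ToyMemory()
mem.write(0x80, 0xAB)
print(hex(mem.read(0x80)))  # data is visible before the trigger
mem.write(ToyMemory.TRIGGER_ADDR, 0x01)
print(hex(mem.read(0x80)))  # same address now reads as zero
```

Nothing in the visible interface (`read` and `write`) hints at the coupling, which is why this kind of behaviour is hard to find by looking at instruction decoding alone.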
        
           | userbinator wrote:
           | _transistors are placed by software, not humans anymore_
           | 
           | If anything I'd say that makes it even _easier_ to reverse-
           | engineer, since the layout is far more regular. There are
            | some public tools to do this already - here's one that
           | immediately comes to mind: https://degate.org/
        
         | cat_easdon wrote:
         | A team from Ruhr University Bochum reverse-engineered the
         | microcode for the AMD K8 and K10 to implement custom microcode
          | programs; they describe how they reverse-engineered the ROM
         | here: https://www.syssec.ruhr-uni-
         | bochum.de/media/emma/veroeffentl.... The problem with reverse-
         | engineering on newer CPUs is that on both Intel and AMD the
         | updates are now protected with cryptographic authentication, so
         | you can't run arbitrary custom microcode to aid in
         | understanding what it does. And as others have mentioned, the
         | hardware complexity and small feature sizes make reverse-
         | engineering the microcode engine or ROM by physical inspection
         | much harder than on the K8. I expect it will be achieved at
         | some point, though.
        
           | jfkebwjsbx wrote:
           | Even if you manage to understand the microcode, there could
           | be custom behavior in the hardware logic, right?
           | 
           | So I would guess studying the microcode does not lead to
           | proving there is nothing else going on.
           | 
           | I am no EE, though.
        
             | cat_easdon wrote:
             | Yeah, that's right - you could prove the absence of
             | malicious microcode, but not the absence of hardware
             | trojans, implants, etc. There are also the embedded
             | microcontrollers to deal with (e.g. the Management Engine,
             | Innovation Engine, and Power Management Controller on
             | Intel).
        
         | CodeArtisan wrote:
         | Yes it is, see https://arstechnica.com/gaming/2017/07/mame-
         | devs-are-crackin... or http://www.visual6502.org/
         | 
          | but upcoming CPUs will have vertically stacked circuitry, making
          | reverse engineering much harder (impossible?).
         | 
         | https://en.wikipedia.org/wiki/Three-dimensional_integrated_c...
         | 
         | https://en.wikichip.org/wiki/intel/foveros
        
           | yjftsjthsd-h wrote:
           | Why would stacking layers make it impossible? The process
           | will be destructive, but I would expect it to _work_.
        
             | plutonorm wrote:
             | How would you ensure you only etched away the layers you
             | were interested in?
        
               | userbinator wrote:
               | It's already possible to delayer pretty accurately ---
               | see the visual6502 project linked above.
        
       | userbinator wrote:
       | I believe the first "page" of opcodes (i.e. 1-byte opcodes, the
       | ones that don't start with 0F) has already been extensively
       | researched and documented, at least in 16 and 32-bit mode; the
       | interesting things are all in the "second page", the ones that
       | begin with 0F and are relatively new instructions, and the
       | awkward and somewhat inconsistent way in which 64-bit mode was
       | implemented.
       | 
       | Also, the fact that they're trying to test undocumented behaviour
       | from within a full OS was a bit unexpected; in the retrocomputing
       | community, where CPUs like the Z80 and 6502 have been studied
       | extensively, the usual way of testing undocumented behaviour is
        | to boot into a very minimal environment whose only purpose is
        | to test that behaviour, so as to eliminate any other variables from
       | the process. Logic analysers/bus monitoring are also used
       | sometimes, although that might be harder with a modern high-speed
       | CPU.
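
The "pages" of opcodes mentioned in this comment correspond to the x86 opcode maps, selected by escape bytes. The sketch below classifies an instruction by its opcode map. It assumes any legacy or REX prefixes have already been stripped and it ignores VEX/EVEX encodings; the function name is hypothetical.

```python
def opcode_map(code: bytes) -> str:
    """Name the opcode map used by an instruction (prefixes already removed)."""
    if not code:
        raise ValueError("empty instruction")
    if code[0] != 0x0F:
        return "one-byte"              # the "first page"
    if len(code) < 2:
        raise ValueError("truncated: 0F escape with no opcode byte")
    if code[1] == 0x38:
        return "three-byte (0F 38)"
    if code[1] == 0x3A:
        return "three-byte (0F 3A)"
    return "two-byte (0F)"             # the "second page"


print(opcode_map(bytes([0x90])))              # NOP
print(opcode_map(bytes([0x0F, 0x0D, 0x10])))  # 0f 0d /2 with a memory operand
```

A search tool built on this split could spend most of its effort on the 0F-escaped maps, which is where the comment suggests the interesting cases are.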
        
         | s_gourichon wrote:
         | "High speed" shouldn't be a concern, should it? By adjusting
         | the clocks I believe you can run the CPU as slow as you wish.
         | 
          | Complexity and the ratio of visible behavior to unobservable
          | state are astronomically worse than for an 8-bit CPU, and are
          | therefore still a concern.
        
       | kchoudhu wrote:
       | Related talk about this from Blackhat a few years ago:
       | 
       | https://www.youtube.com/watch?v=KrksBdWcZgQ
        
         | anon73044 wrote:
         | Unfortunately Chris works for Intel now so I don't think he'll
         | be giving any more of these talks in the future. (At least
         | until his NDA expires)
        
       | guerrilla wrote:
        | Will a video of this be uploaded?
        
         | cat_easdon wrote:
         | Sorry - the presentation was never recorded. There's more
         | information in the GitHub repo, however:
         | https://github.com/cattius/opcodetester.
        
       | s_gourichon wrote:
       | Just thinking aloud (not the only one, obviously).
       | 
        | So is this the combined result of market mechanics? With Intel
        | as the leader, was their top priority to release the fastest
        | chips at all costs, leaving security/simplicity/sustainability
        | behind? On top of that, complexity became another barrier to
        | competitors. This feels insane, unsustainable.
       | 
       | Whole parts of the industry have already switched to alternative
        | architectures. MIPS was prevalent in set-top boxes, then was
        | replaced by ARM in the 2010s. ARM reigns on most mobile
        | devices. RISC-V is on the rise.
       | 
        | Areas craving performance without concern about power
       | consumption or security still run on Intel. For how long?
       | 
        | Supercomputers get 10x more power from GPUs than CPUs; a switch
        | to an alternative may come.
       | 
       | Could we imagine the gamer market switching to nVidia on
       | ARM/RISC?
       | 
        | Is the Intel architecture a huge sinking ship?
        
       | floatingatoll wrote:
       | The final thesis behind this slide deck is alongside it in the
       | repo: https://github.com/cattius/opcodetester/
        
       | smitty1e wrote:
       | Undocumented for you doesn't remove the possibility that someone,
       | somewhere, has a firm grasp of what that opcode does, and why.
        
         | cat_easdon wrote:
         | Author here - the aim of this project was to explore exactly
         | why such opcodes are problematic for security. Even if they're
         | implemented with entirely innocent intentions - e.g. for
         | debug+verification purposes - they can lead to vulnerabilities
         | in operating systems, emulators, and hypervisors. They induce
         | edge cases which developers can't protect against if they don't
         | know they exist in the first place (due to the lack of any
         | public documentation). There's a more thorough writeup of the
         | project here: https://github.com/cattius/opcodetester/blob/mast
         | er/thesis.p....
        
           | smitty1e wrote:
           | I recall some years ago seeing a post on the OpenBSD mailing
           | list about Intel chip errata and thinking: "I love Big
           | Brother, and Big Brother loves loving me."
        
           | Craighead wrote:
           | Where's the AMD report?
        
       | kken wrote:
       | It would actually be interesting to see examples of actual
        | undocumented opcodes. There are none in the linked article.
        
         | guidedlight wrote:
         | But then they would be documented.
         | 
         | Think McFly!
        
           | kken wrote:
           | Think more! One would guess that the search for undocumented
           | opcodes yields results...?
        
             | cat_easdon wrote:
              | Author here - the main reason there are no examples is
              | that I didn't have any interesting ones to report at the
              | time! I was trying to develop new detection methods, but
              | found only the (thousands of) undocumented software
             | prefetches which were previously reported by Domas in his
             | Sandsifter project, e.g. 0f 0d /2 and /3-7 on Intel CPUs
             | (these are documented by AMD, but not Intel, and opcode
             | behavior varies more often between the two than you'd
             | expect). Many of the interesting undocumented x86 opcodes
             | (e.g. icebp, salc, loadall) were either only present in
             | older CPUs or are now at least partially documented. There
             | are some much more interesting undocumented opcodes on
             | other architectures (which have architectural effects, e.g.
             | changing register values, halting the CPU), but that's
             | still an ongoing project.
             | 
             | Edit: 0f 0d /2 is documented as prefetchwt1 but (allegedly)
             | unsupported by the CPUs I tested it on, so the fact it
             | executes at all is undocumented.
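
The "/2" and "/3-7" notation in this comment refers to the 3-bit reg field of the ModRM byte that follows the opcode, used here as an opcode extension. A minimal sketch of how that field is extracted, and of which ModRM bytes select a given "/digit", follows. The helper names are hypothetical and are not taken from the opcodetester or Sandsifter code.

```python
from typing import List, NamedTuple


class ModRM(NamedTuple):
    mod: int  # bits 7-6: addressing mode
    reg: int  # bits 5-3: register or opcode extension (the "/digit")
    rm: int   # bits 2-0: register or memory operand selector


def decode_modrm(byte: int) -> ModRM:
    """Split a ModRM byte into its three fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModRM must be a single byte")
    return ModRM(mod=byte >> 6, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


def modrm_bytes_for_digit(digit: int) -> List[int]:
    """All 32 ModRM byte values whose reg field equals the given /digit."""
    if not 0 <= digit <= 7:
        raise ValueError("digit must be in 0..7")
    return [(mod << 6) | (digit << 3) | rm for mod in range(4) for rm in range(8)]


# For the byte sequence 0f 0d 10, the ModRM byte is 0x10.
print(decode_modrm(0x10))  # ModRM(mod=0, reg=2, rm=0), i.e. 0f 0d /2
print([hex(b) for b in modrm_bytes_for_digit(2)][:4])
```

Because each "/digit" covers 32 ModRM values before displacement and prefix bytes are even considered, a single undocumented extension can show up as many distinct byte sequences in a search.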
        
       ___________________________________________________________________
       (page generated 2020-03-08 23:00 UTC)