[HN Gopher] Undocumented CPU behavior: analyzing undocumented op...
___________________________________________________________________
Undocumented CPU behavior: analyzing undocumented opcodes on Intel x86-64 (2018) [pdf]

Author : luu
Score  : 170 points
Date   : 2020-03-08 10:16 UTC (12 hours ago)

(HTM) web link (www.cattius.com)
(TXT) w3m dump (www.cattius.com)

| stopads wrote:
| When I learned to program (before 9/11) there was a big emphasis on assembly language and using low level interfaces to communicate with other hardware. The idea was that everyone studying computer science should understand every aspect of the CPU down to the register and operation level, and then be able to design logic gates to replicate that functionality if needed.
|
| Now we have CPUs that are fundamentally undocumented, unknowable, and untranslatable. The entire infrastructure of the network, the telecoms, and the CPU design itself has all been subverted to the needs of the national security complex or corporate advertising.
|
| I'm not sure what computer science even means anymore. Everything I learned is completely useless.
| dfox wrote:
| The set of ~8 undocumented but well-known i386 instructions predates not only 9/11 but even the 486.
| tmotwu wrote:
| x86 was designed long before 9/11, so "now" is no different from the past. We all know the reputation x86 has for poor documentation, and poor design in general. Claiming that older ISAs were better documented and easier to understand will not get you very far.
|
| Most if not all top computer science / computer engineering programs in the States teach digital logic design, x86 / x86-64, computer architecture, compilers, and communication networks in very fundamental detail as required courses. The emphasis is still there.
| leggomylibro wrote:
| I'm with you up to the last paragraph.
|
| It's not useless.
| FPGAs have plummeted in cost and there are now open-source toolchains for some of them. There's also a Free commercial-grade ISA that you can use in your personal designs (RISC-V). These days, it is not expensive to design your own well-understood computer which can run machine code generated by commercial-grade compilation toolchains such as GCC. Even hardware production is getting cheaper with shared wafer runs like MOSIS, although custom silicon is still out of reach for hobbyists.
|
| Chin up, buddy. The US is not the entire world, and the pendulum of our generation's zeitgeist can still swing back towards the ideals of liberty and equality of access which the mavens of computing once stood for. You can already buy ARM application processors from vendors other than Intel/AMD, and I would be surprised if we lived in a world where every new computer comes with "management engine" spyware in its CPU for much longer.
| zadokshi wrote:
| I love your optimism, but I'm not sure I can see a path towards the public voting for a government that would make the necessary adjustments to rein in the ability of government powers to influence "management engine" code.
| h2odragon wrote:
| https://en.wikipedia.org/wiki/LOADALL
|
| You could almost emulate an MMU on a 286
| Jahak wrote:
| nice document
| saagarjha wrote:
| Associated GitHub repository with more information:
| https://github.com/cattius/opcodetester
|
| I know next to nothing about processors at this level, but I wonder if it would be possible for a skilled engineer to try to find these instructions by scrutinizing the actual physical instruction decoder on the chip and/or inspecting the processor's microcode. Are these things possible to do? If they are, is it feasible to reverse engineer them?
| jfkebwjsbx wrote:
| Theoretically possible, practically impossible.
|
| For two reasons: the equipment needed (extremely expensive) and the complexity of the task (transistors are placed by software, not humans anymore).
| saagarjha wrote:
| What kind of complexity are we looking at, roughly? Surely someone with deep pockets and the necessary expertise would be interested in trying to find these kinds of things, no?
| jfkebwjsbx wrote:
| I am no EE, but even if you had the tools it is harder than reverse engineering any software.
|
| With software you can at least go step by step and inspect the memory.
|
| With hardware you would have to take into account the entire chip state and routing.
| alxlaz wrote:
| A modern CPU has a transistor count in the billions/low tens of billions. I haven't really thought about it but I'm tempted to say that looking at the decoder stage(s) alone won't do. Undocumented operation doesn't have to be in the form of an entire undocumented instruction. You could design the device so that the "right thing" would happen simply by scheduling the right instruction, with the right arguments, under the right conditions (the right execution unit, the right amount of pipeline clog etc.). The whole thing is significantly more complex than the "fetch-decode-execute" diagrams would have you believe -- execution isn't strictly sequential, executing exactly the same instruction won't cause the exact same transistors to "fire" each time, etc.
|
| So the level of complexity is pretty daunting. IMHO if you want to find undocumented behaviour that was deliberately introduced, you're better off looking at other methods, no matter how deep your pockets are.
| dmitrygr wrote:
| > A modern CPU has a transistor count in the billions/low tens of billions
|
| A large percentage of that is simply 6T/cell SRAM L1/L2/etc cache though
| alxlaz wrote:
| Certainly.
| But even those can be routed so that the WR signal for a particular address also doubles as half of the AND input which causes a read from another range to always return zero, for example. (That's not an "undocumented opcode", of course, but it can be used maliciously). It's certainly not easy to do this kind of meaningful obfuscation though, especially between different blocks, since different blocks are usually in different clock domains, too.
|
| Edit: sorry, my neurons got all jumbled and I was thinking of a far more general case, i.e. undocumented behaviour in general, not just undocumented instructions. Indeed, only a relatively small subset of all these transistors is relevant in terms of undocumented instructions specifically.
| userbinator wrote:
| _transistors are placed by software, not humans anymore_
|
| If anything I'd say that makes it even _easier_ to reverse-engineer, since the layout is far more regular. There are some public tools to do this already - here's one that immediately comes to mind: https://degate.org/
| [deleted]
| cat_easdon wrote:
| A team from Ruhr University Bochum reverse-engineered the microcode for the AMD K8 and K10 to implement custom microcode programs. They describe how they reverse-engineered the ROM here: https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentl.... The problem with reverse-engineering on newer CPUs is that on both Intel and AMD the updates are now protected with cryptographic authentication, so you can't run arbitrary custom microcode to aid in understanding what it does. And as others have mentioned, the hardware complexity and small feature sizes make reverse-engineering the microcode engine or ROM by physical inspection much harder than on the K8. I expect it will be achieved at some point, though.
| jfkebwjsbx wrote:
| Even if you manage to understand the microcode, there could be custom behavior in the hardware logic, right?
|
| So I would guess that studying the microcode cannot prove there is nothing else going on.
|
| I am no EE, though.
| cat_easdon wrote:
| Yeah, that's right - you could prove the absence of malicious microcode, but not the absence of hardware trojans, implants, etc. There are also the embedded microcontrollers to deal with (e.g. the Management Engine, Innovation Engine, and Power Management Controller on Intel).
| CodeArtisan wrote:
| Yes it is, see https://arstechnica.com/gaming/2017/07/mame-devs-are-crackin... or http://www.visual6502.org/
|
| but next CPUs will have vertically stacked circuitry, making reverse engineering much harder (impossible?).
|
| https://en.wikipedia.org/wiki/Three-dimensional_integrated_c...
|
| https://en.wikichip.org/wiki/intel/foveros
| yjftsjthsd-h wrote:
| Why would stacking layers make it impossible? The process will be destructive, but I would expect it to _work_.
| plutonorm wrote:
| How would you ensure you only etched away the layers you were interested in?
| userbinator wrote:
| It's already possible to delayer pretty accurately --- see the visual6502 project linked above.
| [deleted]
| userbinator wrote:
| I believe the first "page" of opcodes (i.e. 1-byte opcodes, the ones that don't start with 0F) has already been extensively researched and documented, at least in 16- and 32-bit mode; the interesting things are all in the "second page", the ones that begin with 0F and are relatively new instructions, and the awkward and somewhat inconsistent way in which 64-bit mode was implemented.
|
| Also, the fact that they're trying to test undocumented behaviour from within a full OS was a bit unexpected; in the retrocomputing community, where CPUs like the Z80 and 6502 have been studied extensively, the usual way of testing undocumented behaviour is to boot into a very minimal environment whose only purpose is to test that behaviour, so as to eliminate any other variables from the process. Logic analysers/bus monitoring are also used sometimes, although that might be harder with a modern high-speed CPU.
| s_gourichon wrote:
| "High speed" shouldn't be a concern, should it? By adjusting the clocks I believe you can run the CPU as slow as you wish.
|
| Complexity, and the ratio of visible behavior to unobservable state, are astronomically worse than for an 8-bit CPU, so those remain a concern.
| kchoudhu wrote:
| Related talk about this from Blackhat a few years ago:
|
| https://www.youtube.com/watch?v=KrksBdWcZgQ
| anon73044 wrote:
| Unfortunately Chris works for Intel now so I don't think he'll be giving any more of these talks in the future. (At least until his NDA expires)
| guerrilla wrote:
| Will a video of this be uploaded?
| cat_easdon wrote:
| Sorry - the presentation was never recorded. There's more information in the GitHub repo, however: https://github.com/cattius/opcodetester.
| s_gourichon wrote:
| Just thinking aloud (not the only one, obviously).
|
| So is this the combined result of market mechanics? With Intel as the market leader, was its top priority to release the fastest chips at all costs, leaving security/simplicity/sustainability behind? On top of that, complexity became another barrier to competitors. This feels insane, unsustainable.
|
| Whole parts of the industry have already switched to alternative architectures. MIPS was prevalent in set-top boxes, then replaced with ARM in the 2010s. ARM reigns on most mobile devices. RISC-V is on the rise.
|
| Areas craving performance without concern about power consumption or security still run on Intel. For how long?
|
| Supercomputers get 10x more power from GPUs than from CPUs, so a switch to an alternative may come.
|
| Could we imagine the gamer market switching to nVidia on ARM/RISC?
|
| Is the Intel architecture a huge sinking ship?
| floatingatoll wrote:
| The final thesis behind this slide deck is alongside it in the repo: https://github.com/cattius/opcodetester/
| smitty1e wrote:
| Undocumented for you doesn't remove the possibility that someone, somewhere, has a firm grasp of what that opcode does, and why.
| cat_easdon wrote:
| Author here - the aim of this project was to explore exactly why such opcodes are problematic for security. Even if they're implemented with entirely innocent intentions - e.g. for debug+verification purposes - they can lead to vulnerabilities in operating systems, emulators, and hypervisors. They induce edge cases which developers can't protect against if they don't know they exist in the first place (due to the lack of any public documentation). There's a more thorough writeup of the project here: https://github.com/cattius/opcodetester/blob/master/thesis.p....
| smitty1e wrote:
| I recall some years ago seeing a post on the OpenBSD mailing list about Intel chip errata and thinking: "I love Big Brother, and Big Brother loves loving me."
| Craighead wrote:
| Where's the AMD report?
| [deleted]
| kken wrote:
| It would actually be interesting to see examples of actual undocumented opcodes. There are none in the linked article.
| guidedlight wrote:
| But then they would be documented.
|
| Think McFly!
| kken wrote:
| Think more! One would guess that the search for undocumented opcodes yields results...?
| cat_easdon wrote:
| Author here - the main reason there are no examples is that I didn't have any interesting ones to report at the time!
| I was trying to develop new detection methods, but found only the (thousands of) undocumented software prefetches which were previously reported by Domas in his Sandsifter project, e.g. 0f 0d /2 and /3-7 on Intel CPUs (these are documented by AMD, but not Intel, and opcode behavior varies more often between the two than you'd expect). Many of the interesting undocumented x86 opcodes (e.g. icebp, salc, loadall) were either only present in older CPUs or are now at least partially documented. There are some much more interesting undocumented opcodes on other architectures (which have architectural effects, e.g. changing register values, halting the CPU), but that's still an ongoing project.
|
| Edit: 0f 0d /2 is documented as prefetchwt1 but (allegedly) unsupported by the CPUs I tested it on, so the fact it executes at all is undocumented.
___________________________________________________________________
(page generated 2020-03-08 23:00 UTC)
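
For readers unfamiliar with the "0f 0d /2" notation used in the last comment, here is a minimal sketch of how it maps to bytes. In x86 opcode tables, "/digit" means the 3-bit reg field of the ModRM byte acts as an opcode extension. The function names below are hypothetical, and the choice of mod=00 with rm=000 (a plain register-indirect memory operand) is an assumption made only to keep the example short. The code builds byte strings and never executes them; it is not taken from the opcodetester repository.

```python
def modrm_for_digit(digit: int, mod: int = 0b00, rm: int = 0b000) -> int:
    """Build a ModRM byte whose reg field carries the /digit opcode extension.

    ModRM layout: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).
    """
    if not (0 <= digit <= 7 and 0 <= mod <= 3 and 0 <= rm <= 7):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (digit << 3) | rm


def digit_of(modrm: int) -> int:
    """Recover the /digit opcode extension from a ModRM byte."""
    return (modrm >> 3) & 0b111


def encode_0f0d(digit: int) -> bytes:
    """Encode the two-byte opcode 0f 0d followed by a ModRM byte for /digit."""
    return bytes([0x0F, 0x0D, modrm_for_digit(digit)])


if __name__ == "__main__":
    # /2 through /7 are the extensions mentioned in the comment above.
    for d in range(2, 8):
        print(f"0f 0d /{d} -> {encode_0f0d(d).hex(' ')}")
```

With these assumptions, /2 encodes as the bytes 0f 0d 10. Other addressing forms (different mod and rm values, SIB bytes, displacements) produce different encodings of the same /digit, which is one reason a single table entry corresponds to many byte sequences.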