[HN Gopher] Hiding messages in x86 binaries using semantic duals ___________________________________________________________________ Hiding messages in x86 binaries using semantic duals Author : todsacerdoti Score : 138 points Date : 2020-08-16 14:40 UTC (8 hours ago) (HTM) web link (blog.yossarian.net) (TXT) w3m dump (blog.yossarian.net) | praalhans wrote: | About ten years ago, the same question was raised on | StackOverflow: | https://stackoverflow.com/questions/2760794/x86-cmp-instruct... | | A comment was made: "These 1-bit degrees of freedom also provide | a covert channel for compilers to "phone home" - they can | "watermark" the binaries they produce, and the compiler vendor | can ask you to please explain if they find your software with | their watermark, but with no license on file." - Bernd Jendrissek | steamraven wrote: | This would provide an interesting way of signing a binary. Encode | using all 0s, then sign, then encode the signature | crb002 wrote: | You could do stenography with ordering of function/data sections | VS an expected permutation. | 082349872349872 wrote: | An old test suite (Plum Hall?) used to warn it was generated | uniquely per-client, with steganographic watermarking at the C | source level. | tom_ wrote: | A86 reportedly did something similar: | https://en.wikipedia.org/wiki/A86_(software) | jlokier wrote: | Yes, I remember A86 documentation saying it did that to every | program it assembled, to watermark that A86 was used to | assemble it. | | In the 80s. Blast from the past! | mikorym wrote: | I think it is better to call this a noncommutative operation and | phrase it as such. Duals _are_ often in fact, different things. | | But I think what you are doing is pretty cool! I don't know much | about machine code, but whenever XOR becomes noncommutative, for | whatever reason, but _executes_ commutatively, then this should | become possible. | | Of course, by the way, XOR also has the vanilla ability to reveal | information through: plaintext XOR cypher = cyphertext and then | you do cyphertext XOR cypher to get back plaintext. In this case, | XOR is commutative, as we are only looking at execution, and | moreover each binary string is it's own inverse: thing XOR thing | = 0-string. So, you can think of XOR as reversible sequences of | encypherment. | saagarjha wrote: | I'm not sure what xor's commutativity has to do with this? | BeeOnRope wrote: | It might be worth noting that _strictly speaking_ this isn 't | safe to apply to an arbitrary binary. That is 31 c0 and 33 c0 | might behave identically when executed as xor, but they could, | for example, also be part of a larger instruction where the swap | results in a change in behavior. | | Presumably this implementation disassembles the binary which | gives a very high probability of determining whether these bytes | are _only_ executed as xor, but this isn 't in general enough | since the way the program is executed at runtime may not | correspond to disassembly [0]. | | Other patterns that might trip this up are programs that re-use | some program bytes as constants (e.g., if you needed the 16-bit | constant 0xc031 you could just point to this existing pattern in | the instruction stream [1]) or which otherwise examine or modify | their instruction bytes at runtime. | | Now of course this caveat doesn't apply to almost any "vanilla | program" compiled by a normal compiler. It's only likely to come | up in hand-written assembly or as a result of some tool or | process that purposes does this weird stuff (e.g., as an anti- | debugging measure). 99.99% of the time these swaps will work out | fine. | | --- | | [0] Further, you can't even necessarily unambiguously assign a | byte to a particular instruction, even based on the dynamic | behavior, since a given byte might be used in two _different | instructions_ based on an earlier jump which result in different | parsed boundaries. This is _really really_ unusual and usually in | the realm of demos or anti-debugging techniques, etc. | | [1] This is not actually a good idea for performance because | modern CPUs hate it when you use the same bytes for both code and | data. | ornxka wrote: | >It might be worth noting that strictly speaking this isn't | safe to apply to an arbitrary binary. That is 31 c0 and 33 c0 | might behave identically when executed as xor, but they could, | for example, also be part of a larger instruction where the | swap results in a change in behavior. | | Technically speaking, you can't apply any transformation to | arbitrary binaries, because then the encoding changes and the | code can look back at itself. This actually features heavily in | a recent Hacker News article | https://news.ycombinator.com/item?id=23557998 about reverse | engineering the Snapchat app, where the code regularly took | checksums of itself to frustrate debuggers and take different | codepaths if tampering (such as to insert breakpoints) was | detected. | | Assembly is kind of, in a sense, the ultimate homoiconic | programming language. | woodruffw wrote: | > It might be worth noting that strictly speaking this isn't | safe to apply to an arbitrary binary. That is 31 c0 and 33 c0 | might behave identically when executed as xor, but they could, | for example, also be part of a larger instruction where the | swap results in a change in behavior. | | > Presumably this implementation disassembles the binary which | gives a very high probability of determining whether these | bytes are only executed as xor, but this isn't in general | enough since the way the program is executed at runtime may not | correspond to disassembly [0]. | | Yep: steg86 uses iced[1] internally to decode and re-encode | instruction sequences. Arbitrarily replacing `31 C0` with `33 | C0` (or vice versa) in the instruction text would certainly not | go well. | | And yes: it's possible to contrive pathological programs that | make these kinds of patches impossible to perform safely. But | those fall under the "what did you expect" support category, at | least in terms of the current implementation. A compiler-based | implementation would certainly have an easier time. | | [1]: https://github.com/0xd4d/iced | BeeOnRope wrote: | Yes, I think the most practical scenarios this would fail in | the real world would be: | | 1) Signed binaries or other similar concepts such as "anti- | cheat" stuff that tries to detect binary modification. | | 2) Code that actually does do really weird stuff in order to | fool static disassembly, e.g., obfuscation which is not | uncommon in games and some other binary types. | | Of course, if you created the binary yourself, you'd be aware | of all these gotchas, so this would only come as a surprise | if you were applying it as a "third party" to some arbitrary | binary. | weinzierl wrote: | I remember that one assembler claimed to do something similar to | sort of watermark the binaries. From what I remember it did it | only in the otherwise fully functional unregistered version to | encourage people to pay for it. From my memory it was NASM but I | cannot find anything about it ever being paid for. Does anyone | remember a shareware assembler from mid nineties that did this? | | EDIT: Another comment already mentioned A86. Probably that was | the one. | jlmcgraw wrote: | :You're thinking of A86: | | https://en.m.wikipedia.org/wiki/A86_(software) | userbinator wrote: | Decades ago, I remember figuring out quite a bit of that, and | here are my original notes from that, including tables that | show the pattern clearly: MOV rA, rB: | B: AL AH BL BH CL CH DL DH A: 1 1 0 0 1 1 | 0 0 AL 1 ** ** 8A 8A ** ** 8A 8A AH 1 | ** ** 8A 8A ** ** 8A 8A BL 0 8A 8A ** ** 8A 8A ** | ** BH 0 8A 8A ** ** 8A 8A ** ** CL 1 ** | ** 8A 8A ** ** 8A 8A CH 1 ** ** 8A 8A ** ** 8A 8A | DL 0 8A 8A ** ** 8A 8A ** ** DH 0 8A 8A ** ** | 8A 8A ** ** * = "reversed" opcode (88) | MOV rA, rB ( word regs ): B: AX BX CX DX SP BP | SI DI A: 1 0 1 0 1 1 0 0 AX 1 | ** 8B ** 8B ** ** 8B 8B BX 0 8B ** 8B ** 8B 8B ** | ** CX 1 ** 8B ** 8B ** ** 8B 8B DX 0 8B | ** 8B ** 8B 8B ** ** SP 1 ** 8B ** 8B ** ** 8B 8B | BP 1 ** 8B ** 8B ** ** 8B 8B SI 0 8B ** 8B ** | 8B 8B ** ** DI 0 8B ** 8B ** 8B 8B ** ** | * = "reversed" opcode (89) The above tables | apply for the following two-operand instructions: | ADD OR ADC SBB | AND SUB XOR CMP | For TEST and XCHG, which are commutative, A86 always puts the | first operand in the r/m field if possible, while | MASM puts it in the reg field if the first operand is | a register. | weinzierl wrote: | It's been so long that I read that remark in the manual and | I'm glad I wasn't just imagining it;-) | astrobe_ wrote: | I can confirm. The author was an Intel engineer who was | dissatisfied with how his senior engineers did the DOS EXE | format in the intro of the docs of his assembler. | | A86/D86 where fantastic, with extra bonus points for the | docs. | josephcsible wrote: | This seems like it'd be almost comically easy to detect, because | normal compilers and assemblers would consistently use either 31 | C0 or 33 C0 all the time, rather than bouncing back and forth | between them within a single program. | woodruffw wrote: | Author here: it is indeed easy to detect. I didn't base my work | off of Hydan but it uses a similar strategy, and there are | several[1][2] public steganalyses of it. | | Edit: I previously claimed that this method might be easy to | deny. It isn't, and I've added a note to the post as well. | | [1]: https://cosec.inf.uc3m.es/~juan- | tapiador/papers/2009sec.pdf | | [2]: https://www.sans.org/reading- | room/whitepapers/stenganography... | jaclaz wrote: | I am not sure to understand how the program can make a | distinction between 33 C0 (because it is written as 33 C0) | and 33 C0 (because it was written as 31 C0 and was later | changed by steg86 to 33 C0)? | | Or, seen the other way, maybe steg86 can "extract" (from | already existing untouched binaries) secret messages that | were never intentionally written? | woodruffw wrote: | This is a really good question! | | steg86 currently embeds a 32-bit header for itself. You can | see the relevant constants here[1]. If the header doesn't | validate during extraction, steg86 fails instead of | extracting potential garbage. | | [1]: https://github.com/woodruffw/steg86/blob/master/src/st | eg86/b... | jaclaz wrote: | OK, but how is the 32-bit header itself embedded? | | The header is a good thing for preventing accidental | extraction from binaries not treated with steg86, but | still you need to modify 4 bytes (or 4 instructions) to | embed this header, so - at least theorically - the | possibility of a "collision" or of a false positive seems | relatively high to me. | woodruffw wrote: | > OK, but how is the 32-bit header itself embedded? | | The header is embedded according to the same rules as the | rest of the message: it's treated as a bitstring, and | flippable instructions are flipped appropriately to | encode it. | | > the possibility of a "collision" or of a false positive | seems relatively high to me. | | As others have pointed out, compilers (and assemblers) | tend to stick to a single selection choice for register- | to-register ops. Even in the case where a compiler writer | flips between them randomly, their 32 random choices | would have to align precisely with the expected header. | It's certainly not impossible, but pretty unlikely. It's | also completely remediable with a CRC32 or similar field | tacked onto the header; I just didn't think the | likelihood warranted that for the initial design. | paraboul wrote: | 4 bytes of the encoded header isn't 4 instructions, it's | 32 consecutive instructions that each translate to a bit | depending on their semantic form. | | So there is virtually no chance for a compiler to | generate a valid sequence randomly. | jaclaz wrote: | I see, each changed instruction gives a single bit (not | byte) of information, my bad. | Legogris wrote: | But that's not a fundamental issue of the idea and rather | a quirk in the steg86 implementation, right? | woodruffw wrote: | Which part? The header is specific to steg86, and some of | its constraints (like the maximum message size) are a | product of that. Otherwise, there aren't any quirks in | the implementation that I'm currently aware of. | Legogris wrote: | Are there other semantic duals in the instruction sets? | If so, when combined with cryptography, it would offer | some additional level of plausible deniability over which | of the possible messages (due to parameter selection) is | the cryptotext. | jlokier wrote: | The two most obvious are: | | - Register allocation choices | | - Instruction ordering choices | schoen wrote: | Also conditional branches because you can conceivably do | the comparison, branch, and fall-through in several | different ways. Consider the initial case | cmp ax,bx jge foo bar: ; code that gets | run if ax < bx foo: ; code that gets run if | ax >= bx | | You could change this to cmp bx,ax and then change the | jge to jl, or you could change either one alone and | reverse the order of the bar and foo code blocks, or you | could change both and not reverse the order of the bar | and foo code blocks. (You have fewer choices if you are | doing the comparison as the exit condition for a loop, | except sometimes compilers will put the loop continue | condition as an _unconditional jump_ after bar or foo, in | which case you have even _more_ choices, like whether to | do that or not!) | shawnz wrote: | The same problem is faced when you present basically any | kind of encryption scheme with random data. It would be | easy to fix that kind of problem by adding a checksum to | the secret message. | | Besides, in what scenario would you be decrypting arbitrary | binaries with this, such that it would be a problem for you | to have false positives? Just make sure to use it only on | files which you know contain secret messages. | jaclaz wrote: | >Besides, in what scenario would you be decrypting | arbitrary binaries with this, such that it would be a | problem for you to have false positives? Just make sure | to use it only on files which you know contain secret | messages. | | Well, more or less cryptography/steganography can be used | to either store "secrets" or to communicate them. | | If I used something like this to communicate, I would | probably tell the other part to download (say) a .iso | full of binaries, as opposed to a single binary. | CydeWeys wrote: | It's not easy to deny. Normal compilers flat out don't | produce binaries that look like this, so if a binary looks | like this then you definitely know something fishy is going | on even if you can't decrypt it. | | Contrast with steganography in least significant image bits; | you genuinely cannot determine at all if an image is carrying | hidden encrypted data. That's what easy to deny looks like. | woodruffw wrote: | > It's not easy to deny. Normal compilers flat out don't | produce binaries that look like this, so if a binary looks | like this then you definitely know something fishy is going | on even if you can't decrypt it. | | Yes, you're right. I've updated the post to include a note | at the bottom saying that this technique is difficult to | provide deniability with. | | > Contrast with steganography in least significant image | bits; you genuinely cannot determine at all if an image is | carrying hidden encrypted data. That's what easy to deny | looks like. | | This might be a misunderstanding on my part, but I | _thought_ that LSB-style steganography had been broken by | both statistical analyses and ML models for a while now[1]. | But it 's possible that these methods only work on | plaintext messages; I haven't looked deeply into it. | | [1]: https://github.com/b3dk7/StegExpose | royjacobs wrote: | You are right, in the sense that there are companies out | there (I used to work for one of them) that do image, | video and audio watermarking. In the simplest terms | that's also adding imperceptible "noise" that | statistically builds up to a few bits of data. | mikorym wrote: | I think from the perspective of steganography the question is | also: How often is byte code checked for anything like this? | | Steganography is like a magician's act; it's only undetected | if you don't go and look for it. I think, in principle, if | you are the only one using the tool, and didn't tell anyone, | why would anyone look for it? | ivanbakel wrote: | Which is a point made in the article - the technique is not | meant to be hard to detect a la cryptography, it's meant to be | non-obvious to spot. By exploiting the fact that nobody is | going to look through (or edit) machine instructions for | patterns in register bytes, you can include a message that is | easy to overlook. | JadeNB wrote: | This _particular_ trick is easy to detect _when you 're looking | for it_, but that doesn't mean that it's easy to realise that | you _should_ look for this trick (I 'm not about to make "check | the assembler for strange compiler choices" part of my workflow | for installation of a new binary), or that all possible such | tricks could be so easily caught. | verroq wrote: | Surely there is some way to verify if a build tempered with, | _cough_ file integrity hashes. | jermier wrote: | Cool project. I like to hide data in audio files with Deepsound | https://deepsound.soft112.com/ | rgovostes wrote: | I wrote a blog post on the weak cryptography used by Deepsound: | | https://ryan.govost.es/2018/03/09/deepsound.html | jermier wrote: | That's a great writeup. Is it possible to create a really | long passphrase whose hash can't be reversed easily? Perhaps | a diceware passphrase with six randomly chosen words? | readams wrote: | Just encrypt your data first before giving to DeepSound. | rgovostes wrote: | The difficulty of breaking Deepsound is basically | equivalent to the difficulty of reversing a SHA-1 hash. For | dictionary words and shorter passwords, consider them | broken instantaneously through pre-computed lookup tables. | | For more complex passphrases (and remember, only the first | 32 characters count here), exponential growth probably | works in your favor, even with today's Bitcoin-fueled | hyper-accelerated SHA-1 implementations. | | Even then, the scheme where they use the password directly | as the AES key is flawed. For example, in ASCII, every | octet's most-significant bit is zero, so 32 bits of your | AES key are fixed. I don't know if this enables practical | attacks, but anyone who cares about securing their data | shouldn't rely on amateur cryptography like this. | | Edit: Oh right, and aside from the password aspect, it uses | ECB mode for the encrypted content. That's not good. | Beldin wrote: | For those who are curious about ECB: see the picture & | encrypted picture of Tux on https://en.m.wikipedia.org/wi | ki/Block_cipher_mode_of_operati... | kens wrote: | I've been reverse-engineering the 8086 lately, and I recently | came across the exact register-control circuitry that makes this | steganography possible. | | The steganography takes advantage of x86 instructions where you | can swap the source and destination by replacing an instruction | like hex 31 c0 with 33 c0. The difference is bit 1, the direction | bit, supported by many instructions. | | Internally, the 8086 has 5-bit registers that specify the source | and destination, typically a register. [1] The source and | destination register get their value either from the register | specification in the instruction (bits 5-3 or bits 0-2), or | values in the microcode. | | The clever part is the outputs of the source and destination | registers go through multiplexers that can swap them. If the | instruction has a direction bit, and the direction bit is set, | then accesses to the source and destination registers are | swapped. | | The point of this is that the microcode doesn't know anything | about direction swapping, so it is implemented "for free" as far | as microcode size (but with the addition of the swapping | circuit). | | The 8086 has a "Group PLA" that categorizes instructions into | groups; one of these groups is "instructions that have a | direction bit". This prevents direction swapping from happening | for instructions where it is not supported. | | I hope this explanation makes sense; it should probably be a blog | post :-) | | [1] You might wonder why source and destination are specified | with 5 bits when the instructions use 3 bits to specify the | register. The first reason is that many registers can be accessed | as half-registers, so you need another bit. (This bit comes from | the byte/word specification bit in instructions.) Second, this | mechanism is used to access the other 8086 registers, not just | the general-purpose registers. Third, there are also invisible | temporary registers that also need accessing. Thus, the internal | register specifications are 5 bits. | woodruffw wrote: | This is a fantastic explanation, thank you! I would _love_ a | detailed blog post on it. ___________________________________________________________________ (page generated 2020-08-16 23:00 UTC)