[HN Gopher] Mandatory enforcement of indirect branch targets
___________________________________________________________________
Mandatory enforcement of indirect branch targets

Author : peter_hansteen
Score  : 218 points
Date   : 2023-07-14 12:19 UTC (10 hours ago)

(HTM) web link (undeadly.org)
(TXT) w3m dump (undeadly.org)

| SoftTalker wrote:
| Theo had to get his digs in against Linux in that announcement.
| Why not just focus on what OpenBSD is doing, and maybe contrast
| it to what Linux does without the speculation that they will
| still be doing the same thing in 20 years.
|
| He's unquestionably brilliant, but I've had a few encounters with
| him on the mailing lists and he is _so_ quick to take offense
| where none was meant and drop into name-calling and insults. I
| don't really get it. He may have some deep insecurities.
| VancouverMan wrote:
| That part doesn't look like a "dig" or an insult to me.
|
| It seems like a reasonable, relevant, and plausible assessment
| of how the long-term outcomes may likely differ between
| OpenBSD's stricter approach versus a looser approach,
| specifically when it comes to the degree of security offered
| (which is one of OpenBSD's main focuses), based on a past
| situation that's similar.
|
| How do you know that you aren't being, to use your words,
| "quick to take offense where none was meant" in this case?
| jacquesm wrote:
| > How do you know that you aren't being, to use your words,
| "quick to take offense where none was meant" in this case?
|
| Past knowledge about Theo?
| Ericson2314 wrote:
| Are Theo and Linux more alike than OpenBSD and Linux?
| PrimeMcFly wrote:
| Now, yes. Linus wasn't always so abrasive though. At some
| point he caught up to Theo.
| LexiMax wrote:
| Linus has been trying to calm down in recent years, in
| large part because he decided he no longer wanted to be
| lumped in with the crowd that endlessly complains about
| political correctness.
|
| https://www.bbc.com/news/technology-45664640
| Ericson2314 wrote:
| Yeah this is good stuff, and why I felt bad about making
| the comparison. Not saying Theo is in that camp, but
| Linus is trying to be less abrasive in general, and Theo
| is not.
| NoZebra120vClip wrote:
| Perhaps we're reading into their personalities more than
| we should, based on public social-media appearances.
|
| Egos tend to become exaggerated when benevolent dictator
| types make public statements. Their candor and bluntness
| on a mailing list or Twitter may be completely different
| than their demeanor and their kindness toward
| collaborators in private.
|
| Now we have the very public drama that happened between
| Theo and that "other BSD" team to create the original
| schism. But have we had any subsequent drama that caused
| breakups or forks? I don't know. OpenBSD manages to plug
| away and push releases out the door on schedule, right?
|
| Linus doesn't seem to have a lot of internal contributor
| drama, judging by the way they also push releases out the
| door and merge pull requests and add features.
|
| Really, if either Theo or Linus were unreasonable men,
| their teams would fall apart and they would cease to be
| leaders of anything. I think their leadership abilities
| speak for themselves: they've both been committed and
| dedicated to the same project since decades ago, and
| they've both built and maintained cohesive teams of
| contributors who seem to mostly stick around long enough
| to make a difference.
|
| They are "thought leaders", if you will; perhaps not
| charismatic ones, but canny businessmen who know how to
| nurture their pet projects.
| NoZebra120vClip wrote:
| > Are Theo and Linux more alike than OpenBSD and Linux?
|
| Is a Canadian kernel developer more like a POSIX operating
| system than a POSIX operating system is like a POSIX
| operating system?
|
| I'm not sure I understand.
| Perhaps you meant to write "Linus"
| since Linus is also a kernel developer? That seems more like
| apples to apples.
| redundantly wrote:
| I wouldn't have it any other way. I love the OpenBSD mailing
| lists. Always an entertaining read when Theo gets involved.
| teknopurge wrote:
| upvoted and +1. Theo has been an important leader in OSS for
| decades: his brevity and impatience is a net positive. also
| he is usually correct.
| ris wrote:
| This is my main takeaway too. As a one time OpenBSD enthusiast
| (and still admirer), now I'm a bit older I find the continual
| smugness starts to grate.
|
| Truth is, Linux has a lot more constraints on how it can
| implement something because it has _users_. Users that have all
| sorts of different ways they need it to work.
| microtherion wrote:
| I'd just like to interject for a moment. What you're referring
| to as Linux, is in fact, NotOpenBSD/Linux, or as I've recently
| taken to calling it, Linux as opposed to OpenBSD...
| Joker_vD wrote:
| That's the problem with many brilliant people: what they
| perceive as their interlocutors being deliberately obtuse on
| some completely obvious point is actually their interlocutors
| being just as smart as they always are on some point that is
| not obvious at all to them.
| rkangel wrote:
| Perception of relative intelligence or sensible decision
| making is irrelevant. Just because you think you're doing a
| better job doesn't mean you need to shit on the other person.
|
| You could not mention Linux at all, or you could even say "we
| think this is better than Linux's approach because of X" and
| it would be a great improvement.
|
| I have always found it interesting that Rust purposefully
| avoided doing language comparisons - "we're better than
| Python like this and better than C like that". Their message
| purposefully avoided any positioning of it as a competition,
| instead focusing just on articulating Rust's value.
| It was an
| eye opening approach given our instinct is normally to pit
| things against each other.
| selectodude wrote:
| I think parent agrees with you.
| brynet wrote:
| It's an important comparison of the mechanisms, even in 2023,
| you can still find binaries on modern Linux distributions with
| executable stacks due to the fail-open design, 20 years later.
|
| The fact that Linux hasn't learned the right lessons in 20
| years, and has chosen to "double down" in respect to IBT/BTI,
| does not inspire confidence that they will ever fix it. I'd say
| his 20 year estimate was in fact being pretty generous given
| the evidence available.
|
| https://news.ycombinator.com/item?id=21554975
| jacquesm wrote:
| The funny thing is that this attitude towards breaking
| changes is one of the reasons why Theo is able to make this
| comment at all. If he would allow breaking changes then
| OpenBSD adoption likely would be higher and that in turn
| would cause him to resist the kind of things that Linux would
| not be able to get away with.
|
| It's clearly different philosophies leading to different
| outcomes with neither of them clearly better than the other,
| it just depends on what you need. It would be possible to
| make that statement in a more graceful way.
| sillywalk wrote:
| "I have altered the ABI. Pray I do not alter it further."
| -- Theo de Raadt
|
| https://marc.info/?l=openbsd-tech&m=157489277318829&w=2
| binkHN wrote:
| Theo himself considers OpenBSD a "research" OS, so I don't
| think he'll ever consider OpenBSD going mainstream,
| especially as it allows stuff like this to happen.
| jacquesm wrote:
| Indeed, so it's apples-to-oranges.
| mananaysiempre wrote:
| > It's an important comparison of the mechanisms, even in
| 2023, you can still find binaries on modern Linux
| distributions with executable stacks due to the fail-open
| design, 20 years later.
|
| Unfortunately, for C code using GCC's nested functions
| extension (or for languages that want to be ABI-compatible
| with C and support nested functions, like that paragon of
| advanced features called Pascal /s), there's no other
| compilation strategy in current ABIs. The patches to switch C
| (and not just Ada) to function descriptors[1] with an ABI
| break have been sitting on the GCC mailing list since
| approximately forever[2], but it doesn't seem like there's
| been any progress.
|
| [1] The strategy is basically to compile (*fp)() not as
|
|         call *%rax
|
| but as (untested)
|
|         test $1, %rax
|         jz 1f
|         mov 8(%rax), %r10
|         mov (%rax), %rax
|     1:  call *%rax
|
| thus essentially inlining the (currently stack-allocated)
| closure calling thunk at all indirect call sites. It is ABI-
| compatible on x86 and x86-64 with all code that does not
| involve nested functions, place functions at odd addresses,
| or tag function pointers itself (and I think with all arm64
| and riscv code, although arm32's usage of the low pointer bit
| for Thumb interworking is bound to make this trickier).
|
| [2] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-01/msg00735.h...
| brynet wrote:
| That strategy won't fly with IBT.
|
| Now all software must pay the price and miss out on
| important mitigations, for all eternity, just because of
| some largely unused feature in one compiler?
| mananaysiempre wrote:
| IBT is already further along here. The hypothetical
| solution for executable stacks is to recompile all of
| your nested-function-using or -calling code with
| -ftrampolines (except that won't work without the patch
| above--silently, really GCC?..). The _already real and
| working_ solution for IBT is to recompile all of your
| indirect-branch-using code with -fcf-protection=branch.
| So, ignoring the fact that nested functions are in
| practice much rarer, if you accept the former as valid
| you'll need to accept the latter as well, as far as logic
| is concerned.
|
| I wouldn't characterize this as a "largely unused feature
| in one compiler" screwing things up, but rather as the
| ABI on most Linux and -adjacent platforms (except SysV
| Itanium and FDPIC IIRC) being incapable of supporting
| closures (without executable stacks). That these are
| missing from standard C, and only present in languages
| that are either niche (Pascal, Ada) or don't care about
| following the platform ABI (Rust, Go, C++'s lambdas), is
| a defect of C (and that's at least a somewhat popular
| opinion among ISO C committee members[1]).
|
| Of course, OpenBSD essentially does not _have_ a stable
| ABI, so it's much freer to experiment here.
|
| [1] https://thephd.dev/lambdas-nested-functions-block-expression...
| whoopdedo wrote:
| It's the price you pay for never-break-userspace. OpenBSD is
| fine with the very small probability that an executable which
| doesn't do branch tracking will fail to run under the
| enforced rules. The answer to that is to recompile because
| you've still got the source, and if not, well, tough cookies.
| ndesaulniers wrote:
| > the very small probability that an executable which
| doesn't do branch tracking will fail to run under the
| enforced rules
|
| Isn't it any indirect branch in any program that will trip
| BTI/IBT? So most programs? I guess I disagree with the
| `small probability` part.
| jacquesm wrote:
| Tough cookies translates for many people into: OpenBSD is
| not for me. The 'very small probability' likely approaches
| '1' for sufficiently old stuff. And even if you do
| have the source, does it still build without substantial
| work? Backwards compatibility is not something to toss out
| the window without thinking through the consequences.
| loeg wrote:
| > OpenBSD is fine with the very small probability that an
| executable which doesn't do branch tracking will fail to
| run under the enforced rules.
|
| To clarify slightly, OpenBSD is fine with the very _high_
| probability that an executable will fail under new rules.
| Otherwise, yes.
| [deleted]
| WalterBright wrote:
| I'm working on adding ENDBR support to the DMD D compiler
| backend.
| ntfAX wrote:
| A software solution provided by the OS or language can make this
| hardware solution irrelevant.
| wongarsu wrote:
| Windows does this in software, and has for approximately 8 years.
|
| An advantage of the software solution is that you don't need to
| have the feature compiled into every library for it to work,
| you just lose protection in those parts. That makes for a much
| quicker rollout. Also faster iteration times, in the Windows
| Insider Preview you can get the extended version that also
| checks that the hashed function signature matches.
|
| 1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...
| josephcsible wrote:
| You've got it backwards: this hardware solution makes the
| software solutions irrelevant.
| tialaramex wrote:
| Nope. Here's the actual problem, in these crappy languages
| it's really easy for mistakes to result in a stack smash, so,
| these types of hacks aim to make it harder for the bad guys
| to turn that into arbitrary remote code execution. Not
| impossible, just harder. Specifically in this case the idea
| is that they won't be able to abuse arbitrary bits of
| function without calling the whole function, at a cost of
| some hardware changes and emitting unnecessary code. So maybe
| they can't find a whole function which works for them and
| they give up.
|
| Using better languages makes the entire problem disappear.
| You don't get a stack smash, the resulting opportunities for
| remote code execution disappear.
|
| It suggests that maybe the "C magically shouldn't have
| Undefined Behaviour" people were onto something after all.
| Maybe C programmers really are so wedded to this awful
| language that just being much slower than Python wouldn't
| deter them.
| There is still the problem that none of them can
| agree how this should work, but if they'll fund it maybe it's
| worth pursuing to find out how much they will put up with to
| keep writing C.
| yakubin wrote:
| I'm always amused by how many of OpenBSD's mitigations are
| patching over something as basic as lack of bounds
| checking, yet they'll never add bounds checking. And, as
| you said, those are all just speed bumps, not fixes.
| dundarious wrote:
| I think one could argue that all the software mitigations
| that aren't based on compile time proofs result in quite a
| bit more "emitting unnecessary code", if "unnecessary" is
| taken to mean "not strictly intrinsic to the task of the
| program". And undefined behavior is bad, but getting rid of
| it wouldn't be a silver bullet for this problem in C, I
| think. All undefined behavior could become "implementation
| defined" tomorrow, where the C compiler becomes more like a
| high-level assembler (again), and you could still jump the
| instruction pointer into arbitrary program text.
| tialaramex wrote:
| > All undefined behavior could become "implementation
| defined" tomorrow, where the C compiler becomes more like
| a high-level assembler (again), and you could still jump
| the instruction pointer into arbitrary program text.
|
| Try to work this through in your head. Imagine how you
| need to specify the working of the abstract machine in
| order to allow this. How do we talk about an "instruction
| pointer" on the abstract machine? What are the
| instructions it's pointing to? Am I defining an entire
| bytecode VM?
|
| Nah, instead you're going to do one of two things. One:
| "Undefined Behaviour" which we explicitly took off the
| table, or Two: "If this happens the program aborts". And
| with that the big problem evaporates. Does it make those
| C programmers happy? I expect not.
| dundarious wrote:
| Implementation defined means the compiler must _specify_
| the behavior, but it has near total freedom, and it can
| define it specific to the target system. There is no
| abstract machine. If I use GCC on Linux x86-64, then
| there very much is an instruction pointer.
| tialaramex wrote:
| In the real world, compilers just specify that the
| behaviour is undefined and tell you to suck it up. But
| we're talking about a hypothetical where we aren't
| allowing Undefined Behaviour. Saying "Oh, but we can if
| we say it's the implementation choosing" is a get out
| which is meaningless for the hypothetical. Just refuse to
| engage with the hypothetical instead if you don't like
| it.
| dundarious wrote:
| I'm using specific, standards defined language, that's
| relatively well known. For example, sizeof(int) is
| implementation defined, meaning it must have a documented
| definition, specific to the implementation (e.g., gcc
| x86_64-linux-gnu, it's 4).
|
| In languages like C that are closer to the machine, not
| everything has to be specified strictly in terms of a
| generic abstract machine.
|
| I'm not trying to be hostile or evasive or derisive, I'm
| just genuinely responding to your original comment, that
| I think missed on some important info. And my point was
| that _if we imagine a different world from the real world
| we're in right now_, where in this new world, all
| undefined behavior became implementation defined
| behavior, then there would _still_ be a need for
| mitigations like endbr64. So I'm not painting a rosy
| picture for C. I just think undefined behavior is a red
| herring. Assembly doesn't have undefined behavior, but
| obviously you can have all sorts of issues there.
| tialaramex wrote:
| > Assembly doesn't have undefined behavior, but obviously
| you can have all sorts of issues there.
|
| The machine is in the real world and is thus obliged to
| have some actual behaviour, _but_ it is not always
| practical to discern what that behaviour would be let
| alone make it reliable across a product line and document
| it in an understandable way. As a result actually your
| CPU's documentation does in effect include "Undefined
| Behaviour".
| dundarious wrote:
| True, when writing my comment I wanted to qualify it to
| the same effect, but thought it would be an unnecessary
| subtlety to the general thrust of my point. That is, we
| can ignore this kind of "undefined behavior in the
| machine itself" for the purposes of this particular
| discussion.
| tialaramex wrote:
| I don't see how to ignore it though. If we're defining
| the behaviour but then our "definition" just doesn't
| specify the actual behaviour because it's specified in
| terms of hardware with no clearly defined behaviour for
| that situation then it's just word play, we're not really
| doing what I set out.
| tremon wrote:
| It's only irrelevant if the hardware solution is available on
| all the supported architectures/systems. As long as it's not,
| the software version must be maintained anyway, and might
| suffer from bitrot if it's no longer exercised on the major
| architectures.
| nullc wrote:
| Is this protection really all that helpful? Surely there are
| functions you can call into the top of to do your diabolical
| deeds for you.
|
| It would be more helpful if callers would store some machine
| specific hash of the function prototype and the function itself
| would check the hash, so that you could only redirect to calling
| a function with the right signature.
|
| But that would also increase the overhead further. Already this
| is bad enough that it makes jump tables unattractive (which is
| too bad, considering that jump tables usually have little to no
| risk of control flow redirection).
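The prototype-hash check nullc describes can be modelled in plain C. The sketch below is a toy under stated assumptions: the names `tagged_target`, `checked_call`, `TAG_INT_TO_INT` and `TAG_VOID_TO_VOID` are hypothetical, the tags are arbitrary constants rather than real hashes, and the tag sits in a struct next to the pointer instead of next to the callee's code. It only shows the shape of the check, not how any compiler or OS implements it.

```c
#include <stdint.h>

/* Hypothetical signature tags. A real scheme would derive them from a
 * hash of the function prototype; these constants are arbitrary. */
enum { TAG_INT_TO_INT = 0x51A7, TAG_VOID_TO_VOID = 0x0BAD };

typedef int (*int_to_int_fn)(int);

/* A call target bundled with the tag that describes its prototype. */
struct tagged_target {
    uint32_t tag;
    int_to_int_fn fn;
};

static int twice(int x) { return 2 * x; }

/* Performs the indirect call only when the target's tag matches the tag
 * the call site expects. Returns 0 on success, -1 on a tag mismatch
 * (in which case *result is left untouched). */
static int checked_call(const struct tagged_target *target,
                        uint32_t expected_tag, int arg, int *result)
{
    if (target->tag != expected_tag)
        return -1;
    *result = target->fn(arg);
    return 0;
}
```

A caller would build a `struct tagged_target` for `twice` with `TAG_INT_TO_INT` and go through `checked_call`; a target carrying any other tag is refused before control is transferred.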
| tedunangst wrote:
| The entire field of ROP exploits would basically never have
| been developed if it were as simple as just calling the
| function you want.
| messe wrote:
| For anybody unfamiliar with this, as I was, this appears to refer
| to Intel's Indirect Branch Tracking feature[1] (and the
| equivalent on ARM, BTI). The idea is that an indirect branch can
| only pass control to a location that starts with an "end branch"
| instruction. An indirect branch is one that jumps to a location
| whose value is loaded or computed from either a register or
| memory address: think calling a function pointer in C.
|
| Without IBT, you'd have this equivalence between C and assembly:
|
|     void foo(void);
|
|     int main(void) {
|         void (*f)(void);
|         f = foo;        /* take foo's address */
|         f();            /* indirect call through f */
|         return 0;
|     }
|
|     void foo(void) { }
|
|     ---
|
|     main:
|         movl $foo, %edx     # zero-extends into %rdx
|         call *%rdx          # indirect call
|         ret
|     foo:
|         ret
|
| If IBT is enabled, the above code triggers an exception because
| foo doesn't begin with an "end branch" instruction. When IBT is
| enabled by the compiler, the above code gets assembled as:
|
|     main:
|         endbr64
|         movl $foo, %edx
|         call *%rdx
|         ret
|     foo:
|         endbr64
|         ret
|
| Now the compiler inserts endbr64 at the start of each function
| prologue. The reason for this feature is to serve as defense in
| depth against JOP and COP attacks, as it means that the only
| "gadgets" available to you are entire functions, which can be far
| harder to exploit and chain.
|
| [1]:
| https://www.intel.com/content/dam/develop/external/us/en/doc...
| asveikau wrote:
| It was an old joke that the opposite of "goto" is "come from",
| or that if goto is considered harmful, nobody said anything
| about a "come from". Marking something as a branch target
| reminds me of this.
|
| https://en.m.wikipedia.org/wiki/COMEFROM
| dejj wrote:
| > GOTO considered harmful
|
| COMEFROM considered harm-mitigating
|
| It ingeniously makes Return Oriented Programming (ROP) a lot
| harder.
| messe wrote:
| > COMEFROM considered harm-mitigating
|
| You know, that'd be a fantastic OpenBSD release name.
|
| Here's hoping a dev sees this comment; there's already been
| a few commenting in this thread.
| wongarsu wrote:
| Interesting. Seems like enforcement on Intel CPUs is supported
| since Tiger Lake (so ~2020). Windows has basically the same
| feature implemented in software since 2015, called Control Flow
| Guard [1]. I wonder what the story there is, and if Windows has
| any plans to (get everyone to) switch to the hardware version
| once those CPUs have sufficient market share.
|
| 1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...
| andersa wrote:
| Windows also recently implemented a far better version of
| this called Extended Flow Guard (XFG) that not only checks
| whether the location is a valid destination, but also whether
| it's a valid destination for that specific source.
|
| For example, for any virtual function call or function
| pointer call, the destination must have a correct tag with
| the hash of the arguments. It's much more secure, and also
| faster, since loading the tag from memory can be merged with
| loading the actual code after it.
|
| I wish this was the one implemented in hardware..
| simcop2387 wrote:
| That does sound like it would be more robust, but
| definitely sounds like it'd require a lot more silicon than
| the IBT that they did implement. Something like it might be
| something that comes in some future revisions.
| rwmj wrote:
| The fun fact being that older CPUs decode ENDBR64 as a slightly
| weird NOP (with no architectural effects), but it'll fault on
| original Pentiums:
| https://stackoverflow.com/questions/56120231/how-do-old-cpus...
| rollcat wrote:
| Various architectures do other interesting things with NOPs,
| IIRC one convention on PowerPC had something vaguely related
| to debugging or tracing (I can't remember the details or find
| any references right now).
| Someone wrote:
| https://www.ibm.com/docs/en/aix/7.3?topic=h-hpmstat-command:
|
| "random_samp_ele_crit=name
|
| Specifies the random criteria for selecting the
| instructions for sampling. Valid values for this option are
| as follows:
|
| ALL_INSTR
|
| All instructions are eligible. This value is the default
| setting.
|
| LOAD_STORE
|
| The operation is routed to the Load Store Unit (LSU); for
| example, load, store.
|
| PROB_NOP
|
| Sample only special no-operation instructions, which are
| called Probe NOP events.
|
| [...]"
| aidenn0 wrote:
| Some MIPS cores had a superscalar NOP that would stall
| every ALU by one cycle, which was necessary because they
| lacked synchronization instructions.
| monocasa wrote:
| RISC-V has a whole HINT space that's basically just morphs
| of load immediate into zero register.
|
| AArch64 has a similar space:
| https://developer.arm.com/documentation/ddi0596/2020-12/Base...
|
| And yes, PowerPC has a similar space as well holding hints
| like 'give priority to the other hardware threads on this
| core' and the like.
| https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstruc...
| rollcat wrote:
| I was wondering where did I read about PowerPC, and this
| is exactly the article! So, it was for thread priority.
| Strikes me as an odd design choice, this probably
| should've been something to be managed by the OS more
| explicitly.
| messe wrote:
| Not just architectures, but different OSes and ABIs have
| found ways to repurpose no-ops. One example[1] is Windows
| using the 2-byte "MOV EDI, EDI" as a hot-patch point: it
| gets replaced by a "JMP $-5" instruction which jumps 5
| bytes before the start of a function into a spot reserved
| for patching. That 5 bytes is enough to contain a full jump
| instruction that can then jump wherever you need it to.
|
| ## Why do Windows functions all begin with a pointless MOV
| EDI, EDI instruction?
|
| [1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...
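The byte-level arithmetic behind that hot-patch can be sketched in C. This is an illustration assuming 32-bit x86 and the layout messe describes (a 2-byte `mov edi, edi` at the function entry and 5 reserved bytes before it); `struct hotpatch` and `make_hotpatch` are hypothetical names, and the code only computes the bytes, it does not patch anything.

```c
#include <stdint.h>

/* The two pieces of a hot-patch, as raw instruction bytes. */
struct hotpatch {
    uint8_t pad[5];    /* goes at func_addr - 5: jmp rel32 to the hook */
    uint8_t entry[2];  /* replaces "mov edi, edi" at func_addr */
};

/* Computes the patch bytes for a function at func_addr that should be
 * redirected to hook_addr. Addresses are 32-bit, as on x86-32. */
static struct hotpatch make_hotpatch(uint32_t func_addr, uint32_t hook_addr)
{
    struct hotpatch p;

    /* The 5-byte jmp occupies func_addr-5 .. func_addr-1, so its
     * displacement is relative to func_addr (the next instruction). */
    uint32_t rel32 = hook_addr - func_addr;

    p.pad[0] = 0xE9;                    /* jmp rel32 opcode */
    p.pad[1] = (uint8_t)(rel32);        /* displacement, little-endian */
    p.pad[2] = (uint8_t)(rel32 >> 8);
    p.pad[3] = (uint8_t)(rel32 >> 16);
    p.pad[4] = (uint8_t)(rel32 >> 24);

    /* The 2-byte short jmp ends at func_addr + 2 and must land on
     * func_addr - 5, a displacement of -7 (0xF9 as a signed byte). */
    p.entry[0] = 0xEB;                  /* jmp rel8 opcode */
    p.entry[1] = 0xF9;
    return p;
}
```

For a function at 0x00401000 and a hook at 0x00402000 this yields `E9 00 10 00 00` for the pad and `EB F9` for the entry. The short jump is what the thread writes as "JMP $-5": the target is 5 bytes before the function start, while the encoded displacement is -7 because it is counted from the end of the 2-byte instruction.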
| pclmulqdq wrote:
| Intel Vtune will do this with 5-byte NOPs directly. I
| think LLVM's x-ray tracing suite did this with a much
| bigger NOP, also, to capture more information.
| gcoakes wrote:
| Good read. Thank you.
|
| This just worsens my fear of changing "unnecessary" code
| when I don't know the original motivation for it.
| jeffbee wrote:
| Interesting, thanks for pointing this out! Just yesterday
| I was gazing at some program containing two consecutive
| xor rax, rax. I thought what's the point? But as you
| point out it might be a NOP sled designed to be that
| specific length.
| jchw wrote:
| I wonder if this is still true. Whenever I go to hook
| Win32 API functions, I use an off-the-shelf length
| disassembler to create a trampoline with the first n
| bytes of instructions and a jmp back, and then just patch
| in a jmp to my hook, but if this hot-patch point exists
| it'd be a lot less painful since you can avoid basically
| all of that.
|
| Though, I guess even if it was, it'd be silly to rely on
| it even on x86 only. Maybe it would still make for a nice
| fast-path? Dunno.
| mattgreenrocks wrote:
| That's really clever use of the opcode space. Thanks for
| passing that along.
| SomeRndName11 wrote:
| NOP on intels is in fact xchg eax, eax
| dataflow wrote:
| There's a good question in the comments there that I still
| don't see the answer to. How does this work if there's an
| interrupt between the branch and the endbranch? Does the OS
| need to save/restore the "branchness" bit?
| drdrey wrote:
| there is no branchness bit, if there's an endbranch you can
| jump to it
| dataflow wrote:
| Ah so when you return from an interrupt, the check is no
| longer done?
| simcop2387 wrote:
| I'd assume so since it wouldn't be a call/jmp coming from
| a computed address in a register. That said I haven't
| read the documentation for any of this.
| But interrupts
| should be having a stack pointer change and other things
| happening that would be different, which is why they use
| the IRET instruction and not the RET one.
| muricula wrote:
| Yes, on arm the branch type is saved in SPSR_EL1 in the
| BTYPE field. That stands for Saved Program Status Register
| for Kernel Mode (Exception Level 1) and Branch Type.
| https://developer.arm.com/documentation/ddi0595/2021-12/AArc...
| __failbit wrote:
| Thank you for the explanation!
| haberman wrote:
| Interesting. I was able to get Clang to generate this using
| `-fcf-protection=branch`: https://godbolt.org/z/rooP8vPsM
|
| It looks like endbr64 is a 4-byte instruction. That could be a
| significant code size overhead for jump tables with lots of
| targets: https://godbolt.org/z/xTPToaddh
| notaplumber1 wrote:
| OpenBSD disables jump tables in Clang on amd64 due to IBT,
| some architectures also had jump tables disabled as part of
| the switch to --execute-only ("xonly") binaries by default,
| e.g: powerpc64/sparc64/hppa.
|
| https://marc.info/?l=openbsd-cvs&m=168254711511764&w=2
|
| E.g: https://marc.info/?l=openbsd-cvs&m=167337396024167&w=2
| cratermoon wrote:
| In case anyone wants a very simple introduction to JOP/COP
| exploits and mitigations of this type:
| <https://www.theregister.com/2020/06/15/intel_cet_tiger_lake/>
| codedokode wrote:
| Why should every function start with endbr64 command? Aren't
| functions usually called directly?
|
| Also, is it required to insert endbr64 command after function
| calls (for return address)?
| eklitzke wrote:
| As to why they're not always called directly, imagine some
| code like this:
|
|     #include <stddef.h>
|
|     int FooWithoutChecks(void *p);
|
|     int Foo(void *p) {
|         if (p == NULL)
|             return -1;
|         return FooWithoutChecks(p);
|     }
|
| In general the caller is expected to call Foo if they aren't
| sure if the pointer is nullable, or if they already know that
| pointer is not null (e.g.
| because they already checked it
| themselves) they can call FooWithoutChecks and avoid a null
| check that they know will never be true.
|
| The naive way to emit assembly for this is to actually emit
| two separate functions, and have Foo call FooWithoutChecks
| the usual way. But notice that the FooWithoutChecks function
| call is a tail call, so the compiler can use tail call
| optimization. To do this it would inline FooWithoutChecks
| into Foo itself, so the compiler just emits code for Foo with
| the logic in FooWithoutChecks inlined into Foo. This is nice
| because now when you call Foo, you avoid a call/ret
| instruction, so you save two instructions on every call to
| Foo. But what if someone calls FooWithoutChecks? Simple, you
| just call at the offset into Foo just past the pointer
| comparison. This actually just works because Foo already has
| a ret instruction, so the call to FooWithoutChecks will just
| reuse the existing ret. This optimization also saves some
| space in the binary which has various benefits in and of
| itself.
|
| The example here with the null pointer check is kind of
| contrived, but this kind of pattern happens a LOT in real
| code when you have a small wrapper function that does a tail
| call to another function, and isn't specific to pointer
| checks.
| aidenn0 wrote:
| A traditional compiler needs to insert them for all external
| functions, because other compilation units may make an
| indirect call.
| messe wrote:
| C allows for any function to be called via a function
| pointer, and functions can be in different translation units,
| so the compiler can't simply assume that a function will
| never be called indirectly and has to pessimistically insert
| endbr64 in order to maintain a reasonable ABI.
|
| And no, as I understand it, this is only for branch/calls not
| returns.
| Joker_vD wrote:
| Well, if the function is marked "static", the compiler can
| actually check whether the function's address is taken in
| the current compilation unit or not and omit/emit ENDBR64
| accordingly (passing pointers to static functions to code
| in other compilation units is legal, and should still
| work).
| messe wrote:
| Good catch. Yeah, as long as the function's address is
| never taken the compiler has a lot of leeway with static
| functions; it can even avoid emitting code for them
| entirely if it can prove they're never called or if it's
| able to compute their results at compile-time.
| josephg wrote:
| Yep. Or inline them at every call site if that makes
| sense to do based on the optimization level and flags.
| MobiusHorizons wrote:
| Is this theoretically something lto could remove?
| tedunangst wrote:
| If you disable dlopen and ld_preload.
| codedokode wrote:
| Dlopen() "sees" only functions marked as exported (with
| macro like DLLEXPORT on Windows), not every function or
| am I wrong? Is C that bad?
| tedunangst wrote:
| On openbsd at least, every global symbol is exported
| unless you use an explicit symbol list. It's unusual for
| executables.
| josephcsible wrote:
| > Why should every function start with endbr64 command?
| Aren't functions usually called directly?
|
| They're _usually_ called directly, but unless the compiler
| can prove that they _always_ are (e.g., if they're static
| and nothing in the same file takes the address), endbr64 is
| required.
|
| > Also, is it required to insert endbr64 command after
| function calls (for return address)?
|
| No, IBT is only for jmp and call. SS (the shadow stack) is
| the equivalent mechanism for ret.
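As a rough C illustration of the distinction drawn here (a sketch only: `square`, `negate`, `apply` and `demo` are made-up names, and whether a given compiler really omits the landing pad for `square` depends on the compiler and on flags such as `-fcf-protection=branch`):

```c
typedef int (*unary_fn)(int);

/* static and its address is never taken: every call is a direct call,
 * so a compiler may be able to leave out the endbr64 landing pad. */
static int square(int x) { return x * x; }

/* static, but its address is taken in demo() below, so it has to be a
 * valid target for an indirect call. */
static int negate(int x) { return -x; }

/* External linkage: other translation units may take this function's
 * address, so the compiler has to assume it can be called indirectly. */
int apply(unary_fn fn, int x)
{
    return fn(x);               /* indirect call through a pointer */
}

int demo(int x)
{
    unary_fn fn = negate;       /* negate's address escapes here */
    return square(x) + apply(fn, x);   /* one direct, one indirect call */
}
```

Compiling something like this with and without `-fcf-protection=branch` and comparing the generated assembly is one way to see which functions receive an `endbr64`.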
| derefr wrote: | > but unless the compiler can prove that they always are | (e.g., if they're static and nothing in the same file takes | the address), endbr64 is required | | Then why not just have the compiler break down every non- | static function into two blocks: a static function that | contains all the logic, and a non-static function that just | contains an IBT and a direct jump to the static function? | (Or, better yet, place the non-static label just before the | static one, and have the non-static fall through into the | body of the static.) Then the static direct callsites won't | have to pay the overhead of executing the IBT NOP. | Joker_vD wrote: | That's absolutely doable, just... How much slower/faster is a predicted | unconditional jump than ENDBR64? What's the | ratio of virtual/static calls in real-world programs? And | while your last proposal ("foo: endbr64; foo_internal: | <code>") evades those questions, it raises questions | about maintaining function alignment (16 bytes IIRC? Is | this even necessary today?) and restructuring the | compiler to distinguish the inner/external symbol | addresses. Plus, of course, somebody has to actually sit | down and write the code to implement that, as opposed to | just adding "if (func->is_escaping) emit_endbr(...);" at | the beginning of the code that emits the object code for | a function body. | 95014_refugee wrote: | The IBT NOP is "free" in that it will evaporate in the | pipeline; it still has to be fetched and decoded to some | extent, but it does not consume execution resources. | | From a tooling perspective, what you're describing (two | entrypoints for a function, the jump you mention is | pointless) would require changes up and down the | toolchain; it would affect the compiler, all linkers, all | debuggers, etc. By contrast, just adding an additional | instruction to the function prolog is relatively low- | impact.
| | It's also worth noting that at the time code for a | function is emitted, the compiler is not aware of whether | the symbol will be exported and thus discoverable in some | other module, or by symbol table lookup, so emitting the | target instruction is essentially mandatory. | dzaima wrote: | Doesn't seem like it'd be that difficult to make the | change in the other direction, i.e. keep endbr64 as-is as | the default case, but if there's a direct jump/call to | anywhere that starts with endbr64, offset the immediate | by 4 bytes; could be done in any single stage of the | toolchain that has that info with no extra help. But | yeah, quite low impact, might not even affect decode | throughput & cache usage for at least one of the direct | or indirect cases. | tedunangst wrote: | What is the overhead of executing the IBT NOP? | 95014_refugee wrote: | It's not "executed" per se. It consumes space in the | cache hierarchy, and a slot in the front-end decoder. It | won't ever be issued, but depending on the | microarchitecture in question it might result in an issue | cycle having less occupancy than it might have had in the | case where the subsequent instruction was available. | | With that said, the first few instructions of a called | function often stall due to stack pointer dependencies, | etc. so the true execution cost is likely to be even | smaller than the above might suggest. | [deleted] | binkHN wrote: | I still run OpenBSD where I can, especially where security is | more important. Yes, it's still missing A LOT of functionality | compared to other UNIX-like systems, but security bases tend to | be well covered. | PrimeMcFly wrote: | I don't really buy their approach to security honestly. Trying | to fix all bugs is great, but they provide little to prevent | unknown bugs being exploited (pledge is nice for software that | opts in to use it, but otherwise not so much).
I'd love to see | them implement something like AppArmor with their approach, it | would probably be amazing. | | I actually think NetBSD is a pretty interesting alternative, it | has some nice security features like veriexec that don't get | talked about much. | binkHN wrote: | I think in the past they tried to fix all the bugs, and | realized they couldn't, so they started to build all sorts of | mitigations in the same vein as the one you see posted here | today. As for pledge, and the related mitigations, yes, | they're not useful if you don't use them, but I see this as | them innovating in the space and giving application | developers more tools to build hardened applications. | | I see tools like AppArmor as band-aids to fix problems that | shouldn't exist in the first place. The problem with these | approaches is that the band-aids tend to break things in | unexpected ways and when that happens they simply get removed | and unused. | PrimeMcFly wrote: | > I see tools like AppArmor as band-aids to fix problems | that shouldn't exist in the first place. | | I fundamentally disagree on that. I think tools like that | are amazing at protecting against unknown threats/exploits. | They let you lock down software and protect against future | unknown exploits, badly behaving software, malicious | employees etc. I think something similar should be a part | of any OS claiming to be security focused. Basic DAC is | woefully insufficient. | | On the other hand, the industry has largely found other | solutions like sandboxing, but I still think MAC or RBAC or | whichever has a place, certainly as part of a defense in | depth strategy. | anthk wrote: | OpenBSD has these _on_ while compiling. | [deleted] | carlosrg wrote: | > they provide little to prevent unknown bugs being exploited | | They provide plenty of mitigations | (https://www.openbsd.org/innovations.html). In fact OP's | article is for preventing unknown bugs from being exploited.
| PrimeMcFly wrote: | They don't provide _any_ mitigations of the sort I was | clearly referencing. Specifically, for restricting | malicious code or users that already have access to the | system, exploiting insecure software that was _not_ | compiled with pledge support. | MuffinFlavored wrote: | > Yes, it's still missing A LOT of functionality compared to | other UNIX-like systems | | Could you give some examples/samples of things you have run | into off the top of your head? | binkHN wrote: | Sure. Poor SMP support (but this has improved heavily over | the years), ancient file system, no Bluetooth (not important | if you don't need this), reduced performance (due to a lack | of optimizations and security mitigations overhead), limited | Wi-Fi support (this is for numerous reasons, but it's better | than other BSDs)... | | I could go on, but, for my needs, it works very well and some | of its simplicities are a godsend. | dark-star wrote: | I find OpenBSD's hardware support especially lacking. It | doesn't really work that well on at least 3 devices where I | tried it (all Dell laptops from various generations, 3-10 | years old), whereas Linux runs perfectly out-of-the-box on all | three. | | Which is sad, as I kinda like the *BSD approach to things. | carlosrg wrote: | Not my experience at all, it works very well with a new Acer | laptop I own: the graphics work (Intel Xe - 12th gen | processor), audio, touchpad, keyboard (and special keyboard | keys like brightness), wifi... All I had to do was | download the firmware with fw_update, nothing more. | | Also I was pleasantly surprised to hear they support Apple | M1/M2 Macs. Asahi Linux gets a lot of press around here but I | had no idea OpenBSD supported it. ___________________________________________________________________ (page generated 2023-07-14 23:00 UTC)