hngopher.com

       [HN Gopher] Mandatory enforcement of indirect branch targets
       ___________________________________________________________________
        
       Author : peter_hansteen
       Score  : 218 points
       Date   : 2023-07-14 12:19 UTC (10 hours ago)
        
 (HTM) web link (undeadly.org)
 (TXT) w3m dump (undeadly.org)
        
       | SoftTalker wrote:
       | Theo had to get his digs in against Linux in that announcement.
       | Why not just focus on what OpenBSD is doing, and maybe contrast
       | it to what Linux does without the speculation that they will
       | still be doing the same thing in 20 years.
       | 
       | He's unquestionably brilliant, but I've had a few encounters with
       | him on the mailing lists and he is _so_ quick to take offense
       | where none was meant and drop into name-calling and insults. I
       | don't really get it. He may have some deep insecurities.
        
         | VancouverMan wrote:
         | That part doesn't look like a "dig" or an insult to me.
         | 
         | It seems like a reasonable, relevant, and plausible assessment
         | of how the long-term outcomes may likely differ between
         | OpenBSD's stricter approach versus a looser approach,
         | specifically when it comes to the degree of security offered
         | (which is one of OpenBSD's main focuses), based on a past
         | situation that's similar.
         | 
         | How do you know that you aren't being, to use your words,
         | "quick to take offense where none was meant" in this case?
        
           | jacquesm wrote:
           | > How do you know that you aren't being, to use your words,
           | "quick to take offense where none was meant" in this case?
           | 
           | Past knowledge about Theo?
        
         | Ericson2314 wrote:
         | Are Theo and Linux more alike than OpenBSD and Linux?
        
           | PrimeMcFly wrote:
           | Now, yes. Linus wasn't always so abrasive though. At some
           | point he caught up to Theo.
        
             | LexiMax wrote:
             | Linus has been trying to calm down in recent years, in
             | large part because he decided he no longer wanted to be
             | lumped in with the crowd that endlessly complains about
             | political correctness.
             | 
             | https://www.bbc.com/news/technology-45664640
        
               | Ericson2314 wrote:
               | Yeah this is good stuff, and why I felt bad about making
               | the comparison. Not saying Theo is in that camp, but
               | Linus is trying to be less abrasive in general, and Theo
               | is not.
        
               | NoZebra120vClip wrote:
               | Perhaps we're reading into their personalities more than
               | we should, based on public social-media appearances.
               | 
               | Egos tend to become exaggerated when benevolent dictator
               | types make public statements. Their candor and bluntness
               | on a mailing list or Twitter may be completely different
               | than their demeanor and their kindness toward
               | collaborators in private.
               | 
               | Now we have the very public drama that happened between
               | Theo and that "other BSD" team to create the original
               | schism. But have we had any subsequent drama that caused
               | breakups or forks? I don't know. OpenBSD manages to plug
               | away and push releases out the door on schedule, right?
               | 
               | Linus doesn't seem to have a lot of internal contributor
               | drama, judging by the way they also push releases out the
               | door and merge pull requests and add features.
               | 
               | Really, if either Theo or Linus were unreasonable men,
               | their teams would fall apart and they would cease to be
               | leaders of anything. I think their leadership abilities
               | speak for themselves: they've both been committed and
               | dedicated to the same project since decades ago, and
               | they've both built and maintained cohesive teams of
               | contributors who seem to mostly stick around long enough
               | to make a difference.
               | 
               | They are "thought leaders", if you will; perhaps not
               | charismatic ones, but canny businessmen who know how to
               | nurture their pet projects.
        
           | NoZebra120vClip wrote:
           | > Are Theo and Linux more alike than OpenBSD and Linux?
           | 
           | Is a Canadian kernel developer more like a POSIX operating
           | system than a POSIX operating system is like a POSIX
           | operating system?
           | 
           | I'm not sure I understand. Perhaps you meant to write "Linus"
           | since Linus is also a kernel developer? That seems more like
           | apples to apples.
        
         | redundantly wrote:
         | I wouldn't have it any other way. I love the OpenBSD mailing
         | lists. Always an entertaining read when Theo gets involved.
        
           | teknopurge wrote:
           | upvoted and +1. Theo has been an important leader in OSS for
           | decades: his brevity and impatience is a net positive. also
           | he is usually correct.
        
         | ris wrote:
         | This is my main takeaway too. As a one time OpenBSD enthusiast
         | (and still admirer), now I'm a bit older I find the continual
         | smugness starts to grate.
         | 
         | Truth is, Linux has a lot more constraints on how it can
         | implement something because it has _users_. Users that have all
         | sorts of different ways they need it to work.
        
         | microtherion wrote:
         | I'd just like to interject for a moment. What you're referring
         | to as Linux, is in fact, NotOpenBSD/Linux, or as I've recently
         | taken to calling it, Linux as opposed to OpenBSD...
        
         | Joker_vD wrote:
         | That's the problem with many brilliant people: what they
         | perceive as their interlocutors being deliberately obtuse on
         | some completely obvious point is actually their interlocutors
         | being just as smart as they always are on some point that is
         | not obvious at all to them.
        
           | rkangel wrote:
           | Perception of relative intelligence or sensible decision
           | making is irrelevant. Just because you think you're doing a
           | better job doesn't mean you need to shit on the other person.
           | 
           | You could not mention Linux at all, or you could even say "we
           | think this is better than Linux's approach because of X" and
           | it would be a great improvement.
           | 
           | I have always found it interesting that Rust purposefully
           | avoided doing language comparisons - "we're better than
           | Python like this and better than C like that". Their message
           | purposefully avoided any positioning of it as a competition,
           | instead focusing just on articulating Rust's value. It was an
           | eye opening approach given our instinct is normally to pit
           | things against each other.
        
             | selectodude wrote:
             | I think parent agrees with you.
        
         | brynet wrote:
         | It's an important comparison of the mechanisms, even in 2023,
         | you can still find binaries on modern Linux distributions with
         | executable stacks due to the fail-open design, 20 years later.
         | 
         | The fact that Linux hasn't learned the right lessons in 20
         | years, and has chosen to "double down" in respect to IBT/BTI,
         | does not inspire confidence that they will ever fix it. I'd say
         | his 20 year estimate was in fact being pretty generous given
         | the evidence available.
         | 
         | https://news.ycombinator.com/item?id=21554975
        
           | jacquesm wrote:
           | The funny thing is that this attitude towards breaking
           | changes is one of the reasons why Theo is able to make this
           | comment at all. If he would allow breaking changes then
           | OpenBSD adoption likely would be higher and that in turn
           | would cause him to resist the kind of things that Linux would
           | not be able to get away with.
           | 
           | It's clearly different philosophies leading to different
           | outcomes with neither of them clearly better than the other,
           | it just depends on what you need. It would be possible to
           | make that statement in a more graceful way.
        
             | sillywalk wrote:
             | "I have altered the ABI. Pray I do not alter it further."
             | -- Theo de Raadt
             | 
             | https://marc.info/?l=openbsd-tech&m=157489277318829&w=2
        
             | binkHN wrote:
             | Theo himself considers OpenBSD a "research" OS, so I don't
             | think he'll ever consider OpenBSD going mainstream,
             | especially as it allows stuff like this to happen.
        
               | jacquesm wrote:
               | Indeed, so it's apples-to-oranges.
        
           | mananaysiempre wrote:
           | > It's an important comparison of the mechanisms, even in
           | 2023, you can still find binaries on modern Linux
           | distributions with executable stacks due to the fail-open
           | design, 20 years later.
           | 
           | Unfortunately, for C code using GCC's nested functions
           | extension (or for languages that want to be ABI-compatible
           | with C and support nested functions, like that paragon of
           | advanced features called Pascal /s ), there's no other
           | compilation strategy in current ABIs. The patches to switch C
           | (and not just Ada) to function descriptors[1] with an ABI
           | break have been sitting on the GCC mailing list since
           | approximately forever[2], but it doesn't seem like there's
           | been any progress.
           | 
           | [1] The strategy is basically to compile (*fp)() not as
           | 
           |     call *%rax
           | 
           | but as (untested)
           | 
           |     test $1, %rax
           |     jz 1f
           |     mov 8(%rax), %r10
           |     mov (%rax), %rax
           | 1:  call *%rax
           | 
           | thus essentially inlining the (currently stack-allocated)
           | closure calling thunk at all indirect call sites. It is ABI-
           | compatible on x86 and x86-64 with all code that does not
           | involve nested functions, place functions at odd addresses,
           | or tag function pointers itself (and I think with all arm64
           | and riscv code, although arm32's usage of the low pointer bit
           | for Thumb interworking is bound to make this trickier).
           | 
           | [2] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-01/msg00735.h...
        
             | brynet wrote:
             | That strategy won't fly with IBT.
             | 
             | Now all software must pay the price and miss out on
             | important mitigations, for all eternity, just because of
             | some largely unused feature in one compiler?
        
               | mananaysiempre wrote:
               | IBT is already further along here. The hypothetical
               | solution for executable stacks is to recompile all of
               | your nested-function-using or -calling code with
               | -ftrampolines (except that won't work without the patch
               | above--silently, really GCC?..). The _already real and
               | working_ solution for IBT is to recompile all of your
               | indirect-branch-using code with -fcf-protection=branch.
               | So, ignoring the fact that nested functions are in
               | practice much rarer, if you accept the former as valid
               | you'll need to accept the latter as well, as far as logic
               | is concerned.
               | 
               | I wouldn't characterize this as a "largely unused feature
               | in one compiler" screwing things up, but rather as the
               | ABI on most Linux and -adjacent platforms (except SysV
               | Itanium and FDPIC IIRC) being incapable of supporting
               | closures (without executable stacks). That these are
               | missing from standard C, and only present in languages
               | that are either niche (Pascal, Ada) or don't care about
               | following the platform ABI (Rust, Go, C++'s lambdas), is
               | a defect of C (and that's at least a somewhat popular
               | opinion among ISO C committee members[1]).
               | 
               | Of course, OpenBSD essentially does not _have_ a stable
               | ABI, so it's much freer to experiment here.
               | 
               | [1] https://thephd.dev/lambdas-nested-functions-block-expression...
        
           | whoopdedo wrote:
           | It's the price you pay for never-break-userspace. OpenBSD is
           | fine with the very small probability that an executable which
           | doesn't do branch tracking will fail to run under the
           | enforced rules. The answer to that is to recompile because
           | you've still got the source, and if not, well, tough cookies.
        
             | ndesaulniers wrote:
             | > the very small probability that an executable which
             | doesn't do branch tracking will fail to run under the
             | enforced rules
             | 
             | Isn't it any indirect branch in any program that will trip
             | BTI/IBT? So most programs? I guess I disagree with the
             | `small probability` part.
        
             | jacquesm wrote:
             | Tough cookies translates for many people into: OpenBSD is
             | not for me. The 'very small probability' likely approaches
             | '1' for sufficiently old stuff. And even if you do
             | have the source, does it still build without substantial
             | work? Backwards compatibility is not something to toss out
             | the window without thinking through the consequences.
        
             | loeg wrote:
             | > OpenBSD is fine with the very small probability that an
             | executable which doesn't do branch tracking will fail to
             | run under the enforced rules.
             | 
             | To clarify slightly, OpenBSD is fine with the very _high_
             | probability that an executable will fail under new rules.
             | Otherwise, yes.
        
       | WalterBright wrote:
       | I'm working on adding ENDBR support to the DMD D compiler
       | backend.
        
       | ntfAX wrote:
       | A software solution provided by the OS or language can make this
       | hardware solution irrelevant.
        
         | wongarsu wrote:
         | Windows has done this in software for approximately 8 years.
         | 
         | An advantage of the software solution is that you don't need to
         | have the feature compiled into every library for it to work,
         | you just lose protection in those parts. That makes for a much
         | quicker rollout. Also faster iteration times, in the Windows
         | Insider Preview you can get the extended version that also
         | checks that the hashed function signature matches.
         | 
         | 1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...
        
         | josephcsible wrote:
         | You've got it backwards: this hardware solution makes the
         | software solutions irrelevant.
        
           | tialaramex wrote:
           | Nope. Here's the actual problem, in these crappy languages
           | it's really easy for mistakes to result in a stack smash, so,
           | these types of hacks aim to make it harder for the bad guys
           | to turn that into arbitrary remote code execution. Not
           | impossible, just harder. Specifically in this case the idea
           | is that they won't be able to abuse arbitrary bits of
           | function without calling the whole function, at a cost of
           | some hardware changes and emitting unnecessary code. So maybe
           | they can't find a whole function which works for them and
           | they give up.
           | 
           | Using better languages makes the entire problem disappear.
           | You don't get a stack smash, the resulting opportunities for
           | remote code execution disappear.
           | 
           | It suggests that maybe the "C magically shouldn't have
           | Undefined Behaviour" people were onto something after all.
           | Maybe C programmers really are so wedded to this awful
           | language that just being much slower than Python wouldn't
           | deter them. There is still the problem that none of them can
           | agree how this should work, but if they'll fund it maybe it's
           | worth pursuing to find out how much they will put up with to
           | keep writing C.
        
             | yakubin wrote:
             | I'm always amused by how many of OpenBSD's mitigations are
             | patching over something as basic as lack of bounds
             | checking, yet they'll never add bounds checking. And, as
             | you said, those are all just speed bumps, not fixes.
        
             | dundarious wrote:
             | I think one could argue that all the software mitigations
             | that aren't based on compile time proofs result in quite a
             | bit more "emitting unnecessary code", if "unnecessary" is
             | taken to mean "not strictly intrinsic to the task of the
             | program". And undefined behavior is bad, but getting rid of
             | it wouldn't be a silver bullet for this problem in C, I
             | think. All undefined behavior could become "implementation
             | defined" tomorrow, where the C compiler becomes more like a
             | high-level assembler (again), and you could still jump the
             | instruction pointer into arbitrary program text.
        
               | tialaramex wrote:
               | > All undefined behavior could become "implementation
               | defined" tomorrow, where the C compiler becomes more like
               | a high-level assembler (again), and you could still jump
               | the instruction pointer into arbitrary program text.
               | 
               | Try to work this through in your head. Imagine how you
               | need to specify the working of the abstract machine in
               | order to allow this. How do we talk about an "instruction
               | pointer" on the abstract machine? What are the
               | instructions it's pointing to? Am I defining an entire
               | bytecode VM?
               | 
               | Nah, instead you're going to do one of two things. One:
               | "Undefined Behaviour" which we explicitly took off the
               | table, or Two: "If this happens the program aborts". And
               | with that the big problem evaporates. Does it make those
               | C programmers happy? I expect not.
        
               | dundarious wrote:
               | Implementation defined means the compiler must _specify_
               | the behavior, but it has near total freedom, and it can
               | define it specific to the target system. There is no
               | abstract machine. If I use GCC on Linux x86-64, then
               | there very much is an instruction pointer.
        
               | tialaramex wrote:
               | In the real world, compilers just specify that the
               | behaviour is undefined and tell you to suck it up. But
               | we're talking about a hypothetical where we aren't
               | allowing Undefined Behaviour. Saying "Oh, but we can if
               | we say it's the implementation choosing" is a get out
               | which is meaningless for the hypothetical. Just refuse to
               | engage with the hypothetical instead if you don't like
               | it.
        
               | dundarious wrote:
               | I'm using specific, standards defined language, that's
               | relatively well known. For example, sizeof(int) is
               | implementation defined, meaning it must have a documented
               | definition, specific to the implementation (e.g., gcc
               | x86_64-linux-gnu, it's 4).
               | 
               | In languages like C that are closer to the machine, not
               | everything has to be specified strictly in terms of a
               | generic abstract machine.
               | 
               | I'm not trying to be hostile or evasive or derisive, I'm
               | just genuinely responding to your original comment, that
               | I think missed on some important info. And my point was
               | that _if we imagine a different world from the real world
               | we're in right now_, where in this new world, all
               | undefined behavior became implementation defined
               | behavior, then there would _still_ be a need for
               | mitigations like endbr64. So I'm not painting a rosy
               | picture for C. I just think undefined behavior is a red
               | herring. Assembly doesn't have undefined behavior, but
               | obviously you can have all sorts of issues there.
        
               | tialaramex wrote:
               | > Assembly doesn't have undefined behavior, but obviously
               | you can have all sorts of issues there.
               | 
               | The machine is in the real world and is thus obliged to
               | have some actual behaviour, _but_ it is not always
               | practical to discern what that behaviour would be let
               | alone make it reliable across a product line and document
               | it in an understandable way. As a result actually your
               | CPU's documentation does in effect include "Undefined
               | Behaviour".
        
               | dundarious wrote:
               | True, when writing my comment I wanted to qualify it to
               | the same effect, but thought it would be an unnecessary
               | subtlety to the general thrust of my point. That is, we
               | can ignore this kind of "undefined behavior in the
               | machine itself" for the purposes of this particular
               | discussion.
        
               | tialaramex wrote:
               | I don't see how to ignore it though. If we're defining
               | the behaviour but then our "definition" just doesn't
               | specify the actual behaviour because it's specified in
               | terms of hardware with no clearly defined behaviour for
               | that situation then it's just word play, we're not really
               | doing what I set out.
        
           | tremon wrote:
           | It's only irrelevant if the hardware solution is available on
           | all the supported architectures/systems. As long as it's not,
           | the software version must be maintained anyway, and might
           | suffer from bitrot if it's no longer exercised on the major
           | architectures.
        
       | nullc wrote:
       | Is this protection really all that helpful? Surely there are
       | functions you can call into the top of to do your diabolical
       | deeds for you.
       | 
       | It would be more helpful if callers would store some machine
       | specific hash of the function prototype and the function itself
       | would check the hash, so that you could only redirect to calling
       | a function with the right signature.
       | 
       | But that would also increase the overhead further. Already this
       | is bad enough that it makes jump tables unattractive (which is
       | too bad, considering that jump tables usually have little to no
       | risk of control flow redirection).
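
The signature-hash idea above can be imitated in portable C, as a rough
sketch only. Everything here is hypothetical: the tag constants, the
tagged_fn struct and checked_call_int_int are made-up names, the tags are
arbitrary numbers rather than real hashes of a prototype, and the check
sits at the call site instead of inside the callee purely to keep the
example plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags standing in for "a hash of the function prototype". */
#define SIG_INT_INT   0x9e3779b1u   /* int (*)(int)   */
#define SIG_VOID_VOID 0x85ebca6bu   /* void (*)(void) */

/* A function pointer bundled with the tag of its prototype. */
struct tagged_fn {
    uint32_t sig;
    void (*fn)(void);   /* stored generically; cast back before calling */
};

/* Example target with prototype int (*)(int). */
int twice(int x) { return 2 * x; }

/* Calls t as int (*)(int) only if its tag matches.
   Returns 0 on success and -1 on a mismatch; a real mitigation would
   abort the process instead of returning an error. */
int checked_call_int_int(const struct tagged_fn *t, int arg, int *out) {
    if (t == NULL || t->sig != SIG_INT_INT)
        return -1;
    *out = ((int (*)(int))t->fn)(arg);
    return 0;
}
```

A pointer redirected to a function registered under a different tag is
rejected before the call, which is the property being asked for; the extra
compare on every indirect call is the overhead being worried about.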
        
         | tedunangst wrote:
         | The entire field of ROP exploits would basically never have
         | been developed if it were as simple as just calling the
         | function you want.
        
       | messe wrote:
       | For anybody unfamiliar with this, as I was, this appears to refer
       | to Intel's Indirect Branch Tracking feature[1] (and the
       | equivalent on ARM, BTI). The idea is that an indirect branch can
       | only pass control to a location that starts with an "end branch"
       | instruction. An indirect branch is one that jumps to a location
       | whose value is loaded or computed from either a register or
       | memory address: think calling a function pointer in C.
       | 
       | Without IBT, you'd have this equivalence between C and assembly:
       | 
       |     main() {
       |         void (*f)();
       |         f = foo;
       |         f();
       |     }
       | 
       |     void foo() { }
       | 
       |     ---
       | 
       |     main:
       |         movl $foo, %edx
       |         call *%edx
       |         ret
       | 
       |     foo:
       |         ret
       | 
       | If IBT is enabled, the above code triggers an exception because
       | foo doesn't begin with an "end branch" instruction. When IBT is
       | enabled by the compiler, the above code gets assembled as:
       | 
       |     main:
       |         endbr64
       |         movl $foo, %edx
       |         call *%edx
       |         ret
       | 
       |     foo:
       |         endbr64
       |         ret
       | 
       | Now the compiler inserts endbr64 at the start of each function
       | prologue. The feature is meant as defense in depth against
       | jump-oriented and call-oriented programming (JOP and COP)
       | attacks: the only "widgets" available to an attacker are entire
       | functions, which can be far harder to exploit and chain.
       | 
       | [1]:
       | https://www.intel.com/content/dam/develop/external/us/en/doc...
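
A compilable take on the C example above, as a minimal sketch. The names
foo_calls and call_indirect are made up for illustration. Assuming an
x86-64 GCC or Clang, building with -fcf-protection=branch (the flag
mentioned elsewhere in this thread) and inspecting the assembly should
show endbr64 as the first instruction of foo; that is an assumption
about your toolchain, not something this snippet checks.

```c
#include <stdio.h>

/* Counts how often foo has run, so the indirect call has a visible effect. */
static int foo_calls = 0;

/* Target of the indirect call. */
void foo(void) { foo_calls++; }

/* Calls through a function pointer and returns the running call count.
   When the compiler cannot see the target, the call is emitted as an
   indirect call, which is the kind of branch IBT/BTI polices. */
int call_indirect(void (*f)(void)) {
    f();
    return foo_calls;
}
```

Calling call_indirect(foo) from main is enough to exercise it. An
optimizing build may inline the call and resolve it to a direct one, so
an unoptimized build is the easier place to see the indirect call.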
        
         | asveikau wrote:
         | It was an old joke that the opposite of "goto" is "come from",
         | or that if goto is considered harmful, nobody said anything
         | about a "come from". Marking something as a branch target
         | reminds me of this.
         | 
         | https://en.m.wikipedia.org/wiki/COMEFROM
        
           | dejj wrote:
           | > GOTO considered harmful
           | 
           | COMEFROM considered harm-mitigating
           | 
           | It ingeniously makes Return Oriented Programming (ROP) a lot
           | harder.
        
             | messe wrote:
             | > COMEFROM considered harm-mitigating
             | 
             | You know, that'd be a fantastic OpenBSD release name.
             | 
             | Here's hoping a dev sees this comment; there's already been
             | a few commenting in this thread.
        
         | wongarsu wrote:
         | Interesting. Seems like enforcement on Intel CPUs is supported
         | since Tiger Lake (so ~2020). Windows has basically the same
         | feature implemented in software since 2015, called Control Flow
         | Guard [1]. I wonder what the story there is, and if Windows has
         | any plans to (get everyone to) switch to the hardware version
         | once those CPUs have sufficient market share.
         | 
         | 1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...
        
           | andersa wrote:
           | Windows also recently implemented a far better version of
           | this called Extended Flow Guard (XFG) that not only checks
           | whether the location is a valid destination, but also whether
           | it's a valid destination for that specific source.
           | 
           | For example, for any virtual function call or function
           | pointer call, the destination must have a correct tag with
           | the hash of the arguments. It's much more secure, and also
           | faster, since loading the tag from memory can be merged with
           | loading the actual code after it.
           | 
           | I wish this was the one implemented in hardware..
        
             | simcop2387 wrote:
             | That does sound like it would be more robust, but
             | definitely sounds like it'd require a lot more silicon than
             | the IBT that they did implement. Something like it might be
             | something that comes in some future revisions.
        
         | rwmj wrote:
         | The fun fact being that older CPUs decode ENDBR64 as a slightly
         | weird NOP (with no architectural effects), but it'll fault on
         | original Pentiums:
         | https://stackoverflow.com/questions/56120231/how-do-old-cpus...
        
           | rollcat wrote:
           | Various architectures do other interesting things with NOPs,
           | IIRC one convention on PowerPC had something vaguely related
           | to debugging or tracing (I can't remember the details or find
           | any references right now).
        
             | Someone wrote:
             | https://www.ibm.com/docs/en/aix/7.3?topic=h-hpmstat-command:
             | 
             | "random_samp_ele_crit=name
             | 
             | Specifies the random criteria for selecting the
             | instructions for sampling. Valid values for this option are
             | as follows:
             | 
             | ALL_INSTR
             | 
             | All instructions are eligible. This value is the default
             | setting.
             | 
             | LOAD_STORE
             | 
             | The operation is routed to the Load Store Unit (LSU); for
             | example, load, store.
             | 
             | PROB_NOP
             | 
             | Sample only special no-operation instructions, which are
             | called Probe NOP events.
             | 
             | [...]"
        
             | aidenn0 wrote:
             | Some MIPS cores had a superscalar NOP that would stall
             | every ALU by one cycle, which was necessary because they
             | lacked synchronization instructions.
        
             | monocasa wrote:
             | RISC-V has a whole HINT space that's basically just morphs
             | of load immediate into zero register.
             | 
             | AArch64 has a similar space:
             | https://developer.arm.com/documentation/ddi0596/2020-12/Base...
             | 
             | And yes, PowerPC has a similar space as well holding hints
             | like 'give priority to the other hardware threads on this
             | core' and the like.
             | https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstruc...
        
               | rollcat wrote:
               | I was wondering where I had read about PowerPC, and this
               | is exactly the article! So, it was for thread priority.
               | Strikes me as an odd design choice, this probably
               | should've been something to be managed by the OS more
               | explicitly.
        
             | messe wrote:
             | Not just architectures, but different OSes and ABIs have
             | found ways to repurpose no-ops. One example[1] is Windows
             | using the 2-byte "MOV EDI, EDI" as a hot-patch point: it
             | gets replaced by a "JMP $-5" instruction which jumps 5
             | bytes before the start of a function into a spot reserved
             | for patching. That 5 bytes is enough to contain a full jump
             | instruction that can then jump wherever you need it to.
             | 
             | [1]: "Why do Windows functions all begin with a pointless
             | MOV EDI, EDI instruction?"
             | https://devblogs.microsoft.com/oldnewthing/20110921-00
             | /?p=95...
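             | 
             | A minimal C sketch of the byte-level patch described above,
             | applied to a simulated function image rather than live code.
             | The names hot_patch, write_jmp_rel32 and PAD are
             | hypothetical; the encodings used (8B FF for MOV EDI, EDI,
             | EB F9 for the short jump back, E9 rel32 for the long jump)
             | are standard x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAD = 5 };  /* bytes reserved in front of the function entry */

/* Write "JMP rel32": opcode E9 followed by a little-endian displacement. */
static void write_jmp_rel32(uint8_t *at, int32_t rel)
{
    at[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        at[1 + i] = (uint8_t)((uint32_t)rel >> (8 * i));
}

/*
 * image[0..PAD) is the patch area and image[PAD] is the function entry,
 * which must begin with MOV EDI, EDI. `target` is the offset of the
 * replacement code, measured from image[0].
 */
void hot_patch(uint8_t *image, int32_t target)
{
    assert(image[PAD] == 0x8B && image[PAD + 1] == 0xFF);

    /* Long jump first: its displacement is relative to the address of
     * the following instruction, which is offset PAD. */
    write_jmp_rel32(image, target - PAD);

    /* Then overwrite the 2-byte no-op with a short jump. EB F9 jumps
     * back 7 bytes from the next instruction (entry + 2), landing at
     * entry - 5, the start of the patch area. */
    image[PAD] = 0xEB;
    image[PAD + 1] = 0xF9;
}
```

             | For an image holding five 0x90 padding bytes followed by
             | 8B FF 55 8B EC (the padding value is an assumption),
             | hot_patch(image, 0x1000) leaves E9 FB 0F 00 00 in the patch
             | area and EB F9 at the entry; the rest of the prologue is
             | untouched.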
        
               | pclmulqdq wrote:
               | Intel Vtune will do this with 5-byte NOPs directly. I
               | think LLVM's x-ray tracing suite did this with a much
               | bigger NOP, also, to capture more information.
        
               | gcoakes wrote:
               | Good read. Thank you.
               | 
               | This just worsens my fear of changing "unnecessary" code
               | when I don't know the original motivation for it.
        
               | jeffbee wrote:
               | Interesting, thanks for pointing this out! Just yesterday
               | I was gazing at some program containing two consecutive
               | xor rax, rax. I thought what's the point? But as you
               | point out it might be a NOP sled designed to be that
               | specific length.
        
               | jchw wrote:
               | I wonder if this is still true. Whenever I go to hook
               | Win32 API functions, I use an off-the-shelf length
               | disassembler to create a trampoline with the first n
               | bytes of instructions and a jmp back, and then just patch
               | in a jmp to my hook, but if this hot-patch point exists
               | it'd be a lot less painful since you can avoid basically
               | all of that.
               | 
               | Though, I guess even if it was, it'd be silly to rely on
               | it even on x86 only. Maybe it would still make for a nice
               | fast-path? Dunno.
        
           | mattgreenrocks wrote:
           | That's a really clever use of the opcode space. Thanks for
           | passing that along.
        
           | SomeRndName11 wrote:
           | NOP on intels is in fact xchg eax, eax
        
           | dataflow wrote:
           | There's a good question in the comments there that I still
           | don't see the answer to. How does this work if there's an
           | interrupt between the branch and the endbranch? Does the OS
           | need to save/restore the "branchness" bit?
        
             | drdrey wrote:
             | there is no branchness bit, if there's an endbranch you can
             | jump to it
        
               | dataflow wrote:
               | Ah so when you return from an interrupt, the check is no
               | longer done?
        
               | simcop2387 wrote:
               | I'd assume so since it wouldn't be a call/jmp coming from
               | a computed address in a register. That said I haven't
               | read the documentation for any of this. But interrupts
               | should be having a stack pointer change and other things
               | happening that would be different, which is why they use
               | the IRET instruction and not the RET one.
        
             | muricula wrote:
             | Yes, on arm the branch type is saved in SPSR_EL1 in the
             | BTYPE field. That stands for Saved Program Status Register
             | for Kernel Mode (Exception Level 1) and Branch Type. https:
             | //developer.arm.com/documentation/ddi0595/2021-12/AArc...
        
         | __failbit wrote:
         | Thank you for the explanation!
        
         | haberman wrote:
         | Interesting. I was able to get Clang to generate this using
         | `-fcf-protection=branch`: https://godbolt.org/z/rooP8vPsM
         | 
         | It looks like endbr64 is a 4-byte instruction. That could be a
         | significant code size overhead for jump tables with lots of
         | targets: https://godbolt.org/z/xTPToaddh
        
           | notaplumber1 wrote:
           | OpenBSD disables jump tables in Clang on amd64 due to IBT,
           | some architectures also had jump tables disabled as part of
           | the switch to --execute-only ("xonly") binaries by default,
           | e.g: powerpc64/sparc64/hppa.
           | 
           | https://marc.info/?l=openbsd-cvs&m=168254711511764&w=2
           | 
           | E.g: https://marc.info/?l=openbsd-cvs&m=167337396024167&w=2
        
         | cratermoon wrote:
         | In case anyone wants a very simple introduction to JOP/COP
         | exploits and mitigations of this type:
         | <https://www.theregister.com/2020/06/15/intel_cet_tiger_lake/>
        
         | codedokode wrote:
         | Why should every function start with endbr64 command? Aren't
         | functions usually called directly?
         | 
         | Also, is it required to insert endbr64 command after function
         | calls (for return address)?
        
           | eklitzke wrote:
           | As to why they're not always called directly, imagine some
           | code like this:
           | 
           |     int FooWithoutChecks(void *p);
           | 
           |     int Foo(void *p) {
           |       if (p == NULL) return -1;
           |       return FooWithoutChecks(p);
           |     }
           | 
           | In general the caller is expected to call Foo if they aren't
           | sure if the pointer is nullable, or if they already know that
           | pointer is not null (e.g. because they already checked it
           | themselves) they can call FooWithoutChecks and avoid a null
           | check that they know will never be true.
           | 
           | The naive way to emit assembly for this is to actually emit
           | two separate functions, and have Foo call FooWithoutChecks
           | the usual way. But notice that the FooWithoutChecks function
           | call is a tail call, so the compiler can use tail call
           | optimization. To do this it would inline FooWithoutChecks
           | into Foo itself, so the compiler just emits code for Foo with
           | the logic in FooWithoutChecks inlined into Foo. This is nice
           | because now when you call Foo, you avoid a call/ret
           | instruction, so you save two instructions on every call to
           | Foo. But what if someone calls FooWithoutChecks? Simple, you
           | just call at the offset into Foo just past the pointer
           | comparison. This actually just works because Foo already has
           | a ret instruction, so the call to FooWithoutChecks will just
           | reuse the existing ret. This optimization also saves some
           | space in the binary which has various benefits in and of
           | itself.
           | 
           | The example here with the null pointer check is kind of
           | contrived, but this kind of pattern happens a LOT in real
           | code when you have a small wrapper function that does a tail
           | call to another function, and isn't specific to pointer
           | checks.
        
           | aidenn0 wrote:
           | A traditional compiler needs to insert them for all external
           | functions, because other compilation units may make an
           | indirect call.
        
           | messe wrote:
           | C allows for any function to be called via a function
           | pointer, and functions can be in different translation units,
           | so the compiler can't simply assume that a function will
           | never be called indirectly and has to pessimistically insert
           | endbr64 in order to maintain a reasonable ABI.
           | 
           | And no, as I understand it, this is only for branch/calls not
           | returns.
        
             | Joker_vD wrote:
             | Well, if the function is marked "static", the compiler can
             | actually check whether the function's address is taken in
             | the current compilation unit or not and omit/emit ENDBR64
             | accordingly (passing pointers to static functions to code
             | in other compilation units is legal, and should still
             | work).
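             | 
             | A small C sketch of that distinction. All names here are
             | made up for illustration. add_one has its address taken, so
             | it can be reached through an indirect call; twice is static
             | and only ever called directly, so a compiler is free to
             | leave the landing pad out. Whether a given compiler
             | actually does so (or simply inlines everything) is not
             | something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn)(int);

/* Address is taken in demo(), so this may be an indirect-call target. */
static int add_one(int x) { return x + 1; }

/* Never has its address taken: every call site is a direct call. */
static int twice(int x) { return 2 * x; }

/* The callee is only known at run time, so fn(x) is an indirect call.
 * Under IBT the first instruction at fn must be endbr64. */
int apply(unary_fn fn, int x)
{
    assert(fn != NULL);
    return fn(x);
}

int demo(void)
{
    /* twice() is called directly; add_one() is reached via a pointer. */
    return apply(add_one, twice(20));
}
```

             | Building this with -fcf-protection=branch and reading the
             | generated assembly is the way to see what a particular
             | compiler decides for each function.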
        
               | messe wrote:
               | Good catch. Yeah, as long as the function's address is
               | never taken the compiler has a lot of leeway with static
               | functions; it can even avoid emitting code for them
               | entirely if it can prove they're never called or if it's
               | able to compute their results at compile-time.
        
               | josephg wrote:
               | Yep. Or inline them at every call site if that makes
               | sense to do based on the optimization level and flags.
        
             | MobiusHorizons wrote:
             | Is this theoretically something lto could remove?
        
               | tedunangst wrote:
               | If you disable dlopen and ld_preload.
        
               | codedokode wrote:
               | Dlopen() "sees" only functions marked as exported (with
               | macro like DLLEXPORT on Windows), not every function or
               | am I wrong? Is C that bad?
        
               | tedunangst wrote:
               | On openbsd at least, every global symbol is exported
               | unless you use an explicit symbol list. It's unusual for
               | executables.
        
           | josephcsible wrote:
           | > Why should every function start with endbr64 command?
           | Aren't functions usually called directly?
           | 
           | They're _usually_ called directly, but unless the compiler
           | can prove that they _always_ are (e.g., if they 're static
           | and nothing in the same file takes the address), endbr64 is
           | required.
           | 
           | > Also, is it required to insert endbr64 command after
           | function calls (for return address)?
           | 
           | No, IBT only covers indirect jmp and call. The shadow stack
           | (SS) is the equivalent mechanism for ret.
        
             | derefr wrote:
             | > but unless the compiler can prove that they always are
             | (e.g., if they're static and nothing in the same file takes
             | the address), endbr64 is required
             | 
             | Then why not just have the compiler break down every non-
             | static function into two blocks: a static function that
             | contains all the logic, and a non-static function that just
             | contains an IBT and a direct jump to the static function?
             | (Or, better yet, place the non-static label just before the
             | static one, and have the non-static fall through into the
             | body of the static.) Then the static direct callsites won't
             | have to pay the overhead of executing the IBT NOP.
        
               | Joker_vD wrote:
               | That's absolutely doable, just... How much is predicted
               | unconditional jump slower/faster than ENDBR64? What's the
               | ratio of virtual/static calls in real-world programs? And
               | while your last proposal ("foo: endbr64; foo_internal:
               | <code>") evades those questions, it raises up questions
               | about maintaining function alignment (16 bytes IIRC? Is
               | this even necessary today?) and restructuring the
               | compiler to distinguish the inner/external symbol
               | addresses. Plus, of course, somebody has to actually sit
               | down and write the code to implement that, as opposed to
               | just adding "if (func->is_escaping) emit_endbr(...);" at
               | the beginning of the code that emits the object code for
               | a function body.
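               | 
               | For scale, that one-line change could look roughly like
               | the C sketch below. struct func, is_escaping and
               | emit_function are hypothetical stand-ins for whatever a
               | real compiler back end uses; only the endbr64 encoding
               | (F3 0F 1E FA) is fixed by the ISA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical back-end view of one function. */
struct func {
    bool is_escaping;     /* address taken, or symbol visible elsewhere */
    const uint8_t *body;  /* already-encoded machine code */
    size_t body_len;
};

static const uint8_t ENDBR64[4] = {0xF3, 0x0F, 0x1E, 0xFA};

/* Emit the function into out[0..cap) and return the bytes written.
 * The landing pad is prepended only when the function may be the
 * target of an indirect branch. */
size_t emit_function(const struct func *f, uint8_t *out, size_t cap)
{
    size_t n = 0;
    size_t need = f->body_len + (f->is_escaping ? sizeof ENDBR64 : 0);
    assert(need <= cap);

    if (f->is_escaping) {
        memcpy(out, ENDBR64, sizeof ENDBR64);
        n += sizeof ENDBR64;
    }
    memcpy(out + n, f->body, f->body_len);
    return n + f->body_len;
}
```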
        
               | 95014_refugee wrote:
               | The IBT NOP is "free" in that it will evaporate in the
               | pipeline; it still has to be fetched and decoded to some
               | extent, but it does not consume execution resources.
               | 
               | From a tooling perspective, what you're describing (two
               | entrypoints for a function, the jump you mention is
               | pointless) would require changes up and down the
               | toolchain; it would affect the compiler, all linkers, all
               | debuggers, etc. By contrast, just adding an additional
               | instruction to the function prolog is relatively low-
               | impact.
               | 
               | It's also worth noting that at the time code for a
               | function is emitted, the compiler is not aware of whether
               | the symbol will be exported and thus discoverable in some
               | other module, or by symbol table lookup, so emitting the
               | target instruction is essentially mandatory.
        
               | dzaima wrote:
               | Doesn't seem like it'd be that difficult to make the
               | change the other direction, i.e. keep endbr64 as-is as
               | the default case, but if there's a direct jump/call to
               | anywhere that starts with endbr64, offset the immediate
               | by 4 bytes; could be done in any single stage of
               | toolchain that has that info with no extra help. But
               | yeah, quite low impact, might not even affect decode
               | throughput & cache usage for at least one of the direct
               | or indirect cases.
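               | 
               | Roughly, as a C sketch over a flat code buffer. The name
               | skip_landing_pad and the flat-offset model are
               | assumptions for illustration; a real linker or assembler
               | would work on relocations instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t ENDBR64[4] = {0xF3, 0x0F, 0x1E, 0xFA};

/* Given the offset a direct call/jmp currently targets, return the
 * offset it should use instead: just past the endbr64 if the target
 * starts with one, otherwise unchanged. */
size_t skip_landing_pad(const uint8_t *code, size_t len, size_t target)
{
    assert(target <= len);
    if (len - target >= sizeof ENDBR64 &&
        memcmp(code + target, ENDBR64, sizeof ENDBR64) == 0)
        return target + sizeof ENDBR64;
    return target;
}
```

               | The caller would then recompute the rel32 immediate from
               | the returned offset; indirect callers keep using the
               | original address and still land on the endbr64.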
        
               | tedunangst wrote:
               | What is the overhead of executing the IBT NOP?
        
               | 95014_refugee wrote:
               | It's not "executed" per se. It consumes space in the
               | cache hierarchy, and a slot in the front-end decoder. It
               | won't ever be issued, but depending on the
               | microarchitecture in question it might result in an issue
               | cycle having less occupancy than it might have had in the
               | case where the subsequent instruction was available.
               | 
               | With that said, the first few instructions of a called
               | function often stall due to stack pointer dependencies,
               | etc. so the true execution cost is likely to be even
               | smaller than the above might suggest.
        
               | [deleted]
        
       | binkHN wrote:
       | I still run OpenBSD where I can, especially where security is
       | more important. Yes, it's still missing A LOT of functionality
       | compared to other UNIX-like systems, but security bases tend to
       | be well covered.
        
         | PrimeMcFly wrote:
         | I don't really buy their approach to security honestly. Trying
         | to fix all bugs is great, but they provide little to prevent
         | unknown bugs being exploited (pledge is nice for software that
         | opts in to use it, but otherwise not so much). I'd love to see
         | them implement something like AppArmor with their approach, it
         | would probably be amazing.
         | 
         | I actually think NetBSD is a pretty interesting alternative, it
         | has some nice security features like veriexec that don't get
         | talked about much.
        
           | binkHN wrote:
           | I think in the past they tried to fix all the bugs, and
           | realized they couldn't, so they started to build all sorts of
           | mitigations in the same vein as the one you see posted here
           | today. As for pledge, and the related mitigations, yes,
           | they're not useful if you don't use them, but I see this as
           | them innovating in the space and giving application
           | developers more tools to build hardened applications.
           | 
           | I see tools like AppArmor as band-aids to fix problems that
           | shouldn't exist in the first place. The problem with these
           | approaches is that the band-aids tend to break things in
           | unexpected ways and when that happens they simply get removed
           | and unused.
        
             | PrimeMcFly wrote:
             | > I see tools like AppArmor as band-aids to fix problems
             | that shouldn't exist in the first place.
             | 
             | I fundamentally disagree on that. I think tools like that
             | are amazing at protecting against unknown threats/exploits.
             | They let you lock down software and protect against future
             | unknown exploits, badly behaving software, malicious
             | employees etc. I think something similar should be a part
             | of any OS claiming to be security focused. Basic DAC is
             | woefully insufficient.
             | 
             | On the other hand, the industry has largely found other
             | solutions like sandboxing, but I still think MAC or RBAC or
             | whichever has a place, certainly as part of a defense in
             | depth strategy.
        
           | anthk wrote:
           | OpenBSD has these _on_ when compiling.
        
           | [deleted]
        
           | carlosrg wrote:
           | > they provide little to prevent unknown bugs being exploited
           | 
           | They provide plenty of mitigations
           | (https://www.openbsd.org/innovations.html). In fact OP's
           | article is for preventing unknown bugs from being exploited.
        
             | PrimeMcFly wrote:
             | They don't provide _any_ mitigations of the sort I was
             | clearly referencing. Specifically, for restricting
             | malicious code or users that already have access to the
             | system, exploiting insecure software that was _not_
             | compiled with pledge support.
        
         | MuffinFlavored wrote:
         | > Yes, it's still missing A LOT of functionality compared to
         | other UNIX-like systems
         | 
         | Could you give some examples/samples of things you have run
         | into off the top of your head?
        
           | binkHN wrote:
           | Sure. Poor SMP support (but this has improved heavily over
           | the years), ancient file system, no Bluetooth (not important
           | if you don't need this), reduced performance (due to a lack
           | of optimizations and security mitigations overhead), limited
           | Wi-Fi support (this is for numerous reasons, but it's better
           | than other BSDs)...
           | 
           | I could go on, but, for my needs, it works very well and some
           | of its simplicities are a godsend.
        
         | dark-star wrote:
         | I find OpenBSD's hardware support especially lacking. It
         | doesn't really work that well on at least 3 devices I
         | tried it on (all Dell laptops from various generations, 3-10
         | years old), whereas Linux runs perfectly out-of-the-box on all
         | three.
         | 
         | Which is sad, as I kinda like the *BSD approach to things
        
           | carlosrg wrote:
           | Not my experience at all, it works very well with a new Acer
           | laptop I own: the graphics work (Intel Xe - 12th gen
           | processor), audio, touchpad, keyboard (and special keyboard
           | keys like brightness), wifi... All I had to do was download
           | the firmware with fw_update, nothing more.
           | 
           | Also I was pleasantly surprised to hear they support Apple
           | M1/M2 Macs. Asahi Linux gets a lot of press around here but I
           | had no idea OpenBSD supported it.
        
       ___________________________________________________________________
       (page generated 2023-07-14 23:00 UTC)