[HN Gopher] Someone's Been Messing with My Subnormals ___________________________________________________________________ Someone's Been Messing with My Subnormals Author : jpegqs Score : 260 points Date : 2022-09-06 15:14 UTC (7 hours ago) (HTM) web link (moyix.blogspot.com) (TXT) w3m dump (moyix.blogspot.com) | benreesman wrote: | That's...terrifying. This is a fantastic find: big, big respect | to @moyix, this is going to save people's ass. | olliej wrote: | Wow, I am surprised that -ffast-math triggers a mode switch in | the FPU in part due to the author's library problem, but also | because the documentation for clang at least[1] does not say it | impacts behaviour of denormals and in fact has a separate mode | switch for that, which is not explicitly called out as being | implied by -ffast-math. | | [1] https://clang.llvm.org/docs/UsersManual.html#cmdoption- | ffast... | nsajko wrote: | -Ofast isn't a good name for the option, but in GCC's defense the | manual is pretty clear about all this, and there's no excuse for | blindly turning on compiler options - they literally change the | semantics of your code. | bombcar wrote: | It's a quirk of language, that for compiler writers and other | algorithmic people "fast" often means "ballpark, but damn | quick". | | It's hard to come up with a similar name that isn't long. | cesarb wrote: | > It's hard to come up with a similar name that isn't long. | | The suggestion given elsewhere in these comments to call it | "unsafe math" instead of "fast math" sounds good. It's nearly | as short, and properly conveys the "you must know what you're | doing" aspect of these flags. It's even better if you're used | to Rust. | actually_a_dog wrote: | I agree. I think --ffast-math should actually be called | --finexact-math. One would also hope that explicitly disabling | an option on the command line would, you know, explicitly | disable the option, but maybe that's too much to ask. | mbauman wrote: | I don't think it should exist at all. It's such a crazy grab | bag of code changes disguised as "optimizations" that it's | completely impossible to reason about, even for folks that | "don't care" about the exact floating point arithmetic. | | It has global effects like those in TFA, and even locally you | no longer know if a line or two of arithmetic will become | more precise (e.g., by using higher precision intermediate | results), less precise, or become complete gibberish (e.g., | because it thinks it can prove you're now dividing by zero | and thus can just return whatever it wants). | trelane wrote: | -fyolo-math? | | -fgoodenough-math? | | -fbroken-but-fast-math | mbauman wrote: | I wholeheartedly disagree. -Ofast | Disregard strict standards compliance. ... | | There's strict standards compliance and then there's the crazy | grab bag of code changes that is `-ffast-math`. Further, I'd | say gevent can defensibly say that -ffast-math is okay for them | given what the manual says: -ffast-math | ... it can result in incorrect output for programs that depend | on an exact implementation of IEEE or ISO | rules/specifications for math functions. It may, | however, yield faster code for programs that do not require the | guarantees of these specifications. | | This is 100% on the compiler people. For the option name, the | documentation, and the behavior. | | https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Optio... | nsajko wrote: | Well, how would you improve the docs? Both documentation | entries seem reasonable to me. 
| | That said, I don't see why the -Ofast option even needs to | exist, except for backwards compatibility, as -ffast-math and the | others can (and should IMO) be specified explicitly. | mrguyorama wrote: | The fact that -ffast-math makes no mention that it will | poison any other code executing in your process space is a | huge missing piece of info. As the docs are written, anyone not | doing scientific math might as well turn on that flag, but the | reality is that most people have some code somewhere in | their process that expects fairly sane floating point math | behavior, even if it's just displaying progress bars or | something. | nsajko wrote: | > The fact that -ffast-math makes no mention that it will | poison any other code executing in your process space | | Untrue. The doc entry for -ffast-math says "can result in | incorrect output for _programs_ that depend on an exact | implementation of IEEE or ISO rules/specifications for | math functions". Emphasis mine. | | So they clearly say that the entire program can turn | invalid when -ffast-math is used. | | You and some other people here act like the docs say | "translation unit" or something like that, instead of | "program", but this is simply not the case. | | Furthermore, the entry for -ffast-math points to entries | for suboptions that -ffast-math turns on (located right | below in the man page), e.g. -funsafe-math-optimizations. | These also make clear how dangerous they can be even when | turned on one at a time. | Athas wrote: | Consider the documentation for the similar compiler flag in | the OpenCL specification: | | > -cl-unsafe-math-optimizations | | > Allow optimizations for floating-point arithmetic that | (a) assume that arguments and results are valid, (b) may | violate IEEE 754 standard and (c) may violate the OpenCL | numerical compliance requirements as defined in section 7.4 | for single-precision floating-point, section 9.3.9 for | double-precision floating-point, and edge case behavior in | section 7.5. This option includes the -cl-no-signed-zeros | and -cl-mad-enable options. | | While it stops short of saying "this will likely break your | code" (maybe because it doesn't have the nonlocal effects | of -ffast-math), it makes it much more clear that this flag | is generally unsafe and fragile, except under rather | specific circumstances. Also, it is reasonably exact about | what those circumstances are. I'm not sure -ffast-math is | documented with enough precision for a programmer to even | know whether it will break their code. The best you can do is | try it and see if the program still works. | nsajko wrote: | The relevant GCC man page entries are even more clear | than the OpenCL spec excerpt. | | -ffast-math: | | > This option is not turned on by any -O option besides | -Ofast since it can result in incorrect output for | programs that depend on an exact implementation of IEEE | or ISO rules/specifications for math functions. | | It also points to the -funsafe-math-optimizations | sub-option, where it is said that: | | > Allow optimizations for floating-point arithmetic that | (a) assume that arguments and results are valid and (b) | may violate IEEE or ANSI standards. When used at link | time, it may include libraries or startup files that | change the default FPU control word or other similar | optimizations. [...] | mbauman wrote: | Yes, exactly: I'd deprecate it entirely. It shouldn't be a | single flag. | fweimer wrote: | What's missing is that it also affects linking, and results in | this strange action-at-a-distance.
Maybe disabling the linker | part with -shared would be a reasonable compromise. | nsajko wrote: | You're wrong: both the doc entry for -Ofast and the one for | -ffast-math say that they can result in incorrect _programs_. | Programs are produced by linking, so I don't see what other | way there is to interpret this. | brigade wrote: | Why not simply replace all FP math with a constant zero? | That'd be _really_ fast and an equally valid strict | interpretation of "can result in incorrect programs." | nsajko wrote: | See https://news.ycombinator.com/newsguidelines.html, | e.g.: | | > Please don't post shallow dismissals, especially of | other people's work. A good critical comment teaches us | something. | brigade wrote: | Just because you're shallowly dismissing my comment | doesn't make it wrong. | | Linking in code with undefined (in this case, _re_-defined) | behavior doesn't automatically invalidate the | entire program. But that's the language used because once | the undefined behavior is hit at runtime, the spec no | longer defines what the behavior is and what the program | will do afterwards. | Const-me wrote: | That thread-local MXCSR register is particularly entertaining in | a thread pool environment, such as OpenMP. OSes carefully | preserve that piece of thread state across context switches. | | I tend to avoid touching that value, even when it means extra | instructions like roundpd for a specific rounding mode, or shuffles | to avoid division by 0 in the unused lanes. | mananaysiempre wrote: | Following the article's links, I fail to find an actual example | of anything failing to converge in flush-subnormals mode. I mean, | I'm sure one could be squeezed out, but the justification given | amounts to "Sterbenz's lemma [the one that rephrases | "catastrophic cancellation" as "exact differences"] fails, maybe | something somewhere also will". And my (shallow but not | nonexistent) experience with numerical analysis is that proofs | lump subnormals with underflow, and most of them don't survive | even intermediate underflows. | | (AFAIU the original Intel justification for pushing subnormals | into 754 was gradual underflow, i.e. to give people at least | something to look at for debugging when they've run out of | precision.) | | So, yes, it's not exactly polite to fiddle with floating-point | flag bits that are not yours, and it's better that this not | happen, for reproducibility if nothing else, but I doubt it | actually breaks any interesting numerics. | moyix wrote: | The gevent issue has an example: | | https://github.com/gevent/gevent/pull/1820 | | I haven't examined the code of scipy.stats.skellam.sf, so I | can't say for sure that it's not converging, but it's clearly | some kind of pathological behavior. | mananaysiempre wrote: | So somebody tried to calculate, for integer arguments from 0 | to 99 inclusive, the CDF of the difference of two Poisson | variables with means 4e-6 and 1e-6? I... don't know if it is | at all reasonable to expect an answer to that question. As | in, genuinely don't know--obviously it's an utterly rotten | thing to compute, but at the same time maybe somebody got | really interested in that and figured out a way to make it | work. | | Anyhow, my spelunking was cut off by sleep, so as best I can | tell that would end up in the CDFLIB[1] routine CUMCHN with X | = 8e-6, PNONC = 2e-6, DF from 0 to 99.
The insides don't | really look like the kind of magic that is held up by | Sterbenz's lemma and strategically arranged to take advantage | of gradual underflow, so at first glance I wouldn't trust | anything subnormal-dependent that it would compute, but maybe | it still is? Sleep. | | [1] | https://people.sc.fsu.edu/~jburkardt/f_src/cdflib/cdflib.f90 | moyix wrote: | Yeah, unfortunately I have no idea if that was their | original goal (which seems unlikely?) or if this is just a | minimal example they came up with after tripping over the | actual problem in a more realistic setting. | | I think it suffices to show that the behavior of FTZ/DAZ | caused an actual problem for someone, though. I agree that | the vast majority of numerical code won't care about | FTZ/DAZ, but when it's enabled thread-wide you have no idea | what kind of code you'll end up affecting. | UncleEntity wrote: | My last bug report I wrote a small C++ program to put all | the values between 0x000 .. 0xfff into a tree structure and | then iterate over the tree printing out the values. | | I'd have loved if the library author replied with "why | don't you just print out the values directly?" | leni536 wrote: | Does this only affect pypi, or should I now worry about shared | libraries shipped with my distro as well? Debian is not crazy | enough to ship shared libs compiled with -ffast-math, right? | RIGHT? | moyix wrote: | Please don't do this to me, I don't know if I have it in me to | go on ANOTHER big scrape & scan. | JonChesterfield wrote: | If the package build scripts from upstream have that in them, | Debian packaged versions probably do too | cesarb wrote: | At a previous company I worked at, we had an issue with our | software (Windows-based, written in a proprietary language) | randomly crashing. After some debugging, we found that this | happened whenever the user made some specific actions, but only | if, in that session, the user had previously printed something or | opened a file picker. The culprit was either a printer driver or | a shell extension which, when loaded, changed the floating point | control word to trap. That happened whenever the culprit DLL had | been compiled by a specific compiler, which had the offending | code in the startup routine it linked into every DLL it produced. | | Our solution was the inverse of the one presented in this | article: instead of wrapping our routines to temporarily set the | floating point control word to sane values, we wrapped the calls | to either printing or the file picker, and reset the floating | point control word to its previous (and sane) value after these | calls. | becurious wrote: | Had this exact same problem. It was a specific color inkjet | driver doing this, my guess is to enable dithering or something | similar. It's one of those things that infects everything in | the code base because the way you print with GDI is to | progressively draw parts of the page - so you have to call in | and out of code that talks to the printer DC. We also had to | render one item using Direct3D retained mode and that added to | the fp control word complexity. Things seemed to be more robust | on NT based OSes. | klysm wrote: | That is one hell of a war story - I didn't realize that kind of | failure was even possible, but it is truly terrifying. | pavlov wrote: | Direct3D used to flip the x87 FPU to single precision mode by | default. This produced some amazing bugs when your other C | libraries reasonably assumed that a double would be at least | 64 bits. 
(The FPU mode settings affected the thread that | called Direct3D, and most programs used to be single-threaded.) | | It seems they changed this behavior in Direct3D 10: | | https://microsoft.public.win32.programmer.directx.graphics.n... | speeder wrote: | I stumbled into this bug in a rather spectacular manner. | | I was making a game using D3D, Lua and Chipmunk physics, | and some of the behaviour of the game was odd. | | So I started to print random stuff with Lua; eventually I | just tried print(5+5), and to my surprise my | console output "11". | | I went into Lua's IRC channel to talk about this, and | everyone said I was nuts, that the number was too small to | trigger precision issues, that I was a troll and so on. | | After a lot of searching I found out about this D3D bug, so | I switched the game to use OpenGL instead, and there it was: 5+5 | = 10 again! | | Now why fiddling with the FPU could make 5+5 become 11, I | have no idea. | titzer wrote: | I've heard so many stories akin to this one that I just shake | my head. It's a self-inflicted wound that people who prioritize | _performance_ above other considerations _keep inflicting on | everyone else_. | | I _hope_ we learned our lessons on this specific question in | the design of Wasm. There are subnormals in Wasm and you can't | turn them off for performance. | ack_complete wrote: | Had to deal with this same issue when I had a program | supporting plugins: DLLs compiled with Delphi would turn on all | the floating point traps. Took a while to track down what was | causing FP faults in comctl32.dll. It got so bad that I had to | put in a popup dialog that would name and shame the offending | DLL so the authors would fix their broken plugins. It's an ABI | violation in Windows since the ABI specifically defines FPU | exceptions as masked, so this was more egregious than just | turning on FTZ/DAZ (which Intel-compiled DLLs did). | | Many of these same DLLs would also hijack | SetUnhandledExceptionFilter() for their custom exception | support, which would also result in hard fastfail crashes when | they failed to unhook properly. Ended up having to hotpatch | SetUnhandledExceptionFilter() Detours-style to prevent my crash | reporting filter from being overridden. Years later, Microsoft | revealed that Office had done the same thing for the same | reasons. | | The new version of this problem is DLLs that use AVX | instructions and then don't execute a VZEROALL/VZEROUPPER | instruction before returning. This is more sinister as it | doesn't cause a failure; it just causes SSE2 code to run up to | four times slower in the thread. | astrange wrote: | You could also get an issue with x87/MMX where floating point | code wouldn't work if you wrote some MMX code and didn't | execute an `emms` instruction afterward. | | This is basically the reason compiler autovectorization | doesn't do MMX. | pavon wrote: | Yep, I've encountered floating point flag incompatibilities | when dynamically loading Borland-compiled libraries into | Visual Studio-compiled applications, as well as when using | C++ code via the Java Native Interface. | | It is nice that diverse vendor-specific calling conventions | and ABIs are less common these days. | Xorlev wrote: | I was interested in the last point about AVX instructions, | and found https://john-h-k.github.io/VexTransitionPenalties.html | which discusses the problem. | puffoflogic wrote: | Dynamic linking is the root of all kinds of evil, enough said.
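A minimal sketch of the save-and-restore fix described in the stories above, assuming x86 with SSE. Here untrusted_call is a hypothetical stand-in for whatever printer driver, plugin, or DLL entry point might clobber the FPU state; on Windows the x87 control word deserves the same treatment (e.g. via _controlfp), and fegetenv/fesetenv from <fenv.h> is the portable way to save and restore the whole floating-point environment.

      #include <xmmintrin.h>   /* _mm_getcsr / _mm_setcsr */

      /* Save MXCSR (FTZ is bit 15, DAZ is bit 6, the exception masks are
         bits 7-12), call into the suspicious code, then put the register
         back the way it was. */
      static void call_with_fp_state_restored(void (*untrusted_call)(void))
      {
          unsigned int saved = _mm_getcsr();
          untrusted_call();
          _mm_setcsr(saved);
      }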
| benreesman wrote: | As a default (particularly an effectively _mandatory_ default, | looking at you, glibc) it is indeed insane. | | But for something like a Python extension it's what we've got. | | Which has the ancillary benefit of surfacing stuff like this. | woodruffw wrote: | The content of this post has nothing to do with the specifics | of dynamic linking: it would be just as true if the wheels in | question had static binaries instead. | benreesman wrote: | Eh, somewhere in the middle. Someone else put '-ffast-math' | in a compile line and it poisons FP math far away with no | recompile? | | I believe it's a necessary price in this case, but it does | highlight how suboptimal it is to pay the price in other | cases. | woodruffw wrote: | It's fair to point out that shared objects _surface_ the | problem here, but I don't know if I would lay the blame | with them: the underlying problem is that an FPU control | register isn't (and can't be) meaningfully isolated. Python | needs to use shared objects for loadable extensions, but | the contaminating code might be statically linked into that | shared object. | | (I don't say this because I want to excuse dynamic linking, | which I also generally dislike! Only that I think the | problem is somewhere else in this particular case.) | jeroenhd wrote: | What is the alternative here? To provide a python.so file with | all possible binary Python packages statically linked into it? | You'd need to update it every hour to include all the bugfixes | in every native library yanked in! To recompile Python itself | every time you install a package? Even with a compiler cache | you'd have the Gentoo experience of waiting for ages every time | you try to use the package manager. | | Dynamic linking solves a real problem, especially in this | space. It comes with new problems of its own, but so does the | alternative. | [deleted] | [deleted] | compiler-guy wrote: | -funsafe-math is neither fun nor safe. | kibwen wrote: | I hereby propose that we rename "unsafe-math" to | "ucking-broken-math". | tomrod wrote: | I approve. Let's get someone with authority to make the | change. | black_knight wrote: | I ran Gentoo back in the good old days. The biggest draw was that, | after about a week of compiling, my system ran a lot faster | because of all the compiler optimisations one could enable | when the code only had to work on your own CPU. | | I might be misremembering, but I think fastmath was one of the | flags explicitly warned against in the Gentoo manual. | bombcar wrote: | It was, and people would still use it because "hey, it says | fast". | | The CPU flags were less interesting to me compared to being able | to disable features like X. | p_l wrote: | There was a big warning that it might produce a broken system, | iirc | jeffbee wrote: | ChromeOS is sort of the successor to Gentoo. The images are | built with profile-guided, link-time, and post-link | optimization, and they are targeted to the specific CPU in a | given Chromebook. Every other Linux leaves a large amount of | performance on the table by targeting a common-denominator CPU | that's 20 years old and by not having PGO. | TazeTSchnitzel wrote: | Apple avoid this problem with their OS by having a separate | architecture slice for modern x64 (Haswell+). | yjftsjthsd-h wrote: | It's not a successor; it's a derivative.
And yes, if you're | only targeting specific known hardware then you can and | probably should optimize for it, but most distributions fully | intend to be usable on very nearly any x86(_64) hardware, so | they can't do that. | jerf wrote: | It's also a bit less relevant when everything is so fast. I | used Gentoo on a cheap-for-the-time Pentium 133MHz. Gentoo | was basically the difference between a modestly pleasant | system and an unusably slow system if I tried to run a | standard still-compiled-for-386 distro on it. | | I've long since stopped worrying about it because on the | systems I run, which are not top-of-the-line but aren't | RPis either, it's not worth worrying about anymore for most | programs. At most maybe you should target the one | particular program you use that could use a boost. | yjftsjthsd-h wrote: | Yeah, I don't know the breakdown between better hardware, | better compiler optimizations (even in the default | settings), and less differentiation between processors, | but I've done some minor not-very-scientific tests of | compiling packages with -O3/-march=native/-mtune=native and in my | limited experience it wasn't particularly useful. Like, | not just small benefits, but zero or below-the-noise-floor | benefits in my benchmarks. Obviously this is super | dependent on your workload and maybe hardware; it's an | area where if you care, you _have_ to do your own | testing. | jeffbee wrote: | Tuning for native sometimes makes a difference but not | always. Targeting a platform that is known to have AVX2, | instead of detecting AVX2 at runtime and bouncing through | the PLT, can make a large difference. PGO remains the | largest opportunity. | hackingthelema wrote: | > I might be misremembering, but I think fastmath was one of | the flags explicitly warned against in the Gentoo manual. | | It is, here: | https://wiki.gentoo.org/wiki/GCC_optimization#But_I_get_bett... | TazeTSchnitzel wrote: | Global state is the root of so many evils! FPU rounding mode, FPU | flush-to-zero mode, C locale, errno, and probably some other | things should all be eliminated. The functionality should still | exist, but not as global flags. | leni536 wrote: | At least many of those are thread-local. But not the C locale; it | is truly horrible. | Tyr42 wrote: | Oh man, great job digging through all that. This is exactly the | kind of content I want to see. | | Don't you love your fun safe math? | ChrisRackauckas wrote: | The Julia package ecosystem has a lot of safeguards against | silent incorrect behavior like this. For example, if you try to | add a package binary build which would use fast math flags, it | will throw an error and tell you to repent: | | https://github.com/JuliaPackaging/BinaryBuilderBase.jl/blob/... | | In user code you can do `@fastmath`, but it's at the semantic | level, so it will change `sin` to `sin_fast` but not recurse down | into other people's functions, because at that point you're just | asking for trouble. There are also calls to rename it `@unsafemath` | in Julia, just to make it explicit. In summary, "Fastmath" is | overused and many times people actually want other optimizations | (automatic FMA); people really need to stop throwing global | changes around willy-nilly; and programming languages need to | force people to avoid such global issues both semantically and | within their package ecosystems' norms. | aidenn0 wrote: | Automatic FMA can change the result of operations, so it makes | (some) sense for it to be bundled in with fastmath.
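A small self-contained illustration of that point, assuming a C compiler with C99 hex-float literals: fma() rounds once where the plain expression rounds twice, so the two results can differ.

      #include <math.h>
      #include <stdio.h>

      int main(void)
      {
          double a = 1.0 + 0x1p-27;   /* both exactly representable */
          double b = 1.0 - 0x1p-27;
          /* a*b is exactly 1 - 2^-54, which is not representable as a
             double: a separate multiply rounds it to 1.0, while fma()
             keeps the full product until the single final rounding. */
          printf("a*b - 1.0   = %g\n", a * b - 1.0);      /* 0 */
          printf("fma(a,b,-1) = %g\n", fma(a, b, -1.0));  /* about -5.55e-17 */
          return 0;
      }

Built with something like gcc -O2 -ffp-contract=off demo.c -lm, the two lines differ; with -ffp-contract=fast the compiler is allowed to contract the first expression into an FMA as well (given hardware FMA), which is exactly the kind of automatic contraction being discussed.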
| | ChrisRackauckas wrote: | But if what you want is automatic FMA, then why carry along | every other possible behavior with it? Just because you want | FMA, suddenly NaNs are turned into Infs, subnormal numbers go | to zero, handling of sin(x) at small values is inaccurate, | etc.? To me that's painting numerical handling in way too | broad strokes. FMA also only increases numerical accuracy, | never decreases it, so bundling it with | unsafe transformations makes one uncertain whether accuracy has | improved or decreased. | | For reference, to handle this well we use MuladdMacro.jl, | which is a semantic transformation that turns x*y+z into | muladd expressions, and it does not recurse into functions, so | it does not change the definitions of the functions called inside | the macro scope. | | https://github.com/SciML/MuladdMacro.jl | | This is something that will always increase performance and | accuracy (performance because muladd in Julia is an FMA that | is only applied if hardware FMA exists, effectively never | resorting to a software FMA emulation) because it's targeted | to do only a transformation that has that property. | eigenspace wrote: | This isn't really as valid a comparison as you might think it | is. The results of operations varying is not the problem with | 'fast-math'; the problem is that it can negatively impact | accuracy in catastrophic ways (among other things). | | Sure, automatic FMA can change the result, but to my | knowledge it always gives a _more_ accurate result, not a | less accurate one, and the way in which the results may | differ is bounded. | raymondh wrote: | This is a rockstar-quality post. It is astonishing how much | detective work was involved. | stabbles wrote: | See also https://simonbyrne.github.io/notes/fastmath/ for a | similar story in Julia, where -ffast-math is now banned for | C/C++/Fortran dependencies. | jesse__ wrote: | 10/10 yak shave. Would certainly read again | bee_rider wrote: | A decorator is a nice idea for this. | | I was going to suggest another package that just resets the MXCSR | when imported, but I guess... hypothetically... some function | might actually want the FTZ behavior. | jcranmer wrote: | If you want that behavior, you should explicitly enable and | disable it at the borders of the region where you want that | behavior, rather than screwing over everybody for your own | benefit. | jcranmer wrote: | The problem here is that enabling the FTZ/DAZ flags involves | modifying global (technically thread-local) state, which is | relatively expensive to do. Ideally, you'd want to twiddle these | flags only for code that wants to work in this mode, but given | the relative expense of this operation, it's not entirely | practicable to auto-add twiddling to every function call, and | doing it manually is somewhat challenging because compilers tend | to support accessing the floating-point status rather poorly. | Also, FTZ/DAZ aren't IEEE 754, so there's no portable function | for twiddling these bits as there is for the other rounding-mode or | exception controls. I will note that icc's -fp-model=fast and | MSVC's /fp:fast correctly do not link code with crtfastmath. | | As a side note, this kind of thing is why I think a good title | for fast-math would be "Fast math, or how I learned to start | worrying and hate floating point." | [deleted] | titzer wrote: | I don't think flipping these flags is expensive. Can you | provide a source for that?
AFAICT modern microarchitectures are | going to register-rename that into the u-ops issued to the | functional units, rather than flush the entire ROB. | mrtesthah wrote: | I thought the purpose of Python was to make development simple | and predictable. Needing to track down the compilation and linker | flags of every single shared library reveals the fallacy of this | abstraction. | RodgerTheGreat wrote: | If a language wishes to reap the rewards of a pre-existing | ecosystem, it must pay for the warts and misfeatures of that | ecosystem. Python is deeply dependent on C libraries to achieve | acceptable performance, and this is the price. | magicalhippo wrote: | Denormalized numbers are one reason why you really want to think | carefully if you try to optimize code by rewriting expressions | involving multiplication and division. | | For example, if you have "x = (a / b) * (c / d)", one might think | that rewriting it as "x = (a * c) / (b * d)" will save you a | division and gain you speed. It will and it might, respectively. | | However, it will also potentially break an otherwise safe | operation. If the numbers are _very_ small, but still normal, | then the product (b * d) might result in a denormalized number, | and dividing by it can result in +/- infinity. | | However, the code might guarantee that the ratios (a / b) and (c | / d) are not too small or too large, so that multiplying them is | guaranteed to lead to a useful result. | bee_rider wrote: | Anyway, since there aren't any dependencies between a, b, c, | and d, I would expect the two divisions to end up basically in | parallel in the pipeline. So the critical path is a division | and a multiplication either way. Of course that is just a | guess. | garaetjjte wrote: | > it turns out that when you use -Ofast, -fno-fast-math does not, | in fact, disable fast math. lol. lmao. | | What about -fno-unsafe-math-optimizations? | moyix wrote: | Nope, it still links in crtfastmath:
  |       $ gcc -Ofast -fno-unsafe-math-optimizations -fpic -shared foo.c -o foo.so
  |       $ objdump -j .text --disassemble=set_fast_math foo.so
  |
  |       foo.so:     file format elf64-x86-64
  |
  |       Disassembly of section .text:
  |
  |       0000000000001040 <set_fast_math>:
  |         1040: f3 0f 1e fa             endbr64
  |         1044: 0f ae 5c 24 fc          stmxcsr -0x4(%rsp)
  |         1049: 81 4c 24 fc 40 80 00    orl    $0x8040,-0x4(%rsp)
  |         1050: 00
  |         1051: 0f ae 54 24 fc          ldmxcsr -0x4(%rsp)
  |         1056: c3                      retq
  | Night_Thastus wrote: | Ouch. Two flags that should reasonably stop this, and neither | does. This feels a bit like the time I was told "No, -Wall does | not in fact add all warnings". | speeder wrote: | Wait, it doesn't? O.o | moyix wrote: | Nope. clang has "-Weverything", and gcc has "-Wextra", | both of which go beyond "-Wall". | | https://stackoverflow.com/questions/11714827/how-can-i-turn-... | klysm wrote: | Pain. This is so scuffed
___________________________________________________________________
(page generated 2022-09-06 23:00 UTC)