[HN Gopher] Examples of floating point problems ___________________________________________________________________ Examples of floating point problems Author : grappler Score : 161 points Date : 2023-01-13 14:59 UTC (8 hours ago) (HTM) web link (jvns.ca) (TXT) w3m dump (jvns.ca) | weakfortress wrote: | Used to run into these problems all the time when I was doing | work in numerical analysis. | | The PATRIOT missile error (it wasn't a _disaster_ ) was more due | to the handling of timestamps than just floating point deviation. | There were several concurrent failures that allowed the SCUD to | hit its target. IIRC the clock drift was significant and was | magnified by being converted to a floating point and, | importantly, _truncated_ into a 24 bit register. Moreover, they | weren't "slightly off". The clock drift alone put the missile | considerably off target. | | While I don't claim that floating points didn't have a hand in | this error, it's likely the correct handling of timestamps would | not have introduced the problem in the first place. Unlike the | other examples given, this one is a better example of knowing your | system and problem domain rather than simply forgetting to | calculate a delta or being unaware of the limitations of IEEE | 754. "Good enough for government work" struck again here. | [deleted] | ape4 wrote: | All numbers in JavaScript are floats, unless you make an array | with Int8Array(). https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... | | I wonder if people sometimes make a one element integer array | this way so they can have an integer to work with. | mochomocha wrote: | Regarding denormal/subnormal numbers mentioned as "weird": the | main issue with them is that their hardware implementation is | awfully slow, to the point of being unusable for most computation | cases with even moderate FLOPs. | dunham wrote: | I had one issue where pdftotext would produce different output on | different machines (Linux vs Mac).
It broke some of our tests. | | I tracked down where it was happening (involving an ==), but it | magically stopped when I added print statements or looked at it | in the debugger. | | It turns out the x86 was running the math at a higher precision | and truncating when it moved values out of registers - as soon as | it hit memory, things were equal. MacOS was defaulting to | -ffloat-store to get consistency (their UI library is float | based). | | There were too many instances of == in that code base (which IMO | is a bad idea with floats), so I just added -ffloat-store to the | Linux build and called it a day. | alkonaut wrote: | x86 (x87) FP is notoriously inconsistent because of the 80 bit | extended precision that may or may not be used. In a JITed language | like Java/C# it's even less fun as it can theoretically be | inconsistent even for the same compiled program on different | machines. | | Thankfully the solution to that problem came when x86 (32 bit) | mostly disappeared. | WalterBright wrote: | > NaN/infinity values can propagate and cause chaos | | NaN is the most misunderstood feature of IEEE floating point. | Most people react to a NaN like they'd react to the dentist | telling them they need a root canal. But NaN is actually a very | valuable and useful tool! | | NaN is just a value that represents an invalid floating point | value. The result of any operation on a NaN is a NaN. This means | that NaNs propagate from the source of the original NaN to the | final printed result. | | "This sounds terrible" you might think. | | But let's study it a bit. Suppose you are searching an array for | a value, and the value is not in the array. What do you return | for an index into the array? People often use -1 as the "not | found" value. But then what happens when the -1 value is not | noticed? It winds up corrupting further attempts to use it. The | problem is that integers do not have a NaN value to use for this. | | What's the result of sqrt(-1.0)?
It's not a number, so it's a | NaN. If a NaN appears in your results, you know you've got a | mistake in your algorithm or initial values. Yes, I know, it can | be clumsy to trace it back to its source, but I submit it is | _better_ than having a bad result go unrecognized. | | NaN has value beyond that. Suppose you have an array of sensors. | One of those sensors goes bad (like they always do). What value | do you use for the bad sensor? NaN. Then, when the data is | crunched, if the result is NaN, you know that your result comes | from bad data. Compare with setting the bad input to 0.0. You | never know how that affects your results. | | This is why D (in one of its more controversial choices) sets | uninitialized floating point values to NaN rather than the more | conventional choice of 0.0. | | NaN is your friend! | inetknght wrote: | > _This means that NaNs propagate from the source of the | original NaN to the final printed result._ | | An exception would be better. Then you immediately get at the | first problem instead of having to track down the lifetime of | the observed problem to find the first problem. | insulanus wrote: | Definitely. Unfortunately, language implementations that | guaranteed exceptions were not in wide use at the time. Also, | to have a chance at being implemented on more than one CPU, | it had to work in C and assembly. | pwpwp wrote: | I don't find this convincing. | | > What do you return for an index into the array? | | An option/maybe type would solve this much better. | | > Yes, I know, it can be clumsy to trace it back to its source | | An exception would be much better, alerting you to the exact | spot where the problem occurred. | WalterBright wrote: | > An option/maybe type would solve this much better. | | NaN's are already an option type, although implemented in | hardware. The checking comes for free.
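A minimal Python sketch of the propagation behaviour described above (the sensor readings and the mean helper are invented for illustration):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

# One sensor has gone bad; mark its reading NaN instead of 0.0.
readings = [20.1, 19.8, float("nan"), 20.3]

m = mean(readings)
assert math.isnan(m)  # the NaN propagated: the result is visibly tainted

# A 0.0 sentinel, by contrast, silently skews the average:
assert math.isclose(mean([20.1, 19.8, 0.0, 20.3]), 15.05)
```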
| | > An exception would be much better | | You can configure the FPU to cause an Invalid Operation | Exception, but I personally don't find that attractive. | pwpwp wrote: | Good points! | omginternets wrote: | As far as I'm aware, there's no equivalent to a stack trace | with NaN, so finding the origin of a NaN can be extremely | tedious. | ratorx wrote: | The missing bit is language tooling. The regular floating | point APIs exposed by most languages don't force handling of | NaNs. | | The benefit of the option type is not necessarily just the | extra value, but also the fact that the API forces you | to handle the None value. It's the difference between null | and Option. | | Even if the API was better, I think there's value in | expressing it as Option<FloatGuaranteedToNotBeNaN> which | compiles down to using NaNs for the extra value to keep it | similar to other Option specialisations and not have to | remember about this special primitive type that has option | built in. | jcparkyn wrote: | > NaN's are already an option type, although implemented in | hardware | | The compromise with this is that it makes it impossible to | represent a _non-optional_ float, which leads to the same | issues as null pointers in C++/Java/etc. | | The impacts of NaN are almost certainly not as bad (in | aggregate) as `null`, but it'd still be nice if more | languages had ways to guarantee that certain numbers aren't | NaN (e.g. with a richer set of number types). | jordigh wrote: | Exceptions are actually part of floats, they're called | "signalling nans". | | So technically Python is correct when it decided that 0.0/0.0 | should raise an exception instead of just quietly returning | NaN. Raising an exception is a standards-conforming option. | | https://stackoverflow.com/questions/18118408/what-is-the-dif... | WalterBright wrote: | In practice, I've found signalling NaNs to be completely | unworkable and gave up on them.
The trouble is they eagerly | convert to quiet NaNs, too eagerly. | maximilianburke wrote: | I think the concept of NaNs is sound, but I think relying on | them is fraught with peril, made so by the unobvious test for | NaN-ness in many languages (ie, "if (x != x)"), and the lure of | people who want to turn on "fast math" optimizations which do | things like assume NaNs aren't possible and then dead-code-eliminate | everything that's guarded by an "x != x" test. | | Really though, I'm a fan, I just think that we need better | means for checking them in legacy languages and we need to | entirely do away with "fast math" optimizations. | WalterBright wrote: | I call them "buggy math" optimizations. The dmd D compiler | does not have a switch to enable buggy math. | jordigh wrote: | > What's the result of 1.0/0.0? It's not a number, so it's a | NaN | | It's not often that I get to correct Mr D himself, but 1.0/0.0 | is... | WalterBright wrote: | You're right. I'll fix it. | evancox100 wrote: | Example 7 really got me, can anyone explain that? I'm not sure | how the "modulo" operation would be implemented in hardware, if it is | a native instruction or not, but one would hope it would give a | result consistent with the matching divide operation. | | Edit: x87 has FPREM1 which can calculate a remainder (accurately, | one hopes), but I can't find an equivalent in modern SSE or AVX. | So I guess you are at the mercy of your language's library and/or | compiler? Is this a library/language bug rather than a Floating | Point gotcha? | adrian_b wrote: | This has nothing to do with the definition or implementation of | the remainder or modulo function. | | It is a problem that appears whenever you compose an inexact | function, like the conversion from decimal to binary, with a | function that is not continuous, like the remainder a.k.a. | modulo function.
| | In decimal, 13.716 is exactly 3 times 4.572, so any kind of | remainder must be zero, but after conversion from decimal to | binary that relationship is no longer true, and because the | remainder is not a continuous function its value may be wildly | different from the correct value. | | When you compute with approximate numbers, like floating-point | numbers, as long as you compose only continuous | functions, the error in the final result remains bounded and | smaller errors in inputs lead to a diminished error in the | output. | | However, it is enough to insert one discontinuous function in | the computation chain to lose any guarantee about the | magnitude of the error in the final result. | | The conclusion is that whenever computing with approximate | numbers (which may also use other representations, not only | floating-point) you have to be exceedingly cautious when using | any function that is not continuous. | timerol wrote: | Based on the nearest numbers that floats represent, the two | numbers are Y = 13.715999603271484375 | (https://float.exposed/0x415b74bc) and X = | 4.57200002670288085938 (https://float.exposed/0x40924dd3). | | The division of these numbers is | 2.9999998957049091386350361962468173875300478102103478639802753918, | but the nearest float to that is 3. (Exactly 3.) [2] | | The modulo operation can (presumably) determine that 3X > Y, so | the modulo is Y - 2X, as normal. | | This gives inconsistent results if you don't know that every | float is actually a range, and "3" as a float includes some | numbers that are smaller than 3. | | [1] | https://www.wolframalpha.com/input?i=13.715999603271484375+%...
| [2] https://www.wolframalpha.com/input?i=2.999999895704909138635..., | then https://float.exposed/0x40400000 | svat wrote: | This is useful but note that Python uses 64-bit floats (aka | "double"), so the right values are: | | * "13.716" means 13.7159999999999993037 | (https://float.exposed/0x402b6e978d4fdf3b) | | * "4.572" means 4.57200000000000006395 | (https://float.exposed/0x401249ba5e353f7d) | | * "13.716 / 4.572" means the nearest representable value to | 13.7159999999999993037 / 4.57200000000000006395 which | (https://www.wolframalpha.com/input?i=13.7159999999999993037+...) is | 3.0 (https://float.exposed/0x4008000000000000) | | * "13.716 % 4.572" means the nearest representable value to | 13.7159999999999993037 % 4.57200000000000006395, namely to | 4.5719999999999991758 | (https://www.wolframalpha.com/input?i=13.7159999999999993037+...), | which is 4.57199999999999917577 | (https://float.exposed/0x401249ba5e353f7c) printed as | 4.571999999999999. | | ---------------- | | Edit: For a useful analogy (answering the GP), imagine you're | working in decimal fixed-point arithmetic with two decimal | digits (like dollars and cents), and someone asks you for | 10.01/3.34 and 10.01%3.34. Well, | | * 10.01 / 3.34 is well over 2.99 (it's over 2.997 in fact) so | you'd be justified in answering 3.00 (the nearest | representable value). | | * 10.01 % 3.34 is 3.33 (which you can represent exactly), so | you'd answer 3.33 to that one. | | (For an even bigger difference: try 19.99 and 6.67 to get | 3.00 as quotient, but 6.65 as remainder.) | kilotaras wrote: | Story time. | | Back in university I was taking part in a programming competition. | I don't remember the exact details of the problem, but it was | expected to be solved as a dynamic programming problem with dp[n][n] as | the answer, n < 1000. But, wrangling some numbers around, one could | show that dp[n][n] = dp[n-1][n-1] + 1/n, and the answer was just | the sum of the first N elements of the harmonic series.
Unluckily for us, | the intended solution had worse precision and our solution | failed. | HarryHirsch wrote: | They didn't take into account that floats come with an | estimated uncertainty, and that values that are the same within | the limits of experimental error are identical? That's a really | badly set problem! | kilotaras wrote: | I think in that particular case they just didn't do error | analysis. | | The task was to output the answer with `10^-6` precision, which | their solution didn't achieve. Funnily enough a number of | other teams went the "correct" route and passed (as they were | doing additions in the same order as the original solution). | jordigh wrote: | One thing that pains me about this kind of zoo of problems is | that people often have the takeaway, "floating point is full of | unknowable, random errors, never use floating point, you will | never understand it." | | Floating point is amazingly useful! There's a reason why it's | implemented in hardware in all modern computers and why every | programming language has a built-in type for floats. You should | use it! And you should understand that most of its limitations | are inherent, fundamental mathematical limitations; it is | logically impossible to do better on most of them: | | 1. Numerical error is a fact of life, you can only delay it or | move it to another part of your computation, but you cannot get | rid of it. | | 2. You cannot avoid working with very small or very large things | because your users are going to try, and floating point or not, | you'd better have a plan ready. | | 3. You might not like that floats are in binary, which makes | decimal arithmetic look weird. But doing decimal arithmetic does | not get rid of numerical error, see point 1 (and binary | arithmetic thinks your decimal arithmetic looks weird too). | | But sure, don't use floats for ID numbers, that's always a | problem.
In fact, don't use bigints either, nor any other | arithmetic type for something you won't be doing arithmetic on. | zokier wrote: | > One thing that pains me about this kind of zoo of problems is | that people often have the takeaway, "floating point is full of | unknowable, random errors, never use floating point, you will | never understand it." | | > Floating point is amazingly useful! | | Another thing about floats is they are for the most part actually | very predictable. In particular all basic operations should | produce bit-exact results to the last ulp. Also because they are | a language-independent standard, you generally can get the same | behavior in different languages and platforms. This makes | learning floats properly worthwhile because the knowledge is so | widely applicable. | jsmith45 wrote: | >In particular all basic operations should produce bit-exact | results to the last ulp. | | As long as you are not using a compiler that utilizes x87's | extended precision floats for intermediate calculations, | silently rounding whenever it transfers to memory (that used | to be a common issue), and as long as you are not doing dumb | stuff with compiler math flags. | | Also if you have any code anywhere in your program that relies | on correct subnormal handling, then you need to be absolutely | sure no code is compiled with `-ffast-math`, including in any | dynamically loaded code in your entire program, or your math | will break: https://simonbyrne.github.io/notes/fastmath/#flushing_subnor... | | And of course if you are doing anything complicated with | floating point numbers, there are entire fields of study about | creating numerically stable algorithms, and determining the | precision of algorithms with floating point numbers. | gumby wrote: | > Floating point is amazingly useful! There's a reason why it's | implemented in hardware in all modern computers and why every | programming language has a built-in type for floats.
| | I completely agree with you even though I go out of my way to | avoid FP, and even though, due to what I usually work on, I can | often get away with avoiding FP (often fixed point works -- for | me). | | IEEE-754 is a marvelous standard. It's a short, easy to | understand standard attached to an absolutely mind-boggling | number of special cases or explanations as to why certain | decisions in the simple standard were actually incredibly | important (and often really smart and non-obvious). It's the | product of some very smart people who had, through their | careers, made FP implementations and discovered why various | decisions turned out to have been bad ones. | | I'm glad it's in hardware, and not just because FP used to be | quite slow and different on every machine. I'm glad it's in | hardware because chip designers (unlike most software | developers) are anal about getting things right, and | implementing FP properly is _hard_ -- harder than using it! | [deleted] | carapace wrote: | Floating point is a goofy hacky kludge. | | > There's a reason why it's implemented in hardware in all | modern computers | | Yah, legacy. | | The reason we used it originally is that computers were small | and slow. Now that they're big and fast we could do without it, | except that there is already so much hardware and software out | there that it will never happen. | astrange wrote: | Turning all your fixed-size numeric types into variable-sized | numeric types introduces some really exciting performance and | security issues. (At least if you consider DoS security.) | | I think fixed-point math is underrated though. | dahfizz wrote: | What replacement would you propose? They all have different | tradeoffs. | carapace wrote: | (I just tried to delete my comment and couldn't because of | your reply. Such is life.) | | ogogmad made a much more constructive comment than mine: | https://news.ycombinator.com/item?id=34370745 | | It really depends on your use case.
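As a sketch of the fixed-point idea mentioned above (scaled integer cents; the helper names are invented for illustration):

```python
# Fixed-point money: store cents as integers, format only at the edges.
def to_cents(dollars: int, cents: int) -> int:
    return dollars * 100 + cents

def fmt(c: int) -> str:
    return f"{c // 100}.{c % 100:02d}"  # non-negative amounts only

# 0.10 + 0.20 is exact; no binary representation error is involved.
total = to_cents(0, 10) + to_cents(0, 20)
assert total == 30
assert fmt(total) == "0.30"

# The double version of the same sum is famously not exact:
assert 0.1 + 0.2 != 0.3
```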
| ogogmad wrote: | > And you should understand that most of its limitations are | inherent, fundamental mathematical limitations; it is | logically impossible to do better on most of them | | You can do exact real arithmetic. But this is only done by | people who prove theorems with computers - or by the Android | calculator! https://en.wikipedia.org/wiki/Computable_analysis | | Other alternatives (also niche) are exact rational arithmetic, | computer algebra, and arbitrary precision arithmetic. | | Fixed point sometimes gets used instead of floats because some | operations lose no precision over them, but most operations | still do. | saagarjha wrote: | These are only relevant in some circumstances. For example, a | calculator is typically bounded in the number of operations | you can perform to a small number (humans don't add millions | of numbers). This allows for certain representations that | don't make sense elsewhere. | lanstin wrote: | I wouldn't call computable reals the reals. They are a subset | of measure zero. Perhaps all we sentient beings can aspire to | use, but still short of the glory of the completed infinities | that even one arbitrary real represents. | | One half : ) | jordigh wrote: | In my opinion, that's in the realm of "you can only delay | it". Sure, you can treat real numbers via purely logical | deductions like a human mathematician would, but at some | point someone's going to ask, "so, where is the number on | this plot?" and that's when it's time to pay the fiddler. | | Same for arbitrary-precision calculations like big rationals. | That just gives you as much precision as your computer can | fit in memory. You will still run out of precision, just | later rather than sooner. | ogogmad wrote: | > Same for arbitrary-precision calculations like big | rationals. That just gives you as much precision as your | computer can fit in memory. You will still run out of | precision, later rather than sooner. | | Oh, absolutely.
This actually shows that floats are (in | some sense) more rigorous than more idealised mathematical | approaches, because they explicitly deal with finite | memory. | | Oh, I remembered! There's also interval arithmetic, and | variants of it like affine arithmetic. At least you _know_ | when you're losing precision. Why don't these get used | more? These seem more ideal, somehow. | gugagore wrote: | If x is the interval [-1, 1], the typical implementation | of IA will | | evaluate x-x to [-2, 2] (instead of [0, 0]), and | | evaluate x*x to [-1, 1] instead of [0, 1]. | | Therefore the intervals become too conservative to be | useful. | genneth wrote: | Because the interval, on average, grows exponentially | with the number of basic operations. So it quickly | becomes practically useless. | zokier wrote: | > 3. You might not like that floats are in binary, which makes | decimal arithmetic look weird. But doing decimal arithmetic | does not get rid of numerical error, see point 1 (and binary | arithmetic thinks your decimal arithmetic looks weird too). | | One thing that I suspect trips people up a lot is decimal | string/literal <-> (binary) float conversions, rather than the | floating point math itself. This includes the classic 0.1+0.2 | thing, and many of the problems in the article. | | I think these days using floating point hex strings/literals | more would help a lot. There are also decimal floating point | numbers that people largely ignore despite being standard for | over 15 years. | jordigh wrote: | The only implementation of IEEE754 decimals I've ever seen is | in Python's Decimal package. Is there an easily-available | implementation anywhere else? | zokier wrote: | I don't think Python's Decimal is IEEE 754; instead it's some | sort of arbitrary precision thingy.
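Whatever its conformance status, a quick sketch of what Python's stdlib decimal module does and doesn't buy you (the 16-digit context is an assumption chosen for illustration):

```python
from decimal import Decimal, getcontext

getcontext().prec = 16  # finite working precision, like any FP format

# Values that are exact in base 10 really are exact here:
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
assert 0.1 + 0.2 != 0.3  # unlike binary doubles

# But it is still floating point: 1/3 has no finite decimal expansion.
assert Decimal(1) / Decimal(3) * 3 != Decimal(1)
```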
| | GCC has builtin support for decimal floats: | https://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html | | There are also library implementations floating around, | some of them are mentioned in this thread: | https://discourse.llvm.org/t/rfc-decimal-floating-point-supp... | | decNumber also has Rust wrappers if you are so inclined. | jordigh wrote: | Python's decimal absolutely is IEEE 754 (well, based on | the older standard, which has now been absorbed into IEEE | 754): | | https://github.com/python/cpython/blob/main/Lib/_pydecimal.p... | | Cool, didn't know that gcc had built-in support. But is | it really as incomplete as it says there? | zokier wrote: | Huh, I didn't know it was that close, I'll grant that. | But I'd say still no cigar. | | One of the most elementary requirements of IEEE754 is: | | > A programming environment conforms to this standard, in | a particular radix, by implementing one or more of the | basic formats of that radix as both a supported | arithmetic format and a supported interchange format. | | (Section 3.1.2) | | While you could argue that you may configure Decimal's | context parameters to match those of some IEEE754 format | and thus claim conformance as an arithmetic format, Python | has absolutely no support for the specified interchange | formats. | | To be honest, seeing this I'm a bit befuddled as to why closer | conformance with IEEE754 is not sought. A quick search | found e.g. this issue report on adding an IEEE754- | parametrized context, which is a trivial patch, and it | has been just sitting there for 10 years: | https://github.com/python/cpython/issues/53032 | | Adding code to import/export BID/DPD formats, while maybe | not as trivial, seems a comparatively small task and | would improve interoperability significantly imho. | Lind5 wrote: | AI has already led to a rethinking of computer architectures, in | which the conventional von Neumann structure is replaced by near-compute | and at-memory floorplans.
But novel layouts aren't enough | to achieve the power reductions and speed increases required for | deep learning networks. The industry also is updating the | standards for floating-point (FP) arithmetic. | https://semiengineering.com/will-floating-point-8-solve-ai-m... | dkarl wrote: | I'm not on Mastodon, so I'll share here: I inherited some | numerical software that was used primarily to prototype new | algorithms and check errors for a hardware product that solved | the same problem. It was known that different versions of the | software produced slightly different answers, for seemingly no | reason. The hardware engineer who handed it off to me didn't seem | to be bothered by it. He wasn't using version control, so I | couldn't dig into it immediately, but I couldn't stop thinking | about it. | | Soon enough I had two consecutive releases in hand, which | produced different results, and which had _identical numerical | code_. The only code I had changed that ran during the numerical | calculations was code that ran _between_ iterations of the | numerical parts of the code. IIRC, it printed out some status | information like how long it had been running, how many | calculations it had done, the percent completed, and the | predicted time remaining. | | How could that be affecting the numerical calculations??? My | first thought was a memory bug (the code was in C-flavored C++, | with manual memory management) but I got nowhere looking for one. | Unfortunately, I don't remember the process by which I figured | out the answer, but at some point I wondered what instructions | were used to do the floating-point calculations. The Makefile | didn't specify any architecture at all, and for that compiler, on | that architecture, that meant using x87 floating-point | instructions. | | The x87 instruction set was originally created for floating point | coprocessors that were designed to work in tandem with Intel | CPUs. 
The 8087 coprocessor worked with the 8086, the 287 with the | 286, the 387 with the 386. Starting with the 486 generation, the | implementation was moved into the CPU. | | Crucially, the x87 instruction set includes a stack of eight | 80-bit registers. Your C code may specify 64-bit floating point | numbers, but since the compiled code has to copy those values into | the x87 registers to execute floating-point instructions, the | calculations are done with 80-bit precision. Then the values are | copied back into 64-bit registers. If you are doing multiple | calculations, a smart compiler will keep intermediate values in | the 80-bit registers, saving cycles and gaining a little bit of | precision as a bonus. | | Of course, the number of registers is limited, so intermediate | values may need to be copied to a 64-bit register temporarily to | make room for another calculation to happen, rounding them in the | process. And that's how code interleaved with numerical | calculations can affect the results even if it semantically | doesn't change any of the values. Calculating percent completed, | printing a progress bar -- the compiler may need to move values | out of the 80-bit registers to make room for these calculations, | and when the code changes (like you decide to also print out an | estimated time remaining) the compiler might change which | intermediate values are bumped out of the 80-bit registers and | rounded to 64 bits. | | It was silly that we were executing these ancient instructions in | 2004 on Opteron workstations, which supported SSE2, so I added a | compiler flag to enable SSE2 instructions, and voila, the | numerical results matched exactly from build to build. We also | got a considerable speedup. I later found out that there's a bit | you can flip to force x87 arithmetic to always round results to | 64 bits, probably to solve exactly the problem I encountered, but | I never circled back to try it.
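The 64-bit-versus-80-bit difference is easy to probe with the classic machine-epsilon loop; in Python (whose floats are 64-bit doubles) it lands exactly on binary64's epsilon:

```python
import sys

# Halve eps until adding half of it to 1.0 no longer changes the sum.
eps = 1.0
while 1.0 + eps / 2 != 1.0:
    eps /= 2

assert eps == 2**-52  # epsilon of an IEEE 754 binary64 double
assert eps == sys.float_info.epsilon
# Compiled for x87 with values kept in 80-bit registers, the same
# loop can report the extended format's smaller epsilon instead.
```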
| jordigh wrote: | Oh man, those 80-bit registers on 32-bit machines were weird. I | was very confused as an undergrad when I ran the basic program | to find machine epsilon, and was getting a much smaller epsilon | than I expected on a 64-bit float. Turns out, the compiler had | optimised all of my code to run on registers and I was getting | the machine epsilon of the registers instead. | cratermoon wrote: | Muller's Recurrence is my favorite example of floating point | weirdness. See https://scipython.com/blog/mullers-recurrence/ and | https://latkin.org/blog/2014/11/22/mullers-recurrence-roundo... | lifefeed wrote: | My favorite floating point weirdness is that 0.1 can't be exactly | represented in floating point. | jrockway wrote: | Isn't it equally weird that 1/3 can't be exactly represented in | decimal? | pitaj wrote: | Yep! Too bad humanity has settled on decimal instead of | dozenal (base 12). | kps wrote: | Indeed, 0.1 can be represented exactly in _decimal_ floating | point, and can't be represented in _binary_ fixed point. It's | just that fractional values are currently almost always | represented using binary floating point, so the two get | conflated. | layer8 wrote: | The reason why the 0.1 case is weird (unexpected) is that we | use decimal notation in floating-point constants (in source | code, in formats like JSON, and in UI number inputs), but the | value that the constant actually ends up representing is | really the closest binary number, where in addition the | closeness depends on the FP precision used. If we wrote | FP values in binary or hexadecimal (which some languages | support), the issue wouldn't arise. | dahfizz wrote: | > Javascript only has floating point numbers - it doesn't have an | integer type. | | Can anyone justify this? Do JS developers prefer not having exact | integers, or is this something that everyone just kinda deals | with?
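Part of the answer is that a 64-bit double represents every integer up to 2**53 exactly; a quick Python sketch (Python floats are the same IEEE 754 doubles JS uses):

```python
# A double has a 53-bit significand, so integers up to 2**53 are exact.
assert float(2**53 - 1) == 2**53 - 1
assert float(2**53) == 2**53

# Past that, consecutive integers start to collide:
assert float(2**53) + 1 == float(2**53)  # 2**53 + 1 is not representable
assert float(2**53 + 1) == float(2**53)  # it rounds back down
```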
| thdc wrote: | I believe this is technically inaccurate; while Javascript | groups most of the number values under, well, "number", modern | underlying implementations may resort to performing integer | operations when they recognize it is possible. There are also a | couple of hacks you can do with bit operations to "work" with | integers, although I don't remember them off the top of my head | - they were typically used for truncating and whatnot, mainly as a | performance thing. | | Also there are typed arrays and bigints if we can throw those | in, too. | saagarjha wrote: | The way runtimes optimize arithmetic is an implementation | detail and must conform to IEEE-754. | thdc wrote: | Fair point, I have been taking smis for granted | enriquto wrote: | > not having exact integers | | What do you mean? Floating-point arithmetic is, by design, | exact for small integers. The result of adding 2.0 to 3.0 is | exactly 5.0. This is one of the few cases where it is perfectly | legitimate to compare floats for equality. | | In fact, using 64-bit doubles to represent ints you get way | more ints than using plain 32-bit ints. Thus, choosing doubles | to represent integers makes perfect sense (unless you worry | about wasting a bit of memory and performance). | josefx wrote: | You can use doubles to store and calculate exact integer | values. You just won't get 2^64 integers, instead you get the | range +/-2^53. | deathanatos wrote: | Nowadays, it has BigInt. | | If you're very careful, a double can be an integer type. (A | 53-bit one, I think?) (I don't love this line of thinking. It | has _a lot_ of sharp edges. But JS programmers effectively do | this all the time, often without thinking too hard about it.) | | (And even before BigInt, there's an odd u32-esque "type" in JS; | it's not a real type -- it doesn't appear in the JS type | system, but rather an internal one that certain operations will | be converted to internally.
That's why (0x100000000 | 0) == 0 ; | even though 0x100000000 (and every other number in that | expression, and the right answer) is precisely representable as | an f64. This doesn't matter for JSON decoding, though, ... and | most other things.) | guyomes wrote: | Example 4 mentions that the result might be different with the | same code. Here is an example that is particularly counter- | intuitive. | | Some CPUs have the instruction FMA(a,b,c) = ab + c and it is | guaranteed to be rounded to the nearest float. You might think | that using FMA will lead to more accurate results, which is true | most of the time. | | However, assume that you want to compute a dot product between 2 | orthogonal vectors, say (u,v) and (w,u) where w = -v. You will | write: | | p = uv + wu | | Without FMA, that amounts to two products and an addition between | two opposite numbers. This results in p = 0, which is the | expected result. | | With FMA, the compiler might optimize this code to: | | p = FMA(u, v, wu) | | That is one FMA and one product. Now the issue is that wu is | rounded to the nearest float, say x, which is not exactly -vu. So | the result will be the nearest float to uv + x, which is not | zero! | | So even for a simple formula like this, testing if two vectors | are orthogonal would not necessarily work by testing if the result | is exactly zero. One recommended workaround in this case is to | test if the dot product has an absolute value smaller than a | small threshold. | [deleted] | zokier wrote: | Note that with gcc/clang you can control the auto-use of fma | with compile flags (-ffp-contract=off). It is pretty crazy imho | that gcc defaults to using fma | thxg wrote: | > It is pretty crazy imho that gcc defaults to using fma | | Yes! Different people can make different performance-vs- | correctness trade-offs, but I also think reproducible-by- | default would be better. | | Fortunately, specifying a proper standard (e.g.
-std=c99 or | -std=c++11) implies -ffp-contract=off. I guess specifying | such a standard is probably a good idea independently when we | care about reproducibility. | | Edit: Thinking about it, in the days of 80-bit x87 FPUs, | strictly following the standard (specifically, always | rounding to 64 bits after every operation) may have been | prohibitively expensive. This may explain gcc's GNU mode | defaulting to -ffp-contract=fast. | zokier wrote: | > Edit: Thinking about it, in the days of 80-bit x87 FPUs, | strictly following the standard (specifically, always | rounding to 64 bits after every operation) may have been | prohibitively expensive | | afaik you could just set the precision of x87 to 32/64/80 | bits and there would not be any extra cost to the | operations | lanstin wrote: | In general with reals with any source of error anywhere, this | caution about equality is always correct. The odds of two reals | being equal are zero. | raphlinus wrote: | I have an exception that proves the rule. I thought about | responding to Julia's call, but decided this was too subtle. | But here we go... | | A central primitive in 2D computational geometry is the | orientation problem; in this case deciding whether a point | lies to the left or right of a line. In real arithmetic, the | classic way to solve it is to set up the line equation (so | the value is zero for points on the line), then evaluate that | for the given point and test the sign. | | The problem is of course that for points very near the line, | roundoff error can give the wrong answer; it is in fact an | example of cancellation. The problem has an exact answer, and | can be solved with rational numbers, or in a related | technique detecting when you're in the danger zone and upping | the floating point precision just in those cases. (This | technique is the basis of Jonathan Shewchuk's thesis). | | However, in work I'm doing, I want to take a different | approach.
If the y coordinate of the point matches the y | coordinate of one of the endpoints of the line, then you can | tell orientation exactly by comparing the x coordinates. In | other cases, either you're far enough away that you know you | won't get the wrong answer due to roundoff, or you can | subdivide the line at that y coordinate. Then you get an | orientation result that is not necessarily exactly correct | wrt the original line, but you can count on it being | consistent, which is what you really care about. | | So the ironic thing is that if you had a lint that said, | "exact floating point equality is dangerous, you should use a | within-epsilon test instead," it would break the reasoning | outlined above, and you could no longer count on the | orientations being consistent. | | As I said, though, this is a very special case. _Almost_ | always, it is better to use a fuzzy test over exact equality, | and I can also list times I 've been bitten by that ( | _especially_ in fastmath conditions, which are hard to avoid | when you 're doing GPU programming). | thxg wrote: | Yes, and this is not just a theoretical concern: There was an | article here [1] in 2021 claiming that Apple M1's FMA | implementation had "flaws". There was actually no such flaw. | Instead, the author was caught off guard by the very phenomenon | you are describing. | | [1] https://news.ycombinator.com/item?id=27880461 | kloch wrote: | > if you add very big values to very small values, you can get | inaccurate results (the small numbers get lost!) | | There is a simple workaround for this: | | https://en.wikipedia.org/wiki/Kahan_summation | | It's usually only needed when adding billions of values together | and the accumulated truncation errors would be at an unacceptable | level. | phkahler wrote: | It can also come up in simple control systems. A simple low- | pass filter can fail to converge to a steady state value if the | time constant is long and the sample rate is high. 
| | Y += (X-Y) * alpha * dt | | When dt is small and alpha is too, the right hand side can be | too small to affect the 24bit mantissa of the left. | | I prefer a 16/32bit fixed point version that guarantees | convergence to any 16bit steady state. This happened in a power | conversion system where dt=1/40000 and I needed a filter in the | 10's of Hz. | jbay808 wrote: | This is a very important application and a tougher problem | than most would guess. There is a huge variety of ways to | numerically implement even a simple transfer function, and | they can have very different consequences in terms of | rounding and overflow. Especially if you want to not only | guarantee that it converges to a steady-state, but | furthermore that the steady-state has no error. I spent a lot | of time working on this problem for nanometre-accurate servo | controls. Floating and fixed point each have advantages | depending on the nature and dynamic range of the variable | (eg. location parameter vs scale parameter). | kergonath wrote: | > It's usually only needed when adding billions of values | together and the accumulated truncation errors would be at an | unacceptable level. | | OTOH, it's easy to implement, so I have a couple of functions | to do it easily, and I got quite a lot of use out of them. It's | probably overkill sometimes, but sometimes it's useful. | Aardwolf wrote: | > but I wanted to mention it because: | | > 1. it has a funny name | | Reasoning accepted!
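The Kahan summation mentioned a few comments up can be sketched in a few lines of Python. This is the minimal textbook version (for production code, the stdlib's math.fsum is a more robust alternative):

```python
def kahan_sum(xs):
    """Compensated summation: carry each addition's rounding error
    forward instead of discarding it."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - c            # fold the previous error into the new term
        t = total + y        # big + small: low bits of y are lost here
        c = (t - total) - y  # algebraically recovers exactly what was lost
        total = t
    return total

# A million tiny values vanish entirely in a naive sum, because each
# one is below half an ulp of the running total:
vals = [1e8] + [1e-10] * 1_000_000
print(sum(vals) == 1e8)                            # True: 1e-4 was lost
print(abs(kahan_sum(vals) - (1e8 + 1e-4)) < 1e-6)  # True: recovered
```

The key line is `c = (t - total) - y`: in exact arithmetic it is zero, but in floats it captures the part of `y` that rounding threw away.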
| [deleted] | owisd wrote: | My 'favourite' is that the quadratic formula (-b +- sqrt(b^2 - 4ac))/2a | falls apart when you solve for the positive solution using | floating point for cases where e = 4ac/b^2 is small, the workaround | being to use the binomial expansion -b/2a * (0.5e + 0.125e^2 + O(e^3)) | svat wrote: | If you have only a couple of minutes to develop a mental model of | floating-point numbers (and you have none currently), the most | valuable thing IMO would be to spend them staring at a diagram | like this one: | https://upload.wikimedia.org/wikipedia/commons/b/b6/Floating... | (uploaded to Wikipedia by user Joeleoj123 in 2020, made using | Microsoft Paint) -- it already covers the main things you need to | know about floating-point, namely there are only finitely many | discrete representable values (the green lines), and the gaps | between them are narrower near 0 and wider further away. | | With just that understanding, you can understand the reason for | most of the examples in this post. You avoid both the extreme of | thinking that floating-point numbers are mathematical (exact) | real numbers, and the extreme of "superstition" like believing | that floating-point numbers are some kind of fuzzy blurry values | and that any operation always has some error / is "random", etc. | You won't find it surprising why 0.1 + 0.2 ≠ 0.3, but 1.0 + 2.0 | will always give 3.0, but 100000000000000000000000.0 + | 200000000000000000000000.0 ≠ 300000000000000000000000.0. :-) | (Sure this confidence may turn out to be dangerous, but it's | better than "superstition".) The second-most valuable thing, if | you have 5-10 minutes, may be to go to https://float.exposed/ and | play with it for a while. | | Anyway, great post as always from Julia Evans. Apart from the | technical content, her attitude is really inspiring to me as | well, e.g. the contents of the "that's all for now" section at | the end.
| | The page layout example ("example 7") illustrates the kind of | issue because of which Knuth avoided floating-point arithmetic in | TeX (except where it doesn't matter) and does everything with | scaled integers (fixed-point arithmetic). (It was even worse back | then, before IEEE 754.) | | I think things like fixed-point arithmetic, decimal arithmetic, | and maybe even exact real arithmetic / interval arithmetic are | actually more feasible these days, and it's no longer obvious to | me that floating-point should be the default that programming | languages guide programmers towards. | sacrosancty wrote: | If you have even less time, just think of them as representing | physical measurements made with practical instruments and the | math done with analog equipment. | | The common cause of floating point problems is usually treating | them as a mathematical ideal. The quirks appear at the extremes | when you try to do un-physical things with them. You can't | measure exactly 0 V with a voltmeter, or use an instrument for | measuring the distance to stars and then add a length obtained from | a micrometer without entirely losing the latter's contribution. | svat wrote: | Thanks, I actually edited my post (made the second paragraph | longer) after seeing your comment. The "physical" / "analog" | idea does help in one direction (prevents us from relying on | floating-point numbers in unsafe ways) but I think it brings | us too close to the "superstition" end of the spectrum, where | we start to think that floating-point operations are non- | deterministic, start doubting whether we can rely on (say) | the operation 2.0 + 3.0 giving exactly 5.0 (we can!), whether | addition is commutative (it is, if working with non-NaN | floats) and so on.
| | You could argue that it's "safe" to distrust floating-point | entirely, but I find it more comforting to be able to take at | least some things as solid and reason about them, to refine | my mental model of when errors can happen and not happen, | etc. | | Edit: See also the _floating point isn't "bad" or random_ | section that the author just added to the post | (https://twitter.com/b0rk/status/1613986022534135809). | ogogmad wrote: | Related: In numerical analysis, I found the distinction between | forwards and backwards numerical error to be an interesting | concept. The forwards error initially seems like the only right | kind, but is often impossible to keep small in numerical linear | algebra. In particular, Singular Value Decomposition cannot be | computed with small forwards error. But the SVD can be computed | with small backwards error. | | Also: The JSON example is nasty. Should IDs then always be | strings? | gugagore wrote: | IIRC, forward error: the error between the given answer and the | right answer to the given question. | | Backward error: the error between the given question, and the | question whose right answer is the given answer. | | Easier to parse like this: a small forward error means that you | give an answer close to the right one. | | A small backward error means that the answer you give is the | right answer for a nearby question. | deathanatos wrote: | > _The JSON example is nasty._ | | Specs, vs. their implementations, vs. backwards compat. JSON | just defines a number type, and neither the grammar nor the | spec places limits on it (though the spec does call out exactly | this problem). So the JSON is to-spec valid. But | implementations have limits as to what they'll decode: JS's is | that it decodes to number (a double) by default, and thus, | loses precision. | | (I feel like this issue is pretty well known, but I suppose it | probably bites everyone once.) | | JS does have the BigInt type, nowadays. 
Unfortunately, while | the JSON.parse API includes a "reviver" parameter, the way it | ends up working means that it can't actually take advantage of | BigInt. | | > _Should IDs then always be strings?_ | | That's a decent-ish solution, as it side-steps the interop | issues. String, to me, is not unreasonable for an ID, as you're | not going to be doing math on it. | mikehollinger wrote: | Love it. I actually use Excel, which even power users take for | granted, to highlight that people _really_ need to understand the | underlying system, or that the system needs guard rails to | prevent people from stubbing their toes. Microsoft even had to | write a page explaining what might happen [1] with floating point | weirdness. | | [1] https://docs.microsoft.com/en- | us/office/troubleshoot/excel/f... ___________________________________________________________________ (page generated 2023-01-13 23:00 UTC)
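The JSON ID-truncation problem discussed in the thread is easy to reproduce outside JavaScript. Python's json module parses integers exactly by default, but its parse_int hook can make it mimic JSON.parse's everything-is-a-double behavior:

```python
import json

big_id = 9007199254740993  # 2**53 + 1: one past exact-double territory
doc = json.dumps({"id": big_id})

# Python keeps JSON integers as ints, so the round trip is exact:
print(json.loads(doc)["id"] == big_id)  # True

# Parsing every number as a double, as JavaScript's JSON.parse must,
# silently rounds the ID to the nearest representable value:
print(json.loads(doc, parse_int=float)["id"])  # 9007199254740992.0
```

This is the interop argument for string IDs: the JSON itself is valid either way, but any consumer that decodes numbers into doubles corrupts the low bits.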