[HN Gopher] There are only four billion floats, so test them all... ___________________________________________________________________ There are only four billion floats, so test them all (2014) Author : tape_measure Score : 372 points Date : 2020-10-11 13:19 UTC (9 hours ago) (HTM) web link (randomascii.wordpress.com) (TXT) w3m dump (randomascii.wordpress.com) | peter303 wrote: | I recall hackers building standard password hashes of all eight | character ascii combinations. Then create a reverse lookup table. | But that's 400 trillion entries for the standard 68 character ascii | password subset. | | 400 trillion hashes is not a big computation in the blockchain | era. | SirYandi wrote: | More info: https://en.m.wikipedia.org/wiki/Rainbow_table | ascar wrote: | Previous submission 2017: | | https://news.ycombinator.com/item?id=15252426 | | Can't believe this was already 3 years ago. Interesting read! | | Previous submission 2014: | https://news.ycombinator.com/item?id=7135261 | tape_measure wrote: | Yes, I originally came across this on HN, likely in 2017 based | on my surrounding bookmarks. It has stuck with me ever since | even though I don't do much programming anymore. At the time, I | was a TA in a MATLAB course, so I tried to emphasize that the | code runs on a machine and is fundamentally discrete. (.1 + .2 | == .3) returning false is an easy way into this concept. | | 32 bits (and now 64 as @dragontamer calculated) is a huge input | space. You could verify 4 int_8 or 2 char[2] functions. | jlebar wrote: | We did this in XLA. It runs as part of CI, which I always found | pretty amazing. | https://github.com/tensorflow/tensorflow/blob/bd2608178bca1b... | | Found a lot of bugs this way. | NelsonMinar wrote: | "Several of the ceil functions were implemented by adding 0.5 to | the input value and rounding to nearest." | | What sort of clown would make that mistake and yet think they are | capable of writing a floating point library?
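The quoted bug is easy to reproduce. A minimal Python sketch (the `bad_ceil` name is ours, not the library code the article tested) showing both failure modes the article describes:

```python
import math

def bad_ceil(x):
    # Buggy "ceil" from the article: add 0.5, then round to nearest.
    # Python's round() rounds halfway cases to even, like IEEE 754.
    return round(x + 0.5)

# Exact integers break: adding 0.5 lands on a halfway case.
print(bad_ceil(5.0), math.ceil(5.0))      # 6 5

# Tiny positive values break: x is absorbed by 0.5, and 0.5 then
# rounds to its even neighbor 0, but ceil of any positive x is 1.
print(bad_ceil(1e-20), math.ceil(1e-20))  # 0 1
```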
| Exuma wrote: | Can you give an example of when this wouldn't work? I'm | definitely not qualified to write a float library (lol) but I | also can't think of any cases where that wouldn't work? | brianush1 wrote: | ceil(5) = round(5.5) = 6 | | ceil(5) should be 5 | zelphirkalt wrote: | This is probably about number precision and losing digits, | when you add something with digits only in the beginning like | 0.5 to something small like | 0.1000000000000000000000000000013124142579275927597245205 or | whatever a barely representable floating point number is. | Then the result becomes imprecise and rounding to nearest | will result in the wrong value. | | Such effects and similar should be well known to anyone | taking on the responsibility to develop a floating point | library. | ChuckMcM wrote: | I love this. | | This is another in the "Now that we have computers with | gazillions of bits of RAM and many cores and gazillions of | cycles, what conventional wisdom is no longer wise?" series :-). | | I suggested to a client that they use Sqlite3 for their company | database rather than a disk based database. And they looked at me | shocked, as if I had spoken heresy. I pointed out how many | records they could store on a 64GB machine and that is a _small_ | data center class machine and it sort of boggled their mind. | Adding read-only copies shared from a NAS device gave them perfect | scalability in terms of queries per second (their sales folks had | their own database that they were updating). It was fun to watch | their expression go from horror to wonder to excitement when they | finally implemented it. | dragontamer wrote: | https://yourdatafitsinram.net/ | | The IBM 53-bit quantum compute result from a ton of hard drives | + compute power was also intriguing | (https://www.ibm.com/blogs/research/2019/10/on-quantum- | suprem...). | | 128PiB of disk space needed for a 54-qubit computation on their | system (hard drives).
Fun to see where the supercomputers are | still needed, but smaller qubits can be emulated on normal | computers. 40 qubit calculations probably can be emulated on a | 16TB array of hard drives in RAID0. | kortilla wrote: | The trade-off isn't really size in a lot of cases though. When | stuff is stored in RAM you lose the persistence story. | michaelmcmillan wrote: | Yup! I'm currently running a b2c solution for a client that's | grossing ~$3 million/year using Python + sqlite3 behind nginx. | | I work for another shop that scolds me for even suggesting | using something else than k8s. How can a persistent file system | possibly scale? Lol. Ironically they're grossing -$3 | million/year. | tonetheman wrote: | I am curious about your case: is the traffic mostly read | heavy? Ha wish you could share more. Interesting stuff. | svnpenn wrote: | > I suggested to a client that they use Sqlite3 for their | company database rather than a disk based database. | | Sqlite is a disk based database... | capableweb wrote: | Well, maybe not "disk based" per se. It can be stored on | disk, yeah. But it can also run in memory. I think in this | case it's clear that ChuckMcM meant a machine with 64GB of | RAM (not disk space), therefore they are talking about having | SQLite running in memory, not on disk. | zaroth wrote: | > _For very small numbers (less than about FLT_EPSILON_ * _0.25) | adding 0.5 gives 0.5 exactly, and this then rounds to zero. Since | about 40% of the positive floating-point numbers are smaller than | FLT_EPSILON_ * _0.25 this results in a lot of errors - over 850 | million of them!_ | | I thought Float Epsilon was the smallest possible number which | could be represented by a float? This statement doesn't seem to | make sense... | jcranmer wrote: | When I took numerical analysis, the concept was described as | "machine epsilon", which is the smallest number that can be | added to 1.0 to give a result that is not 1.0.
| | The utility of this metric is that it tells you the best | possible relative error, as you can never guarantee a result | will be closer to the true result (in the domain of | mathematical real numbers as opposed to machine floating-point | numbers) than half of machine epsilon. | | This concept is often extended into discussions of "units in | the last place" (ULPs), which is the magnitude of the value of | the last bit of the mantissa. So ulp(1.0) = ulp(1.5) = machine | epsilon, ulp(2.0) = ulp(3.9) = 2 * machine epsilon, etc. A good | floating-point library will document its accuracy in terms of | ulps, so that we might say that the maximum error of the exp | function is 1 ULP and that of pow might be 6 ULPs. | Sebb767 wrote: | I thought that too, but it's actually the difference between | 1.0f and the next larger number; so it only represents the | smallest difference for the non-dotzero range. | | See here: http://www.cplusplus.com/reference/cfloat/ | MatmaRex wrote: | There are different "epsilon" constants in different languages | that mean different things. In C and C++, FLT_EPSILON is | "Difference between 1 and the least value greater than 1 that | is representable." | (http://www.cplusplus.com/reference/cfloat/). You might be | thinking of the C# constant. | neeeeees wrote: | FLT_EPSILON is the bound on rounding error for certain | operations. Plenty of numbers smaller than it can be | represented in most floating point implementations | jansan wrote: | Javascript guy here, can you help me set up that Beowulf cluster | so I can start testing my 2^64 cases? | draw_down wrote: | If your code works for 1.000000000001 but fails for | 1.000000000002, go back. | MeteorMarc wrote: | For python you cannot use the stdlib for this, but it would work | for numpy float32. | tialaramex wrote: | Many years ago now we built something that used a 60-bit | truncated hash of URIs.
So that's far too small to be comfortable | that you won't run into collisions, but it's big enough that for | modest sizes (hundreds of millions up to billions) a collision is | unlikely. So you can have a slow path for the collision case, so | long as that slow path is correct. For unit testing we needed at | least one collision so we could see that everything behaves as | designed before some real user hits one. We had test data sets | with over a billion URIs in them, but none that (as far as we | could tell) had a collision. | | So I just wrote code to mint nonsense URIs for our system based | on a small integer seed (like https://foo.example/123456789), | then spun up a system with a bunch of RAM (maybe 8-12GB) to | iterate through each seed, storing the hash and seed it was | generated from, until it hit a collision after a few gigabytes of | RAM, then it spat out the two URIs from the colliding seeds. | Bingo, now we had data for unit testing to show that stuff works | even in the unlikely case that two distinct URIs have the same | hash. | | Probably a day's work to build that, and then I'd guess several | hours every month saved debugging weird collision bugs because | our unit tests would now tell us if we'd screwed up and | collisions were slipping past. | kortilla wrote: | This is one of those cases where it would have been easier to | fake the hashing function in the unit tests with something that | collides when you want. What happens if you want a test case | where 3+ URLs hash to the same bucket? | nulptr wrote: | I have a question about how you generated collisions. | | If each seed is a 32-bit int, and each hash is 60 bits, then | you'll have to store 92 bits for each (seed, hash) pair. | | With 2^32 (~4.3 billion) (seed, hash) pairs, you'd need 2^32 * | 92 bits ~= 50 gigabytes. | | But you mention only needing 8-12GB. Is my calculation wrong, | or have I missed something?
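A toy version of that collision hunt, shrunk from 60 bits to 24 so the birthday bound (~2^12 seeds) arrives in milliseconds; the URI pattern and the SHA-256 truncation are stand-ins for whatever the real system used:

```python
import hashlib

def truncated_hash(uri: str, bits: int = 24) -> int:
    # Truncate a strong hash to `bits` bits (the real system kept 60).
    digest = hashlib.sha256(uri.encode()).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

def find_collision(bits: int = 24):
    # Mint nonsense URIs from an integer seed, remembering each
    # truncated hash until two distinct URIs share one.
    seen = {}
    for seed in range(1 << 32):
        uri = f"https://foo.example/{seed}"
        h = truncated_hash(uri, bits)
        if h in seen:
            return seen[h], uri
        seen[h] = uri
    raise RuntimeError("no collision found")

a, b = find_collision()
assert a != b and truncated_hash(a) == truncated_hash(b)
print(a, b)
```

With 60 bits the expected cost rises to ~2^30 stored entries, which is where the 8-12GB of RAM in the story comes from.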
| [deleted] | manulp wrote: | You're looking for a collision, any of them will do, so your | computation doesn't actually need to go through all the | elements. https://en.m.wikipedia.org/wiki/Birthday_attack | Thorrez wrote: | Since the hash is 60 bits, a collision is expected after 2^30 | tries due to the birthday paradox. So by your math this would | be 12.3GB. Although this is ignoring hash table overhead, | which I expect to be at least 2x. Maybe they just got lucky. | nulptr wrote: | I think I found a way to do it with ~8GB. As you mentioned | you only need 2^30 tries due to birthday paradox. | | But, instead of using a hash table to store (seed, hash) | pairs, we can just use a list of only hashes. To generate | the next hash we use current hash. i.e. next_hash = | hash("example.com/" + string(int64(cur_hash))). So now, we | just store 60-bit hash values which takes up 2^30 * 60 bits | = 8gigs. | | The only other problem is detecting if a hash already | exists, which I think can be done using a space efficient | probabilistic hashset like a bloom filter (and when the | bloom filter says the hash already exists, we do a linear | scan to find where it exists and we have a possible | collision). | penteract wrote: | Actually, using that method, you'd need much less than | 8GB, since a collision would mean you'd hit a cycle, so | you could store something like 1 value for every 2^10 | iterations. When you found a collision with one of those, | you could examine the interval more carefully to find | exactly where the collision was. | bluesign wrote: | You can also truncate hash at 30 bits and use it to address | bucket, 12gb then would give you like 3x32 bits, which you | can store your candidates. Then compare full hash with | candidates. | stouset wrote: | Why not just run the test-cases using a 16-bit hash function? | kortilla wrote: | Likely a C programmer with little experience doing testing | with fakes. 
Most C projects I've worked on are horrible at | structuring the code in a way that makes it sane to do | dependency injection or mocking. | AtlasBarfed wrote: | Did you truncate the hash for dataset size considerations? If | you associate data with the hash, I would guess the test data | would dwarf the hash size, so why not do 128 or 256 hashes? | tialaramex wrote: | The hash is for content-based addressing. We're going to | fasten those 60 truncated bits to a 4 bit tag that says this | is a URI in the content-based storage. And then we can use | this 8 byte value to refer to that URI twice, or fifty | thousand times in our system even if the URI is a hundred | bytes long - the only problem is collisions, so we need to be | sure we notice if there is a collision and handle that | correctly everywhere. | wbsun wrote: | Any reason the hash function and the collision handling can't | be separated and mocked for unit tests? What if you decide to | increase the bits of hash values or change the hash algorithm? | seanmcdirmid wrote: | Couldn't you also test the recovery logic with a hasher that | yielded collision more often? It's like take your system, make | it worse (use very dumb hashing), and see how it degrades | and/or breaks? | rocqua wrote: | Seems like that would make it less of an integration test | jmholla wrote: | They said it was for unit tests. | Heliosmaster wrote: | they mention this was a unit test, for which stubbing the | hashing function would have been totally fine | treeman79 wrote: | This! | | Take over a project with thousands of integration tests. | | I trust none of them. Why? Because anything hard to test | has been mocked to oblivion. | | I have api endpoints that have massive complicated | integration tests. I eventually just got source for every | project in company and checked to see what they actually | did with API. | njharman wrote: | > Because anything hard to test has been mocked to | oblivion. | | Agree. Not an integration test. 
If it has mocks in it, it | is not testing the INTEGRATION of/between components. | | Mocks are for Unit tests. Where you want to isolate | testing one single UNIT from all the other crap it | integrates with (which have their own unit tests) After | all these unit tests run, you test the whole shebang | together with integration tests. | jskdvsksnb wrote: | I have the inverse - I started at a place that has | hundreds of integration tests that require a database, k8s, | the whole shebang. No mocking anywhere. | | Tests take an hour to run, are extremely fragile and | flakey. Getting your local dev machine set up to run even | a subset of the tests is a half day ordeal, and it can | change periodically. | | If you have external dependencies that don't version | their APIs well it may be inevitable that you need to run | tests against them. But for developer productivity you | should have a fast/non-flakey path to run local unit | tests with no network dependencies. It's also great for | working on a plane. | 978e4721a wrote: | Cause everyone works mostly on planes, right... | thaumasiotes wrote: | In this case, I think mocking is the right thing to do. | You want to know that you can handle the case where two | hashes collide. But for this purpose, you don't care at | all how the hashes were generated. | | If the website changes its URI hash algorithm, the | "honest" test that just supplies two URIs that happened | to collide under the earlier algorithm will suddenly | become worthless. The mocked test will continue doing | exactly what you always wanted it to do -- exercise the | "what if the hashes collide?" case in your code. | tialaramex wrote: | My preference is always to actually test parts in as close as | possible to the real state, avoiding mocks unless they're | necessary. The more "fake" the system is, the less it tells | you about how your software actually works. | | So since I _could_ generate collisions for the real hash, | that's what I did.
If it had been infeasible, I would of | course have found a different way to test. | | This particular software was written mostly in C (today I | would not write it in C) which makes it even more likely that | introducing never-used-in-production behaviour unexpectedly | perturbs the system. This same system for example ran into an | obscure (and since fixed) libc allocator bug, and a defect in | the Linux kernel, because we were really kicking the shit out | of the memory mapped file I/O with what we were doing. | seanmcdirmid wrote: | The only high fidelity mantra in testing is a bit | misguided. If you only test your system in as close to | production state as possible, you are never going to | encounter stress states that would only very rarely appear | in production. Seeing how your components independently | deal with less ideal components in the system can tell you | a lot about how they might handle rare but bad cases in | production, and allow you to over engineer them a bit more | to avoid that. | | > which makes it even more likely that introducing | never-used-in-production behaviour unexpectedly perturbs the | system. | | Isn't that a good thing? Isn't the goal to deal with the | unexpected in the first place? | jniedrauer wrote: | Tests come with a non-zero cost. They're code that needs | to be maintained. Often complex code, with poor | documentation. I don't think the value provided by | testing states that you will never actually encounter in | production outweighs the cost of maintaining the tests. | tikhonj wrote: | This approach works well until you _do_ encounter a | situation like that in production :). It's worth testing | conservatively because our understanding of our systems | is imperfect and because assumptions that may be valid | _now_ are likely to change in the future.
It's | especially true for code that might operate at scale or | has security considerations, because both of those can | magnify the effects of conditions that occur with | _arbitrarily low probabilities_. | | Personally, I see tests as not just a way to prevent | bugs, but as a way to learn about the system being tested | --both how it behaves _now_ and, continually, how it | behaves in future iterations. In this specific case, | understanding what happens if two URIs hash to the same | value _even if they don't "really" do that_ is an | important bit of information: it tells us not only what | would happen with a real hash collision but also how our | system would behave if there were a bug either in the | hashing algorithm itself or in the code that connects the | hashing algorithm to the component that _uses_ the | hashes. | | What happens if we introduce a caching layer between | where the hash is generated and where it's used, and | there's a cache invalidation bug? I don't know how | realistic that is _in this specific hypothetical_, but | I'm sure there are other performance optimizations with | the same bug potential that I'm not even thinking of. | Changes like that can lead to absolute debugging | _nightmares_! Testing even "impossible" edge cases helps | catch this in a systematic way, without needing full | knowledge of which conditions might matter. | | Of course, none of this talks to the trade-off you | pointed out: test code _does_ have a real cost. That is | absolutely something to consider. I just don't think the | answer is to draw a hard line at "don't test situations | that can't come up in production"--and, perhaps naively, | I think the "real" solution is less about deciding what | is and isn't worth testing and more about reducing the | costs of testing more.
Write application code in a way | that's easy to test and treat your test code _as code_ | --keep it clean, include comments, and invest time in | setup and tooling to make your tests as easy to write and | maintain as possible. | seanmcdirmid wrote: | To achieve "defense in depth," testing components in | degenerate non-production situations (and then over | engineering them to gracefully deal with those | situations) is the only way to get there. Of course, we | aren't building bridges, so maybe it's less important. | TickleSteve wrote: | That's not really a "defense in depth" situation. | | Defense in depth is about layering different types of | security, not targeted testing. | | https://en.wikipedia.org/wiki/Defense_in_depth_(computing | ) | nitrogen wrote: | It's an analogy. Multiple layers of defense against bugs. | kortilla wrote: | > The only high fidelity mantra in testing is a bit | misguided. | | That's understating it. It's incompetence because it's an | excuse to not test a huge surface area of edge cases. | kqr wrote: | > My preference is always to actually test parts in as | close as possible to the real state, avoiding mocks unless | they're necessary. The more "fake" the system is, the less | it tells you about how your software actually works. | | I've heard this before, but never actually had that problem | with fakes in practise. Where can I read more about it? | brianwawok wrote: | Have a production outage. See the code in question is | 100% tested. Scratch head a bit. Vow to fake less next | time. | nitrogen wrote: | Supposedly the software problems with the Starliner | spacecraft were also caused by extensive unit testing | under test harnesses without integration testing, as a | visible example that reached the news. | tshaddox wrote: | "Test as close as possible to the real state" makes sense | as a rule of thumb, but I'm having trouble coming up with a | reason in this specific case how mocking the hash function | could go wrong.
Assuming your testing/mocking library is | reliable, this should be testing precisely what you intend | to test. I guess it's possible that your "more real" test | relies on some hidden behavior, like how your real hash | function utilizes system resources, which could affect | production, but I don't know that tests which are testing | things you're not even aware of is a great thing to | advocate for. | naniwaduni wrote: | The point of the principle is that you _don't know_ how | the mocking can go wrong. Assuming your testing/mocking | library is reliable gives you one more thing that can | unexpectedly go wrong. | | Of course, sometimes getting verisimilar test cases is | difficult enough that you judge it not worth the effort | over mocking it up and dealing with any issues with the | mockup, foreseen and unforeseen. | tshaddox wrote: | I think you generally need to either assume or establish | that your testing environment tends to be sound, just | like you need to assume or establish that your | programming language/compiler/computing environment tends | to be sound. Your application code and test code itself | probably shouldn't be compensating for or attempting to | avoid known or unknown flaws in your testing environment. | kortilla wrote: | That's a stupid principle. If your interface isn't | well-defined enough that switching out the hashing | implementation is easy, your code is too tightly coupled. | | A test handling a hash collision _is not a test of the | hashing algorithm_ so you should be able to switch out | the algorithm to something super predictable. This isn't | a discussion about mocking frameworks - you should be | able to do this by just plugging in a different hash | implementation when you construct the test. | kuiper0x2 wrote: | These types of blanket rules are useful to ensure a 100% | accurate test case.
If an error in production costs me | $15 Million I don't really care if the engineer wastes a | few days to follow a dumb rule. | | Also I can employ a $75k - $100k/year engineer and say | follow this protocol for testing. If I want someone who | can optimize test cases 100% of the time with zero | mistakes ever then I might have to pay $150k - $200k / | year. | Scarblac wrote: | It ensures that if someone changes the hashing algorithm, | your test for the collision mitigation now doesn't test | that anymore even though they should be unrelated. | halter73 wrote: | After going through all that effort to make the test less | "fake", I expect tialaramex verified the collision still | happened as part of the test. | lisper wrote: | Why not just use a hash function that special-cases two | test URLs (or a whole class of test URLs) and produces | collisions for them? You can use that same hash function | in production. | xxxtentachyon wrote: | Or, even simpler, just stub out your hash function to | something that always generates the same value and test with | that. | harpiaharpyja wrote: | I was wondering why GP couldn't have swapped out the hash | function for one that just returns a fixed value, making | everything a collision. Then it would just be a matter of | verifying the system still works under that condition | (ignoring the performance hit). | | You could even measure the performance hit from that degraded | condition and create a heuristic to warn you if something is | wrong with the fast path. | jcims wrote: | I embarrassed myself as a Noogler trying to explain what in my | head seemed like a good tradeoff between hash size (also of | URIs) and collision handling. The process had fixed storage for | a table of known hashes, and every collision was tested for | validity before taking action.
If you could use a truncated | hash, you'd obviously increase your collisions, but if you're | going to validate all collisions anyway, it seemed like you | could increase your coverage substantially (in the 60-bit case | it would have been 4x) while only marginally increasing your | overhead of collision handling. | | There's some kind of sparsity/entropy function involved with | optimizing these two but I'm too clueless to sort it out. | Without the vocabulary to make my case I just got weirded out | by a bunch of smart people staring at me quizzically. Sooo.. | [deleted] | btrask wrote: | It's known as the birthday paradox. There is a probability | table here: https://en.wikipedia.org/wiki/Birthday_paradox#Pr | obability_t... | Arcuru wrote: | That kind of thing happens to me all the time. I'm very often | unable to think of or explain things in the moment. | | Sometimes it helps to write a doc afterwards to get my | thoughts in order and decide if it's worth sharing. | known wrote: | Very difficult due to rounding errors; | JoeAltmaier wrote: | My god this issue has been around for too long. In the '80s | testing folk learned to shotgun the argument space to test | functions, and they've not gone an inch forward since! Or so it | seems. | | A version of the Pentium was released that couldn't divide by 10. | And even back then, testing all the mantissas on a simulator | would have taken only hours. Instead, they masked, marketed and | released a tragically flawed processor. | | Just test the input space exhaustively, folks! It's a computer; | it feels no pain. Work it hard until it fails or works perfectly. | [deleted] | fallingmeat wrote: | maybe I dont understand, but is this article only trying to | suggest exhaustive testing for single precision, single input | functions? obviously this approach is not practical as soon as | there are multiple inputs or wider precision. 
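Exhaustive single-input testing is the tractable case the article advocates. A sketch of the bit-pattern sweep, restricted here to one 100,000-pattern slice of the float32 space (the full loop would be `range(2**32)`), checking a reconstruction of the article's broken ceil against a reference:

```python
import math
import struct

def bad_ceil(x):
    # Buggy "ceil" as the article describes it: add 0.5, round to
    # nearest (Python rounds halfway cases to even, like IEEE 754).
    return round(x + 0.5)

def float32_from_bits(bits):
    # Reinterpret a 32-bit integer as the float32 value it encodes.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Sweep every bit pattern in a slice starting at 1.0 (0x3F800000).
mismatches = 0
for bits in range(0x3F800000, 0x3F800000 + 100_000):
    x = float32_from_bits(bits)
    if bad_ceil(x) != math.ceil(x):
        mismatches += 1
print(mismatches)  # 1: only x == 1.0 disagrees in this slice
```

The same loop body over all 2^32 patterns is what turns up the 850-million-error counts the article reports; in a faster language than Python it finishes in minutes.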
| brianberns wrote: | Not to mention that exhaustive testing only works when you | already know the correct result for each input. | nullc wrote: | In many cases it is practical but people wrongly assume it | isn't. In cases where it's not practical, exhausting a subset can | still be very powerful. | | E.g. for a function that takes two floats (x,y) test every | value of x input with y={0,-0,1,nan,inf,-1,x,-x,2x,x/2,FLT_EPSILON,FLT_MIN,FLT_MAX,random,random,random...}, and then repeat | with x,y swapped. | | A little knowledge of the domain can help pick better special | values, but there is a trade-off in introducing the same blind | spots that permitted bugs in the first place. Still: testing | just one argument on all values because that's all you can | afford is still a lot better than only testing a couple special | values for both arguments. | | Exhaustive testing, like randomized testing catches errors you | weren't even thinking about-- but unlike random testing | exhaustion can reliably catch bugs which are extremely rare. | "Partial exhaustion" can have some of the same benefits. | fallingmeat wrote: | mmm, disagree based on my experience. oh also I think people | on this thread may be misusing the concept of float epsilon. | keep in mind that is not the smallest value of a float. | kardos wrote: | Testing all of the doubles may actually be possible [1]. Even | if that estimate is optimistic, testing all doubles for order | $10k would be feasible for some problems. | | [1] https://news.ycombinator.com/item?id=18109432 | pm215 wrote: | Yes, the article says "Exhaustive testing works brilliantly for | functions that take a single float as input. I used this to | great effect when rewriting all of the CRT math functions for a | game console some years ago. On the other hand, if you have a | function that takes multiple floats or a double as input then | the search space is too big.
In that case a mixture of test | cases for suspected problem areas and random testing should | work." | | I think the point is that exhaustive testing is feasible in | more situations than you might at first think, and that | floating-point arithmetic is a particularly fertile source of | edge conditions and corner cases, so it's a good place to go | for exhaustive testing where you can. | fallingmeat wrote: | fertile indeed!!!! haha | | I should learn to read | zelphirkalt wrote: | Test them all and waste a huge amount of energy and computational | resources! | | I hope people test like that for really critical stuff, but not | for your everyday library for your personal startup or | e-business. Not even the big players need this. | sega_sai wrote: | It is all very nice, but if your function has two floating point | parameters you have 1.6e19 tests to do... | formerly_proven wrote: | Totally doable. That's just a few hours using 100k cores. | TheRealPomax wrote: | Worth it if your function is acutely fundamental to the | operation of your software because your software runs everyone | else's software. You buy some time on a compute cluster with | 1024 cores and 4TB of ram, and off you go, after making sure | you understand your own code and reduce that number back down | to "no it isn't, because two arguments in a given space for | fundamental maths operations, taken together, almost certainly | don't require combinatorial testing" =) | andi999 wrote: | So it only needs 4 Billion times 90s, so is this like never in | a million years? (actually it should be only around 12000 | years, so you could use a cluster of 48000 computers and be | done in 3 months) | Someone wrote: | Distribute it over every smartphone on the planet (about 4 | billion), and the computations will be done in minutes (of | course, that's dependent on the function being tested).
| | There will be orchestration overhead and transmission | latency, but I guess one could get all results in one place | in a day, _if_ one had the ability to do the "run it on every | smartphone" thing. | ChrisLomont wrote: | Then you can prove your algorithm correct using proof libraries | like Z3. | thechao wrote: | I have worked at two companies where we brute-force checked | binary floating point operators. It's annoying, error prone, | boring, and _soooooo_ satisfying. The Holy Grail^TM is brute | force F32-FMA. That one really requires a lot of thought, and | can't be reasonably brute-forced on _anyone's_ budget. | xchaotic wrote: | Articles like this make me wonder how everything even works if | basic math functions can be so wrong and how a simple concept | like floor can have different definitions/ implementations | Frost1x wrote: | It often doesn't to high degrees of precision and reliability. | It works within some tolerance range and those tolerance ranges | are managed or acceptable levels of error (often unnoticed). | | I recently helped move/port one fairly complex numeric model | from a language and environment to another language and | environment and there was an assumption made by upper | management that this was simple and _exact deterministic | results_ would be easily had. | | I warned that this could be an involved process because of | simple variations, lack of equivalent mappings between | environments and languages, some of which pointed out in this | article. | | Management were quite surprised with the level of effort needed | to dig down layers of abstraction and identify all these sort | of variations in numeric libraries from consistent high level | abstractions they were familiar with, and why those high level | abstractions simply didn't exist in the other language and | environment. 
They were additionally surprised at the level of | effort needed to get satisfactory error tolerance ranges | between the two models when having to recreate high level | abstractions identically from the source language and | environment to the target language and environment that simply | didn't exist prior to the work. | | They didn't realize the variation until we performed a detailed | analysis on results and found very small but statistically | distinguishable differences between the two versions. We ended up | identifying a few dozen small assumptions they had made about their | high level abstractions from the source environment that were | false and caused the variation. | pierrebai wrote: | The answer is that the bugs the article claims are corner | cases, most of which will not matter in most practical uses of | floating point maths... and some of which are due to not | emulating a weird IEEE floating point rule about rounding | numbers of the form N.5 to the even number instead of up like | everyone is taught in school. | | So "wrong" as used in the article simply means "not the same as | the IEEE behavior", not "wrong" as in "wildly inaccurate". It | matters if you want 100% exact emulation to get identical | results. It doesn't if you don't and just want speed. | phkahler wrote: | >> The answer is that the bugs the article claims are corner | cases, most of which will not matter in most practical uses | of floating point maths... | | Most bugs are corner cases of one sort or another. At the | application level you can decide how important the corners | are. When you're creating a math library that many others are | going to use, _and_ there is a standard, I think the | expectations should be higher. | CydeWeys wrote: | > some of which are due to not emulating a weird IEEE | floating point rule about rounding numbers of the form N.5 to | the even number instead of up like everyone is taught in | school.
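[Editor's aside: the tie-breaking rule under discussion is easy to see in Python 3, whose built-in round() implements round-half-to-even; the sums below are an illustrative sketch, not from the thread.]

```python
# Python 3's round() rounds ties to the nearest EVEN integer (IEEE 754's
# default rounding mode), not half-up as taught in school:
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])   # [0, 2, 2, 4]

# The schoolbook rule (always round .5 up) drifts upward over repeated
# calculations; ties-to-even cancels the bias on average.
ties = [n + 0.5 for n in range(1000)]      # 0.5, 1.5, ..., 999.5 -- all exact ties
exact = sum(ties)                          # true sum: 500000.0
half_up = sum(n + 1 for n in range(1000))  # schoolbook rounding of each tie
to_even = sum(round(t) for t in ties)      # banker's rounding of each tie
print(exact, half_up, to_even)             # 500000.0 500500 500000
```

Half-up overshoots the true sum by 500; ties-to-even lands on it exactly, since half the ties round down and half round up.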
| | Round-to-even (also known as banker's rounding, because it's | used in finance too) isn't weird at all. It minimizes | rounding error, especially on repeated calculations. The | rounding rule that everyone learns in elementary school is | _bad_ and results in systematically biased computations. We | can and should do better than that flawed approach. This post | explains it in more depth: | https://mathematica.stackexchange.com/a/2120 | SoSoRoCoCo wrote: | You should see how confusing modulo becomes if the method isn't | explicitly specified: | | https://en.wikipedia.org/wiki/Modulo_operation | jlebar wrote: | Six Stages of Debugging | | 1. That can't happen. | | 2. That doesn't happen on my machine. | | 3. That shouldn't happen. | | 4. Why does that happen? | | 5. Oh, I see. | | 6. How did that ever work? | | http://plasmasturm.org/log/6debug/ | marketingPro wrote: | Someone correct me if I'm wrong because it's been years since | my EE class. | | Basic math ends at division. Heck, if you learn how math works | on computers, it feels like a number trick. Those are reliable. | | The issue comes when you need to track never-ending decimals. | Even if every transistor was dedicated to your problem, | you'd still need to round. So even if you have 32 bits of | decimals, there will be rounding sometimes. | | The human has to decide what to do. Instead of converting the | binary to decimal and calling it done, it's more important to | be consistent and predictable. | | The issues described in the article aren't 5/2 = 2 billion. The | article talks about rounding always giving you even numbers, | which you can imagine leads to issues if you are using both | float and logic that involves even and odd numbers. | | Edit: can someone correct me or add details? I'm very | interested, but I'm not taking ENG 240 again... :p | esperent wrote: | Exactly this. The floor function works on binary numbers but | has to round them to a decimal place.
It's not simple, it | just appears to be. | bonoboTP wrote: | Floor has nothing to do with decimal. | erichocean wrote: | The period in "3.14" is called a "decimal." The digits | after the decimal in "3.14" are enumerated as "decimal | places." What each decimal place "means" depends on the | numeric base. | | So, "decimal places" or "rounding to a decimal place" has | nothing to do with base 10 (which is what I think you are | implying): decimal and "decimal places" exist in all | bases, including base 2, and the meaning of each "decimal | place" depends on the numeric base (base 10, base 2, | etc.). | nitrogen wrote: | Sometimes people use "binary point" to distinguish from | decimal point and avoid the confusion of the dec- prefix. | erichocean wrote: | "Fraction point" and "fraction (or fractional) places" | might be a decent alternative that clearly works in any | base, while also conveying its purpose. | CydeWeys wrote: | This is why the big rational data type is so useful on | occasion. When you just want to get one calculation exactly | right, and performance doesn't matter so much because you | aren't doing huge numbers of repeated operations (as in ML), | use a big rational with its arbitrary-precision numerator and | denominator and you don't need to worry about even epsilon | error. | visarga wrote: | > because you aren't doing huge numbers of repeated | operations (as in ML) | | In ML precision doesn't count so much. It's often possible | that we train on 32 bits and deploy the model on 16 bits. In | fact, stochasticity is useful and we add it in by dropout or | other things. You can drop any connection (or millions of | them) in a neural net and get almost the same result. There | have been papers that showed that reducing the number of | bits used to represent weights might even improve the | network because it has a regularising effect. The real | brain is also stochastic. | bonoboTP wrote: | Except if you run exp, log, sqrt, sin, cos, etc.
Then you | immediately leave the rationals. | CydeWeys wrote: | Yes, it definitely depends on what you're doing with the | numbers, but personally I don't recall ever having needed | trigonometric functions in my decade-plus working life | doing standard line-of-business type stuff. | marketingPro wrote: | I can't tell if you are dismissing this, but as an FYI, | trig is used in engineering. | CydeWeys wrote: | I said "it definitely depends on what you're doing with | the numbers". That's not a dismissal; I'm simply pointing | out that neither I nor many if not most other software | engineers use trig on a frequent basis. I'm well aware | that it's used in actual engineering, but that's just not | what most of us here do. | ViralBShah wrote: | We've certainly tested all Float32 inputs to some functions in | Julia's math packages. I'm unable to find the relevant issues | this very moment, otherwise I would certainly post them. | raphlinus wrote: | I suspect https://github.com/JuliaLang/julia/pull/29700 might | be one of them. As the original post states, rounding is (a) | amenable to hardware acceleration, (b) easy to get wrong or | slow, and (c) possible to test exhaustively. | throwaway7281 wrote: | I was once in the position to test 3M possible inputs - that took | only a few minutes. It's a good strategy to keep in mind. | zzzeek wrote: | Crap, now I have 67 million failures to fix. | wtn wrote: | In the early 1980s, truncated floating point calculations caused | the Vancouver Stock Exchange index value to drift down for about | 23 months, until it was shown as less than half the correct | value: | https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Round... | [deleted] | Const-me wrote: | In addition to many correctness issues, that code is relatively | slow. | | SSE4.1 introduced the roundps instruction, which is both fast and | correct. Unlike AVX2 or FMA3, SSE4.1 support is almost universal | by now.
In most of my current projects, I check support on | startup and refuse to run if SSE4.1 ain't supported. Just don't | forget to set the corresponding macros for DirectXMath, Eigen, | etc. | dragontamer wrote: | Things have gotten even faster since then. | | * Target: 64-bit space (2^64). | | * GPUs are more flexible and faster. A Vega64 (a few years old | now) has 4096 shaders ("CudaThreads") at 1.6 GHz. It can search | the entire 32-bit floating point (or integer) space in less than | a second. 1.6 GHz x 4096 shaders == 2^42.5 bitspace per second. | | * People exhaustively searching a space are relatively patient, | and willing to wait months for the results. 60 seconds x 60 | minutes x 24 hours x 30 days == 2^21.3 | | 1.6 GHz x 4096 shaders x 60 seconds x 60 minutes x 24 hours x 30 | days is roughly equal to 2^63.8. We've done it: a month of GPU | effort is all you need to exhaustively search the 64-bit space. | | ------- | | We've entered a world (~3 years ago) where it is reasonable to | brute force the 64-bit space, so long as you're willing to wait | about a month for the results. | | If you're not willing to wait a month, then use 2 GPUs, or 4 | GPUs, or a cluster of GPUs like the DGX A100 (or multiple | clusters). | Keyframe wrote: | Sounds like a small and relatively cheap farm could do it in | hours or minutes even. | dragontamer wrote: | Hmmm... assuming 4 GPUs per node, and 1200W per node, you'll | need 216kW to generate the result in a day. | | But that also assumes that the operation you're testing is | 1-clock tick, which is only true for the most trivial of | calculations. Still, 216kWs per day per "test-case clock | ticks" shows us that the 2^64 exhaustive bitspace is within a | reasonable organization's grasp. | dan-robertson wrote: | I think your units are wrong. You want 216kW days, not | 216kW per day. 216kW days is 18.6 GJ. | antoinealb wrote: | Note that the article is specifically talking about testing | accelerated code.
This is usually using machine-specific | features, meaning it won't run on a graphic card. | | I am also not sure about how fast you are if you need to | transfer data out of the GPU, or to add conditional branching | to the shader code (it used to be that branches were super slow | in GPUs, not sure where they are now). | | I could see it working for some other problems though. | dragontamer wrote: | > meaning it won't run on a graphic card. | | GPUs implement the ceil and round functions. | | Heck, GPUs implement popcount, bit-reverse, and a whole slew | of instructions that CPUs don't. (Mainly bitreverse and | bpermute. Intel x86 only has permute (aka pshufb), but not | the backwards version. BSwap can't emulate bitreverse.) | | > (it used to be that branches were super slow in GPUs, not | sure where they are now). | | Branches are fine. It's divergent branches (in both CPUs and | GPUs) that are slow. A uniform branch will be branch- | predicted on a CPU, while a uniform branch on a GPU will have | all 64 threads (AMD GCN) or 32 threads (NVidia or RDNA) | follow the instruction pointer. | | There's no slowdown with uniform branches, either on CPUs or | GPUs. Both CPUs and GPUs do poorly on divergent branches, but | GPUs do MUCH worse than CPUs. | daveFNbuck wrote: | The article was about testing a specific implementation of | floor, ceil, and round using CPU instructions. You can run | floor, ceil, and round on the GPU, but that won't tell you | whether your new implementation using CPU instructions is | correct. | dragontamer wrote: | I thought the post was about how the 32-bit space could | be exhausted in minutes with common hardware in the year | 2014. Which means that any problem I have which is in the | 32-bit space is a candidate for brute force. | | Given the advances in computers over the past few years: | exhausting the 64-bit (or 2x 32-bit) space is | "reasonable" for the year 2020.
At least, reasonable for | people who are willing to wait a month (on one GPU), or | willing to buy a cluster of GPUs. | masklinn wrote: | > I thought the post was about how the 32-bit space could | be exhausted in minutes with common hardware in the year | 2014. Which means that any problem I have which is in the | 32-bit space is a candidate for brute force. | | No, the post is about exhaustive testing, and that it's | completely _reasonable_ on 32 bit values because it only | takes a few minutes. | | > Given the advances in computers over the past few | years: exhausting the 64-bit (or 2x 32-bit) space is | "reasonable" for the year 2020. | | That's not actually reasonable, it's _feasible_ for very | specific contexts. | | > At least, reasonable for people who are willing to wait | a month (on one GPU), or willing to buy a cluster of | GPUs. | | And whose code specifically can run on GPU. TFA _is about | CPU-optimised code_ , the entire point was to | exhaustively test an SSE3 implementation. How do you do | that on a GPU? | dragontamer wrote: | > That's not actually reasonable, it's feasible for very | specific contexts. | | I was inventing a new hash function for myself, just for | giggles, a few weeks ago. | | I needed a constant: I started by choosing an arbitrary | constant (I started with 0x31415926...), but then I | realized that the 32-bit space, and even 50+ bit spaces, | were feasible on my GPU. | | I ended up rewriting the random-number generator on the | GPU, and searched for the number which caused the largest | number of "avalanche" bit-flips across the entire 32-bit | seed space of the RNG. | | Where an "avalanche" bit flip is 16 bits flipped and | 16 bits remaining the same, after the RNG is applied to | the seed. | | For example: seed = 1 * K, where K == 0xAAAAAAAA, will | cause 16 bit-flips and leave 16 bits the same. | | Repeat for all 32-bit values of seed, searching for the | optimal K.
0xAAAAAAAA wasn't the best number (aka | 10101010... repeating, in binary), but you can see where my | thought process was. | | My RNG was more complicated than just a single multiply, | but I think you can see where the general thought process | was with my above example. | | -------- | | There was a time when constants were arbitrarily chosen | for these kinds of functions. But today, we can literally | test ALL numbers and just pick the best. | | If K is constrained by some other requirements (in my | case: it must be odd, plus a few other tidbits to work | with my RNG), then you shrink the search space from | 64 bits down to something that can be accomplished in | just a few days of search. | | ------------- | | As far as I'm concerned, the blog post is talking about | how modern computers have conquered the 32-bit space. | | My current post is about how the 64-bit space could be feasibly | explored. It wasn't even that long ago that the 64-bit space | was considered cryptographically secure (see the DES crypto | algorithm, or WEP). | mlyle wrote: | No, the article's point is that if you have some code that | operates on a 32-bit value, it's reasonable on most | modern hardware to try all 2^32 values to ensure quality. | | There may be some GPU code that is reasonable to test on | all 2^64 values, but it doesn't help us verify the | implementation of SSE (Intel architecture) ceil/round/etc | against reference implementations at all, because trying | all double precision values would not complete in a | practical amount of time, and offloading it to a GPU means | we are not testing the SSE implementation that we're | trying to verify anymore. | a1369209993 wrote: | > trying all double precision values would not complete | in a practical amount of time | | Actually we're getting close-ish: AVX512 (=8x64) at 4GHz | (or two dependency chains at 2GHz) times 64 CPU cores | (which TFA's author has mentioned having) is 2 teraops.
| Double-precision space of 2^64 / 2Tops is 3 | months/instruction on a single computer. So you could | brute-force verify a simple function in a year or two, | and at least the SIMD width and number of cores is likely | to continue increasing. | daveFNbuck wrote: | The post was about exhausting the space to test code. The | example they used of code that should have been tested | this way was CPU code. If you're testing CPU code, | exhausting the space quickly on a GPU isn't helpful. | xigency wrote: | > Note that the article is specifically talking about testing | accelerated code. This is usually using machine-specific | features, meaning it won't run on a graphic card. | | I think there are a lot of computing algorithms you might run | on a GPU where you might want to verify the correctness over | the range of 32-bit or 64-bit numbers (integers or floating | point). | OscarCunningham wrote: | You could also use this to verify 32-bit functions with two | arguments. | vortico wrote: | Does anyone have code to reduce the search space by a few orders | of magnitude, to test functions with a float64 argument (or two or | three float32 arguments) in a few seconds? Perhaps by skipping | "boring" floats far from endpoints, which are unlikely to | produce the errors that more interesting floats do? | rigtorp wrote: | Yes, I've done this using libFuzzer. Supply custom mutation and | crossover functions that generate interesting floating point | numbers. Then let the coverage-guided fuzzer exercise your | code. https://rigtorp.se/fuzzing-floating-point-code/ | daveFNbuck wrote: | That's essentially what fuzz testing libraries do. You tell | the library what types of arguments you want, write your test, | and it puts in some known corner cases and a few random values | that change each time you run the test (the better libraries | save failing values as new corner cases). | TheRealPomax wrote: | all floats are equally boring.
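[Editor's aside: the "interesting" values a float-aware fuzzer seeds with tend to be IEEE 754 landmarks. The bit patterns below are standard single-precision boundary values, but the selection itself is an illustrative sketch — not taken from libFuzzer or any particular tool.]

```python
import math
import struct

def f32(bits):
    # Reinterpret a 32-bit pattern as an IEEE 754 single-precision float
    return struct.unpack('<f', struct.pack('<I', bits))[0]

# Illustrative seed list for fuzzing float32 code: edge-of-range and
# boundary patterns where implementations most often go wrong.
INTERESTING = [
    f32(0x00000000),  # +0.0
    f32(0x80000000),  # -0.0 (compares equal to +0.0, but a distinct value)
    f32(0x00000001),  # smallest positive subnormal, 2**-149
    f32(0x007FFFFF),  # largest subnormal
    f32(0x00800000),  # smallest positive normal, 2**-126
    f32(0x3F7FFFFF),  # largest float below 1.0
    f32(0x3F800000),  # 1.0
    f32(0x3F800001),  # smallest float above 1.0
    f32(0x4B7FFFFF),  # 16777215.0, the largest odd integer in float32
    f32(0x7F7FFFFF),  # FLT_MAX, ~3.4e38
    f32(0x7F800000),  # +infinity
    f32(0x7FC00000),  # a quiet NaN
]
INTERESTING += [-x for x in INTERESTING[2:]]  # and their negations
```

A fuzzer can mix these with purely random bit patterns; the seeds steer it toward the subnormal, tie-breaking, and overflow boundaries where exhaustive testing finds most of its bugs.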
| | vortico wrote: | No, -0.f is more interesting than 4.1045857301e23f. It's more | interesting because it's more likely to fail a test where | others don't. | ramshorns wrote: | Well, if the ceil function failed on 1.0f, maybe even the | ordinary numbers are worth testing. | | But... would randomness help? Take a representative sample of | like 1/64th of the double space and you might run into most of | the common problems. | hinkley wrote: | I decided to build a scatter plot to demonstrate what I was 95% | sure was a data loss bug (preproduction, thankfully) in the | initializer for a CSPRNG. | | I figured plotting a few bytes of output (256x256) would do | nicely. I plotted all possible two-byte inputs just to make sure | it was working, and that ran instantly, so I tried 3 bytes and | that was still pretty fast, so I did 4 while I was at lunch or in | a meeting. | | Sure enough, big black splotches from dropping some of the seed | data on the floor. Please to be fixing. | | Old farts know a lot of things have been tried before and how | those worked out. But we also think doing a complex operation 4 | billion times is going to be irretrievably slow. Always test your | assumptions. | elvis70 wrote: | "When in doubt, use brute force" -- Ken Thompson | IncRnd wrote: | > I've written in great detail about how to compare floating- | point values using an epsilon, but there are times when it is | just not appropriate. Sometimes there really is an answer that is | correct, and in those cases anything less than perfection is just | sloppy. | | Certain applications will choose fixed point to avoid what | appears to be the imprecision of floating point. ___________________________________________________________________ (page generated 2020-10-11 23:00 UTC)