[HN Gopher] There are only four billion floats, so test them all... ___________________________________________________________________ There are only four billion floats, so test them all (2014) Author : tape_measure Score : 372 points Date : 2020-10-11 13:19 UTC (9 hours ago) (HTM) web link (randomascii.wordpress.com) (TXT) w3m dump (randomascii.wordpress.com) | peter303 wrote: | I recall hackers building standard password hashes of all eight | character ascii combinations. Then create a reverse lookup table. | But that's 400 trillion entries for the standard 68 character ascii | password subset. | | 400 trillion hashes is not a big computation in the blockchain | era. | SirYandi wrote: | More info: https://en.m.wikipedia.org/wiki/Rainbow_table | ascar wrote: | Previous submission 2017: | | https://news.ycombinator.com/item?id=15252426 | | Can't believe this was already 3 years ago. Interesting read! | | Previous submission 2014: | https://news.ycombinator.com/item?id=7135261 | tape_measure wrote: | Yes, I originally came across this on HN, likely in 2017 based | on my surrounding bookmarks. It has stuck with me ever since | even though I don't do much programming anymore. At the time, I | was a TA in a MATLAB course, so I tried to emphasize that the | code runs on a machine and is fundamentally discrete. (.1 + .2 | == .3) returning false is an easy way into this concept. | | 32 bits (and now 64 as @dragontamer calculated) is a huge input | space. You could verify 4 int_8 or 2 char[2] functions. | jlebar wrote: | We did this in XLA. It runs as part of CI, which I always found | pretty amazing. | https://github.com/tensorflow/tensorflow/blob/bd2608178bca1b... | | Found a lot of bugs this way. | NelsonMinar wrote: | "Several of the ceil functions were implemented by adding 0.5 to | the input value and rounding to nearest." | | What sort of clown would make that mistake and yet think they are | capable of writing a floating point library?
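The quoted bug is easy to reproduce. A minimal Python sketch (the `bad_ceil` name is ours, not the library code the article tested) showing both failure modes the article describes:

```python
import math

def bad_ceil(x):
    # Buggy "ceil" from the article: add 0.5, then round to nearest.
    # Python's round() rounds halfway cases to even, like IEEE 754.
    return round(x + 0.5)

# Exact integers break: adding 0.5 lands on a halfway case.
print(bad_ceil(5.0), math.ceil(5.0))      # 6 5

# Tiny positive values break: x is absorbed by 0.5, and 0.5 then
# rounds to its even neighbor 0, but ceil of any positive x is 1.
print(bad_ceil(1e-20), math.ceil(1e-20))  # 0 1
```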
| Exuma wrote: | Can you give an example of when this wouldn't work? I'm | definitely not qualified to write a float library (lol) but I | also can't think of any cases where that wouldn't work? | brianush1 wrote: | ceil(5) = round(5.5) = 6 | | ceil(5) should be 5 | zelphirkalt wrote: | This is probably about number precision and losing digits, | when you add something with digits only in the beginning like | 0.5 to something small like | 0.1000000000000000000000000000013124142579275927597245205 or | whatever a barely representable floating point number is. | Then the result becomes imprecise and rounding to nearest | will result in the wrong value. | | Such effects and similar should be well known to anyone | taking on the responsibility to develop a floating point | library. | ChuckMcM wrote: | I love this. | | This is another in the "Now that we have computers with | gazillions of bits of RAM and many cores and gazillions of | cycles, what conventional wisdom is no longer wise?" series :-). | | I suggested to a client that they use Sqlite3 for their company | database rather than a disk based database. And they looked at me | shocked, as if I had spoken heresy. I pointed out how many | records they could store on a 64GB machine and that is a _small_ | data center class machine and it sort of boggled their mind. | Adding read-only copies shared from a NAS device gave them perfect | scalability in terms of queries per second (their sales folks had | their own database that they were updating). It was fun to watch | their expression go from horror to wonder to excitement when they | finally implemented it. | dragontamer wrote: | https://yourdatafitsinram.net/ | | The IBM 53-bit quantum compute result from a ton of hard drives | + compute power was also intriguing | (https://www.ibm.com/blogs/research/2019/10/on-quantum- | suprem...). | | 128PiB of disk space needed for a 54-qubit computation on their | system (hard drives).
Fun to see where the supercomputers are | still needed, but smaller qubits can be emulated on normal | computers. 40 qubit calculations probably can be emulated on a | 16TB array of hard drives in RAID0. | kortilla wrote: | The trade-off isn't really size in a lot of cases though. When | stuff is stored in RAM you lose the persistence story. | michaelmcmillan wrote: | Yup! I'm currently running a b2c solution for a client that's | grossing ~$3 million/year using Python + sqlite3 behind nginx. | | I work for another shop that scolds me for even suggesting | using something else than k8s. How can a persistent file system | possibly scale? Lol. Ironically they're grossing -$3 | million/year. | tonetheman wrote: | I am curious about your case: is the traffic mostly read | heavy? Ha wish you could share more. Interesting stuff. | svnpenn wrote: | > I suggested to a client that they use Sqlite3 for their | company database rather than a disk based database. | | Sqlite is a disk based database... | capableweb wrote: | Well, maybe not "disk based" per se. It can be stored on | disk, yeah. But it can also run in memory. I think in this | case it's clear that ChuckMcM meant a machine with 64GB of | RAM (not disk space), therefore they are talking about having | SQLite running in memory, not on disk. | zaroth wrote: | > _For very small numbers (less than about FLT_EPSILON_ * _0.25) | adding 0.5 gives 0.5 exactly, and this then rounds to zero. Since | about 40% of the positive floating-point numbers are smaller than | FLT_EPSILON_ * _0.25 this results in a lot of errors - over 850 | million of them!_ | | I thought Float Epsilon was the smallest possible number which | could be represented by a float? This statement doesn't seem to | make sense... | jcranmer wrote: | When I took numerical analysis, the concept was described as | "machine epsilon", which is the smallest number that can be | added to 1.0 to give a result that is not 1.0.
| | The utility of this metric is that it tells you the best | possible relative error, as you can never guarantee a result | will be closer to the true result (in the domain of | mathematical real numbers as opposed to machine floating-point | numbers) than half of machine epsilon. | | This concept is often extended into discussions of "units in | the last place" (ULPs), which is the magnitude of the value of | the last bit of the mantissa. So ulp(1.0) = ulp(1.5) = machine | epsilon, ulp(2.0) = ulp(3.9) = 2 * machine epsilon, etc. A good | floating-point library will document its accuracy in terms of | ulps, so that we might say that the maximum error of the exp | function is 1 ULP and that of pow might be 6 ULPs. | Sebb767 wrote: | I thought that too, but it's actually the difference between | 1.0f and the next larger number; so it only represents the | smallest difference for the non-dotzero range. | | See here: http://www.cplusplus.com/reference/cfloat/ | MatmaRex wrote: | There are different "epsilon" constants in different languages | that mean different things. In C and C++, FLT_EPSILON is | "Difference between 1 and the least value greater than 1 that | is representable." | (http://www.cplusplus.com/reference/cfloat/). You might be | thinking of the C# constant. | neeeeees wrote: | FLT_EPSILON is the bound on rounding error for certain | operations. Plenty of numbers smaller than it can be | represented in most floating point implementations | jansan wrote: | Javascript guy here, can you help me set up that Beowulf cluster | so I can start testing my 2^64 cases? | draw_down wrote: | If your code works for 1.000000000001 but fails for | 1.000000000002, go back. | MeteorMarc wrote: | For python you cannot use the stdlib for this, but it would work | for numpy float32. | tialaramex wrote: | Many years ago now we built something that used a 60-bit | truncated hash of URIs.
So that's far too small to be comfortable | that you won't run into collisions, but it's big enough that for | modest sizes (hundreds of millions up to billions) a collision is | unlikely. So you can have a slow path for the collision case, so | long as that slow path is correct. For unit testing we needed at | least one collision so we could see that everything behaves as | designed before some real user hits one. We had test data sets | with over a billion URIs in them, but none that (as far as we | could tell) had a collision. | | So I just wrote code to mint nonsense URIs for our system based | on a small integer seed (like https://foo.example/123456789), | then spun up a system with a bunch of RAM (maybe 8-12GB) to | iterate through each seed, storing the hash and seed it was | generated from, until it hit a collision after a few gigabytes of | RAM, then it spat out the two URIs from the colliding seeds. | Bingo, now we had data for unit testing to show that stuff works | even in the unlikely case that two distinct URIs have the same | hash. | | Probably a day's work to build that, and then I'd guess several | hours every month saved debugging weird collision bugs because | our unit tests would now tell us if we'd screwed up and | collisions were slipping past. | kortilla wrote: | This is one of those cases where it would have been easier to | fake the hashing function in the unit tests with something that | collides when you want. What happens if you want a test case | where 3+ URLs hash to the same bucket? | nulptr wrote: | I have a question about how you generated collisions. | | If each seed is a 32-bit int, and each hash is 60 bits, then | you'll have to store 92 bits for each (seed, hash) pair. | | With 2^32 (~4.3 billion) (seed, hash) pairs, you'd need 2^32 * | 92 bits ~= 50 gigabytes. | | But you mention only needing 8-12GB. Is my calculation wrong, | or have I missed something?
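A toy version of that collision hunt, shrunk from 60 bits to 24 so the birthday bound (~2^12 seeds) arrives in milliseconds; the URI pattern and the SHA-256 truncation are stand-ins for whatever the real system used:

```python
import hashlib

def truncated_hash(uri: str, bits: int = 24) -> int:
    # Truncate a strong hash to `bits` bits (the real system kept 60).
    digest = hashlib.sha256(uri.encode()).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

def find_collision(bits: int = 24):
    # Mint nonsense URIs from an integer seed, remembering each
    # truncated hash until two distinct URIs share one.
    seen = {}
    for seed in range(1 << 32):
        uri = f"https://foo.example/{seed}"
        h = truncated_hash(uri, bits)
        if h in seen:
            return seen[h], uri
        seen[h] = uri
    raise RuntimeError("no collision found")

a, b = find_collision()
assert a != b and truncated_hash(a) == truncated_hash(b)
print(a, b)
```

With 60 bits the expected cost rises to ~2^30 stored entries, which is where the 8-12GB of RAM in the story comes from.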
| [deleted] | manulp wrote: | You're looking for a collision, any of them will do, so your | computation doesn't actually need to go through all the | elements. https://en.m.wikipedia.org/wiki/Birthday_attack | Thorrez wrote: | Since the hash is 60 bits, a collision is expected after 2^30 | tries due to the birthday paradox. So by your math this would | be 12.3GB. Although this is ignoring hash table overhead, | which I expect to be at least 2x. Maybe they just got lucky. | nulptr wrote: | I think I found a way to do it with ~8GB. As you mentioned | you only need 2^30 tries due to birthday paradox. | | But, instead of using a hash table to store (seed, hash) | pairs, we can just use a list of only hashes. To generate | the next hash we use current hash. i.e. next_hash = | hash("example.com/" + string(int64(cur_hash))). So now, we | just store 60-bit hash values which takes up 2^30 * 60 bits | = 8gigs. | | The only other problem is detecting if a hash already | exists, which I think can be done using a space efficient | probabilistic hashset like a bloom filter (and when the | bloom filter says the hash already exists, we do a linear | scan to find where it exists and we have a possible | collision). | penteract wrote: | Actually, using that method, you'd need much less than | 8GB, since a collision would mean you'd hit a cycle, so | you could store something like 1 value for every 2^10 | iterations. When you found a collision with one of those, | you could examine the interval more carefully to find | exactly where the collision was. | bluesign wrote: | You can also truncate hash at 30 bits and use it to address | bucket, 12gb then would give you like 3x32 bits, which you | can store your candidates. Then compare full hash with | candidates. | stouset wrote: | Why not just run the test-cases using a 16-bit hash function? | kortilla wrote: | Likely a C programmer with little experience doing testing | with fakes. 
Most C projects I've worked on are horrible at | structuring the code in a way that makes it sane to do | dependency injection or mocking. | AtlasBarfed wrote: | Did you truncate the hash for dataset size considerations? If | you associate data with the hash, I would guess the test data | would dwarf the hash size, so why not do 128 or 256 hashes? | tialaramex wrote: | The hash is for content-based addressing. We're going to | fasten those 60 truncated bits to a 4 bit tag that says this | is a URI in the content-based storage. And then we can use | this 8 byte value to refer to that URI twice, or fifty | thousand times in our system even if the URI is a hundred | bytes long - the only problem is collisions, so we need to be | sure we notice if there is a collision and handle that | correctly everywhere. | wbsun wrote: | Any reason the hash function and the collision handling can't | be separated and mocked for unit tests? What if you decide to | increase the bits of hash values or change the hash algorithm? | seanmcdirmid wrote: | Couldn't you also test the recovery logic with a hasher that | yielded collision more often? It's like take your system, make | it worse (use very dumb hashing), and see how it degrades | and/or breaks? | rocqua wrote: | Seems like that would make it less of an integration test | jmholla wrote: | They said it was for unit tests. | Heliosmaster wrote: | they mention this was a unit test, for which stubbing the | hashing function would have been totally fine | treeman79 wrote: | This! | | Take over a project with thousands of integration tests. | | I trust none of them. Why? Because anything hard to test | has been mocked to oblivion. | | I have api endpoints that have massive complicated | integration tests. I eventually just got source for every | project in company and checked to see what they actually | did with API. | njharman wrote: | > Because anything hard to test has been mocked to | oblivion. | | Agree. Not an integration test. 
If it has mocks in it, it | is not testing the INTEGRATION of/between components. | | Mocks are for Unit tests. Where you want to isolate | testing one single UNIT from all the other crap it | integrates with (which have their own unit tests) After | all these unit tests run, you test the whole shebang | together with integration tests. | jskdvsksnb wrote: | I have the inverse - I started at a place that has | hundreds of integration tests that require a database, k8s, | the whole shebang. No mocking anywhere. | | Tests take an hour to run, are extremely fragile and | flakey. Getting your local dev machine set up to run even | a subset of the tests is a half day ordeal, and it can | change periodically. | | If you have external dependencies that don't version | their APIs well it may be inevitable that you need to run | tests against them. But for developer productivity you | should have a fast/non-flakey path to run local unit | tests with no network dependencies. It's also great for | working on a plane. | 978e4721a wrote: | Cause everyone works mostly on planes, right... | thaumasiotes wrote: | In this case, I think mocking is the right thing to do. | You want to know that you can handle the case where two | hashes collide. But for this purpose, you don't care at | all how the hashes were generated. | | If the website changes its URI hash algorithm, the | "honest" test that just supplies two URIs that happened | to collide under the earlier algorithm will suddenly | become worthless. The mocked test will continue doing | exactly what you always wanted it to do -- exercise the | "what if the hashes collide?" case in your code. | tialaramex wrote: | My preference is always to actually test parts in as close as | possible to the real state, avoiding mocks unless they're | necessary. The more "fake" the system is, the less it tells | you about how your software actually works. | | So since I _could_ generate collisions for the real hash, | that's what I did.
If it had been infeasible, I would of | course have found a different way to test. | | This particular software was written mostly in C (today I | would not write it in C) which makes it even more likely that | introducing never-used-in-production behaviour unexpectedly | perturbs the system. This same system for example ran into an | obscure (and since fixed) libc allocator bug, and a defect in | the Linux kernel, because we were really kicking the shit out | of the memory mapped file I/O with what we were doing. | seanmcdirmid wrote: | The only high fidelity mantra in testing is a bit | misguided. If you only test your system in as close to | production state as possible, you are never going to | encounter stress states that would only very rarely appear | in production. Seeing how your components independently | deal with less ideal components in the system can tell you | a lot about how they might handle rare but bad cases in | production, and allow you to over engineer them a bit more | to avoid that. | | > which makes it even more likely that introducing | never-used-in-production behaviour unexpectedly perturbs the | system. | | Isn't that a good thing? Isn't the goal to deal with the | unexpected in the first place? | jniedrauer wrote: | Tests come with a non-zero cost. They're code that needs | to be maintained. Often complex code, with poor | documentation. I don't think the value provided by | testing states that you will never actually encounter in | production outweighs the cost of maintaining the tests. | tikhonj wrote: | This approach works well until you _do_ encounter a | situation like that in production :). It's worth testing | conservatively because our understanding of our systems | is imperfect and because assumptions that may be valid | _now_ are likely to change in the future.
It's | especially true for code that might operate at scale or | has security considerations, because both of those can | magnify the effects of conditions that occur with | _arbitrarily low probabilities_. | | Personally, I see tests as not just a way to prevent | bugs, but as a way to learn about the system being tested | --both how it behaves _now_ and, continually, how it | behaves in future iterations. In this specific case, | understanding what happens if two URIs hash to the same | value _even if they don't "really" do that_ is an | important bit of information: it tells us not only what | would happen with a real hash collision but also how our | system would behave if there were a bug either in the | hashing algorithm itself or in the code that connects the | hashing algorithm to the component that _uses_ the | hashes. | | What happens if we introduce a caching layer between | where the hash is generated and where it's used, and | there's a cache invalidation bug? I don't know how | realistic that is _in this specific hypothetical_, but | I'm sure there are other performance optimizations with | the same bug potential that I'm not even thinking of. | Changes like that can lead to absolute debugging | _nightmares_! Testing even "impossible" edge cases helps | catch this in a systematic way, without needing full | knowledge of which conditions might matter. | | Of course, none of this talks to the trade-off you | pointed out: test code _does_ have a real cost. That is | absolutely something to consider. I just don't think the | answer is to draw a hard line at "don't test situations | that can't come up in production"--and, perhaps naively, | I think the "real" solution is less about deciding what | is and isn't worth testing and more about reducing the | costs of testing more.
Write application code in a way | that's easy to test and treat your test code _as code_ | --keep it clean, include comments, and invest time in | setup and tooling to make your tests as easy to write and | maintain as possible. | seanmcdirmid wrote: | To achieve "defense in depth," testing components in | degenerate non-production situations (and then over | engineering them to gracefully deal with those | situations) is the only way to get there. Of course, we | aren't building bridges, so maybe it's less important. | TickleSteve wrote: | That's not really a "defense in depth" situation. | | Defense in depth is about layering different types of | security, not targeted testing. | | https://en.wikipedia.org/wiki/Defense_in_depth_(computing | ) | nitrogen wrote: | It's an analogy. Multiple layers of defense against bugs. | kortilla wrote: | > The only high fidelity mantra in testing is a bit | misguided. | | That's understating it. It's incompetence because it's an | excuse to not test a huge surface area of edge cases. | kqr wrote: | > My preference is always to actually test parts in as | close as possible to the real state, avoiding mocks unless | they're necessary. The more "fake" the system is, the less | it tells you about how your software actually works. | | I've heard this before, but never actually had that problem | with fakes in practise. Where can I read more about it? | brianwawok wrote: | Have a production outage. See the code in question is | 100% tested. Scratch head a bit. Vow to fake less next | time. | nitrogen wrote: | Supposedly the software problems with the Starliner | spacecraft were also caused by extensive unit testing | under test harnesses without integration testing, as a | visible example that reached the news. | tshaddox wrote: | "Test as close as possible to the real state" makes sense | as a rule of thumb, but I'm having trouble coming up with a | reason in this specific case how mocking the hash function | could go wrong.
Assuming your testing/mocking library is | reliable, this should be testing precisely what you intend | to test. I guess it's possible that your "more real" test | relies on some hidden behavior, like how your real hash | function utilizes system resources, which could affect | production, but I don't know that tests which are testing | things you're not even aware of is a great thing to | advocate for. | naniwaduni wrote: | The point of the principle is that you _don't know_ how | the mocking can go wrong. Assuming your testing/mocking | library is reliable gives you one more thing that can | unexpectedly go wrong. | | Of course, sometimes getting verisimilar test cases is | difficult enough that you judge it not worth the effort | over mocking it up and dealing with any issues with the | mockup, foreseen and unforeseen. | tshaddox wrote: | I think you generally need to either assume or establish | that your testing environment tends to be sound, just | like you need to assume or establish that your | programming language/compiler/computing environment tends | to be sound. Your application code and test code itself | probably shouldn't be compensating for or attempting to | avoid known or unknown flaws in your testing environment. | kortilla wrote: | That's a stupid principle. If your interface isn't | well-defined enough that switching out the hashing | implementation is easy, your code is too tightly coupled. | | A test handling a hash collision _is not a test of the | hashing algorithm_ so you should be able to switch out | the algorithm to something super predictable. This isn't | a discussion about mocking frameworks - you should be | able to do this by just plugging in a different hash | implementation when you construct the test. | kuiper0x2 wrote: | These types of blanket rules are useful to ensure a 100% | accurate test case.
If an error in production costs me | $15 Million I don't really care if the engineer wastes a | few days to follow a dumb rule. | | Also I can employ a $75k - $100k/year engineer and say | follow this protocol for testing. If I want someone who | can optimize test cases 100% of the time with zero | mistakes ever then I might have to pay $150k - $200k / | year. | Scarblac wrote: | It ensures that if someone changes the hashing algorithm, | your test for the collision mitigation now doesn't test | that anymore even though they should be unrelated. | halter73 wrote: | After going through all that effort to make the test less | "fake", I expect tialaramex verified the collision still | happened as part of the test. | lisper wrote: | Why not just use a hash function that special-cases two | test URLs (or a whole class of test URLs) and produces | collisions for them? You can use that same hash function | in production. | xxxtentachyon wrote: | Or, even simpler, just stub out your hash function to | something that always generates the same value and test with | that. | harpiaharpyja wrote: | I was wondering why GP couldn't have swapped out the hash | function for one that just returns a fixed value, making | everything a collision. Then it would just be a matter of | verifying the system still works under that condition | (ignoring the performance hit). | | You could even measure the performance hit from that degraded | condition and create a heuristic to warn you if something is | wrong with the fast path. | jcims wrote: | I embarrassed myself as a Noogler trying to explain what in my | head seemed like a good tradeoff between hash size (also of | URIs) and collision handling. The process had fixed storage for | a table of known hashes, and every collision was tested for | validity before taking action.
If you could use a truncated | hash, you'd obviously increase your collisions, but if you're | going to validate all collisions anyway, it seemed like you | could increase your coverage substantially (in the 60-bit case | it would have been 4x) while only marginally increasing your | overhead of collision handling. | | There's some kind of sparsity/entropy function involved with | optimizing these two but I'm too clueless to sort it out. | Without the vocabulary to make my case I just got weirded out | by a bunch of smart people staring at me quizzically. Sooo.. | [deleted] | btrask wrote: | It's known as the birthday paradox. There is a probability | table here: https://en.wikipedia.org/wiki/Birthday_paradox#Pr | obability_t... | Arcuru wrote: | That kind of thing happens to me all the time. I'm very often | unable to think of or explain things in the moment. | | Sometimes it helps to write a doc afterwards to get my | thoughts in order and decide if it's worth sharing. | known wrote: | Very difficult due to rounding errors; | JoeAltmaier wrote: | My god this issue has been around for too long. In the '80s | testing folk learned to shotgun the argument space to test | functions, and they've not gone an inch forward since! Or so it | seems. | | A version of the Pentium was released that couldn't divide by 10. | And even back then, testing all the mantissas on a simulator | would have taken only hours. Instead, they masked, marketed and | released a tragically flawed processor. | | Just test the input space exhaustively, folks! It's a computer; | it feels no pain. Work it hard until it fails or works perfectly. | [deleted] | fallingmeat wrote: | maybe I dont understand, but is this article only trying to | suggest exhaustive testing for single precision, single input | functions? obviously this approach is not practical as soon as | there are multiple inputs or wider precision. 
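Exhaustive single-input testing is the tractable case the article advocates. A sketch of the bit-pattern sweep, restricted here to one 100,000-pattern slice of the float32 space (the full loop would be `range(2**32)`), checking a reconstruction of the article's broken ceil against a reference:

```python
import math
import struct

def bad_ceil(x):
    # Buggy "ceil" as the article describes it: add 0.5, round to
    # nearest (Python rounds halfway cases to even, like IEEE 754).
    return round(x + 0.5)

def float32_from_bits(bits):
    # Reinterpret a 32-bit integer as the float32 value it encodes.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Sweep every bit pattern in a slice starting at 1.0 (0x3F800000).
mismatches = 0
for bits in range(0x3F800000, 0x3F800000 + 100_000):
    x = float32_from_bits(bits)
    if bad_ceil(x) != math.ceil(x):
        mismatches += 1
print(mismatches)  # 1: only x == 1.0 disagrees in this slice
```

The same loop body over all 2^32 patterns is what turns up the 850-million-error counts the article reports; in a faster language than Python it finishes in minutes.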
| brianberns wrote: | Not to mention that exhaustive testing only works when you | already know the correct result for each input. | nullc wrote: | In many cases it is practical but people wrongly assume it | isn't. In cases where it's not practical, exhausting a subset can | still be very powerful. | | E.g. for a function that takes two floats (x,y) test every | value of x input with y={0,-0,1,nan,inf,-1,x,-x,2x,x/2,FLT_EPSILON,FLT_MIN,FLT_MAX,random,random,random...}, and then repeat | with x,y swapped. | | A little knowledge of the domain can help pick better special | values, but there is a trade-off in introducing the same blind | spots that permitted bugs in the first place. Still: testing | just one argument on all values because that's all you can | afford is still a lot better than only testing a couple special | values for both arguments. | | Exhaustive testing, like randomized testing catches errors you | weren't even thinking about-- but unlike random testing | exhaustion can reliably catch bugs which are extremely rare. | "Partial exhaustion" can have some of the same benefits. | fallingmeat wrote: | mmm, disagree based on my experience. oh also I think people | on this thread may be misusing the concept of float epsilon. | keep in mind that is not the smallest value of a float. | kardos wrote: | Testing all of the doubles may actually be possible [1]. Even | if that estimate is optimistic, testing all doubles for order | $10k would be feasible for some problems. | | [1] https://news.ycombinator.com/item?id=18109432 | pm215 wrote: | Yes, the article says "Exhaustive testing works brilliantly for | functions that take a single float as input. I used this to | great effect when rewriting all of the CRT math functions for a | game console some years ago. On the other hand, if you have a | function that takes multiple floats or a double as input then | the search space is too big.
In that case a mixture of test | cases for suspected problem areas and random testing should | work." | | I think the point is that exhaustive testing is feasible in | more situations than you might at first think, and that | floating-point arithmetic is a particularly fertile source of | edge conditions and corner cases, so it's a good place to go | for exhaustive testing where you can. | fallingmeat wrote: | fertile indeed!!!! haha | | I should learn to read | zelphirkalt wrote: | Test them all and waste a huge amount of energy and computational | resources! | | I hope people test like that for really critical stuff, but not | for your everyday library for your personal startup or | e-business. Not even the big players need this. | sega_sai wrote: | It is all very nice, but if your function has two floating point | parameters you have 1.6e19 tests to do... | formerly_proven wrote: | Totally doable. That's just a few hours using 100k cores. | TheRealPomax wrote: | Worth it if your function is acutely fundamental to the | operation of your software because your software runs everyone | else's software. You buy some time on a compute cluster with | 1024 cores and 4TB of ram, and off you go, after making sure | you understand your own code and reduce that number back down | to "no it isn't, because two arguments in a given space for | fundamental maths operations, taken together, almost certainly | don't require combinatorial testing" =) | andi999 wrote: | So it only needs 4 Billion times 90s, so is this like never in | a million years? (actually it should be only around 12000 | years, so you could use a cluster of 48000 computers and be | done in 3 months) | Someone wrote: | Distribute it over every smartphone on the planet (about 4 | billion), and the computations will be done in minutes (of | course, that's dependent on the function being tested).
| | There will be orchestration overhead and transmission | latency, but I guess one could get all results in one place | in a day, _if_ one had the ability to do the "run it on every | smartphone" thing. | ChrisLomont wrote: | Then you can prove your algorithm correct using proof libraries | like Z3. | thechao wrote: | I have worked at two companies where we brute-force checked | binary floating point operators. It's annoying, error prone, | boring, and _soooooo_ satisfying. The Holy Grail^TM is brute | force F32-FMA. That one really requires a lot of thought, and | can't be reasonably brute-forced on _anyone's_ budget. | xchaotic wrote: | Articles like this make me wonder how everything even works if | basic math functions can be so wrong and how a simple concept | like floor can have different definitions/ implementations | Frost1x wrote: | It often doesn't to high degrees of precision and reliability. | It works within some tolerance range and those tolerance ranges | are managed or acceptable levels of error (often unnoticed). | | I recently helped move/port one fairly complex numeric model | from a language and environment to another language and | environment and there was an assumption made by upper | management that this was simple and _exact deterministic | results_ would be easily had. | | I warned that this could be an involved process because of | simple variations, lack of equivalent mappings between | environments and languages, some of which pointed out in this | article. | | Management were quite surprised with the level of effort needed | to dig down layers of abstraction and identify all these sort | of variations in numeric libraries from consistent high level | abstractions they were familiar with, and why those high level | abstractions simply didn't exist in the other language and | environment. 
They were additionally surprised at the level of | effort needed to get satisfactory error tolerance ranges | between the two models when having to recreate high level | abstractions identically from the source language and | environment to the target language and environment that simply | didn't exist prior to the work. | | They didn't realize the variation until we performed a detailed | analysis on results and found very small but statistically | distinguishable differences between the two versions. We ended up | identifying a few dozen small assumptions they had made about their | high level abstractions from the source environment that were | false and caused the variation. | pierrebai wrote: | The answer is that the bugs the article claims are corner | cases, most of which will not matter in most practical uses of | floating point maths... and some of which are due to not | emulating a weird IEEE floating point rule about rounding | numbers of the form N.5 to the even number instead of up like | everyone is taught in school. | | So "wrong" as used in the article simply means "not the same as | the IEEE behavior", not "wrong" as in "wildly inaccurate". It | matters if you want 100% exact emulation to get identical | results. It doesn't if you don't and just want speed. | phkahler wrote: | >> The answer is that the bugs the article claims are corner | cases, most of which will not matter in most practical uses | of floating point maths... | | Most bugs are corner cases of one sort or another. At the | application level you can decide how important the corners | are. When you're creating a math library that many others are | going to use, _and_ there is a standard, I think the | expectations should be higher. | CydeWeys wrote: | > some of which are due to not emulating a weird IEEE | floating point rule about rounding numbers of the form N.5 to | the even number instead of up like everyone is taught in | school.
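[Editor's aside: the tie-breaking rule under discussion is easy to see in Python 3, whose built-in round() implements round-half-to-even; the sums below are an illustrative sketch, not from the thread.]

```python
# Python 3's round() rounds ties to the nearest EVEN integer (IEEE 754's
# default rounding mode), not half-up as taught in school:
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])   # [0, 2, 2, 4]

# The schoolbook rule (always round .5 up) drifts upward over repeated
# calculations; ties-to-even cancels the bias on average.
ties = [n + 0.5 for n in range(1000)]      # 0.5, 1.5, ..., 999.5 -- all exact ties
exact = sum(ties)                          # true sum: 500000.0
half_up = sum(n + 1 for n in range(1000))  # schoolbook rounding of each tie
to_even = sum(round(t) for t in ties)      # banker's rounding of each tie
print(exact, half_up, to_even)             # 500000.0 500500 500000
```

Half-up overshoots the true sum by 500; ties-to-even lands on it exactly, since half the ties round down and half round up.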
| | Round-to-even (also known as banker's rounding, because it's | used in finance too) isn't weird at all. It minimizes | rounding error, especially on repeated calculations. The | rounding rule that everyone learns in elementary school is | _bad_ and results in systematically biased computations. We | can and should do better than that flawed approach. This post | explains it in more depth: | https://mathematica.stackexchange.com/a/2120 | SoSoRoCoCo wrote: | You should see how confusing modulo becomes if the method isn't | explicitly specified: | | https://en.wikipedia.org/wiki/Modulo_operation | jlebar wrote: | Six Stages of Debugging | | 1. That can't happen. | | 2. That doesn't happen on my machine. | | 3. That shouldn't happen. | | 4. Why does that happen? | | 5. Oh, I see. | | 6. How did that ever work? | | http://plasmasturm.org/log/6debug/ | marketingPro wrote: | Someone correct me if I'm wrong because it's been years since | my EE class. | | Basic math ends at division. Heck, if you learn how math works | on computers, it feels like a number trick. Those are reliable. | | The issue comes when you need to track never-ending decimals. | Even if every transistor was dedicated to your problem, | you'd still need to round. So even if you have 32 bits of | decimals, there will be rounding sometimes. | | The human has to decide what to do. Instead of converting the | binary to decimal and calling it done, it's more important to | be consistent and predictable. | | The issues described in the article aren't 5/2 = 2 billion. The | article talks about rounding always giving you even numbers, | which you can imagine leads to issues if you are using both | float and logic that involves even and odd numbers. | | Edit: can someone correct me or add details? I'm very | interested, but I'm not taking ENG 240 again... :p | esperent wrote: | Exactly this. The floor function works on binary numbers but | has to round them to a decimal place.
It's not simple, it | just appears to be. | bonoboTP wrote: | Floor has nothing to do with decimal. | erichocean wrote: | The period in "3.14" is called a "decimal." The digits | after the decimal in "3.14" are enumerated as "decimal | places." What each decimal place "means" depends on the | numeric base. | | So, "decimal places" or "rounding to a decimal place" has | nothing to do with base 10 (which is what I think you are | implying): decimal and "decimal places" exist in all | bases, including base 2, and the meaning of each "decimal | place" depends on the numeric base (base 10, base 2, | etc.). | nitrogen wrote: | Sometimes people use "binary point" to distinguish from | decimal point and avoid the confusion of the dec- prefix. | erichocean wrote: | "Fraction point" and "fraction (or fractional) places" | might be a decent alternative that clearly works in any | base, while also conveying its purpose. | CydeWeys wrote: | This is why the big rational data type is so useful on | occasion. When you just want to get one calculation exactly | right, and performance doesn't matter so much because you | aren't doing huge numbers of repeated operations (as in ML), | use a big rational with its arbitrary-precision numerator and | denominator and you don't need to worry about even epsilon | error. | visarga wrote: | > because you aren't doing huge numbers of repeated | operations (as in ML) | | In ML precision doesn't count so much. It's often possible | that we train on 32 bits and deploy the model on 16 bits. In | fact, stochasticity is useful and we add it in by dropout or | other things. You can drop any connection (or millions of | them) in a neural net and get almost the same result. There | have been papers that showed that reducing the number of | bits used to represent weights might even improve the | network because it has a regularising effect. The real | brain is also stochastic. | bonoboTP wrote: | Except if you run exp, log, sqrt, sin, cos, etc.
Then you | immediately leave the rationals. | CydeWeys wrote: | Yes, it definitely depends on what you're doing with the | numbers, but personally I don't recall ever having needed | trigonometric functions in my decade-plus working life | doing standard line-of-business type stuff. | marketingPro wrote: | I can't tell if you are dismissing this, but as an FYI, | trig is used in engineering. | CydeWeys wrote: | I said "it definitely depends on what you're doing with | the numbers". That's not a dismissal; I'm simply pointing | out that neither I nor many if not most other software | engineers use trig on a frequent basis. I'm well aware | that it's used in actual engineering, but that's just not | what most of us here do. | ViralBShah wrote: | We've certainly tested all Float32 inputs to some functions in | Julia's math packages. I'm unable to find the relevant issues | this very moment, otherwise I would certainly post them. | raphlinus wrote: | I suspect https://github.com/JuliaLang/julia/pull/29700 might | be one of them. As the original post states, rounding is (a) | amenable to hardware acceleration, (b) easy to get wrong or | slow, and (c) possible to test exhaustively. | throwaway7281 wrote: | I was once in the position to test 3M possible inputs - that took | only a few minutes. It's a good strategy to keep in mind. | zzzeek wrote: | Crap, now I have 67 million failures to fix. | wtn wrote: | In the early 1980s, truncated floating point calculations caused | the Vancouver Stock Exchange index value to drift down for about | 23 months, until it was shown as less than half the correct | value: | https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Round... | [deleted] | Const-me wrote: | In addition to many correctness issues, that code is relatively | slow. | | SSE4.1 introduced the roundps instruction, which is both fast and | correct. Unlike AVX2 or FMA3, SSE4.1 support is almost universal | by now.
In most of my current projects, I check support on | startup and refuse to run if SSE4.1 ain't supported. Just don't | forget to set the corresponding macros for DirectXMath, Eigen, | etc. | dragontamer wrote: | Things have gotten even faster since then. | | * Target: 64-bit space (2^64). | | * GPUs are more flexible and faster. A Vega64 (a few years old | now) has 4096 shaders ("CudaThreads") at 1.6 GHz. It can search | the entire 32-bit floating point (or integer) space in less than | a second. 1.6 GHz x 4096 shaders == 2^42.5 bitspace per second. | | * People exhaustively searching a space are relatively patient, | and willing to wait months for the results. 60 seconds x 60 | minutes x 24 hours x 30 days == 2^21.3 | | 1.6 GHz x 4096 shaders x 60 seconds x 60 minutes x 24 hours x 30 | days is roughly equal to 2^63.8. We've done it: a month of GPU | effort is all you need to exhaustively search the 64-bit space. | | ------- | | We've entered a world (~3 years ago) where it is reasonable to | brute force the 64-bit space, so long as you're willing to wait | about a month for the results. | | If you're not willing to wait a month, then use 2 GPUs, or 4 | GPUs, or a cluster of GPUs like the DGX A100 (or multiple | clusters). | Keyframe wrote: | Sounds like a small and relatively cheap farm could do it in | hours or minutes even. | dragontamer wrote: | Hmmm... assuming 4 GPUs per node, and 1200W per node, you'll | need 216kW to generate the result in a day. | | But that also assumes that the operation you're testing is | 1-clock tick, which is only true for the most trivial of | calculations. Still, 216kWs per day per "test-case clock | ticks" shows us that the 2^64 exhaustive bitspace is within a | reasonable organization's grasp. | dan-robertson wrote: | I think your units are wrong. You want 216kW days, not | 216kW per day. 216kW days is 18.6 GJ. | antoinealb wrote: | Note that the article is specifically talking about testing | accelerated code.
This is usually using machine-specific | features, meaning it won't run on a graphic card. | | I am also not sure about how fast you are if you need to | transfer data out of the GPU, or to add conditional branching | to the shader code (it used to be that branches were super slow | in GPUs, not sure where they are now). | | I could see it working for some other problems though. | dragontamer wrote: | > meaning it won't run on a graphic card. | | GPUs implement the ceil and round functions. | | Heck, GPUs implement popcount, bit-reverse, and a whole slew | of instructions that CPUs don't. (Mainly bitreverse and | bpermute. Intel x86 only has permute (aka pshufb), but not | the backwards version. BSwap can't emulate bitreverse.) | | > (it used to be that branches were super slow in GPUs, not | sure where they are now). | | Branches are fine. It's divergent branches (in both CPUs and | GPUs) that are slow. A uniform branch will be branch- | predicted on a CPU, while a uniform branch on a GPU will have | all 64 threads (AMD GCN) or 32 threads (NVidia or RDNA) | follow the instruction pointer. | | There's no slowdown with uniform branches, either on CPUs or | GPUs. Both CPUs and GPUs do poorly on divergent branches, but | GPUs do MUCH worse than CPUs. | daveFNbuck wrote: | The article was about testing a specific implementation of | floor, ceil, and round using CPU instructions. You can run | floor, ceil, and round on the GPU, but that won't tell you | whether your new implementation using CPU instructions is | correct. | dragontamer wrote: | I thought the post was about how the 32-bit space could | be exhausted in minutes with common hardware in the year | 2014. Which means that any problem I have which is in the | 32-bit space is a candidate for brute force. | | Given the advances in computers over the past few years: | exhausting the 64-bit (or 2x 32-bit) space is | "reasonable" for the year 2020.
At least, reasonable for | people who are willing to wait a month (on one GPU), or | willing to buy a cluster of GPUs. | masklinn wrote: | > I thought the post was about how the 32-bit space could | be exhausted in minutes with common hardware in the year | 2014. Which means that any problem I have which is in the | 32-bit space is a candidate for brute force. | | No, the post is about exhaustive testing, and that it's | completely _reasonable_ on 32 bit values because it only | takes a few minutes. | | > Given the advances in computers over the past few | years: exhausting the 64-bit (or 2x 32-bit) space is | "reasonable" for the year 2020. | | That's not actually reasonable, it's _feasible_ for very | specific contexts. | | > At least, reasonable for people who are willing to wait | a month (on one GPU), or willing to buy a cluster of | GPUs. | | And whose code specifically can run on GPU. TFA _is about | CPU-optimised code_ , the entire point was to | exhaustively test an SSE3 implementation. How do you do | that on a GPU? | dragontamer wrote: | > That's not actually reasonable, it's feasible for very | specific contexts. | | I was inventing a new hash function for myself, just for | giggles, a few weeks ago. | | I needed a constant: I started by choosing an arbitrary | constant (I started with 0x31415926...), but then I | realized that the 32-bit space, and even 50+ bit spaces, | were feasible on my GPU. | | I ended up rewriting the random-number generator on the | GPU, and searched for the number which caused the largest | number of "avalanche" bit-flips across the entire 32-bit | seed space of the RNG. | | Where an "avalanche" bit flip is 16 bits flipped and | 16 bits remaining the same, after the RNG is applied to | the seed. | | For example: seed = 1 * K, where K == 0xAAAAAAAA, will | cause 16 bit-flips and leave 16 bits the same. | | Repeat for all 32-bit values of seed, searching for the | optimal K.
0xAAAAAAAA wasn't the best number (aka | 10101010... repeating, in binary), but you can see where my | thought process was. | | My RNG was more complicated than just a single multiply, | but I think you can see where the general thought process | was with my above example. | | -------- | | There was a time when constants were arbitrarily chosen | for these kinds of functions. But today, we can literally | test ALL numbers and just pick the best. | | If K is constrained by some other requirements (in my | case: it must be odd, plus a few other tidbits to work | with my RNG), then you shrink the search space from | 64 bits down to something that can be accomplished in | just a few days of search. | | ------------- | | As far as I'm concerned, the blog post is talking about | how modern computers have conquered the 32-bit space. | | My current post is about how the 64-bit space could be feasibly | explored. It wasn't even that long ago that the 64-bit space | was considered cryptographically secure (see the DES crypto | algorithm, or WEP). | mlyle wrote: | No, the article's point is that if you have some code that | operates on a 32-bit value, it's reasonable on most | modern hardware to try all 2^32 values to ensure quality. | | There may be some GPU code that is reasonable to test on | all 2^64 values, but it doesn't help us verify the | implementation of SSE (Intel architecture) ceil/round/etc | against reference implementations at all, because trying | all double precision values would not complete in a | practical amount of time, and offloading it to a GPU means | we are not testing the SSE implementation that we're | trying to verify anymore. | a1369209993 wrote: | > trying all double precision values would not complete | in a practical amount of time | | Actually we're getting close-ish: AVX512 (=8x64) at 4GHz | (or two dependency chains at 2GHz) times 64 CPU cores | (which TFA's author has mentioned having) is 2 teraops.
| Double-precision space of 2^64 / 2Tops is 3 | months/instruction on a single computer. So you could | brute-force verify a simple function in a year or two, | and at least the SIMD width and number of cores is likely | to continue increasing. | daveFNbuck wrote: | The post was about exhausting the space to test code. The | example they used of code that should have been tested | this way was CPU code. If you're testing CPU code, | exhausting the space quickly on a GPU isn't helpful. | xigency wrote: | > Note that the article is specifically talking about testing | accelerated code. This is usually using machine-specific | features, meaning it won't run on a graphic card. | | I think there are a lot of computing algorithms you might run | on a GPU where you might want to verify the correctness over | the range of 32-bit or 64-bit numbers (integers or floating | point). | OscarCunningham wrote: | You could also use this to verify 32-bit functions with two | arguments. | vortico wrote: | Does anyone have code to reduce the search space by a few orders | of magnitude, to test functions with a float64 argument (or two or | three float32 arguments) in a few seconds? Perhaps by skipping | "boring" floats far from endpoints, which are unlikely to | produce the errors that more interesting floats do? | rigtorp wrote: | Yes, I've done this using libFuzzer. Supply custom mutation and | crossover functions that generate interesting floating point | numbers. Then let the coverage-guided fuzzer exercise your | code. https://rigtorp.se/fuzzing-floating-point-code/ | daveFNbuck wrote: | That's essentially what fuzz testing libraries do. You tell | the library what types of arguments you want, write your test, | and it puts in some known corner cases and a few random values | that change each time you run the test (the better libraries | save failing values as new corner cases). | TheRealPomax wrote: | all floats are equally boring.
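[Editor's aside: the "interesting" values a float-aware fuzzer seeds with tend to be IEEE 754 landmarks. The bit patterns below are standard single-precision boundary values, but the selection itself is an illustrative sketch — not taken from libFuzzer or any particular tool.]

```python
import math
import struct

def f32(bits):
    # Reinterpret a 32-bit pattern as an IEEE 754 single-precision float
    return struct.unpack('<f', struct.pack('<I', bits))[0]

# Illustrative seed list for fuzzing float32 code: edge-of-range and
# boundary patterns where implementations most often go wrong.
INTERESTING = [
    f32(0x00000000),  # +0.0
    f32(0x80000000),  # -0.0 (compares equal to +0.0, but a distinct value)
    f32(0x00000001),  # smallest positive subnormal, 2**-149
    f32(0x007FFFFF),  # largest subnormal
    f32(0x00800000),  # smallest positive normal, 2**-126
    f32(0x3F7FFFFF),  # largest float below 1.0
    f32(0x3F800000),  # 1.0
    f32(0x3F800001),  # smallest float above 1.0
    f32(0x4B7FFFFF),  # 16777215.0, the largest odd integer in float32
    f32(0x7F7FFFFF),  # FLT_MAX, ~3.4e38
    f32(0x7F800000),  # +infinity
    f32(0x7FC00000),  # a quiet NaN
]
INTERESTING += [-x for x in INTERESTING[2:]]  # and their negations
```

A fuzzer can mix these with purely random bit patterns; the seeds steer it toward the subnormal, tie-breaking, and overflow boundaries where exhaustive testing finds most of its bugs.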
| | vortico wrote: | No, -0.f is more interesting than 4.1045857301e23f. It's more | interesting because it's more likely to fail a test where | others don't. | ramshorns wrote: | Well, if the ceil function failed on 1.0f, maybe even the | ordinary numbers are worth testing. | | But... would randomness help? Take a representative sample of | like 1/64th of the double space and you might run into most of | the common problems. | hinkley wrote: | I decided to build a scatter plot to demonstrate what I was 95% | sure was a data loss bug (preproduction, thankfully) in the | initializer for a CSPRNG. | | I figured plotting a few bytes of output (256x256) would do | nicely. I plotted all possible two-byte inputs just to make sure | it was working, and that ran instantly, so I tried 3 bytes and | that was still pretty fast, so I did 4 while I was at lunch or in | a meeting. | | Sure enough, big black splotches from dropping some of the seed | data on the floor. Please to be fixing. | | Old farts know a lot of things have been tried before and how | those worked out. But we also think doing a complex operation 4 | billion times is going to be irretrievably slow. Always test your | assumptions. | elvis70 wrote: | "When in doubt, use brute force" -- Ken Thompson | IncRnd wrote: | > I've written in great detail about how to compare floating- | point values using an epsilon, but there are times when it is | just not appropriate. Sometimes there really is an answer that is | correct, and in those cases anything less than perfection is just | sloppy. | | Certain applications will choose fixed point to avoid what | appears to be the imprecision of floating point. ___________________________________________________________________ (page generated 2020-10-11 23:00 UTC)