hngopher.com

       [HN Gopher] There Are Only Four Billion Floats-So Test Them All ...
       ___________________________________________________________________
        
       There Are Only Four Billion Floats-So Test Them All (2014)
        
       Author : albrewer
       Score  : 94 points
       Date   : 2023-02-09 16:51 UTC (6 hours ago)
        
 (HTM) web link (randomascii.wordpress.com)
 (TXT) w3m dump (randomascii.wordpress.com)
        
       | FpUser wrote:
       | Silly me. As far as I can remember I've always used epsilon when
       | comparing floats.
        
         | eru wrote:
         | That's not a general solution.
         | 
         | (And if you know what you are doing, you can also compare
         | floats exactly in some circumstances.)
         | 
         | But the article already mentions that.
        
       | mattbillenstein wrote:
       | Not often you can employ exhaustive testing, but it's useful in
       | some cases.
       | 
       | I did a project in college for a VLSI minor where we implemented
       | a 16bit multiplier - after we got the chips back we were supposed
       | to do scan testing, but I just wrote up a little program in an
       | fpga, wired up my chip to test, and swept all the possible inputs
       | verifying the output. I think I could only run this at 10MHz, and
       | you had to shift in the operands one bit at a time, but it still
       | only took on the order of an hour or two to complete.
       | 
       | I also verified this setup by capturing some cycles on a logic
       | analyzer - then pulled together a little awk script to turn a
       | dump of that into the actual numbers I could check. Fun project.
        
       | chriswarbo wrote:
       | 4 billion inputs is a little on the extreme end, especially for
       | slower interpreted languages, but it's always useful to look for
       | functions which can be exhaustively tested; e.g. booleans and
       | enums.
       | 
       | I spotted a bug in a code review today, which was formatting
       | dates with ordinal numbers, like "9th Feb". The ordinal logic
       | only has to work for inputs 1 to 31, so I wrote an exhaustive
       | test, which it failed by outputting "11st", "12nd" and "13rd"
       | (due to only looking at the last digit)
        
         | ben0x539 wrote:
         | A while back for some videogame related data processing I ended
         | up writing a "regular number to roman numeral" (or maybe the
         | other way around) function in idk ruby or something. Not super
         | convinced that I had gotten it right, I sat down to write a few
         | tests, and then I realized I only needed it to work for numbers
         | up to like 16. So I deleted the actual code, extended the table
         | of test cases to cover 1 through 20 just in case, and just did
         | a lookup in that.
        
           | SnowHill9902 wrote:
           | That's actually the right solution to that specific problem.
        
           | giraffe_lady wrote:
           | true senior eng shit right there.
           | 
           | https://i.imgflip.com/7alx6q.jpg
        
       | archi42 wrote:
       | I loved this, since this resonates with my own experience. If the
       | problem space is small enough, just test against a known good
       | (but slower) implementation. With custom algorithms this can also
       | be very interesting for the person writing the test: Their
       | reference code needs to be correct beyond doubt, but can forfeit
       | mostly every manual optimization. Also, for larger input spaces I
       | like to do a nested for-loop over vectors of "interesting"
       | values, plus add a large set of random values.
        
       | bhaney wrote:
       | > For very small numbers (less than about FLT_EPSILON * 0.25)
       | adding 0.5 gives 0.5 exactly, and this then rounds to zero
       | 
       | Is that not correct? Isn't it expected that very small numbers
       | round to zero?
        
         | zerocrates wrote:
         | This is in the context of the implementation of a ciel()
         | function: a very small but still positive float should have a
         | ciel() of 1. As said near this part, IEEE default rounding is
         | round half to nearest even: 0.5 rounds to 0, the closest even
         | integer, so if your addition of 0.5 to the argument to ciel()
         | results in exactly 0.5, the result will incorrectly be 0
         | instead of 1.
        
         | teraflop wrote:
         | The quote is talking about code that is _supposed_ to find the
         | ceiling of a number, and does so by adding 0.5 and then
         | rounding to the nearest value (breaking ties by rounding
         | towards even).
         | 
         | If you have a very small but non-zero positive number, it gets
         | rounded to 0.5 after the addition, and then the second rounding
         | incorrectly rounds toward 0 instead of 1.
         | 
         | The result is that you have a value x such that x > ceil(x),
         | which shouldn't ever happen.
        
           | bhaney wrote:
           | > supposed to find the ceiling
           | 
           | Oh duh. I somehow forgot that over the span of a few
           | sentences. Thanks.
        
       | zX41ZdbW wrote:
       | I also did this when searching for a fast string-to-float
       | conversion function for ClickHouse:
       | 
       | https://github.com/ClickHouse/ClickHouse/blob/master/src/IO/...
       | 
       | - we create a table, insert all four billion floats there, and
       | check our functions with SQL queries.
       | 
       | Currently, the state-of-the-art library is
       | https://github.com/lemire/fast_double_parser, but we use a
       | slightly worse (but more performant) routine.
       | 
       | It is also interesting how to format floats in string in an
       | efficient and user-friendly way.
       | 
       | For this problem, we switched iteratively: strtod -> Google's
       | Double Conversion -> Ryu (patched) -> Dragonbox. Currently, we
       | use Dragonbox.
        
         | ignoramous wrote:
         | > _...but we use a slightly worse (but more performant)
         | routine._
         | 
         | If it ( _dragonbox_?) is more performant, why is it also worse?
        
           | wiredfool wrote:
           | Presumably it's not as accurate?
        
       | neilv wrote:
       | I used a similar idea the other day, while implementing in Rust
       | the CRC algorithm that a device required, but the vendor didn't
       | document 100% clearly.
       | 
       | Fortunately, I had some device vendor sample code in C (of
       | varying quality and provenance), which I believed was probably
       | correct, but I wasn't 100% certain which (if any) of numerous CRC
       | standard variants it was actually implementing (so I couldn't
       | just use an off-the-shelf Rust library).
       | 
       | So, the fastest/best way to get a perfectly compatible Rust
       | implementation seemed to be:
       | 
       | 1. Write idiomatic Rust code to try to mimic the behavior of the
       | C code.
       | 
       | 2. Write some C code to generate a text file of exhaustive
       | results from the known-good C implementation, for all possible
       | inputs up to length N.
       | 
       | 3. Exhaustively check the Rust implementation against that text
       | file of inputs and outputs.
        
         | AlotOfReading wrote:
         | For future reference there's an excellent tool for discovering
         | the parameters of a CRC model with pairs of inputs and outputs
         | called RevEng [0] (github copy: [1]). Once you have the
         | parameters you can use the CRC crate [2] with a predefined CRC
         | or a custom algorithm (highly unlikely).
         | 
         | [0] https://reveng.sourceforge.io/
         | 
         | [1] https://github.com/berney/reveng
         | 
         | [2] https://crates.io/crates/crc
        
       ___________________________________________________________________
       (page generated 2023-02-09 23:00 UTC)