[HN Gopher] Floating point visually explained (2017)
___________________________________________________________________

Floating point visually explained (2017)

Author : luisha
Score  : 228 points
Date   : 2021-11-28 12:49 UTC (10 hours ago)

(HTM) web link (fabiensanglard.net)
(TXT) w3m dump (fabiensanglard.net)

| mgaunard wrote:
| I don't understand how M * 2^E (modulo small representation
| details) is difficult to grasp. Then you have decimal types as
| M * 10^E.
| 
| It's certainly much clearer than the windowing and bucketing in
| this article.

  | ribit wrote:
  | I think windowing and bucketing are a useful conceptual model
  | for understanding the precision of the encoding space. I
  | understand how FP values work in practice, but this article
  | really hits the nail on its head when explaining the less
  | tangible aspects of FP encoding.

  | cturner wrote:
  | His approach works well for me. I don't retain arbitrary
  | facts. At least part of this is a fear that if I don't
  | properly understand its dynamics I will misapply it; better
  | to discard it.
  | 
  | The windowing explanation shows me a path from the intent of
  | the designer through to the implementation (the algorithm).
  | Now I can retain that knowledge.

    | marcosdumay wrote:
    | > I don't retain arbitrary facts.
    | 
    | I still fail to understand what is arbitrary in scientific
    | notation. The designers almost certainly didn't think in
    | terms of windows when creating the data types; they
    | probably thought about the mantissa and exponent the way
    | it's usually explained, and maybe about information density
    | (entropy per bit) when comparing with other possibilities.
    | 
    | Anyway, if it helps, great. Different explanations are
    | always welcome. But this one is post hoc.

    | wruza wrote:
    | Ahem, exponential forms go back to Archimedes, and no
    | design (or intent thereof) assumed windows or offsets in
    | FP. It's just a fixed-precision floating-point notation, no
    | more, no less. The problem persists with decimal tax
    | calculations like $11111.11 x 12% == $1333.33|32, where
    | 0.0032 is lost in a monetary form. It also persists with
    | on-paper calculations, because there may be no room for all
    | the digits that your algorithm produces (think x/7) and you
    | have to choose your final acceptable precision.
    | 
    | _At least part of this is a fear that if I don't properly
    | understand its dynamics I will misapply it_
    | 
    | The explanation that you like can't save you from it
    | either. It's not something you think through once and then
    | just write the correct code. Quick exercise with the "new"
    | knowledge you've got: you have an array of a million
    | floats, add them up correctly:
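    | (For reference, the standard answer here is compensated
    | summation: naive left-to-right addition keeps dropping the
    | low-order bits of each small addend, while Kahan's
    | algorithm carries them along in a correction term. A
    | minimal Python sketch; the random data is just a stand-in
    | for "a million floats":)
    | 
    |     import math, random
    | 
    |     def kahan_sum(values):
    |         total = 0.0   # running sum
    |         comp = 0.0    # bits the last addition rounded away
    |         for x in values:
    |             y = x - comp            # re-inject the lost bits
    |             t = total + y           # low bits of y are rounded off here
    |             comp = (t - total) - y  # recover (negated) what was lost
    |             total = t
    |         return total
    | 
    |     data = [random.uniform(0.0, 1.0) for _ in range(1_000_000)]
    |     # math.fsum is exactly rounded, so it serves as the reference
    |     print(sum(data), kahan_sum(data), math.fsum(data))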
| ryan-duve wrote:
| As someone who never learned the "dreadful notation...
| discouraging legions of programmers", this was really helpful!
| The "canonical" equation doesn't mean anything to me and the
| window/offset explanation does.
| 
| I'm sure the author intended to keep the article short, but I
| think it would benefit from more examples, including a number
| like 5e-5 or something. It isn't clear to me how the
| window/offset explains that.
| 
| Edit: to clarify, I do not understand how the floating point
| representation of 0.00005 is interpreted with windows/offsets.

  | ricardobeat wrote:
  | 5e-5 is not really related to FP; it's just scientific
  | notation. The number itself is still stored as shown in the
  | post, it's just being printed in a more compact form.
  | 
  | The number after _e_ is a power of 10:
  | 
  |     5e2  = 5 * 10^2  = 5 * 100    = 500
  |     5e-5 = 5 * 10^-5 = 5 / 100000 = 0.00005
  | 
  | Once you internalize this, you just read the exponent as "x
  | zeroes to the left".

    | boibombeiro wrote:
    | Scientific notation IS floating point.

    | taeric wrote:
    | Scientific notation is easily seen as just floating point
    | in base ten, though. It has the same rules about the
    | leading digit and such. Add in significant digits, and you
    | have most of the same concerns.

| TT-392 wrote:
| It probably differs a lot from person to person, and I usually
| do find graphical explanations of things easier to grasp. But
| that formula was way easier for me to understand than the other
| explanation. Maybe it is related to the fact that I am pretty
| OK at math but have pretty bad dyslexia.

| dang wrote:
| Some past related threads:
| 
| _Floating Point Visually Explained (2017)_ -
| https://news.ycombinator.com/item?id=23081924 - May 2020 (96
| comments)
| 
| _Floating Point Visually Explained (2017)_ -
| https://news.ycombinator.com/item?id=19084773 - Feb 2019 (17
| comments)
| 
| _Floating Point Visually Explained_ -
| https://news.ycombinator.com/item?id=15359574 - Sept 2017 (106
| comments)

| vlmutolo wrote:
| The "default" formula as presented in the article seems...
| strange. Is this really how it's normally taught?
| 
|     (-1)^S * 1.M * 2^(E - 127)
| 
| This seems unnecessarily confusing. And 1.M isn't notation I've
| seen before. If we expand 1.M into 1 + M, with 0 <= M < 1, then
| we pretty quickly arrive at the author's construction of
| "windows" and "offsets":
| 
|     (-1)^S * (1 + M) * 2^(E - 127)
|     (-1)^S * 2^log_2(1 + M) * 2^(E - 127)
| 
|     let F := log_2(1 + M), so 0 <= F < 1
|     (since [0 <= M < 1] implies [1 <= 1 + M < 2]
|      implies [0 <= log_2(1 + M) < 1])
| 
|     (-1)^S * 2^(E - 127 + F)
| 
| Since we know F is between 0 and 1, we can see that F controls
| where the number lands between 2^(E - 127) and 2^(E - 127 + 1)
| (ignoring the sign). It's the "offset".
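| (To make that concrete, a small Python sketch that pulls the
| window and offset straight out of a single-precision float's
| bits; subnormals, infinities, and NaN are deliberately ignored,
| and 6.1 is just an arbitrary test value:)
| 
|     import struct
| 
|     def window_offset(x):
|         # Reinterpret the float's 32 bits as an unsigned integer.
|         (bits,) = struct.unpack('>I', struct.pack('>f', x))
|         sign = bits >> 31
|         e = (bits >> 23) & 0xFF      # biased exponent E
|         m = bits & 0x7FFFFF          # 23 explicit mantissa bits
|         window = (2.0 ** (e - 127), 2.0 ** (e - 126))
|         offset = m / 2 ** 23         # fraction of the way across
|         return sign, window, offset
| 
|     # 6.1 falls in the window [4, 8) and sits ~52.5% across it:
|     print(window_offset(6.1))   # (0, (4.0, 8.0), 0.5249999...)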
| dirkk0 wrote:
| I also found this video from SimonDev educative (and
| entertaining): https://www.youtube.com/watch?v=Oo89kOv9pVk

| baryphonic wrote:
| I found a tool[0] that helps me debug potential floating point
| issues when they arise. This one has modes for half-, single-
| and double-precision IEEE-754 floats, so I can deal with the
| various issues when converting between 64-bit and 32-bit
| floats.
| 
| [0] https://news.ycombinator.com/item?id=29370883

| ToddWBurgess wrote:
| Seeing the Intel math coprocessor was a blast from the past. I
| sold a few of them and installed a few of them when I used to
| work in computer stores.

| bool3max wrote:
| This is _the_ article that finally helped me grasp and
| understand IEEE-754. An excellent read.

| defaultname wrote:
| Several years ago I made a basic, interactive explanation of
| the same:
| 
| https://dennisforbes.ca/articles/understanding-floating-poin...
| 
| At the time it was to educate a group I was working with about
| why "every number in JavaScript is a double" doesn't mean that
| 1 is actually 1.000000001 (everyone knows that floating point
| numbers are potentially approximations, so there is a
| widespread belief that every integer so represented is
| approximate as well).

| kuharich wrote:
| Past comments: https://news.ycombinator.com/item?id=15359574,
| https://news.ycombinator.com/item?id=19084773

| Sirenos wrote:
| Why is everyone complaining about people finding floats hard?
| Sure, scientific notation is easy to grasp, but you can't
| honestly tell me that it's trivial AFTER you consider rounding
| modes, subnormals, etc. Maybe if hardware had infinite
| precision like the real numbers, nobody would be complaining ;)
| 
| One thing I dislike in discussions about floats is this
| incessant focus on the binary representation. The
| representation is NOT the essence of the number. Sure, it
| matters if you are a hardware guy, or need to work with
| serialized floats, or some NaN-boxing trickery, but you can get
| a perfectly good understanding of binary floats by playing
| around with the key parameters of a floating point number
| system:
| 
| - precision = how many bits you have available
| 
| - exponent range = lower/upper bounds for the exponent
| 
| - radix = 2 for binary floats
| 
| Consider listing out all possible floats given precision=3,
| exponents from -1 to 1, radix=2. See what happens when you have
| a real number that needs more than 3 bits of precision. What is
| the maximum rounding error under different rounding strategies?
| Then move on to subnormals and see how they open a can of worms
| in underflow scenarios that you don't see in integer
| arithmetic. For anyone interested in a short book covering all
| this, I would recommend "Numerical Computing with IEEE Floating
| Point Arithmetic" by Overton [1].
| 
| [1]: https://cs.nyu.edu/~overton/book/index.html

  | xyzzy_plugh wrote:
  | The binary form is important for understanding the
  | implementation details. You even mention underflow. It's
  | difficult for most people to initially understand why a float
  | can't accurately store a large number that an integer of the
  | same size can represent.
  | 
  | The binary form handily demonstrates the limitations.
  | Understanding the floating point instructions is kinda
  | optional but still valuable.
  | 
  | Otherwise everyone should just use varint-encoded arbitrary
  | precision numbers.

    | colejohnson66 wrote:
    | It also explains why 0.1 + 0.2 is _not_ 0.3. With binary
    | IEEE-754 floats, none of those can be represented
    | exactly[a]. With decimal IEEE-754 floats, it's possible,
    | but the majority of hardware people interact with works on
    | binary floats.
    | 
    | [a]: Sure, if you `console.log(0.1)`, you'll get 0.1, but
    | it's not possible to express it in binary _exactly_; only
    | after rounding. 0.5, however, _is_ exactly representable.

      | jacobolus wrote:
      |     Python 3.9.5
      |     >>> 0.1.hex()
      |     '0x1.999999999999ap-4'
      |     >>> 0.2.hex()
      |     '0x1.999999999999ap-3'
      |     >>> (0.1 + 0.2).hex()
      |     '0x1.3333333333334p-2'
      |     >>> 0.3.hex()
      |     '0x1.3333333333333p-2'

        | colejohnson66 wrote:
        | But those digits are repeating. So, by definition, they
        | are not exactly representable in a (binary) floating
        | point system. Again, that's why 0.1 + 0.2 is not 0.3 in
        | binary floating point.

    | jhgb wrote:
    | > It's difficult for most people to initially understand
    | why a float can't accurately store a large number that an
    | integer of the same size can represent.
    | 
    | Because you don't have all the digits available just for
    | the mantissa? That seems quite intuitive to me, even if you
    | don't know about the corner cases of FP. This isn't one of
    | them.

      | bee_rider wrote:
      | I was going to respond with something like this. I think
      | for getting a general flavor*, just talking about
      | scientific notation with rounding to a particular number
      | of digits at every step is fine.
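      | For instance, base-10 "floats" with four significant
      | digits, rounding after every step (a toy sketch with
      | Python's decimal module; the precision is an arbitrary
      | choice):
      | 
      |     from decimal import Decimal, getcontext
      | 
      |     getcontext().prec = 4   # 4 significant digits, a tiny mantissa
      |     third = Decimal(1) / Decimal(3)
      |     print(third)      # 0.3333
      |     print(third * 3)  # 0.9999, not 1: the rounding shows up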
      | I guess -- the one thing that explicitly looking at the
      | bits does bring to the table is the understanding that
      | (of course) the number of bits or digits in the mantissa
      | must be less than the number of bits or digits in an
      | equally long integer. Of course, this is pretty obvious
      | if you think about the fact that they are the same size
      | in memory, but if we're talking about sizes in memory,
      | then I guess we're talking about bits implicitly, so we
      | may as well make it explicit.
      | 
      | * actually, much more than just getting a general flavor:
      | most of the time in numerical linear algebra the actual
      | bitwise representation is irrelevant, so you can get
      | pretty far without thinking about bits.

  | GuB-42 wrote:
  | I think the binary representation is the essence of floating
  | point numbers, and if you go beyond the "sometimes, the
  | result is slightly wrong" stage, you have to understand it.
  | 
  | And so far the explanation in the article is the best I have
  | found, not least because subnormal numbers appear naturally.
  | 
  | There is a mathematical foundation behind it of course, but
  | it is not easy for a programmer like me. I think it is better
  | to think in terms of bits and the integers they make, because
  | that's what the computer sees. And going this way, you get
  | NaN-boxing and serialization as a bonus.
  | 
  | Now, I tend to be most comfortable with a "machine first",
  | bottom-up, low-level approach to problems. Mathematical and
  | architectural concepts are fine and all, but unless I have
  | some idea of what it looks like in memory and the kind of
  | instructions being run, I tend to feel lost. Some people may
  | be more comfortable with high-level reasoning; we don't all
  | have the same approach. That's what I call real diversity,
  | and it is a good thing.

    | Sirenos wrote:
    | Sorry, I didn't mean to downplay the value of using
    | concrete examples. I absolutely agree that everyone learns
    | better in concrete settings, which is why my original
    | comment fixed the parameters for people to play with. I was
    | referring more to the discussions of how exponents are
    | stored biased, how the leading bit of the mantissa is an
    | implied 1 (except for subnormals), and so on. All these are
    | distracting details that can (and should) be covered once
    | the reader has a strong intuition for the more fundamental
    | aspects.

| sharikous wrote:
| As one who understands floats, I really wish there were a
| better notation for literals; the best would be a float literal
| in binary representation.
| 
| For integers you can write 0x0B, 11, or 0b1011 and have a very
| precise representation.
| 
| For floats you write 1e-1 or 0.1 and you get an ugly
| truncation. If it were possible to write something like 1eb-1
| (for 0.5) and 1eb-2 (for 0.25), people would be incentivized to
| use nice negative-power-of-2 floats, which are much less
| error-prone than ugly base conversions.
| 
| This way you could overcome the fears around floats being
| inexact and start writing more accurate tests (bit for bit) in
| many cases.

  | jacobolus wrote:
  | > _better notation for literals [...] something like 1eb-1
  | (for 0.5) and 1eb-2 (for 0.25)_
  | 
  | There are floating point hex literals. These can be written
  | as 0x1p-1 == 0.5 and 0x1p-2 == 0.25.
  | 
  | You can use them in C/C++, Java, Julia, Swift, ..., but they
  | are not supported everywhere.
  | 
  | https://observablehq.com/@jrus/hexfloat
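  | (Python exposes the same notation through float.hex() and
  | float.fromhex() rather than literals, which is still enough
  | to write exact binary constants:)
  | 
  |     assert float.fromhex('0x1p-1') == 0.5
  |     assert float.fromhex('0x1p-2') == 0.25
  |     assert float.fromhex('0x1.8p1') == 3.0  # (1 + 8/16) * 2^1
  |     print((0.1).hex())  # '0x1.999999999999ap-4' -- not exact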
    | Aardwolf wrote:
    | C++ hex floats are an interesting combination of 3 numeric
    | bases in one!
    | 
    | the mantissa is written in base 16
    | 
    | the exponent is written in base 10
    | 
    | the exponent itself is a power of 2 (not of 16 or 10), so
    | that's base 2
    | 
    | One can only wonder how that came to be. I think they chose
    | base 10 for the exponent to allow using the 'f' suffix to
    | denote float (as opposed to double).

    | adgjlsfhk1 wrote:
    | Julia is missing 32-bit and 16-bit hex floats,
    | unfortunately.

      | simonbyrne wrote:
      | You can just wrap the literal in a conversion function,
      | e.g. Float32(0x1p52), which should get
      | constant-propagated at compile time.

        | adgjlsfhk1 wrote:
        | I know, it just isn't as nice to read.

  | stncls wrote:
  | I'm not sure this is what you are asking for, but would this
  | be suitable?
  | 
  | > ISO C99 and ISO C++17 support floating-point numbers
  | written not only in the usual decimal notation, such as
  | 1.55e1, but also numbers such as 0x1.fp3 written in
  | hexadecimal format. [...] The exponent is a decimal number
  | that indicates the power of 2 by which the significant part
  | is multiplied. Thus '0x1.f' is 1 15/16, 'p3' multiplies it by
  | 8, and the value of 0x1.fp3 is the same as 1.55e1.
  | 
  | https://gcc.gnu.org/onlinedocs/gcc/Hex-Floats.html

  | wruza wrote:
  | Are you asking for pow(2, n)? Or for `float mkfloat(int e,
  | int m)`, which literally implements the given formula? I
  | doubt that the notation you're suggesting would be used in
  | code, except for really rare bitwise cases.
  | 
  | The initial precision doesn't really matter, because if you
  | plan to use this value in a computation, it will quickly
  | accumulate an error, which you have to deal with anyway.
  | There are three ways to deal with it: 1) ignore it, 2)
  | account for it, 3) use numeric methods which keep it in a
  | decent range. You may accidentally get (1) == (3), but the
  | problem doesn't go away in general.

| brianzelip wrote:
| Wild, I just got done listening to the Coding Blocks podcast
| episode on data structure primitives, where they go in depth on
| floating point, fixed point, binary floating point, and more.
| Great listen! See around the 54min mark -
| https://www.codingblocks.net/podcast/data-structures-primiti...

| bullen wrote:
| Why is there no float type with linear precision?

  | adgjlsfhk1 wrote:
  | That's called fixed point. There isn't hardware for it
  | because it is cheap to do in software using integer math.

    | djmips wrote:
    | Some DSP chips had hardware for fixed point. I think it's a
    | shame that C never added support for fixed point.

    | bullen wrote:
    | But wouldn't it be sweet to have SIMD acceleration for
    | fixed point?
    | 
    | Or is that something you can do with integers and SIMD
    | today?
    | 
    | Say, 4x4 matrix multiplication?

      | adgjlsfhk1 wrote:
      | Your add is just an integer add. Your multiply is a
      | mul_hi, 2 shifts, 1 add, and 1 mul.

      | adwn wrote:
      | Having implemented a nested PID controller using
      | fixed-point arithmetic in an FPGA, I can tell you that
      | fixed point is a pain in the ass, and you only use it
      | when floating point is too slow, too large (in terms of
      | on-chip resources), or consumes too much power, or when
      | you _really_ need to control the precision going into and
      | out of an arithmetic operation.

      | mhh__ wrote:
      | Are you asking if integer SIMD exists? (It does)

| jacksonkmarley wrote:
| Has anyone got a good reference for the practical implications
| of floating point vs fixed point in measurement calculations
| (especially timing measurements)? Gotchas, rules of thumb, best
| practice, etc.?

| wodenokoto wrote:
| I was kinda hoping for a visualization of which numbers exist
| in floating point.
| 
| While I always knew about
| 
|     0.1 + 0.2 -> 0.30000000000000004
| 
| it was still kind of an epiphany to realize that floating point
| numbers don't so much have rounding errors as they are simply
| discrete numbers.
| 
| You can move from one float to the next, which is a meaningful
| operation on discrete numbers like integers, but not on
| continuous numbers like the rationals and irrationals.
| 
| I also feel like that is a much more important takeaway for a
| user of floating point than knowing what the mantissa is.
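| (In Python 3.9+, "the next float" is literally a library call,
| math.nextafter; math.ulp reports the size of the local gap:)
| 
|     import math
| 
|     print(math.nextafter(1.0, math.inf))  # 1.0000000000000002
|     print(math.nextafter(0.3, math.inf))  # 0.30000000000000004
|     print(math.ulp(1.0))                  # 2.220446049250313e-16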
  | rlanday wrote:
  | > You can move from one float to the next, which is a
  | meaningful operation on discrete numbers like integers, but
  | not on continuous numbers like the rationals and irrationals.
  | 
  | What do you mean by continuous? Obviously if you take a
  | number line and remove either the rational or the irrational
  | numbers, you will end up with infinitely many holes.
  | 
  | The thing that makes floating point numbers unique is that,
  | for any given representation, there are actually only
  | _finitely_ many. There's a largest possible value and a
  | smallest possible value, and each number has gaps on either
  | side of it. I think you meant that the rationals and
  | irrationals are _dense_ (for any two distinct numbers, you
  | can find another number between them), which is also false
  | for floating-point numbers.

  | bspammer wrote:
  | I highly recommend watching this video, which explains floats
  | in the context of Mario 64:
  | https://www.youtube.com/watch?v=9hdFG2GcNuA
  | 
  | Games are a great way to explain floats because they're so
  | visual. Specifically, check out 3:31, where the game has been
  | hacked so that the grid of possible float values in one of
  | the movement directions is quite coarse.
  | 
  | (If you're curious what a PU is, and would like some
  | completely useless knowledge, this video is also an absolute
  | classic and very entertaining:
  | https://www.youtube.com/watch?v=kpk2tdsPh0A)

| darkest_ruby wrote:
| Having read this article, I understand floating point even less
| now
___________________________________________________________________
(page generated 2021-11-28 23:00 UTC)