[HN Gopher] Ideal divisors: when a division compiles down to jus... ___________________________________________________________________ Ideal divisors: when a division compiles down to just a multiplication Author : chmaynard Score : 85 points Date : 2021-04-28 17:18 UTC (5 hours ago) (HTM) web link (lemire.me) (TXT) w3m dump (lemire.me) | whalesalad wrote: | I have been seeing this Wordpress theme a lot lately and it is so | bad the way the content is pushed so far to the right. On my 27" | monitor the left margin of the text is essentially the center of | the display. | jlarocco wrote: | I have a 27" monitor also, and it looks fine to me. | tachyonbeam wrote: | 27" monitor on Chrome here. Looks pretty far to the right. | tzs wrote: | How about if the browser window is maximized? | | What I get in that case is 23 cm for the darkish blue left | sidebar and 36 cm for the light blue right area that contains | the article. Within that right area, the article is in 19 cm | white rectangle with just under 2 cm margins. | | So, starting from the left here is what I see: | | 15 cm empty dark blue background, | | 6 cm of sidebar text on dark blue background, | | 2 more cm of empty dark blue background, | | 2 cm of empty light blue background, | | A tad under 2 cm of empty white background, | | About 14.5 cm of text on white background, | | A tad under 2 cm of empty white background, | | 15.5 cm of empty light blue background. | | It really is quite ugly. It looks like something I would do | because I completely suck at CSS. | dhosek wrote: | I almost never1 run a browser maximized on anything bigger | than my 16" laptop screen (and even that not so much). Most | of the time, I have browser windows that are 1/2 (on my 24" | monitor) or 1/3 (on my ultrawide monitor) of the width and | the full height (2/3 width on my laptop)). | | 1 The exception being browser windows to run sites like | Netflix where I'm really using it as a video interface. | tyingq wrote: | One thing that often works in this situation with chrome is the | "Emulate CSS Media Type: Print" option from the rendering tab | in Dev Tools. The theme appears to be the wordpress default | theme from 2015, so you should see less of it as times goes on. | omgJustTest wrote: | Floating point, while convenient does have substantial drawbacks | in the implementations of circuits that we have today. Efficiency | gains like this somewhat expose facets that are normally hidden. | It will be interesting to see the transition to either analog NN | or quantum computers, which inherently have better | representations of continuous values. | | Edit: Additionally the serial vs parallel processing paradigm | shift is a main advantage of the switch too! Queued systems seem | to have severe scheduling limitations, which the real-world does | not always inherently have. | svat wrote: | The linked article is about integer division and has nothing to | do with floating point, and in fact I can't see how this idea | could be applied to floating-point arithmetic. Can it? | wizzwizz4 wrote: | Not directly, no. The underlying mathematics _kind of_ can, | but you 'd need dedicated circuitry (or bit-hacking) for it | because of the floating point parts, and it would fall over | around subnormals. | andrewfromx wrote: | i'd like to find these numbers for base9 vs base10. see | https://stackoverflow.com/questions/67226727/is-there-a-way-... | if you look at that sample golang code, is that the right way to | go about this? Or is there a much simpler way already built into | a high level language like go? | [deleted] | jvickers wrote: | In JavaScript (specifically node / V8) if dividing multiple | values by the same divisor, is it fastest to calculate the | reciprocal and then multiply it? | | How about other languages (and compiler systems)? | | Are there binary tricks worth knowing about to quickly calculate | reciprocals? | brrrrrm wrote: | It's typically faster to calculate a reciprocal (especially at | compile time) and then multiply it, but this is often at the | expense of precision. In C++, for example, the -O3 flag on gcc | will still emit a division (divss) to preserve compliant | floating point precision in this example: | https://godbolt.org/z/PY3Wjno4P | | However, when we enable `-ffast-math`, we permit the compiler | to do a bit more aggressive optimization in this domain | (trading off precision), and we see multiplication (mulss) is | emitted: https://godbolt.org/z/KEb1nc43z | svat wrote: | If you've never seen it before, this is a beautiful blog post on | related matters: "Optimising pointer subtraction with 2-adic | integers" http://blog.sigfpe.com/2010/05/optimising-pointer- | subtractio... | kevinventullo wrote: | A few months back I wrote a blog post very much inspired by | Dan's post, though a little more academic: | https://kevinventullo.com/2020/12/21/2-adic-logarithms-and-f... | | The idea is to exploit deeper structures in p-adic number | theory to develop useful and novel algorithms with unsigned | integer arithmetic, e.g. fast integer exponentiation. Feedback | welcome! | | (Warning: math heavy) | thechao wrote: | > I don't know how gcc generates its approximate 2-adic | reciprocals. Possibly it uses something based on the Euclidean | GCD algorithm. I wasn't able to find the precise line of source | in a reasonable time. | | It's 0x1FFF...F / 7 = 0x249...249249249. | | Just about the oddest thing I've seen; I guess, if I were the | author, it'd be my "1 in 10000" day? | xyzzy_plugh wrote: | Makes more sense in binary: 0xFFF..F | 111111111111111111111111 0x7 | 000000000000000000000111 0x249..9 | 001001001001001001001001 | tragomaskhalos wrote: | Genuine question: why would any compiler bother to optimise | division with such obtuse dividends, other than a purists' | satisfaction ? | warkdarrior wrote: | Division is orders of magnitude slower than multiplication. | Tuna-Fish wrote: | Less than one order of magnitude these days. On Tiger Lake, | integer multiply is 3 cycles, while integer division is 12 | for 32-bit integers and 15 for 64-bit. | | Of course, the multiply is fully pipelined while the division | is only partially, but even that leaves only a 10-fold | difference. | | Division is much faster these days than it used to be. It's | still worth it to have compiler optimizations for it, but | it's not important to avoid it anymore. | gowld wrote: | Why do some people assume decimal ten as the base of the | orders of magnitude when discussing binary arithmetic? | cma wrote: | Timespan/cycle count is almost always denoted in base 10 | in conversation. | Fordec wrote: | > Less than one order of magnitude these days. | | > only a 10-fold difference. | | A ten fold difference is _literally_ one order of | magnitude. Which is what the post you 're replying to | factually stated. | cma wrote: | orders usually means 2 or more | jbay808 wrote: | Embedded systems programmers don't always have such luxury! | martincmartin wrote: | They're often special cases of general algorithms. You can do | many divisions as combinations of multiplications & shifts; in | this case, the resulting code simplifies down to the smallest | possible. | Denvercoder9 wrote: | There's no specific optimisation for these specific numbers. | There's a more general optimisation to turn division by a fixed | number into a multiplication and shift where possible (because | division is a _lot_ slower than multiplication), and there is | another optimisation that eliminates useless shifts. Combined | they yield this result for these specific numbers. | joe_the_user wrote: | Yeah, a more detailed description of this general method | seems missing from the article. Anyone care to provide it? | chrchang523 wrote: | See the "Labor of Division" blog posts by the author of | libdivide: | | https://ridiculousfish.com/blog/posts/labor-of-division- | epis... | | https://ridiculousfish.com/blog/posts/labor-of-division- | epis... | joe_the_user wrote: | So basically calculating the multiplicative inverse of a | number and storing it as something like a float (digits + | exponent) with the digits and exponent stored separately | to avoid overflow, underflow, accuracy issues. | | And it's faster if and only if you're repeating division | by a single number since you're still doing the | equivalent of divide to get the invest (but repeated | division is actually quite common so this is useful). | pkaye wrote: | The book "Hackers Delight" goes over a lot of these kinds | of software algorithms for integers. | layoutIfNeeded wrote: | >I call the corresponding divisors ideal | | Not a particularly good name, as "ideal" already has an | established meaning in algebra: | https://en.m.wikipedia.org/wiki/Ideal_(ring_theory) | gowld wrote: | It's not an ideal name, even though it's ideal. | withinboredom wrote: | This is computer science, not algebra. | Tainnor wrote: | algebra plays a huge role in CS, e.g. cryptography | chrchang523 wrote: | See also https://libdivide.com/ , when you need to repeatedly | divide by a constant that isn't known until runtime. ___________________________________________________________________ (page generated 2021-04-28 23:00 UTC)