[HN Gopher] Implementing Cosine in C from Scratch (2020)
___________________________________________________________________
 
Implementing Cosine in C from Scratch (2020)
 
Author : signa11
Score  : 150 points
Date   : 2022-03-29 16:38 UTC (6 hours ago)
 
(HTM) web link (austinhenley.com)
(TXT) w3m dump (austinhenley.com)
 
  | aaaronic wrote:
  | Mandatory Physics "troll" comment:
  | 
  | For small angles, sin(x) ~= x and cos(x) ~= 1 -- the ["small
  | angle approximation"](https://en.wikipedia.org/wiki/Small-angle_approximation)
  | 
  | It's actually kind of ridiculous how many times it comes up and
  | works well enough in undergraduate Mechanics.
  | andi999 wrote:
  | Don't forget optics; there tan x = sin x.
  | aaaronic wrote:
  | Yep, it's all over Physics, really. Many non-mechanical
  | concepts are often mapped to a mechanical analog model (so
  | many tiny oscillators out there!).
  | Sharlin wrote:
  | Useful in astronomy and telephoto photography as well.
  | jhgb wrote:
  | This is a perfectly nice Programming 101 example of a recursive
  | function -- if an angle is too large, calculate its sine using
  | the sine of the half-angle, otherwise return the angle (or, if
  | you're fancy, some simple polynomial approximation of the
  | sine). I'm sure everyone did this in school (we did, in fact).
  | gugagore wrote:
  | It's not obvious when you should switch over to an
  | approximation (base case), so I'd say it's not a good example
  | to introduce recursion. I have never seen it, and I did not
  | do it in school.
  | jhgb wrote:
  | It's not obvious, but that makes it a nice example in
  | parameter optimization as well.
  | scythe wrote:
  | It's "obvious" if you remember how the Taylor series for
  | sin(x) works. The error between x and sin(x) is bounded
  | above by x^3/6. So there you go.
  | aaaronic wrote:
  | When the difference between iterations is less than
  | whatever threshold of accuracy you're looking for, I'd
  | assume.
  | 
  | "Relaxation" algorithms for calculating fields work that
  | way.
  | dekhn wrote:
  | IIRC this comes up in the derivation of the Boltzmann
  | distribution. I also recall Stirling's approximation for x!.
  | nwallin wrote:
  | It's all well and good to have a library with fast sin/cos/tan
  | functions, but always remember to cheat if you can.
  | 
  | For instance, in a game, it's common to have a gun/spell or
  | whatever that shoots enemies in a cone shape. Like a shotgun or
  | burning hands. One way to code this is to calculate the angle
  | between where you're pointing and the enemies (using arccos), and
  | if that angle is small enough, apply the damage or whatever.
  | 
  | A better way to do it is to take the vector where the shotgun is
  | pointing and the vector to a candidate enemy, and take the dot
  | product of those two. Pre-compute the cosine of the angle of
  | effect, and compare the dot product to that -- if the dot product
  | is higher, it's a hit; if it's lower, it's a miss. (You can get
  | away with _very_ rough normalization in a game, for instance
  | using the rsqrt instruction in SSE2.) You've taken a potentially
  | slow arccos operation and turned it into a fast handful of basic
  | float operations.
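  | 
  | Something like this, as an untested sketch (the names are made
  | up, and `aim` and `to_enemy` are assumed to be roughly
  | normalized):
  | 
  |     typedef struct { float x, y, z; } vec3;
  | 
  |     static float dot(vec3 a, vec3 b) {
  |         return a.x * b.x + a.y * b.y + a.z * b.z;
  |     }
  | 
  |     /* cos_half_angle = cosf(cone_half_angle), precomputed once */
  |     int in_cone(vec3 aim, vec3 to_enemy, float cos_half_angle) {
  |         return dot(aim, to_enemy) >= cos_half_angle; /* no acosf */
  |     }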
  | 
  | Or for instance, you're moving something in a circle over time.
  | You might have something like:
  | 
  |     float angle = 0;
  |     for (...) {
  |         angle += 0.01;
  |         /* something with sin(angle), cos(angle) */
  |     }
  | 
  | Instead, you might do:
  | 
  |     const float sin_dx = sin(0.01);
  |     const float cos_dx = cos(0.01);
  |     float sin_angle = 0;
  |     float cos_angle = 1;
  |     for (...) {
  |         const float new_sin_angle = sin_angle * cos_dx + cos_angle * sin_dx;
  |         cos_angle = cos_angle * cos_dx - sin_angle * sin_dx;
  |         sin_angle = new_sin_angle;
  |     }
  | 
  | And you've replaced a sin/cos pair with 6 elementary float
  | operations.
  | 
  | And there's the ever-popular comparing squares of distances
  | instead of comparing distances, saving yourself a square root.
  | 
  | In general, inner loops should never have a transcendental or
  | exact square root in them. If you think you need it, there's
  | almost always a way to hoist it from an inner loop out into an
  | outer loop.
  | aj7 wrote:
  | So 57 years ago, there weren't any programming books. Well, there
  | was one: McCracken and Dorn, _Numerical Methods and Fortran
  | Programming_. A kindly childless couple, Ben and Bluma Goldin, who
  | lived in the same apartment house in Brooklyn, bought me this
  | book as a gift. An axiomatic lesson from the book was distrust of
  | Taylor expansions, and the necessity of moving the expansion
  | point of any function to a favorable location. Because harmonic
  | functions are very well behaved, it should also be obvious that
  | even a very coarse lookup table + interpolation should converge
  | to high accuracy quickly. Finally, use of trigonometric
  | identities to zero in on the desired angle is helpful. OK, I'll
  | shut up now.
  | ok123456 wrote:
  | A lot of BASIC implementations that had software support for
  | floating-point numbers implemented it like this. Microsoft BASIC
  | used a 6th-degree polynomial, IIRC.
  | nonrandomstring wrote:
  | Audio stuff uses different methods depending on the application.
  | Additive oscillators need to be accurate and can use some closed
  | forms and series; mostly people use Taylor and quadratic
  | interpolation, as the article shows. For tables, remember you
  | only really need to compute and store one quadrant and shift it
  | around. But for LFOs people use all manner of quick-and-dirty
  | methods like Bhaskara.
  | 
  | https://en.wikipedia.org/wiki/Bhaskara_I%27s_sine_approximat...
  | dilyevsky wrote:
  | I remember researching CORDIC for some pet FPGA project I had and
  | finding some libs/discussions, so it's definitely used.
  | xioxox wrote:
  | Here's a solution which is vectorizable:
  | http://gallium.inria.fr/blog/fast-vectorizable-math-approx/
  | 
  | I've also read that Chebyshev polynomials are far better for
  | approximating functions than Taylor series (e.g.
  | https://en.wikipedia.org/wiki/Approximation_theory).
  | freemint wrote:
  | This matters less once you put things through the Horner
  | scheme. And if you are not, you are leaving performance on the
  | table.
  | dheera wrote:
  | > So we are really only calculating cosine from 0 to pi.
  | 
  | You could go further and only do 0 to pi/2, as it is mirrored and
  | flipped for pi/2 to pi.
  | 
  | Or you could go even further and do only 0 to pi/4 and use a
  | simple trig identity and your (presumably developed in parallel)
  | sin function for the cos of pi/4 to pi/2.
  | 
  | > One of the implementations is nearly 3x faster than math.h
  | 
  | Is this true even for -O3?
  | blt wrote:
  | If you want sinusoidal oscillation over time, then you can
  | integrate the second-order differential equation x'' = -o^2 x
  | instead. This only costs a few arithmetic instructions per time
  | step. On the other hand, it adds dependency on a state other than
  | the current time, and you must be careful to use energy-
  | conserving integration.
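  | 
  | A minimal sketch of that (untested; semi-implicit Euler is one
  | integrator that keeps the amplitude from drifting the way plain
  | forward Euler does):
  | 
  |     #include <stdio.h>
  | 
  |     int main(void) {
  |         const double w  = 2.0;    /* angular frequency, rad/s */
  |         const double dt = 1e-3;   /* time step */
  |         double x = 1.0, v = 0.0;  /* x(t) ~= cos(w*t) */
  |         for (int i = 0; i < 10000; i++) {
  |             v -= w * w * x * dt;  /* update velocity first... */
  |             x += v * dt;          /* ...then position with new v */
  |         }
  |         printf("x(t=10) = %f\n", x);
  |         return 0;
  |     }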
  | duped wrote:
  | Here's a quick way of doing it with one matrix multiplication;
  | you just need to precompute cos(o) and sin(o), with a bit of
  | intuition:
  | 
  |     [ x(n+1) ]   [ cos(o)  -sin(o) ] [ x(n) ]
  |     [ y(n+1) ] = [ sin(o)   cos(o) ] [ y(n) ]
  | 
  |     where x(n) = cos(o n)
  |           y(n) = sin(o n)
  |           x(0) = 1
  |           y(0) = 0
  |           o    = frequency (radians) * time_step
  | 
  | There are two ways to look at this. From the systems
  | perspective, this is computing the impulse response of a
  | critically stable filter with poles at e^(+/-jo).
  | Geometrically, it's equivalent to starting at (1, 0) and
  | rotating around the unit circle by o radians each time step.
  | You can offset to arbitrary phases.
  | 
  | It's not suitable for numerous cases (notably if the frequency
  | needs to change in real time), but provided that the
  | coefficients sum to `1.0`, it's guaranteed stable and will not
  | alias.
  | user_7832 wrote:
  | Slightly off topic, but kudos to the author for explaining the
  | process extremely clearly. As someone without experience of
  | low-level programming, this was quite understandable for me
  | (albeit with uni-level maths knowledge).
  | em3rgent0rdr wrote:
  | Speaking of lookup tables, I recall reading about the DX-7
  | (https://www.righto.com/2021/11/reverse-engineering-yamaha-dx...
  | shared a few months ago on HN) that the DX-7 has a 4096-entry
  | 14-bit sine ROM built into the chip itself.
  | anonymousiam wrote:
  | I did the same thing for sqrt() seven years ago. It benchmarked
  | at about five times faster than math.h/libm, but the trade was a
  | reduction in the maximum value of the result. My version would
  | not produce accurate results for inputs greater than 2^16. It did
  | work very well for generating real-time laser pointing X/Y
  | coordinates for an Archimedean spiral.
  | 
  | https://en.wikipedia.org/wiki/Archimedean_spiral
  | 
  | (This was on an MSP430 platform with an FPU that only did
  | multiplication.)
  | adgjlsfhk1 wrote:
  | I'm surprised this was faster than an initial guess (half the
  | exponent and mantissa) followed by Newton's method.
  | [deleted]
  | ThePhysicist wrote:
  | I remember doing this in Macromedia Flash, as I wanted to make
  | stuff go in circles and the language didn't have any higher math
  | functions back then. Good ole times.
  | todd8 wrote:
  | The article uses a game movement function as a possible
  | motivation for needing the cosine. If object.rotation is
  | constant, as the spiral animation seems to suggest, there is an
  | important optimization for rapidly computing the sequence:
  | cos(theta), cos(theta + delta), cos(theta + 2*delta),
  | cos(theta + 3*delta), ...
  | 
  | This comes up in movement under constant rate of rotation as well
  | as Fourier transforms. See [1] for details, but the basic idea
  | uses the simple trig identities:
  | 
  |     cos(theta + delta) = cos(theta) - (a*cos(theta) + b*sin(theta))
  |     sin(theta + delta) = sin(theta) - (a*sin(theta) - b*cos(theta))
  | 
  | The values for a and b must be calculated, but only once if delta
  | is constant:
  | 
  |     a = 2 sin^2(delta/2)
  |     b = sin(delta)
  | 
  | By using these relationships, it is only necessary to calculate
  | cos(theta) and sin(theta) _once_ for the first term in the series
  | in order to calculate the entire series (until accumulated errors
  | become a problem).
  | 
  | [1] Press, William H. et al., _Numerical Recipes_, Third
  | Edition, Cambridge University Press, p. 219
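  | 
  | In C, the recurrence looks something like this (an untested
  | sketch; the names are mine):
  | 
  |     #include <math.h>
  | 
  |     /* Fill out[n] = cos(theta + n*delta) for n = 0..count-1,
  |        calling sin/cos directly only once up front. */
  |     void cos_series(double theta, double delta, double *out, int count) {
  |         double a = 2.0 * sin(delta / 2) * sin(delta / 2);
  |         double b = sin(delta);
  |         double c = cos(theta), s = sin(theta);
  |         for (int n = 0; n < count; n++) {
  |             out[n] = c;
  |             double c_next = c - (a * c + b * s);
  |             s = s - (a * s - b * c);  /* uses the old c */
  |             c = c_next;
  |         }
  |     }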
  | dancsi wrote:
  | The implementation of the __cos kernel in Musl is actually quite
  | elegant. After reducing the input to the range [-pi/4, pi/4], it
  | just applies the best degree-14 polynomial for approximating the
  | cosine on this interval. It turns out that this suffices to keep
  | the error below machine precision. The coefficients of this
  | polynomial can be computed with the Remez algorithm, but even
  | truncating the Chebyshev expansion is going to yield much better
  | results than any of the methods proposed by the author.
  | brandmeyer wrote:
  | This has been the standard algorithm used by every libm for
  | decades. It's not special to Musl.
  | enriquto wrote:
  | But isn't this code rarely called in practice? I guess on
  | Intel architectures the compiler just calls the fsin
  | instruction of the CPU.
  | stephencanon wrote:
  | No. FSIN has accuracy issues as sibling mentions, but is
  | also much slower than a good software implementation (it
  | varies with uArch, but 1 result every ~100 cycles is
  | common; even mediocre scalar software implementations can
  | produce a result every twenty cycles).
  | jcelerier wrote:
  | Wasn't there some blog article a few years ago which showed
  | how glibc's implementation was faster than fsin?
  | chrisseaton wrote:
  | > I guess on Intel architectures the compiler just calls
  | the fsin instruction of the CPU.
  | 
  | Do people do that in practice? It's on the FPU, which is
  | basically legacy emulated these days, and it's inaccurate.
  | enriquto wrote:
  | > the FPU, which is basically legacy
  | 
  | you'll pry my long doubles from my cold, dead hands!
  | chrisseaton wrote:
  | You're selectively quoting me -- I said it's 'legacy
  | emulated'. It's emulated using very long-form microcode,
  | so basically emulated in software. I didn't say it was
  | simply 'legacy'.
  | jhgb wrote:
  | The question is if you wouldn't be better served with
  | double-doubles today. You get ~100 bits of mantissa AND
  | you can still vectorize your computations.
  | enriquto wrote:
  | Sure. There should be a gcc flag to make "long double"
  | become quadruple precision.
  | 
  | The thing is, my first programming language was x86
  | assembler and the FPU was the funniest part. Spent weeks
  | as a teenager writing almost pure 8087 code. I have a lot
  | of emotional investment in that tiny rolling stack of
  | extended-precision floats.
  | colejohnson66 wrote:
  | How is "quad floating point" implemented on x86? Is it
  | software emulation?
  | jhgb wrote:
  | FSIN _only_ works on x87 registers, which you will rarely
  | use on AMD64 systems -- you really want to use at least
  | scalar SSE2 today (since that is whence you receive your
  | inputs as per typical AMD64 calling conventions anyway).
  | Moving data from SSE registers to the FP stack _just_ to
  | calculate FSIN and then moving it back to SSE will probably
  | kill your performance even if your FSIN implementation is
  | good. If you're vectorizing your computation over 4 double
  | floats or 8 single floats in an AVX register, it gets even
  | worse for FSIN.
  | stephencanon wrote:
  | Moving between x87 and xmm registers is actually fairly
  | cheap (it's through memory, so it's not free, but it's
  | also not _that_ bad). FSIN itself is catastrophically
  | slow.
  | jhgb wrote:
  | Fair enough, and I imagine there may even be some
  | forwarding going on? There often is when a load follows a
  | store, if I remember correctly. (Of course this _will_ be
  | microarchitecture-dependent.)
  | [deleted]
  | adgjlsfhk1 wrote:
  | No. The fsin instruction is inaccurate enough to be
  | useless. It gives 0 correct digits when the output is close
  | to 0.
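  | 
  | A small untested illustration of why this is a range-reduction
  | problem:
  | 
  |     #include <math.h>
  |     #include <stdio.h>
  | 
  |     int main(void) {
  |         /* M_PI is not pi, so sin(M_PI) should be the tiny
  |            representation error of M_PI, about 1.2246e-16.
  |            A good libm returns that; a reduction that uses too
  |            few bits of pi loses essentially all of the digits. */
  |         printf("%.17g\n", sin(M_PI));
  |         return 0;
  |     }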
  | enriquto wrote:
  | > 0 correct digits when the output is close to 0
  | 
  | this is an amusing way to describe the precision of sub-normal
  | floating point numbers
  | adgjlsfhk1 wrote:
  | It's not just sub-normal numbers. As
  | https://randomascii.wordpress.com/2014/10/09/intel-underesti...
  | shows, fsin only uses 66 bits of pi, which means you have
  | roughly no precision whenever abs(sin(x)) < 10^-16, which is
  | way bigger than the biggest subnormal (10^-307 or so).
  | lifthrasiir wrote:
  | It is much more amusing if you describe it in ulps; for
  | some inputs the error can reach > 2^90 ulps, more than
  | the mantissa size itself.
  | lifthrasiir wrote:
  | x87 trig functions can be very inaccurate due to Intel's
  | original sloppy implementation and subsequent
  | compatibility requirements [1].
  | 
  | [1] http://nighthacks.com/jag/blog/134/index.html
  | toolslive wrote:
  | Yup. Chebyshev is the way to go (after 30s of consideration).
  | potiuper wrote:
  | https://github.com/ifduyue/musl/blob/master/src/math/__cos.c
  | ufo wrote:
  | I think the polynomial calculation in the end looks
  | interesting. It doesn't use Horner's rule.
  | lifthrasiir wrote:
  | It does use Horner's rule, but splits the expression into
  | two halves in order to exploit instruction-level
  | parallelism.
  | jacobolus wrote:
  | Considering the form of both halves is the same, are
  | compilers smart enough to vectorize this code?
  | llimllib wrote:
  | > Input y is the tail of x.
  | 
  | What does "tail" mean in this context?
  | lifthrasiir wrote:
  | It is a double-double representation [1], where a logical
  | fp number is represented with a sum of two machine fp
  | numbers x (head, larger) and y (tail, smaller). This
  | effectively doubles the mantissa. In the context of musl,
  | this representation is produced by the range reduction
  | process [2].
  | 
  | [1] https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...
  | 
  | [2] https://github.com/ifduyue/musl/blob/6d8a515796270eb6cec8a27...
  | jhgb wrote:
  | Does it make sense to use a double-double input when you
  | only have double output? Sine is Lipschitz-limited by 1,
  | so I don't see how this makes a meaningful difference.
  | lifthrasiir wrote:
  | The input might be double but the constant pi is not. Let
  | f64(x) be a function from any real number to double, so
  | that an ordinary expression `a + b` actually computes
  | f64(a + b) and so on. Then in general f64(sin(x)) may
  | differ from f64(sin(f64(x mod 2pi))); since you can't
  | directly compute f64(sin(x mod 2pi)), you necessarily
  | need more precision during argument reduction so that
  | f64(sin(x)) = f64(sin(f64timeswhatever(x mod 2pi))).
  | jhgb wrote:
  | But am I correct in thinking that is at worst a 0.5 ulp
  | error in this case? The lesser term in a double-double
  | can't be more than 0.5 ulp of the greater term, and the
  | sensitivity of both sine and cosine to an error in the
  | input will not be more than 1.
  | 
  | Also, in case of confusion, I was specifically commenting
  | on the function over the [-pi/4, pi/4] domain in
  | https://github.com/ifduyue/musl/blob/master/src/math/__cos.c,
  | which the comment in
  | https://news.ycombinator.com/item?id=30846546 was
  | presumably about.
  | adgjlsfhk1 wrote:
  | Double rounding can still bite you. You are forced to
  | incur up to half an ulp of error from your polynomial, so
  | taking another half ulp in your reduction can lead to a
  | total error of about 1 ulp.
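  | 
  | For reference, the head/tail pairs discussed above are built
  | from error-free transforms like the classic two-sum (untested
  | sketch):
  | 
  |     /* s + e == a + b exactly, with s = fl(a + b) and e the
  |        rounding error; the (s, e) pair is a double-double. */
  |     static void two_sum(double a, double b, double *s, double *e) {
  |         double sum = a + b;
  |         double bb  = sum - a;
  |         *e = (a - (sum - bb)) + (b - bb);
  |         *s = sum;
  |     }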
  | lifthrasiir wrote:
  | Yeah, sine and cosine are not as sensitive (but note that
  | many libms target 1 or 1.5 ulp error for them, so a 0.5
  | ulp error might still be significant). For tangent,
  | however, you definitely need more accurate range
  | reduction.
  | [deleted]
  | skulk wrote:
  | I learned how to write programs from Scratch[0]. Back in the
  | pre-1.4 days (IIRC) there were no native sin/cos functions, so
  | they had to be "implemented" somehow to translate arbitrary
  | angles into delta (x,y). The way that I found people did this was
  | as follows (for sin(A), cos(A)):
  | 
  | Invisible sprite:
  | 
  |     [ when I receive "sine" ]:
  |         [ go to center ]
  |         [ turn A degrees ]
  |         [ move 1 step forward ]
  | 
  | Now, anyone that wants to know sin(A), cos(A) just has to read
  | that sprite's X and Y position.
  | 
  | [0]: https://scratch.mit.edu
  | em3rgent0rdr wrote:
  | I guess Scratch (or LOGO) is Turing-complete, so someone could
  | make a CPU built using a little turtle in a box where all
  | functionality is built out of LOGO commands.
  | aaaronic wrote:
  | I teach lookup tables specifically for sine/cosine (we just use
  | one with a phase shift and modulus) on the Game Boy Advance. The
  | hardware has no floating point unit support, so we do it in 8.8
  | fixed point with 1-degree resolution.
  | 
  | It turns out the BIOS on the hardware can do sine/cosine, but you
  | have to do that via inline assembly (swi calls) and it's no
  | faster than the simple lookup table.
  | 
  | This is all just to get terms for the affine matrix when rotating
  | sprites/backgrounds.
  | waynecochran wrote:
  | You are doing the polynomial evaluation wrong for a Taylor
  | series. First of all, you should be using Horner's method to
  | reduce the number of multiplications needed, and then you should
  | be using FMA to increase the precision of the result.
  | https://momentsingraphics.de/FMA.html
  | phonebucket wrote:
  | This is a fun article!
  | 
  | Alternative avenues that complement the approaches shown:
  | 
  | - Pade approximants
  | (https://en.wikipedia.org/wiki/Pad%C3%A9_approximant) can be
  | better than long Taylor series for this kind of thing.
  | 
  | - Bhaskara I's sine approximation
  | (https://en.wikipedia.org/wiki/Bhaskara_I%27s_sine_approximat...)
  | is easily adaptable to cosine, remarkably accurate for its
  | simplicity, and also fast to calculate.
  | deepspace wrote:
  | I was surprised that pre-calculating x*x as double xx = x*x
  | made a difference.
  | 
  | The author's compiler must not be great at optimization if this
  | is the case.
  | msk-lywenn wrote:
  | My favorite implementation of a super fast but not that accurate
  | cosine is from a now-defunct forum. Good thing we have
  | web.archive.org!
  | https://web.archive.org/web/20111104163237/http://www.devmas...
  | karma_fountain wrote:
  | You can also use Chebyshev polynomials to economize a
  | higher-degree Taylor series with a bit of a loss of accuracy.
  | slim wrote:
  | you can't because boycott Chebyshev
  | rurban wrote:
  | You can't boycott Chebyshev, he died in 1894 or so.
  | gweinberg wrote:
  | I think you should be able to get good results by precalculating
  | the cosines for a set of angles and then using the formula for
  | the cosine of the sum of two angles combined with the Taylor
  | series. The series converges very quickly for very small angles,
  | right?
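  | 
  | An untested sketch of that idea (the table size and names are
  | made up; assumes x in [0, 2*pi)):
  | 
  |     #include <math.h>
  | 
  |     #define N 256
  |     static double cos_tab[N], sin_tab[N];
  | 
  |     void init_tabs(void) {
  |         for (int i = 0; i < N; i++) {
  |             cos_tab[i] = cos(2 * M_PI * i / N);
  |             sin_tab[i] = sin(2 * M_PI * i / N);
  |         }
  |     }
  | 
  |     double fast_cos(double x) {
  |         double t = x * (N / (2 * M_PI));
  |         int i = (int)t;                      /* grid point below x */
  |         double b = (t - i) * (2 * M_PI / N); /* small residual */
  |         double cb = 1 - b * b / 2;           /* short Taylor series, */
  |         double sb = b - b * b * b / 6;       /* fine since b < 0.025 */
  |         return cos_tab[i % N] * cb - sin_tab[i % N] * sb;
  |     }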
  | dboreham wrote:
  | This takes me back in time, to c. 1982, when I got curious about
  | how computers calculated trig functions. Luckily I was using one
  | of the first OSS systems (modulo AT&T lawyers): Unix V7. So I
  | pulled up the source and took a look. You can still see it here:
  | https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libm...
  | There's a comment referencing "Hart & Cheney", which is this book:
  | https://www.google.com/books/edition/Computer_Approximations...
  | Also luckily I had access to a library that had the book, which I
  | was able to read, at least in part. This taught me a few things:
  | that the behavior of a computer can be a white box, not a black
  | box; the value of having access to the source code; to use
  | comments effectively; and that for many difficult tasks someone
  | has probably written a book on the subject. Also that polynomials
  | can approximate transcendental functions.
  | hatsunearu wrote:
  | Since cosine is even and periodic, you really only need, like,
  | x = 0 to pi/4, and the rest you can infer just from that data.
  | mhh__ wrote:
  | That is how it's often implemented, i.e., shift back to the
  | domain you actually need to be able to compute.
  | stncls wrote:
  | This is a very nice article, and I think the three initial
  | objectives are fully attained (simple enough, accurate enough,
  | fast enough).
  | 
  | Some of the performance claims have a caveat, though: lookup
  | tables and micro-benchmarks don't mix well. At all.
  | 
  | I just added a simple loop that thrashes the L3 cache every 500
  | iterations (diff here:
  | https://goonlinetools.com/snapshot/code/#sm4fjqtvjyn36dyednc...).
  | Now the method recommended at the end (cos_table_0_001_LERP) is
  | _slower_ than glibc's cos() (while still having an accuracy that
  | is more than 10^8 times worse)!
  | 
  | Time benchmark output:
  | 
  |     cos_table_1_LERP        0.0038976235167879
  |     cos_table_0_1_LERP      0.0042602838286585
  |     cos_table_0_01_LERP     0.0048867938030232
  |     cos_table_0_001_LERP    0.0091254562801794
  |     cos_table_0_0001_LERP   0.0139627164397000
  |     cos_math_h              0.0089332715693581
  | 
  | Lookup table sizes:
  | 
  |     cos_table_1_LERP            64 bytes
  |     cos_table_0_1_LERP         512 bytes
  |     cos_table_0_01_LERP       5040 bytes
  |     cos_table_0_001_LERP     50280 bytes
  |     cos_table_0_0001_LERP   502664 bytes
  | adgjlsfhk1 wrote:
  | How long does the cache-thrashing loop take compared to 500
  | iterations? I'm not sure how much you can thrash the cache while
  | still being performance-bottlenecked by a table-based
  | implementation of a trig function.
  | stncls wrote:
  | Yes, of course the cache thrashing takes a lot longer. But I
  | modified the code to time only the cos() functions (see
  | diff), using the rdtsc instruction. That's why it had to be
  | every 500 iterations: if it was after each iteration, then
  | rdtsc itself would become the bottleneck!
  | adgjlsfhk1 wrote:
  | My point wasn't about how you were timing the code; what I
  | meant is that in the real world, I don't think there are
  | programs where this will be an issue, because if they are
  | thrashing their caches too much, they won't be bottlenecked
  | on trig. The advantage of table-based functions is that
  | they are really fast if you are calling a lot of them at
  | once, but that's also the only case where they need to be
  | fast. If only 1/10000 instructions is trig, it's OK if the
  | trig is a little slower.
  | anugrahit wrote:
  | charlieyu1 wrote:
  | I think an off-zero Taylor series along with a lookup table
  | should at least be tried. You can approximate cos(x+a) using
  | only cos(a) and sin(a), and the series converges really fast.
  | SeanLuke wrote:
  | A LERP lookup table is fast; but only somewhat slower, and
  | probably quite a bit more accurate, would be a lookup table
  | interpolated by a Catmull-Rom spline.
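  | 
  | The Catmull-Rom kernel itself is cheap -- an untested sketch,
  | where p0..p3 are four consecutive table entries and t in [0,1)
  | is the fractional index between p1 and p2:
  | 
  |     static double catmull_rom(double p0, double p1, double p2,
  |                               double p3, double t) {
  |         return 0.5 * ((2.0 * p1)
  |             + (-p0 + p2) * t
  |             + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
  |             + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t);
  |     }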
  | marcodiego wrote:
  | I don't know if this benchmark is trustworthy. Using a lookup
  | table pollutes the cache. It definitely needs to be measured
  | while interacting with other parts of the code to get a good
  | performance measurement.
  | naoqj wrote:
  | > I couldn't find any notable projects on GitHub that use a table
  | for trig functions, but I'm sure they exist.
  | 
  | Doom?
  | zekica wrote:
  | Doom also uses the famous fast inverse square root method,
  | which, although I know how it works, still amazes me.
  | naoqj wrote:
  | It doesn't, to the best of my knowledge. It appeared in an id
  | game for the first time in Quake 3.
  | freemint wrote:
  | Julia?
  | adgjlsfhk1 wrote:
  | Julia uses a very similar approach to musl here: reduction
  | mod 2pi followed by a polynomial kernel. We use tables for
  | `exp` and `log`, but the trig functions have good reductions,
  | so we use those instead. The implementation is here:
  | https://github.com/JuliaLang/julia/blob/03af78108afe6058f54c...
  | for anyone interested.
  | dekhn wrote:
  | lookup tables and lerp are the most common solution I've seen
  | (for a wide range of functions). You can also memoize with an
  | LRU cache.
  | phkahler wrote:
  | >> lookup tables and lerp are the most common solution I've
  | seen
  | 
  | The author missed one nice table method. When you need sin()
  | and cos(), split the angle into a coarse and a fine angle.
  | Coarse might be 1/256 of a circle and fine would be 1/65536 of
  | a circle (whatever that angle is). You look up the sin/cos of
  | the coarse angle in a 256-entry table. Then you rotate that by
  | the fine angle, which uses another 256 sin/cos entries. The
  | rotation between coarse values is more accurate than linear
  | interpolation and gives almost the same accuracy as a
  | 65536-entry table using only 768 entries. You can use larger
  | tables or even another rotation by finer angle steps to get
  | more accuracy.
  | LeifCarrotson wrote:
  | > _Then you rotate that by the fine angle_
  | 
  | Explain?
  | 
  | Are you trying to reduce the error in the linear
  | interpolation by describing the convex curve between the
  | endpoints?
  | 
  | The derivative of the cosine is minus the sine, and the
  | derivative of the sine is the cosine... so I'd expect the
  | fine-angle correction needed to interpolate 0.15 between 0.1
  | and 0.2 degrees and the one needed to interpolate 45.15
  | between 45.1 and 45.2 degrees to be very different.
  | marginalia_nu wrote:
  | Remember that cos(a+b) = cos(a)cos(b) - sin(a)sin(b)
  | phkahler wrote:
  | >> Explain?
  | 
  | Yeah, I thought that might not be clear enough. Let's say
  | you want 0.1-degree accuracy but don't want a 3600-entry
  | table. Make one table for every 1 degree. You get to use
  | the same table for sin and cos by changing the index. That
  | is the coarse table, with 360 entries. Then make another
  | table with sin and cos values for every 0.1 degrees, or
  | specifically 0.1*n for n going from 0-9. This is another 20
  | values, 10 sin and 10 cos, for small angles.
  | 
  | Take your input angle and split it into a coarse (integer
  | degrees) and fine (fractional degrees) angle. Now take the
  | (sin(x), cos(x)) from the coarse table as a vector and
  | rotate it using the sin & cos values of the fine angle
  | using a standard 2x2 rotation matrix.
  | 
  | You can size these tables however you need. I would not use
  | 360 and 10 for their sizes, but maybe 256 and 256. This can
  | also be repeated with a 3rd table for "extra fine" angles.
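  | 
  | In code, with the 256/256 split (an untested sketch; the four
  | small tables could be folded together to save entries):
  | 
  |     #include <math.h>
  | 
  |     #define COARSE 256
  |     #define FINE   256
  |     static double cs[COARSE], sn[COARSE]; /* coarse sin/cos */
  |     static double csf[FINE], snf[FINE];   /* fine sin/cos */
  | 
  |     void init_tables(void) {
  |         for (int i = 0; i < COARSE; i++) {
  |             cs[i] = cos(2 * M_PI * i / COARSE);
  |             sn[i] = sin(2 * M_PI * i / COARSE);
  |         }
  |         for (int i = 0; i < FINE; i++) {
  |             csf[i] = cos(2 * M_PI * i / (COARSE * (double)FINE));
  |             snf[i] = sin(2 * M_PI * i / (COARSE * (double)FINE));
  |         }
  |     }
  | 
  |     /* idx indexes one full turn in COARSE*FINE = 65536 steps */
  |     void sincos_lut(unsigned idx, double *s, double *c) {
  |         unsigned hi = (idx >> 8) & (COARSE - 1);
  |         unsigned lo = idx & (FINE - 1);
  |         *c = cs[hi] * csf[lo] - sn[hi] * snf[lo]; /* cos(a+b) */
  |         *s = sn[hi] * csf[lo] + cs[hi] * snf[lo]; /* sin(a+b) */
  |     }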
  | marginalia_nu wrote:
  | > You can also memoize with an LRU cache.
  | 
  | That's probably not going to perform well for an operation this
  | fast. The computation is faster than a main memory read.
  | dekhn wrote:
  | You can fit quite a large LRU cache in the L2 cache of the
  | processor. You never want to go to main memory for a lerp
  | table.
  | Sharlin wrote:
  | In some cases (eg. oldskool 2D graphics) a 256-entry LUT is
  | enough. And an 8-bit fixed-point format can be enough, so the
  | whole table fits in 256 bytes.
  | munificent wrote:
  | Yeah, lerping into a lookup table is common in audio and (old
  | school) games.
  | 
  | A common trick to make that faster is to not use doubles and
  | radians to represent the phase. Instead, represent the phase
  | using an integer in some power-of-two range. That lets you
  | truncate the phase to fit in a single period using a bit mask
  | instead of the relatively slow modulo.
___________________________________________________________________
(page generated 2022-03-29 23:00 UTC)