[HN Gopher] What is a flop?
___________________________________________________________________
What is a flop?

Author : RafelMri
Score  : 43 points
Date   : 2023-09-05 08:50 UTC (14 hours ago)

(HTM) web link (nhigham.com)
(TXT) w3m dump (nhigham.com)

| paulddraper wrote:
| Floating Point Operations Per

| nightmonkey wrote:
| My last two startups.

| HPsquared wrote:
| You could use it as a unit of accounting for investing losses.
| A gigaflop would be a loss of 1 billion, etc.

| [deleted]

| yujian wrote:
| this

| jjgreen wrote:
| I'm usually a fan of Higham, but the last few posts have been
| weak.

| liotier wrote:
| Flops?

| jjgreen wrote:
| Very witty Wilde ...

| MattyDub wrote:
| Can somebody explain why a square root is also considered a flop?
| Surely that involves more work than the other four operations the
| article listed. Is there some hardware algorithm for the square
| root that is as fast as (e.g.) division?

| AlbertCory wrote:
| Disappointed this is not about basketball.

| dragontamer wrote:
| My personal notes on this subject:
|
| * MIPS was perhaps the integer equivalent to FLOP, still used in
| modern microcontrollers because the 8051 at 12 MHz would only
| execute 1 MIPS (12 clocks per instruction). Modern 8051 chips
| obviously have sped up to 1 clock per instruction, but MIPS (and
| Dhrystone MIPS in particular) are still a common benchmark today.
|
| * FLOPs are very difficult to calculate in theory because modern
| CPUs have vector units, and multiple pipelines per core. You
| could have 3x AVX512 instructions in parallel on today's CPUs on
| a single core.
|
| * FLOPs were traditionally a 64-bit operation for the
| supercomputer community. Today, most FLOPs are 32-bit for video
| games. Finally, the deep learning / neural net guys have
| popularized 16-bit flops, and even 8-bit iops.
|
| * 'The' flop is a misnomer because it's almost always the
| multiply-and-accumulate instruction: X = A + B * C. Which...
| Is two operations per instruction (per shader/SIMD lane). Eeehhh
| whatever. Who cares about these details?
|
| * As 'Dhrystone' is the benchmark for MIPS, the benchmark for
| 64-bit flops is Linpack.

| gumby wrote:
| > MIPS was perhaps the integer equivalent to FLOP, still used
| in modern microcontrollers because the 8051 at 12 MHz would only
| execute 1 MIPS (12 clocks per instruction)
|
| Actually, the origin of this term was VAX MIPS (the VAX 780
| specifically), because that was a ubiquitous minicomputer that
| was pretty fast for its time. There were faster machines, and
| slower mainframes were still being built, but that was what the
| late 70s were like.
|
| When the 8051 was released in 1980 it surely didn't run at 12
| MHz! Back then the Z80 sold because it could run 8080 code at a
| blistering 2 MHz.
|
| BTW the benchmark for FLOPS in those days was Whetstone, hence
| the otherwise weird name "Dhrystone".

| dragontamer wrote:
| The 1981 manual for the 8051 contains numerous references to
| 12 MHz.
|
| http://bitsavers.informatik.uni-stuttgart.de/components/inte...
|
| It was 12T clocked: even though the clock was 12 MHz, it would
| only operate at 1 MHz / 1 MIPS, because it took 12 clock ticks
| to even perform one addition.
|
| IIRC, there was a standard crystal (11.0592 MHz crystal?? I
| forget exactly) for the communications at the time. So going
| just above 11 MHz (or really, just above 11.0592 MHz) was
| needed for reliable serial comms.

| gumby wrote:
| Wow, right on page 1-2! I'm surprised -- I don't remember
| anything running that fast back then. Thanks.
|
| (Love those old Intel books too)
|
| Nevertheless, FWIW, MIPS started out as VAX MIPS, and at
| first people often used to write "VAX MIPS".

| segfaultbuserr wrote:
| > 'The' flop is a misnomer because it's almost always the
| multiply-and-accumulate instruction: X = A + B * C. Which... Is
| two operations per instruction (per shader/SIMD lane). Eeehhh
| whatever. Who cares about these details?
|
| If FMA is supported, it can be counted as either one or two
| operations, depending on the rules of the benchmark involved or
| the marketing of the processor. The marketing specification of
| a processor's theoretical peak performance sometimes counts a
| single-instruction FMA as two operations. On the other hand,
| for the purpose of code profiling, counting FMA as one
| operation is more realistic... As you said, who cares about
| these details?

| fluoridation wrote:
| Given that the point of the FLOPS unit is to compare
| processors, it does make more sense to count complex
| instructions as more than a single floating-point operation.
| If one CPU could multiply a 4x4 matrix by a vector in a
| single instruction that can run a million times per second,
| and another CPU needs ~32 instructions and so can only
| multiply 500k matrices per second but retires 16 million
| instructions in that same second, it would be silly to
| compare instructions instead of multiplications and
| additions.

| dragontamer wrote:
| As a computer engineer: the circuit design needed for a fast
| multiplication operation (e.g., a Wallace tree and similar) is
| an order of magnitude larger than the circuit design needed for
| fast addition (e.g., a Kogge-Stone carry-lookahead adder).
|
| This idea that additions and multiplications can be
| combined like this as "equivalent operations" is kinda
| bullshit. But hey, if it's "how it's done" (and it's done this
| way because multiply-then-add is how you do matrix
| multiplications...) then so be it.
|
| Just remember that this is an arbitrary subdivision of a
| matrix multiplication operation that may not have much
| relevance as a benchmark outside of matrix multiplications.

| fluoridation wrote:
| It was just an example, not necessarily a realistic one.
| The point is that we want to compare how quickly a
| processor will compute our problem, not how many
| instructions it's going to execute.
| If it were a car, you would want to compare things like its
| top speed and acceleration, not something inane like engine
| revolutions per kilometer. You measure and compare things that
| are relevant to the user, not implementation details.

| namirez wrote:
| Floating point operations per ...?

| GenericDev wrote:
| Floating Point Operations Per Second [1]
|
| [1] https://academickids.com/encyclopedia/index.php/FLOPS

| Zambyte wrote:
| > One should speak in the singular of a FLOPS and not of a
| FLOP, although the latter is frequently encountered. The
| final S stands for second and does not indicate a plural.
|
| The author of this post seems to have fallen for this error.

| dataflow wrote:
| I think their point was that the p stood for "per", not for
| the second letter of "operation".
___________________________________________________________________
(page generated 2023-09-05 23:00 UTC)
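
A minimal sketch of the arithmetic discussed in the thread above: the 8051's 12 MHz clock at 12 clocks per instruction giving 1 MIPS, and a theoretical peak FLOPS figure that changes by a factor of two depending on whether an FMA is counted as one or two operations. The function names, parameters, and the example CPU (8 cores, 3 GHz, 8 SIMD lanes, 2 FMA units per core) are hypothetical and chosen only for illustration; real published peak figures depend on each vendor's counting rules.

```python
def mips(clock_hz, clocks_per_instruction):
    """Millions of instructions per second for a fixed clocks-per-instruction design."""
    return clock_hz / clocks_per_instruction / 1e6


def peak_flops(cores, clock_hz, simd_lanes, fma_units, ops_per_fma=2):
    """Theoretical peak: every FMA unit issues one FMA per lane per cycle.

    ops_per_fma=2 is the "marketing" convention (multiply + add counted
    separately); ops_per_fma=1 counts each FMA instruction lane once.
    """
    return cores * clock_hz * simd_lanes * fma_units * ops_per_fma


# 8051 example from the thread: 12 MHz clock, 12 clocks per instruction.
mips_8051 = mips(12e6, 12)

# Hypothetical CPU (assumed numbers, not a real part).
as_two = peak_flops(cores=8, clock_hz=3.0e9, simd_lanes=8, fma_units=2, ops_per_fma=2)
as_one = peak_flops(cores=8, clock_hz=3.0e9, simd_lanes=8, fma_units=2, ops_per_fma=1)

print(f"8051: {mips_8051} MIPS")
print(f"FMA counted as two ops: {as_two / 1e9} GFLOPS")
print(f"FMA counted as one op:  {as_one / 1e9} GFLOPS")
```

The same hardware reports twice the peak under the two-operation convention, which is why the counting rule needs to be stated alongside any FLOPS figure.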