[HN Gopher] Intel's port 7 AGU blunder (2019)
___________________________________________________________________
Intel's port 7 AGU blunder (2019)

Author : nkurz
Score  : 43 points
Date   : 2020-06-14 22:19 UTC (1 days ago)

(HTM) web link (blogs.fau.de)
(TXT) w3m dump (blogs.fau.de)

| nkurz wrote:
| Does anyone have a good answer to his "Why" at the end?
|
| _What's a total mystery to me is why Intel chose to build an AGU
| that cannot handle all kinds of addresses. In 2017, it was
| indicated to me that there "was not enough space on the die." I
| find this hard to believe, especially because the problem
| prevailed in (at least) three further generations of Intel CPUs
| after Haswell._
|
| Is die space really a plausible answer as to why Intel would
| bother to put in a second AGU, but cripple it so that it can only
| work on "simple" addresses that even their own compiler is not
| "smart" enough to generate?
|
| jjoonathan wrote:
| Simple + plausible explanation: approximately nobody uses fancy
| instructions with fancy addressing, so they butcher-chopped
| some Verilog (does Intel use Verilog?) in one generation to
| meet a target and the chopped code outlived its original
| purpose.
|
| Does OP really think legacy code dynamics don't apply to Intel?
|
| Wasn't there a bug not long ago where you could thermally kill
| a CPU by pulling clever tricks to get vector unit utilization
| higher than was supposed to be possible (presumably because
| nobody cared enough to precisely figure out "possible")?
|
| Traster wrote:
| Having had conversations with some hardware designers who work
| on similar, but not identical, problems, often it's not a matter
| of space on the die directly. There are a few things that happen
| that can lead to these results.
| Often what's happening is you have existing functionality on the
| die for other purposes, whether that be some accumulators or
| arithmetic functions, that you can minimally tweak to get a new
| function with practically no extra cost, maybe just some control
| logic. It would make a lot of sense to me in this case that the
| reason they have the reduced functionality is because they didn't
| design an AGU at all; they found a way to serve that purpose with
| almost entirely existing hardware. Obviously there are other
| reasons too though, such as not having the routing resources or
| deciding an integer multiplier was too expensive whereas what
| appears to just be an adder is actually quite cheap.
|
| gchadwick wrote:
| > Is die space really a plausible answer as to why Intel would
| > bother to put in a second AGU, but cripple it so that it can
| > only work on "simple" addresses that even their own compiler is
| > not "smart" enough to generate?
|
| Maybe. It's possible whoever was responsible for handling this
| part of the core was given a strict area budget and adding in
| the full set of addressing modes pushed them over. In
| retrospect, increasing that area budget to allow it could have
| been the smart move, but not all decisions in CPU
| microarchitecture are necessarily well informed! Perhaps in
| this case they did investigate and, with the benchmark set
| they were using, decided that supporting simple addressing modes
| only didn't have a large performance impact.
|
| Another possibility is they wanted to add the full set of
| addressing modes and did so, then found this caused a timing
| issue (as in, it produced a critical path that would reduce
| maximum frequency too much) or power issue, and the way around
| the timing/power issue involved a large amount of work they
| didn't have time to do, and/or the issue was discovered late in
| the day and the timing fix would have introduced some
| interesting new corner cases they felt increased verification
| risk too much.
| Given it's persisted over a few generations this is maybe less
| likely, but perhaps the timing issue was so severe they couldn't
| work around it over multiple microarchitectures (or it simply
| cannot be fixed). Intel may simply care less about the
| restriction than the author (i.e. no-one who spends lots of
| money with them has kicked up a fuss about it, nor do they
| think the performance restriction is causing them a large issue
| which is costing them sales).
|
| (I used to work at Arm designing A-class CPUs. Finding that your
| new microarchitecture, which could execute X things at once, had
| some annoying timing issue when doing so, often related to
| exception handling or something else rare, that required you to
| add odd restrictions to work around, and then trying to
| understand how much those restrictions impacted performance and
| whether you should go a different way to avoid them, was a
| common part of the job.)
|
| Edit: Oh, and another reason you may point to die space is that
| whilst the feature itself may not add much extra area, it can
| push other things apart, making for longer wires, meaning
| larger buffers to drive them to make timing (or they're simply
| too far apart to meet timing), so more area and power for those
| buffers.
|
| cesarb wrote:
| My uninformed guess:
|
| Following the link to SO, from there the link to RWT, and from
| there the link to the previous page in the review, it mentions
| "The new port 6 on Haswell is a scalar integer port. It only
| accesses the GPRs and the integer bypass network." which makes
| me think that the "not enough space on the die" might not refer
| to the space for the Port 7 Store AGU itself, but to the
| resources it uses. The address modes with the index register
| need to read _two_ registers, while allowing only
| "base+offset" means it only has to read _one_ register.
| I don't know how expensive an extra read connection to the
| integer bypass network would be, but I have read that read ports
| for the register file are expensive, to the point where having
| two identical copies of the register file (written in parallel)
| to reduce the number of read ports per copy can be considered a
| valid optimization. So the "not enough space" might also be for
| the register file (AFAIK register files use a lot of area,
| which is why RISC-V has a variant with half the number of
| registers for some embedded use cases) or its connections.
___________________________________________________________________
(page generated 2020-06-15 23:00 UTC)
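
Editor's sketch, not part of the thread: cesarb's point about register
reads can be made concrete with a toy model. Everything below is
illustrative and assumed: the names MemOperand, register_reads and
store_agu_ports are hypothetical, and the model only encodes two ideas,
namely that an indexed address needs two register reads where
"base+offset" needs one, and that Haswell's port 7 store AGU accepts
only non-indexed ("simple") addresses while ports 2 and 3 accept any
form. It says nothing about why Intel made that choice.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemOperand:
    """Toy x86 memory operand: [base + index*scale + disp] (hypothetical model)."""
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


def register_reads(op: MemOperand) -> int:
    """Number of general-purpose registers the AGU must read for this address."""
    return sum(reg is not None for reg in (op.base, op.index))


def store_agu_ports(op: MemOperand) -> Tuple[int, ...]:
    """Ports that could compute this store address in the toy model.

    Assumption: ports 2 and 3 take any addressing mode, port 7 only
    takes addresses without an index register.
    """
    if op.index is None:
        return (2, 3, 7)
    return (2, 3)


simple = MemOperand(base="rdi", disp=64)                 # [rdi + 64]
indexed = MemOperand(base="rdi", index="rcx", scale=8)   # [rdi + rcx*8]

for name, op in (("simple", simple), ("indexed", indexed)):
    print(name, "reads:", register_reads(op), "ports:", store_agu_ports(op))
```

In this model the indexed form loses port 7 and costs a second register
read, which is the resource cesarb guesses may have been too expensive
to provide.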