[HN Gopher] Intel's port 7 AGU blunder (2019)
       ___________________________________________________________________
        
       Intel's port 7 AGU blunder (2019)
        
       Author : nkurz
       Score  : 43 points
       Date   : 2020-06-14 22:19 UTC (1 days ago)
        
 (HTM) web link (blogs.fau.de)
 (TXT) w3m dump (blogs.fau.de)
        
       | nkurz wrote:
       | Does anyone have a good answer to his "Why" at the end?
       | 
       |  _What's a total mystery to me is why Intel chose to build an AGU
       | that cannot handle all kinds of addresses. In 2017, it was
       | indicated to me that there "was not enough space on the die." I
       | find this hard to believe, especially because the problem
       | prevailed in (at least) three further generations of Intel CPUs
       | after Haswell._
       | 
       | Is die space really a plausible answer as to why Intel would
       | bother to put in a second AGU, but cripple it so that it can only
       | work on "simple" addresses that even their own compiler is not
       | "smart" enough to generate?
        
         | jjoonathan wrote:
         | Simple + plausible explanation: approximately nobody uses fancy
         | instructions with fancy addressing, so they butcher-chopped
         | some verilog (does intel use verilog?) in one generation to
         | meet a target and the chopped code outlived its original
         | purpose.
         | 
         | Does OP really think legacy code dynamics don't apply to intel?
         | 
         | Wasn't there a bug not long ago where you could thermally kill
         | a CPU by pulling clever tricks to get vector unit utilization
         | higher than was supposed to be possible (presumably because
         | nobody cared enough to precisely figure out "possible")?
        
         | Traster wrote:
         | Having talked with hardware designers who work on similar, but
         | not identical, problems: often it's not a matter of space on
         | the die directly. A few things can lead to results like this.
         | Often you have existing functionality on the die for other
         | purposes, whether that be some accumulators or arithmetic
         | functions, that you can minimally tweak to get a new function
         | at practically no extra cost, maybe just some control logic.
         | It would make a lot of sense to me that the reduced
         | functionality in this case is because they didn't design an
         | AGU at all; they found a way to serve that purpose with almost
         | entirely existing hardware. Obviously there are other possible
         | reasons too, such as not having the routing resources, or
         | deciding that an integer multiplier was too expensive whereas
         | what appears to be just an adder is actually quite cheap.
        
         | gchadwick wrote:
         | > Is die space really a plausible answer as to why Intel would
         | bother to put in a second AGU, but cripple it so that it can
         | only work on "simple" addresses that even their own compiler is
         | not "smart" enough to generate?
         | 
         | Maybe. It's possible whoever was responsible for this part of
         | the core was given a strict area budget and adding the full
         | set of addressing modes pushed them over. In retrospect,
         | increasing that area budget could have been the smart move,
         | but not all decisions in CPU microarchitecture are
         | necessarily well informed! Perhaps in this case they did
         | investigate and, with the benchmark set they were using,
         | decided that supporting only simple addressing modes didn't
         | have a large performance impact.
         | 
         | Another possibility is that they wanted the full set of
         | addressing modes and added it, then found it caused a timing
         | issue (as in it produced a critical path that would reduce
         | maximum frequency too much) or a power issue. The way around
         | it may have involved a large amount of work they didn't have
         | time to do, or the issue was discovered late in the day and
         | the timing fix would have introduced some interesting new
         | corner cases they felt increased verification risk too much.
         | Given it's persisted over a few generations this is maybe
         | less likely, but perhaps the timing issue was so severe they
         | couldn't work around it over multiple microarchitectures (or
         | it simply cannot be fixed). Intel may also simply care less
         | about the restriction than the author does (i.e. no one who
         | spends lots of money with them has kicked up a fuss about it,
         | nor do they think the performance restriction is a large
         | enough issue to cost them sales).
         | 
         | (I used to work at arm designing A-class CPUs. A common part
         | of the job: finding that your new microarchitecture, which
         | could execute X things at once, had some annoying timing
         | issue when doing so, often related to exception handling or
         | something else rare; adding odd restrictions to work around
         | it; and then trying to understand how much those restrictions
         | impacted performance and whether you should go a different
         | way to avoid them.)
         | 
         | Edit: Oh, and another reason you may point to die space:
         | whilst the feature itself may not add much extra area, it can
         | push other things apart, making for longer wires, meaning
         | larger buffers to drive them to make timing (or they're simply
         | too far apart to meet timing), so more area and power for
         | those buffers.
        
         | cesarb wrote:
         | My uninformed guess:
         | 
         | Following the link to SO, from there the link to RWT, and from
         | there the link to the previous page in the review, it mentions
         | "The new port 6 on Haswell is a scalar integer port. It only
         | accesses the GPRs and the integer bypass network." which makes
         | me think that the "not enough space on the die" might not refer
         | to the space for the Port 7 Store AGU itself, but to the
         | resources it uses. The address modes with the index register
         | need to read _two_ registers, while allowing only
         | "base+offset" means it only has to read _one_ register. I
         | don't know how expensive an extra read connection to the integer
         | bypass network would be, but I have read that read ports for
         | the register file are expensive, to the point where having two
         | identical copies of the register file (written in parallel) to
         | reduce the number of read ports per copy can be considered a
         | valid optimization. So the "not enough space" might also be for
         | the register file (AFAIK register files use a lot of area,
         | which is why RISC-V has a variant with half the number of
         | registers for some embedded use cases) or its connections.
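         | 
         | A toy sketch of the register-read argument above, in Python.
         | The `Address` class and the helper names are hypothetical, and
         | this only counts operand reads per addressing mode; it says
         | nothing about how Intel's hardware is actually built.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Address:
    """x86-style memory operand: base + index*scale + disp (hypothetical model)."""
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


def register_reads(addr: Address) -> int:
    """Number of registers that must be read to form this address."""
    return sum(reg is not None for reg in (addr.base, addr.index))


def is_simple(addr: Address) -> bool:
    """'Simple' in the thread's sense: no index register is involved."""
    return addr.index is None


def effective_address(addr: Address, regs: Dict[str, int]) -> int:
    """Compute base + index*scale + disp from a register snapshot."""
    base = regs[addr.base] if addr.base is not None else 0
    index = regs[addr.index] if addr.index is not None else 0
    return base + index * addr.scale + addr.disp


simple = Address(base="rdi", disp=8)                  # [rdi + 8]
indexed = Address(base="rdi", index="rcx", scale=8)   # [rdi + rcx*8]
```

         | In this model `register_reads(simple)` is 1 and
         | `register_reads(indexed)` is 2, which is the extra read
         | connection the "not enough space" remark might be about.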
        
       ___________________________________________________________________
       (page generated 2020-06-15 23:00 UTC)