[HN Gopher] IBM doubles its 14nm EDRAM density, adds hundreds of...
       ___________________________________________________________________
        
       IBM doubles its 14nm EDRAM density, adds hundreds of megabytes of
       cache
        
       Author : insulanian
       Score  : 94 points
       Date   : 2020-03-08 19:39 UTC (3 hours ago)
        
 (HTM) web link (fuse.wikichip.org)
 (TXT) w3m dump (fuse.wikichip.org)
        
       | baybal2 wrote:
       | I wonder if it will ever see a chance to go mainstream.
       | 
       | The memory bottleneck is pretty much the only thing in CPU design
       | that didn't see a dramatic improvement over the years. Its
       | elimination is the only obvious improvement pathway still left
       | with expectation of double digit performance gains.
       | 
       | So we need either very big and very fast caches, or extremely
       | wide and low latency memory. Both options are quite costly.
       | 
       | Adding on-die DRAM that can work at least as fast as 500 MHz will
       | surely require some specialty process with a lot of compromises
       | like the one in the article.
       | 
       | Gluing something like HBM2 to the die for a second option moves
       | the cost from the specialty process to the specialty packaging.
       | Not much better.
        
         | deepnotderp wrote:
         | Commodity DRAM latency is mostly array line dominated, not due
         | to proximity/distance, see eg
         | https://ieeexplore.ieee.org/document/6522354
         | 
         | Also eDRAM is difficult to scale
        
         | hinkley wrote:
         | With all of the speculation security bugs in chip cache
         | management, I can't help but wonder if we won't eventually go
         | full NUMA and turn the cache memory (or at least L2+) into a
         | directly addressable space, either by the kernel or directly by
         | application code. At which point your working set is explicitly
         | on the processor, instead of implicitly.
         | 
         | I also wonder if chiplets will be the vehicle by which this
         | comes to pass.
        
         | jiggawatts wrote:
         | > Gluing something like HBM2 to the die for a second
         | 
         | This is something AMD is already doing for EPYC, and they've
         | already used HBM2 in their GPUs.
         | 
         | So I'm surprised they haven't released any CPU models with
         | crazy huge L4 caches using a few GB of HBM2.
         | 
         | Then again, Intel made a laptop CPU with a huge 128MB cache and
         | their comment was that it didn't make that big of a difference.
         | I believe the performance boost was less than 5% for going from
         | 64MB to 128MB.
        
           | toohotatopic wrote:
           | Hasn't Intel bought the company that was on the brink of
           | producing those CPU-memory combos? Unfortunately, I haven't
           | been able to find the name of the company or an article about
           | it.
        
             | deepnotderp wrote:
             | You mean UpMem or Venray Technology?
        
               | toohotatopic wrote:
               | Sorry but I don't remember the name at all.
        
           | hinkley wrote:
           | Maybe there's an inflection point where imperative management
           | of the cache is more effective than heuristic management.
        
           | vvanders wrote:
           | Read access patterns matter more than cache sizes; triple-digit
           | improvements are possible if you have linear reads.
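
A minimal sketch of the point about linear reads, assuming a toy LRU cache model with 64-byte lines and 32 KiB capacity (both sizes, and the `miss_rate` helper, are illustrative assumptions, not figures from the thread). It only counts misses; it does not model prefetchers or real latencies.

```python
from collections import OrderedDict

LINE_BYTES = 64     # assumed cache-line size
CACHE_LINES = 512   # assumed capacity: 512 lines = 32 KiB


def miss_rate(addresses):
    """Return the fraction of accesses that miss in a tiny LRU cache model."""
    cache = OrderedDict()  # keys are cache-line numbers, ordered by recency
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / total


N = 1 << 16   # number of reads
ELEM = 8      # assumed element size in bytes

# Linear read: consecutive 8-byte elements, so 8 reads share each line.
linear = [i * ELEM for i in range(N)]
# Strided read: every access lands on a different cache line.
strided = [i * LINE_BYTES for i in range(N)]

linear_rate = miss_rate(linear)
strided_rate = miss_rate(strided)
print(f"linear miss rate:  {linear_rate:.3f}")
print(f"strided miss rate: {strided_rate:.3f}")
```

In this model the cache size is the same for both runs; only the access pattern changes, and the linear read misses once per line instead of once per read.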
        
         | dfox wrote:
         | The reason mainstream DRAM interfaces are narrow and "slow" is
         | that you need row-at-a-time access patterns to really saturate
         | the interconnect, which does not happen for general-purpose
         | workloads. Causing such access patterns requires large caches,
         | which by themselves solve the issue. On top of that, physical
         | package pins and pad structures are among the most expensive
         | things in semiconductor design.
         | 
         | In the end the DRAM array is a bunch of analog magic, and the
         | interface works by copying the row you want into an SRAM buffer
         | on the chip, which you can then access however you want. The
         | slowest operations in all of that are the copies between the
         | SRAM row buffer and the actual DRAM array. (What I call the
         | SRAM buffer is usually called "column sense amplifiers", but at
         | a high level it is in fact a surprisingly wide array of 6T SRAM
         | flip-flops and some analog magic.)
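
The row-buffer behaviour described above can be sketched with a toy single-bank model. Everything numeric here is an assumption for illustration: the 8 KiB row size, the 64-byte burst, and the relative costs `T_ROW_HIT` and `T_ROW_MISS` are placeholders, not datasheet timings.

```python
ROW_BYTES = 8192   # assumed row (page) size of one bank
BURST = 64         # assumed bytes transferred per access
T_ROW_HIT = 1      # assumed relative cost: read from the already-open row
T_ROW_MISS = 3     # assumed relative cost: copy a new row into the buffer, then read


def access_cost(addresses):
    """Return (row_hits, row_misses, total_cost) for a one-bank, one-row-buffer model."""
    open_row = None
    hits = 0
    misses = 0
    for addr in addresses:
        row = addr // ROW_BYTES
        if row == open_row:
            hits += 1           # data is already in the row buffer
        else:
            misses += 1         # the array-to-buffer copy is the slow step
            open_row = row
    return hits, misses, hits * T_ROW_HIT + misses * T_ROW_MISS


N = 4096
# Row-at-a-time pattern: walk each row completely before moving on.
sequential = [i * BURST for i in range(N)]
# Worst case for this model: every access opens a different row.
row_hopping = [i * ROW_BYTES for i in range(N)]

seq_hits, seq_misses, seq_cost = access_cost(sequential)
hop_hits, hop_misses, hop_cost = access_cost(row_hopping)
print("sequential :", seq_hits, seq_misses, seq_cost)
print("row hopping:", hop_hits, hop_misses, hop_cost)
```

With these assumed costs the sequential walk pays the slow row copy only 32 times in 4096 accesses, while the row-hopping pattern pays it on every access.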
        
           | hinkley wrote:
           | So how many levels of cache do we have now between the ALU
           | and the memory cell of record??
        
         | petra wrote:
         | Zeno Semi talks about its 1T-SRAM, which increases density by
         | 5x. Maybe that will work.
        
         | thedance wrote:
         | Intel laptop parts had 128MB of eDRAM starting in 2013. Is that
         | mainstream enough for you?
        
       | RantyDave wrote:
       | Twelve point two billion transistors. That's absolutely nuts.
       | Does anyone have a ballpark figure for how much a 'drawer' of
       | four of these things costs? What's it supposed to run, is this an
       | Oracle/DB2 beast?
        
         | tibbetts wrote:
         | There is a version of DB2 for mainframe, but it's a totally
         | different codebase as I understand it. The operating system on
         | a mainframe provides a lot of what modern app developers get
         | from their database and caching systems, generally with better
         | fault tolerance and availability. So if you have an app built
         | for mainframe, it often will not have an external database
         | dependency. Doing transactions can just look like writing to
         | memory or files.
        
         | rodgerd wrote:
         | When I was running Z for Linux the costing was in the order of
         | six figures per processor.
        
         | WC3w6pXxgGd wrote:
         | The cost will decrease over time, just like all tech.
        
       | magicalhippo wrote:
       | Impressive tech. How big is the market for these machines these
       | days? Like how many Z15 CPs would they expect to sell (assuming
       | each Z15 install can vary a lot in size).
        
         | microtherion wrote:
         | I suppose there is a world market for maybe five of those...
        
         | pm90 wrote:
         | This is likely catering specifically to IBM's customers who
         | have been with them since the mainframe days and continue to
         | rely on IBM products (airlines, banking, etc.). The systems
         | used by these orgs are massive in complexity, and I'm not sure
         | how much they want to invest in refactoring them to run on
         | COTS hardware... it probably doesn't make sense for them
         | financially.
        
           | nabla9 wrote:
           | There is a market for reliability and scale in a compact size,
           | so they can get new customers.
           | 
           | Companies like Robinhood may discover that it's actually
           | cheaper to buy reliable hardware and write software for it
           | than to try to write software that is as reliable using COTS
           | hardware.
        
           | dfox wrote:
           | IBM started to market what is essentially z with only IFL CPs
           | as a kind of k8s in a box, so they are obviously trying to
           | expand into lower-tier markets.
        
       | [deleted]
        
       | smartstakestime wrote:
       | This is my type of tech. Not glamorous but highly functional.
        
       ___________________________________________________________________
       (page generated 2020-03-08 23:00 UTC)