[HN Gopher] SiFive Tapes Out First 5nm TSMC 32-bit RISC-V Chip w...
       ___________________________________________________________________
        
       SiFive Tapes Out First 5nm TSMC 32-bit RISC-V Chip with 7.2 Gbps
       HBM3
        
       Author : pabs3
       Score  : 144 points
       Date   : 2021-04-15 06:29 UTC (1 days ago)
        
 (HTM) web link (www.tomshardware.com)
 (TXT) w3m dump (www.tomshardware.com)
        
       | zozbot234 wrote:
       | IIRC, 32-bit RISC-V is only intended for deep embedded workloads,
       | with 64-bit for general purpose compute. So a SoC w/ a single
       | 32-bit core would seem to be a less-than-ideal fit for the
       | cutting-edge 5nm process.
        
         | tyingq wrote:
         | The core is supposed to compete with the Cortex M7. The
         | smallest process M7 I can find is the STM32H7, which is 40nm.
        
           | makapuf wrote:
           | I long for stm32 on high-end processes (10nm or less),
           | whether that makes sense or not. I just love stm32...
        
         | dragontamer wrote:
         | Routers / Switches have extremely weird performance
         | characteristics, and I think that's what SiFive is targeting
         | with this chip.
         | 
         | * HBM3 for the highest memory bandwidth (10Gbps switches need
         | tons and tons of bandwidth. That's 10Gbps per direction per
         | connection, 8x ports is 160Gbps, and then that's multiplied
         | multiple times over by every memcpy / operation your chip
         | actually does. You need to DELIVER 160Gbps, which means your
         | physical RAM-bandwidth needs to be an order of magnitude
         | greater than that)
         | 
         | * Embedded 32-bit design for low-power usage.
         | 
         | * All switches have small, fixed size buffers. Memory capacity
         | is not a problem; it's feasible to imagine useful switches and
         | routers (even 10Gbps, 40Gbps, or 100Gbps) that only have
         | hundreds-of-MBs of RAM. As such, 32-bit is sufficient and
         | 64-bit is a waste (you'd rather halve your pointer memory
         | requirements with 32-bit pointers than go beyond 4GB
         | capacity)
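A back-of-envelope version of the bandwidth and pointer arithmetic above, as a minimal Python sketch. The 8 ports and 10 Gb/s per direction come from the comment; the "memory touches per byte" factor is a hypothetical stand-in for the memcpy/processing passes it mentions, and the one-million-entry descriptor table is made up for illustration.

```python
# Back-of-envelope switch arithmetic (illustrative assumptions only).


def aggregate_line_rate_gbps(ports: int, gbps_per_direction: float) -> float:
    """Total traffic the box must deliver: every port, both directions."""
    return ports * gbps_per_direction * 2


def required_memory_bandwidth_gbps(line_rate_gbps: float, touches_per_byte: int) -> float:
    """Each packet byte may cross the memory bus several times (write, read, copies)."""
    return line_rate_gbps * touches_per_byte


def pointer_table_bytes(entries: int, pointer_bits: int) -> int:
    """Memory needed for a table of pointers, e.g. hypothetical buffer descriptors."""
    return entries * pointer_bits // 8


if __name__ == "__main__":
    line_rate = aggregate_line_rate_gbps(ports=8, gbps_per_direction=10)
    print(line_rate)  # 160
    # Hypothetical 10 memory touches per byte -> 1600 Gb/s (~1.6 Tb/s) of DRAM bandwidth.
    print(required_memory_bandwidth_gbps(line_rate, touches_per_byte=10))
    # One million entries: 32-bit pointers need half the space of 64-bit ones.
    print(pointer_table_bytes(1_000_000, 32), pointer_table_bytes(1_000_000, 64))
```

The multiplier is the whole argument: whatever the real number of passes over each byte is, the DRAM has to sustain the line rate times that factor.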
        
           | GoblinSlayer wrote:
           | It's an E76 with the F extension (single-precision floating
           | point), and F is huge compared to RV64I. And the article
           | proposes HPC as a possible application.
        
           | rjsw wrote:
           | Routers need quite a bit of memory to handle IPv6.
           | 
           | Switches as an application of this makes sense.
        
           | jandrese wrote:
           | IPv6 address comparison on a 32 bit design is fairly awkward.
           | Switches won't care, but routers need to make routing
           | decisions.
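A rough illustration of why 128-bit IPv6 prefix checks are clumsier on a 32-bit core: the address has to be compared one machine word at a time, so a full /128 match takes four 32-bit compares versus two 64-bit ones. This is a Python sketch; Python integers are arbitrary-precision, so the word width is only simulated, and this is not how production routers implement longest-prefix lookup.

```python
import ipaddress


def to_words(addr: str, word_bits: int) -> list[int]:
    """Split a 128-bit IPv6 address into word_bits-wide words, most significant first."""
    value = int(ipaddress.IPv6Address(addr))
    count = 128 // word_bits
    mask = (1 << word_bits) - 1
    return [(value >> (word_bits * (count - 1 - i))) & mask for i in range(count)]


def prefix_match(addr: str, prefix: str, prefix_len: int, word_bits: int) -> tuple[bool, int]:
    """Check whether `addr` falls under `prefix`/`prefix_len`, one word at a time.

    Returns (matched, number_of_word_compares).
    """
    remaining = prefix_len
    compares = 0
    for a_word, p_word in zip(to_words(addr, word_bits), to_words(prefix, word_bits)):
        if remaining <= 0:
            break
        bits = min(word_bits, remaining)
        # Keep only the top `bits` bits of this word; the rest lie outside the prefix.
        mask = ((1 << bits) - 1) << (word_bits - bits)
        compares += 1
        if (a_word & mask) != (p_word & mask):
            return False, compares
        remaining -= bits
    return True, compares


print(prefix_match("2001:db8::1", "2001:db8::1", 128, 32))  # (True, 4)
print(prefix_match("2001:db8::1", "2001:db8::1", 128, 64))  # (True, 2)
```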
        
           | foobiekr wrote:
           | While these are all good points, this really does not appear
           | to be a competitive NPU design on any axis that matters. I
           | don't know what this chip is for, but a router NPU it is not,
           | nor a switch. Maybe some SOHO switch or smart NIC, but those
           | have moved far along the performance spectrum, away from
           | where this would fit.
        
         | zibzab wrote:
         | Yeah, this seems like an odd move to me.
         | 
         | For these kinds of applications, the static leakage of the newer
         | & smaller node will probably hurt rather than help.
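The usual first-order CMOS power model makes the leakage argument concrete. With activity factor α, switched capacitance C, supply voltage V_dd, clock frequency f, leakage current I_leak, and D the fraction of time the core is actually active (all symbols introduced here, not taken from the thread), average power is roughly:

```latex
P_{\text{avg}} \approx D\,\alpha C V_{dd}^{2} f \;+\; V_{dd}\, I_{\text{leak}}
```

A mostly idle deep-embedded part has D close to 0, so the leakage term dominates unless the chip power-gates its idle blocks. If a newer node's I_leak is higher, it can lose on battery life even while winning on the dynamic term.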
        
         | justincormack wrote:
         | I think it means 32-bit floating point, not a 32-bit CPU, as it
         | mentions "other relatively simplistic applications that do not
         | require full precision", but it's a bit unclear.
        
           | phendrenad2 wrote:
           | The quote that stands out to me is that the core is "ideal
           | for applications which require high performance -- but have
           | power constraints (e.g., Augmented Reality and Virtual
           | Reality, IoT Edge Compute, Biometric Signal Processing, and
           | Industrial Automation)."
        
         | Fordec wrote:
         | With my industry / product management / business strategy hat
         | on, totally agree from SiFive's perspective.
         | 
         | With my early-days electronics hat on, the energy-efficiency
         | gains of the 5nm process, combined with RISC-V in an embedded
         | environment (especially a battery-powered remote operation use
         | case), have me salivating at what could be achieved from a
         | would-be customer's perspective.
        
       | volta83 wrote:
       | HBM2 is like 2Tb/s; how is HBM3 only 7.2Gb/s?
        
         | hajile wrote:
         | HBM3 wasn't just supposed to be about speed. It also offers a
         | 512-bit option that doesn't require a silicon interposer. I'd
         | guess this was added to make cheaper consumer GPU designs
         | possible.
         | 
         | I suspect they're using the HBM3 spec for the narrow bus and
         | cheaper interposer while keeping speeds lower and only using a
         | couple stacks instead of the 16 or so HBM2 stacks required for
         | those 2Tb/s speeds you mention. It makes sense given that their
         | chip likely couldn't use a huge amount of bandwidth anyway.
        
         | virtuallynathan wrote:
         | I think that's per-Pin bandwidth?
        
         | vmception wrote:
         | HBM3 was expected to be like 4Gb/s per pin, which was seen as
         | double HBM2 per pin, so this is almost double that again,
         | which is good news
         | 
         | The HBM2 total memory bandwidth is like 2TB/s, just different
         | scale
         | 
         | Anyway, I could totally be using the wrong nomenclature and
         | terminology, feel free to discuss; these aren't assertions, or
         | at least not strongly held ones
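The per-pin vs. total confusion is just a multiplication by bus width. A minimal sketch, assuming a 1024-bit interface per stack (the HBM2 width; the article doesn't state the width of this HBM3 PHY), using the per-pin rates from the thread:

```python
def stack_bandwidth(gbps_per_pin: float, bus_width_bits: int = 1024) -> tuple[float, float]:
    """Return (Gb/s, GB/s) for one stack, assuming `bus_width_bits` data pins."""
    gbits = gbps_per_pin * bus_width_bits
    return gbits, gbits / 8  # 8 bits per byte


for label, rate in [("HBM2 @ 2.0", 2.0), ("HBM3 tape-out @ 7.2", 7.2)]:
    gbits, gbytes = stack_bandwidth(rate)
    print(f"{label} Gb/s/pin: {gbits / 1000:.2f} Tb/s = {gbytes:.1f} GB/s per stack")
```

At 2.0 Gb/s per pin (a common HBM2 speed grade), one stack works out to about 2 Tb/s (256 GB/s), which matches the "2Tb/s" figure upthread; 7.2 Gb/s per pin over the same assumed width would be roughly 7.4 Tb/s (921.6 GB/s) per stack. Passing `bus_width_bits=512` models the narrower option hajile mentions.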
        
       | throwaway4good wrote:
       | What is the use case of this chip? I have the feeling it is some
       | way away from a general purpose CPU / SOC like the Apple M1?
        
         | 01100011 wrote:
         | RTFA?
         | 
         | > The SoC can be used for AI and HPC applications and can be
         | further customized by SiFive customers to meet their needs.
         | Meanwhile, elements from this SoC can be licensed and used for
         | other N5 designs without any significant effort.
         | 
         | > The SoC contains the SiFive E76 32-bit CPU core(s) for AI,
         | microcontrollers, edge-computing, and other relatively
         | simplistic applications that do not require full precision.
        
           | throwaway4good wrote:
           | So it is a proof of concept / demo of subcomponents someone
           | else may license? Is that a correct interpretation?
        
             | sanxiyn wrote:
             | Yes.
        
       | klelatti wrote:
       | How did SiFive get anywhere near 5nm TSMC?
        
         | baq wrote:
         | perhaps paid some money when the process wasn't booked till the
         | end of time
        
         | lizknope wrote:
         | They pay money just like any other customer of TSMC. SiFive has
         | a lot of buzz in the industry. I wouldn't be surprised that
         | TSMC wanted to work with them.
         | 
         | But there are other intermediary companies that help startups
         | group multiple chips from multiple companies together into a
         | single mask. This is called a "shuttle" and allows the
         | companies to split the costs of the masks (I've heard up to $30
         | million for 5nm)
         | 
         | SiFive is probably building about 2,000 of these chips for
         | development boards. They aren't trying to order a hundred
         | million like Nvidia.
        
           | klelatti wrote:
           | Thanks that's very interesting. No intention in any way to
           | belittle SiFive - just puzzled as to how they managed to get
           | onto this process when it's obviously so much in demand. Good
           | for them!
        
         | RicoElectrico wrote:
         | For test chips there is something called a shuttle.
         | 
         | Other than that, foundries are known to sponsor IP development
         | on their processes.
        
         | snypher wrote:
         | "The tape out means that the documentation for the chip has
         | been submitted for manufacturing to TSMC, which essentially
         | means that the SoC has been successfully simulated. The silicon
         | is expected to be obtained in Q2 2021."
         | 
         | Would this mean the actual chip delivery may still be delayed?
        
           | StringyBob wrote:
           | Chip manufacturing has many steps. For a new leading edge
           | process it may take 3-6 months to get silicon back after
           | submitting the design to a silicon foundry for manufacturing.
           | 
           | For a small volume 'shuttle' run hopefully there won't be
           | delays, but this is not the same as having working chips!
           | 
           | The foundry will do initial checks it is manufacturable at
           | 'tapeout' when you submit your design, but you don't know for
           | sure if your chip works with intended functionality until you
           | get it back! You are relying on lots and lots of simulations
           | up front before your 'tape-out'.
           | 
           | Sometimes issues are found and a chip requires a re-spin -
           | basically another go with the bugs fixed. You want to do this
           | as few times as possible (ideally right first time) due to
           | cost and time of these iterations.
        
         | gumby wrote:
         | It's also in TSMC's marketing interest to produce a small
         | number of RISC V parts with their latest process.
         | 
         | Plus it's probably fun for some of the people there.
        
       | ohazi wrote:
       | I know they're separate lines and capacity is sold well in
       | advance and all that, but this chip shortage still baffles me.
       | 
       | A startup can tape out a 5 nm chip, but STMicroelectronics can't
       | make any of their 40-130 nm microcontrollers for the next year?
       | 
       | Also car companies are supposedly the culprit, even though their
       | volume is only in the low tens of millions per year, and the
       | dustup is apparently over only six months of capacity? What? I
       | get that the auto industry is a nice reliable long-term source of
       | revenue for chip companies, but fabs should barely be sneezing at
       | that sort of volume.
        
         | lizknope wrote:
         | I'm in the semiconductor industry.
         | 
         | I don't really understand your question.
         | 
         | Anyone can start a company and tape out a chip even in 5nm. My
         | previous startup did something similar. We used an intermediate
         | company between us and TSMC that specifically works with
         | smaller companies. They (or TSMC) will bundle together 4 to 20
         | chips into a common mask as a "shuttle" run. Shuttle runs are
         | really only used to get samples for the first version of your
         | chip. You can't really go to production with them because the
         | mask has chips from multiple different companies but this
         | allows all of the companies to share the mask costs (I've heard
         | up to $30 million for 5nm)
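How the shuttle cost sharing works out, as a toy Python model. It assumes the mask cost is split in proportion to die area, which is one plausible pricing scheme rather than a known TSMC policy; the $30M is the hearsay figure from the comment, and the participant names and die areas are hypothetical.

```python
def shuttle_shares(mask_cost: float, die_areas_mm2: dict[str, float]) -> dict[str, float]:
    """Split a shared mask set's cost among participants in proportion to die area."""
    total_area = sum(die_areas_mm2.values())
    return {name: mask_cost * area / total_area for name, area in die_areas_mm2.items()}


# Ten hypothetical participants with equal-sized test chips sharing a $30M mask set.
participants = {f"company_{i}": 25.0 for i in range(10)}
shares = shuttle_shares(30e6, participants)
print(shares["company_0"])  # 3000000.0
```

With the comment's range of 4 to 20 equal-sized participants, each would pay between $7.5M and $1.5M under this assumption, instead of the full mask cost.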
         | 
         | What is ST Micro talking about? I assume they can produce chips
         | but can't get the volume that they want. SiFive are probably
         | producing about 2,000 of these chips for development and test
         | boards. ST Micro would be buying in the hundreds of millions or
         | tens of billions range.
        
           | bogomipz wrote:
           | >" Shuttle runs are really only used to get samples for the
           | first version of your chip."
           | 
           | Is a "tape out" the same thing as a shuttle run/sample chip
           | run?
        
             | Kliment wrote:
             | a "tape out" is the process of transforming a design into a
             | physical die - i.e. a manufacturing run. It's when you hand
             | over a design to a foundry to do their thing with it.
        
           | zibzab wrote:
           | Sounds like OSH Park for silicon...
           | 
           | Anyway, I'm still not sure why SiFive is doing this. Seems
           | like a waste of money even as a prototype
        
             | lizknope wrote:
             | The article mentions that it is from the OpenFive division
             | of SiFive. OpenFive used to be Open Silicon and their
             | business model was working with other companies to take
             | their Verilog RTL and do all of the physical design
             | (synthesis to logic gates, place and route of the standard
             | cells, timing analysis, test vector generation) and then
             | work with the foundries to deliver all of the data for
             | manufacturing.
             | 
             | Since Open Silicon is now OpenFive and part of SiFive they
             | literally have all this experience in house and don't need
             | to depend on another company between them and TSMC.
             | 
             | https://en.wikipedia.org/wiki/Open-Silicon
        
             | variaga wrote:
             | SiFive is in the business of selling IP cores and back-end
             | implementation services. The gold standard for IP core
             | validation is "silicon proven" i.e. that it's not just a
             | nice theoretical design on paper, but someone has actually
             | turned it into a physical chip and tested the real life
             | performance.
             | 
             |  _Lots_ of people will try to sell you their designs and
             | services. Picking the wrong ones can waste millions of
             | dollars and months/years of time.
             | 
             | The money spent on this as a prototype buys SiFive credibility
             | for both aspects of their business (assuming the chip
             | works) - "we were able to do this for ourselves, so you
             | know we'll be able to do it for you".
             | 
             | So it's not a waste, it's a marketing expense, and a
             | necessary one.
        
           | varispeed wrote:
           | Out of curiosity - what software is being used to design
           | chips? Is there anything within reach of a small company, or
           | something open source?
        
             | thechao wrote:
             | Front-end is HDLs -- (System)Verilog, VHDL, etc.
             | Implementation and formal will be Jasper & its ilk. Backend
             | (physical, etc.) use fab-specific bespoke software from the
             | majors (Cadence, NXP, MG, Synopsys, ...).
             | 
             | The front-end stuff could be done by _one person_ ;
             | Verilator is a great example (although it's now "in house"
             | to NXP). Implementation, LEC, etc. are mathematically
             | intimidating -- they're proof engines -- but doable by a
             | small team.
             | 
             | Physical _requires_ inside knowledge of the fabs. The fabs
             | aren't going to let you participate unless you're a major,
             | because it costs them a lot of money, and each additional
             | participant is another potential leak of their critical IP.
             | 
             | The tooling is all "vertical" and starts on the backend. If
             | you can't do backend, you're not a player.
        
             | jecel wrote:
             | The commercial tools are indeed very expensive but the
             | required data files can be as much of a problem. Normally
             | you have to sign a bunch of NDAs (non disclosure
             | agreements) to get your hands on the design rules and
             | standard cell libraries supplied by the foundries and
             | required to make the tools work.
             | 
             | One effort to organize several previously available open
             | source tools into a practical system is OpenLane, which is
             | based on the DARPA OpenRoad project:
             | 
             | https://woset-workshop.github.io/PDFs/2020/a21.pdf
             | 
             | Recently, Google has financed a project where a foundry has
             | made its data files available without any NDAs:
             | 
             | https://github.com/google/skywater-pdk
             | 
             | The combination has made it possible to have completely
             | open source chip designs.
        
         | PragmaticPulp wrote:
         | > Also car companies are supposedly the culprit, even though
         | their volume is only in the low tens of millions per year, and
         | the dustup is apparently over only six months of capacity?
         | What? I get that the auto industry is a nice reliable long-term
         | source of revenue for chip companies, but fabs should barely be
         | sneezing at that sort of volume.
         | 
         | I agree. I think the blame on automakers has been blown out of
         | proportion. It doesn't make any sense that automakers cancelled
         | orders, then reinstated those orders again with some extra
         | demand, and now the entire chip market is stalled.
         | 
         | It's most likely due to the fact that consumer demand is up
         | everywhere. The pandemic didn't hit the economy nearly as hard
         | as expected, and we piled a lot of stimulus on top of that.
         | Savings rate went up a bit, but much discretionary spending was
         | diverted away from things like dining out and toward buying
         | consumer goods.
         | 
         | > STMicroelectronics can't make any of their 40-130 nm
         | microcontrollers for the next year
         | 
         | They're almost certainly making huge volumes of
         | microcontrollers, but they're all spoken for with orders from
         | the highest bidders.
         | 
         | We won't have inventory sitting on shelves again until fab
         | capacity isn't being 100% occupied by existing orders. Need
         | some surplus before we can get parts at DigiKey.
        
         | bravo22 wrote:
         | A lot of chips are made on mature fab lines because they don't
         | need the performance of 5nm lines or can't justify the mask
         | costs.
         | 
         | No one is investing in mature fab lines because they're not
         | leading edge and they're being run to amortize the initial
         | investment made into them years ago. Therefore not much
         | additional capacity for mature lines.
         | 
         | So yes you can see 5nm chips being taped out but the 40-130nm
         | chips are squeezed for capacity. Also this chip is likely not
         | running in the same crazy volumes as ST microcontrollers. It
         | is easier for TSMC to squeeze in a few dozen to a hundred
         | wafers for SiFive on their line.
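The chip counts in this sub-thread can be sanity-checked with the textbook dies-per-wafer approximation, DPW ≈ π(d/2)²/A − πd/√(2A), for wafer diameter d and die area A. The die area and yield below are made-up placeholders, since the article gives neither; the ~2,000 chips figure is lizknope's guess.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross dies per wafer: wafer area over die area, minus partial dies lost at the edge."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


def wafers_needed(chips: int, wafer_diameter_mm: float, die_area_mm2: float,
                  yield_fraction: float) -> int:
    """Whole wafers required to get `chips` working dies at the given yield."""
    good_per_wafer = math.floor(dies_per_wafer(wafer_diameter_mm, die_area_mm2) * yield_fraction)
    return math.ceil(chips / good_per_wafer)


# Hypothetical: a 20 mm^2 test chip on a 300 mm wafer at 70% yield.
print(dies_per_wafer(300, 20))            # 3385
print(wafers_needed(2000, 300, 20, 0.7))  # 1
```

For a small hypothetical die like this, 2,000 chips fit on a single wafer, which is in line with variaga's point that this run probably isn't even one full wafer.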
        
           | dragontamer wrote:
           | > A lot of chips are made on mature fab lines because they
           | don't need the performance of 5nm lines or can't justify the
           | mask costs.
           | 
           | Alternatively: they're car-scale products dealing primarily
           | with high electric currents (10s or 100s of milliamps) and/or
           | higher voltages (5V instead of 1.3V).
           | 
           | Smaller chips use (and therefore output) less current than
           | larger scale chips. But if your goal is to output 10mA to
           | better drive an IGBT or other transistor anyway, then you
           | really prefer 40nm to 130nm ANYWAY, because those larger
           | sizes are just a lot better at moving those large currents
           | around.
           | 
           | Bigger wires mean bigger currents.
        
             | bravo22 wrote:
             | High voltage MOSFETs and IGBTs are built on a completely
             | different process. Size is definitely not an issue with
             | them. It is about exotic doping to create the desired
             | characteristics.
             | 
             | They're built using much larger feature sizes but on
             | completely separate lines.
        
               | dragontamer wrote:
               | I'm not really in the industry. But I know that high-
               | voltage MOSFETs / IGBTs need substantial amounts of
               | current to turn on / off adequately. Under typical use,
               | there's a dedicated chip called a "Gate Driver" that
               | provides that current, between a microcontroller and the
               | IGBT.
               | 
               | It's not that the IGBT / MOSFETs are built on these
               | microcontrollers. It's that the Gate-Driver can be
               | integrated into a microcontroller (simplifying the
               | circuit design and reducing the number of parts you need
               | to buy).
               | 
               | Under normal circumstances, a microcontroller can
               | probably source/sink 1mA (too little to adequately turn
               | on an IGBT). You amplify the 1mA with a gate-driver chip
               | into 100mA, and then the amplified 100mA is used to turn
               | on/off the IGBT.
               | 
               | By integrating a gate-driver into the microcontroller,
               | you save a part.
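The 1 mA vs. 100 mA point falls out of treating the IGBT gate as a charge that must be moved within the switching time, I ≈ Q_g / t. A first-order Python sketch; the 100 nC gate charge and 1 µs switching target are hypothetical round numbers, and real gate drive design also has to deal with the Miller plateau and gate resistance.

```python
def gate_drive_current_a(gate_charge_c: float, switch_time_s: float) -> float:
    """Average current (A) needed to deliver the gate charge within the switching time."""
    return gate_charge_c / switch_time_s


def switch_time_s(gate_charge_c: float, drive_current_a: float) -> float:
    """Time (s) to deliver the gate charge at a given average drive current."""
    return gate_charge_c / drive_current_a


GATE_CHARGE = 100e-9  # hypothetical 100 nC IGBT gate charge

print(gate_drive_current_a(GATE_CHARGE, 1e-6))  # ~0.1 A, i.e. ~100 mA for a 1 us edge
print(switch_time_s(GATE_CHARGE, 1e-3))         # ~1e-4 s: a 1 mA pin needs ~100 us per edge
```

Slow edges keep the IGBT in its lossy transition region for longer, which is why the extra drive current is worth a dedicated (or integrated) gate driver.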
        
           | variaga wrote:
           | Your point is valid, but this is almost certainly a shuttle
           | run, so it won't be even one full wafer.
        
             | bravo22 wrote:
             | You're right. Definitely a "hot" wafer for the engineering
             | samples.
        
         | monocasa wrote:
         | ST fabs their own chips. If their fabs don't have the capacity,
         | it's a huge slog to tape them out to a radically different
         | process at another company.
        
         | Kliment wrote:
         | This is an extremely low volume prototype run. You can get
         | those scheduled on short notice. Fabs love them because they
         | can do process optimization using them, without impacting
         | production customers. They're ridiculously expensive per-die
         | and you commit to accept a much higher failure rate than
         | normal.
         | 
         | ST can and is making microcontrollers. It's just that they've
         | sold their production for a year ahead, before it's even been
         | manufactured. Car companies fucked everyone over by flipping a
         | large volume of orders back and forth causing bullwhip effect
         | on the whole industry, and lots of knock-on effects in other
         | industries who suddenly got told (occasionally too late) that
         | they need to plan their inventory a year ahead because they
         | can't get anything at short notice anymore. Car companies'
         | vehicle production volume is tens of millions, but each vehicle
         | has thousands to tens of thousands of ICs. The six months you
         | are mentioning are not the capacity period, they are the _lead
         | times_ involved.
         | 
         | I don't want to repeat the whole story but I wrote a comment
         | about this on another thread. See
         | https://news.ycombinator.com/item?id=26659709
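The bullwhip effect Kliment mentions can be shown with a deliberately naive toy model: each tier treats the orders from the tier below as its demand and over-corrects for changes, because it is trying to rebuild or shed a pipeline of stock proportional to lead time. This is a Python sketch of a textbook-style ordering rule, not a model of how any actual carmaker, supplier, or fab plans.

```python
def next_tier_orders(demand: list[int], lead_time: int) -> list[int]:
    """Orders one tier places upstream, given the demand it sees from downstream."""
    orders = []
    prev = demand[0]
    for d in demand:
        # Replace what was consumed, plus adjust the pipeline for the change in demand.
        # Orders can't go negative, so cancellations bottom out at zero.
        orders.append(max(0, d + lead_time * (d - prev)))
        prev = d
    return orders


def swings(demand: list[int], tiers: int, lead_time: int = 1) -> list[int]:
    """Peak-to-trough swing of orders at the customer and at each upstream tier."""
    result = [max(demand) - min(demand)]
    for _ in range(tiers):
        demand = next_tier_orders(demand, lead_time)
        result.append(max(demand) - min(demand))
    return result


# A hypothetical carmaker cuts orders by 20% for a few periods, then reinstates them.
car_demand = [100] * 5 + [80] * 5 + [100] * 5
print(swings(car_demand, tiers=3))  # [20, 60, 140, 240]
```

A 20-unit dip and recovery at the customer becomes a 240-unit swing three tiers upstream in this toy. Longer lead times make it worse, since the correction term is multiplied by `lead_time`.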
        
           | jankeymeulen wrote:
           | Thousands to tens of thousands per car? I think you're off by
           | an order of magnitude.
        
             | rowanG077 wrote:
             | What? You think it's ten thousands to hundred thousand.
             | Hundred thousand seems excessive to me.
        
             | buildbot wrote:
             | I know a typical Mercedes has roughly a hundred individual
             | computers, so it's not too far-fetched to think the average
             | chip count could be 10 or higher per device on the CAN bus.
        
             | mschuster91 wrote:
             | Almost everything in a car has a _number_ of chips. Power
             | regulation, communication buses... and in electric cars
             | with thousands of batteries, _at least_ one chip per
             | battery for protection.
        
               | osamagirl69 wrote:
               | This is blatantly false, unless you are confusing battery
               | for an assembled battery pack. In EVs each battery
               | management IC can run somewhere in the range of 4-14
               | cells in series per chip, and they almost universally run
               | banks of up to 100 cells in parallel. For example, in the
               | Tesla Model S the pack is made up of submodules of 76
               | cells in parallel and 6 of those groups in series per
               | management chip--so only one management chip per 456
               | cells.
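osamagirl69's arithmetic as a short Python sketch: cells per management IC is the parallel count times the number of series groups the IC monitors. The 76 × 6 figures are from the comment; the 7,000-cell pack size is a hypothetical round number, not a quoted spec.

```python
import math


def bms_ics_needed(total_cells: int, cells_in_parallel: int, series_groups_per_ic: int) -> int:
    """Monitoring ICs for a pack where each IC watches `series_groups_per_ic` parallel groups."""
    cells_per_ic = cells_in_parallel * series_groups_per_ic
    return math.ceil(total_cells / cells_per_ic)


print(76 * 6)                       # 456 cells per IC, per the comment
print(bms_ics_needed(7000, 76, 6))  # 16
```

So a pack of several thousand cells needs on the order of a dozen or two monitoring ICs, not one per cell.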
        
               | dragontamer wrote:
               | Electric cars have ONE battery with thousands of *cells*.
               | I do realize that the colloquial term for "cell" is
               | "battery" (ex: an AA cell is called a battery), but it
               | becomes important to be precise with our words when
               | talking about manufacturing.
               | 
               | Small scale Li-ion does a protection-IC per cell (ex:
               | cell phones), mostly because cell phones are so small
               | they only use one cell.
               | 
               | Larger scale Li-ion, such as Laptop batteries, may use
               | one-IC per cell, OR one-protection IC for all 3x or 4x
               | cells combined. As long as all the cells are soldered
               | together, one protection IC is cheaper and still usable.
               | 
               | At electric-car scales, you have thousands-and-thousands
               | of cells. You can't just manage all of them with one IC,
               | so you build an IC per bundle. Maybe 48 cells or
               | 100-cells per IC or so.
        
               | mschuster91 wrote:
               | Indeed yes I meant cells, I'm not a native English
               | speaker.
               | 
               | > At electric-car scales, you have thousands-and-
               | thousands of cells. You can't just manage all of them
               | with one IC, so you build an IC per bundle. Maybe 48
               | cells or 100-cells per IC or so.
               | 
               | Ah okay, I had expected something more on the order of 1
               | IC per 4 cells to allow individual cell health
               | monitoring.
        
               | dragontamer wrote:
               | > Indeed yes I meant cells, I'm not a native English
               | speaker.
               | 
               | You're doing fine. Native English speakers don't know the
               | difference between cell or battery either. This is more
               | of a precise / technical engineering distinction.
               | 
               | * 9V Battery (https://imgur.com/FHJdhIK), a collection of
               | 6x cells.
               | 
               | * AAAA Cell (one singular chemical reaction of 1.5V)
               | 
               | Notice that the imgur is wrong: they call it a AAAA
               | battery (when the proper term is a AAAA cell).
               | 
               | --------
               | 
               | "Battery" is a bunch of objects doing one task.
               | Originally, a "battery" described cannons. Or two rooks
               | (in chess) that work together. Or... 6x 1.5V cells
               | working together to produce a 9V battery.
        
           | ohazi wrote:
           | > Fabs love them because they can do process optimization
           | using them, without impacting production customers.
           | 
           | I didn't realize that, but it makes a lot of sense. I assumed
           | that they acted more like the downstream manufacturers that
           | I'm used to dealing with, that don't even want to talk to you
           | unless they think you're going to place a huge order.
        
       | winter_blue wrote:
       | HBM might be an interesting idea. I would love to see multiple
       | bandwidth levels of memory becoming the norm, with computers
       | having a small amount of very fast memory and a larger pool of
       | DDR4 or DDR5. We
       | already have multiple levels of cache, why not having multiple
       | levels of RAM? Operating systems and software would need to
       | accommodate a new reality where NUMA is the norm though. But it's
       | good that we even have the concept of NUMA, so this is not
       | entirely uncharted/unfamiliar territory.
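A toy sketch of the kind of placement policy described above: keep the most frequently accessed pages in a small fast tier (HBM-like) and leave the rest in a larger slow tier (DDR-like). Page names, access counts, and the capacity are made up; a real OS or runtime would have to measure access frequency while running and migrate pages, which this static, offline version skips.

```python
def place_pages(access_counts: dict[str, int], fast_capacity: int) -> tuple[set[str], set[str]]:
    """Greedy placement: the `fast_capacity` most-accessed pages go to the fast tier."""
    ranked = sorted(access_counts, key=lambda page: access_counts[page], reverse=True)
    return set(ranked[:fast_capacity]), set(ranked[fast_capacity:])


def fast_hit_rate(access_counts: dict[str, int], fast: set[str]) -> float:
    """Fraction of all accesses served by the fast tier."""
    total = sum(access_counts.values())
    return sum(access_counts[page] for page in fast) / total


# Hypothetical, heavily skewed access pattern over four pages.
counts = {"a": 900, "b": 50, "c": 30, "d": 20}
fast, slow = place_pages(counts, fast_capacity=1)
print(sorted(fast), sorted(slow), fast_hit_rate(counts, fast))
```

With a skewed pattern like this, one fast page catches 90% of accesses; with a flat pattern the fast tier buys much less, so how much this helps depends heavily on the workload.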
        
         | wmf wrote:
         | You would love to see computers become harder to program?
        
           | makapuf wrote:
           | It can be nice to have the opportunity to program something
           | harder but faster. Counterexample: Itanium, which was too
           | hard to program (compilers) for.
        
             | sanxiyn wrote:
             | It is kind of ironic that compiler theory has advanced and
             | now we can target Itanium no problem. It was a bit (well, a
             | lot) ahead of its time.
        
           | winter_blue wrote:
           | I would try to build a new compiler (or an LLVM intermediary
           | processing layer) that does NUMA optimizations.
        
       ___________________________________________________________________
       (page generated 2021-04-16 22:01 UTC)