[HN Gopher] SiFive Tapes Out First 5nm TSMC 32-bit RISC-V Chip w... ___________________________________________________________________ SiFive Tapes Out First 5nm TSMC 32-bit RISC-V Chip with 7.2 Gbps HBM3 Author : pabs3 Score : 144 points Date : 2021-04-15 06:29 UTC (1 day ago) (HTM) web link (www.tomshardware.com) (TXT) w3m dump (www.tomshardware.com) | zozbot234 wrote: | IIRC, 32-bit RISC-V is only intended for deep embedded workloads, | with 64-bit for general purpose compute. So a SoC w/ a single | 32-bit core would seem to be a less-than-ideal fit for the | cutting-edge 5nm process. | tyingq wrote: | The core is supposed to compete with the Cortex M7. The | smallest process M7 I can find is the STM32H7, which is 40nm. | makapuf wrote: | I rave for stm32 with high end processes (10nm or less), | whether that makes sense or not. I just love stm32.. | dragontamer wrote: | Routers / Switches have extremely weird performance | characteristics, and I think that's what SiFive is targeting | with this chip. | | * HBM3 for the highest memory bandwidth (10Gbps switches need | tons and tons of bandwidth. That's 10Gbps per direction per | connection, 8x ports is 160Gbps, and then that's multiplied | multiple times over by every memcpy / operation your chip | actually does. You need to DELIVER 160Gbps, which means your | physical RAM-bandwidth needs to be an order of magnitude | greater than that) | | * Embedded 32-bit design for low-power usage. | | * All switches have small, fixed-size buffers. Memory capacity | is not a problem; it's feasible to imagine useful switches and | routers (even 10Gbps, 40Gbps, or 100Gbps) that only have | hundreds-of-MBs of RAM. As such, 32-bit is sufficient and | 64-bit is a waste (You'd rather halve your pointer memory | requirements with 32-bit pointers than go beyond 4GB | capacity) | GoblinSlayer wrote: | It's E76 with F set, and F set is huge compared to RV64I. And | the article proposes HPC as possible application.
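The switch bandwidth arithmetic in the comment above can be sketched out. The port count and memory-touch multiplier below are illustrative assumptions taken from the comment, not SiFive specifications.

```python
# Back-of-the-envelope switch memory-bandwidth estimate.
# Figures follow the comment above: 8x 10 Gbps full-duplex ports.

ports = 8            # assumed port count
line_rate_gbps = 10  # per port, per direction
directions = 2       # full duplex: ingress + egress

# Aggregate wire rate the switch must sustain
aggregate_gbps = ports * line_rate_gbps * directions

# Each packet crosses the memory bus several times (DMA in, DMA out,
# plus any memcpy the forwarding code does), so raw DRAM bandwidth
# must be a multiple of the wire rate -- "an order of magnitude" here.
memory_touches = 10
required_dram_gbps = aggregate_gbps * memory_touches

print(aggregate_gbps, required_dram_gbps)  # 160 1600
```

With these assumptions, 160 Gbps of wire rate implies on the order of 1.6 Tbps of raw memory bandwidth, which is the regime where HBM starts to make sense.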
| rjsw wrote: | Routers need quite a bit of memory to handle IPv6. | | Switches as an application of this makes sense. | jandrese wrote: | IPv6 address comparison on a 32-bit design is fairly awkward. | Switches won't care, but routers need to make routing | decisions. | foobiekr wrote: | While these are all good points, this really does not appear | to be a competitive NPU design on any axis that matters. I | don't know what this chip is for, but a router NPU it is not, | nor a switch. Maybe some SOHO switch or smart NIC, but those | have moved on far along the performance spectrum away from | the place where this would fit. | zibzab wrote: | Yeah, this seems like an odd move to me. | | For this kind of application the static leakage of the newer | & smaller node will probably hurt rather than help. | [deleted] | justincormack wrote: | I think it means 32-bit floating point, not a 32-bit CPU, as it | mentions "other relatively simplistic applications that do not | require full precision" but it's a bit unclear. | phendrenad2 wrote: | The quote that stands out to me is that the core is "ideal | for applications which require high performance -- but have | power constraints (e.g., Augmented Reality and Virtual | Reality, IoT Edge Compute, Biometric Signal Processing, and | Industrial Automation)." | Fordec wrote: | With my industry / product management / business strategy hat | on, totally agree from SiFive's perspective. | | With my early days electronics hat on, the 5nm process adds | additional energy performance gains that, in conjunction with | RISC-V in an embedded environment, especially in a battery | powered remote operation use case, has me salivating at what | could be achieved from a would-be customer perspective. | volta83 wrote: | HBM2 is like 2Tb/s, how is HBM3 7GB/s? | hajile wrote: | HBM3 wasn't just supposed to be about speed. It also offers a | 512-bit option that doesn't require a silicon interposer.
I'd | guess this was added to make cheaper consumer GPU designs | possible. | | I suspect they're using the HBM2 spec for the narrow bus and | cheaper interposer while keeping speeds lower and only using a | couple stacks instead of the 16 or so HBM2 stacks required for | those 2Tb/s speeds you mention. It makes sense given that their | chip likely couldn't use a huge amount of bandwidth anyway. | virtuallynathan wrote: | I think that's per-Pin bandwidth? | vmception wrote: | HBM3 was expected to be like 4GB/s per pin, which was seen as | double HBM2 per pin, so this is almost double even that, which | is good news | | The HBM2 total memory bandwidth is like 2TB/s, just different | scale | | Anyway I could totally be using wrong nomenclature and | terminology, feel free to discuss, these aren't strongly held | assertions | throwaway4good wrote: | What is the use case of this chip? I have the feeling it is some | way away from a general purpose CPU / SOC like the Apple M1? | 01100011 wrote: | RTFA? | | > The SoC can be used for AI and HPC applications and can be | further customized by SiFive customers to meet their needs. | Meanwhile, elements from this SoC can be licensed and used for | other N5 designs without any significant effort. | | > The SoC contains the SiFive E76 32-bit CPU core(s) for AI, | microcontrollers, edge-computing, and other relatively | simplistic applications that do not require full precision. | throwaway4good wrote: | So it is a proof of concept / demo of subcomponents someone | else may license? Is that a correct interpretation? | sanxiyn wrote: | Yes. | klelatti wrote: | How did SiFive get anywhere near 5nm TSMC? | baq wrote: | perhaps paid some money when the process wasn't booked till the | end of time | lizknope wrote: | They pay money just like any other customer of TSMC. SiFive has | a lot of buzz in the industry. I wouldn't be surprised that | TSMC wanted to work with them.
| | But there are other intermediary companies that help startups | group multiple chips from multiple companies together into a | single mask. This is called a "shuttle" and allows the | companies to split the costs of the masks (I've heard up to $30 | million for 5nm). | | SiFive is probably building about 2,000 of these chips for | development boards. They aren't trying to order a hundred | million like Nvidia. | klelatti wrote: | Thanks, that's very interesting. No intention in any way to | belittle SiFive - just puzzled as to how they managed to get | onto this process when it's obviously so much in demand. Good | for them! | RicoElectrico wrote: | For test chips there is something called a shuttle. | | Other than that, foundries are known to sponsor IP development | on their processes. | snypher wrote: | "The tape out means that the documentation for the chip has | been submitted for manufacturing to TSMC, which essentially | means that the SoC has been successfully simulated. The silicon | is expected to be obtained in Q2 2021." | | Would this mean the actual chip delivery may still be delayed? | StringyBob wrote: | Chip manufacturing has many steps. For a new leading edge | process it may take 3-6 months to get silicon back after | submitting the design to a silicon foundry for manufacturing. | | For a small volume 'shuttle' run hopefully there won't be | delays, but this is not the same as having working chips! | | The foundry will do initial checks that it is manufacturable at | 'tape-out' when you submit your design, but you don't know for | sure if your chip works with intended functionality until you | get it back! You are relying on lots and lots of simulations | up front before your 'tape-out'. | | Sometimes issues are found and a chip requires a re-spin - | basically another go with the bugs fixed. You want to do this | as few times as possible (ideally right first time) due to | cost and time of these iterations.
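The shuttle-run economics described above are easy to sketch. The $30M mask-set figure is quoted in the comment; the participant count is an assumption for illustration.

```python
# Rough shuttle cost-sharing sketch. "$30 million for 5nm" is from the
# comment above; the number of participating companies is assumed.

mask_set_cost = 30_000_000  # full 5nm mask set, per the comment
participants = 10           # assumed companies sharing one shuttle mask

cost_per_company = mask_set_cost // participants
print(cost_per_company)  # 3000000 -- $3M for samples, vs. $30M to go it alone
```

This is why shuttles are for engineering samples only: every participant's die shares the same mask, so none of them can ramp that mask to production volume.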
| gumby wrote: | It's also in TSMC's marketing interest to produce a small | number of RISC-V parts with their latest process. | | Plus it's probably fun for some of the people there. | ohazi wrote: | I know they're separate lines and capacity is sold well in | advance and all that, but this chip shortage still baffles me. | | A startup can tape out a 5 nm chip, but STMicroelectronics can't | make any of their 40-130 nm microcontrollers for the next year? | | Also car companies are supposedly the culprit, even though their | volume is only in the low tens of millions per year, and the | dustup is apparently over only six months of capacity? What? I | get that the auto industry is a nice reliable long-term source of | revenue for chip companies, but fabs should barely be sneezing at | that sort of volume. | lizknope wrote: | I'm in the semiconductor industry. | | I don't really understand your question. | | Anyone can start a company and tape out a chip even in 5nm. My | previous startup did something similar. We used an intermediary | company between us and TSMC that specifically works with | smaller companies. They (or TSMC) will bundle together 4 to 20 | chips into a common mask as a "shuttle" run. Shuttle runs are | really only used to get samples for the first version of your | chip. You can't really go to production with them because the | mask has chips from multiple different companies but this | allows all of the companies to share the mask costs (I've heard | up to $30 million for 5nm). | | What is ST Micro talking about? I assume they can produce chips | but can't get the volume that they want. SiFive are probably | producing about 2,000 of these chips for development and test | boards. ST Micro would be buying in the hundreds of millions or | tens of billions range. | bogomipz wrote: | >" Shuttle runs are really only used to get samples for the | first version of your chip." | | Is a "tape out" the same thing as a shuttle run/sample chip | run?
| Kliment wrote: | A "tape out" is the process of transforming a design into a | physical die - i.e. a manufacturing run. It's when you hand | over a design to a foundry to do their thing with it. | zibzab wrote: | Sounds like OSH Park for silicon... | | Anyway, I'm still not sure why SiFive is doing this. Seems | like a waste of money even as a prototype. | lizknope wrote: | The article mentions that it is from the OpenFive division | of SiFive. OpenFive used to be Open-Silicon and their | business model was working with other companies to take | their Verilog RTL and do all of the physical design | (synthesis to logic gates, place and route of the standard | cells, timing analysis, test vector generation) and then | work with the foundries to deliver all of the data for | manufacturing. | | Since Open-Silicon is now OpenFive and part of SiFive they | literally have all this experience in house and don't need | to depend on another company between them and TSMC. | | https://en.wikipedia.org/wiki/Open-Silicon | variaga wrote: | SiFive is in the business of selling IP cores and back-end | implementation services. The gold standard for IP core | validation is "silicon proven", i.e. that it's not just a | nice theoretical design on paper, but someone has actually | turned it into a physical chip and tested the real life | performance. | | _Lots_ of people will try to sell you their designs and | services. Picking the wrong ones can waste millions of | dollars and months/years of time. | | The money spent on this prototype buys SiFive credibility | for both aspects of their business (assuming the chip | works) - "we were able to do this for ourselves, so you | know we'll be able to do it for you". | | So it's not a waste, it's a marketing expense, and a | necessary one. | varispeed wrote: | Out of curiosity - what software is being used to design | chips? Is there anything within reach of a small company, or | something open source?
| thechao wrote: | Front-end is HDLs -- (System)Verilog, VHDL, etc. | Implementation and formal will be Jasper & its ilk. Backend | (physical, etc.) use fab-specific bespoke software from the | majors (Cadence, NXP, MG, Synopsys, ...). | | The front-end stuff could be done by _one person_; | Verilator is a great example (although it's now "in house" | to NXP). Implementation, LEC, etc. are mathematically | intimidating -- they're proof engines -- but doable by a | small team. | | Physical _requires_ inside knowledge of the fabs. The fabs | aren't going to let you participate unless you're a major, | because it costs them a lot of money, and each additional | participant is another potential leak of their critical IP. | | The tooling is all "vertical" and starts on the backend. If | you can't do backend, you're not a player. | jecel wrote: | The commercial tools are indeed very expensive but the | required data files can be as much of a problem. Normally | you have to sign a bunch of NDAs (non-disclosure | agreements) to get your hands on the design rules and | standard cell libraries supplied by the foundries and | required to make the tools work. | | One effort to organize several previously available open | source tools into a practical system is OpenLane, which is | based on the DARPA OpenROAD project: | | https://woset-workshop.github.io/PDFs/2020/a21.pdf | | Recently, Google has financed a project where a foundry has | made its data files available without any NDAs: | | https://github.com/google/skywater-pdk | | The combination has made it possible to have completely | open source chip designs. | PragmaticPulp wrote: | > Also car companies are supposedly the culprit, even though | their volume is only in the low tens of millions per year, and | the dustup is apparently over only six months of capacity? | What?
I get that the auto industry is a nice reliable long-term | source of revenue for chip companies, but fabs should barely be | sneezing at that sort of volume. | | I agree. I think the blame on automakers has been blown out of | proportion. It doesn't make any sense that automakers cancelled | orders, then reinstated those orders again with some extra | demand, and now the entire chip market is stalled. | | It's most likely due to the fact that consumer demand is up | everywhere. The pandemic didn't hit the economy nearly as hard | as expected, and we piled a lot of stimulus on top of that. | Savings rate went up a bit, but much discretionary spending was | diverted away from things like dining out and toward buying | consumer goods. | | > STMicroelectronics can't make any of their 40-130 nm | microcontrollers for the next year | | They're almost certainly making huge volumes of | microcontrollers, but they're all spoken for with orders from | the highest bidders. | | We won't have inventory sitting on shelves again until fab | capacity isn't being 100% occupied by existing orders. Need | some surplus before we can get parts at DigiKey. | bravo22 wrote: | A lot of chips are made on mature fab lines because they don't | need the performance of 5nm lines or can't justify the mask | costs. | | No one is investing in mature fab lines because they're not | leading edge and they're being run to amortize the initial | investment made into them years ago. Therefore not much | additional capacity for mature lines. | | So yes you can see 5nm chips being taped out but the 40-130nm | chips are squeezed for capacity. Also this chip is likely not | running in the same crazy volumes as ST microcontrollers. It | is easier for TSMC to squeeze in a few dozen to a hundred | wafers for SiFive on their line. | dragontamer wrote: | > A lot of chips are made on mature fab lines because they | don't need the performance of 5nm lines or can't justify the | mask costs.
| | Alternatively: they're car-scale products dealing primarily | with high electric currents (10s or 100s of milliamps) and/or | higher voltages (5V instead of 1.3V). | | Smaller chips use (and therefore output) less current than | larger scale chips. But if your goal is to output 10mA to | better drive an IGBT or other transistor anyway, then you | really prefer 40nm to 130nm ANYWAY, because those larger | sizes are just a lot better at moving those large currents | around. | | Bigger wires mean bigger currents. | bravo22 wrote: | High voltage MOSFETs and IGBTs are built on a completely | different process. Size is definitely not an issue with | them. It is about exotic doping to create the desired | characteristics. | | They're built using much larger feature sizes but on | completely separate lines. | dragontamer wrote: | I'm not really in the industry. But I know that high-voltage | MOSFETs / IGBTs need substantial amounts of | current to turn on / off adequately. Under typical use, | there's a dedicated chip called a "Gate Driver" that | provides that current, between a microcontroller and the | IGBT. | | It's not that the IGBTs / MOSFETs are built on these | microcontrollers. It's that the gate driver can be | integrated into a microcontroller (simplifying the | circuit design and reducing the number of parts you need | to buy). | | Under normal circumstances, a microcontroller can | probably source/sink 1mA (too little to adequately turn | on an IGBT). You amplify the 1mA with a gate-driver chip | into 100mA, and then the amplified 100mA is used to turn | on/off the IGBT. | | By integrating a gate-driver into the microcontroller, | you save a part. | variaga wrote: | Your point is valid, but this is almost certainly a shuttle | run, so it won't be even one full wafer. | bravo22 wrote: | You're right. Definitely a "hot" wafer for the engineering | samples. | monocasa wrote: | ST fabs their own chips.
If their fabs don't have the capacity, | it's a huge slog to tape them out to a radically different | process at another company. | Kliment wrote: | This is an extremely low volume prototype run. You can get | those scheduled on short notice. Fabs love them because they | can do process optimization using them, without impacting | production customers. They're ridiculously expensive per-die | and you commit to accept a much higher failure rate than | normal. | | ST can and is making microcontrollers. It's just that they've | sold their production for a year ahead, before it's even been | manufactured. Car companies fucked everyone over by flipping a | large volume of orders back and forth causing bullwhip effect | on the whole industry, and lots of knock-on effects in other | industries that suddenly got told (occasionally too late) that | they need to plan their inventory a year ahead because they | can't get anything at short notice anymore. Car companies' | vehicle production volume is tens of millions, but each vehicle | has thousands to tens of thousands of ICs. The six months you | are mentioning are not the capacity period, they are the _lead | times_ involved. | | I don't want to repeat the whole story but I wrote a comment | about this on another thread. See | https://news.ycombinator.com/item?id=26659709 | jankeymeulen wrote: | Thousands to tens of thousands per car? I think you're off by | an order of magnitude. | rowanG077 wrote: | What? You think it's tens of thousands to a hundred thousand? | A hundred thousand seems excessive to me. | buildbot wrote: | I know a typical Mercedes has roughly a hundred individual | computers, not too far-fetched to think the average chip | count could be 10 or higher per device on the CAN bus. | mschuster91 wrote: | Almost everything in a car has a _number_ of chips. Power | regulators, communication buses... and in electric cars | with thousands of batteries, _at least_ one chip per | battery for protection.
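The volumes being argued over here are easy to put in perspective. This sketch uses the low end of the figures quoted in the comments above (the per-car IC count is disputed downthread, so the assumptions are deliberately conservative).

```python
# Order-of-magnitude automotive IC demand, using illustrative figures:
# "tens of millions" of vehicles and the low end of "thousands" of ICs each.

vehicles_per_year = 70_000_000  # rough global light-vehicle production
ics_per_vehicle = 1_000         # conservative end of the disputed estimate

annual_ic_demand = vehicles_per_year * ics_per_vehicle
print(f"{annual_ic_demand:.1e}")  # 7.0e+10 -- tens of billions of ICs per year
```

Even with the conservative per-car count, automotive demand lands in the tens of billions of chips per year, which is why a bullwhip in those orders ripples through every fab running mature nodes.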
| osamagirl69 wrote: | This is blatantly false, unless you are confusing battery | for an assembled battery pack. In EVs each battery | management IC can run somewhere in the range of 4-14 | cells in series per chip, and they almost universally run | banks of up to 100 cells in parallel. For example, in the | Tesla Model S the pack is composed of submodules of 76 | cells in parallel, with 6 of those groups in series per | management chip--so only one management chip per 456 | cells. | dragontamer wrote: | Electric cars have ONE battery with thousands of *cells*. | I do realize that the colloquial term for "cell" is | "battery" (ex: an AA cell is called a battery), but it | becomes important to be precise with our words when | talking about manufacturing. | | Small scale Li-ion does a protection IC per cell (ex: | cell phones), mostly because cell phones are so small | they only use one cell. | | Larger scale Li-ion, such as laptop batteries, may use | one IC per cell, OR one protection IC for all 3x or 4x | cells combined. As long as all the cells are soldered | together, one protection IC is cheaper and still usable. | | At electric-car scales, you have thousands-and-thousands | of cells. You can't just manage all of them with one IC, | so you build an IC per bundle. Maybe 48 cells or | 100 cells per IC or so. | mschuster91 wrote: | Indeed yes, I meant cells, I'm not a native English | speaker. | | > At electric-car scales, you have thousands-and-thousands | of cells. You can't just manage all of them | with one IC, so you build an IC per bundle. Maybe 48 cells | or 100 cells per IC or so. | | Ah okay, I had expected something more on the order of 1 | IC per 4 cells to allow individual cell health | monitoring. | dragontamer wrote: | > Indeed yes, I meant cells, I'm not a native English | speaker. | | You're doing fine. Native English speakers don't know the | difference between cell and battery either.
This is more | of a precise / technical engineering distinction. | | * 9V Battery (https://imgur.com/FHJdhIK), a collection of | 6x cells. | | * AAAA Cell (one singular chemical reaction of 1.5V) | | Notice that the imgur is wrong: they call it a AAAA | battery (when the proper term is a AAAA cell). | | -------- | | "Battery" is a bunch of objects doing one task. | Originally, a "battery" described cannons. Or two rooks | (in chess) that work together. Or... 6x 1.5V cells | working together to produce a 9V battery. | ohazi wrote: | > Fabs love them because they can do process optimization | using them, without impacting production customers. | | I didn't realize that, but it makes a lot of sense. I assumed | that they acted more like the downstream manufacturers that | I'm used to dealing with, that don't even want to talk to you | unless they think you're going to place a huge order. | winter_blue wrote: | HBM might be an interesting idea. I would love to see multiple | bandwidth levels of memory becoming the norm, with computers | having a small amount of very fast memory and a larger pool of | DDR4 or DDR5. We already have multiple levels of cache, so why | not multiple levels of RAM? Operating systems and software would | need to accommodate a new reality where NUMA is the norm though. | But it's good that we even have the concept of NUMA, so this is | not entirely uncharted/unfamiliar territory. | wmf wrote: | You would love to see computers become harder to program? | makapuf wrote: | It can be nice to have the opportunity to program something | harder but faster. Counter example: Itanium, which was too | hard to write compilers for. | sanxiyn wrote: | It is kind of ironic that compiler theory has advanced and | now we can target Itanium no problem. It was a bit (well, a | lot) ahead of its time. | winter_blue wrote: | I would try to build a new compiler (or an LLVM intermediate | processing layer) that does NUMA optimizations.
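The tiered-memory idea discussed above (a small fast HBM pool plus a larger DDR pool) can be illustrated with a toy placement policy. This is purely a sketch of the kind of decision an OS or runtime would make; the buffer names, sizes, and access rates are invented for the example.

```python
# Toy two-tier memory placement: greedily pack the "hottest" buffers
# (most accesses per byte) into the scarce fast tier, spill the rest to DDR.

def place(buffers, hbm_capacity):
    """buffers: list of (name, size, accesses_per_sec) tuples."""
    # Rank by access density so the fast tier holds the data that
    # benefits most per byte of its limited capacity.
    ranked = sorted(buffers, key=lambda b: b[2] / b[1], reverse=True)
    placement, free = {}, hbm_capacity
    for name, size, _ in ranked:
        if size <= free:
            placement[name] = "hbm"
            free -= size
        else:
            placement[name] = "ddr"
    return placement

# Hypothetical workload: sizes in GB, access rates in ops/sec.
buffers = [("weights", 8, 1000), ("activations", 2, 5000), ("logs", 16, 1)]
print(place(buffers, hbm_capacity=8))
# {'activations': 'hbm', 'weights': 'ddr', 'logs': 'ddr'}
```

A real system would also have to handle migration between tiers as access patterns change, which is exactly the part that makes multi-tier memory "harder to program", as the reply notes.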
| [deleted] ___________________________________________________________________ (page generated 2021-04-16 22:01 UTC)