[HN Gopher] How Does an FPGA Work?
___________________________________________________________________
How Does an FPGA Work?
Author : sph
Score : 146 points
Date : 2023-05-03 17:11 UTC (5 hours ago)
(HTM) web link (learn.sparkfun.com)
(TXT) w3m dump (learn.sparkfun.com)
| nuancebydefault wrote:
| It seems that operations on FPGAs can run much more efficiently
| than their cpu equivalents. For an 'AND' operation, a cpu needs to
| load code and data from a memory into registers, run the logic
| and write the result register back to some memory, all while
| filling up the pipeline for subsequent operations.
|
| The FPGA on the other hand has the output ready one clock cycle
| after the inputs stream in, and can have many such operations in
| parallel. One might ask, why are cpus not being replaced by
| FPGAs?
|
| Another interesting question: can software (recipes for cpus) be
| transpiled to be efficiently run on FPGAs?
|
| I could ask GPT those questions, but the HN community will
| provide more insight, I guess.
| pfyra wrote:
| > Another interesting question, can software (recipes for cpus)
| be transpiled to be efficiently run on FPGAs?
|
| Yes. At least for C and C++. It is called High Level Synthesis.
| Lramseyer wrote:
| These are really good questions to be asking, and to help with
| that, let's consider 3 attributes of compute complexity: time,
| space, and memory.
|
| The traditional way of computing on a CPU is in essence a list
| of instructions to be computed. These instructions all go to
| the same place (the CPU core) to be computed. Since the space
| is constant, the instructions are computed sequentially in
| time. Most programmers aren't concerned with redesigning a CPU,
| so we typically only think about computing in time (and memory,
| of course).
|
| On an FPGA (and custom silicon) the speedup comes from being
| able to compute in both time and space.
Instead of your
| instructions existing in memory and being computed in time, they
| can be represented in separate logic elements (in space), and
| each of those can do separate things in time. So in a way, you're
| trading space for time. This is how the speed gains are achieved.
|
| Where this all breaks down is the optimization and scheduling.
| A sequential task is relatively easy to optimize, since you're
| optimizing in time (and memory, to an extent). Scheduling is
| easy too, since tasks can be prioritized and queued up.
| However, when you're computing in space, you have to optimize
| in 2 spatial dimensions and in time. When you have multiple
| tasks that need to be completed, you then need to place them
| together and not have them overlap.
|
| Think of trying to fit a ton of different shaped tiles on a
| table, where you need to be constantly adding and removing tiles
| in a way that doesn't disrupt the placement of other tiles (at
| least not too often). It's kind of a pain, but for some more
| constrained problem sets, it might make sense.
|
| These aren't impossible problems, and for some tasks, the time
| or power usage savings are worth the additional complexity. But
| sequential optimization is way easier, and good enough for most
| tasks. However, if our desire for faster computing outpaces our
| ability to make faster CPUs, you may see more FPGAs doing this
| sort of thing. We already have FPGAs that are capable of
| partial reconfiguration, and some pretty good software tools to
| go along with it.
|
| TL;DR: Geometry is hard.
| toast0 wrote:
| > The FPGA on the other hand has the output ready one clock
| cycle after the inputs stream in, and can have many such
| operations in parallel. One might ask, why are cpus not being
| replaced by FPGAs?
|
| FPGAs are more or less a flexible replacement for an
| application specific (logic level) integrated circuit. A CPU
| can do a wide variety of tasks, with a small penalty for
| switching tasks.
An ASIC can do one thing and that's it; an FPGA
| can do many things, but with a large penalty for task
| switching. (You can have a CPU as an ASIC or an FPGA, but...)
| ASICs require a lot of upfront design work and costs, so you
| can't use them for everything. ASICs, and especially CPUs, tend
| to be able to achieve a higher clock speed than FPGAs, but it
| kind of depends.
|
| > Another interesting question, can software (recipes for cpus)
| be transpiled to be efficiently run on FPGAs?
|
| Not really; the way problems are solved is drastically
| different, and I'd expect most things would need to be
| reconceptualized to fit. And a lot of software isn't really
| suited to living as a logic circuit. Exceptions would be
| encoding, compression, encryption, the inverses of all of
| those, signal processing, etc. Things where you have a data
| pipeline and 'the same thing' happens to all the data.
| jcranmer wrote:
| "FPGAs are the next big frontier for software development, and
| have been since the '90s; they just need the programming model
| worked out." This is the traditional story told about FPGAs, but
| GPGPU programming suddenly overtaking FPGA development around
| 2010, despite its awkward programming models, makes that story
| rather suspect. The thing is, a lot of the benefits of FPGAs
| are really best-case scenarios, and when you move to more
| typical scenarios, their competitiveness as an architecture
| dwindles dramatically.
|
| Pipelining on an FPGA requires being able to find, and fill,
| spatial duplication of the operations being done. If you've got
| conditional operations in a pipeline, now your pipeline isn't
| so full anymore, and this hurts performance on an FPGA far more
| than on a CPU (which spends a lot of power trying to keep its
| pipelines full).
But needing to keep the pipelines spatially
| connected also means you have to be able to find a physical
| connection between the two stages of a pipeline, and the
| physical length of that connection also imposes limitations on
| the frequency you can run the FPGA at.
|
| If you care about FLOPS (or throughput in general), the problem
| with FPGAs is that they are running at a clock speed about a
| tenth that of a CPU. This requires a 10x improvement in
| performance just to stand still; given that software development
| for FPGAs requires essentially a completely different mindset
| than for CPUs or even GPUs, it's not common to have use cases
| that work well on FPGAs.
|
| (I should say that a lot of my information about programming
| FPGAs comes from ex-FPGA developers, and the "ex-" part will
| certainly have its own form of bias in these opinions.)
| davemp wrote:
| Yeah, I don't really see FPGAs ever making their way down to
| consumers the way GPUs and CPUs have (end users actually
| programming them).
|
| For (semi) fixed pipeline operations, FPGAs will basically
| always be worse than some slightly more specialized ASIC like
| a GPU/AI engine.
|
| One area FPGAs can be exceptionally good at is real-time
| operations. You have much better control over timing in
| general on FPGAs vs MCUs/CPUs, but I don't think that's
| inherent (you could probably alter the MCU architecture a bit
| and close the gap).
|
| I could be wrong, but I also think you get better power draw
| for things like mid to low volume glue chips in embedded
| systems, because you're not powering big SRAM banks and DMAs
| just to pipe data between a couple of hardware interfaces. This
| is only because of market forces though, obviously, because if
| mid to low volume ASICs become viable in terms of dev time
| they'll be much better.
| pjc50 wrote:
| > One might ask, why are cpus not being replaced by FPGAs?
|
| Most of the time you want data-dependent execution.
FPGA
| systems excel at "fixed pipeline" systems, where you have e.g.
| an audio filter chain... but even that is usually done in
| efficient DSP CPUs.
|
| > Another interesting question, can software (recipes for cpus)
| be transpiled to be efficiently run on FPGAs?
|
| A _subset_ can. Things like recursion are right out. Various
| companies have tools to do this, but you usually end up having
| to rework either the source you're feeding them, or the HDL
| output.
| burnished wrote:
| They both use the same kind of components; the FPGA does not
| have a speed advantage, you are simply comparing the speed of a
| very simple circuit element to the speed of a very complicated
| pipeline.
|
| You would use an FPGA to simulate a special purpose circuit,
| which would be faster than a CPU for its specific purpose. We
| have CPUs because having a general purpose processing chip is
| incredibly handy when you want to be able to do more than one
| thing.
|
| EDIT: I forgot to mention that the device outputs in one clock
| cycle by definition: if your clock is too fast then your
| components' output signals don't have time to stabilize and you
| will get read errors, so you ensure your clock is slow enough
| for everything to stabilize.
| JackSlateur wrote:
| For the same reasons we do not replace CPUs with GPUs: not the
| right tool.
|
| Check out the instruction set of modern CPUs.
| convolvatron wrote:
| one big problem is memory. basic cpus have a lot of facilities
| for high-speed synchronous interface with DRAM, and truly vast
| amounts of resources for cache.
|
| partially as a result, a good model for compiling code to fpgas
| uses a dataflow paradigm, since we don't need to serialize all
| operations through a memory fetch, cache, or even register
| file.
|
| if we hadn't decided to move all our computing to the cloud, I
| suspect fpga accelerator boards for applications which map well
| to that model would have some traction in specialized areas.
| signal processing is definitely one such.
| quadrature wrote:
| > One might ask, why are cpus not being replaced by FPGAs?
|
| They do sometimes, for very specific applications! The problem
| is that an FPGA is programmed for one specific task and
| would have to be taken offline and reprogrammed if you wanted
| to do something else with it. It's not general purpose like a
| CPU, where you can load up any program and have it run.
|
| Programming an FPGA is also much harder to reason about
| than a CPU because of the parallelism and timing you described.
| MSFT_Edging wrote:
| Some of the more modern Xilinx stuff has features where you
| don't need to take down the whole FPGA to reload a bitstream
| onto part of the chip. It's really neat: you can do live
| reprogramming of one component and leave the others alone, or
| have an A/B setup where one updates while the other is
| unchanged.
| JohnFen wrote:
| Yes, I'm working on a Xilinx ARM processor with an FPGA.
| The FPGA and the CPU are independent units in the chip that
| can each operate with or without the other. We can indeed
| reprogram the FPGA without taking the system down.
| davemp wrote:
| It goes even further. You can partially reconfigure the
| FPGA fabric itself:
| https://support.xilinx.com/s/article/34924?language=en_US
| quadrature wrote:
| That is really cool, hadn't heard of that before.
| barelyauser wrote:
| What is simpler: making logical circuit "A" or making a circuit
| that emulates logical circuit "A" and its relatives?
| markx2 wrote:
| If anyone is unaware, you can buy the very impressive Pocket.
| https://www.analogue.co/pocket
|
| The current list of what it can do with FPGA is listed here -
| https://openfpga-cores-inventory.github.io/analogue-pocket/ - and
| the inevitable sub-reddit is a good resource.
| https://old.reddit.com/r/AnaloguePocket/
| gchadwick wrote:
| There's also the MiSTer project:
| https://github.com/MiSTer-devel/Wiki_MiSTer/wiki. Not hand-held
| (yet...)
and the hardware is
| less slick, but it covers a bunch more systems and is also fully
| open source.
| phendrenad2 wrote:
| MiSTer makes me kind of sad: the DE10-Nano board it's based
| on is 7 years old at this point, and the actual FPGA chip on
| the board is probably over twice as old as that. And this is
| still the peak of hobby FPGA chips. I wonder why Moore's Law
| is hitting the FPGA industry particularly hard all of a
| sudden.
| willis936 wrote:
| There are better FPGA options, they're just more expensive.
| The DE10-Nano was strategically chosen as "powerful enough
| to meet most wants while still being within a reasonable
| budget".
|
| No one's going to plunk down $10k for a 19 EV Zynq
| UltraScale+ with 1.1M LEs, but they will spend $200 on a
| Cyclone V with 210k LEs.
| MrHeather wrote:
| The article says FPGAs are too power hungry for handheld
| devices. Did Analogue do anything special to solve this problem
| on the Pocket?
| agg23 wrote:
| That's honestly not true at all; it all just depends on your
| platform. On the Pocket, the FPGA _is_ the processor (there
| are actually two FPGAs, one for the actual emulation core,
| and one for scaling video, and there's technically a PIC
| microcontroller for uploading bitstreams and managing UI).
| The FPGAs still don't use much power compared to the display
| itself. With the in-built current sensor on the dev kits, the
| highest we've measured drawn by the main FPGA is ~300mAh. Now
| this sensor isn't going to be the best measurement, but it's
| something to go off of.
| eulgro wrote:
| > ~300 mAh
|
| mA? You're not very convincing here.
| WhiteDawn wrote:
| Personally I think this is the biggest selling feature of
| FPGA based emulation.
|
| The reality is both software and FPGA emulation can be done
| very well and with very low latency; however, to achieve
| this in software you generally require high end, power
| hungry hardware.
|
| A Steam Deck can run a highly accurate Sega Genesis
| emulator with read-ahead rollback, screen scaling, shaders
| and all the fixings no problem, but in theory the Pocket
| can provide the exact same experience with an order of
| magnitude less power.
|
| It's not quite apples to oranges of course, but the
| comfortable battery life does make the Pocket much more
| practical.
| agg23 wrote:
| Being nitpicky about latency is where FPGAs truly
| shine. You lose a good bit of it by connecting to HDMI (I
| think the Pocket docked is 1/4 of a frame, and MiSTer has a
| similar mode) (EDIT: MiSTer can do 4 scanlines, but it's
| not compatible with some displays), but when we're
| talking about analog display methods or inputs, you can
| achieve accurate timings with much less effort than on a
| modern day computer.
|
| For a full computer like the Steam Deck, you have to deal
| with preemption, display buffers, and more, which _will_
| add latency. Now if you went bare metal, you could
| definitely drive a display with super low latency,
| hardware accurate emulation, but obviously that's not
| what most people are doing.
| agg23 wrote:
| Not to draw attention to myself or anything, but if you're
| interested in learning to make cores for the Analogue Pocket or
| MiSTer (or similar) platforms, I highly recommend taking a look
| at the resources and wiki I'm slowly building -
| https://github.com/agg23/analogue-pocket-utils/
|
| I started ~7 months ago with approximately no FPGA or hardware
| experience, have now ported ~6 cores from MiSTer to Pocket, and
| just released my first core of my own, the original Tamagotchi -
| https://github.com/agg23/fpga-tamagotchi/
|
| If you want to join in, I and several other devs are very
| willing to help talk you through it. We are primarily on the
| FPGAming Discord server - https://discord.gg/Gmcmdhzs - which
| is probably the best place to get a hold of me as well.
| jonny_eh wrote:
| I also recommend the official dock.
It basically turns it into
| an easy-to-use MiSTer.
| sph wrote:
| My mind is blown, but I'm also wondering if this isn't some kind
| of incredible over-engineering? Surely CPUs are fast enough to
| emulate these kinds of devices in software. If they aren't, they
| must be an order of magnitude simpler in complexity.
|
| I wouldn't ordinarily care about emulators, but actual hardware
| emulators are the craziest thing I've heard in a while. All that
| for a small handheld console?
|
| If only I was not so broke...
| lprib wrote:
| Sure, it would probably be cheaper to chuck a Cortex-A* or
| similar mid-range MCU in there. One advantage of FPGAs is
| that they can achieve "perfect" emulation of a Z80 (or other
| chip) since they run at the logic gate level. No software task
| latency, no extra sound buffering, etc. They can re-create the
| original clock-for-clock.
| arein3 wrote:
| It's impressive as well.
| agg23 wrote:
| Software is orders of magnitude simpler in complexity, yes.
| The difference between a software emulator and a logic level
| emulator is immense.
|
| But take the example of the difficulties with a software NES
| emulator:
|
| In hardware, there is one clock that is fed into the 3 main
| disparate systems: the CPU, APU (audio), and PPU (picture).
| They all use different clock dividers, but they're still fed
| off of the same source clock. Each of these chips operates in
| parallel to produce the output expected, and there's some
| bidirectional communication going on there as well.
|
| In a software emulator, the only parallelism you get is on
| multiple cores, but you can approximate it with threading
| (i.e. preemption). For simplicity, you stick with a single
| thread. You run 3 steps of the PPU at once, then one step of
| the CPU and APU. You've basically just sped through the first
| two steps, because who will notice those two cycles? They
| took no "real" time; they were performed as fast as the
| software could perform them.
Probably doesn't matter, as no
| one could tell that for 10ns this happened.
|
| You need to add input. You use USB. That has a minimum
| polling interval of 1ms (1000Hz), plus your emulator processing
| time (is it going to have to go in the "next frame" packet?),
| but controls on systems like the NES were practically
| instantly available the moment the CPU read.
|
| Now you need to produce output. You want to hook up your
| video, but wait, you need to feed it into a framebuffer.
| That's at least one frame of latency unless you're able to
| precompute everything for the next frame. Your input is
| delayed a frame, because it has to be fed into the next
| batch; the previous batch (for this frame) is already done.
| You use the basis of 60fps (which is actually slightly wrong)
| to time the ticking of your emulator.
|
| Now you need to hook up audio. Audio must go into a buffer or
| it will under/overflow. This adds latency, and you need to
| stay on top of how close you are to falling outside of your
| bounds. But you were using FPS for pacing, so now how do you
| reconcile that?
|
| ----
|
| Cycle accurate and low latency software solutions are
| certainly not easy, and truly low latency is impossible on
| CPUs running a full OS. Embedded-style systems with RTOSes
| might be able to get pretty close, but it's still not going
| to be the same as being able to guarantee the exact same (or
| as near as we can tell) timing for every cycle.
|
| I want to be clear that none of these hardware
| implementations are actually that accurate, but they could
| be, and people are working hard to improve them constantly.
| rtkwe wrote:
| The benefit of FPGAs is you can get nearly gate perfect
| emulation of an old games system. We've had emulators for
| years that get most things right, but some games and minor
| things in old games require specific software patches to
| ensure the odd way they used the chips available produces the
| same output.
There's a great old article from 2011 about the
| power required at the time to get a nearly perfect emulation
| of an NES. [0] The goal with the Pocket and all of Analogue's
| consoles isn't to be just another emulation machine but to
| run as close as possible to the original at a hardware level.
| That's their whole niche: hardware level 'emulation' of old
| consoles.
|
| [0] https://arstechnica.com/gaming/2011/08/accuracy-takes-
| power-...
| Waterluvian wrote:
| Emulating "accurately" is so difficult that not even
| Nintendo's Game Boy emulator on the Switch does it properly.
| I've been replaying old games and comparing some questionable
| moments with my original Game Boy, and the timings are not
| quite right in some cases.
|
| For example, in Link's Awakening there's a wiggle screen
| effect done by writing to OAM during HBlank. On the Switch it
| lags very differently than my GB (try it by getting into the
| bed where you find the ocarina). Or with Metroid 2, the sound
| when you kill an Omega Metroid is different too. It pitch
| shifts along with the "win" jingle.
|
| These have almost zero impact on playability. But for purists
| and emudevs it's a popular pursuit.
| photochemsyn wrote:
| Here's a nice series that picks up where this one leaves off
| (it shows how flip-flop/LUT units are organized into cells inside
| a PLB, programmable logic block). It also is the first step in a
| tutorial on using Verilog, building a hardware finite state
| machine, and eventually a RISC-V processor on an FPGA:
|
| https://www.digikey.com/en/maker/projects/introduction-to-fp...
| user070223 wrote:
| From my understanding:
|
| FPGA doesn't have the instruction pipeline, as the command is
| encoded in the gates themselves. It means that at runtime the
| FPGA is not Turing complete[0], as opposed to the CPU[1].
|
| There is a phrase "data is code and code is data" in security
| contexts.
If FPGAs ever replace CPUs as
| the main computation hardware (you don't need Turing
| completeness when you keep running the same apps
| [microservices]), the new saying would be something like "code
| is execution and execution is code", as you imprint the code in
| the gates. It would get rid of a whole class/subclass of memory
| safety vulnerabilities.
|
| This paradigm change is like what WebAssembly did to the web.
| The slogan should be "make the bitstream go mainstream". Someone
| made a demo running wasm on an FPGA[2]; not sure if it uses a
| CPU or runs directly.
|
| Of course, you move complexity to compiling and increase loading
| time, all for order-of-magnitude faster execution.
|
| Companies developed high level synthesis compilers, but it's
| difficult and challenging, as you need to synchronize parallel
| execution pipelines, which you don't have to do on a CPU since
| it has a steady clock rate for each step in the pipeline.
|
| A company named LegUp Computing (acquired by Microchip) compiled
| memcached/redis applications to FPGA and improved performance &
| power efficiency by an order of magnitude (10x).
|
| There is a lot of proprietary intellectual property in hardware
| design, as opposed to software, so tools and knowledge are
| scarce.
|
| If anyone works on / wants to work on this problem, hit me up in
| the comments.
|
| [0] Unless you implement a cpu on top of the fpga :)
|
| [1] Assuming infinite memory, which is false, but good enough
|
| [2] https://github.com/denisvasilik/wasm-fpga
| proto_lambda wrote:
| > FPGA doesn't have the instruction pipeline as the command is
| encoded in the gates themselves. It means that on runtime the
| FPGA is not turing complete[0] as opposed to the CPU[1].
|
| That obviously depends entirely on the circuit; many
| sufficiently advanced circuits probably end up being
| accidentally Turing complete.
| JohnFen wrote:
| You can implement Turing-complete CPUs in FPGA fabric.
| proto_lambda wrote:
| That's exactly what OP's footnotes say, yes.
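The point above about synchronizing parallel execution pipelines can be made concrete with a toy model. The Python sketch below is illustrative only (the three stage functions are made up; it is not from the thread): like registered logic on an FPGA, every stage updates on the same clock edge, so each "next" value must be computed from the previous cycle's register contents.

```python
# A toy model of computing "in space": three pipeline stages all
# "execute" on every clock edge, like registered logic on an FPGA.
# Once the pipeline fills, one result emerges per clock.

def run_pipeline(samples):
    # Stage registers (what each flip-flop holds between clocks)
    s1 = s2 = out = None
    results = []
    # Feed the inputs, then flush the pipeline with None
    for x in samples + [None, None, None]:
        # On a clock edge every stage updates at once, so compute
        # all "next" values from the *current* register contents.
        next_out = s2 * 2 if s2 is not None else None  # stage 3: double
        next_s2 = s1 + 1 if s1 is not None else None   # stage 2: add one
        next_s1 = x * x if x is not None else None     # stage 1: square
        s1, s2, out = next_s1, next_s2, next_out
        if out is not None:
            results.append(out)
    return results

print(run_pipeline([1, 2, 3]))  # [4, 10, 20]
```

After a three-cycle fill, the pipeline produces one result per clock regardless of how many stages the function takes, which is the "trading space for time" idea from earlier in the thread.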
| jschveibinz wrote:
| We used them for real-time array signal processing and
| beamforming. They worked great.
| y0ungarmanii wrote:
| I saw various comments about how FPGAs are not ready for consumer
| hardware; Apple is using them in the AirPods Max already
| (probably for filtering audio).
|
| Check the link below:
| https://www.ifixit.com/Teardown/AirPods+Max+Teardown/139369
|
| They really excel for high throughput & low latency - which noise
| canceling sounds like a good example of! In addition to this,
| they are already being used in communication systems & data
| centers to speed up latency sensitive computations. Edge AI seems
| like a big market they will be used for soon, more likely because
| they can be flashed, unlike ASICs, and new NN architectures drop
| every couple of years.
| burnished wrote:
| Neat. If the author is around, might I suggest pushing some of
| the 'why use an FPGA' to the front? I think it would benefit from
| a more concrete example motivating the use of an FPGA - like a
| picture of some simple circuit using a seven segment display on a
| breadboard next to a picture of an FPGA implementing the same
| circuit, in order to make it more clear that it is a substitute
| for putting experiments together by hand. I think it will help
| newcomers better contextualize what is happening and why.
|
| I think in the same vein your wrap up of why you might want to do
| something in hardware vs software is great and well placed.
|
| Hmmm, I guess now is as good a time as any to bumblefuck around
| with small electronics projects for fun. Thanks for the reminder!
| beardyw wrote:
| > Neat. If the author is around, might I suggest pushing some
| of the 'why use an FPGA' to the front?
|
| I think the problem is identifying cases where you really need
| an FPGA. Most of the time you don't.
| burnished wrote:
| I suggest it purely for educational purposes.
The first
| struggle isn't identifying the best use case - it's
| understanding wtf is going on. Putting it in terms of
| something more familiar is helpful for that.
|
| Your thing would make for a wonderful followup topic though.
| cycomanic wrote:
| What do you mean by "you"? Maybe "you" as in a general
| consumer don't need an FPGA, but I guess one could argue a
| general consumer doesn't need a general purpose computer
| either.
|
| There are certainly many use cases where you absolutely do
| need an FPGA, e.g. anything where you need to process large
| amounts of IO in real time. For example, the folks from
| SimulaVR talk about how they use an FPGA for display correction
| here: https://simulavr.com/blog/testing-ar-mode-image-
| processing/
|
| Many modern devices would not function without FPGAs.
| JohnFen wrote:
| > anything where you need to process large amounts of IO in
| real time.
|
| I'm working on an FPGA-based system right now. We're using
| an FPGA precisely because this is what we're doing -- about
| a hundred I/O ports that have to be processed with as
| little latency as possible.
| beardyw wrote:
| I think we can agree that this discussion does not involve
| general consumers!
|
| "Many cases" is not the opposite of most cases.
| kanetw wrote:
| (SimulaVR dev) It's not wrong to say that in most cases,
| tasks are better solved without an FPGA. But when you need
| one, you need one (or an ASIC, if you have the volume and
| don't need reconfigurability).
| asdfman123 wrote:
| This is meant to be an introduction though, right? You can
| simply write "some people do X, and others claim Y is better"
| then move on.
|
| I read several paragraphs of the article and I still don't
| know why you'd use one, despite taking computer architecture
| and analog electronics courses in undergrad.
|
| I don't want to read about logic gates again and I don't want
| to read about the nuances before I broadly understand what
| the point is.
|
| For anyone else still wondering, here's Wikipedia:
|
| > FPGAs have a remarkable role in embedded system development
| due to their capability to start system software development
| simultaneously with hardware, enable system performance
| simulations at a very early phase of the development, and
| allow various system trials and design iterations before
| finalizing the system architecture.
|
| Basically, rapid prototyping I guess. That makes sense.
| awjlogan wrote:
| If that was an ask for a specific example, one of the most
| common uses for FPGAs is DSP. Say you have a simple FIR
| filter with 63 taps. To do this in a CPU requires you
| to load two values and do a multiply/accumulate for each tap
| in sequence. Very (!!) optimistically, that's about 192
| instructions. With an FPGA, you can do all the
| multiplications in parallel and then just sum the outputs -
| probably done in 2 cycles, and with pipelining your
| throughput could be a sample every clock.
|
| If the FPGA is too slow, too power inefficient, etc., you can
| (if you have the money!) take the same core design and put
| it in an ASIC. The FPGA provides an excellent prototyping
| environment; in this example you can tune the filter
| parameters before committing to a full ASIC.
| pjc50 wrote:
| > multiply/accumulate for each tap in sequence. Very (!!)
| optimistically, that's about 192 instructions
|
| This is what all those vector instructions are for.
|
| FPGA is kind of invaluable if you have lots of streams
| coming in at high megabit rates, though, and need to
| preprocess down to a rate the CPU and memory bus can
| handle.
| awjlogan wrote:
| Yes, indeed :) Didn't want to muddy the waters with
| vector instructions, and it's fair to say that the
| dedicated DSP chip market has been squeezed by FPGAs on
| one side and vectorised (even lightly, like the
| Cortex-M4/M7 DSP extension) CPUs on the other.
| asdfman123 wrote:
| Explain it to me like I'm your mom.
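The 63-tap FIR example above can be sketched in Python (illustrative only; the coefficients here are a made-up moving average, and the "instruction" counts are rough). The CPU version does one multiply/accumulate per tap in sequence; the FPGA view computes all 63 products in the same clock cycle and sums them with an adder tree.

```python
# Sketch of the FIR filter example: sequential (CPU-style) versus
# spatially parallel (FPGA-style) evaluation of the same filter.

TAPS = 63
coeffs = [1.0 / TAPS] * TAPS          # trivial moving-average filter

def fir_cpu(window):
    """Sequential: one load + multiply + accumulate per tap,
    roughly 3 * 63 ~ 190 'instructions' per output sample."""
    acc = 0.0
    for c, x in zip(coeffs, window):
        acc += c * x                   # one MAC at a time
    return acc

def fir_fpga(window):
    """Spatial: 63 hardware multipliers fire in the same clock
    cycle; an adder tree sums the products a cycle or two later."""
    products = [c * x for c, x in zip(coeffs, window)]  # all at once
    return sum(products)               # adder tree

window = [1.0] * TAPS                  # the 63 most recent samples
print(fir_cpu(window), fir_fpga(window))  # both ~1.0
```

Both produce the same sample; the difference is that the sequential version costs ~190 steps per output while the pipelined spatial version can emit one output per clock.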
| nfriedly wrote:
| I've read that AMD's 7040-series mobile CPUs will have an "FPGA-
| based AI engine developed by Xilinx" [1] - I'm wondering how
| _programmable_ that will be.
|
| I know there have been some performance difficulties emulating
| the PlayStation 3's various floating point modes. It's the kind
| of thing that I think an on-chip FPGA could theoretically help
| with, although I don't know if it'd be worth the trouble in this
| specific case. (Or if AMD's implementation will be flexible
| enough to help.)
|
| [1]: https://www.anandtech.com/show/18844/amd-unveils-ryzen-
| mobil...
| sph wrote:
| Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates, which seems
| to me the most interesting and challenging part of designing an
| FPGA.
|
| In my mediocre understanding of digital circuits, RAM is usually
| addressable, so it has to be wired in a more direct manner to
| enable such a design.
|
| I posted this article because someone mentioned some Ryzen chip
| having an FPGA in another post, and I am now left wondering:
|
| 1. why don't we have more user-programmable FPGAs in our fancy
| desktop mainboards?
|
| 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
| board? The slower the CPU, the more useful an FPGA would be to
| accelerate compute tasks.
| duskwuff wrote:
| > Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates
|
| Not sure what you mean by that. Do you mean how a RAM is used
| as a lookup table to implement logic gates, how routing works,
| or how block RAM is integrated into the FPGA fabric?
|
| > is there a SoC board, ARM or RISC-V based, with an FPGA on
| board?
|
| Better yet, there are a number of FPGAs available with an ARM
| SoC on board. Xilinx Zynq, Intel Cyclone V SoC, various others.
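The first option above - a RAM used as a lookup table to implement logic - is easy to sketch. The Python below is a hypothetical model, not how any vendor's tools actually represent it: a 4-input LUT is just a 16-entry, 1-bit-wide memory whose address lines are the logic inputs, and "programming" the FPGA amounts to filling in those bits.

```python
# Toy model of a 4-input LUT: the four inputs form the address,
# and the stored bit at that address is the "gate" output.

def make_lut(fn, n_inputs=4):
    """Evaluate fn on every input combination once, at 'configure
    time', and store the truth table - like loading a bitstream."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        table.append(fn(*bits))
    return table

# Configure one LUT as a 4-input AND, another as XOR of a and b
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
xor2 = make_lut(lambda a, b, c, d: a ^ b)

def lut_read(table, a, b, c, d):
    # At run time there is no "logic" left, only an indexed read
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return table[addr]

print(lut_read(and4, 1, 1, 1, 1))  # 1
print(lut_read(xor2, 1, 0, 0, 0))  # 1
```

The same 16 bits can implement any boolean function of four inputs, which is why the fabric doesn't need discrete AND/OR/XOR gates at all.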
| pjc50 wrote:
| > RAM is usually addressable, so it has to be wired in a more
| direct manner to enable such a design
|
| DRAM is necessarily a grid.
|
| SRAM, e.g. in the standard 6-transistor cell form, lets you kind
| of dump individual bits anywhere you need one.
|
| > why don't we have more user-programmable FPGAs in our fancy
| desktop mainboards
|
| They tend to be horrifyingly expensive, and there are few use
| cases you can't outperform with a GPU or even just vector
| instructions. Most of the interesting use cases for FPGAs are
| when you have direct access to the pins and can wire them up to
| high-speed signalling, which really isn't home user friendly.
|
| Also, all the tooling is proprietary.
|
| > is there a SoC board, ARM or RISC-V based, with an FPGA on
| board
|
| Buy a medium sized FPGA and download a CPU of your choice.
|
| (I have a downloadable-CPU-sized FPGA board on my desk for
| testing not-yet-shipped ASIC designs. It costs about six
| thousand dollars and has a 48-week lead time on Farnell.)
| sph wrote:
| > Buy a medium sized FPGA and download a CPU of your choice.
|
| Damn, _of course_ one would be able to download a CPU and
| "emulate it" in hardware.
|
| I never imagined that would be possible. Now I'm thinking that
| if I had infinite free time, I would buy an FPGA and design
| a modern Lisp CPU. A RISC-V based design with native Lisp
| support. Who needs hardware when you can just emulate it in
| an FPGA?
|
| That's seriously cool technology.
| MSFT_Edging wrote:
| As for question 1, they're far more common in server grade
| stuff, where typically they are baked in. Consumer stuff just
| doesn't need/use as much IO throughput and muxing as the FPGA
| provides on, say, a large networking switch.
|
| There are PCIe compatible FPGAs that you can plug into your
| desktop like a graphics card to accelerate certain tasks.
| In general though, our workstation hardware just isn't
| specialized enough to require them, but it can be extended to do
| so. If something is a large enough business model, they'll just
| make an ASIC.
| aphedox wrote:
| After Intel acquired Altera they released a series of x86 Xeon
| chips with integrated FPGAs. Look up the Xeon 6138P.
| wildzzz wrote:
| Both Intel and Xilinx sell FPGAs with hard ARM cores inside so
| you can run real Linux while being able to interface with custom
| logic. Additionally, it's pretty common to create ARM, RISC-V,
| or PowerPC soft cores in the FPGA when there are no hard cores
| available. These mimic the real cores and will run software
| while allowing for things like custom instructions that can take
| advantage of the flexibility of FPGA fabric. The Xilinx Zynq and
| Intel Cyclone V have options for hard ARM cores. There are
| various designs of boards out there you can buy that implement
| Arduino or Raspberry Pi shield compatibility. The XUP PYNQ-Z2
| supports both interfaces and runs a Zynq-7000 with a real ARM
| core.
|
| You can do other things with soft cores that are not possible
| with an off-the-shelf CPU, like triple modular redundancy. This
| is when you run a lot of the logic in triplicate and vote on the
| results to prevent a bit flip from messing up the software. This
| is common for space-based CPUs that are running on FPGAs. It's
| expensive to design a new chip in a very small run, so it's much
| cheaper to just put the core on an off-the-shelf FPGA and use
| the rest of the FPGA fabric for custom logic functions.
| gchadwick wrote:
| > Sadly the article doesn't go into details about how the
| > programmable RAM is wired to the actual logic gates, which
| > seems to me the most interesting and challenging part of
| > designing an FPGA.
|
| It does; that's the part under the 'Look-Up Tables' section.
| The key is that there aren't any actual logic gates, just lots
| of little RAMs.
| You implement an arbitrary blob of logic by having the inputs
| form the address; the RAM then gives the result of the logical
| function.
| stephen_g wrote:
| Well, they do have some logic gates - usually the cells have at
| least one flip-flop, as well as the LUT.
| roadbuster wrote:
| > You implement an arbitrary blob of logic by having the inputs
| > form the address; the RAM then gives the result of the logical
| > function.
|
| This is incorrect. Modern FPGAs are composed of small,
| configurable blocks which contain all sorts of logic. The idea
| is that the configurable blocks can be (internally) wired up to
| implement your logic of choice. The wiring configuration is
| "loaded" at power-on and retained in memories within each
| configurable block.
| gchadwick wrote:
| Well, indeed, modern FPGA fabric along with the various fixed
| function blocks can be very complex, but this is a beginner's
| 'How Does an FPGA Work?' for which a bunch of LUTs connected by
| programmable interconnect is a useful approximation.
| PragmaticPulp wrote:
| > 1. why don't we have more user-programmable FPGAs in our
| > fancy desktop mainboards
|
| It has been tried, but GPUs are fast and efficient enough that
| it's rarely worth it.
|
| It's very easy to attach an FPGA to the PCIe bus as an add-in
| card exactly like your GPU. In fact, many FPGA dev boards come
| in exactly this format. They're available, they're just not in
| demand.
|
| > 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
| > board? The slower the CPU, the more useful an FPGA would be to
| > accelerate compute tasks
|
| Plenty of FPGA parts include ARM cores. It's a fairly standard
| chip configuration.
|
| You can also connect an FPGA and an SoC with PCIe or other
| interconnects. It's really not an obstacle.
|
| FPGAs just aren't very efficient from a cost or dev time
| perspective for most applications. They're indispensable when
| you need them, though.
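Tying together the LUT-plus-flip-flop picture from the subthread above (stephen_g's point that a cell holds a register as well as a LUT): here is a toy model of a single logic cell, a 4-input LUT whose output can optionally be registered in a D flip-flop. A deliberately simplified Python sketch (names are mine), not any real vendor's cell architecture:

```python
# Toy model of one FPGA logic cell: a 4-input LUT feeding an optional
# D flip-flop. Real cells (e.g. Xilinx CLB slices) add carry chains,
# muxes, and more, but this captures the basic idea.

class LogicCell:
    def __init__(self, truth_table, registered=True):
        assert len(truth_table) == 16   # 4-input LUT = 16-entry table
        self.table = list(truth_table)
        self.registered = registered
        self.q = 0                      # flip-flop state

    def combinational(self, a, b, c, d):
        # LUT lookup: the four inputs form the address into the table.
        return self.table[(a << 3) | (b << 2) | (c << 1) | d]

    def clock(self, a, b, c, d):
        # On a clock edge the flip-flop captures the LUT output, so a
        # registered result appears one cycle after the inputs do.
        out = self.q if self.registered else self.combinational(a, b, c, d)
        self.q = self.combinational(a, b, c, d)
        return out

# Configure the LUT as a 4-input AND: only address 0b1111 stores a 1.
cell = LogicCell([0] * 15 + [1])
print(cell.clock(1, 1, 1, 1))  # 0 (previous flip-flop state)
print(cell.clock(1, 1, 1, 1))  # 1 (registered result, one cycle later)
```

The registered path is also why the top comment's "output ready one clock cycle after the inputs stream in" holds: the flip-flop delays the LUT result by exactly one cycle, which is what lets long chains of cells be pipelined.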
| rjsw wrote:
| There are plenty of boards that have one of the combined ARM &
| FPGA chips, Zynq (Xilinx/AMD) or Cyclone (Altera/Intel).
| dddiaz1 wrote:
| Another really cool use case for FPGAs is ultra-fast analysis of
| genomic data. This guide walks you through setting up an F1
| instance (AWS FPGA) to do that:
| https://aws-quickstart.github.io/quickstart-illumina-dragen/
| mpd wrote:
| I really enjoyed the recent Hackerbox[0] featuring an FPGA. I'd
| never worked with one prior to that.
|
| https://hackerboxes.com/collections/past-hackerboxes/product...
| jokoon wrote:
| So can a large FPGA be somehow used to brute-force encryption?
|
| I don't really understand electronics well enough to see if a
| GPU could be faster than an FPGA, but my guess is yes?
|
| It seems that anything that can be programmed is inherently
| slower than an FPGA equivalent doing the same task.
|
| Does a large enough key size always defeat an FPGA?
|
| I would guess that it becomes power- and cost-prohibitive for a
| private company to deliver such a capability, but of course, a
| large government entity like the NSA might have enough resources
| to pay for enough FPGAs to decrypt most things.
| braho wrote:
| Even though the FPGA fabric might encode the solution more
| effectively, there are other important differentiators: clock
| speed and memory bandwidth. GPUs have higher clock speeds and
| typically better memory bandwidth (related of course).
|
| With the higher clock speed, GPUs can well outperform FPGAs for
| many problems.
___________________________________________________________________
(page generated 2023-05-03 23:00 UTC)