[HN Gopher] How Does an FPGA Work?
       ___________________________________________________________________
        
       How Does an FPGA Work?
        
       Author : sph
       Score  : 146 points
       Date   : 2023-05-03 17:11 UTC (5 hours ago)
        
 (HTM) web link (learn.sparkfun.com)
 (TXT) w3m dump (learn.sparkfun.com)
        
       | nuancebydefault wrote:
        | It seems that operations on FPGAs can run much more efficiently
        | than their CPU equivalents. For an 'AND' operation, a CPU needs
        | to load code and data from memory into registers, run the logic
        | and write the result register back to some memory, all while
        | filling up the pipeline for subsequent operations.
       | 
        | The FPGA on the other hand has the output ready one clock cycle
        | after the inputs stream in, and can have many such operations in
        | parallel. One might ask, why are CPUs not being replaced by
        | FPGAs?
        | 
        | Another interesting question: can software (recipes for CPUs) be
        | transpiled to run efficiently on FPGAs?
       | 
       | I could ask GPT those questions, but the HN community will
       | provide more insight I guess.
        
         | pfyra wrote:
         | > Another interesting question, can software (recipes for cpus)
         | be transpiled to be efficiently run on FPGAs?
         | 
          | Yes, at least for C and C++. It is called High-Level Synthesis
          | (HLS).
        
         | Lramseyer wrote:
         | These are really good questions to be asking, and to help with
         | that let's consider 3 attributes of compute complexity: time,
          | space, and memory.
         | 
         | The traditional way of computing on a CPU is in essence a list
         | of instructions to be computed. These instructions all go to
         | the same place (the CPU core) to be computed. Since the space
         | is constant, the instructions are computed sequentially in
         | time. Most programmers aren't concerned with redesigning a CPU,
         | so we typically only think about computing in time (and memory
         | of course)
         | 
         | On an FPGA (and custom silicon) the speedup comes from being
         | able to compute in both time and space. Instead of your
         | instructions existing in memory, and computed in time, they can
         | be represented in separate logic elements (in space) and they
         | can each do separate things in time. So in a way, you're
         | trading space for time. This is how the speed gains are
         | achieved.
         | 
         | Where this all breaks down is the optimization and scheduling.
         | A sequential task is relatively easy to optimize since you're
         | optimizing in time (and memory to an extent.) Scheduling is
         | easy too, since, they can be prioritized and queued up.
         | However, when you're computing in space, you have to optimize
          | in 2 spatial dimensions and in time. When you have multiple
          | tasks that need to be completed, you then need to place them
          | together and not have them overlap.
         | 
          | Think of trying to fit a ton of differently shaped tiles on a
          | table, where you need to be constantly adding and removing
          | tiles in a way that doesn't disrupt the placement of other
          | tiles (at least not too often.) It's kind of a pain, but for
          | some more constrained problem sets, it might make sense.
         | 
         | These aren't impossible problems, and for some tasks, the time
         | or power usage savings is worth the additional complexity. But
         | sequential optimization is way easier, and good enough for most
         | tasks. However, if our desire for faster computing outpaces our
         | ability to make faster CPUs, you may see more FPGAs doing this
         | sort of thing. We already have FPGAs that are capable of
         | partial reconfiguration, and some pretty good software tools to
         | go along with it.
         | 
         | TL;DR: Geometry is hard.
        
         | toast0 wrote:
         | > The FPGA on the other hand has the output ready one clock
         | cycle after the inputs stream in, and can have many such
         | operations in parallel. One might ask, why are cpus not being
         | replaced by FPGAs?
         | 
         | FPGAs are more or less a flexible replacement for an
         | application specific (logic level) integrated circuit. A CPU
         | can do a wide variety of tasks, with a small penalty for
          | switching tasks. An ASIC can do one thing and that's it; an
          | FPGA can do many things, but with a large penalty for task
          | switching. (You can have a CPU as an ASIC or an FPGA, but...)
          | ASICs require a lot of upfront design work and costs, so you
          | can't use them for everything. ASICs and especially CPUs tend
          | to be able to achieve a higher clock speed than FPGAs, but it
          | kind of depends.
         | 
         | > Another interesting question, can software (recipes for cpus)
         | be transpiled to be efficiently run on FPGAs?
         | 
         | Not really; the way problems are solved is drastically
         | different, and I'd expect most things would need to be
         | reconceptualized to fit. And a lot of software isn't really
         | suited to living as a logic circuit. Exceptions would be
         | encoding, compression, encryption, the inverses of all of
         | those, signal processing, etc. Things where you have a data
         | pipeline and 'the same thing' happens to all the data.
        
         | jcranmer wrote:
          | FPGAs are the next big frontier for software development, and
          | have been since the '90s; they just need the programming model
          | worked out. That is the traditional story told about FPGAs, but
          | GPGPU programming suddenly overtaking FPGA development around
          | 2010, despite its awkward programming model, makes that story
          | rather suspect. The thing is, a lot of the benefits of FPGAs
          | are really best-case scenarios, and when you move to more
          | typical scenarios, their competitiveness as an architecture
          | dwindles dramatically.
         | 
         | Pipelining on an FPGA requires being able to find, and fill,
         | spatial duplication of the operations being done. If you've got
         | conditional operations in a pipeline, now your pipeline isn't
         | so full anymore, and this hurts performance on an FPGA far more
         | than on a CPU (which spends a lot of power trying to keep its
         | pipelines full). But needing to keep the pipelines spatially
         | connected also means you have to be able to find a physical
         | connection between the two stages of a pipeline, and the
         | physical length of that connection also imposes limitations on
         | the frequency you can run the FPGA at.
         | 
         | If you care about FLOPS (or throughput in general), the problem
         | with FPGAs is that they are running at a clock speed about a
         | tenth of a CPU. This requires a 10x improvement in performance
         | just to stand still; given that software development for FPGAs
         | requires essentially a completely different mindset than for
         | CPUs or even GPUs, it's not common to have use cases that work
         | well on FPGAs.
         | 
         | (I should say that a lot of my information about programming
         | FPGAs comes from ex-FPGA developers, and the "ex-" part will
         | certainly have its own form of bias in these opinions).
        
           | davemp wrote:
           | Yeah I don't really see FPGAs ever making their way down to
           | consumers the way GPUs and CPUs have (end users actually
           | programming them).
           | 
           | For (semi) fixed pipeline operations FPGAs will basically
           | always be worse than some slightly more specialized ASIC like
           | a GPU/AI engine.
           | 
            | One area FPGAs can be exceptionally good at is real-time
            | operations. You have much better control over timing in
            | general on FPGAs vs MCUs/CPUs, but I don't think that's
            | inherent (you could probably alter the MCU architecture a bit
            | and close the gap).
           | 
           | I could be wrong but I also think you get better power draw
           | for things like mid to low volume glue chips in embedded
           | systems because you're not powering big SRAM banks and DMAs
           | just to pipe data between a couple hardware interfaces. This
           | is only because of market forces though obviously, because if
           | mid to low volume ASICs become viable in terms of dev time
           | they'll be much better.
        
         | pjc50 wrote:
         | > One might ask, why are cpus not being replaced by FPGAs?
         | 
          | Most of the time you want data-dependent execution. FPGA
          | systems excel at "fixed pipeline" systems, where you have e.g.
          | an audio filter chain... but even that is usually done in
          | efficient DSP CPUs.
         | 
         | > Another interesting question, can software (recipes for cpus)
         | be transpiled to be efficiently run on FPGAs?
         | 
          | A _subset_ can. Things like recursion are right out. Various
          | companies have tools to do this, but you usually end up having
          | to rework either the source you're feeding them, or the HDL
          | output.
        
         | burnished wrote:
         | They both use the same kind of components; the FPGA does not
         | have a speed advantage, you are simply comparing the speed of a
         | very simple circuit element to the speed of a very complicated
         | pipeline.
         | 
         | You would use an FPGA to simulate a special purpose circuit,
         | which would be faster than a CPU for its specific purpose. We
         | have CPUs because having a general purpose processing chip is
         | incredibly handy when you want to be able to do more than one
         | thing.
         | 
          | EDIT: I forgot to mention that the device outputs in one clock
          | cycle by definition: if your clock is too fast, your
          | components' output signals don't have time to stabilize and
          | you will get read errors, so you ensure your clock is slow
          | enough for everything to stabilize.
        
         | JackSlateur wrote:
         | For the same reasons we do not replace CPUs with GPUs: not the
         | right tool
         | 
          | Check out the instruction set of modern CPUs.
        
         | convolvatron wrote:
          | One big problem is memory. Basic CPUs have a lot of facilities
          | for high-speed synchronous interfacing with DRAM, and a truly
          | vast amount of resources for cache.
          | 
          | Partially as a result, a good model for compiling code to
          | FPGAs uses a dataflow paradigm, since we don't need to
          | serialize all operations through a memory fetch, cache, or
          | even register file.
          | 
          | If we hadn't decided to move all our computing to the cloud, I
          | suspect FPGA accelerator boards for applications which map
          | well to that model would have some traction in specialized
          | areas. Signal processing is definitely one such area.
        
         | quadrature wrote:
         | >One might ask, why are cpus not being replaced by FPGAs?
         | 
          | They do sometimes, for very specific applications! The
          | problem is that an FPGA is programmed for one specific task and
          | would have to be taken offline and reprogrammed if you wanted
          | to do something else with it. It's not general purpose like a
          | CPU, where you can load up any program and have it run.
          | 
          | Programming an FPGA is also comparatively much harder to
          | reason about than a CPU, because of the parallelism and timing
          | you described.
        
           | MSFT_Edging wrote:
           | Some of the more modern Xilinx stuff has features where you
           | don't need to take down the whole FPGA to reload a bitstream
            | onto part of the chip. It's really neat: you can do live
            | reprogramming of one component and leave the others alone,
            | or have an A/B setup where one updates while the other is
            | unchanged.
        
             | JohnFen wrote:
             | Yes, I'm working on a Xilinx ARM processor with an FPGA.
             | The FPGA and the CPU are independent units in the chip that
             | can each operate with or without the other. We can indeed
             | reprogram the FPGA without taking the system down.
        
               | davemp wrote:
               | It goes even further. You can partially reconfigure the
               | FPGA fabric itself:
               | https://support.xilinx.com/s/article/34924?language=en_US
        
             | quadrature wrote:
             | That is really cool, hadn't heard of that before.
        
         | barelyauser wrote:
         | What is simpler: making logical circuit "A" or making a circuit
         | that emulates logical circuit "A" and its relatives?
        
       | markx2 wrote:
        | If anyone is unaware, you can buy the very impressive Pocket:
        | https://www.analogue.co/pocket
       | 
        | The current list of what it can do with the FPGA is here -
        | https://openfpga-cores-inventory.github.io/analogue-pocket/ and
        | the inevitable subreddit is a good resource:
        | https://old.reddit.com/r/AnaloguePocket/
        
         | gchadwick wrote:
         | There's also the MiSTer project: https://github.com/MiSTer-
         | devel/Wiki_MiSTer/wiki. Not hand-held (yet...) and hardware is
         | less slick but a bunch more systems and also fully open source.
        
           | phendrenad2 wrote:
           | MiSTer makes me kind of sad, the DE10-nano board it's based
           | on is 7 years old at this point, and the actual FPGA chip on
           | the board is probably over twice as old as that. And this is
           | still the peak of hobby FPGA chips. I wonder why Moore's Law
           | is hitting the FPGA industry particularly hard all of a
           | sudden.
        
             | willis936 wrote:
             | There are better FPGA options, they're just more expensive.
             | The DE-10 Nano was strategically chosen as "powerful enough
             | to meet most wants while still being within a reasonable
             | budget".
             | 
             | No one's going to plunk down $10k for a 19 EV Zynq
             | UltraScale+ with 1.1M LEs, but they will spend $200 on a
              | Cyclone V with 210k LEs.
        
         | MrHeather wrote:
         | The article says FPGAs are too power hungry for handheld
         | devices. Did Analogue do anything special to solve this problem
         | on the Pocket?
        
           | agg23 wrote:
           | That's honestly not true at all; it all just depends on your
           | platform. On the Pocket, the FPGA _is_ the processor (there
           | are actually two FPGAs, one for the actual emulation core,
           | and one for scaling video, and there's technically a PIC
           | microcontroller for uploading bitstreams and managing UI).
            | The FPGAs still don't draw much power compared to the
            | display itself. With the in-built current sensor on the dev
            | kits, the highest we've measured drawn by the main FPGA is
            | ~300mAh. Now this sensor isn't going to be the best
            | measurement, but it's something to go off of.
        
             | eulgro wrote:
             | > ~300 mAh
             | 
             | mA? You're not very convincing here.
        
             | WhiteDawn wrote:
             | Personally I think this is the biggest selling feature of
             | FPGA based emulation.
             | 
              | The reality is both software and FPGA emulation can be
              | done very well and with very low latency; however, to
              | achieve this in software you generally require high-end,
              | power-hungry hardware.
              | 
              | A Steam Deck can run a highly accurate Sega Genesis
              | emulator with read-ahead rollback, screen scaling, shaders
              | and all the fixings no problem, but in theory the Pocket
              | can provide the exact same experience with an order of
              | magnitude less power.
             | 
             | It's not quite apples to oranges of course, but the
             | comfortable battery life does make the pocket much more
             | practical.
        
               | agg23 wrote:
                | Latency, when you're being nitpicky about it, is where
                | FPGAs truly shine. You lose a good bit over HDMI (I
               | think the Pocket docked is 1/4 a frame, and MiSTer has a
               | similar mode) (EDIT: MiSTer can do 4 scanlines, but it's
               | not compatible with some displays), but when we're
               | talking about analog display methods or inputs, you can
               | achieve accurate timings with much less effort than on a
               | modern day computer.
               | 
               | For a full computer like the Steam Deck, you have to deal
               | with preemption, display buffers, and more, which _will_
               | add latency. Now if you went bare metal, you could
               | definitely drive a display with super low latency,
               | hardware accurate emulation, but obviously that's not
               | what most people are doing.
        
         | agg23 wrote:
         | Not to draw attention to myself or anything, but if you're
         | interested in learning to make cores for the Analogue Pocket or
         | MiSTer (or similar) platforms, I highly recommend taking a look
         | at the resources and wiki I'm slowly building -
         | https://github.com/agg23/analogue-pocket-utils/
         | 
         | I started ~7 months ago with approximately no FPGA or hardware
         | experience, have now ported ~6 cores from MiSTer to Pocket, and
         | just released my first core of my own, the original Tamagotchi
         | - https://github.com/agg23/fpga-tamagotchi/
         | 
         | If you want to join in, I and several other devs are very
         | willing to help talk you through it. We primarily are on the
         | FPGAming Discord server - https://discord.gg/Gmcmdhzs - which
         | is probably the best place to get a hold of me as well.
        
         | jonny_eh wrote:
          | I also recommend the official dock. It basically turns it into
          | an easy-to-use MiSTer.
        
         | sph wrote:
          | My mind is blown, but I'm also wondering if this isn't some
          | kind of incredible over-engineering? Surely CPUs are fast
          | enough to emulate these kinds of devices in software. If they
          | aren't, they must be an order of magnitude simpler in
          | complexity.
         | 
         | I wouldn't ordinarily care about emulators, but actual hardware
         | emulators is the craziest thing I've heard in a while. All that
         | for a small handheld console?
         | 
         | If only I was not so broke...
        
           | lprib wrote:
            | Sure, it would probably be cheaper to chuck a Cortex-A* or
            | similar mid-range MCU in there. One advantage of FPGAs is
            | that they can achieve "perfect" emulation of a Z80 (or other
            | chip) since it's running at the logic gate level. No
            | software task latency, no extra sound buffering, etc. It can
            | re-create the original clock-per-clock.
        
             | arein3 wrote:
             | It's impressive as well
        
           | agg23 wrote:
            | Software is orders of magnitude simpler in complexity, yes.
            | The difference between a software emulator and a logic-level
            | emulator is immense.
           | 
           | But take the example of the difficulties with a software NES
           | emulator:
           | 
            | In hardware, there is one clock that is fed into the 3 main
            | disparate systems: the CPU, APU (audio), and PPU (picture).
            | They all use different clock dividers, but they're still fed
            | off of the same source clock. Each of these chips operates
            | in parallel to produce the expected output, and there's some
            | bidirectional communication going on there as well.
           | 
            | In a software emulator, the only parallelism you get is
            | across multiple cores, but you can approximate it with
            | threading (i.e. preemption). For simplicity, you stick with
            | a single thread. You run 3 steps of the PPU at once, then
            | one step of the CPU and APU. You've basically just sped
            | through the first two steps, because who will notice those
            | two cycles? They took no "real" time; they were performed as
            | fast as the software could perform them. It probably doesn't
            | matter, as no one could tell that for 10ns, this happened.
           | 
            | You need to add input. You use USB. That has a polling rate
            | of at most 1000Hz, plus your emulator processing time (is it
            | going to have to go in the "next frame" packet?), but
            | controls on systems like the NES were practically instantly
            | available the moment the CPU read them.
           | 
           | Now you need to produce output. You want to hook up your
           | video, but wait, you need to feed it into a framebuffer.
           | That's at least one frame of latency unless you're able to
           | precompute everything for the next frame. Your input is
           | delayed a frame, because it has to be fed into the next
           | batch, the previous batch (for this frame) is already done.
           | You use the basis of 60fps (which is actually slightly wrong)
           | to time your ticking of your emulator.
           | 
           | Now you need to hook up audio. Audio must go into a buffer or
           | it will under/overflow. This adds latency, and you need to
           | stay on top of how close you are to falling outside of your
            | bounds. But you were using FPS for pacing, so now how do you
            | reconcile that?
           | 
           | ----
           | 
            | Cycle-accurate and low latency software solutions are
            | certainly not easy, and true low latency is impossible on
            | CPUs running an actual OS. Embedded-style systems with
            | RTOSes might be able to get pretty close, but it's still not
            | going to be the same as being able to guarantee the exact
            | same (or as near as we can tell) timing for every cycle.
           | 
           | I want to be clear that none of these hardware
           | implementations are actually that accurate, but they could
            | be, and people are working hard to improve them constantly.
        
           | rtkwe wrote:
            | The benefit of FPGAs is you can get nearly gate-perfect
            | emulation of an old game system. We've had emulators for
            | years that get most things right, but some games and minor
            | things in old games require specific software patches to
            | ensure the odd way they used the chips available produces
            | the same output. There's a great old article from 2011 about
            | the power required at the time to get a nearly perfect
            | emulation of a NES. [0] The goal with the Pocket and all of
            | Analogue's consoles isn't to be just another emulation
            | machine, but to run as close as possible to the original at
            | a hardware level. That's their whole niche: hardware-level
            | 'emulation' of old consoles.
           | 
           | [0] https://arstechnica.com/gaming/2011/08/accuracy-takes-
           | power-...
        
           | Waterluvian wrote:
           | Emulating "accurately" is so difficult that not even
           | Nintendo's Game Boy emulator on the Switch does it properly.
           | I've been replaying old games and comparing some questionable
           | moments with my original Game Boy, and the timings are not
           | quite right in some cases.
           | 
           | For example in Link's Awakening, there's a wiggle screen
           | effect done by writing to OAM during HBlank. On the Switch it
           | lags very differently than my GB (try it by getting into the
           | bed where you find the ocarina). Or with Metroid 2, the sound
           | when you kill an Omega Metroid is different too. It pitch
           | shifts along with the "win" jingle.
           | 
           | These have almost zero impact on playability. But for purists
           | and emudevs it's a popular pursuit.
        
       | photochemsyn wrote:
       | Here's a nice series that picks up where this one leaves off
       | (shows how flip-flop/LUT units are organized into cells inside a
        | PLB, programmable logic block). It is also the first step in a
        | tutorial on using Verilog, building a hardware finite state
        | machine, and eventually a RISC-V processor on an FPGA:
       | 
       | https://www.digikey.com/en/maker/projects/introduction-to-fp...
        
       | user070223 wrote:
       | From my understanding
       | 
        | An FPGA doesn't have an instruction pipeline, as the "program"
        | is encoded in the gates themselves. It means that at runtime the
        | FPGA is not Turing complete[0], as opposed to the CPU[1].
        | 
        | There is a phrase in security contexts: "data is code and code
        | is data". If FPGAs ever replace CPUs as the main computation
        | hardware (you don't need Turing completeness when you keep
        | running the same apps/microservices), the new saying would be
        | something like "code is execution and execution is code", as you
        | imprint the code in the gates. It would get rid of a whole
        | class/subclass of memory safety vulnerabilities.
        | 
        | This paradigm change is like what WebAssembly did to the web.
        | The slogan should be "make the bitstream go mainstream". Someone
        | made a demo running wasm on an FPGA[2]; not sure if it uses a
        | CPU or runs directly.
        | 
        | Of course you move complexity to compile time and increase
        | loading time, all for an order of magnitude faster execution.
        | 
        | Companies have developed high-level synthesis compilers, but
        | it's difficult and challenging, as you need to synchronize
        | parallel execution pipelines, which you don't have to do on a
        | CPU since it has a steady clock rate for each step in the
        | pipeline.
        | 
        | A company named LegUp Computing (acquired by Microchip) compiled
        | memcached/redis applications to FPGA and improved performance &
        | power efficiency by an order of magnitude (10x).
        | 
        | There is a lot of intellectual property in hardware design, as
        | opposed to software, so tools and knowledge are scarce.
        | 
        | If anyone works on / wants to work on this problem, hit me up in
        | the comments.
       | 
        | [0] Unless you implement a CPU on top of the FPGA :)
       | 
        | [1] Assuming infinite memory, which is false, but good enough
       | 
       | [2] https://github.com/denisvasilik/wasm-fpga
        
         | proto_lambda wrote:
         | > FPGA doesn't have the instruction pipeline as the command is
         | encoded in the gates themselves. It means that on runtime the
         | FPGA is not turing complete[0] as opposed to the CPU[1].
         | 
            | That obviously depends entirely on the circuit; many
            | sufficiently advanced circuits probably end up being
            | accidentally Turing complete.
        
           | JohnFen wrote:
              | You can implement Turing-complete CPUs in FPGA fabric.
        
             | proto_lambda wrote:
             | That's exactly what OP's footnotes say, yes.
        
       | jschveibinz wrote:
       | We used them for real time array signal processing and beam-
       | forming. They worked great.
        
       | y0ungarmanii wrote:
        | I saw various comments about how FPGAs are not ready for
        | consumer hardware, but Apple is already using one in the
        | AirPods Max (probably for filtering audio).
        | 
        | Check the link below:
        | https://www.ifixit.com/Teardown/AirPods+Max+Teardown/139369
        | 
        | They really excel at high throughput & low latency - which
        | noise cancellation sounds like a good example of! In addition,
        | they are already being used in communication systems & data
        | centers to speed up latency-sensitive computations. Edge AI
        | seems like a big market they will be used for soon, more likely
        | because they can be reflashed, unlike ASICs, and new NN
        | architectures drop every couple of years.
        
       | burnished wrote:
       | Neat. If the author is around, might I suggest pushing some of
       | the 'why use an FPGA' to the front? I think it would benefit from
        | a more concrete example motivating the use of an FPGA - like a
        | picture of some simple circuit using a seven-segment display on
        | a breadboard next to a picture of an FPGA implementing the same
        | circuit, in order to make it clearer that it is a substitute
        | for putting experiments together by hand. I think it will help
        | newcomers better contextualize what is happening and why.
       | 
       | I think in the same vein your wrap up of why you might want to do
       | something in hardware vs software is great and well placed.
       | 
       | Hmmm, I guess now is as good a time as any to bumblefuck around
       | with small electronics projects for fun. Thanks for the reminder!
        
         | beardyw wrote:
         | > Neat. If the author is around, might I suggest pushing some
         | of the 'why use an FPGA' to the front?
         | 
         | I think the problem is identifying cases where you really need
         | an FPGA. Most of the time you don't.
        
           | burnished wrote:
           | I suggest it purely for educational purposes. The first
            | struggle isn't identifying the best use case - it's
            | understanding wtf is going on. Putting it in terms of
           | something more familiar is helpful for that.
           | 
           | Your thing would make for a wonderful followup topic though.
        
           | cycomanic wrote:
            | What do you mean by "you"? Maybe "you" as in a general
            | consumer don't need an FPGA, but I guess one could argue a
            | general consumer doesn't need a general purpose computer
            | either.
            | 
            | There are certainly many use cases where you absolutely do
            | need an FPGA, i.e. anything where you need to process a
            | large amount of IO in realtime. For example, the guys from
            | SimulaVR talk about how they use an FPGA for display
            | correction here: https://simulavr.com/blog/testing-ar-mode-
            | image-processing/
            | 
            | Many modern devices would not function without FPGAs.
        
             | JohnFen wrote:
             | > anything where you need to process large amounts of I/O
             | in realtime.
             | 
             | I'm working on a FPGA-based system right now. We're using
             | an FPGA precisely because this is what we're doing -- about
             | a hundred I/O ports that have to be processed with as
             | little latency as possible.
        
             | beardyw wrote:
             | I think we can agree that this discussion does not involve
             | general consumers!
             | 
             | "Many cases" is not the opposite of most cases.
        
             | kanetw wrote:
             | (SimulaVR dev) It's not wrong to say that in most cases,
             | tasks are better solved without an FPGA. But when you need
             | one you need one (or an ASIC, if you have the volume and
             | don't need reconfigurability).
        
           | asdfman123 wrote:
           | This is meant to be an introduction though, right? You can
           | simply write "some people do X, and others claim Y is better"
           | then move on.
           | 
           | I read several paragraphs of the article and I still don't
           | know why you'd use one, despite taking computer architecture
           | and analog electronics courses in undergrad.
           | 
           | I don't want to read about logic gates again and I don't want
           | to read about the nuances before I broadly understand what
           | the point is.
           | 
           | For anyone else still wondering, here's Wikipedia:
           | 
           | > FPGAs have a remarkable role in embedded system development
           | due to their capability to start system software development
           | simultaneously with hardware, enable system performance
           | simulations at a very early phase of the development, and
           | allow various system trials and design iterations before
           | finalizing the system architecture.
           | 
           | Basically, rapid prototyping I guess. That makes sense.
        
             | awjlogan wrote:
             | If that was an ask for a specific example, one of the most
             | common uses for FPGAs is DSP. Say you have a simple FIR
             | filter of 63 taps. To do this on a CPU requires you
             | to load two values and do a multiply/accumulate for each
             | tap in sequence. Very (!!) optimistically, that's about 192
             | instructions. With an FPGA, you can do all the
             | multiplications in parallel and then just sum the outputs -
             | probably done in 2 cycles and with pipelining your
             | throughput could be a sample every clock.
             | 
             | If the FPGA is too slow, too power inefficient etc you can
             | (if you have the money!) take the same core design and put
             | it in an ASIC. The FPGA provides an excellent prototyping
             | environment; in this example you can tune the filter
             | parameters before committing to a full ASIC.
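To make the sequential cost concrete, here is a rough Python sketch (illustrative only, not vendor tooling) of the CPU-style FIR loop described above: one multiply/accumulate per tap, in order. On an FPGA, all 63 multiplies can run in parallel into an adder tree, so the same filter collapses to a couple of cycles.

```python
# Toy model of a 63-tap FIR filter computed the "CPU way":
# for every output sample, loop over all taps and do one
# multiply/accumulate per tap, in sequence.

def fir_filter(samples, taps):
    """Sequential FIR: each output is a MAC loop over all taps."""
    n = len(taps)
    out = []
    for i in range(n - 1, len(samples)):
        acc = 0.0
        for j in range(n):                    # 63 iterations on a CPU...
            acc += taps[j] * samples[i - j]   # ...one MAC each
        out.append(acc)
    return out

# A 63-tap moving-average filter as an illustrative tap set.
taps = [1 / 63] * 63
signal = [1.0] * 100                 # constant input
print(fir_filter(signal, taps)[0])   # moving average of ones, ~1.0
```

The two nested operations per tap (load + MAC) are what the instruction-count estimate above is gesturing at; the FPGA version removes the inner loop entirely.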
        
               | pjc50 wrote:
               | > multiply/accumulate for each tap in sequence. Very (!!)
               | optimistically, that's about 192 instructions
               | 
               | This is what all those vector instructions are for.
               | 
               | FPGA is kind of invaluable if you have lots of streams
               | coming in at high megabit rates, though, and need to
               | preprocess down to a rate the CPU and memory bus can
               | handle.
        
               | awjlogan wrote:
               | Yes, indeed :) Didn't want to muddy the waters with
               | vector instructions, and it's fair to say that the
               | dedicated DSP chip market has been squeezed by FPGAs on
               | one side and vectorised (even lightly, like the
               | Cortex-M4/M7 DSP extension) CPUs on the other.
        
               | asdfman123 wrote:
               | Explain it to me like I'm your mom.
        
       | nfriedly wrote:
       | I've read that AMD's 7040-series mobile CPUs will have an "FPGA-
       | based AI engine developed by Xilinx" [1] - I'm wondering how
       | _programmable_ that will be.
       | 
       | I know there's been some performance difficulties emulating the
       | PlayStation 3's various floating point modes. It's the kind of
       | thing that I think an on-chip FPGA could theoretically help with,
       | although I don't know if it'd be worth the trouble in this
       | specific case. (Or if AMD's implementation will be flexible
       | enough to help.)
       | 
       | [1]: https://www.anandtech.com/show/18844/amd-unveils-ryzen-
       | mobil...
        
       | sph wrote:
       | Sadly the article doesn't go into details about how the
       | programmable RAM is wired to the actual logic gates, which seems
       | to me the most interesting and challenging part of designing an
       | FPGA.
       | 
       | In my mediocre understanding of digital circuits, RAM is usually
       | addressable, so it has to be wired in a more direct manner to
       | enable such a design.
       | 
       | I posted this article because someone mentioned some Ryzen chip
       | having an FPGA in another post, and I am now left wondering:
       | 
       | 1. why don't we have more user-programmable FPGAs in our fancy
       | desktop mainboards
       | 
       | 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
       | board? The slower the CPU, the more useful an FPGA would be to
       | accelerate compute tasks
        
         | duskwuff wrote:
         | > Sadly the article doesn't go into details about how the
         | programmable RAM is wired to the actual logic gates
         | 
         | Not sure what you mean by that. Do you mean how a RAM is used
         | as a lookup table to implement logic gates, how routing works,
         | or how block RAM is integrated into the FPGA fabric?
         | 
         | > is there a SoC board, ARM or RISC-V based, with an FPGA on
         | board?
         | 
         | Better yet, there are a number of FPGAs available with an ARM
         | SoC on board. Xilinx Zynq, Intel Cyclone V SoC, various others.
        
         | pjc50 wrote:
         | > RAM is usually addressable, so it has to be wired in a more
         | direct manner to enable such a design
         | 
         | DRAM is necessarily a grid.
         | 
         | With SRAM, e.g. the standard 6-transistor cell, you can kind
         | of dump individual bits anywhere you need one.
         | 
         | > why don't we have more user-programmable FPGAs in our fancy
         | desktop mainboards
         | 
         | They tend to be horrifyingly expensive and there are few use
         | cases you can't outperform with a GPU or even just vector
         | instructions. Most of the interesting use cases for FPGAs are
         | when you have direct access to the pins and can wire them up to
         | high-speed signalling, which really isn't home user friendly.
         | 
         | Also all the tooling is proprietary.
         | 
         | > is there a SoC board, ARM or RISC-V based, with an FPGA on
         | board
         | 
         | Buy a medium sized FPGA and download a CPU of your choice.
         | 
         | (I have a downloadable-CPU-sized FPGA board on my desk for
         | testing not yet shipped ASIC designs. It costs about six
         | thousand dollars and has a 48-week lead time on Farnell)
        
           | sph wrote:
           | > Buy a medium sized FPGA and download a CPU of your choice.
           | 
           | Damn, _of course_ one would be able to download a CPU and
           | "emulate it" in hardware.
           | 
           | I never imagined that would be possible. Now I'm thinking
           | that, if I had infinite free time, I would buy an FPGA and design
           | a modern Lisp CPU. A RISC-V based design with native Lisp
           | support. Who needs hardware when you can just emulate it in
           | an FPGA.
           | 
           | That's seriously cool technology.
        
         | MSFT_Edging wrote:
         | As for question 1, they're far more common in server grade
         | stuff where typically they are baked in. Consumer stuff just
         | doesn't need or use as much I/O throughput and muxing as an
         | FPGA provides on, say, a large networking switch.
         | 
         | There are PCIe compatible FPGAs that you can plug into your
         | desktop like a graphics card to accelerate certain tasks. In
         | general though, our workstation hardware just isn't specialized
         | enough to require them, but can be extended to do so. If
         | something is a large enough business model, they'll just make
         | an ASIC.
        
         | aphedox wrote:
         | After Intel acquired Altera they released a series of x86 Xeon
         | chips with integrated FPGAs. Look up the Xeon 6138P.
        
         | wildzzz wrote:
         | Both Intel and Xilinx sell FPGAs with hard ARM cores inside so
         | you can run real Linux while being able to interface with
         | custom logic. Additionally, it's pretty common to create ARM,
         | RISC-V, or PowerPC soft cores in the FPGA when there are no hard
         | cores available. These mimic the real cores and will run
         | software while allowing for things like custom instructions
         | that can take advantage of the flexibility of FPGA fabric. The
         | Xilinx Zynq and Intel Cyclone V have options for hard ARM
         | cores. There are various designs of boards out there you can
         | buy that implement Arduino or Raspberry Pi shield
         | compatibility. The XUP PYNQ-Z2 supports both interfaces and
         | runs a Zynq-7000 with a real ARM core.
         | 
         | You can do other things with soft cores that are not possible
         | with an off the shelf CPU like triple mode redundancy. This is
         | when you run a lot of the logic in triplicate and vote on the
         | results to prevent a bit flip from messing up the software.
         | This is common for space-based CPUs that are running on FPGAs.
         | It's expensive to design a new chip in a very small run so it's
         | much cheaper to just put the core on an off the shelf FPGA and
         | use the rest of the FPGA fabric for custom logic functions.
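The triple modular redundancy voting described above can be sketched in a few lines of Python (illustrative names, not any vendor's API): the same result is computed three times and a bitwise two-of-three majority masks a single bit flip.

```python
# Toy model of triple modular redundancy (TMR) voting: three copies
# of the logic produce a result, and a per-bit majority vote masks
# a single-event upset in any one copy.

def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: a bit flipped in one copy is
    outvoted by the matching bits in the other two."""
    return (a & b) | (a & c) | (b & c)

# Three copies compute the same 8-bit result; one suffers a bit flip.
good = 0b10110010
upset = good ^ 0b00001000            # single-event upset flips bit 3
print(bin(majority_vote(good, good, upset)))  # -> 0b10110010
```

On an FPGA this vote is just a small blob of combinational logic replicated per output bit, which is why triplicating a soft core is feasible where triplicating an off-the-shelf CPU is not.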
        
         | gchadwick wrote:
         | > Sadly the article doesn't go into details about how the
         | programmable RAM is wired to the actual logic gates, which
         | seems to me the most interesting and challenging part of
         | designing an FPGA.
         | 
         | It does, that's the part under the 'Look-Up Tables' section.
         | The key is there aren't any actual logic gates just lots of
         | little RAMs. You implement an arbitrary blob of logic by having
         | the inputs form the address then the RAM gives the result of
         | the logical function.
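That LUT idea can be sketched as a toy Python model (not real FPGA tooling): the "configuration bitstream" is just a small truth table, the inputs are packed into an address, and the stored bit is the output of the logical function.

```python
# Toy model of a single 4-input LUT: configuring the FPGA amounts to
# filling in the 16-entry truth table; evaluating the "gate" is just
# a RAM read at the address formed by the inputs.

def make_lut(func, n_inputs=4):
    """Build the truth table (the 'RAM contents') for an arbitrary
    boolean function of n_inputs bits."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits))
    return table

def lut_eval(table, *inputs):
    """The inputs form the address; the stored bit is the result."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]

# Configure the same LUT as a 4-input AND gate...
and_table = make_lut(lambda a, b, c, d: int(a and b and c and d))
print(lut_eval(and_table, 1, 1, 1, 1))  # -> 1
print(lut_eval(and_table, 1, 0, 1, 1))  # -> 0
# ...or reload it as any other 4-input function, without new hardware.
```

Swapping the table contents reprograms the "gate", which is the whole trick: one physical structure, any 4-input logic function.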
        
           | stephen_g wrote:
           | Well, they do have some logic gates - usually the cells have
           | at least one flip-flop, as well as the LUT.
        
           | roadbuster wrote:
           | > You implement an arbitrary blob of logic by having the
           | inputs form the address then the RAM gives the result of
           | the logical function.
           | 
           | This is incorrect. Modern FPGAs are composed of small,
           | configurable blocks which contain all sorts of logic. The
           | idea is that the configurable blocks can be (internally)
           | wired-up to implement your logic of choice. The wiring
           | configuration is "loaded" at power-on and retained in
           | memories within each configurable block.
        
             | gchadwick wrote:
             | Well indeed modern FPGA fabric along with the various fixed
             | function blocks can be very complex, but this is a
             | beginners 'How Does an FPGA Work?' for which a bunch of
             | LUTs connected by programmable interconnect is a useful
             | approximation.
        
         | PragmaticPulp wrote:
         | > 1. why don't we have more user-programmable FPGAs in our
         | fancy desktop mainboards
         | 
         | It has been tried, but GPUs are fast and efficient enough
         | that it's rarely worth it.
         | 
         | It's very easy to attach an FPGA to the PCIe bus as an add-in
         | card exactly like your GPU. In fact, many FPGA dev boards come
         | in exactly this format. They're available, they're just not in
         | demand.
         | 
         | > 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
         | board? The slower the CPU, the more useful an FPGA would be to
         | accelerate compute tasks
         | 
         | Plenty of FPGA parts include ARM cores. It's a fairly standard
         | chip configuration.
         | 
         | You can also connect an FPGA and an SoC with PCIe or other
         | interconnects. It's really not an obstacle.
         | 
         | FPGAs just aren't very efficient from a cost or dev time
         | perspective for most applications. They're indispensable when
         | you need them, though.
        
         | rjsw wrote:
         | There are plenty of boards that have one of the combined ARM &
         | FPGA chips, Zynq (Xilinx/AMD) or Cyclone (Altera/Intel).
        
       | dddiaz1 wrote:
       | Another really cool use case for FPGAs is for ultra fast analysis
       | of genomic data. This guide walks you through setting up an F1
       | instance (AWS FPGA) to do that:
       | https://aws-quickstart.github.io/quickstart-illumina-dragen/
        
       | mpd wrote:
       | I really enjoyed the recent Hackerbox[0] featuring an FPGA. I'd
       | never worked with one prior to that.
       | 
       | https://hackerboxes.com/collections/past-hackerboxes/product...
        
       | jokoon wrote:
       | So can a large FPGA be somehow used to brute-force encryption?
       | 
       | I don't understand electronics well enough to tell whether a
       | GPU could be faster than an FPGA, but my guess is yes?
       | 
       | It seems that anything that can be programmed is inherently
       | slower than an FPGA equivalent doing the same task.
       | 
       | Does a large enough key size always defeat an FPGA?
       | 
       | I would guess that it becomes power and cost prohibitive for a
       | private company to deliver such possibility, but of course, a
       | large government entity like the NSA might have enough resources
       | to pay for enough FPGAs to decrypt most things.
        
         | braho wrote:
         | Even though the FPGA fabric might encode the solution more
         | effectively, there are other important differentiators: clock
         | speed and memory bandwidth. GPUs have higher clock speeds and
         | typically better memory bandwidth (related of course).
         | 
         | With the higher clock speed, GPUs can well outperform FPGAs for
         | many problems.
        
       ___________________________________________________________________
       (page generated 2023-05-03 23:00 UTC)