[HN Gopher] Conway's Game of Life on FPGA
___________________________________________________________________

Conway's Game of Life on FPGA

Author : petrohi
Score  : 77 points
Date   : 2021-06-05 05:29 UTC (17 hours ago)

(HTM) web link (k155la3.blog)
(TXT) w3m dump (k155la3.blog)

| pastrami_panda wrote:
| I love that Conway published a request for proof that infinite
| games existed, or something to that effect. And the proof turned
| out to be what's known as a Gun structure (iirc). A Gun is
| essentially a starting condition that continuously launches new
| projectiles into the world, thus creating infinite (and
| non-repeating?) games.

| tantalor wrote:
| Does this have anything to do with the topic?

| darzu wrote:
| I wonder how this would compare to a GPU-accelerated GoL.

| prestonbriggs wrote:
| Golly would crush them all.

| sltkr wrote:
| Both of these approaches lose to a CPU. The state-of-the-art
| algorithm is Hashlife [1], which compresses both time and space,
| and can evaluate billions of generations on a grid of trillions
| of cells in milliseconds.
|
| The FPGA approach is really efficient at what it does, but
| ultimately it doesn't scale. For one, it needs 1 bit per cell in
| the 2D torus, but FPGAs have kilobytes or low megabytes of
| memory. That makes it hard to simulate a 10,000 x 10,000 grid,
| let alone a 1,000,000 x 1,000,000 grid. For two, the FPGA
| explicitly calculates each iteration one by one. This is pretty
| fast in the beginning, and it means you can use it to calculate
| a billion iterations in a few seconds or a trillion iterations
| in a few hours, but you can't scale past that.
|
| Hashlife can probably be sped up by GPUs a bit, but it processes
| a symbolic representation and consequently is best suited to
| CPUs. It spends a lot of its time doing hash table lookups
| (hence the name), which is not a good fit for GPUs and a
| terrible fit for FPGAs.
|
| 1. https://en.wikipedia.org/wiki/Hashlife
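To give a concrete feel for the memoization sltkr describes, here is
a toy Python sketch of just the base case that Hashlife builds on:
caching the one-generation result of every 4x4 block ever seen. This
is an illustration, not the real algorithm, which extends the idea
recursively over canonicalized quadtree nodes and doubles the time
step at each level.

    # Toy illustration of Hashlife's base-case memoization (not the
    # full algorithm): identical 4x4 blocks are evaluated once and
    # then reused from the cache.
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def step_center(block):
        """Advance the 2x2 center of a 4x4 block (a tuple of 16
        bits) by one generation of Conway's rules."""
        def at(r, c):
            return block[4 * r + c]
        out = []
        for r in (1, 2):
            for c in (1, 2):
                # Count the 8 neighbors of cell (r, c).
                n = sum(at(r + dr, c + dc)
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
                # B3/S23: born with 3 neighbors, survives with 2 or 3.
                out.append(1 if n == 3 or (at(r, c) and n == 2) else 0)
        return tuple(out)

    # A blinker row inside a 4x4 window:
    print(step_center((0, 0, 0, 0,
                       1, 1, 1, 0,
                       0, 0, 0, 0,
                       0, 0, 0, 0)))  # -> (1, 0, 1, 0)

Repetitive patterns keep hitting the same cached blocks, which is
where the compression of time and space comes from; the hash table
lookups that dominate the runtime are also why the approach maps
poorly onto GPUs and FPGAs.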
| marcodiego wrote:
| Considering that each cell of the next generation can be
| calculated independently in parallel, I don't think a GPU
| implementation would be able to beat the FPGA. A GPU has many
| pipelines and can quickly process many "pixels" simultaneously,
| but it will only be able to parallelize all of them for very
| small grid sizes.

| daxfohl wrote:
| GPU L0 cache latency, IIUC, is ~20x higher than a CPU's. In
| fact, in this case I think the GPU would have to use the L2
| cache, since the data is shared across so many cores, so now
| you're talking ~50x. So even if you get full parallelism of the
| cell computation, you can plug in the numbers and find it would
| be far slower than the FPGA (but still faster than a CPU).
|
| I'm not an expert though. Maybe GPUs have some way of mitigating
| the high cache latencies.

| mwest217 wrote:
| I implemented Conway's Game of Life in VHDL to run on a Xilinx
| FPGA board in my digital electronics class in college.
| Interestingly, I think the hardest part was actually driving the
| HDMI output.

| ChuckMcM wrote:
| Very nice! I would quibble with this bit: _Verilog initially
| focused on describing the hardware (very close to what could be
| expressed by a conventional schematic) and later added
| general-purpose programming elements to create more complex
| components._
|
| The concept here is "inference" or "synthesis", and it is the
| fundamental difference between an HDL and an imperative
| programming language. When you write general-purpose statements
| in an HDL, the tools have to infer what sort of hardware would
| give you that behavior, and in a funny twist, the more you lean
| on high-level language features, the more likely you are to run
| into something that cannot be synthesized into hardware gates.
| Perfectly valid HDL with no valid solution!

| anyfoo wrote:
| I've been working with FPGAs for years (as a hobby; at work I'm
| a mere "user" of them), and it has always baffled me how poorly
| matched the chosen imperative paradigm of Verilog and VHDL is to
| them.
|
| I think the idea was to make them look "familiar" to engineers
| by resembling C (Verilog) or Ada (VHDL). But FPGAs are nothing
| like CPUs, and what you end up with instead is an ill-fitting
| language where you use a whole lot of "common constructs",
| knowing how they will be synthesized into hardware. And worse:
| practically no good way to do abstraction.
|
| Functional languages are a much, much better match, because
| that's what FPGAs are: combining functions together. This works
| on higher orders as well, and it works well with polymorphism!
|
| So privately, at least, for anything substantial I've since been
| using Clash, which is essentially a Haskell subset translated to
| Verilog or VHDL: https://clash-lang.org
|
| The learning curve is steep; it definitely helped that I was
| already proficient in Haskell. But then the code is so
| enormously concise and modular, and I now have a small library
| of abstractions that I can just reuse (for example, adding AXI4
| to my designs). It's a joy.

| remexre wrote:
| The Bluespec language is another Haskell-like that compiles to
| hardware, and I think it's a bit more mature, too. It also has a
| second, non-Haskell-like syntax to avoid scaring those used to
| Verilog.

| ohazi wrote:
| There are other options as well: nMigen (Python), Chisel
| (Scala), SpinalHDL (Scala), Bluespec (Haskell).
|
| It's worth taking a quick glance at all of them (maybe making a
| toy or two) to see if any of them strike your fancy. Being able
| to do metaprogramming and parameterization in an actual
| programming language, rather than a not-quite-C-preprocessor
| pass, makes a lot of things easier.
|
| Just about anything feels "nicer" to me than VHDL or Verilog,
| but I particularly like nMigen, and find SpinalHDL to be at
| least somewhat readable.

| anyfoo wrote:
| Thanks, I will give them a try. Right now I've done enough with
| Clash to feel very comfortable with it. Having already done a
| lot of Haskell before helps, because I tend to know what's
| available in general. You can use almost the entirety of
| Haskell's functional toolbox, including monads and arrows. But
| it certainly won't hurt to check out all the other options.
|
| I agree: metaprogramming and parameterization (both enabled by
| polymorphism) are the key, and also what enables me to just
| "plug in" a function that essentially implements, e.g., an
| AXI4-enabled register or a whole burst transfer.
|
| I shudder at having to do that by hand every time in
| Verilog/VHDL, tightly coupled to the rest of the logic.
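As a hedged taste of the parameterization ohazi and anyfoo are
describing, here is a minimal nMigen sketch of a single Game of Life
cell. The LifeCell class and its signal names are invented for this
illustration; only Elaboratable, Module, and Signal come from the
library.

    # Hypothetical example: one Game of Life cell described in
    # nMigen. The update rule is ordinary Python code building a
    # netlist; no textual preprocessor is involved.
    from nmigen import Elaboratable, Module, Signal

    class LifeCell(Elaboratable):
        def __init__(self):
            self.neighbors = Signal(8)   # one bit per surrounding cell
            self.alive = Signal()        # current state
            self.alive_next = Signal()   # combinational next state

        def elaborate(self, platform):
            m = Module()
            popcount = Signal(4)  # 0..8 live neighbors
            # Sum the eight neighbor bits combinationally.
            m.d.comb += popcount.eq(
                self.neighbors[0] + self.neighbors[1]
                + self.neighbors[2] + self.neighbors[3]
                + self.neighbors[4] + self.neighbors[5]
                + self.neighbors[6] + self.neighbors[7])
            # B3/S23: a cell is born with 3 neighbors and survives
            # with 2 or 3.
            m.d.comb += self.alive_next.eq(
                (popcount == 3) | (self.alive & (popcount == 2)))
            return m

Tiling a full W x H grid is then a plain Python loop over LifeCell
instances, the kind of parameterization that takes generate blocks
or preprocessor tricks in classic Verilog.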
| progbits wrote:
| Fully agreed. As an amateur just trying to get into the FPGA
| world, my experience was very similar. I recently discovered
| nMigen [1] (via a very nice series on building a Motorola 6800
| [2]), which approaches this by using Python with operator
| overloading, something like the TF dataflow model, and that
| feels more natural.
|
| However, even for nMigen, and doubly so for Verilog/VHDL, it
| feels very 90s when it comes to engineering practices. The
| tooling is often lacking (module management, automated testing,
| CI and so on), cryptic acronyms are used everywhere as if each
| source code byte cost a LUT, you have to keep various things
| (e.g. signal bit widths) in sync in multiple places, and many
| more things make C99 feel modern.
|
| I'll have to try Clash; it seems like a big step in the right
| direction. Thanks for mentioning it.
|
| [1] https://m-labs.hk/gateware/nmigen/
| [2] https://www.youtube.com/playlist?list=PLEeZWGE3PwbbjxV7_XnPS...

| ale42 wrote:
| Funny, it looks like the practical work we had to do in our CS
| master's class "advanced digital systems design" something like
| 20 years ago, on a now-archaic XC4013 FPGA... (including the VGA
| part).

___________________________________________________________________
(page generated 2021-06-05 23:00 UTC)