[HN Gopher] Conway's Game of Life on FPGA
       ___________________________________________________________________
        
       Conway's Game of Life on FPGA
        
       Author : petrohi
       Score  : 77 points
       Date   : 2021-06-05 05:29 UTC (17 hours ago)
        
 (HTM) web link (k155la3.blog)
 (TXT) w3m dump (k155la3.blog)
        
       | pastrami_panda wrote:
        | I love that Conway offered a prize to anyone who could
        | settle whether a pattern can grow forever, or something to
        | that effect. And the answer turned out to be what's known as
        | a gun (iirc). A gun is essentially a starting configuration
        | that continuously launches new projectiles (gliders) into
        | the world, so the pattern keeps growing and never exactly
        | repeats.
        
         | tantalor wrote:
         | Does this have anything to do with the topic?
        
       | darzu wrote:
        | I wonder how this would compare to a GPU-accelerated GoL.
        
         | prestonbriggs wrote:
         | Golly would crush them all.
        
         | sltkr wrote:
          | Both of these approaches lose to a CPU. The state-of-the-
          | art algorithm is Hashlife [1], which compresses both time
          | and space, and can evaluate billions of generations on a
          | grid of trillions of cells in milliseconds.
          | 
          | The FPGA approach is really efficient at what it does, but
          | ultimately it doesn't scale. For one, it needs 1 bit per
          | cell in the 2D torus, but FPGAs have only kilobytes to low
          | megabytes of on-chip memory. That makes it hard to
          | simulate a 10,000 x 10,000 grid, let alone a 1,000,000 x
          | 1,000,000 grid. For two, the FPGA explicitly calculates
          | each iteration one by one. This is pretty fast at first,
          | and you can use it to calculate a billion iterations in a
          | few seconds or a trillion iterations in a few hours, but
          | you can't scale past that.
         | 
          | Hashlife can probably be sped up a bit by GPUs, but it
          | processes a symbolic representation and is consequently
          | better suited to CPUs. It spends a lot of its time doing
          | hash table lookups (hence the name), which are not a good
          | fit for GPUs and a terrible fit for FPGAs.
         | 
         | 1. https://en.wikipedia.org/wiki/Hashlife
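          | 
          | To make the memoization idea concrete, here is a minimal
          | Python sketch of Hashlife's base case (a hedged
          | illustration, not Golly's actual code; the full algorithm
          | also recurses on larger blocks and doubles the time step
          | at each level, which is where the time compression comes
          | from):
          | 
          |   from functools import lru_cache
          | 
          |   # A node is a tuple: level 0 is a cell (0 or 1); level
          |   # k is (nw, ne, sw, se) of level k-1 nodes. Tuples are
          |   # hashable, so lru_cache memoizes per distinct
          |   # sub-block: identical regions of the grid are only
          |   # ever computed once.
          | 
          |   @lru_cache(maxsize=None)
          |   def step4(q):
          |       """Advance a 4x4 block (level 2) one generation,
          |       returning its 2x2 center as a level-1 node."""
          |       nw, ne, sw, se = q
          |       g = [[nw[0], nw[1], ne[0], ne[1]],
          |            [nw[2], nw[3], ne[2], ne[3]],
          |            [sw[0], sw[1], se[0], se[1]],
          |            [sw[2], sw[3], se[2], se[3]]]
          |       def f(y, x):
          |           n = sum(g[y+dy][x+dx] for dy in (-1, 0, 1)
          |                   for dx in (-1, 0, 1)) - g[y][x]
          |           return 1 if (n == 3 or
          |                        (g[y][x] and n == 2)) else 0
          |       return (f(1, 1), f(1, 2), f(2, 1), f(2, 2))
          | 
          |   # A vertical blinker centered in the block turns
          |   # horizontal; calling step4 on the same block again
          |   # later is a cache hit, not a recomputation.
          |   q = ((0, 1, 0, 1), (0, 0, 0, 0),
          |        (0, 1, 0, 0), (0, 0, 0, 0))
          |   print(step4(q))  # -> (1, 1, 0, 0)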
        
         | marcodiego wrote:
          | Considering that each cell of the next generation can be
          | calculated independently, in parallel, I don't think a
          | GPU implementation would be able to beat the FPGA. A GPU
          | has many pipelines and can quickly process many "pixels"
          | simultaneously, but it can only parallelize all of them
          | at once for very small grid sizes.
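          | 
          | That per-cell independence is easy to see in code. A
          | hedged numpy sketch of one synchronous generation on a
          | wrapping (toroidal) grid, where every output cell depends
          | only on the previous generation and so could be computed
          | in parallel:
          | 
          |   import numpy as np
          | 
          |   def step(grid: np.ndarray) -> np.ndarray:
          |       """One Life generation on a wrapping 2D grid."""
          |       # Neighbor count via eight wrapped shifts of the
          |       # whole grid; np.roll implements the torus.
          |       n = sum(np.roll(np.roll(grid, dy, axis=0),
          |                       dx, axis=1)
          |               for dy in (-1, 0, 1) for dx in (-1, 0, 1)
          |               if (dy, dx) != (0, 0))
          |       # Birth on 3 neighbors, survival on 2 or 3.
          |       return ((n == 3) |
          |               ((grid == 1) & (n == 2))).astype(grid.dtype)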
        
         | daxfohl wrote:
          | GPU L0 cache latency, IIUC, is ~20x higher than a CPU's.
          | In fact, in this case I think the GPU would have to use
          | its L2 cache, since the data is shared across so many
          | cores, so now you're talking ~50x. So even if you get
          | full parallelism of the cell computation, you can plug in
          | the numbers and find it would be far slower than the FPGA
          | (but still faster than a CPU).
          | 
          | I'm not an expert though. Maybe GPUs have some way of
          | mitigating the high cache latencies.
        
       | mwest217 wrote:
        | I implemented Conway's Game of Life in VHDL to run on a
        | Xilinx FPGA board in my digital electronics class in
        | college. Interestingly, I think the hardest part was
        | actually driving the HDMI output.
        
       | ChuckMcM wrote:
        | Very nice! I would quibble with this bit: _Verilog initially
        | focused on describing the hardware, very close to what could
        | be expressed by a conventional schematic, and later added
        | general-purpose programming elements to create more complex
        | components._
        | 
        | The concept here is 'inference' or 'synthesis', and it is
        | the fundamental difference between an HDL and an imperative
        | programming language. When you write general-purpose
        | statements in an HDL, the tools have to infer what sort of
        | hardware would give you that behavior, and in a funny twist,
        | the more you lean on high-level language features, the more
        | likely you are to run into something that cannot be
        | synthesized into hardware gates. Perfectly valid HDL with no
        | valid solution!
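        | 
        | A small illustration of the inference idea, sketched in
        | nMigen (mentioned downthread) rather than Verilog, with
        | illustrative names: the imperative-looking If/Else below
        | never "executes"; the toolchain infers a multiplexer in
        | front of a register. Constructs with no such hardware
        | equivalent (unbounded loops, dynamic allocation) simply
        | cannot be synthesized.
        | 
        |   from nmigen import Elaboratable, Module, Signal
        | 
        |   class PickOne(Elaboratable):
        |       def __init__(self):
        |           self.sel = Signal()
        |           self.a = Signal(8)
        |           self.b = Signal(8)
        |           self.q = Signal(8)
        | 
        |       def elaborate(self, platform):
        |           m = Module()
        |           # Looks like a branch; synthesizes to a 2:1 mux
        |           # feeding the q register.
        |           with m.If(self.sel):
        |               m.d.sync += self.q.eq(self.a)
        |           with m.Else():
        |               m.d.sync += self.q.eq(self.b)
        |           return m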
        
       | anyfoo wrote:
        | I've been working with FPGAs for years (as a hobby; at work
        | I'm a mere "user" of them), and it has always baffled me how
        | poorly the imperative paradigm chosen for Verilog and VHDL
        | matches them.
        | 
        | I think the idea was to make them look "familiar" to
        | engineers, like C (Verilog) or Ada (VHDL). But FPGAs are
        | nothing like CPUs, and what you end up with instead is an
        | unfitting language where you use a whole lot of "common
        | constructs", knowing how they will be synthesized into
        | hardware. And worse: practically no good way to do
        | abstraction.
        | 
        | Functional languages are a much, much better match, because
        | that's what FPGA designs are: functions combined together.
        | This extends to higher-order functions as well, and it works
        | well with polymorphism!
       | 
       | So privately at least, for anything substantial I've since been
       | using Clash, which is essentially a Haskell subset translated to
       | Verilog or VHDL: https://clash-lang.org
       | 
       | The learning curve is steep, it definitely helped that I was
       | already proficient in Haskell. But then the code is so enormously
       | concise and modular, and I now have a small library of
       | abstractions that I can just reuse (for example, adding AXI4 to
       | my designs). It's a joy.
        
         | remexre wrote:
          | Bluespec is another Haskell-like language that compiles
          | to hardware, and I think it's a bit more mature. It also
          | has a second, non-Haskell-like syntax to avoid scaring
          | those used to Verilog.
        
         | ohazi wrote:
          | There are other options as well: nMigen (Python), Chisel
          | (Scala), SpinalHDL (Scala), Bluespec (Haskell).
          | 
          | It's worth taking a quick glance at all of them (maybe
          | making a toy or two) to see if any of them strikes your
          | fancy. Being able to do metaprogramming and
          | parameterization in an actual programming language,
          | rather than a not-quite-C-preprocessor pass, makes a lot
          | of things easier.
         | 
         | Just about anything feels "nicer" to me than VHDL or Verilog,
         | but I particularly like nMigen, and find SpinalHDL to be at
         | least somewhat readable.
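          | 
          | As a taste of the parameterization point, a hedged nMigen
          | sketch (names are illustrative): the width is an ordinary
          | Python constructor argument, so every derived signal
          | stays in sync automatically, with no generate blocks or
          | preprocessor macros.
          | 
          |   from nmigen import Elaboratable, Module, Signal
          | 
          |   class Accumulator(Elaboratable):
          |       """Adds its input to a running total each cycle."""
          | 
          |       def __init__(self, width):
          |           # Plain Python parameter; change it in one
          |           # place and both signals below follow.
          |           self.inp = Signal(width)
          |           self.total = Signal(width)
          | 
          |       def elaborate(self, platform):
          |           m = Module()
          |           m.d.sync += self.total.eq(self.total + self.inp)
          |           return m
          | 
          |   # Instances of any width from ordinary Python code:
          |   acc8, acc32 = Accumulator(8), Accumulator(32)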
        
           | anyfoo wrote:
           | Thanks, I will give them a try. Right now I've done enough
           | with Clash to feel very comfortable with it. Having already
           | done a lot of Haskell before helps, because I tend to know
           | what's available in general. You can use almost the entirety
           | of Haskell's functional toolbox, including Monads and Arrows.
           | But it certainly won't hurt checking all the other options
           | out.
           | 
            | I agree, metaprogramming and parameterization (both
            | enabled by polymorphism) are the key, and also what
            | enables me to just "plug in" a function that implements,
            | e.g., an AXI4-enabled register, or a whole burst
            | transfer.
            | 
            | I shudder at having to do that by hand every time in
            | Verilog/VHDL, tightly coupled to the rest of the logic.
        
         | progbits wrote:
          | Fully agreed. As an amateur just trying to get into the
          | FPGA world, my experience was very similar. I recently
          | discovered nMigen [1] (via a very nice video series
          | building a Motorola 6800 [2]), which approaches this by
          | using Python and operator overloading, something like the
          | TensorFlow dataflow model, and that feels more natural.
          | 
          | However, even nMigen, and doubly so Verilog/VHDL, feels
          | very 90s when it comes to engineering practices. The
          | tooling is often lacking (module management, automated
          | testing, CI and so on), cryptic acronyms are used
          | everywhere as if each source code byte cost a LUT, you
          | have to keep various things (e.g. signal bit widths) in
          | sync in multiple places, and many more things make C99
          | feel modern.
          | 
          | I'll have to look into Clash; it seems like a big step in
          | the right direction. Thanks for mentioning it.
         | 
         | [1] https://m-labs.hk/gateware/nmigen/ [2]
         | https://www.youtube.com/playlist?list=PLEeZWGE3PwbbjxV7_XnPS...
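          | 
          | To make the operator-overloading style concrete, a hedged
          | nMigen sketch of a single Life cell (illustrative names,
          | not from the article): ordinary Python expressions over
          | Signal objects build the netlist, and the bit widths live
          | in exactly one place.
          | 
          |   from nmigen import Elaboratable, Module, Signal
          | 
          |   class LifeCell(Elaboratable):
          |       def __init__(self):
          |           self.neighbors = Signal(8)  # 1 bit/neighbor
          |           self.alive = Signal()
          | 
          |       def elaborate(self, platform):
          |           m = Module()
          |           count = Signal(4)  # 0..8 live neighbors
          |           # Python's sum() over 1-bit slices builds an
          |           # adder tree in the netlist.
          |           m.d.comb += count.eq(
          |               sum(self.neighbors[i] for i in range(8)))
          |           # Next state: born on 3 neighbors, or stays
          |           # alive on 2.
          |           m.d.sync += self.alive.eq(
          |               (count == 3) | ((count == 2) & self.alive))
          |           return m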
        
       | ale42 wrote:
        | Funny, this looks like the practical work we had to do in
        | our CS master's class "Advanced Digital Systems Design"
        | some 20 years ago, on a now-archaic XC4013 FPGA...
        | (including the VGA part).
        
       ___________________________________________________________________
       (page generated 2021-06-05 23:00 UTC)