[HN Gopher] Accelerating Conway's Game of Life Using CUDA
___________________________________________________________________

Accelerating Conway's Game of Life Using CUDA

Author : brendanrayw
Score  : 46 points
Date   : 2021-06-03 16:24 UTC (1 day ago)

(HTM) web link (brendanrayw.medium.com)
(TXT) w3m dump (brendanrayw.medium.com)

| iseanstevens wrote:
| Need to re-read in detail, but seems like great learning
| material.
|
| I especially liked the line including "with less power comes
| greater simplicity" :)
|
| brendanrayw wrote:
| Thanks for reading :D
|
| brendanrayw wrote:
| After having attended a few CUDA workshops at NVIDIA's latest
| GTC, I was inspired to continue learning CUDA on my own. To do
| so, I decided to build John Conway's famous "Game of Life" and
| use CUDA to accelerate the program. I explore multiple different
| CUDA techniques, including managed memory, pinned memory,
| multiple streams, and asynchronous memory transfers.
|
| IdiocyInAction wrote:
| Nice. I took a CUDA course at uni where I built a neural
| network and a physics simulation. Optimizing them was very
| involved, but ultimately very cool; I learned a ton of stuff.
|
| I'd love to work with CUDA in practice, but there aren't that
| many jobs around.
|
| gtn42 wrote:
| Nice, thanks for sharing your experience!
|
| rrss wrote:
| this was a fun read, thanks for sharing.
|
| FYI, the transfers from pageable memory almost certainly do not
| go to the storage device in your system, unless you have high
| memory pressure. "pageable" (as a CUDA-ism) does mean that the
| buffer _may_ be paged out to storage, but more importantly it
| means that, even if the buffer is in RAM, the GPU cannot access
| it directly.
|
| so for pageable copies the flow is probably not:
| storage -> buffer in RAM -> device,
| but rather:
| original buffer in RAM (inaccessible to the device) ->
| intermediate buffer in RAM (accessible to the device) -> device.
|
| also, in several places you use the term 'stack' where I think
| it should just be 'RAM' / main memory.
|
| brendanrayw wrote:
| Thanks for reading! I appreciate the feedback and the info,
| I'll keep that in mind.
|
| joe_the_user wrote:
| Thanks for your effort! I really like the idea; it's similar to
| a more ambitious project I'm thinking of. And I do have
| questions:
|
| Is your board a giant two-dimensional array in memory?
|
| Are your threads/kernels reading from this array and then
| writing back to it?
|
| Do you do synchronization to make sure reads happen before the
| later writes?
|
| Do you do any verification that your transitions happen
| correctly?
|
| Do you have an estimate for the time spent in transfers from
| global GPU memory to each kernel, calculations in the kernel,
| and idling on synchronization (assuming you do it)?
|
| jacquesm wrote:
| Interesting. I'm assuming you are familiar with Hashlife? If
| not, check it out; it is absolutely amazing how fast it is, and
| as a study in memoization maybe it will inspire you on how you
| can get some more mileage out of your CUDA version.
|
| https://en.wikipedia.org/wiki/Hashlife
|
| buescher wrote:
| Agreed! The CUDA implementation is nice and I was going to say
| "now do Hashlife" myself. Here's the original paper:
| https://www.lri.fr/~filliatr/m1/gol/gosper-84.pdf
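
The author's comment above lists the techniques the article covers
(managed memory, pinned memory, multiple streams, asynchronous
transfers). As a rough illustration only, this is a minimal sketch of
what the pinned-memory / multi-stream / cudaMemcpyAsync combination
typically looks like; the kernel body, buffer names, and sizes are
placeholders and are not taken from the article.

    // Sketch: overlap host<->device transfers with kernel work using
    // pinned host memory, two streams, and cudaMemcpyAsync.
    // (cudaMallocManaged would instead give one pointer usable from
    // both host and device, at the cost of driver-managed migration.)
    #include <cuda_runtime.h>

    __global__ void step_kernel(const unsigned char* in,
                                unsigned char* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];  // placeholder for the real Life rule
    }

    int main() {
        const int n = 1 << 20;
        unsigned char *h_buf, *d_in, *d_out;

        // Pinned (page-locked) host memory lets cudaMemcpyAsync be
        // truly asynchronous with respect to the host.
        cudaMallocHost((void**)&h_buf, n);
        cudaMalloc((void**)&d_in, n);
        cudaMalloc((void**)&d_out, n);

        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        // Split the buffer in half and pipeline each half on its own
        // stream so copy-in, compute, and copy-out can overlap.
        // (A real Life step would also need the rows at the split
        // boundary; this only shows the transfer/overlap pattern.)
        int half = n / 2;
        for (int k = 0; k < 2; ++k) {
            cudaStream_t s = (k == 0) ? s0 : s1;
            int off = k * half;
            cudaMemcpyAsync(d_in + off, h_buf + off, half,
                            cudaMemcpyHostToDevice, s);
            step_kernel<<<(half + 255) / 256, 256, 0, s>>>(
                d_in + off, d_out + off, half);
            cudaMemcpyAsync(h_buf + off, d_out + off, half,
                            cudaMemcpyDeviceToHost, s);
        }
        cudaDeviceSynchronize();

        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
        cudaFree(d_in);
        cudaFree(d_out);
        cudaFreeHost(h_buf);
        return 0;
    }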
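
A small sketch of the distinction rrss draws between pageable and
pinned host memory, with made-up buffer names and sizes: the pageable
copy is staged through a driver-owned pinned buffer in RAM, while a
page-locked buffer can be read by the GPU's DMA engine directly.

    #include <cuda_runtime.h>
    #include <cstdlib>

    int main() {
        const size_t bytes = 64 << 20;
        unsigned char* d_buf;
        cudaMalloc((void**)&d_buf, bytes);

        // Pageable: the driver copies host_pageable -> internal pinned
        // staging buffer -> device, so the "extra hop" stays in RAM,
        // not on disk (unless the page was actually swapped out).
        unsigned char* host_pageable = (unsigned char*)malloc(bytes);
        cudaMemcpy(d_buf, host_pageable, bytes, cudaMemcpyHostToDevice);

        // Pinned: page-locked memory is DMA-accessible, so the copy
        // goes host_pinned -> device in one step (and can be async).
        unsigned char* host_pinned;
        cudaMallocHost((void**)&host_pinned, bytes);
        cudaMemcpy(d_buf, host_pinned, bytes, cudaMemcpyHostToDevice);

        cudaFreeHost(host_pinned);
        free(host_pageable);
        cudaFree(d_buf);
        return 0;
    }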
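
joe_the_user's questions about the board layout and read/write
ordering are usually answered with double buffering: the kernel reads
generation N from one array and writes generation N+1 to another, and
the kernel-launch boundary itself is the synchronization point. A
hedged sketch, not the article's code; grid size and names are
assumptions.

    #include <cuda_runtime.h>

    // One thread per cell: read the current generation, write the
    // next. Because reads and writes go to different buffers, no
    // thread can observe a partially updated board.
    __global__ void life_step(const unsigned char* curr,
                              unsigned char* next,
                              int width, int height) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int neighbors = 0;
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;
                int nx = (x + dx + width) % width;   // wrap (torus board)
                int ny = (y + dy + height) % height;
                neighbors += curr[ny * width + nx];
            }
        }
        unsigned char alive = curr[y * width + x];
        next[y * width + x] = (neighbors == 3) ||
                              (alive && neighbors == 2);
    }

    // Host side: launch one step per generation and swap buffers.
    // Launches on the same stream execute in order, so generation N's
    // writes complete before generation N+1's reads begin.
    void run(unsigned char* d_a, unsigned char* d_b,
             int w, int h, int steps) {
        dim3 block(16, 16);
        dim3 grid((w + block.x - 1) / block.x,
                  (h + block.y - 1) / block.y);
        for (int i = 0; i < steps; ++i) {
            life_step<<<grid, block>>>(d_a, d_b, w, h);
            unsigned char* tmp = d_a; d_a = d_b; d_b = tmp;  // swap roles
        }
        cudaDeviceSynchronize();
    }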
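
jacquesm and buescher point to Hashlife, whose speed comes from
hash-consing a quadtree of the board and memoizing the evolved centre
of every canonical node. Below is a rough host-side sketch of just the
hash-consing part, with all names invented for illustration; the
recursive "advance the centre" step described in Gosper's paper is
omitted.

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>

    struct Node {
        const Node *nw, *ne, *sw, *se;  // children (null at leaf level)
        int level;                      // level k spans 2^k x 2^k cells
        bool alive;                     // only meaningful for leaves
    };

    // Canonical store: identical subtrees are created once and shared,
    // so pointer equality doubles as structural equality and as a
    // memoization key for the evolved-centre cache Hashlife adds.
    struct Key {
        const Node *nw, *ne, *sw, *se;
        bool operator==(const Key& o) const {
            return nw == o.nw && ne == o.ne && sw == o.sw && se == o.se;
        }
    };
    struct KeyHash {
        size_t operator()(const Key& k) const {
            auto h = [](const void* p) {
                return reinterpret_cast<uintptr_t>(p);
            };
            return (h(k.nw) * 31 + h(k.ne)) * 31 + (h(k.sw) * 31 + h(k.se));
        }
    };

    std::unordered_map<Key, Node*, KeyHash> canon;

    Node* make_node(const Node* nw, const Node* ne,
                    const Node* sw, const Node* se) {
        Key k{nw, ne, sw, se};
        auto it = canon.find(k);
        if (it != canon.end()) return it->second;  // reuse shared subtree
        Node* n = new Node{nw, ne, sw, se, nw->level + 1, false};
        canon.emplace(k, n);
        return n;
    }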