[HN Gopher] The Stack Monoid
___________________________________________________________________
The Stack Monoid
Author : raphlinus
Score : 121 points
Date : 2020-09-05 17:06 UTC (1 day ago)
(HTM) web link (raphlinus.github.io)
(TXT) w3m dump (raphlinus.github.io)
| raphlinus wrote:
| A note: I just updated the post with a bunch of related work that
| people have sent me since I published the first draft. It also
| includes a bit more intuition behind the monoid.
| agumonkey wrote:
| best thread of the year
| andrewflnr wrote:
| The elements of this monoid kind of look like type signatures for
| Forth words. I was already playing with the idea of a sort of
| concurrent Forth-like where each word sort of gets its own stack
| context. I'm really interested in neat models that provide
| partial evaluation, so I will be giving this some thought.
| nilknarf wrote:
| > Automatically generating monoids such as the stack monoid from
| their corresponding simple sequential programs seems like an
| extremely promising area for research.
|
| I was interested in this problem a while back, and the papers I
| remembered being useful were:
|
| "Automatic Inversion Generates Divide-and-Conquer Parallel
| Programs": https://core.ac.uk/download/pdf/192663016.pdf
|
| which is based on "The Third Homomorphism Theorem":
| http://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/th...
|
| The motivating example is whether you can generate a mergesort
| (where the monoid operation combines two sorted lists) given an
| implementation of an insertion sort (where the sequential program
| inserts elements one by one into a sorted list).
|
| I remember being disappointed by the answer after reading these
| papers, but in the citation graph there are a bunch of relatively
| recent program synthesis papers. Maybe there's something better
| now.
| boulos wrote:
| Raph, I think some of the order-independent transparency folks
| (Bavoil, Wyman, Lefohn, Salvi, etc.)
| have had to do this bounded k-stack plus overflow thing. Might be
| a good "hmm, how did they do it" for comparison.
| raphlinus wrote:
| Good call, that rings a bell and I'll look into it. A bounded
| k-stack with overflow seems like a really good fit for the
| capabilities provided by the GPU.
| boulos wrote:
| Hmm. In the end, maybe transparency is a bad example, because
| people immediately jumped to approximating it instead. Wyman's
| review paper, though, is a good read regardless:
|
| http://cwyman.org/papers/hpg16_oitContinuum.pdf
|
| The GPU / accelerator BVH traversals are perhaps more applicable
| (e.g., https://www.embree.org/papers/2019-HPG-ShortStack.pdf),
| but they also have a different "I can restart my computation"
| property that, say, JSON parsing wouldn't have (though maybe if
| you're willing to do a lot of restarts once you parse the inner
| portions and push them onto a heap...).
|
| Anyway, cool problem!
| raphlinus wrote:
| The bell it rang for me was [1], which has come up in
| discussions with Patrick Walton about maintaining a sort order
| for coarse rasterization in 2D vector rendering. Taking another
| look at that, it seems like they propose _both_ bounded
| k-stacks and linked lists as solutions for recording all the
| fragments to be combined for transparency blending, but not the
| hybrid of the two. I wouldn't be surprised if that's been
| considered.
|
| The problem is slightly different, though, because the major
| focus for transparency is concurrent writing from potentially
| many different shaders, while in the stack case each thread can
| build its data structure without any concurrency or need for
| atomics; once an aggregate is built, it is "published" (see my
| prefix sum blog for details on that) and treated as immutable.
|
| These kinds of concurrent approaches do become important for
| later stages in the JSON parsing process, like building
| hashmaps for keys in a dictionary.
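[Editor's sketch: to make the aggregates discussed here concrete, below is a minimal sequential Python illustration of the stack monoid. The names `push`, `POP`, `combine`, and `reduce_ops` are my own, not identifiers from the post: an element summarizes any run of stack operations as a pair (pops, pushes), and combining cancels the right side's pops against the left side's pushes.]

```python
# Illustrative sketch of the stack monoid (hypothetical names).
# An element (pops, pushes) means: "pop `pops` values off whatever
# stack came before, then push the `pushes` list on top of it."

IDENTITY = (0, [])   # no pops, no pushes

def push(x):
    """Monoid element for a single push."""
    return (0, [x])

POP = (1, [])        # monoid element for a single pop

def combine(left, right):
    """Associative combine: the right side's pops cancel against
    the left side's pushes; any excess pops fall through."""
    l_pops, l_push = left
    r_pops, r_push = right
    if r_pops <= len(l_push):
        # All of right's pops are absorbed by left's pushes.
        return (l_pops, l_push[:len(l_push) - r_pops] + r_push)
    # Right pops more than left pushed; the excess falls through.
    return (l_pops + r_pops - len(l_push), r_push)

def reduce_ops(elems):
    """Fold a sequence of elements left to right."""
    acc = IDENTITY
    for e in elems:
        acc = combine(acc, e)
    return acc
```

For example, the sequence push('a'), push('b'), pop reduces to (0, ['a']), and splitting the sequence at any point and combining the halves gives the same result; that associativity is what opens the door to a parallel scan.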
| I'm pretty sure the whole problem is tractable and have
| thoughts on how to solve it, but I deliberately kept the scope
| of this blog post limited, as I figured it was challenging
| enough. But it's very cool to reach people who appreciate the
| ideas, and if you find anything else that's relevant, I'd love
| to hear from you.
|
| [1]: https://on-demand.gputechconf.com/gtc/2014/presentations/S43...
| wrnr wrote:
| How do you get started with learning GPU programming? I mean
| that in a pedagogical way, like what is the best educational
| way forward. Build stupid stuff in shader toys?
| Jhsto wrote:
| In a sense, GPU programming is similar to cellular automata.
| Each cell is an invocation on the GPU, and you can assume that
| you may have an unlimited number of these invocations. You can
| freeze cells indefinitely or temporarily, but you cannot group
| them. You can assume you know the identifier number of each
| cell, and you can assume you know the number of cells for each
| program. The hard part is then to generalize algorithms in
| which the only control structures are based on freezing,
| identifying, and counting cells. The cells can communicate
| information to neighboring cells, and to any arbitrary cell,
| but the latter is not good for performance.
|
| Most universities don't really teach parallel algorithm
| development in any way, despite the core count increasing in
| both the CPU and GPU worlds.
|
| Current languages like GLSL and CUDA are very abstracted in the
| sense that they don't really tell you why some approach is
| better or worse than another. To me, if you can create a
| general cellular-automata-like solution, then such logic can be
| converted and is generally fine performance-wise for
| SIMD-oriented programming.
| raphlinus wrote:
| It's a good question, and it's been suggested I write a GPU
| compute 101.
| I did do a talk at Jane Street earlier in the year called "A
| taste of GPU compute" which has some relevant resources (a
| Google search on that term will give you a number of links).
|
| I think there are probably a number of on-ramps. One easy way
| to get started is Shadertoy, which if nothing else should give
| familiarity with GLSL syntax and intuition for performance in
| the simplest case: each "thread" computes something (a pixel
| value) independently of the others. You'll quickly run into
| limitations, though, as it can't really do any of the more
| advanced compute stuff. I think as WebGPU becomes real, an
| analogous tool that unlocks compute kernels (aka compute
| shaders) could be very powerful for pedagogy.
|
| I think most people who do compute on GPU use CUDA, as it's
| the only really practical toolchain for it. It has a large
| number of high-quality open source libraries, which tend to be
| well documented and have good research papers behind them. You
| can start by using these libraries, then dig deeper to see how
| they're implemented.
|
| As I've been going on about, I believe this space is ripe for
| major growth in the next decade or so. As a rough guide, if you
| can make your algorithm fit the GPU compute model nicely,
| you'll get about 10x the compute per dollar, which is
| effectively the same as compute per watt. Why leave such
| performance on the table? The answer is that programming GPUs
| is just too hard. In certain areas, including machine learning,
| an investment of research has partially fixed that problem. But
| in others there is fruit at medium height, ripe for the
| picking. And in some other areas you'll need a tall ladder, of
| which I believe a solid understanding of monoids is but one
| rung.
| wrnr wrote:
| Thanks, it was your work that kindled my interest in
| programming the GPU. The fact that most vector graphics are
| done by CPU libraries made me realize it's still a very
| underdeveloped area.
| The hard thing about it is also what makes it interesting:
| it's not just a different programming language but a different
| architecture with different runtime characteristics. Sort of
| the closest you can get to owning a quantum computer. Anyway,
| I'll learn some math and CUDA, and buy the book if you find
| the time :)
| bitdizzy wrote:
| This language of matching brackets is also called the Dyck
| language, and its syntactic monoid is the bicyclic semigroup,
| which is an interesting example of an inverse semigroup. Edward
| Kmett gave a talk on programming with these sorts of algebraic
| structures that some might find illuminating:
| https://www.youtube.com/watch?v=HGi5AxmQUwU
|
| Inverse semigroups are used in the geometry of interaction to
| give a parallelizable interpretation of the lambda calculus. I
| don't have a great introductory reference. Here are some people
| who have implemented a system based on the theory:
| https://www.cambridge.org/core/journals/mathematical-structu...
|
| Sorry for the citation dump; I'm short on time and I thought
| they would be interesting to those who found this submission
| interesting.
|
| References
|
| https://en.wikipedia.org/wiki/Dyck_language
|
| https://en.wikipedia.org/wiki/Bicyclic_semigroup
| chombier wrote:
| About the stack/monoid structure, there was a nice talk by
| Conal Elliott recently describing how it can be used to
| implement a compiler [0]. He also mentions parallel
| computations at some point.
|
| [0] https://www.youtube.com/watch?v=wvQbpS6wBa0
| agumonkey wrote:
| I may be extrapolating too much, but I think Steele's team
| working on Fortress was also doing monoid-like thinking to
| parallelize heavily. He gave a talk on how to delinearize
| problems into partially solved subproblems to be reconciled
| later.
| acoye wrote:
| I had this crazy idea a while back of using ray tracing APIs
| to piggyback a database. Testing for a row could be done by
| shooting rays in parallel at a structure mapped to the data.
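[Editor's sketch: returning to bitdizzy's Dyck-language comment above, the bicyclic semigroup fits in a few lines of Python. This is my own illustration, not code from the linked references: every bracket string reduces to a pair (unmatched closes, unmatched opens), and concatenation cancels the left side's opens against the right side's closes. It is the scalar shadow of the stack monoid, counting pushes instead of recording them.]

```python
# Illustrative sketch of the bicyclic semigroup, the syntactic
# monoid of the Dyck language. Any bracket string reduces to a
# pair (closes, opens): some number of unmatched ')' followed by
# some number of unmatched '('.

IDENTITY = (0, 0)

def combine(left, right):
    """Associative combine: at the seam, the left side's
    unmatched opens match up with the right side's unmatched
    closes and cancel."""
    a, b = left    # a unmatched ')', then b unmatched '('
    c, d = right
    m = min(b, c)  # pairs that match at the seam
    return (a + c - m, b + d - m)

def reduce_brackets(s):
    """Reduce a bracket string; (0, 0) means it is balanced."""
    acc = IDENTITY
    for ch in s:
        acc = combine(acc, (0, 1) if ch == '(' else (1, 0))
    return acc
```

For instance, "(()" reduces to (0, 1) (one unmatched open) and ")(" to (1, 1), and because combine is associative, the reduction can be computed over chunks of the input in parallel and the chunk summaries combined afterwards.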
| [deleted]
___________________________________________________________________
(page generated 2020-09-06 23:00 UTC)