[HN Gopher] Purely Functional Data Structures (1996) [pdf]
       ___________________________________________________________________
        
       Purely Functional Data Structures (1996) [pdf]
        
       Author : debanjan16
       Score  : 300 points
       Date   : 2023-05-30 11:43 UTC (11 hours ago)
        
 (HTM) web link (www.cs.cmu.edu)
 (TXT) w3m dump (www.cs.cmu.edu)
        
       | dang wrote:
       | Related. Others?
       | 
       |  _Purely Functional Data Structures in Elm - course lecture notes
       | (2015)_ - https://news.ycombinator.com/item?id=12145741 - July
       | 2016 (15 comments)
       | 
        |  _What's new in purely functional data structures since Okasaki?
       | (2010)_ - https://news.ycombinator.com/item?id=11056704 - Feb
       | 2016 (42 comments)
       | 
       |  _Purely Functional Data Structures (1996) [pdf]_ -
       | https://news.ycombinator.com/item?id=10486481 - Nov 2015 (13
       | comments)
       | 
       |  _Okasaki: Purely Functional Data Structures (1996) [pdf]_ -
       | https://news.ycombinator.com/item?id=8327838 - Sept 2014 (1
       | comment)
       | 
        |  _What's new in purely functional data structures since Okasaki?
       | (2010)_ - https://news.ycombinator.com/item?id=7081191 - Jan 2014
       | (17 comments)
       | 
       |  _Ten Years of Purely Functional Data Structures (2008)_ -
       | https://news.ycombinator.com/item?id=5701396 - May 2013 (24
       | comments)
       | 
        |  _What's new in purely functional data structures since
       | Okasaki?_ - https://news.ycombinator.com/item?id=1983461 - Dec
       | 2010 (2 comments)
       | 
        |  _What's new in purely functional data structures since Okasaki_
       | - https://news.ycombinator.com/item?id=1713594 - Sept 2010 (1
       | comment)
       | 
        |  _"Purely Functional Data Structures" by Chris Okasaki [pdf]_ -
       | https://news.ycombinator.com/item?id=1138979 - Feb 2010 (12
       | comments)
       | 
       |  _Teaching, Playing, and Programming: Ten Years of Purely
       | Functional Data Structures_ -
       | https://news.ycombinator.com/item?id=112270 - Feb 2008 (2
       | comments)
       | 
        |  _Chris Okasaki's PhD thesis on purely functional data
       | structures (pdf)_ - https://news.ycombinator.com/item?id=8221 -
       | April 2007 (1 comment)
        
       | okennedy wrote:
       | This is the canonical guide to reasoning about amortized
       | runtimes. The exponentially growing ArrayBuffer (O(1) amortized
       | append) is the classical data structure used to teach amortized
       | runtimes, but the O(1) amortized functional queue Okasaki
       | presents here gives a much better intuition for what amortized
       | runtimes are all about.
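
The queue okennedy describes can be sketched in a few lines. Below is a minimal Python version (not Okasaki's SML, and ignoring his lazy/real-time refinements): the queue is a pair of immutable cons lists, the rear list is reversed only when the front runs empty, and because each element is reversed at most once, every operation costs O(1) amortized.

```python
# Persistent FIFO queue as a pair (front, rear) of cons lists,
# where a cons list is either () or a pair (head, tail).
EMPTY = ((), ())

def snoc(queue, x):
    """Append to the back: cons onto the rear list, O(1)."""
    front, rear = queue
    return _check(front, (x, rear))

def head(queue):
    """Peek at the front element, O(1)."""
    front, _ = queue
    if front == ():
        raise IndexError("empty queue")
    return front[0]

def tail(queue):
    """Drop the front element, O(1) amortized."""
    front, rear = queue
    if front == ():
        raise IndexError("empty queue")
    return _check(front[1], rear)

def _check(front, rear):
    """Invariant: front is empty only if rear is empty. Reversing
    the rear is O(n), but each element is reversed at most once,
    so the amortized cost per operation stays O(1)."""
    if front == ():
        return (_reverse(rear), ())
    return (front, rear)

def _reverse(lst, acc=()):
    while lst != ():
        x, lst = lst
        acc = (x, acc)
    return acc
```

Because nothing is mutated, old versions of the queue remain valid after `snoc` and `tail`, which is exactly the persistence the thesis is about.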
        
       | odipar wrote:
        | Okasaki got me interested in confluently persistent data
        | structures, way back in the 2000s.
       | 
       | They seem magical! To be able to combine data from the past with
       | current data, efficiently!
       | 
        | They are almost always trees, with the exception of skip-lists,
        | with all operations O(log n).
       | 
       | After creating my own programming language Enchilada that is
       | based on immutable data structures, I started considering what I
       | deemed "next level":
       | 
       |  _Uniquely represented confluently persistent data structures_
       | 
       | Combined with a Merkle tree encoding of such uniquely represented
       | data structures (they are almost always trees), you can
       | efficiently and incrementally authenticate them. Think 'block
       | chain' on steroids, with incremental cryptographic hashes. Or
       | torrents, if you are into that kind of thing.
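
For flavor, here is a hedged Python sketch of the Merkle idea described above: a persistent binary search tree (plain and unbalanced, not uniquely represented, and not Enchilada's actual encoding) where every node caches a hash of its subtree. A persistent insert copies only the root-to-leaf path, so only O(depth) hashes are recomputed, and two trees built the same way can be compared or authenticated by root hash alone.

```python
import hashlib

def h(*parts):
    """Hash a sequence of byte strings with SHA-256."""
    m = hashlib.sha256()
    for p in parts:
        m.update(p)
    return m.digest()

class Node:
    """Immutable tree node carrying a cached Merkle hash."""
    __slots__ = ("value", "left", "right", "hash")

    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.hash = h(
            repr(value).encode(),
            left.hash if left else b"",
            right.hash if right else b"",
        )

def insert(node, value):
    """Persistent insert: copies only the path from root to leaf,
    sharing every untouched subtree with the old version."""
    if node is None:
        return Node(value)
    if value < node.value:
        return Node(node.value, insert(node.left, value), node.right)
    return Node(node.value, node.left, insert(node.right, value))
```

Structural sharing is what makes the incremental hashing cheap: inserting on the left reuses the entire right subtree, hash and all.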
        
       | weeksie wrote:
       | This book is near and dear to my heart. Back in the mists of time
       | (~05?) when I was learning Haskell I reimplemented several of
       | these in order to get my head around things. Lots of fond
       | memories.
        
       | 2-718-281-828 wrote:
       | is this being upvoted onto the homepage based on upvoters
       | actually understanding that this paper from 1996 is of
       | contemporary relevance and interest or more due to keywords like
       | "pure", "functional", "data" and "structure"?
        
         | paulgb wrote:
         | I can't speak for everyone, but I upvoted it from nostalgia,
         | having read the book version over a decade ago. I happened to
         | be thinking about ordering a copy for the office just
         | yesterday.
        
           | dllthomas wrote:
           | I have a copy on my desk. The bit about designing data
           | structures by analogy to number systems (and limiting carry
           | propagation) is really fun.
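
The number-system analogy in a nutshell: incrementing a binary counter follows the same carry pattern as inserting into a binomial heap or binary random-access list, where a carry corresponds to linking two equal-sized trees. A hedged Python sketch of just the counter half of the analogy:

```python
def increment(bits):
    """Increment a binary number stored LSB-first as a tuple of bits.
    In the data-structure analogy, turning a 1 into a 0 with a carry
    corresponds to linking two equal-sized trees. A single carry chain
    can be long, but long chains are rare enough that the amortized
    cost of increment is O(1)."""
    result = []
    carry = 1
    for b in bits:
        if carry == 0:
            result.append(b)
        elif b == 0:
            result.append(1)
            carry = 0
        else:
            result.append(0)  # 1 + carry -> 0, carry propagates left
    if carry:
        result.append(1)
    return tuple(result)
```

Okasaki's skew binary variant allows at most one non-standard digit, which bounds carry propagation to a single carry; that is the "limiting carry propagation" trick that turns amortized O(1) into worst-case O(1).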
        
         | bradrn wrote:
         | > is this being upvoted onto the homepage based on upvoters
         | actually understanding that this paper from 1996 is of
         | contemporary relevance and interest ...?
         | 
         | In my case, yes.
        
         | jstrieb wrote:
         | I've recently used a number of structures that I learned from
         | this book. Though I don't know if the text represents the state
         | of the art in purely functional data structures, it's a pretty
         | seminal work in the area.
        
           | UncleMeat wrote:
           | Nowhere near the state of the art. Lots of improvements since
           | this was published.
           | 
           | It is a good book for learning. It is a decent book for
           | reference. When you want to really fly you will want to reach
           | for more recent work.
        
             | hardlianotion wrote:
             | Examples of more recent work for us non-specialists?
        
               | nequo wrote:
               | See the commenter here:
               | 
               | https://news.ycombinator.com/item?id=36125110
        
               | hardlianotion wrote:
               | Missed it - thanks
        
         | bluepod4 wrote:
         | Your comment is giving early 2010s hipster "you probably never
         | heard of it" vibes.
        
         | ufo wrote:
          | It is still the go-to textbook for immutable data
          | structures. Worth the read.
        
         | turtleyacht wrote:
         | The book is on Amazon, but this submission had those keywords,
          |  _plus_ it's a PDF.
         | 
         | Of course it is of contemporary relevance; functional
         | programming (FP) is all the rage.
         | 
          | The tricky bit is questions like:
          | 
          | How does this jibe with existing JS functional constructs
          | like "fantasy land," for example?
         | 
          | How to "recruit" more folks to FP, or even a hybrid approach
          | of objects interacting functionally?
         | 
         | Game jams using more FP-like data structures? Or more HN
         | submissions like that.
         | 
         | The harder things to evaluate are a lot of other topics, news-
         | like but investigative and curious, or sites that are
         | essentially selling a service (versus teaching the mechanism
         | behind it).
         | 
         | For SaaS stuff, since HN is about startups, I have to let it
         | slide. But the _hacking_ piece is when one person accomplishes
         | something with persistent decomposition of sequential problems,
         | or does something clever using tools or ideas from a different
         | context.
        
           | solomatov wrote:
           | My understanding is that the book is based on this PhD
           | dissertation.
        
             | turtleyacht wrote:
             | Oh.. then, yes--this was unabashedly a (positively)
             | triggered reaction (fortunately or unfortunately).
        
         | schaefer wrote:
         | This comment reads as if there is a clear, contemporary
         | successor for learning pure functional data structures.
         | 
         | Is there? If so, please do share a reference.
        
       | eointierney wrote:
       | I remember encountering an early draft when he was still doing
        | his master's? Absolutely cracking paper, one for the ages. Thanks
       | Chris :)
        
       | lemper wrote:
        | this literature was my "serious" introduction to fp. implemented
        | some of the data structures in f#.
        
       | mklauber1 wrote:
       | Definitely read that headline as "Purely Fictional Data
       | Structures". My disappointment is immense.
        
         | 082349872349872 wrote:
          | Purely Fictional Data Structures include:
          | 
          |     To Queue a Mockingbird
          |     The Call-stack of the Wild
          |     Tesselation of the d'Urbervilles
          |     One Flew Over the Cuckoo Hash
          |     The Data of the Woosters
          |     Brideshead Re-visitor
          |     The Catcher in the Trie
          |     Les Miser-tables
          |     The Nat-elim of Monte Cristo
        
       | rntz wrote:
       | title typo: "Purely Functional Data Structure" -> "Purely
       | Functional Data Structures" (pluralization)
       | 
       | reads a bit weird otherwise - sounds like it's discussing a
       | particular purely functional data structure when it's actually a
       | survey of many (pretty much the canonical survey of them, in
       | fact).
        
         | debanjan16 wrote:
         | Thanks for pointing it out. I totally missed it. Sorry.
        
       | tmtvl wrote:
       | It's weird that there are so many claims in here that the data
       | structures and algorithms are perfectly performant yet there
       | isn't even one look at generated assembly or any acknowledgement
       | of the underlying system that is supposed to _run_ the code.
       | Proving things are Big O performant is neat, but at some point
       | the code has to hit hardware.
        
         | shepherdjerred wrote:
         | > at some point the code has to hit hardware
         | 
         | Yes, but that's not a concern of a computer scientist.
         | Implementation and execution of the algorithm are up to the
         | reader.
         | 
         | It's like complaining that engineers don't do enough novel
         | computer science research; of course they don't! It's not their
         | job.
        
           | Gravityloss wrote:
           | Which profession is expected to make progress on actual
           | software performance?
        
             | shepherdjerred wrote:
             | Computer scientists (and of course engineers) both care
             | about real-world performance, but some computer scientists
             | just care about the theory.
        
           | fjeifisjf wrote:
           | That's a very naive view of computer science. That is the
           | attitude of people who have given up and decided that
           | computers are too big for science now.
        
             | shepherdjerred wrote:
             | You're right, there are plenty of papers focusing on real-
             | world performance. I chose not to capture the nuance
             | because I wasn't sure how to express it succinctly.
        
               | scott_s wrote:
               | How I see it: computer science as an academic field is
               | roughly split between _theory_ and _systems_. The theory
               | folks tend to evaluate progress through proofs. The
               | systems folks tend to evaluate progress through
               | experiments.
        
             | rgbgraph wrote:
              | I've vouched for this because it brings up an
              | interesting point (albeit one that has proven
              | provocative and been expressed before). Also, many of
              | your comments are dead.
             | 
             | Big O/algorithmic complexity is interesting in that it
             | tends to abstract away the architecture of the underlying
             | processor. A copy is a copy is a copy. Arithmetic is
             | arithmetic is arithmetic. Regardless of what
             | instructions/processing units the underlying processor has.
             | Regardless of data access. Regardless of parallelization.
             | Regardless of memory complexity. All we use are brutish
             | units of "work" -- despite not all "workers" being equal.
             | 
             | It reminds me a bit of scientific management: treating your
             | "workers" as interchangeable units that carry out what has
             | been decided to be the most efficient movements and
             | processes -- completely disregarding the individual
             | characteristics of the worker. For a humanist example,
             | consider the individual quirks of the physiology (not only
             | in size, length, and composition of the skeletal system via
             | the muscles, bones, tendons and so on that would
             | necessitate there being specific movements and workloads
             | that best suit them -- but also in psychology; the brain as
             | its own work-producing part that is uniquely suited and
             | most efficient for specific workloads; rather than an all-
             | encompassing abstract average "best workstyle" that makes
             | no note of these characteristics, but simply decides to use
             | external metrics like "time to pick up a box using X, Y, or
             | Z form.")
             | 
             | The same parallel can be drawn to computers: different
             | processors and collections of hardware (what is essentially
             | the "body and brain") have different quirks and different
             | workloads they perform best at. For example, GPUs are much
             | more useful in the case of vectorized/parallel workloads
              | where the operations are relatively simple (e.g. matrix
              | arithmetic). While you can run an iterative matrix
              | multiplication algorithm on a CPU in O(n^3) time, your
              | data access costs will be significant. Instead, running
              | a parallel algorithm on a GPU, with RAM very close by,
              | you can achieve O(log^2 n) parallel depth.
             | 
             | This is where CS research really shines: not running away
             | from the realities of physical systems, but embracing them.
        
         | jove_ wrote:
         | Also, while I haven't read any further yet, based on the table
         | on page 3, their big O performance is pretty bad.
        
         | gnulinux wrote:
         | > It's weird
         | 
          | It's not weird; this is standard in academic computer
          | science. It would be weird to do _otherwise_. In a
          | theoretical dissertation/paper like this you can't just
          | randomly bring up compiled assembly; it's _completely_ and
          | _utterly_ off topic. It's not any more on topic than
          | bringing up whether the code was run by an interpreter, or
          | on the JVM, or transpiled to Haskell, or run on a GPU, etc.
        
           | nerdponx wrote:
           | I can't speak for other fields, but in statistics and machine
           | learning, it's not uncommon to see at least some practical
           | experiments or simulations alongside theoretical results, or
           | at least some discussion of practical considerations.
           | 
           | There are definitely plenty of pure theory papers as well,
           | and pure theory is still a contribution. However one would at
           | least hope to see subsequent work that does focus on the
           | practical aspect.
        
           | amelius wrote:
           | "Computer science is no more about computers than astronomy
           | is about telescopes."
           | 
           | -- Edsger W. Dijkstra
           | 
           | However, I do believe that astronomers put references to the
           | actual instruments they used in their publications.
        
         | wk_end wrote:
          | Other commenters have addressed the key point - it's _not_
          | weird. CLRS doesn't look at generated assembly either.
         | 
         | One other thing I want to add, though, is that this book was
         | published in 1996. The CPU landscape at the time was far more
         | diverse, both in terms of ISA (which assembly?) and also in
         | terms of capabilities. Even on the x86 line, plenty of people
         | were still using 486s, which weren't even superscalar, and thus
         | had pretty strikingly different performance characteristics
         | than, say, a Pentium Pro (a "modern", OoO CPU). Tying your
         | algorithmic analysis to a particular piece of hardware would be
         | pretty useless; providing a satisfactory look at the
         | performance on so many different (micro-)architectures could
         | risk overwhelming the content.
         | 
         | Moreover, at the time, performance was, maybe, a little more
         | straightforward, making your concerns perhaps not quite as
         | important. Memory was slower than the CPU but not the way it is
         | now. Caches were much smaller and less useful. Branch
         | prediction was more primitive. CPUs had relatively weak
         | reordering capabilities, if any at all, and simpler pipelines.
         | There were few dedicated SIMD instructions. This was a world
         | where linked lists still often made sense, because the hardware
         | penalties you paid for them were smaller. In other words: in
         | 1996 the on-paper performance wasn't at quite as big a risk of
         | diverging quite so wildly from the real-world performance, so
         | keeping things simple and focusing on the on-paper performance
         | was less objectionable.
        
         | Tainnor wrote:
         | Yeah, Big-O notation is only part of the picture and yes,
         | "galactic algorithms" that are theoretically efficient but only
         | for astronomically big inputs do exist, but in _many_ cases
          | (especially if your algorithm isn't doing anything
         | particularly deep or clever) linear is faster than quadratic
         | which is faster than exponential on real world input, too.
         | 
         | Mathematics is the science of modeling and abstraction and that
         | abstraction exists in order to make the process of knowledge
         | gathering easier. It works very well for the sciences, so I
         | would say the method has been proven. Of course, if the models
         | turn out to be completely inappropriate, that would be
         | different, but so far, the simple machine models used
         | (implicitly or explicitly) in the analysis of algorithms seem
         | to be rather reasonable.
         | 
          | The alternative that you suggest, trying to benchmark
          | _concrete implementations_ of algorithms, has so many
          | confounders that it would be very hard to execute properly.
          | I'm sure people are doing this too, but the formal Big O
          | proof has a different quality to it because it does not
          | depend on those particulars.
        
           | agumonkey wrote:
            | I'd even say that maths is what appears when you stop
            | being willing to pay for concretion.
        
       | malkia wrote:
       | For C++ check this one out - https://github.com/arximboldi/immer
       | and this talk from the author -
       | https://www.youtube.com/watch?v=_oBx_NbLghY (CppCon 2018: Juan
       | Pedro Bolivar Puente "The Most Valuable Values")
        
       | tpoacher wrote:
       | I have a personal pet peeve about the misuse of terminology when
       | dealing with such names, for which the only solution is to go
       | read the original reference to figure out what they meant by it.
       | 
       | E.g., in this case, to describe a data structure as "purely
       | functional" makes zero sense to me intuitively at first. You need
       | to go read the thesis and realise they're referring to data
       | structures implemented as algebraic data types, which in the
       | context of a purely functional language can themselves be thought
       | of as functions, and can therefore be described as 'functional'
       | ...
       | 
       | But unless you do that, the first thought is going to be "huh?
       | can arrays be further split into imperative vs functional?" "Does
       | he mean immutable?" "Can I use functional arrays in c?" "Are they
       | faster/safer than normal arrays?".
       | 
       | By contrast, I think "Purely Functional Data-Structure Types"
       | would have been a far more intuitive term ... but I suppose the
       | author may have felt that clarifying further wasn't punchy
       | enough, and could have made the thesis more wordy...
        
         | Joker_vD wrote:
          | A data structure _is_ a type, so "data-structure type" is a
          | pleonasm. Please don't misuse terminology :)
        
       | aeonik wrote:
       | I'm reading this book right now. It's really great so far!
       | 
       | I've been working a lot with Trees in Clojure, and have been
       | hitting serious limitations of my understanding.
       | 
       | I also found this YouTube video from a Clojure conference that
       | reviews some different strategies for tree traversal in Clojure:
       | https://youtu.be/YgvJqWiyMRY
       | 
        | I thought that learning a Functional Lisp would make it really
        | easy to traverse trees, since that is what the language is
        | doing to actually execute its code.
       | 
       | Turns out all the stuff I want to do is simply hard.
       | 
       | Functional data structures are really awesome though, it just
       | seems to take a bit of up front investment.
        
         | jmkr wrote:
         | This has been a topic I've wanted to get into for a few years
         | now, specifically because of Clojure! So if you have any
         | additional recommendations I'd appreciate it.
         | 
         | I really enjoyed Friedman's book `Scheme and the Art of
         | Programming` because it filled in some pieces missing from "The
         | Little Schemer" (and "Seasoned Schemer"). Building stuff like
         | `kons`, `map` and all that `letrec` stuff.
         | 
         | But the big difference between Scheme and Clojure is that in
         | Scheme, while it's "functional by concept," you get to escape
         | with `(set! ...)` whenever you want to have a more traditional
         | ds/algo (like say doing the n queens problem).
         | 
         | In Clojure you can kind of do that by either escaping to Java,
         | using protocols (with atoms), or using transients, but I often
         | feel like there's a "more functional way to do stuff that
         | hasn't been taught to me."
         | 
         | I've opened up either the Okasaki dissertation or book or both,
         | but I've always had trouble reading it, and then sticking with
         | it. And some stuff like looking at Rosetta code and saying "to
         | reverse a linked list in a lisp is easy... because it's a cons
         | cell" seems like cheating. Almost like showing up to an
         | interview, saying your "linked list" is implemented in an array
         | structure and then calling `reverse()` on it.
         | 
         | Will watch that talk from 2014, must not have seen it before.
         | 
          | I guess, conceptually, day-to-day things in Clojure do feel
         | pretty natural, even easier, and I think I have a decent
         | understanding of it. But then when I look at leetcode type
         | problems, or something more involved, it takes a lot of mental
         | effort to translate to it. Especially things like `big O` gets
         | thrown away in my mental model. I get it, persistent data
         | structures and all that, but there's still a mystery there.
        
         | smabie wrote:
          | I feel like they are... not so awesome: they are grossly
         | inefficient due to all the pointer chasing and are pretty much
         | guaranteed to be slower than the alternative since they trash
         | your cache.
        
           | aeonik wrote:
           | Most of the problems I'm trying to solve require those
           | pointers anyway.
           | 
            | Tracking diffs and versions of data over time: version
            | control systems and RDBMSs just don't cut it.
           | 
           | Bitemporal databases are interesting to me as well.
        
           | agentultra wrote:
           | You _feel_ this way but if you do the math and benchmarks you
           | will be surprised.
           | 
           | You might not be running GHC's RTS on a resource-constrained
           | target platform but you can generate the C code that could
           | run on that platform from Haskell, leveraging the guarantees
           | you get with using Haskell's advanced type system.
           | 
           |  _Update_ : point is that GHC can optimize code and some of
           | those optimizations can avoid pointer chasing! But like most
            | optimizing compilers, sometimes GHC can miss them, and you
           | have to do a bit of work to tell GHC where it can avoid
           | chasing unnecessary pointers (or where it should/should-not
           | inline, give it some help to know where it can do fusion,
           | etc).
        
           | jfoutz wrote:
            | I wish I could remember the exact book; I think it's
            | _Writing Solid Code_. There was a long example about the
            | Excel evaluation engine, back in the day when it was
            | shipped on CDs and had to be perfect. The approach was:
            | use one big, dumb, slow, but understandable and correct
            | implementation, and in parallel use the lightning-fast
            | super-whiz-bang new implementation. Since both shared the
            | same interface, they could be swapped out, or both run at
            | the same time.
           | 
           | I think there is real value in starting with the pure
           | functional versions, then swapping out when needed. One
           | problem that seems fairly common in large codebases is using
            | an object as a hash key. But as the code grows, that
            | object sort of gets passed around and eventually somebody
            | updates it without rehashing. That's a tough bug to find.
            | They are for me, anyway.
           | 
           | This is one of those rare cases where you can actually make
           | it faster later without trashing the overall design.
           | 
           | I'd encourage starting with the pure functional version
           | first, every time. I'd go further and say, leave some
           | opportunity to flip back to the pure functional version in
           | dev builds for isolating heisenbugs.
           | 
            | Blah blah, grain of salt, free advice, your mileage may
            | vary, games have different requirements than webpages,
            | everything is contextual.
           | 
            | This is one rare case where having the capability of
            | swapping back and forth is always worth it. Just use it,
            | and think hard before giving it up; that's a really good
            | default.
        
             | 1MachineElf wrote:
             | I wanted to verify the book reference, and I think you
             | could be correct.
             | 
             | There is a website for it: https://writingsolidcode.com/
             | 
             | The Amazon page it links to includes a few Index pages in
              | the preview. I saw these under _E_:
             | 
             | Excel number formatting code 163
             | 
             | Excel's dialog handling code 113
        
           | wtetzner wrote:
           | They're only grossly inefficient if you wouldn't otherwise
           | need to frequently make copies of large mutable data
           | structures. It all depends on what you're trying to do.
        
           | duped wrote:
           | There are entire fields of research dedicated to studying
           | cache oblivious persistent data structures, and they're
           | almost always trees.
           | 
           | One of the key points of immutable datastructures is that
           | they're easy to reason about given that the invariants are
           | upheld. Designing an efficient memory representation that
           | upholds those invariants is an entirely separate problem
           | domain. Not every pointer dereference is slow, and not every
           | tree is represented with pointers.
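
As a concrete instance of "not every tree is represented with pointers": a complete binary tree can be laid out breadth-first in a flat array, with index arithmetic in place of child pointers (a generic textbook layout, not tied to any particular paper; the names here are illustrative).

```python
# Implicit (pointer-free) complete binary tree: children and parent
# are found by index arithmetic, so traversal is cache-friendly
# array indexing rather than pointer chasing.

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

# The tree        1
#               2   3
#              4 5 6 7
# stored breadth-first in a flat list:
tree = [1, 2, 3, 4, 5, 6, 7]
```

This is the same layout binary heaps use; the whole structure lives in one contiguous allocation, which is why the "persistent structures always trash your cache" intuition doesn't hold in general.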
        
         | mtrimpe wrote:
         | I agree that it's a great book; but I wouldn't recommend it as
         | an entry point into understanding purely functional data
         | structures.
         | 
         | Okasaki fleshes out a couple of examples and explains his
         | thought processes and heuristics for developing such data
         | structures quite well; but they're very much still notes on the
         | early-stage exploration of the concept.
         | 
          | From a historical perspective it's still a fascinating read
          | though, and I definitely recommend it if you want to read
          | up on the origins of immutable data structures.
        
       | paddw wrote:
       | It would be cool if someone could translate this into Typescript
       | or the like, I think it would make it a lot more readable.
        
         | haskellandchill wrote:
         | TypeScript is much less readable than Haskell and OCaml, but
         | you can easily find translations to TypeScript such as
         | https://github.com/skeate/lambdata.
        
           | nequo wrote:
           | > TypeScript is much less readable than Haskell and OCaml
           | 
           | That's like saying that Norwegian is much less readable than
           | Italian. It is in the eye of the beholder. They can both
           | express the same concepts but which one is more readable
           | depends on which one you already know.
        
             | substation13 wrote:
             | Using this to push Brainfuck at work
        
             | consilient wrote:
             | > They can both express the same concepts
             | 
              | Only in the Turing-tarpit sense. Out of the box, they have
             | very different capabilities. For example:
             | 
             | Higher-kinded types: easy in Haskell, hard in OCaml, mostly
             | impossible in Typescript.
             | 
              | First-class modules: OCaml has them; Typescript can
              | sort of simulate them with extraordinarily unsafe
              | prototype mangling stuff that you should never ever
              | use; impossible in Haskell.
              | 
              | Open variants: easy in OCaml and Typescript, hard in
              | Haskell.
        
       | gexahaha wrote:
       | There's also a nice addendum on cstheory.stackexchange, "What's
       | new in purely functional data structures since Okasaki?" -
       | https://cstheory.stackexchange.com/questions/1539/whats-new-...
        
         | dang wrote:
         | A few discussions:
         | 
          |  _What's new in purely functional data structures since
         | Okasaki? (2010)_ -
         | https://news.ycombinator.com/item?id=11056704 - Feb 2016 (42
         | comments)
         | 
          |  _What's new in purely functional data structures since
         | Okasaki? (2010)_ - https://news.ycombinator.com/item?id=7081191
         | - Jan 2014 (17 comments)
         | 
         |  _What 's new in purely functional data structures since
         | Okasaki?_ - https://news.ycombinator.com/item?id=1983461 - Dec
         | 2010 (2 comments)
         | 
         |  _What 's new in purely functional data structures since
         | Okasaki_ - https://news.ycombinator.com/item?id=1713594 - Sept
         | 2010 (1 comment)
        
         | anfelor wrote:
         | I would also recommend Koen Claessen's simplified finger trees
         | (https://dl.acm.org/doi/abs/10.1145/3406088.3409026) and Zip
         | trees (https://arxiv.org/pdf/1806.06726.pdf) for purely
         | functional skip lists.
        
         | mefarza123 wrote:
          | Since Okasaki's work there have been several advancements and
          | new developments in the field (source: MirrorThink.ai):
         | 
         | 1. PaC-trees: Supporting Parallel and Compressed Purely-
         | Functional Collections - 2022: This paper introduces PaC-trees,
         | a purely functional data structure that supports parallel and
         | compressed collections. PaC-trees are designed to be efficient
         | in terms of space and time complexity while maintaining the
         | benefits of functional data structures.
         | 
         | 2. Proving tree algorithms for succinct data structures - 2019:
         | This paper discusses the development of tree algorithms for
         | succinct data structures, which are compact representations of
         | data that support efficient query and update operations.
         | 
          | There is some work in other fields too:
         | 
         | 1. CyBy2: a strongly typed, purely functional framework for
         | chemical data management - 2019-12-30: This paper presents
         | CyBy2, a purely functional framework for managing chemical
         | data. The framework is designed to be strongly typed and
         | enforce referential transparency through the type system.
         | 
         | 2. chemf: A purely functional chemistry toolkit - 2012-12-20:
         | This paper introduces chemf, a purely functional chemistry
         | toolkit that provides a set of algorithms and data structures
         | for working with chemical data in a functional programming
         | context.
        
           | bafe wrote:
           | [dead]
        
         | smasher164 wrote:
         | RRB trees in particular allow you to benefit from some of the
         | cache efficiency of regular vectors, while supporting
         | operations like immutable update. They are used in languages
         | like Clojure and Scala.
        
       | solomatov wrote:
       | There's a book which looks like it's based on this dissertation,
       | which probably is a better source to read if you are interested
       | in this topic: https://www.amazon.com/Purely-Functional-Data-
       | Structures-Oka...
        
       | chalcolithic wrote:
        | My dream language is Rebol with immutable data structures only.
        
       | 1MachineElf wrote:
       | What are software bugs that can be avoided by choosing data
       | structures like these?
       | 
       | I'm making a broad, high-level presentation about immutability in
       | technology. At my company we have folks who have heard of it in
       | the context of ransomware-resilient backups, others who have
       | heard of it in the context of infrastructure as code, and very
       | few who have heard of it in terms of data structures (distributed
       | and non-distributed). My goal is to showcase the concept in
       | various contexts so that people can better understand its role as
       | a key design choice in technology.
       | 
       | Personally I have no experience working on software that utilizes
       | these, so if others here do, I would appreciate your input on how
       | these make your software more reliable.
       | 
       | The emphasis on software reliability and bugs-avoided is because
       | the audience works under the company's risk-management division.
        
         | chrisfinazzo wrote:
         | Disclaimer: Not a programmer - I do not have a CS degree. One
         | of my minors as an undergrad was MIS and while I took a
         | database programming course as a senior, my knowledge is
         | limited.
         | 
         | The high-level explanation I recall reading years ago somewhere
         | (Joel on Software, I think??) was that a) Functional programs
         | have no side effects, leading to the notion that b) They can,
         | without much effort, be adapted for parallel tasks.
         | 
         | The example here was MapReduce, which was the original guts of
         | the Google Search algorithm.
         | 
          | E.g., things are fast because we can send your query to many,
         | many computers to find an answer, and chances are good that
         | that machine is (relatively) close to you, which means low wait
         | times to get your answer.
        
         | thethimble wrote:
          | Purely functional data structures are very common in purely
          | functional languages like Haskell, but they are also used in
          | non-functional languages via libraries like immutable.js.
         | 
         | At a high level, immutability forces you to be extremely
         | deliberate about state changes in your application. This
         | improves reasoning/understanding, reduces bugs, and eases
         | debugging.
         | 
          | An example of immutability that you might be familiar with
          | is React props/state: you don't modify your state directly.
          | This makes reasoning about state much simpler.
        
         | otto-edgar wrote:
         | I think the point is that these data structures can be the
         | foundation of a pure-functional style of programming.
         | 
         | You wouldn't drop these into an ecosystem where the programmers
         | are used to dealing with mutable structures.
         | 
          | Now, whether using a purely functional language correlates with
          | writing fewer bugs is a separate discussion, not sure what the
          | modern consensus is.
        
           | mrkeen wrote:
           | > not sure what the modern consensus is.
           | 
           | Too good a phrase to pass up!
           | 
           | Modern consensus is done by constructing a replicated log -
           | an append only data-structure shared across multiple workers.
           | It's what you use Paxos for, and it's what Raft streamlines.
        
         | red_admiral wrote:
         | _Immutable_ data structures (not 100% the same thing, but a lot
         | of overlap) avoid all kinds of concurrency problems, because
         | you can safely pass them around and you'll never get a data
         | race. You don't even need any locking (just to make things
         | complicated, _lock-free_ data structures are another closely
         | related but not identical concept).
         | 
         | Once you're running a distributed system, this kind of stuff
         | comes into its own.
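The lock-free sharing described above can be sketched in Python (a minimal illustration with invented names, not from any library in the thread): several threads read the same immutable tuple with no synchronization on the data itself.

```python
import threading

# A hypothetical read-only snapshot shared across threads. Because the
# tuple is immutable, no thread can modify it, so the data needs no lock.
snapshot = (1, 2, 3, 4, 5)

results = []
results_lock = threading.Lock()  # guards only the results list, not the data

def worker():
    # Safe: every thread sees the same, unchanging snapshot.
    total = sum(snapshot)
    with results_lock:
        results.append(total)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # every worker computed the same total: [15, 15, 15, 15]
```

If the snapshot were a mutable list instead, each read in `sum` could race with a concurrent write, and the code would need a lock around the data as well.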
        
           | taeric wrote:
            | Careful with this wording. They avoid shared-memory
            | mutations, but they don't necessarily eliminate data races;
            | rather, they change their shape, since by definition every
            | edit now creates stale data.
            | 
            | In distributed software at large, this is a distraction,
            | since most message passing from one distributed piece to
            | another was already doing a copy across mediums, such that
            | sent data was already immutable from the sender's
            | perspective. (Granted, the preparation step can have you
            | step on your own feet.)
        
         | Chris_Newton wrote:
         | Purely functional data structures are a means, not an end in
         | themselves.
         | 
         | Programming with immutable values and the more declarative
         | style they support does design out at least one infamous source
         | of bugs: shared mutable state. That has become increasingly
         | important in recent years, for example because modern chips
         | support ever increasing numbers of concurrent threads in
         | hardware and because a lot of the software we write is now part
         | of distributed systems.
         | 
         | That programming style has other advantages in terms of
         | readability and reasoning about code. If you can be confident
         | that a particular name in the code always refers to the same
         | value once it's been defined, tracing the flow of data through
         | a function or through an entire system often becomes much
         | easier. Having more understandable code naturally tends to
         | improve reliability, among other advantages.
         | 
         | If we adopt that programming style then we still need to
         | represent collections of values and different relationships
         | within our data, but we can't always use "traditional" data
         | structures to do that any more because many of them rely on
         | mutability, which we have excluded. So we need other ways to
         | represent structured data that _are_ compatible with
         | immutability and declarative style, and that is what purely
         | functional data structures give us.
        
         | okennedy wrote:
         | As the peer comment points out, functional data structures
         | force you to be deliberate about where/when state changes take
         | place. In particular, they force you to be cognizant of which
         | _version_ of a data structure a particular piece of code is
         | referencing.
         | 
         | Immutable structures can help significantly in reducing race
         | conditions in parallel code, as it is impossible for one thread
         | to modify part of a data structure while another thread is
         | accessing it. Each thread sees a consistent 'version' of the
         | data structure until the code explicitly replaces it with a new
         | version.
         | 
         | Immutability can also help with memory optimizations: A node of
         | one data structure can be safely re-used in another data
         | structure. For example, if you need to retain older versions of
          | a tree, you can share common nodes (this is how Git encodes
          | version changes).
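The node-sharing idea can be sketched with a small persistent binary search tree in Python (an illustrative path-copying sketch, not any particular library): inserting returns a new root, and only the nodes on the path to the new leaf are copied, so untouched subtrees are shared with the old version.

```python
from dataclasses import dataclass
from typing import Optional

# Path copying in a persistent binary search tree: insert() copies only
# the nodes on the path to the new key; everything else is shared
# between the old and new versions.

@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present: reuse the whole subtree

v1 = insert(insert(insert(None, 5), 3), 8)
v2 = insert(v1, 9)           # new version with 9 added under the 8 node

print(v2.left is v1.left)    # True: the untouched left subtree is shared
print(v1.right.right)        # None: the old version is unchanged
```

Retaining `v1` costs nothing extra beyond the shared nodes, which is the same trick Git uses when successive commits share unchanged tree and blob objects.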
        
       | alex_lav wrote:
       | I would love to be educated. I've seen claims about the merit and
       | value of functional programming throughout my (nowadays
       | relatively long) programming career. In practice I've never once
       | seen those values or merits come to fruition - just the same
       | cycle all software goes through.
       | 
       | My very direct experience recently has been Scala + cats resulted
       | in the same buggy nonperformant software it was meant to prevent.
       | I understand that bad programmers produce bad programs,
       | regardless of language, but I feel pretty strongly that good
       | tools prevent some amount of the typical "bad" that makes bad
       | programs bad (ignoring the obviously maliciously bad examples).
       | So I don't really understand, and would like to understand, if,
       | how and when pure FP (and I suppose FP in general) actually
       | improve quality of code/life outside of toy examples.
        
         | mrkeen wrote:
         | The bottom line is that pure FP means that the same input to a
         | function gives you the same output.
         | 
         | When you debug, you just give the program the same input which
         | was problematic and you get to reproduce the error.
         | 
         | Persistent data structures make it less wildly inefficient to
         | do so.
        
       | user2342 wrote:
       | Thanks. I have the printed book, but a PDF is much more
       | comfortable!
        
       | EddieEngineers wrote:
       | _Why_ would we want to use purely functional data structures?
       | When do the pros of functional data structures outweigh the
       | additional complexity? Are there scenarios when a project would
       | want to pivot from a regular data structure to a purely
       | functional one?
        
         | yawaramin wrote:
         | Do you use git? The git commit graph is a purely functional
         | data structure.
        
         | toolslive wrote:
          | They have some nice properties that you might want to benefit
          | from. For example, they're persistent: every previous state of
          | the data structure is still in there... aka snapshots!
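A minimal Python sketch of this (hypothetical cons cells built from tuples, not a real library) shows how every earlier version remains a valid snapshot:

```python
# A persistent singly linked list built from immutable cons cells.
# Every intermediate version remains a valid, unchanged snapshot.

def cons(head, tail):
    return (head, tail)

def to_list(cell):
    out = []
    while cell is not None:
        head, cell = cell
        out.append(head)
    return out

empty = None
v1 = cons(1, empty)       # [1]
v2 = cons(2, v1)          # [2, 1]
v3 = cons(3, v2)          # [3, 2, 1]

# "Snapshots": earlier versions are still intact and share structure.
print(to_list(v1))        # [1]
print(to_list(v3))        # [3, 2, 1]
print(v3[1] is v2)        # True: v3's tail is literally v2
```

No copying happened when `v3` was built, yet `v1` and `v2` are still fully usable: that is persistence.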
        
         | haroldl wrote:
         | The most common point is that they're safe to share between
         | threads making parallel algorithms easier to invent,
         | understand, and implement correctly.
         | 
         | You can also safely re-use sub-structures without performing a
         | deep copy. For example, if you want to keep a sub-tree around
         | for later you can do that in O(1) time because it's safe to
         | keep a reference to it. If it is a mutable tree you don't know
         | what's going to happen to it so you need to do a deep copy of
         | the entire sub-tree you're holding on to. This can save a lot
         | on memory allocation and copying depending on your use case.
        
         | wtetzner wrote:
         | There's a tradeoff between the complexity of the implementation
         | of a data structure, and the use of one. While the complexity
         | of implementing purely functional data structures is often
         | (except maybe in the case of singly linked lists) higher than
         | their mutable counterparts, actually using them in a program is
         | simpler and less error prone.
         | 
          | There are obviously other trade-offs as well, like performance
          | and memory usage.
        
         | nimih wrote:
         | As is explained in the abstract, purely functional data
         | structures are most useful when you want to avoid assignment,
          | either due to the restrictions of your programming environment
          | (e.g. you're programming in Haskell), or because that is a
         | property you're interested in for other reasons (e.g. you want
         | to ensure immutability due to concurrency concerns, or exploit
         | persistence to improve performance).
        
         | rg111 wrote:
          | Deep learning applications are one area.
          | 
          | Traditionally, such code is written in an OO style.
          | 
          | But after I learned JAX/Flax, a light turned on inside my head
          | and I now write functional deep learning code as much as I can.
          | All my side projects and new code are purely functional in
          | JAX/Flax. PyTorch has a functional API known as functorch, and
          | I have used it in one project.
         | 
          | Where lots and lots of data in 3-, 4-, and 5-dimensional
          | tensors exist, and you need to run lots of transformations on
          | them, and then you need to multiply a huge number of them
          | thousands of times each second, functional code makes much
          | more sense and gives immense sanity and peace of mind.
         | 
         | Those of you writing Deep Learning code, learn functional
         | programming principles (immutable data, pure functions, leaving
         | no side effect, etc.), and apply them to DL via functorch or
         | JAX.
         | 
         | Your life will never be the same.
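As an illustration of that style (plain Python standing in for JAX; the function names and the toy gradient are invented for the example), a training step is a pure function from parameters to new parameters, never mutating the old ones:

```python
# Functional training-step sketch: parameters go in, new parameters come
# out. No hidden state is mutated, so earlier parameter "versions" remain
# available, and the step is trivially reproducible from its inputs.

def loss_grad(w, x, y):
    # Gradient of the squared error (w*x - y)**2 with respect to w.
    return 2 * (w * x - y) * x

def train_step(w, x, y, lr=0.1):
    # Pure: returns a new parameter value; the input w is untouched.
    return w - lr * loss_grad(w, x, y)

w0 = 0.0
w1 = train_step(w0, x=1.0, y=2.0)
w2 = train_step(w1, x=1.0, y=2.0)

print(w0)  # 0.0 -- the original parameters still exist, unchanged
print(w1)  # 0.4
```

In JAX the same shape appears at scale: `params` are passed into a jitted step function and a new `params` pytree comes back, which is what makes transformations like `grad`, `vmap`, and `pmap` composable.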
        
           | debanjan16 wrote:
           | Do JAX and functorch have the same level of builtin
           | functionalities (operations) as the original Pytorch library?
           | 
           | Where to learn about it other than the documentation?
        
             | rg111 wrote:
             | JAX is very barebones and will require you to write much
             | more code for the same task than you write in PyTorch.
             | 
              | Functorch is still new, and honestly, there is little to
              | learn if you already know JAX. There are some talks from
              | Meta, and then there are always the docs.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-05-30 23:00 UTC)