[HN Gopher] Out of the Tar Pit (2006) [pdf] ___________________________________________________________________ Out of the Tar Pit (2006) [pdf] Author : n0w Score : 83 points Date : 2023-02-27 08:01 UTC (1 days ago) (HTM) web link (curtclifton.net) (TXT) w3m dump (curtclifton.net) | cloogshicer wrote: | I think the crux of this paper is section 7.2.2: | | > There is one final practical problem that we want to consider | -- even though we believe it is fairly rare in most application | domains. In section 7.1.1 we argued that immutable, derived data | would correspond to accidental state and could be omitted | (because the logic of the system could always be used to derive | the data on-demand). Whilst this is true, there are occasionally | situations where the ideal world approach (of having no | accidental state, and using on-demand derivation) does not give | rise to the most natural modelling of the problem. One possible | situation of this kind is for _derived data which is dependent | upon both a whole series of user inputs over time, and its own | previous values_. In such cases it can be advantageous to | maintain the accidental state even in the ideal world. An example | of this would be the derived data representing the position state | of a computer-controlled opponent in an interactive game -- it is | at all times derivable by a function of both all prior user | movements and the initial starting positions, but this is not the | way it is most naturally expressed. | | Emphasis is mine. | | I think that this type of derived data, which I put in italics | above, is quite common - contrary to what the authors of the | paper argue. Any UI code or game-like system, like simulations, | will have this kind of data. And the paper does not have a good | answer for it. I honestly think that nobody has an answer for it, | and it's why most of our UIs suck. | | I would love to see something that makes handling this type of | derived data easy. 
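The kind of derived data cloogshicer highlights can be made concrete. An AI opponent's position is, in principle, a pure function (a fold) over the full input history, but it is more naturally kept as accidental state updated in place. A minimal Python sketch, where the `step` rule is invented purely for illustration:

```python
from functools import reduce

# One pure step: new position from the previous position and one user input.
# This toy rule just nudges the opponent one cell toward the player's move.
def step(pos, user_move):
    return (pos[0] + (user_move[0] > pos[0]) - (user_move[0] < pos[0]),
            pos[1] + (user_move[1] > pos[1]) - (user_move[1] < pos[1]))

START = (0, 0)
moves = [(3, 1), (3, 2), (5, 2)]

# "Ideal world": derive the position on demand from all prior inputs.
derived = reduce(step, moves, START)

# "Accidental state": keep only the latest position, updated per input.
state = START
for m in moves:
    state = step(state, m)

assert derived == state  # same answer; the second form is the natural one
```

Both forms agree, but the derived version must replay every input ever seen, which is why the paper concedes that keeping the accidental state can be the more natural model here.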
| feoren wrote: | My impression from reading this has always been "Almost there! | Keep going." If you keep pulling on these threads, I claim that | you reach some inevitable conclusions. These ideas are a superset | of all the other advice they give (and most other advice you | hear, too): | | 1. Separate identity from data. These are _completely different | things_. The number 7 is not mutable, even though my age is. Keep | going! The phrase "It was a dark and stormy night" is not | mutable, even though the opening line of your book is. Keep | going! The statement "there are seventeen cars parked on level 3 | and four spots available" is _not mutable_ , even though the | status of the parking garage is. Identity is immutable, and | statements (data) are also immutable. Only the assignment of a | statement to an identity is mutable. | | 2. Think about the operations that make sense on your data. Sure, | you have "map", "filter", "reduce" that are mathematically pure. | What about "offset" or "rotate"? Those have identity, they are | associative, individually commutative (but not with each other!). | Is there a distributive property that applies there? Okay, what | about "buy" and "sell" operating on commodities? Are those | mathematical operations? Do they have an identity element? Are | they associative and commutative? "Buy 2000 lbs of chicken" -- is | that equivalent to "Buy 1 ton of chicken"? What are its domain | and range? Not "chicken", nor even "warehouses" -- you're not | teleporting the chicken as soon as you commit that record. More | like "contract", or even "contract request". What can you do with | contracts? Is there an "identity contract"? "Buy 0 chicken"? Zero | is a useful number! Is "Buy 0 chicken" the same as "Buy 0 beef"? | Explore these questions. Find the math around your domain. | Functional purity is all good, but it's wasted if your domain is | just "Chicken : Meat : Commodity", "Beef : Meat : Commodity", | "Pine : Lumber : Commodity". 
Wrong. Don't think about nouns. | Think about sensible pure operations in your domain. Successive | abstraction layers should be _restricting_ what 's possible to do | with your data, by defining higher and higher-level operations. | If your abstraction layers aren't restricting what's possible to | do, they're an inner platform and you don't need them. | | 3. Don't make big decisions upfront. That's how you add | accidental complexity. Don't make any decisions you don't need | to, and prefer solutions that allow you to defer or lighten | decisions. If you follow this thread, that means you're using a | relational model. They absolutely got that right (section 8). | Otherwise you're making decisions about ownership that you don't | need to be making. Domain Driven Design has it _wrong_ here with | aggregate roots. What 's an aggregate? Which concept goes inside | of which aggregate? You do not need to make that decision. The | authors of TFA get it right, and then wrong again, when they try | to apply a relational model to shitty noun-based domain ideas | like "Give Employee objects a reference to their Department, vs. | Give Department objects a set (or array) of references to their | Employees vs. Both of the above". No. They're not following their | own advice. Can an employee exist without the concept of | "department"? Yes. Can a department exist without the concept of | employees? Probably. Therefore you _must_ encode the relationship | separately from both. Your hands are tied. If concept A can exist | independently of concept B, you _must not_ draw an arrow from A | to B. The answer is not "both", it's "neither". An employee's | assignment within a department is its own independent concept, | that knows about "employee" and "department" both. And now it's | obvious you can give this concept its own tenure and add a new | record whenever they change departments. 
You've found append-only | event sourcing without needing to start there -- it came from | first principles (in this case). | | 4. Operate on the broadest scope you can. Manipulate sets of | things, not individual things. This is half of functional | programming's usefulness right here. This is supported by using a | relational model. Operate on sequences of things, not individual | things: there's reactive programming. How else can you broaden | the scope of what you're operating on? | | 5. Don't just do something, stand there! That is: don't _do_ , | _plan_. Instead of immediately looping, just use "map". Instead | of immediately hitting the database, write a query plan. This | doesn't mean offload your thinking to GraphQL -- that's just | kicking the can down the road. See point #2. This is the other | half of functional programming's usefulness. Do only as much as | you need to make (or append to) a plan, and then stop. Someone | else will execute it later -- maybe! You don't care. | | There's probably more, but there are more fundamental ideas than | "use functional programming" or "use event sourcing" or "use | SOLID" -- first principles that actually lead to almost all the | other good advice you get. This paper kinda almost gets there a | couple times, then walks back again. Keep going. Keep pulling on | those threads. I suspect that the more you do this, the more your | code will change for the better. But you have to be willing to | keep going -- don't half-ass it. If you only go part way and then | stop, the benefits won't be apparent yet. | dagss wrote: | I feel event sourcing is a real world pragmatic approach to | declarative programming that this paper advocates. | | For state changes you add events to the database to describe | something that happened. Any question you may need an answer for | / business decision you want to make can be answered by querying | the events. 
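dagss's claim above can be sketched in a few lines of Python (the event shapes are illustrative, not from any real system): state changes are recorded as appended events, and any question is answered by folding over them rather than reading a mutable cell.

```python
# Append-only log of events describing what happened (shapes are made up).
events = []

def record(event):
    events.append(event)

record({"type": "deposited", "account": "a1", "amount": 100})
record({"type": "withdrew",  "account": "a1", "amount": 30})
record({"type": "deposited", "account": "a2", "amount": 50})

# Any question is answered by querying the events, not by reading a
# mutable "current balance" field.
def balance(account):
    return sum(e["amount"] if e["type"] == "deposited" else -e["amount"]
               for e in events if e["account"] == account)

assert balance("a1") == 70
assert balance("a2") == 50
```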
| | The problem at the moment is that while event sourcing is | excellent at reducing accidental complexity surrounding | implementing business rules, there is little standard / commonly | used tooling around it and you end up with lots of accidental | complexity on that end. | | An example would be a database not designed to be a CRUD store | but to store events and manage read models, and manage | computation of projections etc -- while being suitable for OLTP | workloads. At a minimum, it would need very strong support for using | any kind of construct in materialized views (since in a sense the | entire business logic is written as a "materialized view" when doing | event sourcing). | feoren wrote: | Event Sourcing is a nice card to have in your hand, but it | should not be a goal in itself. A yard is 3 feet. The atomic | mass of Hydrogen is 1.008 and its symbol is "H". These will | _never_ change. If someone came to me and said "your | 'Chemical' table is not event-sourced. We're doing | event-sourcing here. You need to change it." I would tell them to get | lost. Why the hell would you have an "An Element's | Mass was Modified" event stored in your database? Unless you're | developing for CERN or NREL or something, just don't. | | On the other hand, having a bank account table with a single | field for someone's money is clearly not enough. You absolutely | should be tracking every transaction that has changed that | account's value. Do you need to track every other possible | change to that account? Like, whether they want paper or | electronic mail? No, probably not. | | "Event sourcing" is a way to refactor a domain model -- take a | statement and break it into a sequence of statements that "add | up" to the original statement. | | "Add up" is key here. When you break "AccountBalance" into | "Transaction", it's clear how to "add up" transactions to | recreate the original account balance. But that's not your | goal, necessarily! 
The reason why this tends to make better | domain models is exactly because you have to think about | "adding up" your domain models, and what that means. "Adding | up" is an associative, probably-commutative, binary operation | with identity. Note that that means your domain MUST have a | "zero transaction". ALL of your events that you event source | need a "zero event". If you cannot come up with the "zero" of | an event, then you should not be breaking into events! The | whole point is to be able to define the monoid over it, which | requires identity. | | So instead of taking event sourcing as an end goal, make your | goal this: think about the operations that make sense on your | domain, like accumulating accounts. What else can you | accumulate? Can you add Employees together? Not really -- you | can _group_ them, into departments and events and meetings. Is | that grouping associative and commutative? Sure -- it 's just | Set Union. Is there anything about Employees you can add | together? Well, their salaries. In fact for data analysis, | employee salaries are an important cost that you probably want | to throw in a data cube. Define a Monoid over Employee | salaries. | | What other operations make sense on your data? Close, open, | start, end, group, buy, sell, move, rotate, add, multiply, | concatenate, join, reverse, inverse, combine, merge, undo, | redo, fill, empty, saturate, fix, break, import, export, | report, validate. Are they associative, commutative, | distributive, invertible? Do they have identity? Event sourcing | is such a tiny part of exploring this world. And it's worth | exploring. | dagss wrote: | I'm not arguing that event sourcing should be done for its | own sake, so I don't really want to disagree with you; but | that said your post doesn't perfectly resonate with me | either. | | When you write a typical backend system, the desired function | of the system is to interact with the external world. 
Without | I/O the system may as well not exist. | | Input is a desire from someone that something be done or | recording that something happened. Such input changes the | data recorded, or appends what the system should know / has | seen. This is an "event". | | All input can be framed as being an event. But "an element's | mass was modified" is not an event ... it doesn't describe | someone or something giving input to our system. | | The algebraic view on things you take seems to be treating | the system at a different level than what I think about as | event sourcing. | | Neither "an element's mass was modified" nor "sell" nor | "transaction" that you mention is a realistic event. An event | is "User U clicked a button in the web console to Sell share | S at time T". Implementing the effects of that event -- | computing a specific read model resulting from appending the | event to the stored set of events -- may well be best done by | some algebra like you suggest, but that seems like another | topic. | | You seem to talk about models for computing and transforming | state. I talk about I/O vs data storage. | feoren wrote: | > All input can be framed as being an event. | | Sure, it _could_ be, but is it _useful_ to do that? If I | stand up and shout "The price of a Banana is $4 per | bushel!", you could record my voice and upload it as a raw | wave file. That's the rawest "input event" you can come up | with. Or you could write down "some random dude said that | bananas cost $4 around 4:30 pm and I'm not sure whether I | believe him or not". That's not the "raw input", it's been | transcribed and modified and annotated. Yet it's almost | certainly more useful to your system, and it's kinda like | event sourcing. Kinda. | | The problem with worrying about whether something is | "input" or "output" or "internal" is that you can just move | the dotted line anywhere around your system to change | those. 
If you break a monolith into independent reusable | building blocks, those building blocks are going to have a | completely different idea of what counts as input and | output. But who cares? You're not changing any fundamental | truth about how the domain works. Your domain model should | really be independent of worrying about what's "input" and | "output". Those lines move all the time. Instead think | about what operations make sense to do with your data, and | then think about the mathematical properties of those | operations. | | > But "an element's mass was modified" is not an event ... | it doesn't describe someone or something giving input to | our system. | | Sure it does. Someone gave you the input that a particular | element has a particular mass. How is that not input? How | else did you get that data? | | > The algebraic view on things you take seems to be | treating the system at a different level than what I think | about as event sourcing. | | This is my exact point. You _should_ think about event | sourcing this way, because that 's the only reason it's | useful: it's accidentally a source of important "domain | algebra" that you otherwise might miss. But there's _lots_ | of other important "domain algebra" that you are still | missing, and they don't necessarily look like event | sourcing. | | > An event is "User U clicked a button in the web console | to Sell share S at time T". | | But surely that's not what you're _storing_ in your system! | That would be an extreme coupling between the concept of | "selling shares" and "clicking a button". Those are | _completely unrelated_ ideas! Why would you want to tightly | couple them!? If _that 's_ what you think event sourcing | is, sorry to be blunt, but you have very badly | misunderstood it. | dagss wrote: | Fair points. But how do I get from a "domain algebra" to | practical implementation with a popular database? 
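One concrete reading of dagss's question, sketched with Python's sqlite3 (the schema is illustrative): the "append" half of a domain algebra becomes INSERTs into an events table, and the derived read models become SELECTs over it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (account TEXT, amount INTEGER)")

# Appending an event is just an INSERT; nothing is ever UPDATEd in place.
for account, amount in [("a1", 100), ("a1", -30), ("a2", 50)]:
    con.execute("INSERT INTO events VALUES (?, ?)", (account, amount))

# A read model is a query over the append-only log.
balances = dict(con.execute(
    "SELECT account, SUM(amount) FROM events GROUP BY account"))
assert balances == {"a1": 70, "a2": 50}
```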
| | Event sourcing can be translated to adding rows to | tables, or adding documents to collections. Focusing so | much on "append" isn't only because of what kind of | events you would model, but because you store data in | databases by, well, storing it... appending it... | | If event sourcing is only useful as a source of an | important "domain algebra": How does a domain algebra | translate to practical use of a database system for your | application? How can I focus on the "operations I want to | do on my data", when the tools I am given are pretty much | INSERT/UPDATE/SELECT, or GET/PUT, or some variety along | these lines? | feoren wrote: | Bear with me, we're going on a bit of a walk. | | An abstraction layer says: here are some operations you | can do, operating on some models. Presumably those models | are a useful or convenient perspective on some underlying | data, and those operations are important or useful as | well. At this level, we can define something like "order | a pizza", "renew my driver's license", or "show me a book | I might like to read". Note the abstraction is driven by | the needs of the users/consumers/callers of the | abstraction, _not_ its implementations. This sounds | obvious, but it's extremely often done the wrong way | around -- "Foo : IFoo" shouldn't be muscle memory, and it | should mean "I can't currently think of any other | implementation of IFoo right now, although that might | change", and NOT mean "this interface is the header of | this class, which we have to do or we get yelled at". | | The implementation of an abstraction layer breaks down an | operation in one domain into operations in another | (presumably "lower level") domain. It's possible that | "renew my driver's license" can be reasonably written | directly in a few SQL statements. Okay, that's no big | deal. Most likely you have a few intermediate layers, or | an ORM, or whatever. 
So the implementation of the "DMV | API" (or whatever is defining that operation) is where | you break it down into INSERT/UPDATE/SELECT or whatever | other lower-level tools you have. | | None of this changes when you start thinking about your | higher-level operations algebraically. All that really | means is that you consider the operations and their | inputs/outputs, and you start asking questions like: are | these idempotent? Associative? Commutative? Do they have | identity? You can ask those questions at the abstraction | level. They are part of the definition of the operation. | It's up to the implementation to make sure those | properties are respected. | | Look for ways to change the operations you offer to be as | "mathematical" as you can. The more mathematical they | are, the more you'll serendipitously come up with new | and interesting and useful ways to use them; the more | reusable they'll be. | | "Renew my driver's license" is idempotent; "order a | pizza" is not. Prefer idempotency. Can we make "order a | pizza" idempotent? Sure. The client generates (or | requests) a unique ID for the order. We have "update an | order", which may actually be a family of operations. We | have "finalize an order", which places it. Finalizing the | same order twice does nothing -- it's idempotent. | | User story: a bunch of guys are sitting around and want | to order from your pizza place, but it's always annoying | to pass the phone around; everyone wants to look at the | menu at once. We want "distributed ordering", so a party | can all contribute what they want to the same order. They | want to see what everyone else is doing in real-time. I | want breadsticks, but there's only 5 of us. If anyone has | ordered breadsticks, we're all good. If we have an "add | breadsticks" operation and poor concurrency, we end up | with 5 breadsticks in our cart; someone has to notice | that and fix it. Not great. | | So what's a better operation than "add breadsticks"? 
How | about "make sure there are enough breadsticks"? If three | people all say "make sure there are enough breadsticks!" | that's basically the same as one person saying it. That's | an important domain operation. If all you're doing is | thinking "event sourcing", then "add breadsticks" looks | _more_ like a domain event than "make sure there are | enough breadsticks", but the latter is actually easier to | work with and leads to a better experience. You can't | make the jump from "add breadsticks" to "make sure there | are enough breadsticks" just by thinking about event | sourcing -- you get there by thinking about math. | | What happens when someone removes a pizza from the cart | at the same time as someone else is adding pepperoni to | it? This is the "Google Docs" problem. We need to think | about associativity and commutativity to really solve | these problems, not to mention random vs. sequential | identity. Can you undo these operations? If I say "extra | sauce", and someone else says "no, light sauce", we might | have a conflict to resolve. But if I then undo my "set | extra sauce" action -- what happens? It should clearly | result in "light sauce". That's the obviously correct | answer, so we can work back from that to figure out how | the "set sauce amount" operation should work. "User A | requests heavy sauce on pizza AF73" is idempotent and | reversible. "User B requests light sauce on pizza AF73". | "User A revokes their request for heavy sauce on pizza | AF73". Okay, great, we're golden. "User C removes pizza | AF73 from their order." "User D requests pepperoni on | pizza AF73". "User C undoes their operation to remove | pizza AF73 from their order." Great: pizza AF73 has | pepperoni on it, despite the fact that it had been | removed when pepperoni was added. No problem here. | | How do you handle statements like "User A revokes their | request for heavy sauce on pizza AF73?" 
Event sourcing | would say that this is a distinct event that you have to | INSERT in your append-only log. But why not just DELETE | the initial request? That works just fine. In any case, | the low-level steps are the easy part. | | We're not too far from event sourcing here, but that | emerged from analyzing the problem and trying to make it | as "mathematically pure" as we could. We didn't start | with event sourcing, and we saw another example (add | breadsticks) where the "event sourced" version was worse. | Event sourcing is a trick, but not the goal. The goal is | "domain algebra". | | On the back end, you'll be doing reporting on these | numbers. You want a really flexible reporting system? | There was a buzzword a few years ago, "data cube". It was | a buzzword because nobody could define what it meant, but | after thinking about it for a while, I decided it could | have a useful definition: A data cube is a pairing of a | set of categorical data (pizza sizes, customer types, | order times, whatever) and value data (costs, amounts, | prep times, ratings). Each "value data" must have a | monoid defined over it, which means some binary "add" (or | "combine") method with identity. Any time you have a | monoid defined, you get a (distributed!) "aggregate" | method for free. That's what "roll up" and "drill down" | are, in report-speak: projecting your categorical data | and aggregating all your value data using the defined | monoid. The same kind of analysis applies here, even | though event sourcing has nothing to do with any of this. | | By the way: the identities of "combine" over rating and | prep time are _not_ simply 0! Think about it harder than | that. How do you combine user ratings? Most likely your | value type will need to be able to sensibly represent | "0/0" -- that's a valid and useful value sometimes! 
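feoren's "0/0" remark can be made concrete: the monoid for averaging ratings combines (sum, count) pairs, and its identity is (0, 0) -- "no ratings yet" -- not the number 0. A minimal sketch:

```python
# Value type for the "rating" column of a data cube: (sum, count).
# The monoid identity is (0, 0) -- "no ratings yet", i.e. 0/0.
IDENTITY = (0, 0)

def combine(a, b):
    return (a[0] + b[0], a[1] + b[1])

def average(r):
    return r[0] / r[1] if r[1] else None  # 0/0: no rating, not a zero rating

# Roll-up: aggregate per-pizza ratings into one cell of the cube.
cells = [(9, 2), (5, 1), IDENTITY]   # two rated pizzas, one with no ratings
total = IDENTITY
for c in cells:
    total = combine(total, c)

assert total == (14, 3)
assert average(total) == 14 / 3
assert average(IDENTITY) is None     # distinguishable from an average of 0
```

The point of carrying (sum, count) rather than a precomputed average is exactly that `combine` stays associative, so the roll-up can be computed in any grouping or distributed across machines.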
| | We didn't get particularly "mathy" operations with our | pizza example, partly because it's a very "end user" | domain and not a reusable mid-level domain like unit | conversions or a physics engine. It's even more important | for those domains. What's a Point minus another Point? | Not a Point! Are "offset" and "rotate" associative? | Commutative? Distributive? Think about these things if | you want a reusable physics engine. | | This is not scripting. This is not even programming. It's | software engineering. | Verdex wrote: | In the past, I have been unimpressed by this paper. Perhaps | someone can shed some historical context... | | But from my perspective what happens is that the paper defines | complexity in exactly the way that allows them to deride OO, FP, | etc programming whilst simultaneously showing how awesome | functional relational programming is. It ignores complexity | that's orthogonal to what FRP addresses and ignores areas in | which FRP itself contributes to unnecessary complexity. | | It feels like a scenario where the authors had something that | they thought was neat and went out to create metrics that would | in fact show that it was neat. Maybe FRP is really neat, but I | feel that the paper itself doesn't contribute to anything because | its logic is so custom and purpose built. | mhsdef wrote: | I think the first half of the paper ("the problem") is much | stronger than the latter half ("the solution"). | bob1029 wrote: | This paper was career changing for me. | | Chapter 9 is effectively the original concept for our business | rules engine today. We use SQLite and C# UDFs as the actual | foundation. | | Using a proper relational model and SQL queries for all the | things means that domain experts can directly contribute. It also | makes it feasible to turn your domain experts into internal | customers of your developers. For B2B products, this can be make | or break. 
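bob1029's setup can be approximated in Python rather than C# -- sqlite3's `create_function` plays the role of the C# UDFs he mentions. The discount rule and schema below are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Register an application-defined function, analogous to the C# UDFs
# described above; the business rule itself is purely illustrative.
def discount(total, is_member):
    return round(total * (0.9 if is_member else 1.0), 2)

con.create_function("discount", 2, discount)

con.execute("CREATE TABLE orders (id INTEGER, total REAL, is_member INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 100.0, 1), (2, 100.0, 0)])

# Domain experts can now express rules directly as SQL queries.
rows = list(con.execute("SELECT id, discount(total, is_member) FROM orders"))
assert rows == [(1, 90.0), (2, 100.0)]
```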
| | Building an actual business around this kind of thing is very | hazardous in my experience. Unless you are _completely_ certain | that you have the schema figured out, this promising foundation | converts to quicksand. | | Using higher normal forms is one way to reduce the consequence of | screwing up domain modeling, but you still really have to know | the relational guts of the business one way or another. | | One design-time trick is to get all stakeholders to compose their | ideal schema in something like excel and have them fill in some | sample data. Showing them an example of one of these for a | different domain is a powerful lesson in my experience. | ZitchDog wrote: | A classic paper with a dream that has yet to be realized. We | continue to bolt state on top of state. Redux bolted on top of | GraphQL on top of Redis on top of Postgres. We can do better. | lucisferre wrote: | It continues to be such a sad state of affairs. We are | constantly reinventing and repackaging well known bad | practices. | | I recall getting publicly lambasted on here for daring to | question the wisdom of an ActiveRecord-like data layer for a | front-end SPA framework. | layer8 wrote: | If only we could change that state of affairs! | | ;) | ZitchDog wrote: | Oh right, I forgot the ORM with "distributed caching" needed | between the database and GraphQL layers. More hidden state. | dcre wrote: | This paper was very influential on me when I first started | programming professionally around 2012. I don't plan on reading | it again, but my vague memory of what I got out of it is pretty | simple and I think has become pretty standard practice at this | point: avoid mutable state and use pure functions where possible. | | The framing of accidental and essential complexity is of course | very useful and not really unique to this paper. The difficulty | is there is nothing but experience-informed judgment that can | tell you which complexity is accidental and which is essential. 
| There is always a framing of the problem that can justify some | bit of complexity as essential. You have to judge. | feoren wrote: | Plenty of state -- even mutable state -- is essential | complexity. Think of an IDE where a user is typing out code: | the state is changing all the time. Pure functional programming | has a hard time walking the line between "avoiding" mutable | state, and "ignoring" mutable state. If you insist on | functional purity, you're already admitting defeat in the face | of essential mutable state. The walls of functional (and | especially reactive) programming are too high -- they currently | don't play very well with their Turing and von Neumann | neighbors. | | Functional programming is only "mathematically pure" because | _we already have math for it_. You can make mutating state | "mathematically pure" too, if you just invent new math for it. | For example, you can pair state with a "generation" variable | and define mutating functions as returning the new state with | generation + 1; now all your functions are pure again. That's | not all it takes, and it's not easy! But it shows how narrow- | minded the current idea of "purity" is. | cubefox wrote: | I always thought the problem with a "purely" functional view | is much more practical: From my limited Haskell experience, I | gather that you have to use recursion instead of loops, since | loops (and GOTOs) work by iteratively modifying some state, | unlike recursion. But humans think in loops. If you look in a | cookbook for a recipe, it will almost certainly contain loops | and rarely any recursions. | | Recursion programs are provably Turing equivalent to WHILE or | GOTO programs, but that doesn't mean they are equally natural | for our brain to think about. (Not to mention things like | tail recursion, which is even less intuitive.) 
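The loop/recursion equivalence cubefox mentions is easy to demonstrate; the two forms below compute the same thing (the summing function is arbitrary), and which one reads more naturally is exactly the point of contention.

```python
# Iterative (WHILE-style): mutate an accumulator inside a loop.
def total_iter(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc

# Recursive (Haskell-style): no mutation; state is passed as an argument.
def total_rec(xs, acc=0):
    if not xs:
        return acc
    return total_rec(xs[1:], acc + xs[0])  # tail call (not optimized by CPython)

assert total_iter([1, 2, 3, 4]) == total_rec([1, 2, 3, 4]) == 10
```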
| dannyobrien wrote: | I definitely don't think in terms of recursion (and I don't | think I've seen it explicitly in a recipe!), but I'm not | sure loops and iteration are necessarily that intuitive, | either. I learned programming at an early age and I still | distinctly remember the period when loop constructs "made | sense". | | I can believe that recursion is even less easy to teach | than iteration, but it may be that this is a pedagogical | issue, or that we have yet to work out the best way to | convey or document computation. | cubefox wrote: | Many recipes contain WHILE loops. E.g. while the pan is | not full, layer meat sauce, lasagna noodles, and bechamel | sauce. FOR loops (do the x process 10 times, or once for | everything in a list) are also perfectly natural. | Examples for recursion are not as easy to find. This | points to recursion itself being harder for us than | loops. Understanding loops in recipes doesn't require | pedagogical attention. | mpweiher wrote: | > Plenty of state -- even mutable state -- is essential | complexity. | | Arguably most of it! | | After all, computers don't compute. | | https://www.youtube.com/watch?v=EKWGGDXe5MA&t=297s | | _One of the miseries of life is that everyone names | everything a little bit wrong, and so it makes everything a | little harder to understand in the world than it would be if | it were named differently. A computer does not primarily | compute in the sense of doing arithmetic. Strange. Although | they call them computers, that's not what they primarily do. | They primarily are filing systems. People in the computer | business say they're not really computers, they are "data | handlers". All right. That's nice. Data handlers would have | been a better name because it gives a better idea of the idea | of a filing system._ | dagss wrote: | Related: The Norwegian word for "computer" is "datamaskin". | Or, "data machine". 
| mrkeen wrote: | > Think of an IDE where a user is typing out code: the state | is changing all the time. | | That's easy mode. Mutability is often justifiable when one | person is doing one thing to a computer at a time. Now | extrapolate to multiple people editing the same document at | once. Suddenly you're discussing CRDTs and other approaches. | And now implement undo on top of that! | | Likewise with git, or blockchain, or kafka. They're | persistent logs so you can keep your head straight while | figuring out what the current state(s) of the system can be. | Even with git, when you do an in-place mutation (force-push) | there's _still_ an extra log (reflog) behind the scenes | trying to keep the changes sane. | feoren wrote: | I think this is a good example of why "simple CRUD systems" | are actually much more complex than people usually give | them credit for. Anything seems easy if you do it badly. | But multiple people editing a document at once with CRDTs | and undo is still even easier than a basic CRUD application | done properly: now you have multiple people editing | multiple different data types at once! In such cases we | should be thinking about building CRDTs over all the | various operations users may be doing with the data, which | means thinking about associativity and commutativity of | generic data operations. Git only has to deal with merging | text files line by line; CRUD applications have to deal | with merging lots of different data types! | kccqzy wrote: | There is no essential mutable state in computer science. All | essential mutable state can be modeled away to use no mutable | state, as you have shown in the generation number idea which | is one valid way. (I'm strictly talking about computer | science problems not computer engineering problems such as | drivers.) | | The generation number idea you have shown is an excellent | idea. 
It immediately enables several new capabilities | compared with just mutation: | | * You are now able to determine which data is older and which | is newer without resorting to an external clock. | | * If your mutation takes a long time, there is no longer a | need to use a long-held mutex to ensure thread safety, which | would prevent concurrent reads; you can create a new | generation separately and then atomically CAS it. Writers | don't block readers. | | * If you suddenly need undo/redo functionality, it's | trivial. Or diff'ing between versions. Or understanding the | reason for changes (basics of auditing). | kilgnad wrote: | Lambda calculus machines can't exist in reality though. | Only the von Neumann machine can be actualized. | | Thus from a practical perspective the foundations of actual | computing must be built on mutable state. | | You are wrong about the generations idea. It's a side | effect. Functions themselves cannot actually do anything | with a generation number. It's completely useless other | than for letting you know how many times a value has | changed. | | A generation number also doesn't negate the need for | mutexes. The system is still operating on shared state; a | generation number doesn't change this. | kccqzy wrote: | You are not getting the point of either my comment or the | original article. You are thinking from an engineering | point of view, where if you look through all the layers | of abstraction you see mutexes and shared mutable memory. | That's not my point; my point is about building up those | abstractions so they are sufficiently hidden. Functional | programmers routinely operate on a higher level of | abstraction, and they have shorter, more maintainable | code. In such code, it would be a code smell if you | continue to use mutexes, because that should've been an | implementation detail.
It should've been clear from the | outset that the kind of complexity the article is talking | about is cognitive complexity. | layer8 wrote: | > All essential mutable state can be modeled away to use no | mutable state | | However, that modeling introduces complexity of its own | that not everyone agrees is entirely essential. | enlyth wrote: | > There is no essential mutable state in computer science. | | Yes, theoretically. Now imagine your mutable state is 2 GB | in size; have fun creating a copy of it on every change. | akimball wrote: | The optimizer removes that. | kccqzy wrote: | You are not getting the point of my comment or the | original article. It's clear to me that the kind of | complexity being talked about in the article is cognitive | complexity, not computational complexity. It's not about | choosing between an O(n) copying of data and O(1) in-place | mutation; it's about choosing code that's easy to | comprehend and maintain. | deathanatos wrote: | Their point is that you can't always make the cognitively | simpler choice because of real-world design constraints, | such as not consuming all available RAM -- i.e., because | of essential complexity. | pletnes wrote: | Btrfs does stuff like instantaneous snapshots for a | multi-terabyte filesystem. I have no idea how they do it, | but apparently it's 100% possible. | feoren wrote: | Engineering is about tradeoffs. There is no One Model To | Rule Them All. Your post is a great thing to keep in mind, | but it's not a prescription. Engineers need to be able to | look at a problem from many points of view at once, and | try to find the right path for their current problem. This | is why it's so important that models play nicely with one | another, something that functional programming is getting | better at, but reactive programming still really struggles | with. | | All of your asterisked points are well-taken, but: do you | _need_ that capability? Sometimes; sometimes not.
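The generation-number idea being debated can be made concrete. A minimal Python sketch (names invented for illustration; a real implementation would use an atomic reference, and persistent snapshots with structural sharing so large states aren't copied wholesale):

```python
import threading

class Versioned:
    """Immutable snapshots tagged with a generation number.
    Writers build a new snapshot off to the side, then swap it in
    only if nobody else swapped first (compare-and-set). Readers
    never take the write lock, and the kept history makes
    undo, diffing, and auditing trivial."""

    def __init__(self, initial):
        self._lock = threading.Lock()   # stands in for an atomic reference
        self._history = [(0, initial)]  # list of (generation, snapshot)

    def read(self):
        # Snapshots are never mutated, so a plain read is safe.
        return self._history[-1]

    def compare_and_set(self, expected_gen, new_snapshot):
        with self._lock:
            gen, _ = self._history[-1]
            if gen != expected_gen:     # someone else got there first
                return False
            self._history.append((gen + 1, new_snapshot))
            return True

    def undo(self):
        with self._lock:
            if len(self._history) > 1:
                self._history.pop()

v = Versioned({"count": 0})
gen, snap = v.read()
assert v.compare_and_set(gen, {**snap, "count": 1})  # succeeds
assert not v.compare_and_set(gen, {"count": 99})     # stale generation: rejected
assert v.read() == (1, {"count": 1})
```

This instantiates all three asterisked points above: generation numbers order states without an external clock, the CAS means writers never block readers, and undo is a one-line pop.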
| ergeysay wrote: | Not to mention that pure FP completely handwaves away | implicit global state such as the heap. | kilgnad wrote: | That's the point. This hand-waving allows the user to build | more complex systems. | | The cost is efficiency, but make no mistake: not thinking | about the heap allows people to build much more complex | programs. It's a trade-off between complexity and | efficiency. | akimball wrote: | Au contraire. It is a trade-off between application | programmer effort and compiler writer effort. | Amortization implies it is better to have an extremely | advanced IR optimizer. | kilgnad wrote: | Then why do zero-cost abstractions exist? Languages like | Rust and C++ make the trade-off on the application side | rather than the compiler side. | cubefox wrote: | Then could it conversely be argued that not worrying about | the stack (tail recursion) allows people to build more | complex programs? Probably depends on which is more | worrisome on average, heap or stack. | kilgnad wrote: | No. Pure functions are more modular. It allows you to treat | your logic like bricks and building blocks. Pure functions | are composable purely off of types. Combinators are actually | the proper term here. | | Mutable state does not do this. Your idea of generations | doesn't make sense, because that means every function must | take a generation as the input parameter too. Given a | different generation number the function must | deterministically create the same output. | | It is not a complete system. Your equations produce a | useless generation number that isn't reused as part of the | system; it's just there for you and meta-analysis. | | That is not to say your first paragraph is completely wrong, | though. There is a trade-off between modularity and | efficiency. Mutating a variable is less resource-intensive | than generating a new variable. | feoren wrote: | > Pure functions are more modular.
It allows you to treat | your logic like bricks and building blocks. | | Mathematical purity is indeed what allows you to treat your | logic like bricks and building blocks. My point is that "a | monad is a monoid in the category of endofunctors" is not | the only "pure math" out there, and also that your domain | model is likely the most impure part of your whole program. | Functional programming is awesome! But it mostly ignores the | existence of essential mutable state, and embracing | functional programming is only a small part of the | "mathematical purity" struggle that is indeed the only way | to build truly reusable building blocks. If you're spending | lots of time building clean, pure, mathematical data | manipulation logic, but the data you're manipulating is | "Dog : Animal" and "Cat : Animal", you're in a garbage | in/garbage out situation. Worry about the mathematical | purity of your data model itself. It will marry and dance | and harmonize with the purity of your functional logic! | | > Mutating a variable is less resource-intensive than | generating a new variable. | | Not always. | n0w wrote: | I came across this after seeing relic[0] submitted the other day | and thought it was pretty interesting. | | I've been into CRDTs for a while and have started wondering about | generic mechanisms for distributed data. This led me to read a | lot more about the Relational Model of data and eventually to the | Event Calculus. | | What's interesting to me is that these things end up feeling a | lot like CRDTs[1] or Event Sourcing. I haven't quite finished | pulling on these threads, but the relic link was a timely read | considering! | | I really liked the first half of this paper and the authors' | categorization of complexity. However, the second half fell a bit | short for me.
It seems they made the same mistake as many other | people (SQL != Relational), and their idea of Feeders and | Observers seems a bit more like an escape hatch than an elegant | method for interfacing with the outside world. | | [0] https://github.com/wotbrew/relic [1] | http://archagon.net/blog/2018/03/24/data-laced-with-history/ | carapace wrote: | I'm designing such a system now, based on the pure functional | Joy language with two additional data stores: a relational db | system (Prolog (or maybe Datalog), not SQL) and what is | effectively a git repo, although I think of it as a "data oracle". | | The role of the relational db system is explained in TFA. (Prolog | makes a fine Relational Model DB (the "relations" in RMDBs are | the same _logical relations_ that Prolog "relations", um, are.) | The language is cleaner and simpler and more powerful than stock | SQL, there's an ISO standard and several solid implementations, | and you can always back it up with SQLite or Postgres or whatever | if you need to.) The trick to integrating it with a purely | functional system is to only use "pure and monotonic Prolog code", | which you want to do anyway ( | https://www.metalevel.at/prolog/debugging ), or as I like to say, | "Don't put lies in your database." | | The "data oracle" (which again is more-or-less just a git repo) | provides bytes given a three-tuple of _(hash, offset, length)_. | These are immutable, so you can cache the results of (pure) | computations over them (e.g. a predicate like "is valid UTF-8" | is true/false for all time, yeah?) This replaces the filesystem. | | I was working with Prof. Wirth's Oberon RISC CPU as a basis, but | a couple of days ago a fantastic new 64-bit vm went by here on HN | and I'm going to use that going forward. | https://github.com/maximecb/uvm | https://news.ycombinator.com/item?id=34936729 ___________________________________________________________________ (page generated 2023-02-28 23:01 UTC)
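The "data oracle" described above can be sketched in a few lines of Python (an illustrative stand-in for the git-like store; all names are invented): because content addressed by hash is immutable, any pure predicate over a _(hash, offset, length)_ tuple can be memoized forever.

```python
import hashlib
from functools import lru_cache

_STORE = {}  # hash -> immutable bytes; stands in for the git-like object store

def put(data: bytes) -> str:
    h = hashlib.sha256(data).hexdigest()
    _STORE[h] = data
    return h

def oracle(h: str, offset: int, length: int) -> bytes:
    # The bytes behind a hash never change, so this is a pure function.
    return _STORE[h][offset:offset + length]

@lru_cache(maxsize=None)  # safe to cache forever: the answer can never change
def is_valid_utf8(h: str, offset: int, length: int) -> bool:
    try:
        oracle(h, offset, length).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

h = put("héllo".encode("utf-8"))
assert is_valid_utf8(h, 0, 6)        # the whole string decodes
assert not is_valid_utf8(h, 0, 2)    # cuts a multi-byte character in half
```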