[HN Gopher] Principles of Data Oriented Programming
___________________________________________________________________
Principles of Data Oriented Programming

Author : viebel
Score  : 307 points
Date   : 2020-10-05 11:59 UTC (11 hours ago)

(HTM) web link (blog.klipse.tech)
(TXT) w3m dump (blog.klipse.tech)

| tgb wrote:
| What's the solution to referring to other "objects" in this setup? For example, an Author has a list of Books that they wrote. I see several possible ways to do this, but they all seem to have downsides.
|
| 1) The author map has a "books" entry that is a list of maps containing the book data. But multiple authors might have written the same book, so does the book data get copied there?
|
| 2) The author contains a "books" entry that is a list of Book IDs, like foreign keys in a database. But how about immutability? If you want to get the book addressed by the ID, you need to look it up somewhere, and are you guaranteed the object you get from the lookup is still the same one as it was earlier?
|
| 3) There is an "Author -> Book" map somewhere that you can pass an author and it gives you the list of books they wrote. Not sure about this one.
| viebel wrote:
| It is going to be the theme of Chapter 3 of my DOP book. Stay tuned.
| mtVessel wrote:
| In the third example, is function isProlific missing a parameter?
| 02020202 wrote:
| simple read. good points. this is very useful in event sourcing or in general for sensitive data where mutability might become an issue (i.e. financial transactions). also i think this maps to value object perfectly.
|
| i have implemented ES into a few projects and am now writing an ES library, since I think it can be done much simpler than my previous implementations that felt too verbose, and I will take some pointers from this into account.
| viebel wrote:
| Could you share an example of how to apply the principles of DO in the context of event sourcing?
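One way to picture the event-sourcing connection raised above is the following minimal sketch. It is not taken from the article or from any commenter's library: the event types, the `applyEvent` and `replay` helpers, and the account example are all hypothetical. The idea it illustrates is only that events are immutable plain maps and the current state is a pure fold over the event log.

```javascript
// Hypothetical event log for a bank account. Events are plain, frozen maps.
const events = Object.freeze([
  Object.freeze({ type: "deposited", amount: 100 }),
  Object.freeze({ type: "withdrawn", amount: 30 }),
  Object.freeze({ type: "deposited", amount: 5 }),
]);

// Pure function: takes the previous state and one event, returns a new frozen state.
// The previous state is never modified.
function applyEvent(state, event) {
  switch (event.type) {
    case "deposited":
      return Object.freeze({ ...state, balance: state.balance + event.amount });
    case "withdrawn":
      return Object.freeze({ ...state, balance: state.balance - event.amount });
    default:
      return state; // unknown events leave the state untouched
  }
}

// The current state is a fold (reduce) over the whole event log.
function replay(eventLog) {
  return eventLog.reduce(applyEvent, Object.freeze({ balance: 0 }));
}

const current = replay(events);
console.log(current.balance);
```

Because every intermediate state is a separate frozen value, replaying a prefix of the log gives the state at that point in history without any extra bookkeeping.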
| dathinab wrote:
| An entity component system (ECS) as used by many games would probably fall under DO.
|
| Though I would argue that #5 isn't part of DO. Especially given that e.g. `{ "a" : "b" }` is _not_ necessarily a literal but potentially an expression (depending on arbitrary language definition aspects). On the other hand, e.g. `vec![ "a" ]` in Rust is by definition not a literal but, with respect to the idea behind principle #5, as good as `[ "a" ]` in JavaScript.
|
| EDIT: Lastly, sometimes something like a builder pattern can be the better way to get "not verbose creation" and "data is explorable in any context".
| viebel wrote:
| What do you mean when you write that `{ "a" : "b" }` is not necessarily a literal but potentially an expression?
| dathinab wrote:
| Depending on how a language defines its syntax.
|
| Oftentimes literals are only things like `"string"`, `0` and so on.
|
| But things like `[ 1,2]` would be an array expression where each "entry" is syntax-wise an expression (and any literal is an expression itself).
|
| I don't know if JavaScript specifically defines object literals or object expressions; in the end it depends on what you define a literal as.
|
| Lastly, depending on the language, something like `[ 1,2]` might literally de-sugar to something like the following pseudo code: `var tmp = Array.new(capacity=2); tmp.push(1); tmp.push(2)`.
|
| So a better way would be a formulation like "data should be creatable without explicitly doing any function calls, variable assignments or similar. Creation must not depend on implicitly captured data".
| viebel wrote:
| I totally agree with "data should be creatable without explicitly doing any function calls, variable assignments or similar."
|
| Could you explain what you mean by "Creation must not depend on implicitly captured data"?
| dathinab wrote:
| Let's say you have a language which has some form of macro system or similar to make creation of new things which look like literal-like expressions easier.
|
| E.g. instead of `a = [1,2]` you have `a = vec![1,2]` which de-sugars to `a = Vec::with_capacity(2); a.push(1); a.push(2);`.
|
| Now custom data structures can define their own macros like that, e.g. `skip_list![1,2]`.
|
| Which would be all fine. But what if now `bad_skip_list![1,2]` accesses a thread local variable (or other implicitly provided data) and adds that, too?
|
| Now `bad_skip_list![1,2]` might not be equal to `bad_skip_list![1,2]` defined somewhere else. Which is against the ideas behind rule #5.
|
| You should be able to copy-paste the literal-like creation of data to any place (e.g. a unit test) and get the same result.
|
| EDIT: If I remember correctly you could override parts of `Array.prototype` and array construction in JavaScript to break #5 for JavaScript for things like `[1,2,3]` but I'm not too sure about that anymore.
| petermcneeley wrote:
| I think DOD of games is only tangentially related to this post.
|
| https://en.wikipedia.org/wiki/Data-oriented_design
| dathinab wrote:
| The "DOD of games" is just a "special case" of more generic Data Oriented design/programming.
|
| E.g. ECS is a direct consequence of separating data from code.
|
| Furthermore, for it to work well you normally also want #2, #3 and #4.
|
| Sure, you can build an ECS without #2, #3 and #4 but it makes it more complex.
|
| Lastly, in an ECS you split up components into many parts each having their own data and you normally want the idea behind #5 to apply to each of the parts.
|
| EDIT: Well ok, whether #2 makes any sense at all depends on the language you use. And using a language where #2 makes no sense can be just as reasonable a choice. I only would apply #2 IF it makes sense for your language of choice.
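The `bad_skip_list!` scenario described earlier in this subthread can be sketched in JavaScript as well. This is only an illustration under stated assumptions: `ambientTag`, `badList`, and `goodList` are hypothetical names, and the module-level variable stands in for the thread-local that dathinab mentions.

```javascript
// Hypothetical ambient state, standing in for a thread-local or global variable.
let ambientTag = "prod";

// Goes against the idea behind principle #5: the created value silently
// depends on hidden state, so the same expression can yield different data.
function badList(...items) {
  return [...items, ambientTag];
}

// Context-free creation: the same expression gives the same value wherever it is pasted.
function goodList(...items) {
  return [...items];
}

const a = badList(1, 2);
ambientTag = "test"; // e.g. the code was copied into a unit test with different ambient state
const b = badList(1, 2);

// Compare by value using the JSON form of each array.
const sameBad = JSON.stringify(a) === JSON.stringify(b);
const sameGood = JSON.stringify(goodList(1, 2)) === JSON.stringify([1, 2]);
console.log(sameBad, sameGood);
```

The two `badList(1, 2)` calls read identically in the source yet produce different values, which is exactly the copy-paste hazard described above.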
| slifin wrote:
| This talk kind of alludes to the data driven stuff at the end: https://youtu.be/vK1DazRK_a0?t=774
|
| It's a shame the code examples are just the fp and oop solutions, not how it could look in a data oriented way
| joshlemer wrote:
| In his refactoring of the JS game, I was with him until about 80% through his refactor, but in the change he describes from about 50:50 - 51:00, he's actually changing the meaning of the game. Instead of printing after each turn, he's running all turns, then printing them all out at once. This is a very different behaviour and I think it's glossing over / copping out from the difficulties of applying FP in side-effectful codebases.
|
| I mean in this case, he's transformed the program from running with a constant amount of memory (only need as much memory as it takes to run a turn) to having to store the entire history in memory. So this would be really bad if there were many turns, or if the program was going to run indefinitely, responding to user input, etc.
| samorozco wrote:
| It ain't pretty but if it works it works.
| piokoch wrote:
| Principle #2 always bothers me.
|
| "Model the data part of the entities of your application using generic data structures (mostly maps and arrays)."
|
| and example
|
|     function createAuthorData(firstName, lastName, books) {
|       return {firstName: firstName, lastName: lastName, books: books};
|     }
|
| For a simple, obvious object like "person" it might work, but for a complicated domain object with many other composed objects this starts to be a pain.
|
| I am not feeling ready to memorize all the field names, I prefer to have an object with documentation for each field.
| andi999 wrote:
| Usually a map can be fine, but isn't it a maintenance nightmare?
| At least if you change an object, the compiler will complain if an attribute is not found, but this leads to runtime errors or bad behaviour
| acarabott wrote:
| Adding a version field can be useful to avoid this. Particularly if your entire state is stored in one object.
| hakre wrote:
| Yes, personally I'm (generally) against version fields, however in the OP's meaning as I read it, if you add the version field it breaks the value (the version field invalidates value comparison for equality) and therefore will end up adding complexity. This may go contrary towards the topic, as OP clearly states the major goal is to reduce complexity.
|
| So adding a value and making it (inherently) incompatible in the value system breaks the benefits of a couple of the six points outlined in the OP (given the version field suggestion).
|
| Just saying. Your mileage may vary. But again, introducing version attributes is most of the time (and that is a warning) _increasing_ complexity.
|
| One of the articles referred to by the OP is [out-of-the-tar-pit] which is fundamentally about complexity and WTF it is paradigms, on syntax level and language support. A version field is a counter on a higher level on top of anything of it (and therefore in the off-topic domain already to a larger extent) and also ruining any of the value comparison ability (adding the version field exploits the value inequality in DO as per OP making it part of the versioning system) introducing metadata and IMHO ruining DO (towards its benefits).
|
| If you need to encapsulate state to take a short-cut, introduce state. Don't ruin value(s).
|
| Just my 2 cents.
|
| [out-of-the-tar-pit]: https://raw.githubusercontent.com/papers-we-love/papers-we-l... Moseley/Marks 2006
| andi999 wrote:
| Agree. But then every function using the map has to query the version field?
| jayd16 wrote:
| You can still make helper methods that get the field data for a person from the Person class if you really want. It's not the end of the world but you lose the benefits when you're operating on a single large object in that style.
|
| The reason you would use this setup is so you have all your data in contiguous arrays you can SIMD through quickly. If you're doing a lot of operations where n is low then it's not that useful (and even detrimental) to orient your data array-wise.
| odshoifsdhfs wrote:
| If you use simple data structures without logic, it is basically what he is saying. Basically `createAuthorData` is a data structure constructor. There are some advantages to using _.pick or something in JS land, but these don't work for other more strictly typed languages.
|
| Also, I would be careful with using it as much as in his example:
|
| ```
| function createAuthorData(firstName, lastName, books) {
|   return {firstName: firstName, lastName: lastName, books: books};
| }
|
| function fullName(data) {
|   return data.firstName + " " + data.lastName;
| }
|
| function createArtistData(firstName, lastName, genre) {
|   return {firstName: firstName, lastName: lastName, genre: genre};
| }
| ```
|
| you can also do `fullName({firstName: 'Elmond', not:'really'})` and have a hard to catch bug (where an actual type system would catch it).
|
| You can either use a Protocol for this (PersonName) or a base class (though those are going out of fashion nowadays)
| pjmlp wrote:
| Or when structural typing is available,
|
|     type example = { firstName : string; lastName : string};;
|     let name x = x.firstName;;
|     name {firstName = "Joe"; lastName = "User"};;
|
| Or to make it more specific
|
|     let name { firstName : string} = firstName;;
| mumblemumble wrote:
| > for a complicated domain object with many other composed objects this starts to be a pain
|
| It may be that complicated domain objects with many other composed objects are not compatible with this style of programming.
|
| Whether or not that's a good thing could be the subject of an interesting discussion. For one thing, negative experiences with the "large complex classes" approach to domain modeling are a major factor in the backlash against object-oriented programming. Lately I've been trying to familiarize myself more with the early literature of OOP, and I'm discovering that even OOP's early pioneers had already had bad experiences with it, and would warn against the temptation to do things that way. OTOH, there's undeniably a certain attractiveness to it, otherwise it wouldn't be so common.
| weavejester wrote:
| > _I am not feeling ready to memorize all the field names, I prefer to have an object with documentation for each field._
|
| Using data doesn't preclude documenting fields, particularly if those fields are assigned a unique name.
|
| For example, in Clojure you'd namespace the keys, so instead of "lastName" one might write :author/last-name. This keyword can then be documented or even assigned a type/spec.
|
| (That said, you can't currently assign a docstring to a keyword in Clojure without the use of a third party library, or using a comment or external document. Hopefully a future version of Clojure will make this part of the core language.)
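The `fullName({firstName: 'Elmond', not:'really'})` problem raised above, together with the mention of run-time specs, suggests a small run-time check on plain maps. The sketch below is one possible shape for such a check, not anything proposed in the article: `requireKeys` is a hypothetical helper and the sample last name is made up for illustration.

```javascript
// Hypothetical helper: throws if any required key is missing from a plain map.
function requireKeys(data, keys) {
  const missing = keys.filter((key) => !(key in data));
  if (missing.length > 0) {
    throw new TypeError("missing keys: " + missing.join(", "));
  }
  return data;
}

// Same fullName as in the thread, but it now fails loudly instead of
// silently producing "Elmond undefined".
function fullName(data) {
  requireKeys(data, ["firstName", "lastName"]);
  return data.firstName + " " + data.lastName;
}

const ok = fullName({ firstName: "Elmond", lastName: "Smith" });

let caught = null;
try {
  fullName({ firstName: "Elmond", not: "really" });
} catch (err) {
  caught = err.message; // records which keys were missing
}
console.log(ok, caught);
```

This keeps the data as a generic map and moves the contract into the function that consumes it, at the cost of catching the mistake at run time rather than at compile time.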
| hakre wrote:
| The author argues DO is in favour of reducing complexity, but also notes the price of it and reminds us that it's in the eye of the beholder to decide whether or not it is of benefit.
|
| Many of the early PHP applications were written in a style using generic data structures (arrays, that is, in PHP) with functions dealing with all of these.
|
| There is no free lunch.
|
| If you already have domains you can model in, the strategy to reduce complexity is perhaps more from the DDD book, which looks like a higher level concept to me and therefore may be more fitting for higher complexity levels.
|
| (but again, one for sure can shoot oneself in the foot with DDD as well)
|
| What I like about the DO thing is it maps well onto simple REST APIs sending and receiving JSON text. Quite popular these days to say the least, not to mention Serverless.
|
| A similar trend can also be seen in structured logging.
|
| These systems are often distributed and complex.
|
| As it bothers you in your case, I would tend to say it's better to stick to a domain if there is one. Across boundaries of domains, values to emit and receive data can work out very well though, I can imagine. As so often, it depends.
| OneGuy123 wrote:
| In the context of C# this becomes a real pain.
|
| The issue is arrays/lists.
|
| Because in C# an int[] array or List<int> is a reference.
|
| So even if you put an int[] in a struct, this int[] will NOT be copied when you assign an instance of this struct to another instance: you will get a shared reference, which makes this annoying since you cannot do a proper deep copy with the language itself: you are forced to use reference copy instead of deep copy.
|
| In C++ this is easy because the types differentiate between pointer and non-pointer explicitly + you can overload the assignment operator.
| pjmlp wrote:
| Unless one uses a struct based array instead.
|
| https://docs.microsoft.com/en-us/dotnet/csharp/programming-g...
|
| https://docs.microsoft.com/en-us/dotnet/api/system.memory-1?...
| mumblemumble wrote:
| To principle #2:
|
| The author acknowledges this as being incompatible with static typing, but I'm not so sure. Is it that, or is it that it's incompatible with contemporary static languages?
|
| In FP, we already have a concept of type-safe heterogeneous lists and maps, and even some clever implementations in languages like Java[1]. The ergonomics are often less-than-stellar, but I'm pretty sure that's something a new language could fix with some syntactic sugar.
|
| There is also the data frame abstraction (like you see in Pandas), which is typically implemented on top of dynamic typing, but major implementations often rely on static typing behind the scenes to achieve efficiency. There are also projects like Frameless[2], which implements a statically typed interface over a dynamically typed dataframe package.[3] I'm guessing, again, that careful language design could get us something similar, but with better ergonomics.
|
| And I'd be happy with that. I've been pulling away from static languages lately, and a big part of that is that I really like how some of the dynamic languages let me model my data _as data_. It's enough of a complexity saver to feel like a net win, even at the cost of some performance and static verification.
|
| [1] For example: https://github.com/palatable/lambda#hlist
|
| [2] https://github.com/typelevel/frameless
|
| [3] Which is itself, notably, implemented in a static language. Spark also has a statically typed version of the API, but its usage is not recommended for several reasons, one of which is performance. That's something that all us static typing fans should really stop to think about for a bit.
| fauigerzigerk wrote:
| I think a TypeScript style structural type system could work.
| meheleventyone wrote:
| Be careful because there are perf issues if you are using parametric polymorphism. Monomorphic functions are preferable but the real problems occur once you get past the inline cache's maximum number of 'shapes'. This obviously applies to regular JS as well.
| fauigerzigerk wrote:
| Perhaps I misunderstand, but wouldn't that only apply if you attach functions to objects? I suppose you wouldn't do that if you follow data oriented programming principles.
| meheleventyone wrote:
| Nope. It's about the different types passed into a function parameter. For optimization JS engines look at the differing shapes of JS objects, which are determined by the order and type of each member.
|
| So:
|
|     { a: 6, b: 7 }
|
| is different to
|
|     { b: 7, a: 6 }
|
| and
|
|     { a: "six", b: "seven" }
|
| is different to both.
|
| This is done so the member can be looked up quickly by offset. Functions then have an inline cache that stores the shapes the function has seen. If the function is called monomorphically it will only ever see one shape and hit the fastest path. If it is called with up to three shapes (in V8) it will be pretty quick. Once past three the cache falls through to a global table and is dog slow.
|
| This matters if you are using structural typing as you are still creating different shapes to be passed into the same function(s).
| fauigerzigerk wrote:
| I see. Thanks for explaining. That's very good to know!
| adamnemecek wrote:
| None of these really apply in this case. In data oriented programming, the data would not be stored in heterogeneous lists but each element would be stored in a different, homogeneous array and you would tie these together using the same entity ID.
| skohan wrote:
| Even with this type of design, it's possible to implement in a typesafe way. I have seen clever ECS systems accomplish this
| adamnemecek wrote:
| Maybe but ECS does in some sense circumvent types.
| hansvm wrote:
| In some sense.
| You could build a type system around "has a <foo>" relationships instead of "is a <foo>" and get something pretty cohesive out that closely aligns with ECS.
| mumblemumble wrote:
| That is not quite what the article is talking about.
|
| There's a bit of a terminology collision here. What the article is talking about is not Data-Oriented Design, the practice of organizing your data for efficient processing. It's proposing a separate (but not incompatible) concept of organizing your code for easier maintainability. For example, the sample code (which appears to be JavaScript) is not doing ECS at all. It's almost exclusively doing the data modeling by creating one heterogeneous map per entity.
| rlogical wrote:
| Great post..
| jmull wrote:
| I don't think these principles add up to something useful. It's not complete and I think some of the principles don't align that well to the problem space.
|
| The big one: "Data is immutable". The problem here is that data _isn't_ actually immutable (generally) and mutability isn't actually the problem. The problem is unmanaged references or other dependencies on the mutable data. The "source of truth" becomes muddled, which creates the problem of how to keep the various instances in sync (or otherwise handle cases where they are/become out-of-sync). Immutability is a very useful tool since you can have any kind of dependency (direct, indirect, implied, etc.) and there's no worry. But you still need a mechanism to manage mutating data. Maybe it comes out somewhere, but the principles of DO don't cover it, which is a rather serious omission.
|
| Also, principle 2 isn't really a principle. I think what it's getting at is that you don't really know the precise type of your data, over time, in a distributed system, so it's good to include the flexibility to handle that. That makes sense to me.
| but generic data structures aren't necessarily always the right way to handle that.
| wiremine wrote:
| > Immutability is a very useful tool since you can have any kind of dependency (direct, indirect, implied, etc.) and there's no worry. But you still need a mechanism to manage mutating data.
|
| I agree. For me it feels like immutable data is a tool, not a universal principle. For example, in low-level C programming or direct control of a register in an embedded context, immutable data isn't a principle, it's just another approach.
|
| What I mean by "universal" is something like SOLID. [1] (Most people only use SOLID for OOP, but Uncle Bob makes it clear in Clean Architecture that he thinks SOLID is universal, and not limited to OOP.)
|
| That said: it does feel like immutable data should be the _default_ approach in many situations. But that's really hard to do in most languages.
|
| [1] https://www.amazon.com/Clean-Architecture-Craftsmans-Softwar...
| ummonk wrote:
| I think the principles do basically summarize data oriented programming. Particularly, principle 1 is important since it is the exact opposite of OOP's insistence on encapsulation, and is possibly the biggest reason DOP works so much better than OOP.
| joe_the_user wrote:
| Evidence that "DOP works so much better than OOP" is scant imo.
|
| The rise and fall of paradigms that present themselves as panaceas is instructive. You have "structured programming", "object oriented programming", "functional programming" and now "data oriented programming".
|
| What I'd like to see is paradigms paired with "where this works well" rather than paradigms sold based on "this will solve the software crisis", "this works better (unqualified by when)" and "if you're not doing this, you're doing it wrong".
| The latter two claims leave a bad taste in the mouth of the casual users/observers, who gradually morph into active critics and sink the paradigm, wasting the good and useful parts of each (OOP has a vast universe of haters because it's the most successful of the panacea-paradigms so far and as a panacea, there's much to hate here but still).
| hjntmp wrote:
| Immutability is not the fact that something cannot change in this case. It has more to do with the identity of a value. Every time you change anything inside the data structure you get a new reference, that is it!
| hjntmp wrote:
| Think of it as versioning for your data. So instead of referring to some data structure by its general concept, with immutable data structures we are a lot more specific with respect to the identity represented by the name: here the name represents a version.
|
| This is useful because, for example, you stop having the "unmanaged references" you were talking about, because now, since you are pointing to a version of the data and not the data itself, you can be sure of what you are talking about.
| hjntmp wrote:
| Also remember here the promise is that the data you are talking about won't suddenly change underneath, not that the reference "is up to date".
|
| It is not a solution for change in time, it is a solution for taking change in time out of the equation when we don't need to talk about that. With immutable ds when we talk about data we are just talking about exactly that and time is taken out of the picture. It's called immutable because now you are talking about facts and not the representation in time of those facts. Because now we are talking in versions so it does not make sense. The thing is this data is a snapshot so you are not guaranteed to have the most up to date snapshot. And that's ok because precisely here we take change in time out of the question in order to be able to talk more precisely about the data.
| Tracking change in time is another story.
|
| So for example you could have things like React. There you have snapshots of the world updating. When you talk about the data it is immutable but then you change it and update the mutable variable where you are keeping change in time.
| lacrimacida wrote:
| > I don't think these principles add up to something useful. It's not complete and I think some of the principles don't align that well to the problem space.
|
| These are part of a book that is being written at the moment as far as I can tell:
|
| "This article is an excerpt from my upcoming book about Data Oriented Programming. The book will be published by Manning, once it is completed (hopefully in 2021)."
| adamkl wrote:
| Honestly, I think this series of blog posts could have had a more complete title: "Principles of Data Oriented Programming _(aka, Idiomatic Clojure)_"
|
| All of these ideas (and I think they are good ones) are inspired heavily by Rich Hickey's talks and rationale behind developing the Clojure language (the author of the post states as much). And while you _can_ use these techniques in other languages/paradigms/problem domains, they are really intended to work well inside the constructs of Clojure, and when applied to "information-driven situated programs" [0] (read business applications with dynamic requirements).
|
| As for some of the shortcomings you mentioned:
|
| _"But you still need a mechanism to manage mutating data"_
|
| Clojure supports this through the use of locking constructs like atoms. [1]
|
| _"I think what it's getting at is that you don't really know the precise type of your data, over time, in a distributed system, so it's good to include the flexibility to handle that. That makes sense to me.
| but generic data structures aren't necessarily always the right way to handle that."_
|
| Clojure attempts to bridge the gap between generic data-structures and strongly-typed constructs using run-time specifications. [2]
|
| I mean, the ideas presented here can be generally useful, but your mileage may vary if the principles take you too far out of the idiomatic for your particular language/paradigm/problem domain. If that's the case, you could find yourself wasting energy swimming upstream.
|
| [0] - https://www.youtube.com/watch?v=2V1FtfBDsLU
| [1] - https://clojure.org/reference/atoms
| [2] - https://clojure.org/about/spec
| [deleted]
| dnautics wrote:
| I think "data are immutable" should be qualified - usually it means "data are immutable through your _codepaths_" and if you are mutating data, you need explicit checkouts and checkins of the data.
|
| This is the basic principle behind how sane database transactions work.
|
| ORMs in some languages can be especially dangerous if they overload the getter/setters of the object in such a way that the checkins and checkouts are obscured; you could be passing your object to a function or method that expects to mutate a polymorphic class[0] that is usually a traditional "shared memory" form of objects, well hopefully you can imagine the chaos, redundant database transactions, consistency problems, failure modes, uncaught exceptions, etc. that are going to be a nightmare to debug.
|
| [0] worse yet, imagine if it's someone else's code and they change the api from not mutating to mutating for performance reasons. Will you notice the documentation change or the changelog? It's bad enough in the case when it's not an ORM and just a mutable object.
| ajuc wrote:
| You realize data is immutable when you first try to implement history.
|
| Mutability is just a hack to save some memory.
| rellekio wrote:
| You're right about the sync and dependency issue.
| Big reason why in JS land you treat each mutation of the data as immutable destruction and refresh of an object is to eliminate any old references that might not have been cleaned up by the garbage collector.
|
| Data Oriented Programming is both old and new. In that it does not have the same amount of programming patterns that OOP has. As it is a more bare metal means of programming without a ton of abstraction to ease most programmers into it.
|
| Where I find the idea interesting is that concurrent and parallel processes are more natural in the data oriented style. And that is through immutability and ownership as first principles.
| phendrenad2 wrote:
| This seems to be describing a style of programming, and you'll have to take these principles in the spirit that they were intended, i.e. in the context of that programming style. I recognize this style as a common style in OOP/FP languages such as Scala, where it's common to pass around immutable Plain Old Scala Objects (made from lists and hashmaps).
| curyous wrote:
| This is not Data Oriented Design, far from it. I find it confusing that the term used here is "Data Oriented Programming".
| fpoling wrote:
| The examples from the first article about code reuse demonstrate the power of row polymorphism. But in a statically typed language that requires a rather advanced type system that allows one to declare explicitly or implicitly that code works for any struct that contains the given fields. In C++ one can use templates, but that trivially leads to unmaintainable code.
| disaster01 wrote:
| Clojure-inspired concepts? An upvote from me, despite the hide-out in other languages :)
| jll29 wrote:
| It's a good idea to write a book about a data-oriented development style (I'm working on a methodology, a way of working in data-intensive projects called Data2Value).
|
| However, JavaScript or Clojure are not ideal for demonstrating this methodology in the sense that industrial applications will more likely be built in C++, Java or Python. For example C++ and Java support Apache UIMA, which is an industry standard for data-oriented systems. UIMA (originally developed at IBM before being open sourced, and used in their Watson system) manages data as immutable objects (e.g. text, videos) that are enriched with annotations (e.g. syntax graphs, topical tags, subtitles...).
|
| Functional designs are often well-suited to data flow related processing, whereas in OOP, you end up with a pipeline object and various DataStream objects that it inputs and outputs.
|
| In my experience, data-intensive systems often need to:
| - cater for distributed processing due to large scale (which calls for Apache Spark, and then PySpark or Scala);
| - compute-intensive work like machine learning, which may require GPUs or other bespoke hardware (Tensorflow supports GPUs, Google's cloud has TPUs); and
| - special purpose data structures (e.g. Bloom filters, huge persistent graphs, R* trees...) specific to the nature of the processing (this latter point, I guess, contradicts the author's claims).
| dragandj wrote:
| FYI Clojure has supported GPU processing for 5 years.
|
| https://github.com/uncomplicate
| thom wrote:
| Data oriented programming is orthogonal to the type of biggish data tools you're talking about, although I agree that in the present day, the latter is actually a more interesting subject for methodological discussion.
| [deleted]
| cortexio wrote:
| This is stupid. I'm only in chapter 1 and there are major flaws.
|
| eg.
|
|     function fullName(data) {
|       return data.firstName + " " + data.lastName;
|     }
|
| This somehow suggests that firstname and lastname are always stored under the exact same property names and exact same depth
|
| In practice this will just result in coders making duplicated functions with different names.
|
| e.g. fullNameUsing_firstName_lastName, fullNameFromCustomer, fullNameContactPerson
| Joker_vD wrote:
| You know, sometimes I wish the programming languages had a better distinction between "immutable data pieces" and "stateful agents". Sometimes it's really nice to have a simple struct for which you (or anyone else) can write many functions to act upon. Sometimes it's really nice to have an opaque "object" with methods to yank described in some public interface, but which encapsulates loads of state and other objects inside it so you don't have to think about all of that.
| omginternets wrote:
| This is probably an imperfect solution, but isn't this kind of what Actors provide? Syntax and concurrency aside, passing a message is equivalent to calling a method.
|
| The biggest annoyance I can foresee is discoverability. Object-oriented classes make it pretty clear what a given object's interface is (just look at its public methods). This is not true of actors in the languages I've used (not many), but in theory it should be a fairly trivial question of syntax.
| discreteevent wrote:
| The "stateful agents" is what OOP is all about. The objects are interpreters. I referenced a discussion elsewhere in this thread between Alan Kay and Rich Hickey where Kay talks about the value of interpreters. Rich Hickey talks about data/values coming from a seismometer or an IoT device but what he misses there I think is that it's not all just "data". There are also stateful agents. The seismometer (or the earthquake) is a stateful agent and the data is messages that are sent by it. If we had to model a seismometer or IoT device (e.g. a smoke alarm) it would be best to do so using an object that encapsulates its state and manages itself.
It only communicates to the outside world with messages/data (in this case a sound when the temperature exceeds some internal state). I can replace my smoke alarm with another and I don't need to understand anything about its internal data or how it interprets it.
|
| But for dealing with the historical data from an IoT device it may make sense to use a data oriented/functional approach. That time series data is not stateful, it's an immutable history of something stateful. Functions/transformations work best there usually.
| Joker_vD wrote:
| Right. And the problem is, quite often those "data/values" turn out to be objects too, and for no good reason. Take the whole Active Records approach, for example, re-implemented in Java in the most straightforward way possible: you have a class with data fields (well, with getters/setters) but it also has "Save()/Load()" methods on it. Ugh.
| identity0 wrote:
| This is exactly the reason C++ was created. It has structs and classes. In theory, structs and classes are exactly the same, but conventionally, a struct represents some kind of data object while a class represents something with encapsulation and state.
| dudul wrote:
| So like an Erlang map and a GenServer?
| mdm12 wrote:
| C# will be getting records in .NET 5, which would be analogous to what you refer to as 'immutable data pieces', in addition to the normal objects the language has always had.
| https://devblogs.microsoft.com/dotnet/announcing-net-5-0-rc-...
| arximboldi wrote:
| Very interesting!
|
| In the context of C++ some of us have been calling these programming/design principles "Value Oriented Design".
Some talks on the topic:
|
| - Most Valuable Values (Juan Pedro Bolivar) https://www.youtube.com/watch?v=_oBx_NbLghY
|
| - Squaring the circle, value oriented design in an object oriented system (Juanpe) https://www.youtube.com/watch?v=e2-FRFEx8CA
|
| - Objects vs Values: Value Oriented Programming in an Object Oriented World (Tony van Eerd) https://www.youtube.com/watch?v=2JGH_SWURrI
| phaedrus wrote:
| Something I've found applying Value Oriented Design to C++ is that it often leads to freeing designs of arbitrary limitations that weren't clear beforehand but in hindsight you realize, "well, of course I should also be able to use it this (other) way."
|
| For example I'm porting a parser engine, implemented in another language, to C++ and it wasn't clear what the hierarchy of objects should be for the purpose of RAII (because there are circular dependencies) (the original implementation language was garbage collected). The original implementation loaded a grammar file and directly created objects with pointers (originally, references) to other objects.
|
| I introduced an intermediate layer where the file is first read into data structs which hold the integer values from the file. So instead of objects pointing to objects, the links are implicit because of things having the same index. The representation of the loaded grammar file is now copyable, moveable, immutable, etc. A side effect is it's trivial to tell whether the file loading code is correct or not when the result is just the same data with structure applied to it, rather than having the added dimension of determining whether the objects in a graph correctly relate to _each other_.
|
| Then I construct the actual parser engine objects from the data representation structs.
True, that didn't in itself solve the RAII-hierarchy problem, but what it did do is make it easier to isolate that problem to just the domain of _how the objects are used_ rather than commingling it with the problem of _how the file is loaded_.
|
| The epiphany I spoke of is that after this refactor, it became clear: the file is arbitrary. For testing, or for use of the parser with a grammar which does not change, I could dispense with the file load step and just encode the grammar directly in value-structs.
|
| Why I think this is significant is that "the way I was trained" to think of making code like this unit-testable is to mock the file reading interface. That's a lot of work for something that's only necessary because of an over-emphasis on objects and _behaviors_ instead of thinking about data and _values_.
| gpderetta wrote:
| Sean Parent has also written a lot on the topic.
| arximboldi wrote:
| Yes! Also the book Elements of Programming by Stepanov has a lot of "value orientation" in it.
| gpderetta wrote:
| Unsurprising as I believe Sean is a "disciple" of Stepanov (I think they worked together at Adobe).
| accountLost wrote:
| See also https://matt.diephouse.com/2018/08/value-oriented-programmin... for a very quick introduction.
| gpderetta wrote:
| Unsurprisingly David Abrahams, before moving to Apple, was an extremely influential member of the C++ Boost community, where value based programming is practiced extensively.
| nafey wrote:
| What's the general opinion on using Maps instead of Structs/Simple Objects for the data containers in languages that allow for either?
| aidenn0 wrote:
| I think for untyped languages like Clojure, you lose nothing. For typed languages that don't support parametric polymorphism, you trade type safety for flexibility and code simplicity. For typed languages that support parametric polymorphism, I don't see the advantage.
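A rough sketch of that trade-off in JavaScript. Since JavaScript has no static types, a runtime check stands in here for what a struct would give at compile time in a typed language; `makeAuthor` and the field names are hypothetical, chosen only for illustration.

```javascript
// Generic map: any key is accepted, so a misspelled key goes unnoticed.
const authorAsMap = { firstName: "Isaac", lastNmae: "Asimov" }; // note the typo in the key
const missing = authorAsMap.lastName; // undefined, and no error is raised

// Struct-like record: a hypothetical constructor that checks the shape up front.
function makeAuthor(fields) {
  const required = ["firstName", "lastName"];
  for (const key of required) {
    if (typeof fields[key] !== "string") {
      throw new TypeError("missing or non-string field: " + key);
    }
  }
  // Copy only the known fields and freeze the result so its shape stays fixed.
  return Object.freeze({ firstName: fields.firstName, lastName: fields.lastName });
}

let rejected = false;
try {
  makeAuthor(authorAsMap); // the same typo is now caught immediately
} catch (e) {
  rejected = e instanceof TypeError;
}

const author = makeAuthor({ firstName: "Isaac", lastName: "Asimov" });
console.log(missing, rejected, author.lastName);
```

The map version accepts anything and stays flexible; the checked version rejects malformed data early at the cost of declaring the shape in advance.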
| yen223 wrote:
| If you know ahead of time the shape of the data you're dealing with, you might as well use structs and reap the benefits of type safety and improved performance
| kaliszad wrote:
| We certainly use some of these principles in OrgPad.com and some of those inspired even the User Experience in a fundamental way. E.g. a bullet point list in a linear medium such as a text or a slide in a presentation is like a star in a graph, where all children have the same weight. The thing is, when people see it like that graphically, they sometimes get ideas they wouldn't have had if they had stared at a long text. Sometimes they figure out that the points are actually not equally weighted, or that there isn't such a clear boundary, and connect some of these children together either by a link or by selecting the same colour to group them.
|
| Btw. we program everything in Clojure + ClojureScript so immutability and the other points are like preaching to the choir.
|
| Not related, I thought Manning would not publish the book. At least that is the last information I saw a few days ago. I thought about buying that book.
| pjmlp wrote:
| > One could argue that the complexity of the system where code and data are mixed is due to a bad design and an experienced OO developer would have designed a simpler system, leveraging smart design patterns.
|
| Indeed he (or she) would have made use of traits/protocols/categories/whatever, to separate behavior from data, while keeping the design extensible (via polymorphism).
|
| This is something I usually find in OOP critics: too much focus on class driven implementations, without spending much time on the other parts of the toolbox.
| danielscrubs wrote:
| The other parts exist in one form or another in non-OOP languages too. Heck, polymorphism is part of type theory, traits exist in OCaml and Haskell...
But arguing about what is and what isn't OOP isn't that productive as no one will agree on any definition. That's why you'll get gut responses about lasagna code where the layering glue is more complex and of bigger proportions than the algorithm itself...
| pjmlp wrote:
| Which is why I find the whole OOP vs FP vs ECS vs ADT discussions a complete waste of time, instead of embracing all ideas as part of the multi-paradigm toolbox that most mainstream languages actually are.
| danielscrubs wrote:
| To a point I agree, but formal verification is a thing and wouldn't handle a normal mainstream language as we know it.
| discreteevent wrote:
| Alan Kay and Rich Hickey discuss the merits of data vs data with interpreters:
|
| https://news.ycombinator.com/item?id=11945722
| ArtWomb wrote:
| >>> Model entities with generic data structures
|
| Which breeds data oriented "anti-patterns" when i/o performance becomes the bottleneck. Focus on hardware. It's almost like you need to work backwards to build scalable algorithms for modern data loads ;)
|
| Scalable Machine Learning & Graph Mining via Virtual Memory
|
| http://poloclub.gatech.edu/mmap/
| mumblemumble wrote:
| This reminds me a bit of some of the ideas that Eric Normand presents in his book, _Grokking Simplicity_.
|
| Which I'd highly recommend. It's aimed at a less experienced audience, so, as someone who's mid-career, I admit I did skim some sections. But, all-in-all, I enjoyed reading his take on how things should be done.
| viebel wrote:
| I know that there are some common topics with Eric's book. Do you think Eric focuses as much as I do on data?
| mumblemumble wrote:
| No, he's much more focused on what he calls actions and calculations.
|
| Perhaps overly so. I'd have liked a bit more focus on data. That could be a by-product of his Clojurist roots.
Data-oriented programming is so integral to Clojure's culture that I'm not sure Clojurists even realize they're doing it half the time.
| hjntmp wrote:
| We should rename object oriented programming to bureaucrat oriented programming. I always think that the reasoning that led to the development of the aberration of OO is the same that creates bureaucratic nonsense. The want of making people replaceable through bureaucracy, so that the programmer as a human being can be removed from the picture, plus all the other bureaucratic thinking nonsense, leaked into the design of the language. It's funny how ridiculous we are, pretending something all the time.
| tomowl wrote:
| Very interesting as an introduction, I think these principles should be easy to follow using something like Rust
| ChicagoDave wrote:
| Wouldn't this be the exact opposite of Domain-driven design and modeling behavior?
|
| Isn't the behavior of a system more critical than its data elements?
| Jarwain wrote:
| Reminds me of the quote:
| > Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious. -- Fred Brooks, The Mythical Man-Month (1975)
|
| I don't think they're inherently contradictory. You can have plain data objects representing the domain, and functions that act on these objects representing the behaviors/actions in the domain. You could include these functions as part of the "class" for these objects, and have them return new instances of the class to maintain immutability.
| tome wrote:
| Cheekily resubmitted, I see! Not that I mind. I think it's a great idea that deserves sharing.
|
| https://news.ycombinator.com/item?id=24682380#24685657
| kensai wrote:
| How did it open a second HN thread?
| dllthomas wrote:
| The URLs differ. This one includes "?essence".
| akst wrote:
| Is it me or does anyone else confuse the term "Data Oriented Programming" with "Data Oriented Design" [1]?
|
| [1]: https://youtu.be/rX0ItVEVjHc
| anaphor wrote:
| You're not the only one. That's what first came to my mind when I saw this.
| conceptoriented wrote:
| _> Data never changes, but we have the possibility to create a new version of the data._
|
| Well, it depends on what you mean by data. To avoid ambiguity it is better to talk about data _values_ and data _objects_ which have different properties. This can be formalized as follows [1]:
|
| o data values are modelled via mathematical tuples - tuples are immutable
|
| o data objects are modelled via mathematical functions (one field is a function from this reference to the field value) - functions are supposed to be mutable
|
| (In reality of course we meet quite different situations, for example, structs are mutable and objects can be immutable.)
|
| [1] Concept-oriented model: Modeling and processing data using functions
| https://www.researchgate.net/publication/337336089_Concept-o...
| shwestrick wrote:
| What do you mean by "functions are supposed to be mutable"?
|
| Perhaps you are just pointing out that the output of the function (and therefore the value of the field...?) will change as the input changes?
|
| If mathematical tuples are immutable, then surely mathematical functions are immutable as well ;)
| prostodata wrote:
| Here is one possible implementation of the concept-oriented model of data for data processing. It heavily relies on functions and operations with functions and is an alternative to purely set-oriented approaches like map-reduce or join-groupby (SQL):
|
| https://github.com/prostodata/prosto - Functions matter!
| conceptoriented wrote:
| A function is a mapping between two sets (of values). This mapping between values is mutable although the values are not.
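One way to picture this claim is to store a mapping as a table. In the sketch below (a `Map`-backed lookup; all names are made up for illustration) the table is updated in place while the values themselves are left untouched. Whether the result counts as the same function with a changed mapping or as a different function is a matter of definition, which the sketch does not settle.

```javascript
// A mapping from inputs to outputs, stored as a table.
const table = new Map([["x1", "y1"]]);

// f looks its argument up in the table; it captures the table itself, not a copy.
const f = (x) => table.get(x);

const before = f("x1"); // "y1"

// Updating the table in place changes what f returns for the same input.
table.set("x1", "y2");
const after = f("x1"); // "y2"

// The values were not modified: the string "y1" is still "y1".
// An immutable alternative copies the table and builds a new function instead.
const table2 = new Map(table).set("x1", "y3");
const g = (x) => table2.get(x);

console.log(before, after, g("x1"), f("x1"));
```

In the first half the lookup behind `f` changes under the same name; in the second half `f` is left alone and `g` is a separate mapping built from a copy.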
| louthy wrote:
| Functions are a mapping between a domain and a codomain, the mapping absolutely isn't mutable, the definition of the function is the relationship between the domains.
|
| If I have a function:
|
| int Add1(int x) => x + 1
|
| I would expect the domain and codomain to be immutable; I would also expect x+1 not to turn into x/2 randomly
| conceptoriented wrote:
| _> the mapping absolutely isn't mutable_
|
| Assume f: X -> Y. We can now map x_1 to y_1: f(x_1)=y_1. And then change this same function by mapping x_1 to y_2: f(x_1)=y_2. Thus we can easily modify functions. Moreover, we do it constantly when we modify object fields in OOP. It is probably easier to comprehend if a function is represented as a table which we modify.
|
| In contrast, we cannot modify data values (mathematical tuples). Say, x=42+1 means that a new value 43 is created rather than the existing value 42 being modified.
|
| _> I would expect the domain and codomain to be immutable;_
|
| No. Domains, codomains and any set can well be modified by adding or removing tuples. What is immutable are values (in the sets).
| louthy wrote:
| > Assume f: X -> Y. We can now map x_1 to y_1 f(x_1)=y_1. And then change this same function by mapping x_1 to y_2: f(x_1)=y_2
|
| They would be different functions, the first being the identity function: x => x, the second being: x => x + 1
|
| > Thus we can easily modify functions. Moreover, we do it constantly when we modify object fields in OOP
|
| This isn't the case. A field with a different value in it just means the object is a different value. If the object is passed to a static function, then the domain is the full set of possible values that the object can hold (this is known as a product-type, you multiply the total possible values of each of its component parts to find out the size of the domain).
|
| If it's passed to a method then there's an additional implicit argument: `this`, which is the same as a static function with an additional argument that takes the object. The function is the same.
|
| Global (or even free) variables should also be considered part of the domain: i.e. it's akin to implicit arguments that are being passed to the function.
|
| > No. Domains, codomains and any set can well be modified by adding or removing tuples.
|
| This also isn't the case. If a function is defined that takes an integer and returns a boolean value: Int -> Bool then the domain is the set of integers, the co-domain is True and False. You can't pass a tuple to a function that takes an Int and therefore dynamically increase the size of the domain. Even in dynamic languages the codomain is effectively `top`, the type that holds all values, and therefore the domain is all values and the codomain is all values, which makes them immutable still.
|
| Now maybe I am misunderstanding you, but this is how all of the mainstream statically and dynamically typed languages work. Perhaps there's some edge-case language that I'm missing here that allows types to be extended, which would be interesting in its own right.
| kingdomcome50 wrote:
| Can you expand upon this? Perhaps the difference between "re-mapping" the function:
|
| f(x_1)=y_2
|
| and "re-mapping" the value:
|
| x=42+2
|
| How is the former different than the latter? And by what mechanism is the former achieved? I understand what you are saying, but _how_ does one simply "change this same function"? Redefine it?
|
| To be clear, I'm not suggesting you are incorrect. I just don't fully understand what you are getting at.
| dpc_pw wrote:
| So... is Data Oriented something new?
|
| Storing fields in a map leads me to believe this is not Data Oriented Design (DOD). And I completely reject this idea (fields in maps).
The "flexibility" there is hardly useful, and could be achieved with defined shapes (types) in modern statically typed languages without all the downsides.
|
| "Separate code from data" is a big core belief I share with this article, but the rest doesn't seem like a good idea / novel / important.
| tabtab wrote:
| I've always liked the idea of "table oriented programming" where more detailed schema info is used to do most of the CRUD and UI work. In my experiments, the tricky part is exceptions to the patterns. You always need to be able to tweak things imperatively (via code). But the attributes can still do roughly 90% of the job.
|
| My latest approach to get enough tweakability is what I tentatively call "fractal rendering events" or "staged rendering". When rendering HTML or SQL, you need event "hooks" for the different stages. Level 1 events may override/alter field attributes. Level 2 events may override/alter the HTML (or SQL) generated for the field based on the Level 1 values. Level 3 events may override/alter the HTML of page sections (or entire SQL clauses). Level 4 is overriding/altering the entire page (or final SQL statement).
|
| In other words, the schema provides drafts, which can then be adjusted along the way through event hooks. The granularity of what's tweaked goes up with each stage.
|
| But managing that many potential events needs something more powerful than a file-based system. It may be better to manage such source code in an RDBMS so you can search, sort, and group by different factors at different times rather than hard-wire in one viewpoint as file systems do.
|
| But current IDEs are not ready for this. I do believe it's the future, though. File trees are too limiting.
|
| Consider this: it's common for a non-coding analyst to want to change a field label, page title, max field length, or "required" status.
If they could do it in the schema info (data dictionary), then they wouldn't have to involve the coders. Whether the data dictionary is referenced directly or generates scaffolded code is a stack-specific or shop-specific choice. Minor things like this shouldn't involve a lot of effort.
___________________________________________________________________
(page generated 2020-10-05 23:01 UTC)