[HN Gopher] Principles of Data Oriented Programming
       ___________________________________________________________________
        
       Principles of Data Oriented Programming
        
       Author : viebel
       Score  : 307 points
       Date   : 2020-10-05 11:59 UTC (11 hours ago)
        
 (HTM) web link (blog.klipse.tech)
 (TXT) w3m dump (blog.klipse.tech)
        
       | tgb wrote:
       | What's the solution to referring to other "objects" in this
       | setup? For example, an Author has a list of Books that they
       | wrote. I see several possible ways to do this, but they all seem
       | to have downsides.
       | 
       | 1) The author map has a "books" entry that is a list of maps
       | containing the book data. But multiple authors might have written
       | the same book, so does the book data get copied there?
       | 
       | 2) The author contains a "books" entry that is a list of Book
       | IDs, like foreign keys in a database. But how about immutability?
       | If you want to get the book addressed by the ID, you need to look
       | it up somewhere and are you guaranteed the object you get from
       | the lookup is still the same one as it was earlier?
       | 
       | 3) There is a "Author -> Book" map somewhere that you can pass an
       | author and it gives you the list of books they wrote. Not sure
       | about this one.
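
One way to read option 2, as a minimal sketch rather than the article's answer: keep authors and books in a single immutable catalog value and store only book IDs on the author. A lookup is then always made against one specific catalog value, so the result cannot change underneath the caller; a newer catalog is simply a different value. All names here (`catalog`, `booksOf`, `withBookTitle`) are hypothetical.

```javascript
// Hypothetical catalog: plain maps, frozen so nothing can mutate them in place.
const catalog = Object.freeze({
  booksById: Object.freeze({
    b1: Object.freeze({ title: "Book One" }),
    b2: Object.freeze({ title: "Book Two" }),
  }),
  authorsById: Object.freeze({
    a1: Object.freeze({ name: "Ann", bookIds: Object.freeze(["b1", "b2"]) }),
    a2: Object.freeze({ name: "Bob", bookIds: Object.freeze(["b2"]) }),
  }),
});

// Resolve an author's books against one specific catalog value.
function booksOf(cat, authorId) {
  return cat.authorsById[authorId].bookIds.map((id) => cat.booksById[id]);
}

// "Changing" a title builds a new catalog; the old one is untouched.
function withBookTitle(cat, bookId, title) {
  const book = Object.freeze({ ...cat.booksById[bookId], title });
  const booksById = Object.freeze({ ...cat.booksById, [bookId]: book });
  return Object.freeze({ ...cat, booksById });
}

const next = withBookTitle(catalog, "b2", "Book Two, 2nd ed.");
console.log(booksOf(catalog, "a2")[0].title); // old snapshot still sees the old title
console.log(booksOf(next, "a2")[0].title);    // new snapshot sees the new title
```

The shared book `b2` is stored once and referenced by both authors, which avoids the copying problem of option 1.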
        
         | viebel wrote:
         | It is going to be the theme of Chapter 3 of my DOP book. Stay
         | tuned.
        
       | mtVessel wrote:
       | In the third example, is function isProlific missing a parameter?
        
       | 02020202 wrote:
       | simple read. good points. this is very useful in event sourcing
       | or in general for sensitive data where mutability might become an
       | issue (i.e. financial transactions). also i think this maps to
       | value objects perfectly.
       | 
       | i have implemented ES in a few projects and am now writing an ES
       | library, since I think it can be done much simpler than my
       | previous implementations that felt too verbose, and I will take
       | some pointers from this into account.
        
         | viebel wrote:
         | Could you share an example of how to apply the principles of DO
         | in the context of event sourcing?
        
           | dathinab wrote:
           | An entity component system (ECS), as used by many games, would
           | probably fall under DO.
           | 
           | Though I would argue that #5 isn't part of DO. Especially
           | given that e.g. `{ "a" : "b" }` is _not_ necessarily a literal
           | but potentially an expression (depending on arbitrary
           | language definition aspects). On the other hand, e.g.
           | `vec![ "a" ]` in Rust is by definition not a literal, but
           | with respect to the idea behind principle #5 it is as good as
           | `[ "a" ]` in JavaScript.
           | 
           | EDIT: Lastly, in some contexts something like a builder
           | pattern can be the better way to get "non-verbose creation"
           | and "data is explorable in any context".
        
             | viebel wrote:
             | What do you mean when you write that `{ "a" : "b" }` is not
             | necessarily a literal but potentially an expression?
        
               | dathinab wrote:
               | It depends on how a language defines its syntax.
               | 
               | Oftentimes literals are only things like `"string"`, `0`
               | and so on.
               | 
               | But things like `[ 1,2]` would be an array expression
               | where each "entry" is syntax-wise an expression (and any
               | literal is an expression itself).
               | 
               | I don't know if JavaScript specifically defines
               | object literals or object expressions; in the end it
               | depends on what you define a literal as.
               | 
               | Lastly, depending on the language, something like `[ 1,2]`
               | might literally de-sugar to something like the following
               | pseudo code: `var tmp = Array.new(capacity=2);
               | tmp.push(1); tmp.push(2)`.
               | 
               | So a better formulation would be something like "data
               | should be creatable without explicitly doing any function
               | calls, variable assignments or similar. Creation must not
               | depend on implicitly captured data".
        
               | viebel wrote:
               | I totally agree with "data should be creatable without
               | explicitly doing any function calls, variable assignments
               | or similar."
               | 
               | Could you explain what you mean by "Creation must not
               | depend on implicitly captured data"?
        
               | dathinab wrote:
               | Let's say you have a language which has some form of macro
               | system or similar to make it easier to create new things
               | which look like literal expressions.
               | 
               | E.g. instead of `a = [1,2]` you have `a = vec![1,2]`
               | which de-sugars to `a = Vec::with_capacity(2); a.push(1);
               | a.push(2);`.
               | 
               | Now custom data structures can define their own macros
               | like that, e.g. `skip_list![1,2]`.
               | 
               | Which would all be fine. But what if
               | `bad_skip_list![1,2]` now accesses a thread-local variable
               | (or other implicitly provided data) and adds that, too?
               | 
               | Now `bad_skip_list![1,2]` might not be equal to
               | `bad_skip_list![1,2]` defined somewhere else. Which is
               | against the ideas behind rule #5.
               | 
               | You should be able to copy-paste the literal-like creation
               | of data to any place (e.g. a unit test) and get the same
               | result.
               | 
               | EDIT: If I remember correctly you could override parts of
               | `Array.prototype` and array construction in JavaScript to
               | break #5 for JavaScript for things like `[1,2,3]`, but
               | I'm not too sure about that anymore.
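
The point about implicitly captured data can be sketched in JavaScript. This is an illustration under assumptions, not code from the thread: `ambientTag`, `makeList` and `badMakeList` are hypothetical names standing in for the thread-local variable and the two macros.

```javascript
// Hypothetical hidden state that a constructor helper might capture.
let ambientTag = "prod";

// Depends only on its arguments: same input, same value, anywhere.
function makeList(...items) {
  return [...items];
}

// Silently appends hidden state: the "same" expression can differ by context.
function badMakeList(...items) {
  return [...items, ambientTag];
}

const listA = badMakeList(1, 2);
ambientTag = "test"; // e.g. a unit test runs under a different setting
const listB = badMakeList(1, 2);

console.log(JSON.stringify(makeList(1, 2)) === JSON.stringify(makeList(1, 2))); // true
console.log(JSON.stringify(listA) === JSON.stringify(listB));                   // false
```

Copying `badMakeList(1, 2)` into another context gives a different value, which is exactly what the copy-paste test above is meant to rule out.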
        
             | petermcneeley wrote:
             | I think DOD of games is only tangentially related to this
             | post.
             | 
             | https://en.wikipedia.org/wiki/Data-oriented_design
        
               | dathinab wrote:
               | The "DOD of games" is just a "special case" of more
               | generic data-oriented design/programming.
               | 
               | E.g. ECS is a direct consequence of separating data from
               | code.
               | 
               | Furthermore, for it to work well you normally also want
               | #2, #3 and #4.
               | 
               | Sure, you can build an ECS without #2, #3 and #4, but it
               | makes it more complex.
               | 
               | Lastly, in an ECS you split up components into many parts,
               | each having their own data, and you normally want the idea
               | behind #5 to apply to each of the parts.
               | 
               | EDIT: Well ok, whether #2 makes any sense at all depends
               | on the language you use. And using a language where #2
               | makes no sense can be just as reasonable a choice. I
               | would only apply #2 IF it makes sense for your language of
               | choice.
        
       | slifin wrote:
       | This talk kind of alludes to the data driven stuff at the end:
       | https://youtu.be/vK1DazRK_a0?t=774
       | 
       | It's a shame the code examples are just the fp and oop solutions
       | not how it could look in a data oriented way
        
         | joshlemer wrote:
         | In his refactoring of the JS game, I was with him until about
         | 80% through his refactor, but in the change he describes from
         | about 50:50 - 51:00, he's actually changing the meaning of the
         | game. Instead of printing after each turn, he's running all
         | turns, then printing them all out at once. This is a very
         | different behaviour and I think it's glossing over / copping
         | out from the difficulties of applying FP in side-effectful
         | codebases.
         | 
         | I mean in this case, he's transformed the program from running
         | with a constant amount of memory (only need as much memory as
         | it takes to run a turn) to having to store the entire history
         | in memory. So this would be really bad if there were many
         | turns, or if the program was going to run indefinitely,
         | responding to user input, etc.
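
The memory trade-off described here can be sketched with a generator. This is not the code from the talk: `playTurn`, `runAll` and `runLazily` are hypothetical names, and the "game" is reduced to a turn counter.

```javascript
// Hypothetical pure step function: a turn just increments a counter.
function playTurn(state) {
  return { turn: state.turn + 1 };
}

// Collects every state first: memory grows with the number of turns.
function runAll(initial, turns) {
  const history = [initial];
  for (let i = 0; i < turns; i++) {
    history.push(playTurn(history[history.length - 1]));
  }
  return history;
}

// Yields one state at a time: the caller can print and discard each one.
function* runLazily(initial, turns) {
  let state = initial;
  for (let i = 0; i < turns; i++) {
    state = playTurn(state);
    yield state;
  }
}

for (const state of runLazily({ turn: 0 }, 3)) {
  console.log(`after turn ${state.turn}`); // side effect stays at the edge
}
```

The step function stays pure in both versions; only the lazy one keeps the print-after-each-turn behaviour without holding the whole history.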
        
       | samorozco wrote:
       | It aint pretty but if it works it works.
        
       | piokoch wrote:
       | Principle #2 always bothers me.
       | 
       | "Model the data part of the entities of your application using
       | generic data structures (mostly maps and arrays)."
       | 
       | and example
       | 
       | function createAuthorData(firstName, lastName, books) {
       |   return {firstName: firstName, lastName: lastName, books: books};
       | }
       | 
       | For a simple, obvious object like "person" it might work, but for
       | a complicated domain object with many other composed objects this
       | starts to be a pain.
       | 
       | I am not feeling ready to memorize all the field names, I prefer
       | to have an object with documentation for each field.
        
         | andi999 wrote:
         | Usually a map can be fine, but isn't it a maintenance nightmare?
         | With an object, at least the compiler will complain if an
         | attribute is not found after a change, whereas with a map this
         | leads to runtime errors or bad behaviour.
        
           | acarabott wrote:
           | Adding a version field can be useful to avoid this.
           | Particularly if your entire state is stored in one object.
        
             | hakre wrote:
             | Yes, personally I'm (generally) against version fields,
             | however in the OPs meaning as I read it, if you add the
             | version field it breaks the value (the version field
             | invalidates value comparison for equality) and therefore
             | will end up adding complexity. This may go contrary towards
             | the topic, as OP clearly states major goal is to reduce
             | complexity.
             | 
             | So adding a value and make it (inherently) incompatible in
             | the value system breaks the benefits of a couple of the six
             | points outlined in the OP (given the version field
             | suggestion).
             | 
             | Just saying. Your mileage may vary. But again, introducing
             | version attributes is most of the time (and that is a
             | warning) _increasing_ complexity.
             | 
             | One of the articles referred to by the op is [out-of-the-
             | tar-pit] which is fundamentally about complexity and WTF it
             | is paradigms, on syntax level and language support. A
             | version field is a counter on a higher level on top of
             | any of it (and therefore in the off-topic domain
             | already to a larger extent) and also ruins any of the
             | value comparison ability (adding the version field exploits
             | the value inequality in DO as per OP, making it part of the
             | versioning system), introducing metadata and IMHO ruining
             | DO.
             | 
             | If you need to encapsulate state to take a short-cut,
             | introduce state. Don't ruin value(s).
             | 
             | Just my 2 cents.
             | 
             | [out-of-the-tar-pit]:
             | https://raw.githubusercontent.com/papers-we-love/papers-
             | we-l... Moseley/Marks 2006
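
The objection that a version field breaks value comparison can be sketched as follows. This is one reading of the comment, with hypothetical names (`sameValue`, `stored1`, `stored2`); the equality helper only handles plain objects, arrays and primitives.

```javascript
// Structural equality for plain data (objects, arrays, primitives only).
function sameValue(a, b) {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && sameValue(a[k], b[k]));
}

const v1 = { firstName: "Isaac", lastName: "Asimov" };
const v2 = { firstName: "Isaac", lastName: "Asimov" };
console.log(sameValue(v1, v2)); // true: same data, same value

// A version counter inside the value makes equal data compare unequal.
console.log(sameValue({ ...v1, version: 1 }, { ...v2, version: 2 })); // false

// Alternative: keep the version next to the value, not inside it.
const stored1 = { version: 1, value: v1 };
const stored2 = { version: 2, value: v2 };
console.log(sameValue(stored1.value, stored2.value)); // true
```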
        
             | andi999 wrote:
             | Agree. But then every function using the map has to query
             | the version field?
        
         | jayd16 wrote:
         | You can still make helper methods that get the field data for a
         | person from the Person class if you really want. It's not the
         | end of the world but you lose the benefits when you're
         | operating on a single large object in that style.
         | 
         | The reason you would use this setup is so you have all your
         | data in contiguous arrays you can SIMD through quickly. If
         | you're doing a lot of operations where n is low then it's not
         | that useful (and even detrimental) to orient your data array-
         | wise.
        
         | odshoifsdhfs wrote:
         | If you use simple data structures without logic, it is
         | basically what he is saying. Basically `createAuthorData` is a
         | data structure constructor. There are some advantages by using
         | _.pick or something in JS land, but these don't work for other
         | more strictly typed languages.
         | 
         | Also, I would be careful with using it so much as his example:
         | 
         | ```
         | function createAuthorData(firstName, lastName, books) {
         |   return {firstName: firstName, lastName: lastName, books: books};
         | }
         | 
         | function fullName(data) {
         |   return data.firstName + " " + data.lastName;
         | }
         | 
         | function createArtistData(firstName, lastName, genre) {
         |   return {firstName: firstName, lastName: lastName, genre: genre};
         | }
         | ```
         | 
         | you can also do `fullName({firstName: 'Elmond', not:'really'})`
         | and have a hard-to-catch bug (which an actual type system would
         | catch).
         | 
         | You can either use a Protocol for this (PersonName) or a base
         | class (though those are going out of fashion nowadays)
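
The bug described above is easy to reproduce, and a runtime check is one possible stopgap when no type system is available. This is a sketch: `requireStrings` and `checkedFullName` are hypothetical helpers, not part of the article.

```javascript
function fullName(data) {
  return data.firstName + " " + data.lastName;
}

// The wrong key goes unnoticed; the result is just a wrong string.
console.log(fullName({ firstName: "Elmond", not: "really" })); // "Elmond undefined"

// Hypothetical guard: check required string fields before using the map.
function requireStrings(data, keys) {
  for (const key of keys) {
    if (typeof data[key] !== "string") {
      throw new TypeError(`missing or non-string field: ${key}`);
    }
  }
  return data;
}

function checkedFullName(data) {
  const { firstName, lastName } = requireStrings(data, ["firstName", "lastName"]);
  return `${firstName} ${lastName}`;
}

try {
  checkedFullName({ firstName: "Elmond", not: "really" });
} catch (err) {
  console.log(err.message); // "missing or non-string field: lastName"
}
```

The check moves the failure to the call site, but it still happens at run time rather than at compile time.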
        
           | pjmlp wrote:
           | Or when structural typing is available,
           | 
           |     type example = { firstName : string; lastName : string};;
           |     let name x = x.firstName;;
           |     name {firstName = "Joe"; lastName = "User"};;
           | 
           | Or to make it more specific
           | 
           |     let name { firstName : string} = firstName;;
        
         | mumblemumble wrote:
         | > for a complicated domain object with many other composed
         | objects this starts to be a pain
         | 
         | It may be that complicated domain objects with many other
         | composed objects are not compatible with this style of
         | programming.
         | 
         | Whether or not that's a good thing could be the subject of an
         | interesting discussion. For one thing, negative experiences
         | with the "large complex classes" approach to domain modeling are
         | a major factor in the backlash against object-oriented
         | programming. Lately I've been trying to familiarize myself more
         | with the early literature of OOP, and I'm discovering that even
         | OOP's early pioneers had already had bad experiences with it,
         | and would warn against the temptation to do things that way.
         | OTOH, there's undeniably a certain attractiveness to it,
         | otherwise it wouldn't be so common.
        
         | weavejester wrote:
         | > _I am not feeling ready to memorize all the field names, I
         | prefer to have an object with documentation for each field._
         | 
         | Using data doesn't preclude documenting fields, particularly if
         | those fields are assigned a unique name.
         | 
         | For example, in Clojure you'd namespace the keys, so instead of
         | "lastName" one might write :author/last-name. This
         | keyword can then be documented or even assigned a type/spec.
         | 
         | (That said, you can't currently assign a docstring to a keyword
         | in Clojure without the use of a third party library, or using a
         | comment or external document. Hopefully a future version of
         | Clojure will make this part of the core language.)
        
         | hakre wrote:
         | The author argues that DO helps reduce complexity, but
         | also notes its price and reminds us that it's in the eye
         | of the beholder to decide whether or not it is of benefit.
         | 
         | Many of the early PHP applications were written in a style
         | using generic data structures (that is, arrays in PHP) with
         | functions dealing with all of these.
         | 
         | There is no free lunch.
         | 
         | If you already have domains you can model in, the strategy to
         | reduce complexity is perhaps more from the DDD book which looks
         | like a higher level concept to me and therefore may be more
         | fitting for higher complexity levels.
         | 
         | (but again, one can certainly shoot oneself in the foot with
         | DDD as well)
         | 
         | What I like about the DO thing is it maps well on simple REST
         | APIs sending and receiving JSON text. Quite popular these days
         | to say the least if not mentioning Serverless.
         | 
         | A similar trend can also be seen in structured logging.
         | 
         | These systems are often distributed and complex.
         | 
         | As it bothers you in your case, I would tend to say, it's
         | better to stick to a domain if there is one. Across boundaries
         | of domains, values to emit and receive data can work out very
         | well though I can imagine. As so often, it depends.
        
       | OneGuy123 wrote:
       | In the context of C# this becomes a real pain.
       | 
       | The issue is arrays/lists.
       | 
       | Because in C# an int[] array or List<int> is a reference.
       | 
       | So even if you put an int[] in a struct, this int[] will NOT be
       | copied when you assign an instance of this struct to another
       | instance: you will get a shared reference, which makes this
       | annoying since you cannot do a proper deep copy with the language
       | itself: you are forced to use a reference copy instead of a deep
       | copy.
       | 
       | In C++ this is easy because the types differentiate between
       | pointer and non-pointer explicitly + you can overload the
       | assignment operator.
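
The comment is about C#, but the same kind of reference sharing can be shown in JavaScript, the language of the article's examples. This is an analogy under that assumption, not C# code.

```javascript
const original = { name: "sample", values: [1, 2, 3] };

// Shallow copy: the top-level object is new, but `values` is shared.
const shallow = { ...original };
shallow.values.push(4);
console.log(original.values.length); // 4: the original saw the change

// Copying the nested array explicitly breaks the sharing.
const deep = { ...original, values: [...original.values] };
deep.values.push(5);
console.log(original.values.length); // still 4
```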
        
         | pjmlp wrote:
         | Unless one uses a struct based array instead.
         | 
         | https://docs.microsoft.com/en-us/dotnet/csharp/programming-g...
         | 
         | https://docs.microsoft.com/en-us/dotnet/api/system.memory-1?...
        
       | mumblemumble wrote:
       | To principle #2:
       | 
       | The author acknowledges this as being incompatible with static
       | typing, but I'm not so sure. Is it that, or is it that it's
       | incompatible with contemporary static languages?
       | 
       | In FP, we already have a concept of type-safe heterogeneous lists
       | and maps, and even some clever implementations in languages like
       | Java[1]. The ergonomics are often less-than-stellar, but I'm
       | pretty sure that's something a new language could fix with some
       | syntactic sugar.
       | 
       | There is also the data frame abstraction (like you see in
       | Pandas), which is typically implemented on top of dynamic typing,
       | but major implementations often rely on static typing behind the
       | scenes to achieve efficiency. There are also projects like
       | Frameless[2], which implements a statically typed interface over
       | a dynamically typed dataframe package.[3] I'm guessing, again,
       | that careful language design could get us something similar, but
       | with better ergonomics.
       | 
       | And I'd be happy with that. I've been pulling away from static
       | languages lately, and a big part of that is that I really like
       | how some of the dynamic languages let me model my data _as data_.
       | It's enough of a complexity saver to feel like a net win, even
       | at the cost of some performance and static verification.
       | 
       | [1] For example: https://github.com/palatable/lambda#hlist
       | 
       | [2] https://github.com/typelevel/frameless
       | 
       | [3] Which is itself, notably, implemented in a static language.
       | Spark also has a statically typed version of the API, but its
       | usage is not recommended for several reasons, one of which is
       | performance. That's something that all us static typing fans
       | should really stop to think about for a bit.
        
         | fauigerzigerk wrote:
         | I think a TypeScript style structural type system could work.
        
           | meheleventyone wrote:
           | Be careful because there are perf issues if you are using
           | parametric polymorphism. Monomorphic functions are preferable
           | but the real problems occur once you get past the inline
           | cache's maximum number of 'shapes'. This obviously applies to
           | regular JS as well.
        
             | fauigerzigerk wrote:
             | Perhaps I misunderstand, but wouldn't that only apply if
             | you attach functions to objects? I suppose you wouldn't do
             | that if you follow data oriented programming principles.
        
               | meheleventyone wrote:
               | Nope. It's about the different types passed into a
               | function parameter. For optimization, JS engines look at
               | the differing shapes of JS objects, which are determined
               | by the order and type of each member.
               | 
               | So: `{ a: 6, b: 7 }` is different to `{ b: 7, a: 6 }`, and
               | `{ a: "six", b: "seven" }` is different to both.
               | 
               | This is done so the member can be looked up quickly by
               | offset. Functions then have an inline cache that stores
               | the shapes the function has seen. If the function is
               | called monomorphically it will only ever see one shape
               | and hit the fastest path. If it is called with up to
               | three shapes (in V8) it will be pretty quick. Once past
               | three the cache falls through to a global table and is
               | dog slow.
               | 
               | This matters if you are using structural typing as you
               | are still creating different shapes to be passed into the
               | same function(s).
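
One way to act on this, as a sketch: build objects through a single helper so that every instance gets the same keys in the same order. Engine shapes cannot be observed directly from JavaScript, so the example only shows key order; whether it actually helps is an assumption that depends on the engine. `makePoint` and `sum` are hypothetical names.

```javascript
// Hypothetical helper: always creates the same keys in the same order.
function makePoint(fields) {
  return { a: fields.a, b: fields.b };
}

function sum(p) {
  return p.a + p.b;
}

const p1 = makePoint({ a: 6, b: 7 });
const p2 = makePoint({ b: 7, a: 6 }); // input order no longer matters

console.log(Object.keys(p1).join(",")); // "a,b"
console.log(Object.keys(p2).join(",")); // "a,b"
console.log(sum(p1) + sum(p2));         // 26
```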
        
               | fauigerzigerk wrote:
               | I see. Thanks for explaining. That's very good to know!
        
         | adamnemecek wrote:
         | None of these really apply in this case. In data oriented
         | programming, the data would not be stored in heterogeneous lists
         | but each element would be stored in a different, homogeneous
         | array and you would tie these together using the same entity
         | ID.
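
The layout described here (one homogeneous array per field, tied together by entity ID) can be sketched with typed arrays. The component names and the fixed capacity of 4 are assumptions made for illustration.

```javascript
// Hypothetical component stores: one homogeneous array per field,
// indexed by entity ID.
const positionX = new Float64Array(4);
const positionY = new Float64Array(4);
const velocityX = new Float64Array(4);
const velocityY = new Float64Array(4);

function spawn(id, x, y, vx, vy) {
  positionX[id] = x;
  positionY[id] = y;
  velocityX[id] = vx;
  velocityY[id] = vy;
}

// A "system" walks the arrays in order instead of visiting objects.
function step(count, dt) {
  for (let id = 0; id < count; id++) {
    positionX[id] += velocityX[id] * dt;
    positionY[id] += velocityY[id] * dt;
  }
}

spawn(0, 0, 0, 1, 2);
spawn(1, 10, 10, -1, 0);
step(2, 0.5);
console.log(positionX[0], positionY[0]); // 0.5 1
console.log(positionX[1], positionY[1]); // 9.5 10
```

Note that this style updates data in place, which is a different trade-off from the immutable maps in the article.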
        
           | skohan wrote:
           | Even with this type of design, it's possible to implement in
           | a typesafe way. I have seen clever ECS systems accomplish
           | this
        
             | adamnemecek wrote:
             | Maybe but ECS does in some sense circumvent types.
        
               | hansvm wrote:
               | In some sense. You could build a type system around "has
               | a <foo>" relationships instead of "is a <foo>" and get
               | something pretty cohesive out of it that closely aligns with
               | ECS.
        
           | mumblemumble wrote:
           | That is not quite what the article is talking about.
           | 
           | There's a bit of a terminology collision here. What the
           | article is talking about is not Data-Oriented Design, the
           | practice of organizing your data for efficient processing.
           | It's proposing a separate (but not incompatible) concept of
           | organizing your code for easier maintainability. For example,
           | the sample code (which appears to be JavaScript) is not doing
           | ECS at all. It's almost exclusively doing the data modeling
           | by creating one heterogeneous map per entity.
        
       | rlogical wrote:
       | Great post..
        
       | jmull wrote:
       | I don't think these principles add up to something useful. It's
       | not complete and I think some of the principles don't align that
       | well to the problem space.
       | 
       | The big one: "Data is immutable". The problem here is that data
       | _isn't_ actually immutable (generally) and mutability isn't
       | actually the problem. The problem is unmanaged references or
       | other dependencies on the mutable data. The "source of truth"
       | becomes muddled which creates the problem of how to keep the
       | various instances in sync (or otherwise handle cases where they
       | are/become out-of-sync). Immutability is a very useful tool since
       | you can have any kind of dependency (direct, indirect,
       | implied, etc.) and there's no worry. But you still need a
       | mechanism to manage mutating data. Maybe it comes out somewhere,
       | but the principles of DO don't cover it, which is a rather
       | serious omission.
       | 
       | Also, principle 2 isn't really a principle. I think what it's
       | getting at is that you don't really know the precise type of your
       | data, over time, in a distributed system, so it's good to include
       | the flexibility to handle that. That makes sense to me. but
       | generic data structures aren't necessarily always the right way
       | to handle that.
        
         | wiremine wrote:
         | > Immutability is a very useful tool since you can have any
         | kind of dependency (direct, indirect, implied, etc.) and
         | there's no worry. But you still need a mechanism to manage
         | mutating data.
         | 
         | I agree. For me it feels like immutable data is a tool, not a
         | universal principle. For example, in low-level C programming or
         | direct control of a register in an embedded context, immutable
         | data isn't a principle, it's just another approach.
         | 
         | What I mean by "universal" is something like SOLID. [1] (Most
         | people only use SOLID for OOP, but Uncle Bob makes it clear in
         | Clean Architecture that he thinks SOLID is universal, and not
         | limited to OOP.)
         | 
         | That said: it does feel like immutable data should be the
         | _default_ approach in many situations. But that's really hard
         | to do in most languages.
         | 
         | [1] https://www.amazon.com/Clean-Architecture-Craftsmans-
         | Softwar...
        
         | ummonk wrote:
         | I think the principles do basically summarize data oriented
         | programming. Particularly, principle 1 is important since it is
         | the exact opposite of OOP's insistence on encapsulation, and is
         | possibly the biggest reason DOP works so much better than OOP.
        
           | joe_the_user wrote:
           | Evidence that "DOP works so much better than OOP" is scant
           | imo.
           | 
           | The rise and fall of paradigms that present themselves as
           | panaceas is instructive. You have "structured programming",
           | "object oriented programming", "functional programming" and
           | now "data oriented programming".
           | 
           | What I'd like to see is paradigms paired with "where this
           | works well" rather than paradigms sold based on "this will
           | solve the software crisis", "this works better(unqualified by
           | when)" and "if you're not doing this, you're doing it wrong".
           | The latter two claims leave a bad taste in the mouth of the
           | casual users/observers, who gradually morph into active
           | critics and sink the paradigm, wasting the good and useful
           | parts of each (OOP has a vast universe of haters because it's
           | the most successful of the panacea-paradigms so far, and as a
           | panacea there's much to hate, but still).
        
         | hjntmp wrote:
         | Immutability is not the fact that something can not change in
         | this case. It has more to do with the identity of a value.
         | Every time you change anything inside the data structure you
         | get a new reference, that is it!
        
           | hjntmp wrote:
           | Think of it as versioning for your data. So instead of
           | referring to some data structure by its general concept, with
           | immutable data structures we are a lot more specific with
           | respect to the identity represented by the name: here the
           | name represents a version.
           | 
           | This is useful because, for example, you stop having the
           | "unmanaged references" you were talking about: since you
           | are now pointing to a version of the data and not the
           | data itself, you can be sure of what you are talking about.
        
             | hjntmp wrote:
             | Also remember that the promise here is that the data you are
             | talking about won't suddenly change underneath you, not that
             | the reference "is up to date".
             | 
             | It is not a solution for change in time; it is a solution
             | for taking change in time out of the equation when we don't
             | need to talk about that. With immutable data structures,
             | when we talk about data we are talking about exactly that,
             | and time is taken out of the picture. It's called immutable
             | because now you are talking about facts and not the
             | representation in time of those facts. Because we are now
             | talking in versions, mutating the data does not make sense.
             | The thing is, this data is a snapshot, so you are not
             | guaranteed to have the most up to date snapshot. And that's
             | ok, because here we deliberately take change in time out of
             | the question in order to be able to talk more precisely
             | about the data. Tracking change in time is another story.
             | 
             | So for example you could have things like React. There you
             | have snapshots of the world updating. When you talk about
             | the data it is immutable, but then you change it and update
             | the mutable variable where you are keeping change in time.
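
The split between immutable snapshots and a single mutable variable that tracks change in time can be sketched as follows. `addTodo` and `current` are hypothetical names; this is not React's API, only the general idea.

```javascript
// Each state is a frozen snapshot; updates return a new one.
function addTodo(state, title) {
  return Object.freeze({ todos: Object.freeze([...state.todos, title]) });
}

// The only mutable thing: a variable that points at the latest snapshot.
let current = Object.freeze({ todos: Object.freeze([]) });

const seenEarlier = current;          // someone holds on to this version
current = addTodo(current, "write");  // time moves on: new version
current = addTodo(current, "review");

console.log(seenEarlier.todos.length); // 0: the old snapshot never changed
console.log(current.todos.length);     // 2: the variable tracks the newest one
console.log(seenEarlier === current);  // false: a change means a new reference
```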
        
         | lacrimacida wrote:
         | > I don't think these principles add up to something useful.
         | It's not complete and I think some of the principles don't
         | align that well to the problem space.
         | 
         | These are part of a book that is being written at the
         | moment as far as I can tell:
         | 
         | "This article is an excerpt from my upcoming book about Data
         | Oriented Programming. The book will be published by Manning,
         | once it is completed (hopefully in 2021). "
        
         | adamkl wrote:
         | Honestly, I think this series of blog posts could have had a
         | more complete title: "Principles of Data Oriented Programming
         | _(aka, Idiomatic Clojure)_ "
         | 
         | All of these ideas (and I think they are good ones) are
         | inspired heavily by Rich Hickey's talks and rationale behind
         | developing the Clojure language (the author of the post states
         | as much). And while you _can_ use these techniques in other
         | languages/paradigms/problem domains, they are really intended
         | to work well inside the constructs of Clojure, and when applied
         | to "information-driven situated programs" [0] (read business
         | applications with dynamic requirements).
         | 
         | As for some of the shortcomings you mentioned:
         | 
         |  _" But you still need a mechanism to manage mutating data"_
         | 
         | Clojure supports this through managed reference types like
         | atoms, which are updated with compare-and-set rather than
         | locks. [1]
         | 
         |  _" I think what it's getting at is that you don't really know
         | the precise type of your data, over time, in a distributed
         | system, so it's good to include the flexibility to handle that.
         | That makes sense to me. but generic data structures aren't
         | necessarily always the right way to handle that."_
         | 
         | Clojure attempts to bridge the gap between generic data
         | structures and strongly-typed constructs using run-time
         | specifications. [2]
         | 
         | I mean, the ideas presented here can be generally useful, but
         | your mileage may vary if the principles take you too far out of
         | the idiomatic for your particular language/paradigm/problem
         | domain. If that's the case, you could find yourself wasting
         | energy swimming upstream.
         | 
         | [0] - https://www.youtube.com/watch?v=2V1FtfBDsLU
         | [1] - https://clojure.org/reference/atoms
         | [2] - https://clojure.org/about/spec
        
           | [deleted]
        
         | dnautics wrote:
         | I think "data are immutable" should be qualified - usually it
         | means "data are immutable through your _codepaths_ " and if you
         | are mutating data, you need explicit checkouts and checkins of
         | the data.
         | 
         | This is the basic principle behind how sane database
         | transactions work.
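         | 
         | A rough sketch of explicit checkout/checkin in JavaScript,
         | assuming a hypothetical in-memory store (`createStore`,
         | `checkout` and `checkin` are made-up names, not a real
         | library API): readers only ever see frozen, committed data,
         | and a writer works on a private copy until it checks it in.

```javascript
// Hypothetical store: reads return the committed (frozen) value;
// mutation only happens on a checked-out private copy.
function createStore(initial) {
  let committed = Object.freeze({ ...initial });
  return {
    read() { return committed; },
    checkout() { return { ...committed }; },
    checkin(draft) { committed = Object.freeze({ ...draft }); },
  };
}

const store = createStore({ balance: 100 });

const draft = store.checkout();
draft.balance -= 30;                           // mutate the private copy only
const seenDuringEdit = store.read().balance;   // readers still see 100

store.checkin(draft);
const seenAfterCheckin = store.read().balance; // now 70
```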
         | 
         | ORMs in some languages can be especially dangerous if they
         | overload the getters/setters of the object in such a way that
         | the checkins and checkouts are obscured. You could be passing
         | your object to a function or method that expects to mutate a
         | polymorphic class[0] that is usually a traditional "shared
         | memory" form of object. Hopefully you can imagine the chaos:
         | redundant database transactions, consistency problems, failure
         | modes, uncaught exceptions, etc. that are going to be a
         | nightmare to debug.
         | 
         | [0] worse yet, imagine if it's someone else's code and they
         | change the API from not mutating to mutating for performance
         | reasons. Will you notice the documentation change or the
         | changelog? It's bad enough in the case when it's not an ORM and
         | just a mutable object.
        
         | ajuc wrote:
         | You realize data is immutable when you first try to implement
         | history.
         | 
         | Mutability is just a hack to save some memory.
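         | 
         | To illustrate with a small JavaScript sketch (the `history`
         | array and the `edit`/`undo` helpers are hypothetical names):
         | if every edit produces a new frozen version, history is just
         | a list of versions and undo is a pop.

```javascript
// Every edit appends a new frozen version; old versions are never touched.
const history = [Object.freeze({ title: "Draft", words: 0 })];

function edit(changes) {
  const latest = history[history.length - 1];
  history.push(Object.freeze({ ...latest, ...changes }));
}

function undo() {
  if (history.length > 1) history.pop();
  return history[history.length - 1];
}

edit({ words: 120 });
edit({ title: "Final" });
const restored = undo(); // back to the version with title "Draft", words 120
```

         | Had the edits mutated a single object in place, every entry
         | in `history` would point at the same, latest state.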
        
         | rellekio wrote:
         | You're right about the sync and dependency issue. A big reason
         | why, in JS land, you treat each mutation of the data as an
         | immutable destruction and refresh of an object is to eliminate
         | any old references that might not have been cleaned up by the
         | garbage collector.
         | 
         | Data Oriented Programming is both old and new, in that it does
         | not have the same amount of programming patterns that OOP has.
         | It is a more bare-metal means of programming, without a ton of
         | abstraction to ease most programmers into it.
         | 
         | Where I find the idea interesting is that concurrent and
         | parallel processes are more natural in the data-oriented style,
         | and that comes from immutability and ownership as first
         | principles.
        
         | phendrenad2 wrote:
         | This seems to be describing a style of programming, and you'll
         | have to take these principles in the spirit that they were
         | intended, i.e. in the context of that programming style. I
         | recognize this style as a common style in OOP/FP languages such
         | as Scala, where it's common to pass around immutable Plain Old
         | Scala Objects (made from lists and hashmaps).
        
       | curyous wrote:
       | This is not Data Oriented Design, far from it. I find it
       | confusing that the term used here is "Data Oriented Programming".
        
       | fpoling wrote:
       | The examples from the first article about code reuse demonstrate
       | the power of row polymorphism. But in a statically typed language
       | that requires a rather advanced type system that allows one to
       | declare, explicitly or implicitly, that code works for any struct
       | that contains the given fields. In C++ one can use templates, but
       | that trivially leads to unmaintainable code.
        
       | disaster01 wrote:
       | Clojure-inspired concepts? An upvote from me, despite the hide-
       | out in other languages :)
        
       | jll29 wrote:
       | It's a good idea to write a book about a data-oriented
       | development style (I'm working on a methodology, a way of working
       | in data-intensive projects called Data2Value).
       | 
       | However, JavaScript or Clojure are not ideal for demonstrating
       | this methodology in the sense that industrial applications will
       | more likely be built in C++, Java or Python. For example C++ and
       | Java support Apache UIMA, which is an industry standard for data-
       | oriented systems. UIMA (originally developed at IBM before being
       | open sourced, and used in their Watson system) manages data as
       | immutable objects (e.g. text, videos) that are enriched with
       | annotations (e.g. syntax graphs, topical tags, subtitles...).
       | 
       | Functional designs are often well-suited to data flow related
       | processing, whereas in OOP, you end up with a pipeline object and
       | various DataStream objects that it inputs and outputs.
       | 
       | In my experience, data-intensive systems often need:
       | 
       | - distributed processing, due to their large scale (which calls
       |   for Apache Spark, and then PySpark or Scala);
       | - compute-intensive work like machine learning, which may require
       |   GPUs or other bespoke hardware (TensorFlow supports GPUs,
       |   Google's cloud has TPUs); and
       | - special-purpose data structures (e.g. Bloom filters, huge
       |   persistent graphs, R* trees...) specific to the nature of the
       |   processing (this last point, I guess, contradicts the author's
       |   claims).
        
         | dragandj wrote:
         | FYI Clojure has supported GPU processing for 5 years now.
         | 
         | https://github.com/uncomplicate
        
         | thom wrote:
         | Data oriented programming is orthogonal to the type of biggish
         | data tools you're talking about, although I agree that in the
         | present day, the latter is actually a more interesting subject
         | for methodological discussion.
        
       | [deleted]
        
       | cortexio wrote:
       | This is stupid. I'm only in chapter 1 and there are major flaws.
       | 
       | eg.
       | 
       | function fullName(data) {
       |   return data.firstName + " " + data.lastName;
       | }
       | 
       | This somehow suggests that firstName and lastName are always
       | stored under the exact same property names and at the exact same
       | depth of the object.
       | 
       | In practice this will just result in coders making duplicated
       | functions with different names.
       | 
       | eg: fullNameUsing_firstName_lastName , fullNameFromCustomer ,
       | fullNameContactPerson
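       | 
       | One possible way around that duplication, sketched under the
       | assumption that other records use different keys or nesting
       | (the `customer` shape and the `toPerson` adapter below are
       | hypothetical): normalise the record at the boundary and keep a
       | single `fullName`.

```javascript
function fullName(data) {
  return data.firstName + " " + data.lastName;
}

// Hypothetical record whose names live under other keys, one level deeper.
const customer = { id: 7, name: { first: "Ada", last: "Lovelace" } };

// Adapter: maps the foreign shape onto the one `fullName` expects.
function toPerson(c) {
  return { firstName: c.name.first, lastName: c.name.last };
}

// A record that already has the expected fields needs no adapter.
const author = { firstName: "Isaac", lastName: "Asimov", books: 500 };

console.log(fullName(author));             // "Isaac Asimov"
console.log(fullName(toPerson(customer))); // "Ada Lovelace"
```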
        
       | Joker_vD wrote:
       | You know, sometimes I wish the programming languages had a better
       | distinction between "immutable data pieces" and "stateful
       | agents". Sometimes it's really nice to have a simple struct for
       | which you (or anyone else) can write many functions to act upon.
       | Sometimes it's really nice to have an opaque "object" with
       | methods to yank described in some public interface, but which
       | encapsulates loads of state and other objects inside it so you
       | don't have to think about all of that.
        
         | omginternets wrote:
         | This is probably an imperfect solution, but isn't this kind of
         | what Actors provide? Syntax and concurrency aside, passing a
         | message is equivalent to calling a method.
         | 
         | The biggest annoyance I can foresee is discoverability. Object-
         | oriented classes make it pretty clear what a given object's
         | interface is (just look at its public methods). This is not
         | true of actors in the languages I've used (not many), but in
         | theory it should be a fairly trivial question of syntax.
        
         | discreteevent wrote:
         | The "stateful agents" is what OOP is all about. The objects are
         | interpreters. I referenced a discussion elsewhere in this
         | thread between Alan Kay and Rich Hickey where Kay talks about
         | the value of interpreters. Rich Hickey talks about data/values
         | coming from a seismometer or an IoT device but what he misses
         | there I think is that it's not all just "data". There are also
         | stateful agents. The seismometer (or the earthquake) is a
         | stateful agent and the data is messages that are sent by it. If
         | we had to model a seismometer or IoT device (e.g. a smoke
         | alarm) it would be best to do so using an object that
         | encapsulates its state and manages itself. It only
         | communicates to the outside world with messages/data (in this
         | case a sound when the temperature exceeds some internal state).
         | I can replace my smoke alarm with another and I don't need to
         | understand anything about its internal data or how it
         | interprets it.
         | 
         | But for dealing with the historical data from an IoT device it
         | may make sense to use a data oriented/functional approach. That
         | time series data is not stateful, it's an immutable history of
         | something stateful. Functions/transformations work best there
         | usually.
        
           | Joker_vD wrote:
           | Right. And the problem is, quite often those "data/values"
           | turn out to be objects too, and for no good reason. Take the
           | whole Active Record approach, for example, re-implemented in
           | Java in the most straightforward way possible: you have a
           | class with data fields (well, with getters/setters) but it
           | also has "Save()/Load()" methods on it. Ugh.
        
         | identity0 wrote:
         | This is exactly the reason C++ was created. It has structs and
         | classes. To the language, structs and classes are the same
         | apart from default access (public vs. private), but
         | conventionally, a struct represents some kind of data object
         | while a class represents something with encapsulation and
         | state.
        
         | dudul wrote:
         | So like an Erlang map and a GenServer?
        
         | mdm12 wrote:
         | C# will be getting records in .NET 5, which would be analogous
         | to what you refer as 'immutable data pieces', in addition to
         | the normal objects the language has always had.
         | https://devblogs.microsoft.com/dotnet/announcing-net-5-0-rc-...
        
       | arximboldi wrote:
       | Very interesting!
       | 
       | In the context of C++ some of us have been calling these
       | programming/design principles "Value Oriented Design". Some talks
       | on the topic:
       | 
       | - Most Valuable Values (Juan Pedro Bolivar)
       | https://www.youtube.com/watch?v=_oBx_NbLghY
       | 
       | - Squaring the circle, value oriented design in an object
       | oriented system (Juanpe)
       | https://www.youtube.com/watch?v=e2-FRFEx8CA
       | 
       | - Objects vs Values: Value Oriented Programming in an Object
       | Oriented World (Tony van Eerd)
       | https://www.youtube.com/watch?v=2JGH_SWURrI
        
         | phaedrus wrote:
         | Something I've found applying Value Oriented Design to C++ is
         | that it often leads to freeing designs of arbitrary limitations
         | that weren't clear beforehand but in hindsight you realize,
         | "well, of course I should also be able to use it this (other)
         | way."
         | 
         | For example I'm porting a parser engine, implemented in another
         | language, to C++ and it wasn't clear what the hierarchy of
         | objects should be for the purpose of RAII (because there are
         | circular dependencies) (the original implementation language
         | was garbage collected). The original implementation loaded a
         | grammar file and directly created objects with pointers
         | (originally, references) to other objects.
         | 
         | I introduced an intermediate layer where the file is first read
         | into data structs which hold the integer values from the file.
         | So instead of objects pointing to objects, the links are
         | implicit because of things having the same index. The
         | representation of the loaded grammar file is now copyable,
         | moveable, immutable, etc. A side effect is it's trivial to tell
         | whether the file loading code is correct or not when the result
         | is just the same data with structure applied to it, rather than
         | having the added dimension of determining whether the objects
         | in a graph correctly relate to _each other_.
         | 
         | Then I construct the actual parser engine objects from the data
         | representation structs. True, that didn't in itself solve the
         | RAII-hierarchy problem, but what it did do is make it easier to
         | isolate that problem to just the domain of _how the objects are
         | used_ and not commingling that with the problem of _how the
         | file is loaded_.
         | 
         | The epiphany I spoke of is that after this refactor, it became
         | clear: the file is arbitrary. For testing, or for use of the
         | parser with a grammar which does not change, I could dispense
         | with the file load step and just encode the grammar directly in
         | value-structs.
         | 
         | Why I think this is significant is that "the way I was trained"
         | to think of making code like this unit-testable is to mock the
         | file reading interface. That's a lot of work for something
         | that's only necessary because of an over-emphasis on objects
         | and _behaviors_ instead of thinking about data and _values_.
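         | 
         | A small sketch of the "links are integer indices" layer,
         | written in JavaScript rather than C++ to match the article's
         | examples. The grammar contents and the `showRule` helper are
         | invented for illustration and are not taken from the parser
         | described above.

```javascript
// Hypothetical grammar as plain data: rules refer to symbols by index,
// so there are no object-to-object pointers to get wrong.
const grammar = {
  symbols: ["expr", "term", "+"],
  rules: [
    { lhs: 0, rhs: [0, 2, 1] }, // expr -> expr + term
    { lhs: 0, rhs: [1] },       // expr -> term
  ],
};

// Resolving a link is an array lookup.
function showRule(g, i) {
  const rule = g.rules[i];
  const rhs = rule.rhs.map((s) => g.symbols[s]).join(" ");
  return g.symbols[rule.lhs] + " -> " + rhs;
}

// Plain data copies trivially, and can be written by hand in a test
// instead of being loaded from a file.
const copy = JSON.parse(JSON.stringify(grammar));

console.log(showRule(grammar, 0)); // "expr -> expr + term"
```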
        
         | gpderetta wrote:
         | Sean Parent has also written a lot on the topic.
        
           | arximboldi wrote:
           | Yes! Also the book Elements of Programming by Stepanov has a
           | lot of "value orientation" in it.
        
             | gpderetta wrote:
             | Unsurprising as I believe Sean is a "disciple" of Stepanov
             | (I think they worked together at Adobe).
        
         | accountLost wrote:
         | See also https://matt.diephouse.com/2018/08/value-oriented-
         | programmin... for a very quick introduction.
        
           | gpderetta wrote:
           | Unsurprisingly, David Abrahams, before moving to Apple, was an
           | extremely influential member of the C++ Boost community,
           | where value-based programming is practiced extensively.
        
       | nafey wrote:
       | What's the general opinion on using Maps instead of
       | Structs/Simple Objects for the data containers in languages that
       | allow for either?
        
         | aidenn0 wrote:
         | I think for untyped languages like Clojure, you lose nothing.
         | For typed languages that don't support parametric polymorphism,
         | you trade type safety for flexibility and code simplicity. For
         | typed languages that support parametric polymorphism, I don't
         | see the advantage.
        
         | yen223 wrote:
         | If you know ahead of time the shape of the data you're dealing
         | with, you might as well use structs and reap the benefits of
         | type safety and improved performance
        
       | kaliszad wrote:
       | We certainly use some of these principles in OrgPad.com and some
       | of those inspired even the User Experience in a fundamental way.
       | E.g. a bullet point list in a linear medium such as a text or a
       | slide in a presentation is like a star in a graph, where all
       | children have the same weight. The thing is, when people see it
       | like that graphically, they sometimes get ideas they wouldn't
       | have if they stared at a long text. Sometimes they figure out
       | that the points are actually not equally weighted or that there
       | isn't such a clear boundary, and connect some of these children
       | together either by a link or by selecting the same colour to
       | group them.
       | 
       | Btw. we program everything in Clojure + ClojureScript, so
       | immutability and the other points are like preaching to the choir.
       | 
       | Not related: I thought Manning would not publish the book. At
       | least that is the last information I saw a few days ago. I
       | thought about buying that book.
        
       | pjmlp wrote:
       | > One could argue that the complexity of the system where code
       | and data are mixed is due to a bad design and that an experienced
       | OO developer would have designed a simpler system, leveraging
       | smart design patterns.
       | 
       | Indeed he (or she) would have made use of
       | traits/protocols/categories/whatever, to separate behavior from
       | data, while keeping the design extensible (via polymorphism).
       | 
       | This is something I usually find in OOP critics: too much focus
       | on class-driven implementations, without spending much time on
       | the other parts of the toolbox.
        
         | danielscrubs wrote:
         | The other parts exist in one form or another in non-OOP
         | languages too. Heck, polymorphism is part of type theory,
         | traits exist in OCaml and Haskell... But arguing about what is
         | and isn't OOP isn't that productive, as no one will agree on
         | any definition. That's why you'll get gut responses about
         | lasagna code where the layering glue is more complex and of
         | bigger proportions than the algorithm itself...
        
           | pjmlp wrote:
           | Which is why I find the whole OOP vs FP vs ECS vs ADT
           | discussions a complete waste of time, instead of embracing all
           | ideas as part of the multi-paradigm toolbox that most
           | mainstream languages actually are.
        
             | danielscrubs wrote:
             | To a point I agree, but formal verification is a thing and
             | wouldn't handle a normal mainstream language as we know it.
        
         | discreteevent wrote:
         | Alan Kay and Rich Hickey discuss the merits of data vs data
         | with interpreters:
         | 
         | https://news.ycombinator.com/item?id=11945722
        
       | ArtWomb wrote:
       | >>> Model entities with generic data structures
       | 
       | Which breeds data oriented "anti-patterns" when i/o performance
       | becomes the bottleneck. Focus on hardware. It's almost like you
       | need to work backwards to build scalable algorithms for modern
       | data loads ;)
       | 
       | Scalable Machine Learning & Graph Mining via Virtual Memory
       | 
       | http://poloclub.gatech.edu/mmap/
        
       | mumblemumble wrote:
       | This reminds me a bit of some of the ideas that Eric Normand
       | presents in his book, _Grokking Simplicity_.
       | 
       | Which I'd highly recommend. It's aimed at a less experienced
       | audience, so, as someone who's mid-career, I admit I did skim
       | some sections. But, all-in-all, I enjoyed reading his take on how
       | things should be done.
        
         | viebel wrote:
         | I know that there are some common topics with Eric's books. Do
         | you think Eric focuses on data as much as I do?
        
           | mumblemumble wrote:
           | No, he's much more focused on what he calls actions and
           | calculations.
           | 
           | Perhaps overly so. I'd have liked a bit more focus on data.
           | That could be a by-product of his Clojurist roots. Data-
           | oriented programming is so integral to Clojure's culture that
           | I'm not sure Clojurists even realize they're doing it half
           | the time.
        
       | hjntmp wrote:
       | We should rename object oriented programming to bureaucrat
       | oriented programming. I always think that the reasoning that led
       | to developing the aberration of OO is the same that creates
       | bureaucratic nonsense. The want of making people replaceable
       | through bureaucracy, so that the programmer as a human being can
       | be removed from the picture, plus all the other bureaucratic
       | thinking nonsense, leaked into the design of the language. It's
       | funny how ridiculous we are, pretending something all the time.
        
       | tomowl wrote:
       | Very interesting as an introduction. I think these principles
       | should be easy to follow using something like Rust.
        
       | ChicagoDave wrote:
       | Wouldn't this be the exact opposite of Domain-driven design and
       | modeling behavior?
       | 
       | Isn't the behavior of a system more critical than its data
       | elements?
        
         | Jarwain wrote:
         | Reminds me of the quote:
         | 
         | > Show me your flowcharts and conceal your tables, and I shall
         | continue to be mystified. Show me your tables, and I won't
         | usually need your flowcharts; they'll be obvious. -- Fred
         | Brooks, The Mythical Man-Month (1975)
         | 
         | I don't think they're inherently contradictory. You can have
         | plain data objects representing the domain, and functions that
         | act on these objects representing the behaviors/actions in the
         | domain. You could include these functions as part of the
         | "class" for these objects, and have them return new instances
         | of the class to maintain immutability.
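         | 
         | As a hedged sketch of that last suggestion in JavaScript (the
         | `Author` class, its fields and the "more than 100 books"
         | threshold are illustrative assumptions, not the article's
         | code): behaviour lives on the class, but every "change"
         | returns a new frozen instance.

```javascript
class Author {
  constructor(firstName, lastName, books) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.books = books;
    Object.freeze(this); // instances cannot be modified after construction
  }

  // "Setter" that returns a new instance instead of mutating this one.
  withBooks(books) {
    return new Author(this.firstName, this.lastName, books);
  }

  // Domain behaviour expressed as a read-only query.
  isProlific() {
    return this.books > 100;
  }
}

const a1 = new Author("Isaac", "Asimov", 50);
const a2 = a1.withBooks(500);

console.log(a1.books, a1.isProlific()); // 50 false
console.log(a2.books, a2.isProlific()); // 500 true
```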
        
       | tome wrote:
       | Cheekily resubmitted, I see! Not that I mind. I think it's a
       | great idea that deserves sharing.
       | 
       | https://news.ycombinator.com/item?id=24682380#24685657
        
         | kensai wrote:
         | How did it open a second HN thread?
        
           | dllthomas wrote:
           | The URLs differ. This one includes "?essence".
        
       | akst wrote:
       | Is it me or does anyone else confuse the term "Data Oriented
       | Programming" with "Data Oriented Design" [1]?
       | 
       | [1]: https://youtu.be/rX0ItVEVjHc
        
         | anaphor wrote:
         | You're not the only one. That's what first came to my mind when
         | I saw this.
        
       | conceptoriented wrote:
       | _> Data never changes, but we have the possibility to create a
       | new version of the data._
       | 
       | Well, it depends on what you mean by data. To avoid ambiguity it
       | is better to talk about data _values_ and data _objects_ which
       | have different properties. This can be formalized as follows [1]:
       | 
       | o data values are modelled via mathematical tuples - tuples are
       | immutable
       | 
       | o data objects are modelled via mathematical functions (one field
       | is a function from this reference to the field value) - functions
       | are supposed to be mutable
       | 
       | (In reality of course we meet quite different situations, for
       | example, structs are mutable and objects can be immutable.)
       | 
       | [1] Concept-oriented model: Modeling and processing data using
       | functions
       | https://www.researchgate.net/publication/337336089_Concept-o...
        
         | shwestrick wrote:
         | What do you mean by "functions are supposed to be mutable"?
         | 
         | Perhaps you are just pointing out that the output of the
         | function (and therefore the value of the field...?) will change
         | as the input changes?
         | 
         | If mathematical tuples are immutable, then surely mathematical
         | functions are immutable as well ;)
        
           | prostodata wrote:
           | Here is one possible implementation of the concept-oriented
           | model of data for data processing. It heavily relies on
           | functions and operations with functions and is an alternative
           | to purely set-oriented approaches like map-reduce or join-
           | groupby (sql):
           | 
           | https://github.com/prostodata/prosto - Functions matter!
        
           | conceptoriented wrote:
           | Function is a mapping between two sets (of values). This
           | mapping between values is mutable although the values are
           | not.
        
             | louthy wrote:
             | Functions are a mapping between a domain and a codomain,
             | the mapping absolutely isn't mutable, the definition of the
             | function is the relationship between the domains.
             | 
             | If I have a function:
             | 
             |     int Add1(int x) => x + 1
             | 
             | I would expect the domain and codomain to be immutable; I
             | would also expect x + 1 not to turn into x / 2 at random.
        
               | conceptoriented wrote:
               | _> the mapping absolutely isn't mutable_
               | 
               | Assume f: X -> Y. We can now map x_1 to y_1 f(x_1)=y_1.
               | And then change this same function by mapping x_1 to y_2:
               | f(x_1)=y_2. Thus we can easily modify functions.
               | Moreover, we do it constantly when we modify object
               | fields in OOP. It is probably easier to comprehend if a
               | function is represented as a table which we modify.
               | 
               | In contrast, we cannot modify data values (mathematical
               | tuples). Say, x=42+1 means that a new value 43 is created
               | rather than the existing value 42 is modified.
               | 
               |  _> I would expect the domain and codomain to be
               | immutable;_
               | 
               | No. Domains, codomains and any set can well be modified
               | by adding or removing tuples. What is immutable are
               | values (in the sets).
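               | 
               | The two readings in this sub-thread can be put side by
               | side in a small JavaScript sketch, assuming a finite
               | function stored as a lookup table (the names `f` and `g`
               | and the string keys are made up for illustration):

```javascript
// A finite function represented as a lookup table.
const f = new Map([["x1", "y1"]]);

// Immutable reading: remapping x1 yields a *different* function g,
// and f keeps its original mapping.
const g = new Map(f).set("x1", "y2");
const fBefore = f.get("x1"); // "y1"
const gValue = g.get("x1");  // "y2"

// Mutable reading: the same table object is overwritten in place,
// much like reassigning an object field.
f.set("x1", "y2");
const fAfter = f.get("x1");  // "y2": the earlier mapping is gone
```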
        
               | louthy wrote:
               | > Assume f: X -> Y. We can now map x_1 to y_1 f(x_1)=y_1.
               | And then change this same function by mapping x_1 to y_2:
               | f(x_1)=y_2
               | 
               | They would be different functions, the first being the
               | identity function: x => x, the second being: x => x + 1
               | 
               | > Thus we can easily modify functions. Moreover, we do it
               | constantly when we modify object fields in OOP
               | 
               | This isn't the case. A field with a different value in it
               | just means the object is a different value. If the object
               | is passed to a static function, then the domain is the
               | full set of possible values that the object can hold
               | (this is known as a product-type, you multiply the total
               | possible values of each of its component parts to find
               | out the size of the domain).
               | 
               | If it's passed to a method then there's an additional
               | implicit argument: `this`, which is the same as a static
               | function with an additional argument that takes the
               | object. The function is the same.
               | 
               | Global (or even free variables) should also be considered
               | part of the domain: i.e. it's akin to implicit arguments
               | that are being passed to the function.
               | 
               | > No. Domains, codomains and any set can well be modified
               | by adding or removing tuples.
               | 
               | This also isn't the case. If a function is defined that
               | takes an integer and returns a boolean value: Int -> Bool,
               | then the domain is the set of integers, the co-domain is
               | True and False. You can't pass a tuple to a function that
               | takes an Int and therefore dynamically increase the size
               | of the domain. Even in dynamic languages the codomain is
               | effectively `top`, the type that holds all values, and
               | therefore the domain is all values and the codomain is
               | all values, which makes them immutable still.
               | 
               | Now maybe I am misunderstanding you, but this is how all
               | of the mainstream statically and dynamically typed
               | languages work. Perhaps there's some edge-case language
               | that I'm missing here that allows types to be extended,
               | which would be interesting in its own right.
        
               | kingdomcome50 wrote:
               | Can you expand upon this? Perhaps the difference between
               | "re-mapping" the function:
               | 
               |     f(x_1)=y_2
               | 
               | and "re-mapping" the value:
               | 
               |     x=42+2
               | 
               | How is the former different than the latter? And by what
               | mechanism is the former achieved? I understand what you
               | are saying, but _how_ does one simply "change this same
               | function"? Redefine it?
               | 
               | To be clear, I'm not suggesting you are incorrect. I just
               | don't fully understand what you are getting at.
        
       | dpc_pw wrote:
       | So... is Data Oriented something new?
       | 
       | Storing fields in a map leads me to believe this is not Data
       | Oriented Design (DOD). And I completely reject this idea (fields
       | in maps). The "flexibility" there is hardly useful, and could be
       | achieved with defined shapes (types) in modern statically typed
       | languages without all the downsides.
       | 
       | "Separate code from data" is a big core belief I share with this
       | article, but the rest doesn't seem like a good idea, novel, or
       | important.
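
To make the contrast in this comment concrete, here is a minimal sketch of the same record held as a generic map and as a defined, immutable shape. The `Author` type, its field names, and the sample values are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from dataclasses import dataclass, replace

# Fields in a generic map: flexible, but a mistyped key fails silently.
author_map = {"first_name": "Isaac", "last_name": "Asimov", "books": 500}
typo_lookup = author_map.get("frist_name")  # no error, just None


# The same data as a defined shape (hypothetical type for illustration).
# frozen=True keeps the record immutable, and code stays separate from data.
@dataclass(frozen=True)
class Author:
    first_name: str
    last_name: str
    books: int


def full_name(author: Author) -> str:
    """A plain function that operates on the data; not a method on it."""
    return f"{author.first_name} {author.last_name}"


author = Author(**author_map)
# "Changing" a field builds a new value; the original is untouched.
updated = replace(author, books=501)

# A mistyped field name is rejected when the record is built.
try:
    Author(frist_name="Isaac", last_name="Asimov", books=500)
    typo_rejected = False
except TypeError:
    typo_rejected = True
```

Both versions keep functions separate from data; the difference is only whether the set of fields is checked up front.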
        
       | tabtab wrote:
       | I've always liked the idea of "table oriented programming" where
       | more detailed schema info is used to do most of the CRUD and UI
       | work. In my experiments, the tricky part is exceptions to the
       | patterns. You always need to be able to tweak things imperatively
       | (via code). But the attributes can still do roughly 90% of the
       | job.
       | 
       | My latest approach to get enough tweakability is what I
       | tentatively call "fractal rendering events" or "staged
       | rendering". When rendering HTML or SQL, you need event "hooks"
       | for the different stages. Level 1 events may override/alter field
       | attributes. Level 2 events may override/alter the HTML (or SQL)
       | generated for the field based on the Level 1 values. Level 3
       | events may override/alter the HTML of page sections (or entire
       | SQL clauses). Level 4 is overriding/altering the entire page (or
       | final SQL statement).
       | 
       | In other words, the schema provides drafts, which can then be
       | adjusted along the way through event hooks. The granularity of
       | what's tweaked goes up with each stage.
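
The four stages described above could be sketched roughly as follows. This is only one possible reading of the idea: the hook names (`field_attrs`, `field_html`, `section_html`, `page_html`), the schema layout, and the generated HTML are all assumptions made for the example, not the commenter's actual design.

```python
import html


def render_field(field):
    """Default HTML draft for one field, built from its schema attributes."""
    label = html.escape(str(field["label"]))
    name = html.escape(str(field["name"]), quote=True)
    required = " required" if field.get("required") else ""
    return (
        f"<label>{label}</label>"
        f'<input name="{name}" maxlength="{field["max_length"]}"{required}>'
    )


def render_page(title, fields, hooks=None):
    """Render a page from schema drafts, letting hooks adjust each stage."""
    hooks = hooks or {}
    parts = []
    for field in fields:
        # Level 1: override/alter a copy of the field attributes.
        attrs = hooks.get("field_attrs", lambda f: f)(dict(field))
        # Level 2: override/alter the HTML generated for that field.
        draft = render_field(attrs)
        parts.append(hooks.get("field_html", lambda h, f: h)(draft, attrs))
    # Level 3: override/alter the HTML of a page section.
    section = "<form>" + "".join(parts) + "</form>"
    section = hooks.get("section_html", lambda h: h)(section)
    # Level 4: override/alter the entire page.
    page = f"<h1>{html.escape(title)}</h1>{section}"
    return hooks.get("page_html", lambda h: h)(page)


fields = [{"name": "email", "label": "Email", "max_length": 80, "required": True}]

default_page = render_page("Sign up", fields)
tweaked_page = render_page(
    "Sign up",
    fields,
    hooks={"field_attrs": lambda f: {**f, "label": "E-mail address"}},
)
```

With no hooks the schema alone produces the page; each hook is an imperative escape hatch at a coarser granularity than the one before it.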
       | 
       | But managing that many potential events needs something more
       | powerful than a file-based system. It may be better to manage
       | such source-code in an RDBMS so you can search, sort, and group
       | by different factors at different times rather than hard-wire in
       | one viewpoint as file systems do.
       | 
       | But current IDEs are not ready for this. I do believe it's the
       | future, though. File trees are too limiting.
       | 
       | Consider this: it's common for a non-coding analyst to want to
       | change a field label, page title, max field length, or "required"
       | status. If they could do it in the schema info (data dictionary),
       | then they don't have to involve the coders. Whether the data
       | dictionary is referenced directly or generates scaffolded code is
       | a stack-specific or shop-specific choice. Minor things like this
       | shouldn't involve a lot of effort.
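
As a rough illustration of the data-dictionary idea, the sketch below reads field labels, maximum lengths, and "required" flags from a JSON document and validates a record against them directly. The JSON layout, the field names, and the `validate` helper are hypothetical; a real shop might generate scaffolded code from the dictionary instead.

```python
import json

# Hypothetical data dictionary that a non-coding analyst could edit.
DATA_DICTIONARY_JSON = """
{
  "email":    {"label": "Email",    "max_length": 80, "required": true},
  "nickname": {"label": "Nickname", "max_length": 20, "required": false}
}
"""


def validate(record, dictionary):
    """Check a record against the dictionary; return readable error messages."""
    errors = []
    for name, spec in dictionary.items():
        value = record.get(name, "")
        if spec["required"] and not value:
            errors.append(f"{spec['label']} is required")
        elif len(value) > spec["max_length"]:
            errors.append(
                f"{spec['label']} must be at most {spec['max_length']} characters"
            )
    return errors


dictionary = json.loads(DATA_DICTIONARY_JSON)
errors = validate({"nickname": "x" * 25}, dictionary)
no_errors = validate({"email": "a@example.com"}, dictionary)
```

Changing a label, a maximum length, or a "required" flag then means editing the dictionary only; the validation code is untouched.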
        
       ___________________________________________________________________
       (page generated 2020-10-05 23:01 UTC)