[HN Gopher] Copying data is wasteful, mutating data is dangerous
       ___________________________________________________________________
        
       Copying data is wasteful, mutating data is dangerous
        
       Author : feross
       Score  : 134 points
       Date   : 2020-01-08 20:04 UTC (1 day ago)
        
 (HTM) web link (pythonspeed.com)
 (TXT) w3m dump (pythonspeed.com)
        
       | capableweb wrote:
        | Seems like "both libraries make copies of the data, which means
        | you're using even more RAM" is an assumption (that copying data
        | leads to more RAM usage). I'm guessing the libraries (I haven't
        | used them myself, nor Python, so not sure how feasible this is)
        | could be implemented like a persistent data structure
        | (https://en.wikipedia.org/wiki/Persistent_data_structure) and
        | therefore get both immutability (which leads to fewer bugs, no
        | mutability) and not that high RAM usage. Clojure is a famous
        | example of something that implements persistent data structures.
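        | 
        | A tiny illustration of the structural-sharing idea (plain Python
        | tuples standing in for a real persistent structure):
        | 
        |     # "Updating" one row builds a new outer tuple but reuses the
        |     # untouched rows, so very little is actually copied.
        |     old = ((1, 2), (3, 4), (5, 6))
        |     new = old[:1] + ((3, 40),) + old[2:]
        |     assert old[0] is new[0] and old[2] is new[2]   # shared rows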
        
         | perfunctory wrote:
         | > can be implemented like a persistent data structure
         | 
         | you don't get much benefit from persistent structures if every
         | element of an array is modified, like in the examples in the
         | post.
        
           | pipeep wrote:
           | If the data structure has a reference count of one, you can
           | safely mutate the data structure instead. Many persistent
           | immutable data structure libraries use this as an additional
           | optimization.
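            | 
            | A rough CPython sketch of that idea (assuming a float NumPy
            | array; sys.getrefcount counts its own argument, so the exact
            | threshold is implementation-specific):
            | 
            |     import sys
            |     import numpy as np
            | 
            |     def normalized(array: np.ndarray) -> np.ndarray:
            |         # ~2 means nobody else holds a reference to this array
            |         if sys.getrefcount(array) <= 2:
            |             out = array          # sole owner: reuse the buffer
            |         else:
            |             out = array.copy()   # shared: copy before mutating
            |         low, high = out.min(), out.max()
            |         out -= low
            |         out /= high - low
            |         return out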
        
           | capableweb wrote:
           | I missed that, sorry. Thanks for clarifying!
        
       | vmchale wrote:
       | I think Futhark approaches this with uniqueness types. A bit
       | strange (maybe that's just me?) but it works well with arrays.
       | 
       | Also it compiles to Python nicely.
        
       | xg15 wrote:
       | This seems like a problem that would be suited for an optimizing
       | compiler.
       | 
       | You could imagine some hypothetical language in which _all_
       | objects are immutable, as far as the programmer is concerned -
       | however the compiler is allowed to reuse objects behind the
       | scenes if there is provably no way that a side-effect could be
       | introduced.
       | 
       | E.g., in such a language, the first variant would be the only
        | legal form to write the normalizing function:
        | 
        |     low = array1.min()
        |     high = array1.max()
        |     array2 = array1 - low
        |     array3 = array2 / (high - low)
        |     return array3
        | 
        | However, this could be safely turned into the following during
        | compilation:
        | 
        |     low = array1.min()
        |     high = array1.max()
        |     array2 = array1 - low
        |     array2 /= (high - low)
        |     return array2
       | 
       | ... because array2 is not visible outside the function and
       | therefore could not cause side-effects.
        
         | [deleted]
        
         | zzzeek wrote:
          | It is theoretically possible that a library like numpy or
          | pandas could do this directly. That is, if you capture each of
          | these transformation operations and store them, and only
          | actually run them when needed (i.e. when someone accesses the
          | individual array values), you only need to make one copy and
          | the programmer doesn't have to worry about any of this.
         | 
         | I was doing something similar for SQLAlchemy recently.
         | SQLAlchemy is all about capturing Python operators and operands
         | into data structures that represent intent which can be invoked
         | later.
         | 
         | edit: here is a demonstration:
         | https://gist.github.com/zzzeek/caa4a7ed94f326fbbc031acecb9d7...
         | 
         | edit edit: it is actually possible with numpy by using the Dask
         | library: https://dask.org/
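          | 
          | A toy sketch of the deferred approach (not Dask's or
          | SQLAlchemy's actual machinery, just the idea of recording
          | operations and running them later, in one pass, on a single
          | copy):
          | 
          |     import numpy as np
          | 
          |     class LazyArray:
          |         def __init__(self, array):
          |             self._array = array
          |             self._ops = []    # queued steps, applied at compute()
          | 
          |         def __sub__(self, x):
          |             self._ops.append(lambda a: np.subtract(a, x, out=a))
          |             return self
          | 
          |         def __truediv__(self, x):
          |             self._ops.append(lambda a: np.divide(a, x, out=a))
          |             return self
          | 
          |         def compute(self):
          |             result = self._array.astype(float)  # the only copy
          |             for op in self._ops:
          |                 op(result)    # later steps mutate that one copy
          |             return result
          | 
          |     arr = LazyArray(np.array([10.0, 20.0, 30.0]))
          |     print(((arr - 10.0) / 20.0).compute())   # [0.  0.5 1. ]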
        
           | vmchale wrote:
           | You can use Futhark; it compiles to PyOpenCL and then can be
            | called from Python fairly fluently.
           | 
           | Accelerate is a nice example of an array library with a JIT
           | backend.
        
           | masklinn wrote:
           | The problem with the library doing this is you end up with
           | the same issue as in Haskell: you accumulate thunks and both
            | memory use and the location of CPU hotspots become much harder
           | to predict and troubleshoot.
        
             | zzzeek wrote:
             | so...turn off the "thunking" feature while you are
             | debugging.
        
               | JeremyBanks wrote:
               | That won't help when you're debugging the performance
               | implications of thunking.
        
               | zzzeek wrote:
                | Comparing memory / call counts / time spent with thunking
                | turned on vs. off would give a good idea of it.
                | 
                | I can say that for my side of things thunking is a _super_
                | huge performance gain, as it allows expression intent to be
                | gathered quickly and the actual computation to happen on
                | the other side of a cache.
        
         | jerf wrote:
         | "You could imagine some hypothetical language in which all
         | objects are immutable, as far as the programmer is concerned -
         | however the compiler is allowed to reuse objects behind the
         | scenes if there is provably no way that a side-effect could be
         | introduced."
         | 
         | There are functional languages that do this. Like many other
         | optimizations, you get less than your intuition might naturally
         | expect, but it does work.
         | 
         | A trivial case is "tail call optimization", which most
         | immediately involves not creating a literal in-memory stack
         | record for recursive invocations but also can easily involve
         | the compiler re-using the memory for the arguments. IIRC Erlang
         | will do that as much as it can in tight recursive calls.
        
         | oefrha wrote:
          | Alas, CPython isn't an optimizing compiler; it faithfully
          | compiles code into bytecode, which is then faithfully executed.
         | 
         | I don't know the state of NumPy on PyPy since I don't use PyPy.
        
           | anoncake wrote:
           | It does have a peephole optimizer.
        
           | masklinn wrote:
           | numba.jit might be able to optimise the extra copies away
           | though, I'm reasonably sure working with numpy arrays is a
           | pretty significant use case.
        
         | toast0 wrote:
         | > You could imagine some hypothetical language in which all
         | objects are immutable, as far as the programmer is concerned -
         | however the compiler is allowed to reuse objects behind the
         | scenes if there is provably no way that a side-effect could be
         | introduced.
         | 
         | Erlang/BEAM fits your hypothetical to some degree. Erlang the
         | language has immutable variables; however, BEAM, the virtual
         | machine, has mutable registers. Whether or not a given code
         | sample results in copy or mutation depends on what information
         | is available to the compiler, of course.
        
       | 333c wrote:
       | Why does this blog post have a recaptcha?
       | 
       | In addition, I just finished reading the Rust book last night,
       | and I don't believe this is what interior mutability is (at least
       | as used in Rust).
        
         | nhumrich wrote:
         | I see no recaptcha.
        
           | 333c wrote:
           | Here's a screenshot: https://i.imgur.com/a/84Hb9Jq.png
        
         | itamarst wrote:
         | I'm using ConvertKit for newsletter signups, and choices are
         | either:
         | 
          | 1. Validate emails with recaptcha (which is why you see that
          | thing in the corner), or
          | 
          | 2. No email validation at all, so random people get spammed
          | with confirm-you-want-to-subscribe emails.
         | 
         | I am not super-happy with having to make this choice :(
        
       | nayuki wrote:
       | How I would summarize the article is that mutation should be
       | confined to internal objects that have not been published yet or
       | are temporaries that are thrown away.
       | 
       | Theoretically, you can make every API mutate data. If the user
        | doesn't want to mutate data, they can explicitly make a copy
       | first. Whereas if there only exists an immutable API to return
       | new objects, then it's hard or impossible to get the benefits of
       | mutating in place.
       | 
       | As far as I know, only Rust, C++, and C give support at the type-
       | checking level to denote which functions will modify the interior
       | of their arguments. This way, you can distinguish behaviors
        | easily - for example in Rust:
        | 
        |     impl Vector {
        |         // This method reads the current vector and returns a new one.
        |         pub fn normalized(&self) -> Vector { ... }
        | 
        |         // This method reads and modifies the current vector in place.
        |         pub fn normalize(&mut self) { ... }
        |     }
       | 
        | The article gave this subtly flawed example in Python:
        | 
        |     def normalize_in_place(array: numpy.ndarray):
        |         low = array.min()
        |         high = array.max()
        |         array -= low
        |         array /= high - low
        | 
        |     def visualize(array: numpy.ndarray):
        |         normalize_in_place(array)
        |         plot_graph(array)
        | 
        |     data = generate_data()
        |     if DEBUG_MODE:
        |         visualize(data)
        |     do_something(data)
       | 
        | In Rust, it would be a type error because visualize() takes a
        | reference but normalize_in_place() takes a mutable reference:
        | 
        |     fn normalize_in_place(array: &mut numpy::ndarray) {
        |         let low = array.min();
        |         let high = array.max();
        |         array -= low;
        |         array /= high - low;
        |     }
        | 
        |     fn visualize(array: &numpy::ndarray) {
        |         normalize_in_place(array);  // ERROR
        |         plot_graph(array);
        |     }
        | 
        |     let data: numpy::ndarray = generate_data();
        |     if DEBUG_MODE {
        |         visualize(&data);
        |     }
        |     do_something(&data);
        
         | masklinn wrote:
         | > As far as I know, only Rust, C++, and C give support at the
         | type-checking level to denote which functions will modify the
         | interior of their arguments.
         | 
         | 1. C and C++ allow casting away constness, which may or may not
          | be UB depending on how the parameter is defined
         | 
         | 2. Swift also provides this ability to an extent: _struct_
         | parameters have to be flagged `inout` to be mutable
         | 
         | I'm sure there are others.
        
           | nwallin wrote:
           | Rust can throw away constness in unsafe blocks, which carries
           | the same caveat emptor that const_cast carries.
        
             | steveklabnik wrote:
             | Only in certain circumstances, though.
        
           | gdxhyrd wrote:
            | Casting away constness in C++ is something that is virtually
            | never done, and it has to be done very explicitly.
        
       | IshKebab wrote:
       | This is just a straightforward memory optimisation; nothing to do
       | with interior mutability.
        
       | bfield wrote:
       | Specifically on the point about Pandas, last time I checked the
       | inplace flag did not save any memory. Under the hood, it creates
       | a temporary copy which is copied back to the original dataframe,
        | and then the temporary gets removed on the next garbage collection
       | cycle. If that is no longer the case I would love to know about
       | it though!
        
       | perfunctory wrote:
       | > But mutation is a cognitive burden: you need to think much
       | harder about what your code is doing.
       | 
        | One way to alleviate the problem is to use a naming convention:
        | 
        |     normalize(a)       # mutates in place. note: no return value.
        |     b = normalized(a)  # no mutation. returns a new instance.
        |                        # note the adjective.
        | 
        | Python does this in the standard library:
        | 
        |     a.sort()
        |     b = sorted(a)
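        | 
        | A minimal sketch of what that pair could look like for the
        | article's normalization example (assuming float NumPy arrays):
        | 
        |     import numpy as np
        | 
        |     def normalize(a: np.ndarray) -> None:
        |         """Mutates `a` in place, returns nothing (like list.sort())."""
        |         low, high = a.min(), a.max()
        |         a -= low
        |         a /= high - low
        | 
        |     def normalized(a: np.ndarray) -> np.ndarray:
        |         """Leaves `a` untouched, returns a new array (like sorted())."""
        |         b = a.astype(float)   # one copy...
        |         normalize(b)          # ...then mutate the copy in place
        |         return b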
        
         | FridgeSeal wrote:
         | It's convention to do this in Julia as well, every package I've
         | used that mutates marks the explicitly mutating function with a
         | ! at the end.
        
           | cestith wrote:
           | This is consistent with Ruby. It often includes both in the
           | core. There's .gsub() and .gsub!() for text substitution in
           | the String class, for example, or .select() and .select!()
           | for the Hash class.
        
             | anamexis wrote:
             | Unfortunately it isn't completely consistent, for example
             | Array#delete and String#concat
        
             | kwertzzz wrote:
             | This is interesting to know, because Julia uses exactly the
             | same approach[1] (now I know where it comes from). To me it
             | seems to be the ideal solution: use the in-place
             | normalize!(X) where performance is important, otherwise use
             | the more convenient allocating variant normalize(X). The
             | exclamation mark makes it clear that the array is modified.
             | 
              | [1] https://docs.julialang.org/en/v1/base/collections/#Base.sum!
        
               | fasquoika wrote:
               | >(now I know where it comes from)
               | 
               | Both Julia and Ruby got this from Scheme and it might be
               | even older than that.
        
         | benibela wrote:
         | I use this convention whenever I can
         | 
         | I spent far too much time yesterday figuring out why nothing
         | was being replaced, when I called replace on a string in
         | Javascript
        
         | jszymborski wrote:
         | in-place operations give me the heebie-jeebies.
         | 
          | I think this is something that is slightly less obvious when
          | dealing with classes. I see stuff like this in academic code
          | all the time:
          | 
          |     class SomeDataObj:
          |         def __init__(self, data):
          |             self.data = data
          | 
          |         def normalize(self):
          |             self.data = normalize(self.data)
          | 
          | I'd much prefer for there to be a variable `self.data_normalized`
          | which is initialized to None, or even better a getter
          | `getNormalizedData()` that computes/caches a normalized
          | variable.
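          | 
          | Something like this, say (assuming a pure `normalize()` helper;
          | the names are just illustrative):
          | 
          |     class SomeDataObj:
          |         def __init__(self, data):
          |             self.data = data
          |             self._normalized = None   # cache; self.data is never touched
          | 
          |         def get_normalized_data(self):
          |             if self._normalized is None:
          |                 # normalize() is assumed to return a new object
          |                 self._normalized = normalize(self.data)
          |             return self._normalized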
        
           | UncleEntity wrote:
           | The pythonic way would be to have sort() and sorted() which
           | either sort in place or return a sorted copy. Best of both
           | worlds.
        
         | kburman wrote:
          | I like the Ruby way of declaring whether a method mutates the
          | object or not: an exclamation mark at the end tells you it
          | modifies the object.
          | 
          | a = a.sort or a.sort!
        
         | chewxy wrote:
          | I write and maintain a generic multidimensional array package
          | for Go, and I ran into the same problems as the OP.
          | 
          | The solution was to dictate that the API is copying, but also
          | provide ways to allow the user to manage their own memory.
          | 
          | e.g.
          | 
          |     a := tensor.New(tensor.WithShape(2, 2), tensor.Of(Float64))
          |     b, err := a.Apply(sigmoid)                       // copies
          |     c, err := a.Apply(sigmoid, tensor.WithReuse(a))  // mutates
         | 
         | Docs: https://godoc.org/gorgonia.org/tensor#Dense.Apply
        
       | DrFell wrote:
       | I have been mutating data for decades without any unexpected
       | behavior gremlins.
        
       | benibela wrote:
       | That is how strings work in Delphi/Pascal.
       | 
       | All strings have a reference count. When you change a string, and
       | the ref count is not one, it is copied before the change. If the
        | ref count is one, it is not copied.
        
       | lsb wrote:
       | Does the Numba JIT do this? This definitely seems like a compiler
       | optimization that I want to mechanically run on my code
        
       | zackmorris wrote:
       | Another option is for the language's runtime to use copy-on-write
       | (COW):
       | 
       | https://en.wikipedia.org/wiki/Copy-on-write
       | 
       | So basically every buffer can be made immutable and then only
       | copied when written to, using something like refcounts and diff
       | trees to internally track mutations from an immutable parent
       | buffer. Languages like Clojure due this to manage state. I
       | believe the Redux also does this internally, although I've only
       | studied it and haven't had a chance to use it professionally.
       | 
       | In practice, this looks like a language where everything is
       | value-based instead of reference-based.
       | 
       | So like, in C# where you pass objects by reference, a COW
       | language passes everything by const value. Then an imperative
       | language statically analyzes the code to detect mutation (my own
       | theory is that this can never be accomplished with 100%
       | certainty, at least not with a simple implementation). So
       | basically buffers act like Git instead of byte arrays, creating a
       | new snapshot with every mutation.
       | 
       | Or functional languages can disallow mutation altogether, or
       | reduce the code into its most minimal representation and handle
       | mutation at special breaks in execution (like monads). I'm
       | grossly oversimplifying all of this and probably using the wrong
       | terminology, but am more than 50% confident that I can derive
       | what I'm talking about, so will just go with it hah.
       | 
       | I think that the complex imperative implementation I'm talking
       | about is probably Rust. The simple functional implementation
       | would look like a pure immutable Lisp running within the Actor
       | model, handling state only through IO, and dropping monads
       | altogether. The catch being that complex functional code might
       | have so many breaks in execution that it would practically be
       | happening on every single line and begin to look like an
       | imperative language with all its flaws. This is the primary
       | reason why I shy away from impure functional languages like
       | Haskell, because I don't think they have solved this core issue
       | of mutation clearly enough (not to mention, I think functional
       | language syntax is generally write-only and not readable by
       | others or yourself in 6 months hahah). As far as I'm concerned,
       | this is still an open problem.
       | 
       | In another life, I want to make an imperative language that's
       | primarily immutable, that uses higher order functions (or better
       | yet, statically analyzes code for side effects and converts
       | foreach loops into higher order functions automagically), and
       | transpiles to a purely immutable Lisp. It would be associative
       | array-based and look like Javascript. The idea being that we
       | could write code in the human-readable imperative style and let
       | the computer distill it down to its pure functional equivalent.
        
         | dualogy wrote:
         | > _The idea being that we could write code in the human-
         | readable imperative style and let the computer distill it down
         | to its pure functional equivalent_
         | 
          | Here we are, folks pulling their hair out for decades wrt how to
         | most efficiently trans/compile higher-order FP idioms to
         | minimal-overhead imperative --- and you want to go the other
         | way around!
         | 
         | Kidding aside, the "functional sub-language" of cell-lang.net
         | has a feature like this, functions are pure but you get the
         | imperative sugars (loops, branches) while preserving purity
         | guarantees like referential transparency.
        
         | tome wrote:
         | > not to mention, I think functional language syntax is
         | generally write-only and not readable by others or yourself in
         | 6 months
         | 
         | This is rather tangential, and sorry to hijack your message to
         | get on my soapbox, but I and many others who program in Haskell
         | and a dynamic language (Python in my case) find that Haskell
          | code we've written is _far_ easier to come back to than code
          | we've written in the dynamic language.
        
       | mrkeen wrote:
       | I'm surprised no-one has mentioned Haskell's ST library yet. It
       | hits this exact use-case, and comes with some pretty sweet
       | safeguards from the type system.
       | 
       | https://wiki.haskell.org/Monad/ST
       | 
        | It allows you to mutate arrays and other stuff "on the inside" like
       | this article suggests, but it's also able to wrap up the
       | operation so it appears pure from the outside, so the caller
       | doesn't have to guess or rely on naming conventions to figure out
       | the mutability.
       | 
       | It's your choice on whether you want to do the copy on the way in
       | ('thawing'), on the way out ('freezing'), or both, by using the
       | 'in-place' variants of thaw and freeze.
        
       | alexhutcheson wrote:
       | In other languages this is less of an issue, because function
       | signatures can include information about whether the argument can
        | be modified or not. For example, in C++:
        | 
        |     // Definitely doesn't modify 'array', because 'array' is copied.
        |     vector<int> Normalize(vector<int> array);
        | 
        |     // Almost certainly doesn't modify 'array'.
        |     vector<int> Normalize(const vector<int>& array);
        | 
        |     // Probably *does* modify 'array' (otherwise the author would
        |     // have used a const reference instead of a pointer.)
        |     void Normalize(vector<int>* array);
       | 
       | In Python every function uses the 3rd approach, so you have to
       | signal to the user in some other way whether the argument will be
       | modified or not. In Ruby and some other languages the convention
       | is to append "!" to the function name, but in Python most people
       | seem to just mention it in the docstring.
        
       | anaphor wrote:
       | There are also data structures such as zippers which are
       | immutable but minimize the amount of data you need to copy, at
       | the cost of potentially using more space overall. Zippers are
       | cool because they can be generalized to many different types of
       | data structures as long as they can be expressed as algebraic
       | data types.
        
       | DecoPerson wrote:
        | I'm not super familiar with interior mutability, but I don't
        | think this is the same "interior mutability" as used in Rust.
        | 
        | Interior mutability:
       | 
       | A, immutable, points to B.
       | 
       | B, immutable, points to C.
       | 
       | C, mutable, points to D.
       | 
       | D, immutable, points to E.
       | 
       | Even though A and B are immutable -- B points to C mutably, so
       | you can modify its contents. Some "interior" part of A (and of B)
       | is mutable. You can't change which C is pointed to by B (because
       | B is immutable, or rather we only have an immutable reference to
       | it), but it's open season on the contents of C.
       | 
       | What this article describes:
       | 
       | We have 2 operations.
       | 
       | We are passed in Array A.
       | 
       | We can either:
       | 
       | - Apply both operations to A, modifying it even though the
       | "owner" may not want this.
       | 
       | - Copy before each operation (or as part of them), which leaves A
       | untouched, but requires triple the memory usage.
       | 
        | - Copy before (or as part of) the first operation only. The
        | second operation can be applied in-place on the copy. This leaves
        | A untouched and only requires double the memory.
       | 
       | They're totally different concepts. I don't know what you'd call
       | the second concept -- I think the important part is that A
       | remains untouched, so... "Avoiding side effects while reducing
       | memory usage", or "How to make performant pure functions" ?
        
         | vnorilo wrote:
          | Reminds me of copy-on-write. An operation can reuse the input
          | buffer if it holds the sole reference to it. In the scenario you
          | described, you manually ensure and implement this behavior for
          | the rest of the pipeline.
        
         | edflsafoiewq wrote:
         | Yeah, this is just "don't create large temporaries
         | unnecessarily". I'm not sure it warrants a name.
        
           | codebje wrote:
           | It does if you want research done so compilers can implement
           | results in this space and optimise away unnecessary copies
           | without introducing errors.
        
         | DecoPerson wrote:
         | I also see this as a bit of a premature optimization. The
         | triple memory usage one may be significantly faster than the
         | in-place one -- and the memory usage is temporary so unless
         | you're in a constrained environment that's not a concern.
         | 
         | Avoiding unintentional or unexpected side effects is a good
          | thing. Pure functions are great. Do whatever you can to keep
         | your visualization helper thing pure and avoid bugs.
         | 
         | Then, once it works, if you have performance issues, do some
         | profiling.
         | 
         | If the profiling shows this helper method as a hotspot, then
         | it's worth optimizing.
         | 
         | At that point, you'd want to follow standard Python, numpy, and
         | pandas optimization strategies -- which I'm not familiar with.
         | 
         | Aimless optimization like this is a waste of time. Good program
         | structure (data is available when you need it without tonnes of
         | calls), the right choice of data structures and algorithms
         | (hashmap vs list), the right choice of underlying tools (Julia
         | vs Python) and targeted optimization (profiling) are the ways
         | to go.
        
           | kardos wrote:
           | My experience finds that the in-place optimization can indeed
           | speed up numpy code. Even if you're not memory constrained,
           | fewer arrays in play means less memory bandwidth needed.
           | 
           | >> Many NumPy APIs include an out keyword argument, allowing
           | you to write the results to an existing array, often
           | including the original one.
           | 
           | I'd wager that saving memory or memory bandwidth is the
           | impetus for expanding the API to include this option.
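            | 
            | For example, with the standard ufunc `out=` argument:
            | 
            |     import numpy as np
            | 
            |     a = np.array([1.0, 2.0, 3.0])
            |     np.subtract(a, 1.0, out=a)   # result written into `a`
            |     np.divide(a, 2.0, out=a)     # `a` is now [0. , 0.5, 1. ]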
        
           | philipkglass wrote:
           | The article starts with "You have a large chunk of data -- a
           | NumPy array, or a Pandas DataFrame -- and you need to do a
           | series of operations on it."
           | 
           | So it's suggesting to use these techniques when you _already
           | know_ that you are working with a large chunk of data, either
           | because it was obvious from the beginning or because
           | profiling highlighted it. In scientific computing it is
           | common to store gigabytes of data in a single array. In fact
           | it is common that even the compromise-solution suggested in
           | this article, which leads to 2x memory instead of 3x, would
           | be too much extra overhead.
           | 
           | But the compromise solution is nice if you _can_ afford 2x
           | memory overhead, so I upvoted the article.
        
         | codebje wrote:
         | The second notion, where it's clear that the memory space can
         | safely be used, is why there's interest in linear and affine
          | type systems. These systems describe types whose values can be
          | read at most one time and, for linear types, must be read
          | exactly one time.
         | 
         | On the surface that can seem a bit stupid, but the practical
         | outcome is that compilers can determine when an array's
         | contents can be mutated in place rather than overwritten.
         | 
         | A variable you read over and over can be thought of
         | theoretically as one you duplicate, then read one copy of.
         | Since the act of reading the copy destroys it, a compiler
         | implements this by merely reading the value directly.
         | 
         | Passing an array into a subroutine however means the compiler
         | either copies it, or forbids further access after passing it
         | in. If it's the latter, the memory space is available for re-
         | use inside the subroutine.
         | 
         | Rust uses affine type theory for its borrow system to provide
          | memory safety. Functional language research is looking at linear
         | and affine types for a variety of applications including
         | performance.
        
         | chewbacha wrote:
         | > B points to C mutably
         | 
         | I think you mean that B points to D mutably because C is
         | mutable. The pointing from B to C is locked in place because of
          | the immutability of B. However, C is free to change its
         | pointers to some other D.
        
         | masklinn wrote:
          | That's also the impression I have; the article really has
          | nothing to do with Rust's _interior_ mutability. And Rust
          | wouldn't have the issue in the first place, because whether it
          | takes the parameter by reference, mutable reference or value
          | would clearly explain the "contract" to the caller.
          | 
          | It does feel a bit closer to Clojure's _transients_, which I
          | guess is what it's referring to.
          | 
          | However there are also a number of differences with Clojure:
          | 
          | * Clojure _can not_ completely prevent the caller from
          | modifying callee data (at least using "normal" Clojure
          | collections, which are persistent)
          | 
          | * creating and reifying transients is O(1)
          | 
          | * the operation is a fairly trivial mapping of
          | #(/ (- % low) (- high low)), which is likely what you'd have
          | done in Clojure
        
           | eyelidlessness wrote:
           | This definitely made me think of Clojure transients. For
           | reference: https://clojure.org/reference/transients
           | 
           | > If a pure function mutates some local data in order to
           | produce an immutable return value, is that ok?
           | 
           | The idea in Clojure is that data is immutable _by default_
           | but there are function-local mutation APIs specifically for
           | optimization. The contract between caller and callee is still
           | the same, a function never mutates _your_ data, but
           | transients provide improved performance and memory
           | characteristics invisible to the caller.
        
         | shpongled wrote:
         | I think the easier way to think about Rust's interior
        | mutability is via an example (Cell is a mechanism for interior
        | mutability):
        | 
        |     struct S {
        |         a: Cell<u64>
        |     }
        | 
        |     struct T {
        |         b: u64
        |     }
         | 
         | To mutate field `b` on `T`, we need a mutable object or
         | reference. However, we can mutate field `a` on `S` via an
         | immutable reference. Typically this is used to encapsulate some
         | kind of interior state that isn't visible outside of the
         | struct's methods, and helps in certain situations where using a
         | mutable reference might cause some hair pulling due to the
         | borrow checker
        
       | wlib wrote:
       | Isn't this solved perfectly with an affine or linear type-aware
       | optimizing compiler? Where all data is immutable unless the types
       | prove to the compiler that there are no mutation conflicts. I
       | believe Rust implements this and Idris 2 does so as well with
       | quantitative type theory.
        
         | masklinn wrote:
         | Rust would be very different and the problem wouldn't exist in
         | the first place, because the function would signal its
         | behaviour and intent through taking the parameter by reference,
         | mutable reference or value.
        
           | wlib wrote:
           | That's what I meant
        
       | ed_balls wrote:
        | Another option is to have a data structure that starts out
        | frozen; you unfreeze it for the hot path and then freeze it
        | again when you exit it.
        
       | TheFuntastic wrote:
       | As a game developer working in managed languages (aka Unity/C#)
       | this problem is one of my biggest headaches. I was hoping the
        | article would provide divine insight; unfortunately the
        | recommended solution doesn't really solve the problem (as I see
        | it).
       | 
       | Whilst 2n array alloc is better than 3n, both are creating short
       | lived memory objects. Do that in a hot code path and suddenly
       | you've got tremendous garbage pressure your garbage collector is
       | going to choke on.
       | 
       | One can optimise this by calculating results into a cached array,
       | but that creates opportunities for bugs (results of previous
       | caches persisting). I would dearly love to see a solution that
       | allows me to code in a functional style without sacrificing
       | performance.
        
         | C4stor wrote:
          | For certain use cases, you can use a pool of objects: long-lived
          | objects are drawn from and returned to the pool, reusing the
          | memory efficiently with minimal overhead. Of course, you need
          | to be able to size such a pool to begin with.
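          | 
          | A toy sketch of the idea in Python (NumPy buffers standing in
          | for whatever the pooled objects really are; the names are
          | illustrative):
          | 
          |     import numpy as np
          | 
          |     class BufferPool:
          |         def __init__(self, shape, count=4, dtype=np.float64):
          |             # preallocate a fixed number of buffers up front
          |             self._free = [np.empty(shape, dtype) for _ in range(count)]
          | 
          |         def acquire(self):
          |             return self._free.pop()    # must be release()d later
          | 
          |         def release(self, buf):
          |             self._free.append(buf)
          | 
          |     pool = BufferPool(shape=(1024,))
          |     buf = pool.acquire()   # reused memory, no per-call allocation
          |     # ... fill and use buf ...
          |     pool.release(buf)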
         | 
         | Disclaimer : I have no experience with game development
         | whatsoever !
        
         | vmchale wrote:
         | > I would dearly love to see a solution that allows me to code
         | in a functional style without sacrificing performance.
         | 
            | Uniqueness types would make mutation easier to track: you'd
            | still need to write in an imperative style, but there would
            | be less to worry about.
         | 
         | I think the "functional-style" stuff in this area uses
         | compilers (i.e. accelerate or futhark) that can analyze whether
         | copying is necessary because the programmer uses various high-
         | level functional combinators. I think J does something like
         | this as well (?)
        
           | Athas wrote:
           | If you have precise reference counts (which is often
           | achievable in array languages because they don't have
           | cycles), then you can dynamically do efficient in-place
           | updates if the array you are updating has a count of 1.
           | 
           | Uniqueness types are only necessary if you want to make the
           | in-place update a static _guarantee_ (which can be crucial if
           | the code is going to run in an environment where allocation
           | is not possible).
        
         | itamarst wrote:
         | Are you familiar with data-oriented design? The game-developer
         | equivalent of the array-processing scientific programmers do.
         | 
         | I suspect if you look at how they do things you'd find some
         | domain-specific design patterns.
         | 
         | https://en.wikipedia.org/wiki/Data-oriented_design
        
         | unlinked_dll wrote:
         | >results of previous caches persisting
         | 
         | You clear the buffer or capture it inside an optional type.
         | When you're done with the buffer you can reset it to the
         | default state to be used again.
         | 
         | Most of the issues are logical bugs though, not flaws with the
         | pattern. You just need to reason about the computation that
         | you're doing to preallocate exactly how much data you need
         | outside your hot loops.
         | 
         | You can read some audio source code if you want, this is an
         | extremely common pattern. Preallocate for your computation up
         | front by exactly how much you need (and often with cache-
         | friendly data structures) and don't get crafty reusing buffers
         | where you don't need to.
        
         | hderms wrote:
         | Imo you have to allocate _something_ else if you are going with
         | an immutable paradigm. You can do clever sharing of data
         | structures, but ultimately if you 're modifying the whole
         | thing, you're modifying the whole thing and sharing can only go
         | so far. It seems like further advances in this space either
         | require giving the compiler a better understanding of what is
         | "actually" shared and in what way (i.e. Rust and knowing when
         | mutations are externally visible), or making copying cheaper.
         | 
          | Mutability is always (at least as far as current computer
         | architectures are concerned) going to be faster in a general
         | sense.
        
           | vmchale wrote:
           | > It seems like further advances in this space either require
           | giving the compiler a better understanding of what is
           | "actually" shared
           | 
           | In the context of arrays, it helps when programmers use
           | higher-order combinators/functions, e.g. maps.
           | 
           | There is a mutable way to do a map, and an immutable way, and
           | which should be used in the compiled program depends on
           | analysis of where the variable is used.
        
         | agumonkey wrote:
         | Do game devs write custom allocators ?
        
           | danielscrubs wrote:
           | Almost always (for big budget ones).
        
         | twiceaday wrote:
         | If you can't find a good implementation for an interface
         | perhaps the problem is the interface. Inline the function,
         | refactor, then express the result via re-usable components.
         | Ultimately it is a problem of code re-use vs performance. If
          | you want divine insight, it is probably to use lazy evaluation:
          | for the example in the article, express normalize as a
          | generator.
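          | 
          | A rough sketch of the generator idea (normalization still has
          | to see every value once to find the range, but the results can
          | then be streamed instead of materialized):
          | 
          |     def normalize_gen(values):
          |         values = list(values)        # one pass to find the range
          |         low, high = min(values), max(values)
          |         span = high - low
          |         for v in values:
          |             yield (v - low) / span   # results produced lazily
          | 
          |     print(list(normalize_gen([10, 20, 30])))   # [0.0, 0.5, 1.0]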
        
         | strictfp wrote:
         | Arena style allocation helps. You still get the allocations,
         | but GC is cheap. IMO both Java and the CLR are in desperate
         | need of multiple heaps with separate GCs to solve these
         | problems. One big heap simply doesn't scale well. Erlang is the
         | only example I know of that does this right. I know that the
          | golang GC is low-latency, but I'm not familiar with whether it can
         | sustain high allocation rates.
        
           | lostcolony wrote:
           | Nope.
           | 
           | Golang still uses one global heap; it just allows for stack
            | based allocation in situations where it can do it (avoiding putting
           | stuff on the global heap), so copies _can_ be cheap re: GC.
        
             | jen20 wrote:
             | The CLR also allows stack allocation, and many of the newer
             | .NET Core libraries have been built with a focus on
             | eliminating unnecessary allocations.
        
       | perfunctory wrote:
       | > To reduce memory usage, you can use in-place operations like +=
       | 
       | btw, I think it's a design flaw in python that `a += b` is not
       | always equivalent to `a = a + b`.
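        | 
        | For example, with lists:
        | 
        |     a = [1, 2]
        |     b = a
        |     a += [3]      # list.__iadd__ mutates in place; b sees [1, 2, 3] too
        |     a = a + [4]   # builds a new list and rebinds a; b is still [1, 2, 3]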
        
         | notduncansmith wrote:
         | As someone who has never used Python seriously I have to ask...
         | why not?
        
           | masklinn wrote:
            | As an optimisation, you can override += separately from +. A
            | pretty well known oddity related to this is:
            | 
            |     >>> a = ([0],)  # a tuple (immutable) containing a list (mutable)
            |     >>> a[0] = a[0] + [1]
            |     Traceback (most recent call last):
            |       File "<stdin>", line 1, in <module>
            |     TypeError: 'tuple' object does not support item assignment
            |     >>> a
            |     ([0],)
            |     >>> a[0] += [1]
            |     Traceback (most recent call last):
            |       File "<stdin>", line 1, in <module>
            |     TypeError: 'tuple' object does not support item assignment
            |     >>> a
            |     ([0, 1],)
            | 
            | Though to be fair, _technically_ nothing prevents "+" from
            | mutating either operand (it'd just get you tarred and
            | feathered).
        
             | Sohcahtoa82 wrote:
             | Wait...did it throw an exception, yet still add the item to
             | the list?
             | 
             | EDIT: Tried it myself. Sure enough, it throws the
             | exception, but still adds the item. This seems like a bug.
             | 
             | EDIT Part 2: I understand now. `a[0] += [1]` is internally
             | translated to `a[0] = a[0].__iadd__([1])`. The right-hand
             | side is interpreted, mutating a[0] in-place as expected.
             | But then the re-assignment happens, which throws the
             | exception since you're trying to re-assign a value in a
             | tuple, which is immutable.
        
           | falkaer wrote:
            | Python defines += as the __iadd__ method, which is _sometimes_
            | mutating, depending on whether the object being assigned to is
            | mutable or not.
        
           | perfunctory wrote:
           | Because I once learned that `a += b` is just a shortcut for
           | `a = a + b`, just a syntax sugar. Now I have to constantly
           | remind myself that it's not the case in python.
           | 
           | edit: I might have misunderstood your question. If you meant
           | "why it's not equivalent" please see the falkaer's answer.
        
             | im3w1l wrote:
             | They aren't equivalent in C++ either.
        
               | perfunctory wrote:
               | My C++ is a bit rusty. Does it manifest itself in the
               | standard library as well? The problem with python (in my
               | view) is that this behaviour is implemented in the
                | standard library and therefore propagates to the 3rd-party
               | libraries by convention.
        
       | roywiggins wrote:
       | This seems more like "defensive copying" than "interior
       | mutability" to me.
       | 
       | Ideally you'd have a compiler that could perform optimizing magic
       | to detect when the copy isn't needed, but that's not going to
       | happen in CPython.
        
       ___________________________________________________________________
       (page generated 2020-01-09 23:00 UTC)