[HN Gopher] Rust: Dropping heavy things in another thread can ma...
___________________________________________________________________

Rust: Dropping heavy things in another thread can make your code
10000x faster

Author : timooo
Score  : 280 points
Date   : 2020-05-30 16:53 UTC (6 hours ago)

(HTM) web link (abramov.io)
(TXT) w3m dump (abramov.io)

| heftig wrote:
| If I seriously wanted to move object destruction off-thread, I would use at least a dedicated thread with a channel, so I could make sure the dropper is done at some point (before the program terminates, at the latest). It also avoids starting and stopping threads constantly.
|
| Something like this: https://play.rust-lang.org/?version=stable&mode=debug&editio...
|
| You could have an even more advanced version spawning tasks into something like rayon's thread pool, I assume.

| ReactiveJelly wrote:
| Someone is working on this as a direct response to this blog:
|
| https://www.reddit.com/r/rust/comments/go4xcp/new_crate_defe...
|
| And yes, spawning a thread for every drop is horrible. It's just to prove the concept. The defer_drop crate uses a global worker thread.

| Ididntdothis wrote:
| I used to do this sometimes with C++ when I realized that clearing out a vector with lots of objects was slow. Is Rust basically based on unique_ptr? One problem with this approach was that you still had to wait for these threads when the application would shut down.

| masklinn wrote:
| > Is Rust basically based on unique_ptr?
|
| Rust is based on ownership and statically checked move semantics (by default, though this can be opted out of). So each item has a single owner (which is why Rust deals _very badly_ with graphs, and more generally any situation where ownership is unclear) and the compiler will prevent you from using a moved object (unlike C++).
|
| Separately it has a smart pointer which is the equivalent of unique_ptr (Box), with the guarantee noted above:
|
|     let b = Box::new(1);
|     drop(b);
|     println!("{}", b);
|
| will not compile because the second line _moves_ the box, after which it can't be used because it's been removed entirely from this scope.

| wnoise wrote:
| > which is why Rust deals very badly with graphs, and more generally any situation where ownership is unclear
|
| To be fair, so do 90+% of programmers. Much of Rust's benefit in safe code is training programmers to avoid code like that where possible, and spreading design patterns that avoid it.

| saagarjha wrote:
| Rust basically gives the compiler understanding of unique_ptr and prevents you from using it after you've moved it.

| Ididntdothis wrote:
| Would you have to keep track of these threads in Rust? I have done a lot of desktop development where you have to be aware of what happens during shutdown. Seems a lot of server guys write their code under the assumption that it will never shut down.

| pornel wrote:
| You would need to add `thread.join()` at the end of main, or have some RAII guard that does it for you.
|
| In practice that's probably optional, because the heap and all resources are usually torn down with the process anyway. Important things, like saving data or committing transactions, shouldn't be done in destructors.

| firethief wrote:
| The answer to this question is the same for any language without a heavy runtime. You can choose to join the worker thread, detach it, or kill it.
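|
| For illustration, a minimal sketch of the dedicated-dropper idea (untested, and not the playground code linked above; the channel type and the use of `Box<dyn Send>` are this sketch's own choices):
|
|     use std::sync::mpsc;
|     use std::thread;
|
|     fn main() {
|         // One long-lived "dropper" thread; anything sent down the
|         // channel is dropped over there instead of on this thread.
|         let (tx, rx) = mpsc::channel::<Box<dyn Send>>();
|         let dropper = thread::spawn(move || for _msg in rx {});
|
|         let heavy = vec![vec![0u8; 16]; 1_000_000];
|         tx.send(Box::new(heavy)).unwrap(); // returns almost immediately
|
|         drop(tx);                // close the channel so the loop ends...
|         dropper.join().unwrap(); // ...and wait for the dropper at shutdown
|     }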
| [deleted]

| zozbot234 wrote:
| > One problem with this approach was that you still had to wait for these threads when the application would shut down.
|
| If you know that an object will live for the rest of the program and not need any finalization logic, Rust allows you to "leak" it and save that overhead on shutdown.

| ordu wrote:
| You could have just one thread and kill it at exit. Do not start new threads for each closure that drops an object; send closures into one special thread instead.

| qcoh wrote:
| Out of curiosity, how did you do that in C++?

| Ididntdothis wrote:
| It depends. Either iterate over the vector and delete the objects or just call clear(). Obviously you have to be sure that nobody else is accessing it at the same time.

| jkoudys wrote:
| It'd be interesting to implement this on a type that would defer all of these drop threads (or one big drop thread built off a bunch of futures) until the end of some major action, like sending the http response on an actix-web thread. Could be a great way to get the fastest possible response time, since then the client has their response before any delay on cleanup.

| epage wrote:
| For those wanting a real world example where this can be useful:
|
| I am writing a static site generator. When run in "watch" mode, it deletes everything and starts over (I'd like to reduce these with partial updates but can't always do it). Moving that cleanup to a thread would make "watch" more responsive.

| firethief wrote:
| Why can't it clean up right after the work?

| pmontra wrote:
| Or no cleanup at all. A CLI command that runs for a very short time can allocate memory to perform its job, print the result and exit. Then the OS releases all the memory of the process. No idea if Rust can work like this.

| ReactiveJelly wrote:
| "Watch mode" for static site gens would mean you leave the process running and let it rebuild the site whenever a file changes, probably 10s to 100s of times in a typical run

| estebank wrote:
| std::mem::forget, which doesn't run destructors:
|
| https://doc.rust-lang.org/std/mem/fn.forget.html

| elcomet wrote:
| That's not really the same issue that is mentioned in the article though, is it?
|
| The issue from the article would be solved by just passing a reference to the variable.
|
| In your case, cleanup is an action that _needs_ to be done before writing new files. So you have to wait for cleanup anyway, don't you?

| [deleted]

| cs702 wrote:
| In other words, Rust's automagical memory deallocation is NOT a zero-cost abstraction:
|
|     fn get_len1(things: HeavyThings) -> usize {
|         things.len()
|     }
|
|     fn get_len2(things: HeavyThings) -> usize {
|         let len = things.len();
|         thread::spawn(move || drop(things));
|         len
|     }
|
| The OP shows an example in which a function like get_len2 is 10000x faster than a function like get_len1 for a hashmap with 1M keys.
|
| See also this comment by chowells: https://news.ycombinator.com/item?id=23362925

| DasIch wrote:
| Nothing about how Rust handles deallocation is magical in any way.
|
| It's also definitely a zero-cost abstraction as far as I can see, because the manual solution that's equivalent to get_len1() would be to essentially call free() on things. That would ultimately suffer from the same problem.

| cs702 wrote:
| Yeah, you're right. In hindsight this was a poorly thought-out and poorly written post on my part.
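|
| For anyone who wants to reproduce the numbers, a minimal self-contained version of the benchmark (a reconstruction, not the article's exact code; it assumes the HeavyThings shape described in the comments - a HashMap with 1M single-element vec values):
|
|     use std::collections::HashMap;
|     use std::time::Instant;
|
|     type HeavyThings = HashMap<usize, Vec<usize>>;
|
|     fn main() {
|         let things: HeavyThings =
|             (0..1_000_000).map(|i| (i, vec![i])).collect();
|
|         let t = Instant::now();
|         let len = things.len();
|         // The drop is handed to a new thread, so the timing below
|         // only measures `len()` plus the cost of spawning.
|         std::thread::spawn(move || drop(things));
|         println!("len = {} in {:?}", len, t.elapsed());
|     }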
| dathinab wrote:
| No, the zero-cost refers to the abstraction (and runtime cost), which still is zero-cost. Deallocating is part of the normal workload, not the abstraction.
|
| Also this isn't Rust specific. Most (all?) RAII languages are affected and many GC approaches have this effect, too. Some do add _additional_ abstraction to magically always or sometimes put the de-allocation into another thread.
|
| But de-allocating in another thread is not generally good or bad. There are a lot of use-cases where doing so is rather bad or can't be done (in case TLS is involved). Rust and other similar RAII languages at least let you decide what you want to do.
|
| Now it's (I think) generally known that certain kinds (not all) of GC do make some things simpler for GUI-like usage. Though they also tend to have less control.
|
| Note that it's a common pattern for small user-facing CLI tools (which are not GC'ed) to leak resources instead of cleaning them up properly. You can do so in Rust too if you want, but it's a potential problem for longer running applications.
|
| Also, here is a `get_len` faster than both, which is also more idiomatic Rust:
|
| ```
| fn get_len1(things: &HeavyThings) -> usize { things.len() }
| ```
|
| If you have a certain thread (e.g. UI thread) in which you never want to do any cleanup work you can consider using a container like:
|
| ```
| struct DropElsewhere<T: Send + 'static>(pub Option<T>);
|
| impl<T: Send + 'static> Drop for DropElsewhere<T> {
|     fn drop(&mut self) {
|         if let Some(value) = self.0.take() {
|             thread::spawn(move || drop(value));
|         }
|     }
| }
| ```
|
| You can optimize this with `ManuallyDrop` to have close to zero runtime overhead (removes the `take` and `if let` part).

| cs702 wrote:
| > No, the zero-cost refers to the abstraction (and runtime cost), which still is zero-cost. Deallocating is part of the normal workload, not the abstraction.
|
| Yeah, you're right. In hindsight my comment was poorly thought-out and poorly written.

| floppy123 wrote:
| Why should I ever need to drop a heavy object only to get a size? Not in C++ and also not in Rust; the different-thread idea is just creative stupidity, sorry

| fpgaminer wrote:
| Some important things I think people should note before blindly commenting:
|
| * The example code is obviously contrived. The real gist is that massive deallocations in the UI thread cause lag, which the example code proves. That very thing can easily happen in the real world.
|
| * I didn't see any difference on my machine between a debug build and a release build.
|
| * The example is performing 1 _million_ deallocations. That's why it's so pathological. It's not just a "large" vector. It's a vector of 1 million vectors. While that may seem contrived, consider a vector of 1 million strings, something that's not too uncommon, and which would likely suffer the same performance penalty.
|
| * Rust is not copying anything, nor duplicating the structures here. In the example code the structures would be moved, not copied, which costs nothing. The deallocation is taking up 99% of the time.
|
| * As an aside, compilers have used the trick of not free-ing data structures before, because it provides a significant performance boost. Instead of calling free on all those billions of tiny data structures a compiler would generate during its lifetime, they just let them leak.
| Since a compiler is short-lived, it's not a problem; they get a free lunch (pun unintended), and the OS takes care of cleaning up after all is said and done. My point is that this post isn't theoretical: we do deallocation trickery in the real world.

| papaf wrote:
| This deallocation trick is neat but in C and C++ you could use a memory pool to do this.
|
| In theory, you could also use a memory pool in Rust but I think the standard library uses malloc without some way of overriding this behaviour.

| CoolGuySteve wrote:
| Even just keeping a free list and deallocating its elements at an idle time is probably cheaper and faster than spawning a thread.

| wongarsu wrote:
| In a toy raytracer I once wrote in C++, switching from malloc to custom memory pools for small fixed-size objects was a big performance boost. Making free() a noop was another big performance boost, both for deallocation and allocation. Turns out sequentially handing out memory from a big chunk of memory is much easier than keeping track of and reusing empty slots, and it keeps sequentially allocated objects in sequential memory locations.

| folmar wrote:
| jemalloc is the answer to the question you did not ask

| estebank wrote:
| In Rust you could just call `mem::forget` on whatever heavy thing you're no longer using before it would get dropped, but then the programmer is effectively responsible for that memory leak not becoming a _problematic_ leak during refactors.
|
| Edit: this will also break any code that _relies_ on Drop being called for clean up, but that is already a "suspect"/incorrect pattern because there are no assurances that it will ever run.

| mcguire wrote:
| Isn't that a problem for the RAII approach?

| kd5bjo wrote:
| > There are no assurances that it will ever run.
|
| Yes and no. Whenever control leaves a code block, Rust automatically calls the drop() method of all values still owned by that block. There is no guarantee that control will exit every block (cf. Turing), but a moderately exceptional circumstance needs to occur for this not to happen, like an infinite loop.

| rcxdude wrote:
| There's also no guarantee the object won't be moved elsewhere, including into a context which never gets freed (for example, you can construct mem::forget using entirely safe code by forming a cycle of reference-counted boxes). That said, Rust has support for such a concept through the Pin type (which essentially guarantees an object will not get moved until it is 'Unpinned').

| jomohke wrote:
| You can easily use a custom global allocator in Rust:
|
|     #[global_allocator]
|     static GLOBAL: MyAllocator = MyAllocator;

| [deleted]

| orf wrote:
| You can change the global allocator in any rust project. You can write your own easily enough, or use one like jemalloc

| bauerd wrote:
| Think with LD_PRELOAD it is always possible to override the allocator?

| dan-robertson wrote:
| Only if it is dynamically linked

| pjmlp wrote:
| In C++/WinRT the same approach is taken, because you cannot just use a memory pool for COM.

| projektfu wrote:
| Yeah, I like Apple's (NeXT's) approach of pool allocation for each run through the event loop. Defer dealloc, drop pool at the end.

| saurik wrote:
| Apple's pools are for helping manage reference counts of returned objects (via autorelease) but aren't doing pool allocation (as any of those objects could escape, so you can't do the fast thing of just deallocating the pool).
| The normal memory allocator is used while in the scope of an autorelease pool.

| projektfu wrote:
| True, I was thinking the same after I posted. Maybe I'm thinking more of Apache's approach and how it gets some of the benefits of a fork-per-request server without the overhead.

| mcguire wrote:
| Unless your destructors do more than deallocation, in which case you will leak whatever other resource you're managing.

| MaxBarraclough wrote:
| A pool can invoke destructors when it is cleared. Might take a bit of overhead (if the pool is to support arbitrary classes), but you could retain the fast pointer-bump allocation.

| vvanders wrote:
| You can also use the typed-arena crate[0] or roll your own if you're feeling like cracking open unsafe.
|
| [0] https://crates.io/crates/typed-arena

| rubber_duck wrote:
| > As an aside, compilers have used the trick of not free-ing data structures before, because it provides a significant performance boost. Instead of calling free on all those billions of tiny data structures a compiler would generate during its lifetime, they just let them leak. Since a compiler is short-lived, it's not a problem; they get a free lunch (pun unintended), and the OS takes care of cleaning up after all is said and done. My point is that this post isn't theoretical: we do deallocation trickery in the real world.
|
| And then someone tries to use your compiler as a service (code analysis, change-triggered compilation) and it's a dead end

| qzw wrote:
| Well, then that's not the original use case anymore, and it'll have to be re-engineered. In the meantime it may have been used for years and the perf difference may have saved many developer-years collectively across its user base. Surely you're not suggesting that the compiler developers should be prematurely optimizing for future use cases that they may not even have envisioned.

| pdimitar wrote:
| I am suggesting they apply good practices. I'd never have imagined that compilers were actually doing what was stated -- sounds awful.
|
| I understand it's tradeoffs and we all have real-world limitations to contend with -- but again, of all the corners that could be cut, that's exactly the one I didn't imagine they would.
|
| Nasty.

| rcxdude wrote:
| Deallocation at the end of a program's execution can substantially add to its runtime, and it's entirely wasted work. It's a much more common strategy than you might think.

| pdimitar wrote:
| You are right, I indeed didn't know it was that common.
|
| But still, in a world where languages and runtimes are also judged by their ability to run in lambda/serverless setups, I'd think this practice will start becoming obsolete, wouldn't you think?
|
| (What I mean is that I imagine that any serverless function that runs in severely constrained and metered environments like AWS Lambda would gain a significant edge over the competition if it did an eager cleanup. Should allow more of them to work in parallel?)

| folmar wrote:
| Most compilers are not designed to run in daemon mode; specifically, it's a non-issue since their startup is normally fast. And the compiler and runtime are different things.

| pdimitar wrote:
| I realise that, but nowadays language servers are a pretty normal practice in no small number of areas.

| thebean11 wrote:
| Can you articulate why it's a bad practice? If it works better than alternatives and it's documented, not really sure what the issue is.
|
| I don't think it's even that uncommon.
| I believe some HFT firms run Java with a huge amount of RAM and GC disabled, and get around it by just rebooting the software occasionally.
|
| To me writing software like that is fair game; I don't see the point in being dogmatic about "how things should be done".

| pdimitar wrote:
| Mostly because I look at it from the angle of one-off / general purpose / CLI programs. If one such has to run for 10-30 seconds and its memory just keeps growing and growing with the idea of throwing it all away at the end and letting the OS handle it, it might become disruptive for other programs on the machine.
|
| For specialised apps and servers it's of course a perfectly good practice.

| rubber_duck wrote:
| Avoiding leaks is not optimisation, it's a matter of correctness - not freeing memory is an optimisation based on a very shortsighted assumption that is not practical for any new language (modern languages are expected to come with language server support)

| random314 wrote:
| You have not provided any refutation of the OP's argument.

| jashmatthews wrote:
| Why do you say that? Even if you call free immediately after a piece of memory is no longer needed, malloc won't release that immediately anyway.
|
| If this is incorrect, then every modern malloc implementation is incorrect.

| jpitz wrote:
| Correctness means adherence to the spec, not some contrived absolute truth.

| yrro wrote:
| As distasteful as leaky code is, is it that bad to run it in a separate process? You get a bit more robustness against crashes as well.

| rubber_duck wrote:
| You want to be able to incrementally update state to get performance out of incremental code analysis (e.g. a language server implementation for an IDE)

| folmar wrote:
| There are more ways to go than that: AST fragment caching, intermediate representation fragment caching and so on. Incremental updates fit some languages better than others.

| ycombobreaker wrote:
| In a world where processes can fork-and-exec, nothing about "as a service" changes that. The compiler would just be reinvoked as needed. Converting it into a persistent process breaks a lot more than just allocation optimizations.

| rubber_duck wrote:
| But you want to share state and only update state incrementally on edit to get any reasonable level of performance for stuff like language-server code analysis.

| all-fakes wrote:
| I'm not sure what the performance overhead would be, but that state can easily be stored on disk.

| ckcheng wrote:
| > As an aside, compilers have used the trick of not free-ing data structures before, because it provides a significant performance boost. Instead of calling free on all those billions of tiny data structures a compiler would generate during its lifetime, they just let them leak. Since a compiler is short-lived, it's not a problem; they get a free lunch (pun unintended), and the OS takes care of cleaning up after all is said and done. My point is that this post isn't theoretical: we do deallocation trickery in the real world.
|
| This reminds me of the exploding ultimate GC technique [1]:
|
| > on-board software for a missile... chief software engineer said "Of course it leaks". ... They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.
| [1]: https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...

| hinkley wrote:
| One of my "favorite" snags in perf analysis is that periodicity in allocations can misattribute the cost of allocations to the wrong function.
|
| If I allocate just enough memory, but not too much, then pauses for defragmentation of free space may be costed to the code that calls me.
|
| A solution to this that I've seen in soft real-time systems is to amortize cleanups across all allocations. Every allocation performs n steps of a cleanup process prior to receiving a block of memory. In which case most of the bad actors have to pay part of the cost of memory overhead.
|
| Might be good for Rust to try something in that general realm, or the cleanup side may be easier to tack on. On free, set a ceiling for operations and queue what is left. That would at least shave the peaks.

| rini17 wrote:
| Doing unrelated cleanup sounds like flushing the CPU cache on every allocation.

| hinkley wrote:
| This post is already about unrelated cleanup, so I'm not sure what other escape hatch you imagine people taking. Do you have a suggestion?
|
| You can tell that it's unrelated cleanup because if it were related, then the cost of freeing wouldn't be noteworthy. It would be cache-hot because you would be visiting it for the second time. In which case we'd be talking about why you are scanning a giant object on an event loop in the first place. That's not what's at issue. What's at issue is that you've been handed this great bomb of uncached data from someone else and now you're stuck doing the janitorial work.
|
| Freeing an object of arbitrary size is effectively an unbounded operation. Cache invalidation has a very high cost, sure, but it's still bounded.
|
| Putting a limit on the amount of work you do, you could stop before purging the entire cache. You could use a smaller limit on doing work for someone else and control invalidation there, too.

| indemnity wrote:
| Isn't a Rust "move" implemented as a bitwise copy (e.g. a memcpy call)? I see people claiming move has no cost but I'm not sure that is true.

| steveklabnik wrote:
| Semantically, it is a bitwise copy, yes.
|
| However, these copies can often be elided by optimizations.

| dathinab wrote:
| What is bitwise-copied is the pointer to the memory.
|
| I.e. a `HashMap` struct, or `Vec` struct, doesn't directly contain the data.
|
| For example the `Vec` is defined internally as something similar to:
|
| `struct Vec<T> { data: *mut [T], capacity: usize, len: usize, marker: PhantomData<T> }`
|
| (Slightly simplified, not the actual Vec type).
|
| So a move of a Vec copies at most 3 usize (24 bytes on 64-bit systems); similar things apply for a HashMap.
|
| Additionally the copy can often be elided through compiler optimizations.
|
| As an interesting side note, a new empty Vec/HashMap will not actually allocate any memory; only once elements get added will it start doing so. This is why the example creates vecs of vecs of length 1 - or else it wouldn't need to do "number of elements" free calls.

| justinpombrio wrote:
| EDIT: See kevincox's reply. Rust will bitwise-copy the _containing type_, which is typically very cheap. For example, if you move a String, it will copy the String struct, which contains a couple pointers and a length (or something along those lines). Importantly, it will _not_ copy the underlying char array.
| I was thinking of the following code, where I believe the assignment to y is actually free. Though apparently this isn't called a "move".
|
|     let x = <<large owned type like [char; 1000]>>;
|     let y = x;
|
| More info: https://doc.rust-lang.org/rust-by-example/scope/move.html

| ahupp wrote:
| That's not exactly true. In C++ terms, the example code is moving a value:
|
|     std::map<...> foo;
|     someFunction(std::move(foo));
|
| And not moving a pointer like:
|
|     std::unique_ptr<std::map<...>> foo = ...;
|     someFunction(std::move(foo));
|
| So it copies sizeof(std::map<...>), not a pointer.

| kevincox wrote:
| Semantically it is a bitwise copy.
|
| In the example from the article it is probably actually a copy, because the value was originally on the parent thread's stack, which will be reused after the function returns, so the value will need to be copied to the new thread's stack.
|
| However it is important to note that it isn't a deep/recursive bitwise copy. It just needs to copy the HashMap itself (which is probably a handful of words).
|
| So yes, it is doing a bitwise copy, but this is also very cheap. It will be much, much cheaper than spawning the thread.

| staticfloat wrote:
| It seems that this would be a great reason to not pass the entire heavy object through your function, and to instead pass it as a reference. When passing an object (rather than a reference to an object) there's a lot more work going on both in function setup, and in object dropping. I'm not a rust guru, so I don't know the precise wording, but it's simple enough to realize that if this function, as claimed, must drop all the sub-objects within the `HeavyObject` type, then those objects must have been copied from the original object.
|
| If you instead define the function to take in a reference (by adding just two `&` characters into your program), the single-threaded case is now almost 100x faster than the multithreaded case.
|
| Here's a link to a Rust Playground with just those two characters changed: https://play.rust-lang.org/?version=stable&mode=debug&editio...
|
| Note that the code that drops the data in a separate thread is not timing the amount of time your CPU is spinning, dropping the data. So while this does decrease the latency of the original thread, the best solution is to avoid copying and then freeing large, complex objects as much as possible. While it is of course necessary to do this sometimes, this particular example is just not one of them. :)
|
| As an aside, I'm somewhat surprised that the Rust compiler isn't inlining and eliminating all the copying and dropping; this would seem to be a classic case where compiler analysis should be able to determine that `a.size()` should be computable without copying `a`, and it should be able to eliminate the function call cost as well. Manually doing this gives the exact same timing as my gist above, so I assume that this is happening when passing a reference, but not happening when passing the object itself.

| heftig wrote:
| As already mentioned, Rust wasn't copying anything; the `HashMap` is not a `Copy`-able type, so it was just moved around (it's also not very large: all its items are behind a pointer to the heap).
|
| All you did was move the drop from `fn_that_drops_heavy_things` to the end of `main`, where it is outside the timing function.

| heavenlyblue wrote:
| If your function takes a reference to the object, something still needs to free it.
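|
| A sketch of that point (assuming the HeavyThings shape from the article): taking a reference makes the call cheap, but the expensive drop still runs at the end of the owning scope, on the same thread:
|
|     use std::collections::HashMap;
|
|     type HeavyThings = HashMap<usize, Vec<usize>>;
|
|     fn get_len(things: &HeavyThings) -> usize {
|         things.len() // cheap: only a reference crosses the call
|     }
|
|     fn main() {
|         let things: HeavyThings =
|             (0..1_000_000).map(|i| (i, vec![i])).collect();
|         println!("{}", get_len(&things));
|     } // `things` is dropped here: the expensive free still
|       // happens, just later, and still on this thread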
| fpgaminer wrote:
| Rust isn't copying anything; everything in the original code would be a move.

| cesarb wrote:
| Just be careful, because moving heavy things to be dropped to another thread can change the _semantics_ of the program. For instance, consider what happens if within that heavy thing you had a BufWriter: unless its buffer is empty, dropping it writes the buffer, so now your file is being written and closed at a random moment in the future, instead of being guaranteed to have been sent to the kernel and closed when the function returns.
|
| And it can even be worse if it's holding a limited resource, like a file descriptor or a database connection. That is, I wouldn't recommend using this trick unless you're sure that the only thing the "heavy thing" is holding is memory (and even then, keep in mind that _memory_ can also be a limited resource).

| usefulcat wrote:
| It seems like the caller should ensure that the buffer is written before giving away ownership. Also, what happens if there is an error writing during finalization/destruction/etc.? Seems like you'd want to find out about such errors earlier if at all possible.

| lostmyoldone wrote:
| I only know a very little Rust, but since it's generally good practice to never defer writing (or other side effects) to an ambiguous future point in time - with memory allocations as the only plausible exception - is there any way in Rust to make sure one doesn't accidentally move complex objects with drop side-effects into other threads?
|
| Granted, the way the type system works you usually know the type of a variable quite well, but could this happen with opaque types?
|
| I'm very much out of my depth, but it felt like one of those things that could really bite you if you are unaware, as happened with finalizers in Java decades ago.

| masklinn wrote:
| > I only know a very little Rust, but since it's generally good practice to never defer writing (or other side effects) to an ambiguous future point in time - with memory allocations as the only plausible exception - is there any way in Rust to make sure one doesn't accidentally move complex objects with drop side-effects into other threads?
|
| If you're the one creating the structure, you could opt it out of Send; that'd make it... not sendable. So it wouldn't be able to cross thread boundaries. For instance Rc is !Send: you simply can not send it across a thread boundary (because it's a non-threadsafe reference-counting handle).
|
| If you don't control the type, then you'd have to wrap it (newtype pattern) or remember to manually mem::drop it. The latter would obviously have no safety whatsoever; the former you might be able to lint for, I guess, though even that is limited or complicated (because of type inference the problematic type might never get explicitly mentioned).

| the8472 wrote:
| Considering that writing files can also block the process, you probably don't want to have that in your latency-sensitive parts either, so you'll have to optimize that one way or another anyway.
|
| For the more general problem you can also dedicate more threads to the task or apply backpressure.

| chowells wrote:
| This is the standard problem with tracing data structures to free them. You frequently run into it with systems based on malloc/free or reference counting.
| The underlying problem is that freeing the structure takes time proportional to the number of pointers in the structure it has to chase.
|
| Generational/compacting GC has the opposite problem. Garbage collection takes time proportional to the live set, and the amount of memory collected is unimportant.
|
| There's actually a lot to be said for Rust in that the ownership system lets you transfer freeing responsibility off-thread safely and cheaply in order to not have it block the critical path.
|
| But overall, there's nothing really unexpected here if you're familiar with memory management.

| loufe wrote:
| I've not worked with any language thus far without automatic garbage collection, so this was definitely a neat read for me. It sounds rather elegant.

| Reelin wrote:
| > the ownership system lets you transfer freeing responsibility off-thread safely and cheaply in order to not have it block the critical path
|
| This can also trivially be done in other languages. Atomically append your pointer to a queue of "large things that need to be freed" and move on as though you had actually called free.
|
| Within a particularly time-sensitive loop you can even opt to place pointers into a preallocated array locally. Then once per loop iteration swap that array with the thread handling the deallocations for you. It eats up a bit of CPU time but can significantly reduce latency.

| im3w1l wrote:
| A lot of C++ code depends on deallocation order for correctness. Like, a destructor may want to say bye-bye to a pointed-to object, and if you reverse the order of deallocation, that pointer may be dangling.
|
| Consider this code:
|
|     {
|         Window a;
|         ClickHandler* b = new ClickHandler(&a);
|         delete b;
|     }
|
| Let's say b tries to deregister itself when it's deleted. This code will work as written. But if you defer the deletion of b, then the stack-allocated Window a may already be gone.

| jacobparker wrote:
| OP said safely; what you're describing isn't safe in, say, C++ in the same sense that it is in Rust.

| Reelin wrote:
| Isn't that essentially a tautology? Manual memory management and threading in such languages lack safety guarantees to begin with.

| arcticbull wrote:
| > This is the standard problem with tracing data structures to free them. You frequently run into it with systems based on malloc/free or reference counting. The underlying problem is that freeing the structure takes time proportional to the number of pointers in the structure it has to chase.
|
| That doesn't seem to make intuitive sense. A GC has the same problem.
|
| A garbage collector has to traverse the data structure in a similar way to determine whether it (and its embedded keys and values) is part of the live set or not, and to invoke finalizers. You're beginning your comparison after the mark step, which isn't a fair assessment, since what Rust is doing is akin to both the mark and sweep phases.
|
| The only way to drop an extensively nested structure like this any faster than traversing it would be an arena allocator, and forgetting about the entire arena.
|
| The difference between a GC and this kind of memory management is that the GC does the traversal later, at some point, non-deterministically. Rust allows you to decide between deallocating it in place, immediately, or deferring it to a different thread.
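|
| A sketch of the arena idea (using the typed-arena crate mentioned elsewhere in the thread, assuming typed-arena = "2" as a dependency; note this only skips the teardown traversal for types with no drop code):
|
|     use typed_arena::Arena;
|
|     // Plain data with no Drop impl, so tearing down the arena
|     // just frees its few large chunks - no per-node walk.
|     struct Node {
|         value: u64,
|         prev: Option<u32>,
|     }
|
|     fn main() {
|         let arena: Arena<Node> = Arena::new();
|         let mut prev = None;
|         for i in 0..1_000_000u32 {
|             arena.alloc(Node { value: u64::from(i), prev });
|             prev = Some(i);
|         }
|     } // arena dropped here in time proportional to chunks, not nodes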
| Reelin wrote:
| > The only way to drop an extensively nested structure like this any faster than traversing it would be an arena allocator, and forgetting about the entire arena.
|
| Isn't that incompatible with RAII though?

| arcticbull wrote:
| You can handle this in Rust pretty neatly with lifetimes. There's a bunch of crates that do this. [1]
|
| [1] https://crates.io/crates/typed-arena

| Reelin wrote:
| That's a neat library but as far as I can tell it doesn't avoid any traversal or cleanup code. It appears to delay the cleanup so it all happens at once. That's certainly useful, but if you have RAII the traversal still has to happen at some point.

| winstonewert wrote:
| It avoids it _if_ your type has a no-op drop implementation. So if you use typed-arena for objects which don't own resources, they all get dropped in one massive deallocation and don't have to be traversed.
|
| EDIT: and then I noticed that you mentioned RAII... Right, if the objects own some sort of resources that doesn't apply.

| Reelin wrote:
| No worries. And to clarify, in context the point is that traversal fundamentally can't be avoided in the case of RAII. This defeats (what I see as) the primary use case of an arena allocator - deallocating an arbitrarily large chunk of contiguous memory in O(1) time regardless of object count.
|
| Of course this is all somewhat tangential to the original topic of generational GCs, where the RAII idiom also has significant negative impacts. The performance characteristics would otherwise be O(n) in the live set and thus similar to an arena allocator in terms of the ability to dispose of an arbitrarily large number of objects efficiently.

| winstonewert wrote:
| I'm not sure it's fair to say it defeats that use case. It only defeats that use case if you need RAII. Furthermore, in my experience, an arena is most useful when you allocate lots of small objects, which is the case least likely to need RAII.

| chowells wrote:
| I said generational/compacting collector. You're talking about a mark-and-sweep collector.
|
| A generational/compacting collector traverses pointers from the live roots, copies everything it finds to the start of its memory space, and then declares the rest unused. If there is 1GB of unused memory, it's irrelevant. Only the things that can be reached are even examined.
|
| As I said, this has the opposite problem. When the live set becomes huge, this can drag performance. When the live set is small, it doesn't matter how much garbage it produces; performance is fast.

| arcticbull wrote:
| How are finalizers invoked if the structure isn't traversed? Would they just be optimized away if none of the objects have finalizers? Hence my suggestion that arena allocators are the better point of comparison.

| mcguire wrote:
| Maintain a collection of objects with finalizers associated with the generation; when an object is moved out of the generation, move the finalizer record with it. Then, just before the generation is discarded, run the finalizers.
|
| If the finalizers do something stupid like resurrect the object, have the runtime system notify someone with the authority to go beat the programmer with a stick.

| Reelin wrote:
| My understanding is that finalizers are special-cased for a generational GC, have a tendency to introduce significant overhead, and (as with any GC scheme) generally run at unpredictable times.
| My impression is that RAII idioms are strongly discouraged in conjunction with most GC ecosystems.

| pcl wrote:
| Many languages allow a class to avoid declaring a finalizer, for just this reason.
|
| The JVM is notable in this regard. And thus, most classes that compile to Java bytecode offer no-finalizer semantics.

| zucker42 wrote:
| Java is an example of a language with a generational copying collector by default. Most objects in Java don't have a finalizer, since after all the main point of, for example, destructors in C++ is to make sure you don't leak memory, which the GC solves. But when the `finalize` method is used it causes significant overhead.
|
| > Objects with finalizers (those that have a non-trivial finalize() method) have significant overhead compared to objects without finalizers, and should be used sparingly. Finalizeable objects are both slower to allocate and slower to collect. At allocation time, the JVM must register any finalizeable objects with the garbage collector, and (at least in the HotSpot JVM implementation) finalizeable objects must follow a slower allocation path than most other objects. Similarly, finalizeable objects are slower to collect, too. It takes at least two garbage collection cycles (in the best case) before a finalizeable object can be reclaimed, and the garbage collector has to do extra work to invoke the finalizer. [1]
|
| Sure, you're technically correct that if the objects all had finalizers that did the same thing as C++ destructors, it would be equivalent, but because of the existence of a GC we don't have to do any work for most objects. A GC is equivalent to an arena allocator in this sense.
|
| Another point is that the C++/Rust pattern of each object recursively freeing the objects it owns presumably leads to slower deallocation, because in the general case it involves pointer following and non-local access.
|
| [1] https://www.ibm.com/developerworks/java/library/j-jtp01274/i...

| msclrhd wrote:
| Destructors in C++ aren't just for making sure you don't leak memory. They are used for many lifetime-controlled things such as: 1. general resource cleanup (file handles, database connections, etc.) using RAII (Resource Acquisition Is Initialization); 2. tracing function entry/exit.

| pdpi wrote:
| In the JVM the equivalent to RAII is implemented with try-with-resources/AutoCloseable instead.

| mcguire wrote:
| Apparently, that doesn't work in Rust: https://news.ycombinator.com/item?id=23363647

| the8472 wrote:
| It does work in Rust. You just cannot rely on Drop _for memory-safety_. If you mem::forget a struct that holds onto some other resource, then all that means is that you're committing that resource to the lifetime of the process. We usually call that a leak, but it can be intentional.

| pron wrote:
| Which is why finalizers in Java have been officially deprecated [1] and might be removed altogether in a future release.
|
| [1]: https://docs.oracle.com/en/java/javase/14/docs/api/java.base...

| mcguire wrote:
| Finalizers/destructors do not work well in garbage-collected languages, for that very reason.

| saagarjha wrote:
| Usually in a background thread ;)

| arcticbull wrote:
| Indeed. The difference is this is happening deterministically, in place (with an optional deferral). A garbage collector has all sorts of different trade-offs.
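|
| (A minimal sketch of "deterministically, in place, with an optional deferral" in Rust terms:)
|
|     fn main() {
|         let big = vec![vec![0u8; 1024]; 100_000];
|         drop(big); // freed right here, on this thread, deterministically
|
|         let big2 = vec![vec![0u8; 1024]; 100_000];
|         let h = std::thread::spawn(move || drop(big2)); // or deferred off-thread
|         h.join().unwrap();
|     }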
| pron wrote:
| > A garbage collector has to traverse the data structure in a similar way to determine whether it (and its embedded keys and values) is part of the live set or not
|
| Yes, but in practice tracing in a tracing GC is done concurrently and with the help of GC barriers that don't require synchronization and so are generally cheaper than the common mechanisms for reference-counting GC.
|
| > and to invoke finalizers
|
| As others have said, finalizers are very uncommon and, in fact, have been deprecated in Java.

| Jasper_ wrote:
| One of my favorite papers, by Bacon et al., expands on this intuition that garbage collection and reference counting are opposite tradeoffs in many respects, and gives a formal theory for it. My views on GC/RC haven't been the same since.
|
| http://researcher.watson.ibm.com/researcher/files/us-bacon/B...

| pron wrote:
| That's a great paper, but one important thing to point out is that some production-grade tracing GCs are on the sophisticated end of that paper, while almost all reference-counting GCs are on the rather primitive end. Given the same amount of effort, it's easier to get a reasonable result with reference counting, but there are industrial-strength tracing GCs out there that have had _a lot_ of effort put into them.

| jeffdavis wrote:
| "Generational/compacting GC has the opposite problem. Garbage collection takes time proportional to the live set, and the amount of memory collected is unimportant."
|
| Takes time proportional to the live set _times the number of GC runs that happen while the objects are alive_. In other words, the longer the objects live, the more GC runs have to scan that object (assuming there is enough activity to trigger the GC), and the worse GC looks.

| rhacker wrote:
| Pass by reference?

| bszupnick wrote:
| If you pass by reference the heavy object won't be dropped. If your goal is to drop a heavy object, this is a cool way to do it.

| thickice wrote:
| Is this applicable to Go as well?

| [deleted]

| SilasX wrote:
| Completely different dynamic (because Rust has no GC), but this reminds me of how Twitch made their server, written in Go, a lot faster by allocating a bunch of dummy memory at the beginning so the garbage collector doesn't trigger nearly as often:
|
| https://news.ycombinator.com/item?id=21670110

| the8472 wrote:
| The Java equivalent to the Go case would simply be adjusting the -Xms flag. The Go approach is needlessly convoluted because the runtime doesn't offer any tuning knobs.
|
| As for the Rust case, if you squint then it's similar to a concurrent collector.

| saagarjha wrote:
| Why would you ever write a get_size function that drops the object you call it on? Surely in an actual, non-contrived use case, spawning another thread and letting the drop occur there would just be plain worse?

| pjmlp wrote:
| Not at all; Herb Sutter has a CppCon talk about this kind of optimisation.
|
| It is also the approach taken by C++/WinRT: COM and UWP components get moved into a background cleaning thread, to avoid application pauses on complex data structures reaching zero count.

| ashtonkem wrote:
| It's a contrived example to demonstrate the technique.

| epage wrote:
| I believe this is contrived to prove a point.
|
| And this isn't just a help in these contrived examples.
| I believe process cleanup (an extreme case of cleaning up objects) is one of the cases where garbage collection performs better, because it doesn't have to unwind the stack, call cleanup functions that are not in the cache, and make a lot of `free` calls to the allocator.
|
| I vaguely remember reading about Google killing processes rather than having them clean up correctly, relying on the OS to properly clean up any resources of significance.
|
| Now this doesn't mean you should do this in all cases. Profile first, see if you can avoid the large objects, and then look into deferred de-allocations... if the timing of resource cleanup meets your application's guarantees.

| Reelin wrote:
| > killing processes rather than having them clean up correctly, relying on the OS
|
| I recall Firefox preventing cleanup code from running when you quit a few years ago. Prior to that, quitting with a lot of pages open (i.e. hundreds) could cause it to lock up for quite some time.

| seventh-chord wrote:
| Killing a process without freeing all allocations is, as far as I can tell, routine in C. Especially for memory it makes no sense "freeing" allocations; the whole memory space is getting scrapped anyway. Of course, once you add RAII the compiler can't reason about which destructors it can skip on program exit, and if programmers are negligent of this you get programs that are slow to close.

| gpderetta wrote:
| exit(2) will only call destructors of static objects; quick_exit not even those.

| estebank wrote:
| > Killing a process without freeing all allocations is, as far as I can tell, routine in C.
|
| Many times by accident :)
|
| > if programmers are negligent of this you get programs that are slow to close.
|
| I wouldn't call that negligence, just not fully optimized.

| Reelin wrote:
| I think the contrived use case is just for illustrative purposes? If I'm understanding correctly, the combination of cleanup code and deallocation can sometimes consume enough time that it's worth dispatching it on another thread. That's hardly specific to Rust though.
|
| As you note, that will certainly add some overhead, although that could be minimized by not spawning a fresh thread each time. It could easily reduce latency for a thread the UI is waiting on in many cases.

| tedunangst wrote:
| It would be helpful to see an example from a real application, too.

| masklinn wrote:
| A very large Vec<String> (say a few million non-empty strings) would do, I'd guess; Rust would drop the Vec, which would recursively drop each String.

| Areading314 wrote:
| Right, there is no reason to pass ownership to a function like this.

| nickm12 wrote:
| I took this to be a contrived example to illustrate the point. I could imagine a process that creates a big data structure (e.g. parsing an xml file), pulls some data out, and then drops the data structure. If you want to use that data sooner, you can push the cleanup off your thread.

| andrewfromx wrote:
| hmm, my first thought is: having to do that is a lot like C and cleaning up my own allocations. This feels like something rust should automatically do for me?

| ashtonkem wrote:
| Rust will automatically clean up data that has left scope, but you can also accomplish this manually via the "drop" function, which is only necessary if you want to clean up explicitly, such as in a different thread.
|
| Interestingly, the drop function is actually user-creatable. It's actually an empty function with a very permissive non-reference argument. The semantics of ownership in Rust make that sufficient to trigger memory cleanup.
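|
| In fact the entire definition in the standard library is just this (the comment is mine):
|
|     pub fn drop<T>(_x: T) {}
|     // Taking `_x` by value moves it into the function; reaching
|     // the end of the (empty) body is what runs its destructor.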
| klyrs wrote:
| As I understand it, rust _is_ automatically cleaning up, and that can cause glitchy timing. The clever hack is that rust lets you shunt that cleanup process off to another thread when you're the sole owner of that object. You can do the same thing in C, but unlike rust, the cost of cleanup is not hidden by the syntax.

| klyrs wrote:
| On further reflection... I'm curious about how allocators would handle this -- if you return from this context only to make another heavyweight object, it seems like you'd be trading glitchy timing for allocator contention.

| devit wrote:
| Because it's impossible to do this automatically in the general case.
|
| In particular, types may not be sendable to other threads, or may have side effects on dropping, and in those cases you would need to rearchitect the code before you can apply this technique.
|
| Also this technique adds overhead, so it should never be used (including not doing it conditionally) if you don't care about latency or if the objects are always small, and the compiler cannot know whether that is the case.

| ReaLNero wrote:
| In C, if you forget to clean up, you have a memory leak which is hard to track down. In Rust, if you don't do this, you're not risking memory leaks, only losing performance. A profiler can tell you when you should drop asynchronously.

| [deleted]

| madmax96 wrote:
| > A profiler can tell you when you should drop asynchronously
|
| Is there any profiler that does this today?
|
| What are the drawbacks of asynchronous drops?

| ehsanu1 wrote:
| See some discussion here: https://www.reddit.com/r/rust/comments/gntv7l/dropping_heavy...

| [deleted]

| dirtydroog wrote:
| Oh my good god.
|
| I'm hoping this is down to developer naivety rather than being a feature of rust.

| sockgrant wrote:
| 1) He should pass by reference to avoid the extra copy. So in his example, yes, it's dev naivety.
|
| 2) But somewhere, somehow, this object will deallocate, so his trick of putting it on another thread would work if the deallocation takes a while. Same for cpp if you have a massive object in a unique_ptr. So it's not a rust issue

| renewiltord wrote:
| Where's the extra copy? I don't see one. He's moving the struct into the function, getting the size and then dropping it.

| VWWHFSfQ wrote:
| > avoid the extra copy
|
| there is no copy happening here

| ReactiveJelly wrote:
| The same could happen in C++, I think. Destructors are supposed to be called recursively.

| wizzwizz4 wrote:
| It's not a feature of Rust; it's a "feature" of the way we design operating systems and processors. This is the same in C.

| [deleted]

| andreygrehov wrote:
| Does anyone know how this would work in Go?

| echlebek wrote:
| Lots to be learned at https://blog.golang.org/ismmkeynote

| arendtio wrote:
| I have no idea, but my guess is that it doesn't matter, as the deallocation is done by the garbage collector.

| maxton wrote:
| I'm not very familiar with Rust, but I don't understand why you wouldn't just use a reference-to-HeavyThing as the function argument, so that the object isn't moved and then dropped in the `get_size` function?

| ehsanu1 wrote:
| If you never drop it, you have a memory leak. If the caller drops it, it's still the same as `get_size` dropping it in terms of performance impact.
| Generally you'd only pass ownership when that's needed for some reason. So this toy example might not be realistic, but it does demonstrate the performance impact.

| heavenlyblue wrote:
| So the caller of the function still needs to free HeavyThing in the same thread.

| epage wrote:
| For these contrived cases, yes, you would just pass a reference to the function, but I think the point is to simplify the case down to demonstrate a point.

| Cyph0n wrote:
| You're spot on: this is simply a bad example that you would never see in a real application.

| jeffdavis wrote:
| Speedup numbers should be given when optimizing constant factors -- e.g. "I made this operation 5X faster using SIMD" or "By employing readahead, I sped up this file copy by 10X".
|
| The points raised in this article are really different:
|
| * don't do slow stuff in your latency-critical path
|
| * threads are a nice way to unload slow stuff that you don't need done right away (especially if you have spare cores)
|
| * dropping can be slow
|
| The first and second points are good, but not really related to rust, deallocations, or the number 10000.
|
| The last point is worth discussing, but still not really related to the number 10000 and barely related to rust. Rust encourages an eager deallocation strategy (kind of like C), whereas many other languages would use a more deferred strategy (like many GCs).
|
| It seems like deferred (e.g. GC) would be better here, because after the main object is dropped, the GC doesn't bother to traverse all of the tiny allocations because they are all dead (unreachable by the root), and it just discards them. But that's not the full story either.
|
| It's not terribly common to build up zillions of allocations and then immediately free them. What's more common is to keep the structure (and its zillions of allocations) around for a while, perhaps making small random modifications, and then eventually freeing them all at once. If using a GC, while the large structure is alive, the GC needs to scan all of those objects, causing a pause each time, which is not great. The eager strategy is also not great: it only needs to traverse the structure once (at deallocation time), but it needs to individually deallocate.
|
| The answer here is to recognize that all of the objects in the structure will be deallocated together. Use a separate region/arena/heap for the entire structure, and wipe out that region/arena/heap when the structure gets dropped. You don't need to traverse anything while the structure is alive, or when it gets dropped.
|
| In rust, probably the most common way to approximate this is by using slices into a larger buffer rather than separate allocations. I wish there was a little better way of doing this, though. It would be awesome if you could make new heaps specific to an object (like a hash table), then allocate the keys/values on that heap. When you drop the structure, the memory disappears without traversal.

| ncmncm wrote:
| There is nothing unique to Rust about this; it is a very old technique. It is usually much inferior to the "arena allocator" method, where all the discarded allocations are coalesced and released in a single, cheap operation that could as well be done without another thread. That method is practical in many languages, Rust possibly included. C++ supports it in the Standard Library, for all the standard containers.
| If important work must be done in the destructors, it is still better to farm the work out to a thread pool, rather than starting another thread. Again, C++ supports this in its Standard Library, as I think Rust does too.
|
| One could suggest that the only reason to present the idea in Rust is the cynical one that Rust articles get free upvotes on HN.

| dathinab wrote:
| One thing I just noticed is that the example doesn't make sure to actually run the new thread to completion before the main thread exits.
|
| This means that if you do a "drop in other thread" and then main exits, the drop might never run. Which is often fine, as the exit of main causes process termination and as such will free the memory normally anyway.
|
| But it would be a problem on some systems where memory cleanup on process exit is less reliable. Though such systems are rarer now, I think.

| ReactiveJelly wrote:
| It would have to be a non-desktop system.
|
| I'm pretty sure Linux will always free process-private memory, and threads, and file descriptors when a process exits.
|
| The only things that can leak in typical cases are some kinds of shared memory and maybe child processes?

| pierrebai wrote:
| I've seen variations on this trick multiple times. Using threads, using a message sent to self, using a list and a timer to do the work "later", using a list and waiting for idle time...
|
| They all have one thing in common: papering over a bad design.
|
| In the particular example given, the sub-vectors probably come from a common source. One could keep a big buffer (a single allocation) and an array of internal pointers. For an example of such a design to hold a large array of text strings, see this blog entry and its associated github repo:
|
| https://www.spiria.com/en/blog/desktop-software/optimizing-shared-data/
|
| https://github.com/pierrebai/FastTextContainer
|
| Roughly it is this:
|
|     struct TextHolder {
|         const char* common_buffer;
|         std::vector<const char*> internal_pointers;
|     };
|
| This is of course addressing the example, but the underlying message is generally applicable: change your flawed design, don't hide your flaws.

| cperciva wrote:
| If _freeing_ the data structure in question takes this long, how much time are you wasting _duplicating_ the data structure?

| saagarjha wrote:
| I'm actually very curious why it takes this long; is Rust memsetting the buffer when dropping it?
|
| Edit: it seems like turning on optimizations improves the situation quite a bit. Not sure why they were profiling the debug build.

| Reelin wrote:
| > I'm actually very curious why it takes this long; is Rust memsetting the buffer when dropping it?
|
| Regardless of memset and optimizations, consider a particularly complicated object which lives on the heap and contains hundreds of other nested objects (which themselves contain nested objects, etc). Now imagine that a significant fraction of them make use of RAII. That cleanup code can't be elided.
|
| That being said, it's a pretty bad example if they were actually profiling the debug build ...

| fpgaminer wrote:
| > it seems like turning on optimizations improves the situation quite a bit.
|
| I'm not seeing that on my local machine? Were you comparing on the Playground, which would be quite variable in its results?
|     > cargo build
|         Compiling foo v0.1.0 (/private/tmp/foo)
|         Finished dev [unoptimized + debuginfo] target(s) in 0.42s
|     > ./target/debug/foo
|     drop in another thread 52.121us
|     drop in this thread 514.687233ms
|
|     > cargo build --release
|         Compiling foo v0.1.0 (/private/tmp/foo)
|         Finished release [optimized] target(s) in 0.47s
|     > ./target/release/foo
|     drop in another thread 48.418us
|     drop in this thread 548.005373ms
| saagarjha wrote:
| I saw an increase of about 2x on my computer, though I didn't
| take too much effort to control for noise.
| [deleted]
| firethief wrote:
| > Edit: it seems like turning on optimizations improves the
| situation quite a bit. Not sure why they were profiling the
| debug build.
|
| This is the most important point in the thread, since it
| invalidates the results for most purposes.
| saagarjha wrote:
| Not completely; it's still 2-3 orders of magnitude slower.
| firethief wrote:
| You're right, I expected it would make a bigger difference.
| dathinab wrote:
| The thing is, it's not slow because rust is doing anything wrong
| or unoptimized; it's slow because cleaning up an insane number
| of memory allocations is slow.
|
| Also if you run this:
|
| ```
| fn main() {
|     ::std::thread::spawn(move || { println!("end") });
|     println!("Hello, world!");
| }
| ```
|
| You might notice that "end" might not be printed, because the
| main thread exits before it prints and terminates the process.
| This means that the dropping might actually not happen if it's
| at the end of the program, and nothing is faster than not doing
| the work.
|
| Also it's a not-uncommon pattern in small user-facing CLIs to
| leak (memory) resources, as they will (should) be cleaned up
| with the process termination anyway.
| bszupnick wrote:
| This code doesn't duplicate it. In Rust, when a variable is sent
| as an argument to a function, its "ownership" moves into the
| scope of that function.
|
| https://doc.rust-lang.org/book/ch04-01-what-is-ownership.htm...
| cperciva wrote:
| You're missing my point. Unless the only thing you want to do
| with your giant data structure is measure its size, you're not
| going to be passing ownership of your only copy of it into the
| get_size function. You're going to be passing in a copy -- hence
| the cost of duplicating everything.
| phyzome wrote:
| In the different contrived case where it gets copied, you'd
| instead change this code to take an immutable reference to it,
| and compute the size of that. Or you'd call .size() instead of
| calling this function!
| eMSF wrote:
| It is just an example. You can think of "measuring size" here as
| getting the result of a long computation that involves a lot of
| allocations. After you get the result, you no longer care about
| the intermediate stuff - i.e. all the allocations. You certainly
| don't want to duplicate them, you just want to get rid of them,
| and the author's point is that you might not want to deallocate
| (drop) them in the UI thread.
|
| If it helps you, you might want to imagine the contents of the
| get_size function as being the end part of a longer calc_foo
| function. What's really missing the point is focusing so hard on
| the fact that the example happens to contain a call to a
| collection's size() at all.
| ReactiveJelly wrote:
| Rust is stricter about aliasing than C++ is.
|
| Vectors are the size of 3 pointers (data, size, capacity), so I
| guess 24 bytes on x64.
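| A quick check of that guess (a tiny sketch; assumes a 64-bit
| target):
|     fn main() {
|         // Vec's header is (pointer, capacity, length): 3 usize words
|         assert_eq!(std::mem::size_of::<Vec<u8>>(), 24);
|     }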
| | Even if the move requires a memcpy, it's only copying those
| | 24 bytes -- the heap allocation is not copied, because there
| | are never two owners of the vector at once.
| [deleted]
| Animats wrote:
| There's a worse case in deallocation. Tracing through a data
| structure being released for a long-running program can cause
| page faults, unused data having been swapped out. This is part
| of why some programs take far too long to exit.
| littlestymaar wrote:
| The title is slightly wrong: it's not going to make your code
| _faster_, it's going to reduce _latency_ on the given thread.
|
| It may be a net win if this is the UI thread of a desktop app,
| but overall, it will come at a performance cost: modern
| allocators have thread-local memory pools, and now you're moving
| away from them. And if you're running your code on a NUMA system
| (most servers nowadays), when moving from one thread to another,
| you can end up freeing non-local memory instead of local memory.
| Also, you won't have any backpressure on your allocations, and
| you are liable to run out of memory (especially because your
| deallocations now occur more slowly than they should).
|
| Main takeaway: if you use it blindly it's an anti-pattern, but
| it can be a good idea in its niche: the UI thread of a GUI.
| wmichelin wrote:
| Minor typo, `froget` instead of `forget`
| grogers wrote:
| Contrived examples like this are ridiculous. Creating such a
| heavy thing is likely even more expensive than tearing it down.
| So unless you create it on a separate thread, you probably
| shouldn't be freeing it on a separate one. It's not going to
| solve your interactivity problem. If you are creating the object
| on a separate thread, then it's already going to be natural to
| free it on a separate one too.
| ReactiveJelly wrote:
| Something is better than nothing.
| thePunisher wrote:
| The obvious solution would be to borrow the HeavyThing instead
| of having it dropped inside the function.
___________________________________________________________________
  (page generated 2020-05-30 23:00 UTC)