[HN Gopher] Custom Allocators in Rust
       ___________________________________________________________________
        
       Custom Allocators in Rust
        
       Author : g0xA52A2A
       Score  : 100 points
       Date   : 2023-04-08 09:53 UTC (13 hours ago)
        
 (HTM) web link (nical.github.io)
 (TXT) w3m dump (nical.github.io)
        
       | antimora wrote:
       | Does anyone know a custom allocator that allocates memory for the
       | same pattern usage? I am trying to find a good allocator for
       | tensor operations used in Burn's framework
       | (https://github.com/burn-rs/burn). It uses the same vectors
       | without any varying size changes.
        
       | foota wrote:
       | There's been some side work towards defining dynamic scoping for
       | rust. E.g., allow functions to declare a dependency on some value
       | or type, and then require callers to either pass that dependency
       | explicitly, or have it on their dependencies.
       | 
       | One of the proposed uses was for allocators.
       | 
       | https://tmandry.gitlab.io/blog/posts/2021-12-21-context-capa...
       | is one of the more cogent proposals.
       | 
       | I don't think any of these has made substantial progreess towards
       | being approved though.
        
         | forrestthewoods wrote:
         | That's pretty neat. Seems very similar to Jai's implicit
         | context. I'm definitely a fan of context objects. It solves so
         | many painful problems related to allocators, logging, etc.
        
       | fpoling wrote:
       | The article assumes that storing an allocator together with a
       | container inevitably bloats the container by 1-2 CPU words. This
       | is not necessary so. One can require that it should be possible
       | to get the allocator from a pointer it allocates. With many
       | allocator designs this will costs like one CPU word per CPU page
       | which is trivial.
       | 
       | This still requires to parametrize the container on the type of
       | the allocator, but there will be no penalty on the size of the
       | containers and the API will be safe.
        
       | phkahler wrote:
       | How much better is Rusts default than C++? I would expect it to
       | be better since they have all that nice lifetime information on
       | every piece of data. It seems like some nice things could be done
       | with that. Are they?
        
         | LegionMammal978 wrote:
         | By default, Rust just delegates to the ordinary libc malloc(),
         | realloc(), and free(). In safe code, allocations are managed
         | through RAII, just like in modern C++. Box works like
         | unique_ptr, Arc works like shared_ptr, Vec works like vector,
         | and so on. Lifetime annotations are just a way for the
         | programmer to tell the compiler that none of the rules are
         | being broken: they don't actually affect the semantics of the
         | compiled program, they just make it fail to compile if there is
         | an error. The underlying UB rules (except for those around
         | unique references) are not much stricter than those required by
         | C++; meanwhile, the lifetime rules are fairly strict, but the
         | compiler cannot optimize based on them.
         | 
         | At best, the lifetime system could allow you to use a faster
         | design for your library, by placing tighter compile-time
         | constraints on allowed use cases than you could have feasibly
         | documented in a C++ library's interface. But examples of this
         | are not too common in my experience, since most find it
         | acceptable for C++ libraries not to accommodate sufficiently
         | weird use cases.
        
       | cormacrelf wrote:
       | > _This one is probably closest to what one would write in
       | languages like C++. The data structure just assumes the allocator
       | will outlive it, and it is up to the user to either use an static
       | allocator, or pretend to using unsafe code to cast away the
       | lifetime while making sure that the allocator outlives the data
       | structure without the compiler 's support._
       | 
       |  _I know many in the Rust community will frown upon this
       | approach, but to be honest I don 't think that it is a terrible
       | solution in the context of an advanced feature like custom
       | allocation strategies. Tessellator does not expose an unsafe API,
       | but it documents that if its users were to break the rules they'd
       | simply have to make sure the allocator outlives the data
       | structure. In any other language with this kind of control over
       | memory management, this contract would have to be manually upheld
       | by users of the API and it is considered normal._
       | 
       | ... no thanks.
        
         | lenkite wrote:
         | Well, I hope we get custom allocation soon in Rust. Its really
         | strange not having the same in an advertised system programming
         | language.
         | 
         | Can help in incidents like the below:
         | 
         | https://www.svix.com/blog/heap-fragmentation-in-rust-applica...
        
           | Diggsey wrote:
           | You can already do custom allocation in Rust. These efforts
           | are about bringing support for custom allocators to the types
           | in `std` so that you can combine use of a normal `Vec` with a
           | custom allocator, instead of needing to definte your own
           | `Vec` (or at least pull one in from a 3rd party crate).
        
           | tomjakubowski wrote:
           | You might want custom local allocators. You can already
           | replace the global allocator with one designed to reduce
           | fragmentation, as described in the article.
        
         | andrepd wrote:
         | The value proposition of Rust is precisely having zero-cost
         | abstractions and highest-performance code with compile-time
         | verification of correctness.
         | 
         | That being said, I don't oppose the occasional judicious use of
         | unsafe. If 99% of your code is verified, it's not 100%, but
         | that's still a massive improvement over a C++ codebase.
        
           | likeabbas wrote:
           | I think `unsafe` would have been more aptly named
           | `compiler_unverifiable`. IMO there would be less apprehension
           | to using `unsafe` when it's needed.
        
             | Yoric wrote:
             | `unchecked` might be more palatable, but I agree that there
             | is some uncomfortable mismatch between the meanings of
             | "unsafe" and `unsafe`.
        
             | ZephyrBlu wrote:
             | As someone who mainly uses higher level languages, doesn't
             | care for C/C++ and really likes Rust, I'm glad it was named
             | so strongly and that safety and correctness is very
             | important in the community.
             | 
             | It creates a strong incentive to only write safe Rust,
             | which is great for the vast majority of people.
        
               | anonymoushn wrote:
               | In this case it would be nice if operations that are
               | type-safe and don't read or write memory at all, such as
               | mm256_shuffle_epi8, were available in safe rust.
        
               | jsheard wrote:
               | It's a work in progress, on nightly there is a safe and
               | portable SIMD abstraction under development in std::simd.
               | 
               | e.g. "mm256_shuffle_epi8" on X64+AVX2, ARM64+NEON and
               | plain ARM: https://rust.godbolt.org/z/7rjKE93Kn
               | 
               | It currently only works with constant shuffle masks but
               | dynamic shuffles are on the to-do list.
        
               | likeabbas wrote:
               | I think the incentive to write compiler verifiable Rust
               | would be the same as safe Rust, but with less fear for
               | the situations where you do need to bypass the borrow
               | checker such as with cyclical references in graphs
               | (currently doing this right now by re-writing a basic NN
               | in Rust). Even the standard library uses unsafe for
               | certain situations.
        
               | proto_lambda wrote:
               | > with less fear for the situations where you do need to
               | bypass the borrow checker such as with cyclical
               | references in graphs
               | 
               | That's a pretty tricky thing to get right, and with the
               | consequence for getting it wrong being UB, at least a
               | little fear is warranted.
        
               | likeabbas wrote:
               | `compiler_unverifiable` isn't risky enough for you?
        
               | ZephyrBlu wrote:
               | It doesn't have the same connotations as `unsafe`.
        
               | likeabbas wrote:
               | But it's the true definition being stated. Unsafe is
               | subjective, compiler unverifiable isn't
        
               | ZephyrBlu wrote:
               | So?
               | 
               | This is like applications adding artificial delay to
               | operations so users aren't surprised they complete so
               | quickly.
               | 
               | User understanding is more important than definitional
               | correctness.
        
           | peyton wrote:
           | Ok Rust is about a lot of things, but correctness is not one
           | of those things. Dafny is an example of a programming
           | language that's about correctness.
        
           | cormacrelf wrote:
           | We can talk in specifics: this example was about making
           | people use unsafe to avoid having to type <'static> in the
           | general case. The author had been trying for a number of
           | paragraphs to avoid polluting the type name with generics or
           | lifetimes. This is what going too far looks like.
           | 
           | The std APIs look like this:                  struct Box<T,
           | A: Allocator = Global>
           | 
           | And it's been this way in stable releases for the last year
           | or so. The same has been done for Vec and all the other
           | std::collections. What percent of Rust programmers do you
           | think even noticed at all? 1%? The most flexible design and
           | the least impacting on regular users. You can use A = &'a dyn
           | Allocator if you like, equally you can choose a ZST and not
           | pay for 16 bytes of storage. The library author has no need
           | to choose in advance at all, which is great if they're
           | determined to make weirdly constrained choices, ultimately
           | forcing most uses of the API to be unsafe. I stopped reading
           | after that so I don't know which design they went for.
        
             | mikepurvis wrote:
             | And if the generated documentation is really the main
             | sticking point, it would be perfectly possible to special
             | case the allocator type parameter in rustdoc so that it
             | hides or otherwise deemphasizes it.
        
             | throwawaymaths wrote:
             | > The most flexible design
             | 
             | Not quite. For the most general case, you probably want
             | allocator to be a proper parameter instead of a type
             | parameter. For example, suppose you have a green thread
             | that you want to have its own isolated heap. Then you can't
             | assume that any given allocator is a singleton in its type;
             | The struct itself must somehow be able to find to its
             | "owner" on release. In the green thread case you can't
             | "just use threadlocal" because a green thread might not be
             | sticky to an os thread.
        
               | twic wrote:
               | Rust already has almost exactly this problem with keeping
               | track of the current task in the async framework. As far
               | as I know, the solution is indeed to use a thread local,
               | and be scrupulous about updating it on task entry. It's
               | ugly but it seems to work.
        
       | stephc_int13 wrote:
       | I think that avoiding global allocation and lifetime management
       | is good design and a way to make memory management as simple and
       | flexible as it should.
       | 
       | But, from my own work in the field, I came to the conclusion that
       | there is a better way than passing custom allocators: passing
       | containers instead.
       | 
       | In practice this is almost the same thing (except that containers
       | are made to be easily traversed) but semantically there is a
       | subtle and positive difference.
       | 
       | The goal is to avoid confusing the allocating policy (let's say
       | arena versus heap) with the allocator instance.
        
         | conaclos wrote:
         | What do you mean by Containers?
         | 
         | A good abstraction for passing allocators could be effect and
         | effect handler.
        
           | stephc_int13 wrote:
           | A container is basically a structure holding a collection.
           | 
           | In that case, the container also acts as an allocator (the
           | feature can be embedded directly or not)
           | 
           | The only practical difference is that a container should
           | provide some facilities for traversal, while the allocator is
           | only taking care of allocating and freeing.
           | 
           | From my experience, I very rarely allocate things without
           | keeping them in a kind of container, it can be something as
           | simple as an array of references.
        
       | Reitet00 wrote:
       | Great to see some movement in this area for Rust. Compared to Zig
       | which had this from day 1 it may be hard to adjust Rust (the way
       | it was hard for Go to adopt generics).
        
         | Diggsey wrote:
         | I think most of the challenges are around making custom
         | allocators _safe_ without harming the ergonomics, which Zig
         | hasn 't solved.
         | 
         | If Rust was content to have use of custom allocators be unsafe
         | it would be trivial to add them (since you could just add new
         | variants of allocating methods that take an allocator as a
         | parameter).
        
           | hansvm wrote:
           | Are you just reminding us that Rust does some checks that Zig
           | doesn't, or are you saying that there's some particular
           | footgun in their allocators above and beyond the fact that
           | UAF and other memory bugs are writeable in general?
        
             | Diggsey wrote:
             | Neither, I was disagreeing with the comparison to Go and
             | generics.
             | 
             | Generics in Go don't add anything beyond what generics
             | already do in other languages, so the challenge with
             | bringing generics to Go is "how do we adapt the language to
             | support a feature that already exists in other languages
             | and is generally well understood". Bringing generics to a
             | language that wasn't designed with them in mind has often
             | resulted in a sub-optimal implementation (eg. Java vs C#).
             | 
             | On the other hand, "safe custom allocators" are not a
             | feature that any language (to my knowledge) has solved.
             | It's not as though this was an oversight in Rust's initial
             | design: using a custom allocator in an unsafe context has
             | always been possible in Rust, and it's too early to say
             | whether bringing this feature into the language later will
             | result in a similarly sub-optimal design: in order to be
             | sub-optimal there would have to exist some better solution
             | out there, and there currently doesn't.
        
           | fpoling wrote:
           | Zig approach essentially parametrizes instances of containers
           | on the instance of the allocator. To properly support that in
           | the type system one needs dependent types.
           | 
           | In theory that can be done, but the consequences for the
           | compilation time will be extreme as the compiler becomes
           | essentially a generic theorem prover.
           | 
           | Plus the noise from the proofs in the sources will be much
           | bigger than that of type parameters.
        
             | throwawaymaths wrote:
             | I think realistically if you want to prove safety of memory
             | provenance in zig, you assume that allocators are sound,
             | and you just track the lifetime of the memory from
             | alloc/create to free/destroy and call it a day. This is
             | probably "good enough", and in rust you're assuming the
             | allocation is sound as well, it's just implicit.
        
             | throwaway17_17 wrote:
             | I'm pretty sure the 'logic' require for paramterizing
             | container types over allocators can be restricted in a wide
             | majority of cases to a linear set which would allow for
             | refinement types to be used. This would eliminate the
             | requirement for full theorem proving.
             | 
             | In fact, it would probably be acceptable to insist on
             | keeping the scope of parameterization limited to the
             | linearly determinable set of values and operations on those
             | values.
        
               | fpoling wrote:
               | A Zig-style allocator that is passed as arguments
               | requires to introduce a dependency of the type of the
               | container on the instance of the allocator.
               | 
               | Perhaps it is possible to specialize for that case a
               | generic prover, but I am sceptical that the effect on
               | compilation timing will be minimal.
        
       ___________________________________________________________________
       (page generated 2023-04-08 23:01 UTC)