hngopher.com

       [HN Gopher] What is `Box<str>` and how is it different fro...
       ___________________________________________________________________
        
       What is `Box<str>` and how is it different from `String` in
       Rust?
        
       Author : asimpletune
       Score  : 103 points
       Date   : 2022-06-24 09:54 UTC (1 days ago)
        
 (HTM) web link (mahdi.blog)
 (TXT) w3m dump (mahdi.blog)
        
       | sirwhinesalot wrote:
       | It's unfortunate that strings are badly named in Rust. They got
       | that better with Path and PathBuf.
       | 
       | str is fixed size, like a Java String
       | 
       | String is growable, like a Java StringBuilder
       | 
       | After that, we get into memory ownership, with &str not owning
       | memory, and Box<str> owning memory, but you rarely need the
       | latter, so it's really &str vs String that you need to care
       | about.
       | 
       | EDIT: changed immutable to fixed and mutable to growable to
       | better reflect the real difference, though typically you almost
       | always use immutable &str and &mut String. I thank the commenters
       | below for pointing it out, I don't want to make the problem even
       | more confusing than it already is.
        
         | Arnavion wrote:
         | String used to be StrBuf first. The rename to String was
         | intentional because String was the more commonly known name in
         | other languages.
         | 
         | https://rust-lang.github.io/rfcs/0060-rename-strbuf.html
        
           | lifthrasiir wrote:
           | Note that this is a very old RFC and doesn't have much
           | context and discussion compared to later RFCs. It is
           | worthwhile to read the actual discussion that took place [1].
           | 
           | [1] https://github.com/rust-lang/rfcs/pull/60
        
           | howinteresting wrote:
           | This was a mistake. Having str and StrBuf would have been
           | significantly less confusing than str and String.
        
             | steveklabnik wrote:
             | I often joke that this is the only change I'd desire for a
             | Rust 2.0.
        
               | OJFord wrote:
               | What about aliasing it, marking String as deprecated in
               | docs, 'please use StrBuf'? (Clippy warning, etc.)
        
               | steveklabnik wrote:
               | In theory you could do something like this, but it would
               | be a _lot_ of churn for a questionable amount of gain. I
               | probably wouldn't support it today; Rust is past being
               | able to make these sorts of changes imho.
        
           | sirwhinesalot wrote:
           | Unfortunately, judging by the fact so many people are still
           | confused about it, it was a mistake. Having a shorthand for
           | something (str) and that thing (String) be different things
           | was dumb, and someone brought that up in the discussion at
           | the time but I guess hindsight is 20/20.
           | 
           | C++ has std::string and std::string_view, which makes loads
           | more sense.
           | 
           | Java and C# have StringBuilder and String.
           | 
           | Go has strings.Builder and string.
           | 
           | Objective-C/Cocoa has NSMutableString and NSString.
           | 
           | Ada has Unbounded_String, Bounded_String and Fixed_String for
           | different use cases.
           | 
           | Rust has by far the worst naming.
        
             | kzrdude wrote:
             | I guess C++ has the best names after all, Rust should have
             | emulated those (except it couldn't - string_view came after
             | Rust and maybe even was inspired by Rust.)
        
               | cmrdporcupine wrote:
               | Chromium's C++ StringPiece dates back to at least 2012,
               | and pretty sure Google had something similar (I forget
               | that name) to it in Google3's C++ base library (which
               | became abseil's string_view) before that even.
               | 
               | I seem to recall Boost may have had a string_view pretty
               | far back, too.
               | 
               | https://chromium.googlesource.com/chromium/src/base/+/mas
               | ter...
               | 
               | https://github.com/abseil/abseil-
               | cpp/blob/master/absl/string...
        
         | nicoburns wrote:
         | Personally I'd prefer String/StringView (and potentially Path
         | and PathView), but I guess that ship has sailed.
        
         | Blikkentrekker wrote:
         | I find that this explanation does not do it justice.
         | 
         | The important part is that `str` is a dynamically sized type as
         | it's called. What it is is simply a region of memory, of any
         | size, containing UTF8. Since it is dynamically sized various
         | constraints are placed onto it which in practice come down to
         | that it can only really be passed around at runtime by being
         | behind a pointer and is hard to directly put on the stack.
         | 
         | `String` is three words, two words are equivalent to a "fat
         | pointer" to a `str`, as in one word for the address, and the
         | other for the size, which is how Rust deals with dynamically
         | sized types in general, and the third word denotes the capacity
         | of memory allocated to the `String` which it uses to know when
         | to reallocate.
         | 
         | `str` is neither mutable nor immutable; that isn't part of its
         | type. `&str` is immutable, and `&mut str` is mutable. It's
         | perfectly possible in Rust to mutate a `str` if one obtains a
         | mutable, or perhaps better called exclusive reference to it
         | somehow, but the mutations that can be performed are very
         | limited since the size cannot easily grow.
         | 
         | This is where `String` comes in. It guarantees that the space
         | after the `str` it points to, up to the size given by its third
         | "capacity" word, is not used by anything else, and thus it can
         | grow more easily when manipulated.
         | 
         | There are some limited mutation methods on `&mut str` in Rust,
         | such as `make_ascii_uppercase`, which converts all lowercase
         | ascii letters to uppercase, which is perfectly fine, since this
         | operation is guaranteed to not ever increase the size of the
         | `str`, but with Unicode such a guarantee no longer applies and
         | one needs a `String`.
         | 
         | That being said, yes, I would have favored for `String` to be
         | called `StrBuf`, and `Vec` `SliceBuf` instead.
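The layout described in this comment can be checked with a minimal sketch that uses only the standard library. The helper names `words` and `grows_in_place` are made up for illustration, and the sizes are what current compilers produce rather than a documented layout guarantee.

```rust
use std::mem::size_of;

/// Size of `T` measured in machine words (`usize`).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

/// Returns true if pushing `extra` onto a `String` that already has enough
/// spare capacity leaves its heap pointer unchanged (no reallocation).
fn grows_in_place(extra: &str) -> bool {
    let mut s = String::with_capacity(extra.len() + 5);
    s.push_str("hello");
    let before = s.as_ptr();
    s.push_str(extra); // fits in the reserved capacity
    before == s.as_ptr()
}

fn main() {
    println!("&str     = {} words", words::<&str>());
    println!("Box<str> = {} words", words::<Box<str>>());
    println!("String   = {} words", words::<String>());
    println!("grows in place: {}", grows_in_place(", world"));
}
```

The extra word in `String` is the capacity, which is what lets `push_str` skip the allocator while spare room remains.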
        
           | sirwhinesalot wrote:
           | Sure, if you want to be truly specific about it and not do a
           | Java analogy ;)
        
         | aliceryhl wrote:
         | The difference has to do with ownership, and it has nothing to
         | do with mutability. For _both_ types, you can mutate them given
         | a mutable reference, and you can't given an immutable reference.
         | 
         | For an example, an `&mut str` can be modified via various
         | methods such as make_ascii_uppercase.
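A small sketch of the point about `&mut str`, assuming a hypothetical helper named `shout`. The same function accepts a `String` and a `Box<str>`, because both hand out an exclusive `&mut str`.

```rust
/// Uppercases ASCII letters in place through an exclusive reference.
/// The byte length cannot change, so a plain `&mut str` is enough.
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

fn main() {
    let mut owned = String::from("hello");
    shout(&mut owned); // &mut String coerces to &mut str

    let mut boxed: Box<str> = "world".into();
    shout(&mut boxed); // &mut Box<str> coerces to &mut str

    println!("{} {}", owned, boxed);
}
```

Non-ASCII characters are left untouched, which is why the operation never needs to grow the buffer.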
        
           | sirwhinesalot wrote:
           | Nope, not ownership either, Box<str> and String both own
           | their memory, the difference is fixed size vs growable :)
           | 
           | But you're right, I edited my post to reflect this, the Java
           | analogy is pretty strained as it is.
        
             | Macha wrote:
             | I believe the parent poster was comparing &str and String,
             | not Box<str> and String.
        
         | marcosdumay wrote:
         | > but you rarely need the latter
         | 
         | AFAIK, it's because people go with String when what they
         | actually mean is Box<str>. Since they have similar costs,
         | nobody ever sees the need to change it, and the String type
         | does have a much better name.
         | 
         | But the need is there all the time. People just satisfy it
         | differently.
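As a sketch of the case this comment describes, here is a hypothetical `Tag` type whose text never changes after construction. It stores a `Box<str>` and converts from and to `String` at the edges; the type and method names are illustrative only.

```rust
/// Hypothetical record whose name is fixed once it has been built.
struct Tag {
    name: Box<str>,
}

impl Tag {
    /// Takes ownership of a `String` and drops its spare capacity.
    fn new(name: String) -> Self {
        Tag { name: name.into_boxed_str() }
    }

    /// Hands the text back as a growable `String` when edits are needed.
    fn into_string(self) -> String {
        self.name.into_string()
    }
}

fn main() {
    let mut buf = String::with_capacity(64);
    buf.push_str("home");
    let tag = Tag::new(buf);
    println!("{} bytes stored for {:?}", tag.name.len(), tag.name);
    let editable = tag.into_string();
    println!("back to String: {}", editable);
}
```

The saving is one word per value plus any unused capacity, which tends to matter only when many such values are kept around.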
        
           | sirwhinesalot wrote:
           | I think it's mainly because unlike Java, where a
           | StringBuilder is effectively an optimisation over
           | concatenating Strings, in Rust managing that memory would be
           | a total pain, so you tend to keep the mutable thing around.
           | 
           | Once that happens, Box<str> becomes kinda unnecessary. There
           | are many cases where it would be the correct type, for
           | example reading from a file in a read-only manner, but most
           | of the time you're going to be doing _something_ to that
           | text, so it makes more sense to just load it up as a String
           | already and avoid the unnecessary copy.
           | 
           | Either way, it's mostly a naming problem. &str/String sucks
           | :(
        
         | fpoling wrote:
         | String in Rust is very similar to std::string in C++, while str
         | is std::string_view except it is safe to use.
         | 
         | StringBuffer in Java is not like String in Rust. In particular,
         | one cannot pass StringBuffer in Java to a function taking
         | String, while both Rust and C++ allow implicitly converting the
         | string backed by a heap into the corresponding read-only view.
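The implicit conversion mentioned here is deref coercion on the Rust side. A minimal sketch, with `count_vowels` as a made-up function that only needs a read-only view:

```rust
/// Takes the read-only view; callers decide how the text is owned.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

fn main() {
    let owned: String = String::from("growable");
    let from_string = count_vowels(&owned); // &String coerces to &str
    let from_literal = count_vowels("literal"); // already a &'static str
    let from_slice = count_vowels(&owned[..4]); // view into part of the String
    println!("{} {} {}", from_string, from_literal, from_slice);
}
```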
        
           | sirwhinesalot wrote:
           | Strings in Java own their memory, they aren't views, they're
           | closer to Box<str>. That's why you can't implicitly convert a
           | StringBuilder into one.
           | 
           | I know this, I'm not the one you need to explain it to, it's
           | Rust newbies. So many problems would have been avoided with
           | Str/StrBuf or StrView/Str, but now the ship has sailed.
        
             | rrobukef wrote:
             | Strings in Java share their memory with other substrings of
             | the same allocation. They are views.
        
               | cesarb wrote:
               | IIRC, that used to be the case, but recent Java releases
               | changed it so that memory is no longer shared with
               | substrings. The former behavior could cause some extreme
               | memory leaks (unless you were very careful to always
               | manually duplicate each substring); a one-character
               | substring could keep a multi-megabyte memory allocation
               | alive. See for instance
               | https://stackoverflow.com/questions/33893655/string-
               | substrin... which discusses this issue.
        
       | OJFord wrote:
       | If OP is here, then in this listing:
       | 
       |     let boxed_str: Box<str> = "hello".into();
       |     println!("size of boxed_str on stack: {}", std::mem::size_of_val(&boxed_str));
       |     let s = String::from("hello!");
       |     println!("size of string on stack: {}", std::mem::size_of_val(&s));
       | 
       | I know it's not the point and doesn't make a difference, but you
       | might want to make the two 'strings' the same (not with & without
       | '!'), just to be clearer.
        
       | umanwizard wrote:
       | This might clarify the situation, for C or C++ folks:
       | 
       |     // heap-allocated, fixed-size
       |     struct BoxStr {
       |         unsigned length;
       |         // INVARIANT: this points to a heap allocation of length bytes, and is valid utf8
       |         unsigned char *data;
       |     };
       | 
       |     // heap-allocated, resizable
       |     struct String {
       |         unsigned length;
       |         unsigned capacity;
       |         // INVARIANT: heap allocation of capacity bytes, the first length of which are valid utf8
       |         unsigned char *data;
       |     };
       | 
       | Of course you _could_ resize BoxStr, but only by reallocating
       | `data` to the exact desired length every time, which will kill
       | your asymptotic complexity.
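The cost difference can be sketched in Rust itself. `append_boxed` and `capacity_changes` are hypothetical helpers written for this illustration, and the exact growth policy of `String` is an implementation detail, so only a loose bound is relied on.

```rust
/// Appends to a `Box<str>` the only way available: rebuild at the exact size.
fn append_boxed(b: Box<str>, extra: &str) -> Box<str> {
    let mut s = b.into_string(); // capacity equals length here
    s.push_str(extra); // has to reallocate to make room
    s.into_boxed_str() // shrinks back to the exact length
}

/// Counts how often a `String`'s capacity changes across `n` one-byte pushes.
fn capacity_changes(n: usize) -> usize {
    let mut s = String::new();
    let mut cap = s.capacity();
    let mut changes = 0;
    for _ in 0..n {
        s.push('x');
        if s.capacity() != cap {
            changes += 1;
            cap = s.capacity();
        }
    }
    changes
}

fn main() {
    let boxed = append_boxed("ab".into(), "cd");
    println!("boxed after append: {}", boxed);
    println!("capacity changes over 1000 pushes: {}", capacity_changes(1000));
}
```

Every call to `append_boxed` touches the allocator, while the growable `String` only does so on the comparatively rare pushes that exhaust its capacity.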
        
         | tylerhou wrote:
         | Is your first example really equivalent to Box<str>? I would
         | have expected something like
         | 
         |     using BoxStr = std::unique_ptr<Str>;
         | 
         | where Str is defined as
         | 
         |     struct Str {
         |         size_t len;
         |         char data[];
         |     };
         | 
         | The difference is that the len is stored on the heap, and the
         | data is stored inline with the length. Unfortunately C++ does
         | not support flexible array members so this syntax is not
         | actually valid.
         | 
         | Edit: Never mind, after reading the article, Rust does use the
         | parent's representation, because Box holds a "fat" pointer to
         | str, which stores its length next to the pointer rather than on
         | the heap. So BoxStr is the correct equivalent, because &[u8] is
         | not equivalent to u8*, it's equivalent to std::span<u8>.
        
           | steveklabnik wrote:
           | Your parent is correct, the length is stored alongside the
           | pointer, not on the heap with its data. This is true for any
           | "dynamically sized type," not just Box<str>. &str is also a
           | (pointer, length) pair, for example.
        
       | the__alchemist wrote:
       | I'm working on a PC-based configuration for a drone flight
       | controller. PC-side is std Rust with a stack available. Firmware
       | is `no-std`, running on a microcontroller. It has waypoints you
       | can program when connected to a PC using USB. They have names
       | that need to be represented as some sort of string.
       | 
       | I'm using `u8` arrays for the strings on both sides; seems the
       | easiest to serialize, and Rust has `str::from_utf8` etc to handle
       | conversion to/from the UI.
       | 
       | `String` is unsupported on the MCU side since there's no
       | allocation. I find this low-level approach ergonomic given it's
       | easy to [de]serialize over USB.
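A minimal sketch of this byte-array approach, using only `core` so it would also fit a `no_std` build. The 16-byte size, the zero padding, and the names `NAME_LEN`, `encode_name` and `decode_name` are assumptions for illustration, not details from the comment.

```rust
/// Assumed fixed on-wire size for a waypoint name (hypothetical).
const NAME_LEN: usize = 16;

/// Packs `name` into a zero-padded buffer; `None` if it does not fit.
fn encode_name(name: &str) -> Option<[u8; NAME_LEN]> {
    let bytes = name.as_bytes();
    if bytes.len() > NAME_LEN {
        return None;
    }
    let mut buf = [0u8; NAME_LEN];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(buf)
}

/// Borrows the text back out of the buffer, stopping at the first zero byte.
fn decode_name(buf: &[u8; NAME_LEN]) -> Result<&str, core::str::Utf8Error> {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(NAME_LEN);
    core::str::from_utf8(&buf[..end])
}

fn main() {
    let wire = encode_name("Home").expect("name fits in the buffer");
    println!("{:?}", decode_name(&wire));
}
```

With zero padding, a name that itself contains a NUL character would be cut short, so a real protocol might carry an explicit length byte instead.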
        
       | sampo wrote:
       | Title is: What is Box<str> and how is it different from String in
       | Rust?
        
         | dang wrote:
         | Fixed now. Thanks!
        
       | codedokode wrote:
       | Is there official documentation about what `str` (without an
       | ampersand) is? For example, documentation [1] says that `str` is
       | a "string slice" (without explaining what "string slice" means),
       | and then goes on with description of &str.
       | 
       | And a book on Rust [2] says:
       | 
       | > A string slice is a reference to part of a String
       | 
       | This seems wrong, because &str can reference static strings which
       | are not String. And if str, or "string slice" is a "reference",
       | then &str is a reference to a reference?
       | 
       | And later:
       | 
       | > The type that signifies "string slice" is written as &str
       | 
       | But the documentation said that "string slice" is str, not &str.
       | 
       | Also, I wonder, what do square brackets mean when they are used
       | without an ampersand (as s[0..2] instead of &s[0..2])?
       | 
       | Also, is an ampersand in &str the same as an ampersand in &u8
       | (meaning an immutable reference to u8) or does it have other
       | meaning?
       | 
       | [1] https://doc.rust-lang.org/std/primitive.str.html
       | 
       | [2] https://doc.rust-lang.org/book/ch04-03-slices.html#string-
       | sl...
        
         | [deleted]
        
         | LegionMammal978 wrote:
         | > Is there official documentation about what `str` (without an
         | ampersand) is? For example, documentation [1] says that `str`
         | is a "string slice" (without explaining what "string slice"
         | means), and then goes on with description of &str.
         | 
         | A `str` is really just a `[u8]` with extra semantics. Thus, a
         | `&str` is really a `&[u8]`, a `&mut str` is a `&mut [u8]`, a
         | `Box<str>` is a `Box<[u8]>`, etc. So we call it a "string
         | slice", since it mostly acts like a regular `[T]` slice.
         | 
         | In general, the term "slice" can either refer to the unsized
         | type `[T]` or the reference `&[T]`/`&mut [T]` interchangeably.
         | You could also call the latter a "slice reference" where the
         | distinction is important; e.g., a `Box<[T]>` would be a "boxed
         | slice", while `Box<&[T]>` would be a "boxed slice reference" or
         | "boxed reference to a slice". But most of the time, the correct
         | meaning can be inferred from context.
         | 
         | > Also, I wonder, what do square brackets mean when they are
         | used without an ampersand (as s[0..2] instead of &s[0..2])?
         | 
         | `s[0..2]` is a place expression that refers to the raw `str`
         | subslice. But since `str` is an unsized type [0], it cannot
         | appear on its own; it must appear behind some reference type.
         | Thus, `&s[0..2]` creates a `&str`, and `&mut s[0..2]` creates a
         | `&mut str`. However, the ampersand isn't always necessary: you
         | can write `s[0..2].to_owned()` to use the `str` as a method
         | receiver, which implicitly creates a reference.
         | 
         | [0] https://doc.rust-lang.org/book/ch19-04-advanced-
         | types.html#d...
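The three uses of `s[0..2]` described in this reply can be sketched as follows. The function names are invented for illustration; note that slicing a `str` panics if an index does not fall on a character boundary.

```rust
/// `&s[0..2]` borrows the subslice as a `&str`.
fn first_two(s: &str) -> &str {
    &s[0..2]
}

/// `s[0..2]` used as a method receiver; the reference is created implicitly.
fn first_two_owned(s: &str) -> String {
    s[0..2].to_owned()
}

/// `&mut s[1..]` yields a `&mut str` that can be edited in place.
fn upper_tail(s: &mut String) {
    let tail: &mut str = &mut s[1..];
    tail.make_ascii_uppercase();
}

fn main() {
    let s = String::from("hello");
    println!("{} {}", first_two(&s), first_two_owned(&s));

    let mut m = String::from("abc");
    upper_tail(&mut m);
    println!("{}", m);

    // let bare = s[0..2]; // rejected: the size of `str` is not known at compile time
}
```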
        
         | ruuda wrote:
         | The & in &str is like the & in &[u8], str is like [u8] (an
         | unsized type), not like u8. A &str is a "fat pointer" (pointer
         | + length), unlike &u8 which is a regular "thin" pointer.
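The thin versus fat distinction shows up directly in pointer sizes. A small sketch, with `is_fat` as a hypothetical helper that compares a reference's size against two machine words:

```rust
use std::mem::size_of;

/// True if a shared reference to `T` occupies two machine words.
fn is_fat<T: ?Sized>() -> bool {
    size_of::<&T>() == 2 * size_of::<usize>()
}

fn main() {
    println!("&u8   fat? {}", is_fat::<u8>());
    println!("&[u8] fat? {}", is_fat::<[u8]>());
    println!("&str  fat? {}", is_fat::<str>());

    // The length travels with the pointer, so `len` needs no scan of the data.
    let s: &str = "héllo";
    println!("len in bytes: {}", s.len());
}
```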
        
       | FullyFunctional wrote:
       | This is missing a conversation about
       | https://lib.rs/crates/compact_str (and a few alternatives like
       | it). TL;DR: String takes the space of three pointers, that is, 24
       | bytes on 64-bit archs. compact_str fits up to 24 byte strings in
       | the same space and reverts to String for longer strings.
       | 
       | ADD: that is, avoids heap allocation for those, unlike both
       | Box<str> and String.
        
         | tialaramex wrote:
         | Box<str> is still going to be smaller _if_ you know how big the
         | text is because (unlike CompactString and String) it doesn't
         | need to carry a capacity value. In exchange of course you can't
         | append things to it (without re-allocating)
         | 
         | CompactString is a very clever+ SSO implementation, and I'll
         | remember it is there if I run into a situation where it might
         | help but I firmly agree with Rust's choice _not_ to implement
         | the SSO optimisation in the standard library's String type.
         | 
         | + Storing 23 bytes of UTF-8 as one of several representations
         | in a 24 byte data structure makes sense, you can see how to
         | write a fairly safe SSO optimisation for Rust which does that,
         | but the CompactString scheme relies on the fact Rust's strings
         | are by definition UTF-8 encoded to squeeze the discriminant
         | into the same space as the last possible byte of an actual
         | UTF-8 string, so it can store a 24 byte value like
         | "ABCDEFGHIJKLMNOPQRSTUVWX" inline despite also distinguishing
         | the case where it needs a heap pointer for larger strings.
         | That's very clever.
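The general UTF-8 property this footnote alludes to can be sketched with the standard library alone: some byte values can never be the final byte of valid UTF-8, which leaves them free to act as a tag. `can_end_valid_utf8` is a made-up helper, and this is not a description of how compact_str actually encodes its data.

```rust
/// True if some valid UTF-8 string can end with `byte`.
/// A lone ASCII byte is valid by itself; a continuation byte is valid after
/// a two-byte lead such as 0xC2. Nothing else can come last.
fn can_end_valid_utf8(byte: u8) -> bool {
    std::str::from_utf8(&[byte]).is_ok() || std::str::from_utf8(&[0xC2, byte]).is_ok()
}

fn main() {
    let free = (0u8..=255).filter(|b| !can_end_valid_utf8(*b)).count();
    println!("byte values that never end a valid string: {}", free);
}
```

An inline buffer that is completely full of text ends in an ordinary UTF-8 byte, so any of the remaining values in that last position can signal a different representation.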
        
           | rtfeldman wrote:
           | > I firmly agree with Rust's choice not to implement the SSO
           | optimisation in the standard library's String type.
           | 
           | Out of curiosity, why is that?
           | 
           | I don't know much about how or why that decision was made,
           | but I'm curious.
        
             | lifthrasiir wrote:
             | SSO means that pretty much every string operation has
             | multiple code paths, which can be highly unpredictable.
             | Basically it is a trade-off between memory usage and
             | performance, and the standard library is not really a good
             | place to make that trade-off. By comparison, a lot of C++
             | code (still) copies strings all over the place for no good
             | reason, so SSO in the standard library has a much greater
             | appeal.
        
         | pornel wrote:
         | A nice thing is that all string types have &str as the lowest
         | common denominator, so even if you use SSO or on-stack or any
         | other fancy string type, it's automatically compatible with
         | almost everything.
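To illustrate the compatibility point, here is a sketch of a hypothetical on-stack string type, `StackStr`, that implements `Deref<Target = str>`. The 16-byte limit and all names are assumptions; real crates do this with more care and less copying.

```rust
use std::ops::Deref;

/// Hypothetical on-stack string holding at most 16 bytes.
struct StackStr {
    buf: [u8; 16],
    len: usize,
}

impl StackStr {
    /// Copies `text` into the inline buffer; `None` if it does not fit.
    fn new(text: &str) -> Option<Self> {
        if text.len() > 16 {
            return None;
        }
        let mut buf = [0u8; 16];
        buf[..text.len()].copy_from_slice(text.as_bytes());
        Some(StackStr { buf, len: text.len() })
    }
}

impl Deref for StackStr {
    type Target = str;

    fn deref(&self) -> &str {
        // The bytes were copied from a whole `&str`, so they are valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len]).expect("buffer holds valid UTF-8")
    }
}

/// Any API written against `&str`.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let s = StackStr::new("on the stack").expect("fits in 16 bytes");
    println!("{} words, upper: {}", word_count(&s), s.to_uppercase());
}
```

Because `word_count` only asks for `&str`, it works unchanged with `String`, `Box<str>`, or a custom type like this one.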
        
       | terhechte wrote:
       | I recently gave a Rust workshop to Kotlin and Swift developers.
       | Strings in Rust are a really, really difficult topic for complete
       | newcomers because they're understood as a basic type whereas in
       | Rust they require having read half the Rust book to grasp.
       | 
       | Consider: I can teach a lot of Rust basics with `usize`. Defining
       | functions, calling functions, enums because they're `Copy` and
       | because there's only one type. String requires knowing about &str
       | which requires knowing about deref which requires knowing about
       | (&String -> &str), it also requires understanding lifetimes,
       | moving, heap and stack, cloning. Then, if you want to work with
       | the file system you also need to understand Paths, OsString and
       | AsRef.
       | 
       | With Kotlin and Swift, for all these things, you really just need
       | one type, String, and you handle it just like usize.
       | 
       | It is really a bit of a hurdle for new developers coming from
       | higher level languages (especially if they just give it a quick
       | try).
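The gap between `usize` and `String` that this comment describes fits in a few lines. The function names are invented for illustration.

```rust
fn takes_number(n: usize) -> usize {
    n + 1
}

fn takes_string(s: String) -> usize {
    s.len()
}

fn borrows_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let n: usize = 41;
    let a = takes_number(n);
    let b = takes_number(n); // fine: usize is Copy, `n` is still usable

    let s = String::from("hello");
    let c = borrows_str(&s); // borrow: `s` stays usable
    let d = takes_string(s.clone()); // explicit duplicate of the heap data
    let e = takes_string(s); // move: `s` is gone after this line
    // takes_string(s); // error[E0382]: use of moved value: `s`

    println!("{} {} {} {} {}", a, b, c, d, e);
}
```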
        
         | klabb3 wrote:
         | Don't worry. As soon as you explain to them that appending to a
         | PathBuf is O(1) amortized they'll come around, and it will
         | scale much better for all their GB-sized file paths.
         | 
         | I guess this adds a prerequisite on complexity theory but
         | nobody should go anywhere near advanced data structures like
         | strings with less than a bachelor's in CS.
        
         | lijogdfljk wrote:
         | Makes me wonder if there could be room for a SimpleString
         | library.
         | 
         | I love/use Rust. I don't think any of this is complicated. BUT,
         | i'm a big fan of just "clone your problems away" for beginner
         | Rust users. Going knee deep into techniques which merely reduce
         | memory usage when people likely don't actually care - at all -
         | about it just feels wrong to me.
         | 
         | So yea, maybe a cursed library where SimpleString is just some
         | niceties around some Cow + Arc thing which is also Copy. Hell,
         | you could probably just apply it to Vec and who knows what else.
         | 
         | Anyway, clearly not something i'm advocating anyone _really_
         | use. But it seems a nice way to make stuff "Just Work" in the
         | beginning.
        
           | kzrdude wrote:
           | Some weird construction around Cow + Arc that is also Copy is
           | not really possible in Rust, I'm sorry to report. No way to
           | implement it and even if you could (you technically "can" by
           | reimplementing most of Cow and Arc) - the result is not
           | useful, the destructor of it doesn't work.
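A type cannot be both `Copy` and have a destructor, and reference counting needs the destructor to decrement the count. The closest practical thing is probably a cheap `Clone`, sketched here with `Arc<str>`; the helper `share` is hypothetical.

```rust
use std::sync::Arc;

/// Builds one shared allocation and returns two handles to it.
fn share(text: &str) -> (Arc<str>, Arc<str>) {
    let first: Arc<str> = Arc::from(text);
    let second = Arc::clone(&first); // bumps a counter, copies no text
    (first, second)
}

fn main() {
    let (a, b) = share("shared");
    println!("same allocation: {}", Arc::ptr_eq(&a, &b));
    println!("handles alive: {}", Arc::strong_count(&a));
    drop(b); // the destructor is what lowers the count again
    println!("handles alive: {}", Arc::strong_count(&a));
}
```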
        
           | codedokode wrote:
           | But Rust is designed to write high-performance code. If you
           | don't care about performance, you don't really need Rust.
           | Swift or Go seem more readable and easier to use.
        
             | pjmlp wrote:
             | Swift is pretty much about performance, as replacement for
             | C, C++ and Objective-C in the Apple ecosystem, it is even
             | on Apple's official sites.
             | 
             | What Apple isn't willing to do is sacrifice productivity
             | while achieving that goal.
        
             | howinteresting wrote:
             | Swift is well-designed but is virtually non-existent
             | outside of Apple platforms, so it doesn't have nearly the
             | third-party ecosystem that Rust does. Go has the third-
             | party ecosystem but is poorly designed and doesn't have
             | basic language features like sum types.
             | 
             | Rust is likely the best combination of thought-out design
             | and ecosystem support that exists in a programming language
             | today.
        
               | pjmlp wrote:
               | Rust is also pretty much focused on Linux workloads,
               | mostly.
               | 
               | Also the Apple ecosystem has plenty of third parties,
               | including commercial libraries.
        
               | jeroenhd wrote:
               | Interestingly, Microsoft is also pushing Rust quite hard
               | with special API packages, tutorials, and even some IDE
               | integration. Windows tools are often closed source,
               | though, so you'll probably never notice it if your
               | favourite tool uses Rust or not.
        
         | agumonkey wrote:
         | Rust has one uphill battle in mainstream adoption: a lot of
         | things make sense if you have written bare metal code. If not,
         | it can be very confusing.
        
         | tialaramex wrote:
         | I think I'd recommend teaching Move semantics not Copy
         | semantics from the outset, because Move semantics work fine
         | everywhere in Rust and the Copy semantics are just an
         | optimisation. As you've found, if you teach Copy then for types
         | which aren't Copy you now need to teach Move.
         | 
         | Languages like Kotlin and Swift are doing a _lot_ of lifting to
         | deliver this behaviour for String, and of course they can't
         | keep it up, so students who've done more than a little Kotlin
         | or Swift will be aware of the idea of "reference semantics" in
         | those languages where most of the objects they use do not have
         | the behaviour they've seen in String which is instead
         | pretending to be a value type like an integer.
         | 
         | Again, if you only teach Move, you're fine. After not very long
         | a student will wonder how they can duplicate things (since they
         | didn't know Copy), and you can show them Clone. Clone works
         | everywhere. Is cloning a usize idiomatic Rust? No it is not.
         | Does it work just fine anyway? Of course it does! And of course
         | Clone is implemented for String, and for most types beginners
         | will ever see.
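The claim that `Clone` works everywhere can be sketched with one generic helper, here given the made-up name `duplicate`. It behaves the same for a `Copy` type like `usize` and a heap-owning type like `String`.

```rust
/// Duplicates any `Clone` value; works the same for Copy and non-Copy types.
fn duplicate<T: Clone>(value: &T) -> (T, T) {
    (value.clone(), value.clone())
}

fn main() {
    let (n1, n2) = duplicate(&7usize); // not idiomatic for usize, but it works
    let (s1, s2) = duplicate(&String::from("hi"));
    println!("{} {} {} {}", n1, n2, s1, s2);
}
```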
        
           | hgomersall wrote:
           | Are copy semantics always used in place of move semantics for
           | a Copy type? I didn't know that.
        
             | [deleted]
        
             | tialaramex wrote:
             | Literally all that Copy does is it says after assignment
             | the moved-from variable can still be used. So in this
             | sense, sure, these semantics are "always used". But if you
             | don't use the variable after assigning from it, you could
             | also say the semantics aren't used in this case. Does that
             | help? Copy does a _lot_ less than many people think it
             | does.
             | 
             | If you're a low level person it's apparent this is because
             | Copy types are just some bits and their meaning is
             | literally in those bits, _Copy_ the bits and you've copied
             | the meaning. Thus, this "it still works after assignment"
             | Copy behaviour is just how things would work naturally for
             | such types. But Rust doesn't require programmers (and
             | especially beginners) to grok that.
             | 
             | It's possible to explain Copy semantics first in a way
             | that's easier to grasp for people coming from, say, Java,
             | but that's only half the picture because your students will
             | soon need Move semantics which are different. Thus I
             | recommend instead explaining Move semantics from the outset
             | (which will be harder) and only introducing Copy as an
             | optimisation.
             | 
             | I think this might even be better for students coming from
             | C++, because C++ move semantics are a horrible mess, so
             | underscoring that Move is the default in Rust and it's fine
             | to think of every assignment as Move in Rust will avoid
             | them getting the idea that there must be secret magic
             | somewhere, there isn't, C++ hacked these semantics into a
             | finished language which didn't previously have Move and
             | that's why it's a mess.
             | 
             | I'm less sure for people coming from low-level C. I can
             | imagine if you're going to work with no_std on bare metal
             | you might actually do just fine working almost entirely
             | with Copy types and you probably need actual bona fide
             | pointers (not just references) and so you end up needing to
             | know what's "really" going on anyway. If you're no_std you
             | don't have a String type anyway, nor do you have Box, and
             | thus you can't write Box<str> either, although &str still
             | works fine if you've burned some strings into your firmware
             | or whatever.
        
             | afdbcreid wrote:
             | This isn't really something you usually encounter, but I
             | have to bring up this cute example:
             | 
             |     pub fn foo() -> impl FnOnce() {
             |         let non_copy: String = String::new();
             |         let copy: i32 = 123;
             |         || {
             |             drop(non_copy); // Works
             |             drop(copy); // error[E0373]
             |         }
             |     }
             | 
             | https://play.rust-
             | lang.org/?version=stable&mode=debug&editio...
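The error in this example arises because a non-`move` closure captures a `Copy` value by reference, and that reference would outlive the function. One way to make it compile, assuming the closure is meant to own both values, is to add `move`. The `i32` return type and the `copy * 2` body are changes made here for illustration.

```rust
pub fn foo() -> impl FnOnce() -> i32 {
    let non_copy: String = String::new();
    let copy: i32 = 123;
    // `move` forces both variables to be captured by value.
    move || {
        drop(non_copy); // the String is owned by the closure
        copy * 2 // the closure holds its own copy of the integer
    }
}

fn main() {
    let f = foo();
    println!("{}", f());
}
```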
        
         | lumost wrote:
         | Rust strings are difficult for others coming from statically
         | typed and low level languages as well.
         | 
         | It's one of the types programmers will most often encounter,
         | and yet it's one of the most obtuse topics within Rust.
        
           | k__ wrote:
           | I remember strings being "not so easy" in C/C++ too.
        
             | oconnor663 wrote:
             | I think the big differences are that copying and reference
             | taking are automatic and invisible in C++. So a lot of APIs
             | taking string or string& will "just work" for the
             | beginners, and you can delay the part where you talk about
             | how different those things are.
             | 
             | This sounds like a minor difference, but I've met lots of
             | developers who do meaningful work in C++ but who don't know
             | what a copy constructor is. I get the impression that
             | there's an enormous difference between being a C++ "user"
             | vs a "library writer", because there's so much automatic
             | stuff happening under the covers.
             | 
             | Rust tends to have a bit less invisible complexity, I
             | think, but some of that difference is just making the
             | complexity visible (like reference taking), which
             | effectively frontloads it onto beginners. It's a tough
             | tradeoff.
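             | 
             | A minimal Rust sketch of that visibility, for anyone who
             | hasn't seen it. The helpers byte_len and shout are
             | hypothetical names made up for this illustration; the point
             | is only that the borrow (&) and the copy (.clone()) both
             | have to be written out at the call site:

```rust
// Hypothetical helpers, named only for this illustration.

/// Borrows the string: the caller keeps ownership.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Takes ownership: the caller must move the String or clone it explicitly.
fn shout(mut s: String) -> String {
    s.push('!');
    s
}

fn main() {
    let name = String::from("ferris");

    // The borrow is spelled out with `&` at the call site.
    let n = byte_len(&name);

    // The copy is spelled out with `.clone()`, so `name` stays usable.
    let loud = shout(name.clone());

    // Without `.clone()` this is a move. Using `name` after this line
    // would be a compile error rather than a silent copy.
    let moved = shout(name);

    println!("{n} {loud} {moved}");
}
```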
        
           | jokethrowaway wrote:
           | After haskell strings, rust strings actually felt reasonable
        
         | nicoburns wrote:
         | On the plus side, String makes a really good example to explain
         | ownership, moving, stack vs heap, etc. All of which you need at
         | least a basic understanding of to do anything non-trivial in
         | Rust.
         | 
         | I kind of feel like it goes without saying that Rust isn't
         | ideal for beginners. For developers who already have a good
         | knowledge of other languages I feel like learning about these
         | things shouldn't be a problem, as becoming familiar with these
         | concepts is one of the main benefits of learning Rust.
        
           | smaddox wrote:
           | > I kind of feel like it goes without saying that Rust isn't
           | ideal for beginners.
           | 
           | I think that depends on, first, what the goal is, and second,
           | what you're comparing to. I think Rust is easier on
           | beginners, in many ways, than C. And C is easier on
           | beginners, in many ways, than assembly or machine code. But
           | if you want to really understand computer programming,
           | starting at machine code or at least assembly isn't a crazy
           | way to start.
        
             | tialaramex wrote:
             | Beginning with machine code for some simple architecture
             | (maybe RISC-V these days?) might be one good route in.
             | 
             | I can also see the case for beginning with a pure functional
             | language where all the practicalities are abstracted away
             | entirely. I experienced this myself: I already knew C etc.,
             | but that was not a requirement and many of my classmates did
             | not.
             | 
             | Today the University where I learned this begins with Java,
             | which I am confident is the wrong choice, but the person
             | who part-designed their curriculum, and is a friend,
             | disagrees with me and he's the one getting paid to teach
             | them.
        
             | msla wrote:
             | > But if you want to really understand computer
             | programming, starting at machine code or at least assembly
             | isn't a crazy way to start.
             | 
             | I've long suspected that the CS field was founded on two
             | approaches: The people who started from EE and worked their
             | way up, and the people who started from Math and worked
             | their way down. The former people think assembly is the
             | "real" way to approach software, and probably view C++ as
             | "very high-level", whereas the latter people think everyone
             | should start with a course on the lambda calculus and type
             | systems and gradually ease into Haskell, work down to Lisp,
             | and then maybe deign to learn Python for *shudder*
             | numerical work.
        
               | nicoburns wrote:
               | I'd argue there's also a 3rd foundation of CS: language.
               | Programming languages really are languages in the general
               | sense of the word, and their purpose is to allow humans
               | to effectively communicate with machines. Focussing on
               | optimising that communication is the 3rd approach.
        
             | nicoburns wrote:
             | > I think Rust is easier on beginners, in many ways, than
             | C. And C is easier on beginners, in many ways, than
             | assembly or machine code. But if you want to really
             | understand computer programming, starting at machine code
             | or at least assembly isn't a crazy way to start.
             | 
             | I mean sure. But equally, starting with Python isn't a
             | crazy way to start. And Python is a much easier language to
             | learn than any of those (esp. if you want to actually
             | create something practical with it).
        
               | hgomersall wrote:
               | Sure, but if your objective is systems programming,
               | you'll probably quickly get to the point of realising
               | python is not the right choice.
        
               | pjmlp wrote:
               | Depends, if writing a compiler is still considered
               | systems programming in modern times.
               | 
               | https://www.amazon.com/Writing-Interpreters-Compilers-Raspbe...
        
               | less_less wrote:
               | Compilers are their own beast -- I wouldn't put them with
               | systems code. They're pretty different from an OS, BLAS,
               | machine learning kernel, game engine, network stack,
               | database or what have you. There's not as much buffer
               | management, speed and memory aren't usually as critical,
               | you don't make direct syscalls, many structures are
               | graphs rather than arrays, etc. They often aren't even
               | multithreaded.
               | 
               | It's also popular to write compilers in distinctly
               | non-"systems-y" languages, most notably Standard ML but
               | also eg Haskell, and lots of languages are self-hosted.
        
               | nicoburns wrote:
               | If your objective is specifically systems programming
               | then you'll quickly outgrow python, but I'm not convinced
               | that makes it the wrong starting point. For systems
               | programming you'll likely need _both_ high-level and low-
               | level programming concepts. Learning low-level first is
               | absolutely a valid path, but my point is that going high-
               | level first is equally valid. People on the internet like
               | to make out that people who start out by learning
               | Python are incapable of later learning low-level
               | concepts, but if anything they're at an advantage
               | compared with someone with no programming experience at
               | all.
        
               | nvrspyx wrote:
               | This is just my opinion, but I can't imagine systems
               | programming being the objective of any beginner. A
               | beginner probably wouldn't even be able to differentiate
               | systems programming from applications programming.
        
       | jez wrote:
       | Do any of the string types in the Rust standard library implement
       | the same sort of small string optimization that C++ libraries
       | implement for std::string? (explained here[1])
       | 
       | Some quick searching turned up a few rust-lang internals posts
       | and GitHub issues, but it was hard to see whether anything came
       | of them.
       | 
       | I understand that it's probably possible to implement a
       | comparable String API in a crate that uses small string
       | optimizations, but being able to avoid a dedicated crate makes
       | interoperability with other libraries much easier.
       | 
       | [1] https://tc-imba.github.io/posts/cpp-sso/
        
         | aaaaaaaaaaab wrote:
         | https://github.com/rust-lang/rust/issues/20198
        
         | edflsafoiewq wrote:
         | Not in std, no.
        
         | steveklabnik wrote:
         | Rust's standard library strings cannot because of a specific
         | API, as_mut_vec, which is incompatible with the internal
         | representation necessary to do SSO.
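         | 
         | A rough sketch of why that API pins the layout: as_mut_vec is
         | an unsafe method that hands out a real &mut Vec<u8>, so a
         | String has to contain an actual Vec rather than an inline
         | buffer. The helper push_ascii is a hypothetical name used only
         | for this illustration:

```rust
/// Hypothetical helper: appends ASCII bytes through the raw Vec<u8> view.
fn push_ascii(s: &mut String, bytes: &[u8]) {
    assert!(bytes.is_ascii(), "only ASCII keeps the String valid UTF-8");
    // SAFETY: only ASCII bytes are appended, so the contents stay valid UTF-8.
    let v: &mut Vec<u8> = unsafe { s.as_mut_vec() };
    v.extend_from_slice(bytes);
}

fn main() {
    let mut s = String::from("abc");
    push_ascii(&mut s, b"def");
    println!("{s}");
}
```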
        
         | 24bytes wrote:
         | https://github.com/ParkMyCar/compact_str
         | 
         | https://old.reddit.com/r/rust/comments/t33hxp/announcing_com...
        
       | dochtman wrote:
       | The tl;dr doesn't quite make sense to me. To me the core
       | difference is that a Box<str> takes one less word on the stack,
       | because, by virtue of the str being fixed-size, it doesn't need to
       | track the capacity of the allocation as distinct from the length.
       | This is analogous to Box<[u8]> vs Vec<u8> (and in fact those are
       | the same data types except for the guarantee of valid UTF-8).
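       | 
       | A quick way to check the word counts, as a sketch. Note the
       | assumption: the three-word size of String reflects the current
       | standard library implementation, not a documented language
       | guarantee, whereas Box<str> is a fat pointer (pointer plus
       | length).

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // Fat pointers: data pointer + length.
    println!("&str     : {} words", size_of::<&str>() / word);
    println!("Box<str> : {} words", size_of::<Box<str>>() / word);

    // Pointer + capacity + length in the current std implementation.
    println!("String   : {} words", size_of::<String>() / word);

    // Converting discards any spare capacity, since Box<str> has
    // nowhere to record it.
    let mut s = String::with_capacity(64);
    s.push_str("hello");
    let b: Box<str> = s.into_boxed_str();
    println!("{b} ({} bytes)", b.len());
}
```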
        
         | tialaramex wrote:
         | One notable detail is that ToOwned for str gives you a String
         | and ToOwned for [u8] gives you a Vec<u8>, whereas a fixed-size
         | array such as [u8; 4] gives you another [u8; 4] by cloning.
         | 
         | In fact the standard library types that are ToOwned without
         | simply invoking Clone are the slice type [T] plus four that
         | are more or less strings (str, CStr, OsStr, Path)
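         | 
         | For anyone who wants to check which owned type comes back,
         | here is a small sketch. The type annotations do the work: the
         | program only compiles if to_owned() really returns the
         | annotated type.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

fn main() {
    // str -> String
    let s: String = "hi".to_owned();

    // [u8] -> Vec<u8>
    let bytes: &[u8] = &[1, 2, 3];
    let v: Vec<u8> = bytes.to_owned();

    // OsStr -> OsString, Path -> PathBuf
    let os: OsString = OsStr::new("name").to_owned();
    let p: PathBuf = Path::new("/tmp").to_owned();

    // A fixed-size array is Clone, so the blanket impl applies:
    // [u8; 3] -> [u8; 3]
    let arr: [u8; 3] = [1, 2, 3];
    let a: [u8; 3] = arr.to_owned();

    println!("{s} {v:?} {os:?} {p:?} {a:?}");
}
```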
        
         | tines wrote:
         | C++ programmer here: which one guarantees valid utf8, and why
         | would a primitive container make guarantees about the values
         | it's storing?
        
           | lifthrasiir wrote:
           | Everything labelled as "string" is a valid UTF-8 string in
           | Rust, and to my knowledge this decision was made very early
           | in the history of Rust (before 0.1). Many "modern" languages
           | (including modern enough C++) have a distinction between
           | Unicode strings and byte strings, whatever they are called,
           | and Rust just followed suit.
        
           | Animats wrote:
           | "str" and "String" guarantee UTF-8. To make a String from an
           | array of bytes, call
           | 
           |     pub fn from_utf8(vec: Vec<u8, Global>) -> Result<String, FromUtf8Error>
           | 
           | which consumes the input Vec and returns it as a String,
           | unmodified, if it's valid UTF-8, or reports an error if it's
           | not. There are a number of related functions in this family,
           | such as
           | 
           |     pub fn from_utf8_lossy(v: &[u8]) -> Cow<'_, str>
           | 
           | which takes in a slice of bytes and checks if it's a UTF-8
           | string. If it is, it returns the original bytes as a borrowed
           | str. Otherwise it makes a copy with any errors replaced with
           | the Unicode replacement character (U+FFFD).
           | 
           | Vec<u8> and array slices such as &[u8] are primitive
           | containers - they can store any sequence of u8 values. String
           | is more like an object with access methods.
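           | 
           | A short sketch of both functions in use. The helper
           | lossy_allocated is a hypothetical name for this example; it
           | just reports whether from_utf8_lossy had to make a copy.

```rust
use std::borrow::Cow;

/// Hypothetical helper: reports whether the lossy conversion had to allocate.
fn lossy_allocated(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}

fn main() {
    // The euro sign encoded as UTF-8 (E2 82 AC): accepted, and the Vec's
    // buffer becomes the String's buffer.
    let euro = String::from_utf8(vec![0xE2, 0x82, 0xAC]).expect("valid UTF-8");
    println!("{euro}");

    // A 0xFF byte is never valid UTF-8: rejected, and the error reports
    // how far validation got and hands the original bytes back.
    let err = String::from_utf8(vec![b'a', 0xFF]).unwrap_err();
    println!("valid up to byte {}", err.utf8_error().valid_up_to());
    let original: Vec<u8> = err.into_bytes();
    println!("{original:?}");

    // The lossy variant borrows valid input and copies invalid input,
    // substituting U+FFFD for each bad sequence.
    println!("allocated for valid input: {}", lossy_allocated(b"plain"));
    println!("{}", String::from_utf8_lossy(&[b'a', 0xFF]));
}
```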
        
           | pornel wrote:
           | The guarantee exists to speed up UTF-8 processing, so that it
           | can safely assume working with whole codepoints/sequences
           | (without extra out of bounds checks for every byte) and to
           | ensure you can always losslessly roundtrip every string to
           | and from other Unicode encodings without introducing any
           | special notion of a broken character. There's also a security
           | angle in this: text-processing algorithms may have different
           | strategies for recovering from broken UTF-8, which could be
           | exploited to fool parsers (e.g. if a 4-byte UTF-8 sequence
           | has only 3 bytes matching, do you advance by 3 or 4 bytes?).
           | 
           | Having the "valid UTF-8" state being part of the type system
           | means it needs to be checked only once when the instance is
           | created (which can be compile-time for constants), and
           | doesn't have to be re-checked later, even if the string is
           | mutated. Unlike a generic bag of bytes, the public interface
           | on string won't allow making it invalid UTF-8.
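           | 
           | To illustrate the last point, a minimal sketch: the safe
           | methods work in whole chars or strs, and byte offsets that
           | would split a code point are refused. The helper safe_prefix
           | is a hypothetical name for this example.

```rust
/// Hypothetical helper: returns the first `end` bytes as a str, or None
/// if `end` is out of range or falls inside a multi-byte code point.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    // 'n' is one byte and U+00E9 (e with acute) is two bytes in UTF-8.
    let mut s = String::from("n\u{e9}");
    println!("{} bytes", s.len());

    // Byte offset 2 is in the middle of the two-byte code point.
    println!("boundary at 2: {}", s.is_char_boundary(2));
    println!("{:?}", safe_prefix(&s, 1));
    println!("{:?}", safe_prefix(&s, 2));

    // Mutation goes through whole chars or strs, never raw bytes,
    // so the result is still valid UTF-8 without re-checking.
    s.push('e');
    s.insert_str(0, "re-");
    println!("{s}");
}
```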
        
           | ntoskrnl wrote:
           | > why would a primitive container make guarantees about the
           | values it's storing
           | 
           | If you know you have valid UTF-8, you can safely skip bounds
           | checks when decoding a codepoint that spans multiple bytes.
        
       ___________________________________________________________________
       (page generated 2022-06-25 23:00 UTC)