[HN Gopher] Making Rust as Fast as Go ___________________________________________________________________ Making Rust as Fast as Go Author : chrfrasco Score : 252 points Date : 2020-05-03 10:32 UTC (12 hours ago) (HTM) web link (www.christianfscott.com) (TXT) w3m dump (www.christianfscott.com) | virtualritz wrote: | I tried this on a spare time project[1]. Runtime in a quick test | went down from 14.5 to 12.2 secs on macOS! | | So a solid ~15% by changing the allocator to jemalloc. | | However, I now have a segfault w/o a stack trace when the data | gets written at the end of the process. | | Possibly something fishy in some `unsafe{}` code of a dependent | crate of mine that the different allocator exposed. :] | | Still - no stack trace at all is very strange in Rust when one | runs a debug build with RUST_BACKTRACE=full. | | [1] https://github.com/virtualritz/rust-diffusion-limited- | aggreg... | saagarjha wrote: | I have found that jemallocator is currently broken on macOS | Catalina, so that might be the problem. If you can reproduce | this issue reliably, I'd love to hear about it because I can't | myself unless I use a very specific toolchain that produces -O3 | binaries that are a real pain to work with. | pcr910303 wrote: | TLDR for people who didn't read: | | The speed difference came from the allocator. | | Rust switched from jemalloc to the system allocator per ticket | #36963[0] for various reasons (like binary bloat, valgrind | incompatibility, etc...). | | Go uses a custom allocator[1] instead. | | To make 'Rust Go fast' (pun intended), one can use the | '#[global_allocator]' to use a custom allocator (in this case, | with the jemallocator crate) to make allocations fast again. | | [0]: https://github.com/rust-lang/rust/issues/36963 | | [1]: https://golang.org/src/runtime/malloc.go | k__ wrote: | The comments of Rust programmers here also suggest that the | Rust implementation is, indeed, different from the Go | implementation. 
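The `#[global_allocator]` hook summarized above can be sketched as follows. This is a hedged example: the thread uses the external jemallocator crate (where you would write `static GLOBAL: Jemalloc = Jemalloc;` instead), so std's `System` allocator stands in here to keep the snippet free of external dependencies.

```rust
// Sketch of the `#[global_allocator]` attribute described in the summary.
// With the jemallocator crate you would instead write:
//     use jemallocator::Jemalloc;
//     #[global_allocator]
//     static GLOBAL: Jemalloc = Jemalloc;
// std's System allocator stands in so this compiles on its own.
use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

fn main() {
    // All heap allocations in the program now route through GLOBAL.
    let v: Vec<u64> = (0..1_000).collect();
    println!("sum = {}", v.iter().sum::<u64>()); // sum = 499500
}
```

Any type implementing `GlobalAlloc` can be dropped in the same way, which is how the article swapped in jemalloc.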
| pcr910303 wrote: | It was just a summary of the post contents - the post | suggests that the biggest difference comes from the | allocator. | loeg wrote: | FreeBSD's system allocator _is_ jemalloc :-). | ishanjain28 wrote: | I tried to benchmark the Go/Rust versions as well. | | I made 4 changes in the Rust version. | | 1. Moved up the line that gets a value from cache[j+1] before any | calls are made to cache[j]. This removes one bounds check. | (Improvement from 182,747 ns down to 176,xyz ns +-4800) | | 2. Moved from .chars().enumerate() to .as_bytes() and manually | tracking the current position with i/j variables. (Improvement from | 176,xyz ns down to 140,xyz ns) | | 3. Moved to the standard benchmark suite from main + a handrolled | benchmark system. (File read + load + parse into lines was kept | out of the benchmark) | | 4. Replaced the hand-rolled min with std::cmp::min. (Improvement from | 140,xyz down to 139,xyz, but the std deviation was about the same, | so it could just be a fluke. Don't know.) | | In the Go version, I made three changes. | | 1. Doing the same thing from #1 in Rust actually increased the | runtime from 190,xyz to 232,xyz, and quite consistently too. (I ran | it 10+ times to confirm.) | | 2. Replaced []rune(source), []rune(target) with []byte(source), | []byte(target). (Improvement from 214,817 ns to 190,152 ns) | | 3. Replaced the hand-rolled benchmark system with a proper benchmark | system in Go. (Again, file read + load + parse into lines | was kept out of the benchmark) | | So, at the end of it, the Rust version was about 50k ns faster than the | Go version. | | Edit #1: | | In the Rust version, I had also replaced the cache initialization with | (0..=target.len()).collect() before doing anything else. This | also gave a good perf boost but I forgot to note down the exact | value. | burntsushi wrote: | Note that using bytes is a fundamentally different | implementation that will produce different results on non-ASCII | input.
Using codepoints (or "runes") will better approximate | edit distance based on visual characters. (And grapheme | clusters would be even better. Although one could put the text | in composed normal form to get more mileage out of the rune-based | algorithm.) | blablabla123 wrote: | I'd be really surprised to hear that Go is supposed to be | faster than Rust. Of course Rust is a bit newer, but to me it | always sounded like Go is fast because it's static, but it | doesn't have to be high-speed if that would sacrifice | conciseness. Given that this is an artificial example, this here | looks more realistic: https://benchmarksgame-team.pages.debian.net/benchmarksgame/... | arcticbull wrote: | Rust and Go are contemporaries. Rust started in 2006 at | Mozilla, and the first Go public release from Google was in | 2009, meaning it probably started around the same time. | | Except of course all the Plan 9 garbage (like Go's hand-rolled | assembler) brought in to underpin Go from the 80s ;) | gpm wrote: | Rust is also basically just LLVM in terms of optimization, | which looks like it had its initial release in 2003. | fortran77 wrote: | Related thread: https://news.ycombinator.com/item?id=21103027 | codeflo wrote: | I recently did some experiments with creating small static Rust | binaries, custom linking, no_std et cetera. A lot of stuff around | that kind of thing is unstable or unfinished, which might be | somewhat expected. But I've also come to the conclusion that Rust | relies on libc _way_ too much. That might be fine on Linux, where | GNU's libc is well-maintained, is a bit questionable on MacOS (as | seen in this article), and is a complete distribution nightmare | on Windows (in no small part due to a series of very questionable | decisions by Microsoft).
| | My understanding is that Go doesn't use the libc at all and makes | system calls directly, which IMO is the correct decision in a | modern systems programming language that doesn't want to be | limited by 40 years of cruft. | fluffything wrote: | > But I've also come to the conclusion that Rust relies on libc | way too much. | | How did you come to this conclusion? | | People using Rust rely on libc a lot. | | For example, #![no_std] means "no standard-library", but it | doesn't mean "no libc, no libunwind, etc.". | | So a lot of people like to advertise their crates as | "#![no_std]" compatible, because they compile with #![no_std], | but then the first thing the crate does is link against | libc, against liballoc, against libm, ... or even the standard | library itself... | | So... if you are trying to build a binary that does not depend | on libc or libstd, then there is no language feature to help | you there. | | #![no_std] binaries are not only unstable, but also probably not | what you want, since that won't prevent you from linking any | library that links libc, or the standard library, etc. | | If you want to avoid libc, you have to enforce that, e.g., by | checking in your build system that your binary doesn't contain | any libc, libstd, etc. symbols (I just use nm for this). | #![no_std] helps a bit here, but making sure you don't link any | library that violates this is up to you. | codeflo wrote: | That's true. But going the no_std route is very hard (the | ecosystem isn't big, and relying on cargo and crates.io you're | almost guaranteed to link in the std or liballoc by accident | at some point). Even when using the std intentionally, I | really wouldn't have expected that basic std functions like | println require the libc. | geofft wrote: | I would not necessarily expect it but I appreciate it in a | "systems" language where "systems" is defined as compatible | with the existing traditional systems software on a | machine.
For example, it's nice if the stdbuf(1) command | works on Rust binaries on glibc systems, and it's nice if a | Rust binary calling a C library that writes with printf (or | vice versa) doesn't maintain two separate output buffers. | | To me, Go is the systems programming language for a world | untethered by existing platform compatibility (and so, for | instance, writing Go libraries to be called from C is | awkward, calling C libraries from Go incurs overhead, Go | has its own concurrency model, etc.) and Rust is the | systems programming language for use cases where you'd | otherwise want to use C (really "the platform's native | systems language," but that's C on all the major platforms) | but you want a better language. I appreciate that they both | exist and target these different use cases. | BuckRogers wrote: | Is Go widely used compared to the others as a systems | language at this point? | | Over the years I've gathered that it's more of a | competitor for C# & Java rather than Rust & C(++). | geofft wrote: | Depends what you mean by "systems language" (many common | definitions turn out to be equivalent to "drop-in | replacement for the platform language," which makes the | argument circular), but the choice of Go as an | implementation language for Docker and Kubernetes puts it | pretty firmly in the "systems language" space IMO. | Container management requires more systems-level | functionality than C# and Java are really geared towards | (although you certainly _could_ use them, perhaps with a | tablespoon of FFI). | apta wrote: | The original version of Docker was written in Python, and | the original version of Kubernetes was written in Java, | so your argument doesn't hold. | geofft wrote: | Would this be the part of my argument that doesn't | mention Python (which I personally consider a systems | language) at all? Or the part of my argument where I say, | "you certainly _could_ use them"?
| GlitchMr wrote: | Well, writing to standard output requires interacting with | an operating system, so it's not surprising that it | requires `std`. You cannot write to standard output without | knowing what operating system you are compiling for. | | Worth noting, however, that `writeln` only needs `core`, and | as long as you can provide an implementation of writing to | standard output (pretty much create a struct and implement | the `core::fmt::Write` trait for it), creating a `println` macro | is trivial. | steveklabnik wrote: | An example of this: https://os.phil-opp.com/vga-text-mode/#a-println-macro | geofft wrote: | To me, a #![no_std] library is useful for contexts where | there isn't a platform libc (embedded systems, kernels, etc.) | and you can't assume things like the existence of stdin or | processes. On those systems, there may still be a dynamic | allocator, because that's a super common feature in all but | the most spartan environments. For those use cases, a | #![no_std] library that links liballoc (and documents that it | needs it) is totally okay, and often better than a more-limited | library that only does static allocation - or | conversely implementing all of libstd with runtime panics for | most of it, just because your library needs to pass a Vec | around. It's a balance. | | The only case I think I've seen where a #![no_std] library | ends up pulling in libc is if you haven't added a custom | allocator and your platform's default allocator uses libc | (and so you could switch to a libc-free allocator if you | want). Are there other cases? | codeflo wrote: | There are lots of tiny practical annoyances you notice, | starting with having to vet dependencies and transitive | dependency upgrades extremely carefully. A major one is | that dev-dependencies are very broken in a no_std | environment.
It's obviously useful to have the std in your | build.rs script, but unfortunately, Cargo merges | dependencies and dev-dependencies in a way that's simply | incorrect. | | The good news is that every problem I found had an actively | discussed GitHub issue and the community is active, so | there will be progress. | geofft wrote: | > _unfortunately, Cargo merges dependencies and | dev-dependencies in a way that's simply incorrect._ | | Yeah, I've run into that, but IIRC this got solved super | recently/is being solved. Probably | https://github.com/rust-lang/cargo/pull/7820 ? | fluffything wrote: | > To me, a #![no_std] library is useful for contexts where | there isn't a platform libc (embedded systems, kernels, | etc.) and you can't assume things like the existence of | stdin or processes. | | "Could be useful for" (FTFY). | | Unfortunately, for the reasons mentioned above, most | #![no_std] libraries aren't useful for that, because they | link libc and other libraries. | | Most people don't do this intentionally. The compiler | doesn't complain about: #![no_std] | extern "C" { fn malloc(...) -> ...; } | | so when somebody does it accidentally, everything works and | they get no feedback. | | When you then go and build a #![no_std] binary, and run it | on a platform where libc is not available, only then do you get | an error. And at that point, good luck figuring out which | of your 200 dependencies has the error. | | In particular, if you are running `#[test]`s, doing that | links the standard library, so if your program implicitly depends on | libc somewhere, tests won't reveal that, because while | testing, libc will be linked.
| | It's actually hard to go out of your way and call malloc | directly, because FFI calls are unsafe. It's a lot easier | to use Box/Vec/String/etc., all of which are defined in | liballoc and use the pluggable allocator interface. | | I know this because I've successfully used #![no_std] | libraries in places where libc doesn't exist and no | function named malloc exists, and they do work. If you're | having a linker issue it's almost certainly because you | haven't changed the default allocator - if you have an | example of this I'd be happy to take a look at debugging | it. | fluffything wrote: | > and they do work. | | Maybe you are just using different no_std libraries than | I am using, but pretty much all of the no_std libraries | that I use have `libc` as a direct dependency. | | Not only for malloc; many of them just need to write | something to stdout or a file, generate random numbers, | ..., and that's hard to do without libc. Why they | advertise themselves as no_std escapes my comprehension. | geofft wrote: | Huh, the libc crate _itself_ claims #![no_std] support. I | agree that's confusing and counterintuitive... I see | what it means in terms of semantics but I don't | understand _why_ that's useful. | fluffything wrote: | The libc crate does correctly claim #![no_std] support, | because #![no_std] means "Does not require the Rust | standard library", and libc does not require it. | | The two main issues I see with #![no_std] are: | | * it's a flag for "doesn't link the standard library", but | the standard library is often too high-level for bare-metal | apps that want to be in precise control of | everything that gets linked | | * it isn't a contract of any kind, so you can't rely on | it for anything. In fact, this code is correct and works | as intended: #![no_std] extern | crate std; | | This is problematic, because many #![no_std] libraries | end up linking libstd "by accident", even though that's | probably not their intent.
| | So I agree with you that #![no_std] isn't a silver | bullet. I think it is still useful in that it lets you | avoid linking the standard library by default, which is a | necessary condition for embedded development. It is not a | sufficient condition, in that in practice you actually | want to forbid the standard library and other libraries | from being linked, and not just "don't link them by | default, but do so by accident". | fortran77 wrote: | This is simply false. Windows APIs are stable. It's one of the | best platforms for backward compatibility. The kernel32/user32 API | has been stable for 20 years. Rust has some work to do. | codeflo wrote: | That's exactly my point: The problem is that Rust _doesn't_ | use the stable kernel32/user32 functions (VirtualAlloc | etc.), but the libc ones, which don't have a stable location | on Windows. Without looking it up: Which DLL do you have to | redistribute this week to get "malloc"? | fortran77 wrote: | I just tell Visual Studio to create an installer, I select | the set of target platforms, and it just works. See, for | example: https://docs.microsoft.com/en-us/cpp/ide/walkthrough-deployi... | ekidd wrote: | As far as I know, the official system interface on Windows and | several Unix systems is via the C standard library, not via | direct syscalls. I don't know about macOS. But in general, | you may be required to dynamically link the C standard library on | many platforms. | | Linux guarantees syscalls are stable. And on Linux, you have | the option of telling Rust to cross-compile using a statically-linked | musl-libc. (If you also need to statically link OpenSSL | or a few other common libraries, I maintain | https://github.com/emk/rust-musl-builder, and there's at least | one similar image out there.)
| pansa2 wrote: | AFAIK on Windows, the hierarchy is: | | C library => kernel32.dll => ntdll.dll => system calls | | You don't have to go via the C library - calling kernel32 | directly is fine (I believe this is what Go does). However, | it's very rare to call ntdll or to make system calls | directly. | ludamad wrote: | I'm guessing the performance benefits are negligible | anyway? | codeflo wrote: | Basically yes. ntdll is an implementation detail that | shouldn't be relied upon. kernel32/user32 and friends are | considered the "proper" interface to the system and have | been stable for decades. | vardump wrote: | There are some ntdll calls that are officially documented | and OK to use. Of course there are also a lot of calls | you shouldn't use. | | When necessary, it's fine to use even undocumented ones | to support Windows 7 and older. It's not like those are | going to change anymore. | ChrisSD wrote: | To expand on that, on Windows it's best to use kernel32 (or | WindowsApp) unless you really need the cross-platform | convenience of the C lib. The exception being architecture-dependent | functions like memcmp, memcpy, and memset, which | will most likely have efficient assembly implementations. | | ntdll is lower level and technically unstable, but core | functions have been pretty stable for a long time. They | could of course have breaking changes, but it risks breaking | things like Cygwin. Microsoft tends to take compatibility | seriously, although perhaps not as much as they used to. | | Direct system calls are completely unstable. They can and | do change a lot between releases. You'd need to use a | lookup table for every build of Windows. | vardump wrote: | > it's very rare to call ntdll or to make system calls | directly | | Until you need to do something that's not possible through | kernel32.dll. Sometimes I've called ntdll.dll directly to | support older Windows versions.
| majewsky wrote: | macOS requires you to make syscalls through libSystem if you | want a stable interface. Go binaries used to make direct | syscalls until 1.10. Since this caused major breakage on most | new macOS releases, they have since switched to using | libSystem as well in 1.11: | | > On macOS and iOS, the runtime now uses libSystem.dylib | instead of calling the kernel directly. This should make Go | binaries more compatible with future versions of macOS and | iOS. The syscall package still makes direct system calls; | fixing this is planned for a future release. | | Source: https://golang.org/doc/go1.11 | masklinn wrote: | > I don't know about macOS. | | MacOS literally forbids statically linking to libSystem. | | Go finally had to bow down and accept that they just could | not perform raw syscalls on MacOS after gettimeofday (IIRC) | changed ABI multiple times during the Sierra beta. | codeflo wrote: | That's partly true, but the stable interface on Windows is | not the libc, but the kernel32/user32 functions like | VirtualAlloc, CreateFileW etc. Those have been stable since the NT | days. The libc functions like malloc and fopen are a | compatibility layer above that and unfortunately switch | places every few years. Currently they are delivered by a | combination of a Windows update and a redistributable | package, which makes it a nightmare to ship (on pre-Windows | 10 even more so). | jfkebwjsbx wrote: | Do you really need to ship them? I thought libc (CRT?) in | Windows was a given, and what used to be redistributed were | only the C++ ones. Isn't that the case? | steveklabnik wrote: | The problem with this is that not every system makes their | system call ABI stable. You have two choices here: use the | interface that is stable (which is libc), or track changes and | fix things up when they break. | codeflo wrote: | The only stable interfaces on Windows are the documented | functions from kernel32.dll, user32.dll etc.
Libc is a | compatibility layer above that, for which Microsoft invents a new | incompatible distribution mechanism every 3-5 years. It's | pure DLL hell, unfortunately. | | Edit: Not even the DLL name of Microsoft's libc is stable | (msvcrt140.dll etc.), leading to all kinds of wild goose | chases when trying to run old binaries. | steveklabnik wrote: | Yes, I was being imprecise wrt Windows, thanks :) | codeflo wrote: | Yeah, I saw your username too late or I would have known | that you know that. ;) But it's a common misunderstanding | among many Unix programmers, so I feel it was good to | clarify. | steveklabnik wrote: | Nah I think it's helpful! Comments are read by a lot more | folks than just the person you're replying to. I knew it | wasn't exactly libc but I didn't know the full hierarchy | involved. | weff_ wrote: | I think Go tries to reduce its dependence on libc but, by | default, it will still link to it. | | For instance, this code: package main | import "net" func main(){ net.Dial("tcp", | "golang.org:80") } | | When compiled with _go build main.go_, it links: | linux-vdso.so.1 (0x00007ffe3d7f0000) libpthread.so.0 => | /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc7ac05a000) | libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 | (0x00007fc7abc69000) /lib64/ld-linux-x86-64.so.2 | (0x00007fc7ac279000) | | There are of course compiler options to truly statically | compile. | rhinoceraptor wrote: | Go's pathological NIH syndrome does come with downsides. For | example, there was an infamous memory corruption bug in the way | they called into the vDSO. | adev_ wrote: | > Go's pathological NIH syndrome does come with downsides. | | Yes. And you can add the inability to use glibc's NSS | modules under Linux. | | This makes it unable to use sssd properly and authenticate a | POSIX user on a machine with LDAP authentication.
| | Getting completely independent from the OS system libraries has | consequences. | maoeurk wrote: | Assuming this was run on a 64-bit system, the Rust version seems | to be allocating and zeroing twice as much memory as the Go | version. | | edit: this has been pointed out as incorrect, Go ints are 8 bytes | on 64-bit systems -- thanks for the correction! | let mut cache: Vec<usize> = | (0..=target.chars().count()).collect(); | | which can be simplified as let mut cache: | Vec<usize> = vec![0; target.len()]; | | vs cache := make([]int, len(target)+1) for | i := 0; i < len(target)+1; i++ { cache[i] = i } | | Rust usize being 8 bytes and Go int being 4 bytes as I understand | it. | | So between doing more work and worse cache usage, it wouldn't be | surprising if the Rust version was slower even with the faster | allocator. | rossmohax wrote: | Go int is 8 bytes | eis wrote: | It can be either, depending on the system. | | https://golang.org/ref/spec#Numeric_types | [deleted] | devit wrote: | The main problem is that allocating a vector for each evaluation | is completely wrong: instead, it needs to be allocated once by | making the function a method on a struct containing a Vec, which | makes the allocator moot. | | The second problem is that at least the Rust code is decoding | UTF-8 every iteration of the inner loop instead of decoding once | and saving the result, or even better, interning the characters | and having versions of the inner loop for 32-bit chars and 8-bit | and 16-bit interned indexes. | | Furthermore, the code rereads cache[j] instead of storing the | previous value, and doesn't do anything to make sure that bounds | checks are elided in the inner loop (although perhaps the | compiler can optimize that). | | The code for computing the min seems to have been written | mindlessly rather than putting serious thought towards whether to | have branches or not and in what order (depending on an analysis | of what the branch direction rates would be).
| | Implausible benchmark results are almost always an indicator of | the incompetence of the person performing the benchmark. | burntsushi wrote: | > The second problem is that at least the Rust code is decoding | UTF-8 every iteration of the inner loop instead of decoding | once and saving the result | | Indeed. This is a pretty damning difference. The `target` | string is being repeatedly UTF-8 decoded, whereas the same is | not true in the Go version. The Go version even _goes out of | its way_ to do UTF-8 decoding exactly once for each of `source` | and `target`, but then doesn't do the same for the Rust | program. | | > Implausible benchmark results are almost always an indicator | of the incompetence of the person performing the benchmark. | | Come on. We can do better than this. Please don't make it | personal. We all have to learn things at some point. | masklinn wrote: | > Indeed. This is a pretty damning difference. The `target` | string is being repeatedly UTF-8 decoded, whereas the same is | not true in the Go version. The Go version even goes out of | its way to do UTF-8 decoding exactly once for each of | `source` and `target`, but then doesn't do the same for the | Rust program. | | I'm really not sure that's an issue; UTF-8 decoding is very, | very cheap and it's iterating either way. | | It would have to be benched, but I wouldn't be surprised if | allocating the caches (at least one allocation per line of | input) had way more overhead, especially given the inputs are | so very short. | | I'm not going to claim Rust's UTF-8 decoder is the fastest | around, but it's _very_ fast. | burntsushi wrote: | It's cheap, but not _that_ cheap. It shouldn't be as cheap | as just iterating over a sequence of 32-bit integers. | | But yes, I did benchmark this, even after reusing | allocations, and I can't tell a difference. The benchmark | is fairly noisy. | | I agree with your conclusion, especially after looking at | the input[1].
The strings are so small that the overhead of | caching the UTF-8 decoding is probably comparable to the | cost of doing UTF-8 decoding. | | [1] - https://github.com/christianscott/levenshtein-distance-bench... | matklad wrote: | > It shouldn't be as cheap as just iterating over a | sequence of 32-bit integers. | | I wonder if there are any benchmarks about this? | Specifically, it feels like in theory iterating UTF-8 | _could_ actually be faster if the data is mostly ASCII, | as that would require less memory bandwidth, and it seems | like the computation is simple enough for memory to be | the bottleneck (this is a wild guess, I have horrible | intuition about the speed of various hardware things). In | _this_ particular benchmark this reasoning doesn't apply, | as the strings are short and should just fit in cache. | matklad wrote: | Did a quick benchmark: https://gist.github.com/matklad/ebc1acd2fab884f3ff9d96d4175f.... | | Seems like the hypothesis is not wrong, but also is not | interesting: the difference is pretty small, and it can | be easily dwarfed by the big wins for UTF-32 if something | like auto-vectorization or bounds-check elision kicks in. | Also I am not entirely sure that the diff I observe is | not due to something completely irrelevant, like one loop | ending up on a lucky cacheline or what not. | burntsushi wrote: | If all you need to do is validate UTF-8, then yes, mostly | ASCII enables some nice fast paths[1]. | | I'm not a UTF-8 decoding specialist, but if you need to | traverse rune-by-rune via an API as general as | `str::chars`, then you need to do some kind of work to | convert your bytes into runes. Usually this involves some | kind of branching. | | But no, I haven't benchmarked it. Just intuition. A | better-researched response to your comment would | benchmark, and would probably at least do some research | on whether Daniel Lemire's work[2] would be applicable.
| (Or, in general, whether SIMD could be used to batch the | UTF-8 decoding process.) | | [1] - https://github.com/BurntSushi/bstr/blob/91edb3fb3e1ef347b30e... | | [2] - https://lemire.me/blog/2018/05/16/validating-utf-8-strings-u... | molf wrote: | > Implausible benchmark results are almost always an indicator | of the incompetence of the person performing the benchmark. | | An ad hominem attack surely isn't needed. | masklinn wrote: | And that's really too bad, because I think the rest of the | comment is spot-on: the fundamental issue is that Rust | simply delegates to the system allocator, and MacOS's | allocator is pretty slow. | | As a result, while Rust allows removing allocations very | explicitly and relatively easily (compared to C or C++), getting | the most performance out of your program also _requires_ | doing so, unless you use a non-system allocator with better | support for the workload. | heavenlyblue wrote: | But you can use a custom allocator in Rust: | https://doc.rust-lang.org/1.9.0/book/custom-allocators.html | | You don't have that option in Go. | masklinn wrote: | > But you can use a custom allocator in Rust: | https://doc.rust-lang.org/1.9.0/book/custom-allocators.html | | That is true -- and was demonstrated by the article, as | its "fix" was to use jemalloc -- but even a custom | allocator will usually be less performant than a high-performance | GC there, because the GC's allocator has more | insight into the requirements and workloads. | | It might be possible to give that insight to a custom | allocator by using the allocator's custom APIs, but this | requires a deeper integration between the program and the | allocator. | yencabulator wrote: | > You don't have that option in Go | | Sure you do. You can even build one from nothing but mmap | in pure Go. It just won't be part of the garbage | collection, so you get malloc/free or arenas etc., just | like in C/Rust/whatever.
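To make the custom-allocator discussion above concrete, here is a minimal sketch of the pluggable `GlobalAlloc` interface: a pass-through allocator that forwards to the system allocator while counting allocations. The struct and counter names are illustrative; real replacements like jemalloc or mimalloc plug in through the same `#[global_allocator]` mechanism.

```rust
// A minimal pass-through implementation of Rust's pluggable allocator
// interface: forwards to the system allocator and counts allocations.
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOC_COUNT: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_COUNT.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOC_COUNT.load(Ordering::Relaxed);
    let v = vec![0u8; 4096]; // forces at least one heap allocation
    assert!(ALLOC_COUNT.load(Ordering::Relaxed) > before);
    println!("heap allocations observed: {}", ALLOC_COUNT.load(Ordering::Relaxed));
    drop(v);
}
```

This is also roughly how one would verify masklinn's point empirically: counting (or timing) allocations per line of input in the benchmark.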
| steveklabnik wrote: | (those are very old docs, you want to link to | https://doc.rust-lang.org/stable/std/alloc/trait.GlobalAlloc...) | chrfrasco wrote: | > The second problem is that at least the Rust code is decoding | UTF-8 every iteration of the inner loop instead of decoding | once and saving the result, or even better interning the | characters and having versions of the inner loop for 32-bit | chars and 8-bit and 16-bit interned indexes. | | I tried this. Pulling the .chars() call out of the loop & | collecting into a Vec made the performance even worse - the | following balloons the runtime from ~2.7s to ~5s: | let target_chars: Vec<char> = target.chars().collect(); | for (i, source_char) in source.chars().enumerate() { | let mut next_dist = i + 1; for (j, | target_char) in target_chars.iter().enumerate() { | | > written mindlessly > incompetence of the person | | No challenge there :P I am operating under the assumption that | I don't need to understand how compilers work to get good | performance from Rust (where good is "similar enough to an | equivalent Go program") | burntsushi wrote: | Indeed, it looks like doing the UTF-8 decoding up-front is | exacerbating the performance difference in the allocator. | | I think this is where the GP's first suggestion comes into | play. If one were writing this code _and_ cared about | performance, then you'd usually find a way to reuse | allocations. I submitted a PR to demonstrate this: | https://github.com/christianscott/levenshtein-distance-bench... | chrfrasco wrote: | Yeah, I can definitely see how that would be a more | performant approach. | | I suppose I wasn't so interested in figuring out how to | make this algorithm as fast as possible as much as I was | interested in diving into why _this particular_ | implementation was slower.
| | I'm not totally convinced that this difference is down to | the string being parsed over and over, though | | > doing the UTF-8 decoding up-front is exacerbating the | performance difference in the allocator | | This seems to suggest that allocation might be dominating | here. WDYT? Either way, I've added a disclaimer to the | post. | alvarelle wrote: | Could also try to use the smallvec crate in this case, which puts | small allocations on the stack https://docs.rs/smallvec/ | anderskaseorg wrote: | I've found that Microsoft's mimalloc (available for Rust via | https://crates.io/crates/mimalloc) typically provides even better | performance than jemalloc. Of course, allocator performance can | vary a lot depending on workload, which is why it's good to have | options. | mqus wrote: | Is it intentional that the benchmarks include not only running | the program itself but also compiling it? e.g. in the linked | source code, the bench.sh includes the compilation step which is | known to be slow in rust:
        #!/usr/bin/env bash
        set -e
        run() {
            cargo build --release 2> /dev/null
            ./target/release/rust
        }
        run;
| | Sure, if you run it many times in succession the compiler won't | do much but the benchmarking script (run.js) doesn't really | indicate that and the blog post also doesn't mention that. | | EDIT: I was just being stupid, don't mind me. The times were | taken within each language and not externally. | chrfrasco wrote: | run.js is not doing the benchmarking. If you look at the source | for each of the programs being benchmarked, you'll see that the | programs themselves are responsible for benchmarking | matklad wrote: | Note that the Rust and Go programs differ in a seemingly | insignificant, but actually important detail.
| | Rust does this:
        next_dist = std::cmp::min(dist_if_substitute, std::cmp::min(dist_if_insert, dist_if_delete));
| Go does this:
        nextDist = min(distIfDelete, min(distIfInsert, distIfSubstitute))
| | The order of minimums is important for this dynamic programming | loop. If I change the Rust version to take minimums in the same order | (swapping substitute and delete), runtime drops from 1.878696288 | to 1.579639363. | | I haven't investigated this, but I would guess that this is the | same effect I've observed in | | * https://matklad.github.io/2017/03/12/min-of-three.html | | * https://matklad.github.io/2017/03/18/min-of-three-part-2.htm... | | (reposting my comment from reddit, as it's a rather unexpected | observation) | ishanjain28 wrote: | I think they made the rust version the same as Go because I cloned | it just now and they are both the same. Also, Thank you soo | much for the blog posts! :) | ttd wrote: | Those are two really well written articles -- taking a | complicated topic and making it very accessible. Thanks! FWIW I | think a companion article on how to effectively use perf for | tasks like this would be a great addition, since it can be a | bit novice-unfriendly. | drfuchs wrote: | Very few things have ever been measured accurately to ten | significant digits. How much did these numbers change run to | run? How many measurements did you take? Were the caches warmed | up similarly? Still, point taken. | matklad wrote: | The excessive "precision" is because I've just copy-pasted | what the original bench printed. | | As for "is this a reliable result", I believe I've performed | diligence appropriate for a HN comment to make sure that | this is not a completely random result. As I've said, I did | not investigate this particular bit of code thoroughly. You | are welcome to read the linked blog posts, which study the | issue in depth.
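[Editor's note: matklad's observation restated as compilable Rust. The function names here are mine; both orderings return the same value on integers, so the runtime difference he measured comes purely from the generated code, not the result.]

```rust
// Identical results, different association order. Inside the DP inner
// loop, the two shapes can compile to different instruction/dependency
// chains, which is what matklad's timing difference comes down to.
fn next_dist_rust_order(sub: usize, ins: usize, del: usize) -> usize {
    // the order from the post's Rust version
    std::cmp::min(sub, std::cmp::min(ins, del))
}

fn next_dist_go_order(sub: usize, ins: usize, del: usize) -> usize {
    // the order from the post's Go version (substitute and delete swapped)
    std::cmp::min(del, std::cmp::min(ins, sub))
}

fn main() {
    for &(a, b, c) in &[(1, 2, 3), (3, 2, 1), (2, 2, 2), (5, 1, 4)] {
        assert_eq!(next_dist_rust_order(a, b, c), next_dist_go_order(a, b, c));
    }
    println!("both orderings agree");
}
```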
| londons_explore wrote: | Since min(a, min(b,c)) == min(b, min(a,c)), perhaps the | compiler should be smart enough to swap the comparisons around | if it makes it quicker? | dan-robertson wrote: | I suspect that statement is not true for floats. Possibly you | don't get the same float from min(0,-0) as min(-0,0), and | similarly with NaNs. Rust specifies that if one input is NaN | then the other is returned but doesn't say what happens if | both are NaN. | gpm wrote: | Hmm, not sure what the llvm semantics look like, but you're | right about the assembly semantics, vminsd (the instruction | used in the post) is unfortunately not symmetric in its | handling of NaN. | | https://www.felixcloutier.com/x86/minsd | fluffything wrote: | > Rust specifies that if one input is NaN then the other is | returned but doesn't say what happens if both are NaN. | | It does. If both are NaN a NaN is returned. Note, however, | that when Rust says that a NaN is returned, this means that | any NaN can be returned. So if you have min(NaN0, NaN0) the | result isn't necessarily NaN0, it might be another NaN with | a different payload. | dan-robertson wrote: | Right. That's what I was getting at but I didn't know the | NaN-return rule. | londons_explore wrote: | NaNs are rarely required on the fast path of any code... | The compiler ought to make a 2nd slow implementation for when | NaNs are seen. | fluffything wrote: | Sure, but then it needs to add a branch to detect the | inputs, and maybe keep around some state to switch | between implementations at run-time. | | It's far from clear that doing that is, in general, worth | it. | [deleted] | arcticbull wrote: | Part of the problem may be they re-implemented | `std::cmp::min` at the bottom of the file; I wonder if | there's a more optimized version in the stdlib.
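[Editor's note: the NaN behavior dan-robertson and fluffything describe can be checked directly. This is a sketch of `f32::min`'s documented "return the other operand" rule; by contrast, x86 `vminsd` forwards its second source operand when either input is NaN, which is the asymmetry gpm points out.]

```rust
fn main() {
    // f32::min: if exactly one argument is NaN, the other is returned,
    // regardless of which side the NaN is on.
    assert_eq!(1.0f32.min(f32::NAN), 1.0);
    assert_eq!(f32::NAN.min(1.0), 1.0);

    // min(NaN, NaN) is some NaN; the payload is unspecified.
    assert!(f32::NAN.min(f32::NAN).is_nan());

    // Ordinary values behave as expected.
    assert_eq!((-0.5f32).min(2.0), -0.5);

    println!("f32::min NaN semantics hold");
}
```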
| gpm wrote: | One of the most frustrating parts of rust (for me) is that | `std::cmp::min` (and some other methods) require that their | arguments are `Ord` (totally ordered), and floats are only | partially ordered because of NaN, so you can't use | std::cmp::min on floats. | shepmaster wrote: | That's what https://doc.rust-lang.org/std/primitive.f32.html#method.min is for | Arnavion wrote: | You can use the ordered_float crate. It has a newtype | that cannot hold NaNs and thus impls Ord. | jfkebwjsbx wrote: | Yeah, it makes no sense for this to be the default. Code that | expects to treat NaNs is very rare. | arcticbull wrote: | Safety is the whole idea behind Rust, and if you draw the | line here, that's neither in line with the Rust ethos, | nor particularly valuable. After all, code that "expects | to handle" null was pretty rare too ;) | steveklabnik wrote: | Depending on what behavior you want, there's a bunch of | wrapper crates. | arcticbull wrote: | I do believe that's quite intentional, due to the | inexactness of floating-point values, it's not "right" to | do that. There should be a `partial_min` function though, | IMO, which has weaker guarantees. | gpm wrote: | It's absolutely intentional, and it's got being | technically correct on its side, it's just frustrating. | _____smurf_____ wrote: | Given this information, and for general parsing | functionalities, which one is faster, Go or Rust? | fluffything wrote: | You can write a parser as far as the limits of your | imagination allow in Rust, or C, or any other baremetal | language without a thick run-time. | | In Go, beyond the limits of your imagination, you'll also hit | other limits, like those of the garbage collector. | arcticbull wrote: | As always, it depends on what your goals are. String | processing is usually not the long pole when you're building | something that consumes the output of a parser.
Based on | micro and macro benchmarks, Rust is typically substantially | faster than Go and pretty much always uses less RAM [1]. | | But again, depends what you're doing with the output, and if | these deltas even matter in your context. | | [1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/... | hu3 wrote: | I'm surprised that a naive implementation in Go can outperform a | naive implementation in Rust. | empath75 wrote: | I'm not. Hell, when I first started learning rust I frequently | wrote code that ran slower than _python_. | novocaine wrote: | It may be that the system allocator is making an excessive number | of syscalls to do work, whereas most custom allocators will | allocate in slabs to avoid this. You could try using dtruss or | strace to compare the differences. | savaki wrote: | This discussion seems to me like a microcosm of the differences | in philosophies between Rust and Go. | | With Rust, you have much more control, but you also need a deep | understanding of the language to get the most out of it. With Go, | the way you think it should work is usually Good Enough(tm). | jeffdavis wrote: | I wouldn't put it that way. Both languages are fast at nice | straight-line code. | | The main area I'd expect to see performance benefits for rust | (though I don't have experience here) is larger rust programs. | Rust's zero-cost abstractions have more benefits as the | abstractions nest more deeply. For a small program, you don't | really have a lot of abstractions, so Go will do just fine. | | I think Go has a number of nice performance tricks up its | sleeve, though, so I wouldn't rule out Go on performance | grounds too quickly. | chrfrasco wrote: | Hey all, as some keen-eyed commenters have pointed out, it looks | like the rust program is not actually equivalent to the go | program. The go program parses the string once, while the rust | program parses it repeatedly inside every loop.
It's quite late | in Sydney as I write this so I'm not up for a fix right now, but | this post is probably Fake News. The perf gains from jemalloc are | real, but it's probably not the allocator's fault. I've updated | the post with this message as well. | | The one-two combo of 1) better performance on linux & 2) jemalloc | seeming to fix the issue lured me into believing that the | allocator was to blame. I'm not sure what the lesson here is - | perhaps more proof of Cunningham's law? | https://en.wikipedia.org/wiki/Ward_Cunningham#Cunningham's_L... | arcticbull wrote: | Thanks for following up. Just as an FYI, there are a few bugs in | your implementation; the most obvious one is the use of | ".len()" in a number of places interspersed with | ".chars().count()". These two return different values. ".len()" | returns the number of UTF-8 bytes in the input string, which | for ASCII is the same as ".chars().count()" obviously, but if | you do attempt any Unicode characters, your function won't | work. ".chars()" provides Unicode Scalar Values (USVs) -- which | is a subset of code points, excluding surrogate pairs [1]. Note | also this is _not_ the same as a Go rune, which is a code point | _including_ surrogate pairs. | | Secondly, you re-implemented "std::cmp::min" at the bottom of | the file, and I'm not sure if the stdlib version is more | optimized. | | Lastly, well, you caught the issue with repeated passes over | the string. | | I've fixed the issues if you're curious: | https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea... | | Unrelated, I hate the term "fake news" as it's an intentional | attempt to destroy the world public's faith in news media. It's | a cancer on civilized society. Somewhere your civics teacher is | crying into some whiskey, even though of course you're joking.
| | [1] http://www.unicode.org/glossary/#unicode_scalar_value | arcticbull wrote: | This doesn't even begin to get into the question of what | Levenshtein Distance even means in a Unicode context. What's | the Levenshtein Distance of 3 emoji flags? I suppose we | should be segmenting by grapheme clusters and utilizing a | consistent normalization form when comparing, but Rust has no | native support for processing grapheme clusters -- or for | normalizations I believe. The UnicodeSegmentation crate | might help. | | Based on some cursory research, the go version differs in a | more subtle way too. A Rune is a Code Point, which is a | superset of the Rust "char" type; it includes surrogate | pairs. | derefr wrote: | Levenshtein (edit) distance is fundamentally an information-theoretical concept defined on bitstreams, as | insertions/deletions/swaps _of individual bits_ within a | stream. It has a lot in common with error-correcting codes, | fountain codes, and compression, which all also operate on | bitstreams. | | Any higher-level abstract mention of Levenshtein distances | (e.g. of Unicode codepoints) is properly supposed to be | taken to refer to the Levenshtein distance of a conventional | (or explicitly specified) binary encoding of the two | strings. | grantwu wrote: | Can you point to a source that defines Levenshtein | distance as only referring to bitstreams? | | A translation of the original article [1] that introduced | the concept notes in a footnote that "the definitions | given below are also meaningful if the code is taken to | mean an arbitrary set of words (possibly of different | lengths) in some alphabet containing r letters (r >= 2)". | | And if you wish to strictly stick to how it was | originally defined, you'd need to only use strings of the | same length. | | More recent sources [2] say instead "over some alphabet", | and even in the first footnote, describe results for | "arbitrarily large alphabets"!
| | [1] | https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf | | [2] https://arxiv.org/pdf/1005.4033.pdf | arcticbull wrote: | And Unicode is the biggest alphabet haha. | klodolph wrote: | > Any higher-level abstract mention of Levenshtein | distances (e.g. of Unicode codepoints) is properly | supposed to be taken to refer to the Levenshtein distance | of a conventional (or explicitly specified) binary | encoding of the two strings. | | This doesn't match any definition of Levenshtein distance | that I've ever encountered. I've always seen it defined | in terms of strings over some alphabet, and the binary | case is just what happens when your alphabet only has two | symbols in it. | | Quite naturally the problem with Unicode strings is that | there are multiple ways to treat them as sequences. | One obvious way is to treat them as a sequence of Unicode | scalar values, but that's by no means what you'd want -- | maybe a sequence of grapheme clusters may be more | appropriate, and you also may wish to consider | normalization. | arcticbull wrote: | The problem is there's not a "conventional" binary | encoding of Unicode strings. You can create the same | exact character many different ways, from a single scalar | value to a composition of multiple scalar values. There | are also multiple ways of ordering different pieces | of a composite character that yield the same value. Would | we not want to utilize a consistent decomposition and | consistent logical segmentation for the string? It's no | longer enough to iterate over a string one byte at a time | and derive any meaning. Is it right that the Levenshtein | distance between 2 equal characters, "a" and "a", might | be 12, simply because the joiners were ordered | differently? | | It seems like segmentation by grapheme cluster and | comparison using a consistent normalization would provide | the same logical answer as a classic byte-wise | Levenshtein distance on an ASCII string.
[1] | | Or are you suggesting that's too high level and we should | just consider this to be operating on bit strings that | happen to be grouped into bytes, and not worry about the | logical implications. Therefore we'd just use a | consistent normalization form on both input strings, and | it's okay that the distance is up to like 10-15 for a | single character difference in a composite character and | 1 in an ASCII character. That sounds totally reasonable | too, just different. | | [1] http://unicode.org/reports/tr15/ | masklinn wrote: | > Hey all, as some keen-eyed commenters have pointed out, it | looks like the rust program is not actually equivalent to the | go program. The go program parses the string once, while the | rust program parses it repeatedly inside every loop. [...] The | perf gains from jemalloc are real, but it's probably not the | allocators fault. I've updated the post with this message as | well. | | I don't know that it would be a gain: Rust is pretty good at | decoding UTF8 quickly given how absolutely fundamental that | operation is, and "caching" the decoded data would increase | pressure on the allocator. | | Unless you also changed the interface of the levenshtein to | hand it preallocated cache, source and destination buffers (or | the caller did that). | | edit: to the downvoter, burntsushi did the legwork of actually | looking at this[0] and found caching the decoding to have no | effect at best, unless the buffers get lifted out of the | function entirely, which matches my comment's expectations. | | [0] https://news.ycombinator.com/item?id=23059753 | | > But yes, I did benchmark this, even after reusing | allocations, and I can't tell a difference. The benchmark is | fairly noisy. | [deleted] | otterley wrote: | > this post is probably Fake News | | It's not Fake News. Fake News is the publication of | intentionally false stories. This is just erroneous. | | There's a yawning chasm between the two. 
| arcticbull wrote: | "fake news" is an attempt at permanently damaging the | American public's faith in the fifth estate. Nobody should | ever use that term in the current political climate, ever. | FpUser wrote: | "damaging the _American public's_ faith" - Really, just | the American? You ever realize there are other human | species on this planet? | arcticbull wrote: | That's a fair criticism if clumsily put. For the record, | I'm Canadian, British and Polish. I called out America as | by far the biggest perpetrator at the moment. | | So to address the second half of your post, I have indeed | heard of "other countries" (just ask USCIS) although | being a US resident at the moment I want to speak to what | I know, not for anyone else. | [deleted] | will4274 wrote: | Says who? | | When news organizations take other news organizations' word | for it and the story is false, that's fake news. We called it | something different back then, but fake news led to the | invasion of Iraq. Negligence is sufficient for fake news, | malice not required. | otterley wrote: | Fake News was a term invented around 2015 during the run-up | to the 2016 election. It specifically referred to the | phenomenon of literally fake news stories, such as rallies | and completely false stories about politicians, being | published by legitimate-sounding but nonexistent news | organizations, using platforms like Facebook to disseminate | themselves. Usually these were done from China and Russia. | empath75 wrote: | And then trump redefined it to be about legitimate news | sources exclusively. We've really been in upside down | world for four years. | Ar-Curunir wrote: | It's a joke mate | arcticbull wrote: | Of course, but it chips away at the Overton window, | normalizing it. | tromp wrote: | Missed chance to shorten title to "Making Rust Go Fast" :-) | katktv wrote: | Making Rust As Fast As It Can Go | codegladiator wrote: | Making the code rust as fast as it can go | kreetx wrote: | Go Rust!
| arcticbull wrote: | GO!!!! | savaki wrote: | A few folks have commented that there were logic errors in the Go | version. Specifically that len("föö") = 5 | | should instead have returned len("föö") = 3 | | I submitted a pull request, | https://github.com/christianscott/levenshtein-distance-bench..., | that fixes these issues in the Go implementation. | | Interestingly enough, when I re-ran the benchmark, the Go version | is roughly 19% faster than it was previously:
        old: 1.747889s
        new: 1.409262s (-19.3%)
| molf wrote: | The Rust version uses `target.chars().count()` to initialise the | cache, while the Go version counts up to `len(target)`. These are | not equivalent: the Rust version counts Unicode code points, the | Go version counts bytes. | | I am confused by the implementations, although I have not spent | any time testing them. Both versions contain a mix of code that | counts bytes (`.len()` and `len(...)`) and Unicode code points | (`chars()` and `[]rune(...)`). My guess is that the | implementation might not work correctly for certain non-ASCII | strings, but I have not verified this. | | Of course, if only ASCII strings are valid as input for this | implementation then both versions will be a lot faster if they | exclusively operate on bytes instead. | steveklabnik wrote: | Yes, especially because that changes it from constant time | (strings know their length in bytes) to linear time (counting | chars means a loop through the text.) | | I was a bit suspicious of the conclusion, but didn't dig in | myself. I imagine this would be a much larger source of | difference. | eis wrote: | Yep. | | Here's a Go playground example showing that the result is indeed | wrong: | | https://play.golang.org/p/vmctMFUevPc | | It should output 3 but outputs 5 because each ö is two bytes, | len("föö") = 5. | | I would suggest using "range" to iterate over the unicode | characters. | savaki wrote: | Looks like a small bug in the go code, corrected here.
| Original author should have used rune throughout. | https://play.golang.org/p/mGZZMFtMgHH | earthboundkid wrote: | If you diff föö and f though, it correctly gives an edit | distance of 2. | | The code is weird because someone knew enough to convert the | strings to slices of runes but not enough to use the rune | slices consistently. :-/ | arcticbull wrote: | Not to mention Rune slices are insufficient for things like | Flag emoji and Family emoji, which is going to be a ton of | separate runes put together. The latter of which apparently | deletes one family member at a time when you hit | "backspace". | chrfrasco wrote: | Nice catch, thanks for pointing this out. I've updated the | cache initialization to use `len(targetChars)` rather than | `len(target)`:
        cache := make([]int, len(targetChars)+1)
        for i := 0; i < len(targetChars)+1; i++ {
            cache[i] = i
        }
| | AFAIK this makes them equivalent (fingers crossed). It seems to | not have made much of a difference (-0.03s) | Thaxll wrote: | https://blog.golang.org/strings | molf wrote: | They might be equivalent (perhaps apart from other issues | pointed out in other comments), but both implementations are | still either | | 1) incorrect if UTF-8 strings are supposed to be valid input, | or | | 2) very inefficient if only ASCII strings are supposed to be | valid input. | [deleted] | maxmcd wrote: | huh, yeah if I switch the rust version to target.len() the | execution time drops by more than 10% | | edit: and if I switch to source.bytes().enumerate() it drops by | 20% more | arcticbull wrote: | Go's version and the Rust version differ in yet more subtle | ways. It appears that Go's "rune" type is a Code Point, but | Rust's "char" type is a Unicode Scalar Value, a subset of Code | Point that excludes surrogate pairs. Both versions will not | work with complex Unicode input unless you perform both | segmentation by Grapheme Cluster [1] and utilize a consistent | Normalization [2] when comparing clusters.
| | Unicode is hard, fams, and it's rare that anything that looks | easy is actually what you want. | | [1] https://unicode.org/reports/tr29/ | | [2] http://unicode.org/reports/tr15/ | fluffything wrote: | > Both versions will not work with complex Unicode input | unless you perform both segmentation by Grapheme Cluster [1] | and utilize a consistent Normalization [2] when comparing | clusters | | Doing this is quite easy from Rust. | arcticbull wrote: | There's no standard library API in Rust for either grapheme | cluster segmentation or for normalization. You'd need a | third-party crate, at which point it's less "easy in Rust" | and more "someone did the hard work for you" because it's | really not easy anywhere haha. | fluffything wrote: | No, and there will probably never be a libstd implementation | of it, but there is a single crate that everybody uses: | unicode-segmentation [0] | | > You'd need a third-party crate, at which point it's | less "easy in Rust" and more "someone did the hard work | for you" | | "Someone did the work for you" is true of all code that | you did not write yourself, independently of whether that | code is easy or hard to write, or whether it is in the | standard library or not. | | unicode-segmentation is pretty much the only library of | its kind in Rust, is super easy to discover (google "Rust | grapheme cluster", "Rust unicode segmentation", etc.), | and using it is as easy as just typing "cargo add | unicode-segmentation". | | The library is maintained by a Rust core team member, a | Rust standard library team member, is used by servo and | firefox, and is the only unicode segmentation library | that people use. | | Since many programs don't need to do any kind of unicode | segmentation, making it part of the standard library | sounds like a bad idea. In particular, given that unicode | is a moving standard, it would mean that people stuck on | old Rust toolchains (e.g.
LTS linux distros) cannot | create binaries that do proper unicode segmentation, | which does not make sense. | | The underlying problem is that many programmers do not | know that they need to do unicode segmentation in the | first place. Moving this into the standard library does | not fix that problem either. | | [0]: https://crates.io/crates/unicode-segmentation | jfkebwjsbx wrote: | > Since many programs don't need to do any kind of | unicode segmentation, making it part of the standard | library sounds like a bad idea. In particular, given that | unicode is a moving standard, it would mean that people | stuck on old Rust toolchains (e.g. LTS linux distros) | cannot create binaries that do proper unicode | segmentation, which does not make sense. | | That has nothing to do with it. You could still have a | library that has the very latest Unicode standard support | for those that need the very latest, and keep updating | the stdlib one. | | It does not make sense either to expect someone to use | bleeding edge libraries from cargo yet use an old rustc | compiler. They can easily update it if needed. | fluffything wrote: | > It does not make sense either to expect someone to use | bleeding edge libraries from cargo yet use an old rustc | compiler. | | Of course it does. Many software users are stuck on | multiple-year-old toolchains for various reasons, yet | these systems still need to be able to handle unicode | properly. | | > They can easily update it if needed. | | No, they cannot. Many users are stuck on older windows | versions, linux versions, LTS linux versions, etc. | because of their organization or their clients' | requirements. | | Telling a client that you can't develop an app for them | because their 2-year-old Ubuntu is too old is often not a | very successful business model. | | > and keep updating the stdlib one. | | These updates would only apply to newer Rust toolchains, | which many users cannot use.
Unless you are suggesting the | release of patch versions for the soon to be 100 old Rust | toolchains in existence every time the unicode standard | is updated. | | This is too much trouble and work for little gain, given | that one can still use a Rust 1.0 compiler to compile the | latest version of the unicode-segmentation crate without | problems. | arcticbull wrote: | IMO it's not as clear-cut as you make it out to be. It's | a pretty arbitrary line to exclude full Unicode support | from the standard library. There's a ton of stuff in | libstd that could be supported as third-party crates. I | don't disagree with what the Rust team has done, and I | think there could be a world in which the compiler team | also releases first-party crates with "enhanced" | functionality beyond just libstd. I consider proper | Unicode support to be a "first party" thing, but I also | don't think it has to be in libstd per se, necessarily. | arcticbull wrote: | There's a bunch of issues with the Rust implementation, not least | that in the initial condition, where source or target lengths are | zero, it returns the number of UTF-8 bytes of the other, but all | other computations are performed in terms of Unicode chars -- | except at the end: `cache[target.len()]` which will return the | wrong value if any non-ASCII characters precede it. | | Further, each time you call `.chars().count()` the entire string | is re-enumerated at Unicode character boundaries, which is O(n) | and hardly cheap, hence wrapping it in an iterator over a char | view. | | Also, re-implementing std::cmp::min at the bottom there may well | lead to a missed optimization. | | Anyways, I cleaned it up here in case the author is curious: | https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea... | submeta wrote: | Impressed to see four posts about Rust on the front page of HN | simultaneously. ___________________________________________________________________ (page generated 2020-05-03 23:00 UTC)