[HN Gopher] Making Rust as Fast as Go
       ___________________________________________________________________
        
       Making Rust as Fast as Go
        
       Author : chrfrasco
       Score  : 252 points
       Date   : 2020-05-03 10:32 UTC (12 hours ago)
        
 (HTM) web link (www.christianfscott.com)
 (TXT) w3m dump (www.christianfscott.com)
        
       | virtualritz wrote:
       | I tried this on a spare time project[1]. Runtime in a quick test
       | went down from 14.5 to 12.2 secs on macOS!
       | 
       | So a solid ~15% by changing the allocator to jemalloc.
       | 
       | However, I now have a segfault w/o a stack trace when the data
       | gets written at the end of the process.
       | 
       | Possibly something fishy in some `unsafe{}` code of a dependent
       | crate of mine that the different allocator exposed. :]
       | 
       | Still - no stack trace at all is very strange in Rust when one
       | runs a debug build with RUST_BACKTRACE=full.
       | 
        | [1] https://github.com/virtualritz/rust-diffusion-limited-aggreg...
        
         | saagarjha wrote:
         | I have found that jemallocator is currently broken on macOS
         | Catalina, so that might be the problem. If you can reproduce
          | this issue reliably, I'd love to hear about it, because I
          | can't reproduce it myself unless I use a very specific
          | toolchain that produces -O3 binaries that are a real pain to
          | work with.
        
       | pcr910303 wrote:
       | TLDR for people who didn't read:
       | 
       | The speed difference came from the allocator.
       | 
       | Rust switched from jemalloc to the system allocator per ticket
       | #36963[0] for various reasons (like binary bloat, valgrind
       | incompatibility, etc...).
       | 
       | Go uses a custom allocator[1] instead.
       | 
        | To make 'Rust Go fast' (pun intended), one can use the
        | '#[global_allocator]' attribute to plug in a custom allocator
        | (in this case, from the jemallocator crate) and make
        | allocations fast again.
       | 
       | [0]: https://github.com/rust-lang/rust/issues/36963
       | 
       | [1]: https://golang.org/src/runtime/malloc.go
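        | 
        | For reference, the switch itself is tiny. A minimal sketch
        | using the jemallocator crate:
        | 
        |     use jemallocator::Jemalloc;
        | 
        |     // Route all heap allocations in this binary through jemalloc.
        |     #[global_allocator]
        |     static GLOBAL: Jemalloc = Jemalloc;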
        
         | k__ wrote:
         | The comments of Rust programmers here also suggest that the
         | Rust implementation is, indeed, different from the Go
         | implementation.
        
           | pcr910303 wrote:
           | It was just a summary of the post contents - the post
           | suggests that the biggest difference comes from the
           | allocator.
        
       | loeg wrote:
       | FreeBSD's system allocator _is_ jemalloc :-).
        
       | ishanjain28 wrote:
       | I tried to benchmark Go/Rust versions as well.
       | 
        | I made 4 changes in the Rust version.
        | 
        | 1. Moved up the line that gets a value from cache[j+1] before any
        | calls are made to cache[j]. This removes 1 bounds check.
        | (Improvement from 182,747 ns down to 176,xyz ns, ± 4,800)
       | 
       | 2. Moved from .chars().enumerate() to .as_bytes() and manually
       | tracking current position with i/j variables. (Improvement from
       | 176,xyz ns down to 140,xyz ns)
       | 
        | 3. Moved to the standard benchmark suite from main + a hand-
        | rolled benchmark system. (File read + load + parse into lines
        | was kept out of the benchmark.)
        | 
        | 4. Replaced the hand-rolled min with std::cmp::min. (Improvement
        | from 140,xyz down to 139,xyz, but the std deviation was about
        | the same, so it could just be a fluke. Don't know.)
       | 
        | In the Go version, I made three changes.
        | 
        | 1. Doing the same thing from #1 in Rust actually increased the
        | runtime from 190,xyz to 232,xyz, and quite consistently too (I
        | ran it 10+ times to confirm).
       | 
        | 2. Replaced []rune(source), []rune(target) with []byte(source),
        | []byte(target). (Improvement from 214,817 ns to 190,152 ns)
        | 
        | 3. Replaced the hand-rolled benchmark system with a proper
        | benchmark system in Go. (Again, file read + load + parse into
        | lines was kept out of the benchmark.)
        | 
        | So, at the end of it, the Rust version was about 50k ns faster
        | than the Go version.
       | 
       | Edit #1:
       | 
        | In the rust version, I had also replaced the cache
        | initialization with (0..=target.len()).collect() before doing
        | anything else. This also gave a good perf boost, but I forgot
        | to note down the exact value.
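        | 
        | Roughly, change #2 looks like this (a sketch, not my exact
        | benchmark code; it assumes ASCII-only input):
        | 
        |     // Before: decode UTF-8 on every iteration
        |     for (i, source_char) in source.chars().enumerate() {
        |         // ...
        |     }
        | 
        |     // After: index the raw bytes directly
        |     let source = source.as_bytes();
        |     let mut i = 0;
        |     while i < source.len() {
        |         let source_char = source[i];
        |         // ...
        |         i += 1;
        |     }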
        
         | burntsushi wrote:
         | Note that using bytes is a fundamentally different
         | implementation that will produce different results on non-ASCII
         | input. Using codepoints (or "runes") will better approximate
         | edit distance based on visual characters. (And grapheme
         | clusters would be even better. Although one could put the text
         | in composed normal form to get more mileage out of the rune
         | based algorithm.)
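          | 
          | A quick illustration of how bytes and codepoints diverge on
          | non-ASCII input:
          | 
          |     fn main() {
          |         let s = "héllo";                  // 'é' is 2 bytes in UTF-8
          |         assert_eq!(s.len(), 6);           // UTF-8 bytes
          |         assert_eq!(s.chars().count(), 5); // Unicode scalar values
          |     }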
        
         | blablabla123 wrote:
          | I'd be really surprised to hear that Go is supposed to be
          | faster than Rust. Of course Rust is a bit newer, but to me it
          | always sounded like Go is fast because it's static, yet it
          | doesn't have to be high-speed if that would sacrifice
          | conciseness. Given that this is an artificial example, this
          | here looks more realistic:
          | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
           | arcticbull wrote:
            | Rust and Go are contemporaries. Rust started in 2006 at
            | Mozilla, and the first public Go release from Google was in
            | 2009, meaning it probably started around the same time.
            | 
            | Except of course all the Plan 9 garbage (like Go's hand-
            | rolled assembler) from the 80s that was brought in to
            | underpin Go ;)
        
             | gpm wrote:
              | Rust is also basically just LLVM in terms of optimization,
              | which looks like it had its initial release in 2003.
        
       | fortran77 wrote:
       | Related thread: https://news.ycombinator.com/item?id=21103027
        
       | codeflo wrote:
       | I recently did some experiments with creating small static Rust
       | binaries, custom linking, no_std et cetera. A lot of stuff around
       | that kind of thing is unstable or unfinished, which might be
       | somewhat expected. But I've also come to the conclusion that Rust
       | relies on libc _way_ too much. That might be fine on Linux, where
       | GNU's libc is well-maintained, is a bit questionable on MacOS (as
        | seen in this article), and is a complete distribution nightmare
       | on Windows (in no small part due to a series of very questionable
       | decisions by Microsoft).
       | 
       | My understanding is that Go doesn't use the libc at all and makes
       | system calls directly, which IMO is the correct decision in a
       | modern systems programming language that doesn't want to be
       | limited by 40 years of cruft.
        
         | fluffything wrote:
         | > But I've also come to the conclusion that Rust relies on libc
         | way too much.
         | 
         | How did you come to this conclusion?
         | 
         | People using Rust rely on libc a lot.
         | 
         | For example, #![no_std] means "no standard-library", but it
         | doesn't mean "no libc, no libunwind, etc.".
         | 
         | So a lot of people like to advertise their crates as
         | "#![no_std]" compatible, because they compile with #![no_std],
         | but then the first thing the crate does is linking against
         | libc, against liballoc, against libm, .... or even the standard
         | library itself...
         | 
         | So... if you are trying to build a binary that does not depend
         | on libc or libstd, then there is no language feature to help
         | you there.
         | 
          | #![no_std] binaries are not only unstable, but also probably
          | not what you want, since that won't prevent you from linking
          | any library that links libc, or the standard library, etc.
         | 
         | If you want to avoid libc, you have to enforce that, e.g., by
         | checking in your build system that your binary doesn't contain
         | any libc, libstd, etc. symbols (I just use nm for this).
         | #![no_std] helps a bit here, but making sure you don't link any
         | library that violates this is up to you.
        
           | codeflo wrote:
            | That's true. But going the no_std route is very hard (the
            | ecosystem isn't big, and relying on cargo and crates.io
            | you're almost guaranteed to link in the std or liballoc by
            | accident at some point). Even when using the std intentionally, I
           | really wouldn't have expected that basic std functions like
           | println require the libc.
        
             | geofft wrote:
             | I would not necessarily expect it but I appreciate it in a
             | "systems" language where "systems" is defined as compatible
             | with the existing traditional systems software on a
             | machine. For example, it's nice if the stdbuf(1) command
              | works on Rust binaries on glibc systems, and it's nice if a
              | Rust binary calling a C library that writes with printf (or
              | vice versa) doesn't maintain two separate output buffers.
             | 
             | To me, Go is the systems programming language for a world
             | untethered by existing platform compatibility (and so, for
             | instance, writing Go libraries to be called from C is
             | awkward, calling C libraries from Go incurs overhead, Go
             | has its own concurrency model, etc.) and Rust is the
             | systems programming language for use cases where you'd
             | otherwise want to use C (really "the platform's native
             | systems language," but that's C on all the major platforms)
             | but you want a better language. I appreciate that they both
             | exist and target these different use cases.
        
               | BuckRogers wrote:
               | Is Go widely used compared to the others as a systems
               | language at this point?
               | 
               | Over the years I've gathered that it's more of a
               | competitor for C# & Java rather than Rust & C(++).
        
               | geofft wrote:
               | Depends what you mean by "systems language" (many common
               | definitions turn out to be equivalent to "drop-in
               | replacement for the platform language," which makes the
               | argument circular), but the choice of Go as an
               | implementation language for Docker and Kubernetes puts it
               | pretty firmly in the "systems language" space IMO.
               | Container management requires more systems-level
               | functionality than C# and Java are really geared towards
               | (although you certainly _could_ use them, perhaps with a
               | tablespoon of FFI).
        
               | apta wrote:
               | The original version of Docker was written in Python, and
               | the original version of Kubernetes was written in Java,
               | so your argument doesn't hold.
        
               | geofft wrote:
               | Would this be the part of my argument that doesn't
               | mention Python (which I personally consider a systems
                | language) at all? Or the part of my argument where I say,
                | "you certainly _could_ use them"?
        
             | GlitchMr wrote:
             | Well, writing to standard output requires interacting with
             | an operating system, so it's not surprising that it
             | requires `std`. You cannot write to standard output without
             | knowing what operating system you are compiling for.
             | 
              | Worth noting however that `writeln` only needs `core`, and
              | as long as you can provide an implementation of writing to
              | standard output (pretty much create a struct and implement
              | the `core::fmt::Write` trait for it), creating a `println`
              | macro is trivial.
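              | 
              | A minimal sketch of that idea (the `Stdout` struct and the
              | body of `write_str` are placeholders for whatever your
              | platform provides):
              | 
              |     use core::fmt::{self, Write};
              | 
              |     struct Stdout;
              | 
              |     impl Write for Stdout {
              |         fn write_str(&mut self, s: &str) -> fmt::Result {
              |             // push the bytes of `s` to your platform's output
              |             let _ = s;
              |             Ok(())
              |         }
              |     }
              | 
              |     macro_rules! println {
              |         ($($arg:tt)*) => {
              |             writeln!(Stdout, $($arg)*).unwrap()
              |         };
              |     }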
        
               | steveklabnik wrote:
                | An example of this:
                | https://os.phil-opp.com/vga-text-mode/#a-println-macro
        
           | geofft wrote:
           | To me, a #![no_std] library is useful for contexts where
           | there isn't a platform libc (embedded systems, kernels, etc.)
           | and you can't assume things like the existence of stdin or
           | processes. On those systems, there may still be a dynamic
           | allocator, because that's a super common feature in all but
           | the most spartan environments. For those use cases, a
           | #![no_std] library that links liballoc (and documents that it
           | needs it) is totally okay, and often better than a more-
           | limited library that only does static allocation - or
           | conversely implementing all of libstd with runtime panics for
           | most of it, just because your library needs to pass a Vec
           | around. It's a balance.
           | 
           | The only case I think I've seen where a #![no_std] library
           | ends up pulling in libc is if you haven't added a custom
           | allocator and your platform's default allocator uses libc
           | (and so you could switch to a libc-free allocator if you
           | want). Are there other cases?
        
             | codeflo wrote:
             | There are lots of tiny practical annoyances you notice,
             | starting with having to vet dependencies and transitive
             | dependency upgrades extremely carefully. A major one is
             | that dev-dependencies are very broken in a no_std
             | environment. It's obviously useful to have the std in your
              | build.rs script, but unfortunately, Cargo merges
              | dependencies and dev-dependencies in a way that's simply
              | incorrect.
             | 
             | The good news is that every problem I found had an actively
             | discussed GitHub issue and the community is active, so
             | there will be progress.
        
               | geofft wrote:
                | > _unfortunately, Cargo merges dependencies and
                | dev-dependencies in a way that's simply incorrect._
               | 
               | Yeah, I've run into that, but IIRC this got solved super
               | recently/is being solved. Probably
               | https://github.com/rust-lang/cargo/pull/7820 ?
        
             | fluffything wrote:
             | > To me, a #![no_std] library is useful for contexts where
             | there isn't a platform libc (embedded systems, kernels,
             | etc.) and you can't assume things like the existence of
             | stdin or processes.
             | 
             | "Could be useful for" (FTFY).
             | 
             | Unfortunately, for the reasons mentioned above, most
             | #![no_std] libraries aren't useful for that, because they
             | link libc and other libraries.
             | 
             | Most people don't do this intentionally. The compiler
              | doesn't complain about:
              | 
              |     #![no_std]
              |     extern "C" { fn malloc(...) -> ...; }
             | 
             | so when somebody does it accidentally, everything works and
             | they get no feedback.
             | 
              | When you then go and build a #![no_std] binary and run it
              | on a platform where libc is not available, only then do
              | you get an error. And at that point, good luck figuring
              | out which of your 200 dependencies has the error.
             | 
              | In particular, if you are running `#[test]`s, doing that
              | links the standard library, so if your program implicitly
              | depends on libc somewhere, tests won't reveal that,
              | because while testing, libc will be linked.
        
               | geofft wrote:
               | No, most #![no_std] libraries do not link libc in the way
               | you describe (having a direct FFI dependency on malloc) -
               | they link liballoc, which has a pluggable allocator
               | interface. On some platforms and some configurations
               | (including most normal userspace platforms), the default
               | allocator used by liballoc uses malloc.
               | 
               | It's actually hard to go out of your way and call malloc
               | directly, because FFI calls are unsafe. It's a lot easier
               | to use Box/Vec/String/etc., all of which are defined in
               | liballoc and use the pluggable allocator interface.
               | 
               | I know this because I've successfully used #![no_std]
               | libraries in places where libc doesn't exist and no
               | function named malloc exists, and they do work. If you're
               | having a linker issue it's almost certainly because you
               | haven't changed the default allocator - if you have an
               | example of this I'd be happy to take a look at debugging
               | it.
        
               | fluffything wrote:
               | > and they do work.
               | 
                | Maybe you are just using different no_std libraries than
                | I am using, but pretty much all of the no_std libraries
                | that I use have `libc` as a direct dependency.
                | 
                | Not only for malloc; many of them just need to write
                | something to stdout or a file, generate random numbers,
                | ..., and that's hard to do without libc. Why they
                | advertise themselves as no_std escapes my comprehension.
        
               | geofft wrote:
                | Huh, the libc crate _itself_ claims #![no_std] support. I
                | agree that's confusing and counterintuitive... I see
                | what it means in terms of semantics but I don't
                | understand _why_ that's useful.
        
               | fluffything wrote:
               | The libc crate does correctly claim #![no_std] support,
               | because #![no_std] means "Does not require the Rust
               | standard library", and libc does not require it.
               | 
               | The two main issues I see with #![no_std] are:
               | 
                | * it's a flag for "doesn't link the standard library", but
                | the standard library is often too high-level for bare-
                | metal apps that want to be in precise control of
                | everything that gets linked
               | 
                | * it isn't a contract of any kind, so you can't rely on
                | it for anything. In fact, this code is correct and works
                | as intended:
                | 
                |     #![no_std]
                |     extern crate std;
               | 
               | This is problematic, because many #![no_std] libraries
               | end up linking libstd "by accident", even though that's
               | probably not their intent.
               | 
               | So I agree with you that #![no_std] isn't a silver
               | bullet. I think it is still useful in that it lets you
                | avoid linking the standard library by default, which is a
                | necessary condition for embedded development. It is not a
               | sufficient condition, in that in practice you actually
               | want to forbid the standard library and other libraries
               | from being linked, and not just "don't link them by
               | default, but do so by accident".
        
         | fortran77 wrote:
          | This is simply false. Windows APIs are stable. It's one of the
          | best platforms for backward compatibility. The kernel32/user32
          | API has been stable for 20 years. Rust has some work to do.
        
           | codeflo wrote:
            | That's exactly my point: The problem is that Rust _doesn't_
            | use the stable kernel32/user32 functions (VirtualAlloc
            | etc.), but the libc ones, which don't have a stable location
            | on Windows. Without looking it up: which DLL do you have to
            | redistribute this week to get "malloc"?
        
             | fortran77 wrote:
             | I just tell Visual Studio to create an installer, I select
             | the set of target platforms, and it just works. See, for
              | example:
              | https://docs.microsoft.com/en-us/cpp/ide/walkthrough-deployi...
        
         | ekidd wrote:
         | As far as I know, the official system interface on Windows and
         | several Unix systems is via the standard library, not via
          | direct syscalls. I don't know about macOS. But in general,
         | you may be required to dynamically link the standard library on
         | many platforms.
         | 
         | Linux guarantees syscalls are stable. And on Linux, you have
         | the option of telling Rust to cross-compile using a statically-
         | linked musl-libc. (If you also need to statically link OpenSSL
         | or a few other common libraries, I maintain
         | https://github.com/emk/rust-musl-builder, and there's at least
         | one similar image out there.)
        
           | pansa2 wrote:
           | AFAIK on Windows, the hierarchy is:
           | 
           | C library => kernel32.dll => ntdll.dll => system calls
           | 
           | You don't have to go via the C library - calling kernel32
           | directly is fine (I believe this is what Go does). However,
           | it's very rare to call ntdll or to make system calls
           | directly.
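            | 
            | As an illustration, calling kernel32 directly from Rust is
            | just an FFI declaration away (a minimal sketch; kernel32
            | exports GetCurrentProcessId):
            | 
            |     // Declare the import ourselves instead of going via libc.
            |     #[link(name = "kernel32")]
            |     extern "system" {
            |         fn GetCurrentProcessId() -> u32;
            |     }
            | 
            |     fn main() {
            |         let pid = unsafe { GetCurrentProcessId() };
            |         println!("pid = {}", pid);
            |     }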
        
             | ludamad wrote:
             | I'm guessing the performance benefits are negligible
             | anyway?
        
             | codeflo wrote:
             | Basically yes. ntdll is an implementation detail that
             | shouldn't be relied upon. kernel32/user32 and friends are
             | considered the "proper" interface to the system and have
             | been stable for decades.
        
               | vardump wrote:
               | There are some ntdll calls that are officially documented
               | and ok to use. Of course there are also a lot of calls
               | you shouldn't use.
               | 
               | When necessary, it's fine to use even undocumented ones
               | to support Windows 7 and older. It's not like those are
               | going to change anymore.
        
             | ChrisSD wrote:
             | To expand on that, on Windows it's best to use kernel32 (or
             | WindowsApp) unless you really need the cross-platform
              | convenience of the C lib. The exception is architecture-
              | dependent functions like memcmp, memcpy and memset, which
              | will most likely have efficient assembly implementations.
             | 
             | ntdll is lower level and technically unstable but core
             | functions have been pretty stable for a long time. They
             | could of course have breaking changes but it risks breaking
             | things like cygwin. Microsoft tends to take compatibility
             | seriously, although perhaps not as much as they used to.
             | 
             | Direct system calls are completely unstable. They can and
             | do change a lot between releases. You'd need to use a
             | lookup table for every build of Windows.
        
             | vardump wrote:
             | > it's very rare to call ntdll or to make system calls
             | directly
             | 
             | Until you need to do something that's not possible through
             | kernel32.dll. Sometimes I've called ntdll.dll directly to
             | support older Windows versions.
        
           | majewsky wrote:
           | macOS requires you to make syscalls through libSystem if you
           | want a stable interface. Go binaries used to make direct
           | syscalls until 1.10. Since this caused major breakage on most
           | new macOS releases, they have since switched to using
           | libSystem as well in 1.11:
           | 
           | > On macOS and iOS, the runtime now uses libSystem.dylib
           | instead of calling the kernel directly. This should make Go
           | binaries more compatible with future versions of macOS and
           | iOS. The syscall package still makes direct system calls;
           | fixing this is planned for a future release.
           | 
           | Source: https://golang.org/doc/go1.11
        
           | masklinn wrote:
           | > I don't know about the MacOS.
           | 
           | MacOS literally forbids statically linking to libSystem.
           | 
           | Go finally had to bow down and accept that they just could
           | not perform raw syscalls on MacOS after gettimeofday (IIRC)
           | changed ABI multiple times during the Sierra beta.
        
           | codeflo wrote:
           | That's partly true, but the stable interface on Windows is
           | not the libc, but the kernel32/user32 functions like
           | VirtualAlloc, CreateFileW etc. Those are stable since the NT
           | days. The libc functions like malloc and fopen are a
           | compatibility layer above that and unfortunately switch
           | places every few years. Currently they are delivered by a
           | combination of a Windows update and a redistributable
           | package, which makes it a nightmare to ship (on pre-Windows
           | 10 even more so).
        
             | jfkebwjsbx wrote:
              | Do you really need to ship them? I thought libc (the CRT?)
              | in Windows was a given, and what used to be redistributed
              | was only the C++ ones. Isn't that the case?
        
         | steveklabnik wrote:
         | The problem with this is that not every system makes their
         | system call ABI stable. You have two choices here: use the
         | interface that is stable (which is libc), or track changes and
         | fix things up when they break.
        
           | codeflo wrote:
            | The only stable interfaces on Windows are the documented
            | functions from kernel32.dll, user32.dll etc. Libc is a
            | compatibility layer above that, for which Microsoft invents
            | a new incompatible distribution mechanism every 3-5 years.
            | It's pure DLL hell, unfortunately.
           | 
           | Edit: Not even the DLL name of Microsoft's libc is stable
           | (msvcrt140.dll etc.), leading to all kinds of wild goose
           | chases when trying to run old binaries.
        
             | steveklabnik wrote:
             | Yes, I was being imprecise wrt Windows, thanks :)
        
               | codeflo wrote:
               | Yeah, I saw your username too late or I would have known
               | that you know that. ;) But it's a common misunderstanding
               | among many Unix programmers, so I feel it was good to
               | clarify.
        
               | steveklabnik wrote:
               | Nah I think it's helpful! Comments are read by a lot more
               | folks than just the person you're replying to. I knew it
               | wasn't exactly libc but I didn't know the full hierarchy
               | involved.
        
         | weff_ wrote:
         | I think Go tries to reduce its dependence on libc but, by
         | default, it will still link to it.
         | 
        | For instance, this code:
        | 
        |     package main
        | 
        |     import "net"
        | 
        |     func main() {
        |         net.Dial("tcp", "golang.org:80")
        |     }
        | 
        | when compiled with _go build main.go_, links:
        | 
        |     linux-vdso.so.1 (0x00007ffe3d7f0000)
        |     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc7ac05a000)
        |     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc7abc69000)
        |     /lib64/ld-linux-x86-64.so.2 (0x00007fc7ac279000)
        | 
        | There are of course compiler options to truly statically
        | compile.
        
         | rhinoceraptor wrote:
         | Go's pathological NIH syndrome does come with downsides. For
         | example, there was an infamous memory corruption bug in the way
         | they called into vDSO.
        
           | adev_ wrote:
           | > Go's pathological NIH syndrome does come with downsides.
           | 
            | Yes. And you can add the inability to use glibc's NSS
            | modules under Linux.
            | 
            | This makes it unable to use sssd properly and authenticate a
            | POSIX user on a machine with LDAP authentication.
            | 
            | Getting completely independent from the OS system libraries
            | has consequences.
        
       | maoeurk wrote:
       | Assuming this was run on a 64bit system, the Rust version seems
       | to be allocating and zeroing twice as much memory as the Go
       | version.
       | 
       | edit: this has been pointed out as incorrect, Go ints are 8 bytes
       | on 64bit systems -- thanks for the correction!
        |     let mut cache: Vec<usize> =
        |         (0..=target.chars().count()).collect();
        | 
        | which can be simplified as
        | 
        |     let mut cache: Vec<usize> = (0..=target.len()).collect();
        | 
        | vs
        | 
        |     cache := make([]int, len(target)+1)
        |     for i := 0; i < len(target)+1; i++ {
        |         cache[i] = i
        |     }
       | 
       | Rust usize being 8 bytes and Go int being 4 bytes as I understand
       | it.
       | 
       | So between doing more work and worse cache usage, it wouldn't be
       | surprising if the Rust version was slower even with the faster
       | allocator.
        
         | rossmohax wrote:
         | Go int is 8 bytes
        
           | eis wrote:
           | It can be either depending on the system.
           | 
           | https://golang.org/ref/spec#Numeric_types
        
         | [deleted]
        
       | devit wrote:
       | The main problem is that allocating a vector for each evaluation
       | is completely wrong: instead, it needs to be allocated once by
       | making the function a method on a struct containing a Vec; which
       | makes the allocator moot.
       | 
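        | A sketch of that first fix (a hypothetical Levenshtein struct;
        | the loop is a standard one-row formulation, not the article's
        | exact code):
        | 
        |     struct Levenshtein {
        |         cache: Vec<usize>,
        |     }
        | 
        |     impl Levenshtein {
        |         fn distance(&mut self, source: &str, target: &str) -> usize {
        |             // Reuse the same allocation across calls.
        |             self.cache.clear();
        |             self.cache.extend(0..=target.chars().count());
        |             for (i, source_char) in source.chars().enumerate() {
        |                 let mut prev = i; // the old cache[0] (diagonal)
        |                 self.cache[0] = i + 1;
        |                 for (j, target_char) in target.chars().enumerate() {
        |                     let substitute = prev + (source_char != target_char) as usize;
        |                     let insert = self.cache[j] + 1;
        |                     let delete = self.cache[j + 1] + 1;
        |                     prev = self.cache[j + 1];
        |                     self.cache[j + 1] = substitute.min(insert).min(delete);
        |                 }
        |             }
        |             *self.cache.last().unwrap()
        |         }
        |     }
        | 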
       | The second problem is that at least the Rust code is decoding
       | UTF-8 every iteration of the inner loop instead of decoding once
       | and saving the result, or even better interning the characters
       | and having versions of the inner loop for 32-bit chars and 8-bit
       | and 16-bit interned indexes.
       | 
        | Furthermore, the code rereads cache[j] instead of storing the
        | previous value, and doesn't do anything to make sure that bounds
        | checks are elided in the inner loop (although perhaps the
        | compiler can optimize that).
       | 
        | The code for computing the min seems to have been written
        | mindlessly rather than putting serious thought towards whether to
        | have branches or not and in what order (depending on an analysis
        | of what the branch direction rates would be).
       | 
       | Implausible benchmark results are almost always an indicator of
       | the incompetence of the person performing the benchmark.
        
         | burntsushi wrote:
         | > The second problem is that at least the Rust code is decoding
         | UTF-8 every iteration of the inner loop instead of decoding
         | once and saving the result
         | 
         | Indeed. This is a pretty damning difference. The `target`
         | string is being repeatedly UTF-8 decoded where as the same is
         | not true in the Go version. The Go version even _goes out of
         | its way_ to do UTF-8 decoding exactly once for each of `source`
          | and `target`, but then doesn't do the same for the Rust
         | program.
         | 
         | > Implausible benchmark results are almost always an indicator
         | of the incompetence of the person performing the benchmark.
         | 
         | Come on. We can do better than this. Please don't make it
         | personal. We all have to learn things at some point.
        
           | masklinn wrote:
           | > Indeed. This is a pretty damning difference. The `target`
           | string is being repeatedly UTF-8 decoded where as the same is
           | not true in the Go version. The Go version even goes out of
           | its way to do UTF-8 decoding exactly once for each of
           | `source` and `target`, but then doesn't do the same for the
           | Rust program.
           | 
           | I'm really not sure that's an issue, utf8 decoding is very,
           | very cheap and it's iterating either way.
           | 
           | It would have to be benched, but I wouldn't be surprised if
           | allocating the caches (at least one allocation per line of
           | input) had way more overhead, especially given the inputs are
           | so very short.
           | 
           | I'm not going to claim Rust's utf8 decoder is the fastest
           | around, but it's _very_ fast.
        
             | burntsushi wrote:
             | It's cheap, but not _that_ cheap. It shouldn't be as cheap
             | as just iterating over a sequence of 32-bit integers.
             | 
             | But yes, I did benchmark this, even after reusing
             | allocations, and I can't tell a difference. The benchmark
             | is fairly noisy.
             | 
             | I agree with your conclusion, especially after looking at
             | the input[1]. The strings are so small that the overhead of
             | caching the UTF-8 decoding is probably comparable to the
             | cost of doing UTF-8 decoding.
             | 
              | [1] - https://github.com/christianscott/levenshtein-distance-bench...
        
               | matklad wrote:
               | > It shouldn't be as cheap as just iterating over a
               | sequence of 32-bit integers.
               | 
               | I wonder if there are any benchmarks about this?
               | Specifically, it feels like in theory iterating utf8
               | _could_ actually be faster if the data is mostly ascii,
               | as that would require less memory bandwidth, and it seems
               | like the computation is simple enough for memory to be
               | the bottleneck (this is a wild guess, I have horrible
               | intuition about speed of various hardware things). In
               | _this_ particular benchmark this reasoning doesn't apply,
               | as strings are short and should just fit in cache.
        
               | matklad wrote:
                | Did a quick benchmark:
                | https://gist.github.com/matklad/ebc1acd2fab884f3ff9d96d4175f....
               | 
                | Seems like the hypothesis is not wrong, but also is not
               | interesting: the difference is pretty small, and it can
               | be easily dwarfed by the big wins for utf32 if something
               | like auto-vectorization or bounds-check elision kicks in.
               | Also I am not entirely sure that the diff I observe is
               | not due to something completely irrelevant, like one loop
               | ending up on a lucky cacheline or what not.
        
               | burntsushi wrote:
               | If all you need to do is validate UTF-8, then yes, mostly
               | ASCII enables some nice fast paths[1].
               | 
               | I'm not a UTF-8 decoding specialist, but if you need to
               | traverse rune-by-rune via an API as general as
               | `str::chars`, then you need to do some kind of work to
               | convert your bytes into runes. Usually this involves some
               | kind of branching.
               | 
               | But no, I haven't benchmarked it. Just intuition. A
               | better researched response to your comment would
               | benchmark, and would probably at least do some research
               | on whether Daniel Lemire's work[2] would be applicable.
               | (Or, in general, whether SIMD could be used to batch the
               | UTF-8 decoding process.)
               | 
                | [1] - https://github.com/BurntSushi/bstr/blob/91edb3fb3e1ef347b30e...
               | 
                | [2] - https://lemire.me/blog/2018/05/16/validating-utf-8-strings-u...
        
         | molf wrote:
         | > Implausible benchmark results are almost always an indicator
         | of the incompetence of the person performing the benchmark.
         | 
         | An ad hominem attack surely isn't needed.
        
           | masklinn wrote:
           | And that's really too bad because I think the rest of the
           | comment is spot-on as the fundamental issue is that Rust
           | simply delegates to the system allocator, and MacOS's
           | allocator is pretty slow.
           | 
            | As a result, while Rust allows removing allocations very
            | explicitly and relatively easily (compared to C or C++),
            | getting the most performance out of your program also
            | _requires_ doing so, unless you use a non-system allocator
            | with better support for the workload.
        
             | heavenlyblue wrote:
             | But you can use a custom allocator in Rust:
             | https://doc.rust-lang.org/1.9.0/book/custom-allocators.html
             | 
             | You don't have that option in Go
        
               | masklinn wrote:
                | > But you can use a custom allocator in Rust:
                | > https://doc.rust-lang.org/1.9.0/book/custom-allocators.html
               | 
               | That is true -- and was demonstrated by the article as
               | its "fix" was to use jemalloc, but even a custom
               | allocator will usually be less performant than a high-
               | performance GC there, because the GC's allocator has more
               | insight into the requirements and workloads.
               | 
               | It might be possible to give that insight to a custom
               | allocator by using the allocator's custom APIs, but this
               | requires a deeper integration between the program and the
               | allocator.
        
               | yencabulator wrote:
               | > You don't have that option in Go
               | 
               | Sure you do. You can even build one from nothing but mmap
               | in pure Go. It just won't be part of the garbage
               | collection, so you get malloc/free or arenas etc, just
               | like in C/Rust/whatever.
        
               | steveklabnik wrote:
                | (those are very old docs, you want to link to
                | https://doc.rust-lang.org/stable/std/alloc/trait.GlobalAlloc...)
        
         | chrfrasco wrote:
         | > The second problem is that at least the Rust code is decoding
         | UTF-8 every iteration of the inner loop instead of decoding
         | once and saving the result, or even better interning the
         | characters and having versions of the inner loop for 32-bit
         | chars and 8-bit and 16-bit interned indexes.
         | 
         | I tried this. Pulling the .chars() call out of the loop &
         | collecting into a Vec made the performance even worse - the
         | following balloons the runtime from ~2.7s to ~5s:
          |     let target_chars: Vec<char> = target.chars().collect();
          |     for (i, source_char) in source.chars().enumerate() {
          |         let mut next_dist = i + 1;
          |         for (j, target_char) in target_chars.iter().enumerate() {
         | 
          | > written mindlessly
          | 
          | > incompetence of the person
         | 
         | No challenge there :P I am operating under the assumption that
         | I don't need to understand how compilers work to get good
         | performance from rust (where good is "similar enough to an
         | equivalent go program")
        
           | burntsushi wrote:
           | Indeed, it looks like doing the UTF-8 decoding up-front is
           | exacerbating the performance difference in the allocator.
           | 
           | I think this is where the GP's first suggestion comes into
           | play. If one were writing this code _and_ cared about
           | performance, then you'd usually find a way to reuse
           | allocations. I submitted a PR to demonstrate this:
            | https://github.com/christianscott/levenshtein-distance-bench...
        
             | chrfrasco wrote:
             | Yeah, I can definitely see how that would be a more
             | performant approach.
             | 
              | I suppose I wasn't so interested in figuring out how to
              | make this algorithm as fast as possible as much as I was
              | interested in diving into why _this particular_
              | implementation was slower.
             | 
             | I'm not totally convinced that this difference is down to
             | the string being parsed over and over, though
             | 
             | > doing the UTF-8 decoding up-front is exacerbating the
             | performance difference in the allocator
             | 
             | This seems to suggest that allocation might be dominating
             | here. WDYT? Either way, I've added a disclaimer to the
             | post.
        
       | alvarelle wrote:
        | Could also try to use the smallvec crate in this case, which puts
        | small allocations on the stack: https://docs.rs/smallvec/
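        | 
        | A minimal sketch (the inline capacity of 64 is an arbitrary
        | choice):
        | 
        |     use smallvec::SmallVec;
        | 
        |     fn main() {
        |         let target = "hello";
        |         // No heap allocation until the buffer outgrows 64 elements.
        |         let cache: SmallVec<[usize; 64]> = (0..=target.len()).collect();
        |         assert_eq!(cache.len(), target.len() + 1);
        |     }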
        
       | anderskaseorg wrote:
       | I've found that Microsoft's mimalloc (available for Rust via
       | https://crates.io/crates/mimalloc) typically provides even better
       | performance than jemalloc. Of course, allocator performance can
       | vary a lot depending on workload, which is why it's good to have
       | options.
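        | 
        | Swapping it in uses the same #[global_allocator] mechanism (a
        | minimal sketch):
        | 
        |     use mimalloc::MiMalloc;
        | 
        |     #[global_allocator]
        |     static GLOBAL: MiMalloc = MiMalloc;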
        
       | mqus wrote:
        | Is it intentional that the benchmarks include not only running
        | the program itself but also compiling it? E.g. in the linked
        | source code, the bench.sh includes the compilation step, which
        | is known to be slow in rust:
        | 
        |     #!/usr/bin/env bash
        |     set -e
        |     run() {
        |       cargo build --release 2> /dev/null
        |       ./target/release/rust
        |     }
        |     run;
       | 
       | Sure, if you run it many times in succession the compiler won't
       | do much but the benchmarking script (run.js) doesn't really
       | indicate that and the blog post also doesn't mention that.
       | 
       | EDIT: I was just being stupid, don't mind me. The times were
       | taken within each language and not externally.
        
         | chrfrasco wrote:
         | run.js is not doing the benchmarking. If you look at the source
         | for each of the programs being benchmarked, you'll see that the
         | programs themselves are responsible for benchmarking
        
       | matklad wrote:
       | Note that Rust and Go programs differ in a seemingly
       | insignificant, but actually important detail.
       | 
        | Rust does this:
        | 
        |     next_dist = std::cmp::min(
        |         dist_if_substitute,
        |         std::cmp::min(dist_if_insert, dist_if_delete),
        |     );
        | 
        | Go does this:
        | 
        |     nextDist = min(
        |         distIfDelete,
        |         min(distIfInsert, distIfSubstitute),
        |     )
       | 
       | The order of minimums is important for this dynamic programming
        | loop. If I change the Rust version to take minimums in the same
        | order (swapping substitute and delete), runtime drops from
        | 1.878696288
       | to 1.579639363.
       | 
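        | That is, the reordered Rust version (reconstructed from the
        | description above) becomes:
        | 
        |     next_dist = std::cmp::min(
        |         dist_if_delete,
        |         std::cmp::min(dist_if_insert, dist_if_substitute),
        |     );
        | 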
       | I haven't investigated this, but I would guess that this is the
       | same effect I've observed in
       | 
       | * https://matklad.github.io/2017/03/12/min-of-three.html
       | 
       | * https://matklad.github.io/2017/03/18/min-of-three-part-2.htm...
       | 
       | (reposting my comment from reddit, as it's a rather unexpected
       | observation)
        
         | ishanjain28 wrote:
          | I think they made the rust version the same as Go, because I
          | cloned it just now and they are both the same. Also, thank
          | you so much for the blog posts! :)
        
         | ttd wrote:
         | Those are two really well written articles -- taking a
         | complicated topic and making it very accessible. Thanks! FWIW I
         | think a companion article on how to effectively use perf for
         | tasks like this would be a great addition, since it can be a
         | bit novice-unfriendly.
        
         | drfuchs wrote:
         | Very few things have ever been measured accurately to ten
         | significant digits. How much did these numbers change run to
         | run? How many measurements did you take? Were the caches warmed
         | up similarly? Still, point taken.
        
           | matklad wrote:
           | The excessive "precision" is because I've just copy-pasted
           | what the original bench printed.
           | 
           | As for "is this a reliable result", I believe I've performed
           | diligence, appropriate for a HN comment, to make sure that
           | this is not a completely random result. As I've said, I did
           | not investigate this particular bit of code thoroughly. You
           | are welcome to read the linked blog posts, which study the
           | issue in depth.
        
         | londons_explore wrote:
         | Since min(a, min(b,c)) == min(b, min(a,c)), perhaps the
         | compiler should be smart enough to swap the comparisons around
         | if it makes it quicker?
        
           | dan-robertson wrote:
           | I suspect that statement is not true for floats. Possibly you
           | don't get the same float from min(0,-0) as min(-0,0), and
           | similarly with NaNs. Rust specifies that if one input is NaN
           | then the other is returned but doesn't say what happens if
           | both are NaN.
        
             | gpm wrote:
              | Hmm, not sure what the llvm semantics look like, but you're
              | right about the assembly semantics: vminsd (the instruction
              | used in the post) is unfortunately not symmetric in its
              | handling of NaN.
             | 
             | https://www.felixcloutier.com/x86/minsd
        
             | fluffything wrote:
             | > Rust specifies that if one input is NaN then the other is
             | returned but doesn't say what happens if both are NaN.
             | 
             | It does. If both are NaN a NaN is returned. Note, however,
             | that when Rust says that a NaN is returned, this means that
             | any NaN can be returned. So if you have min(NaN0, NaN0) the
             | result isn't necessarily NaN0, it might be another NaN with
             | a different payload.
        
               | dan-robertson wrote:
               | Right. That's what I was getting at but I didn't know the
               | NaN-return rule.
        
             | londons_explore wrote:
              | NaNs are rarely required on the fast path of any code...
              | The compiler ought to make a second, slower implementation
              | for when NaNs are seen.
        
               | fluffything wrote:
               | Sure, but then it needs to add a branch to detect the
               | inputs, and maybe keep around some state to switch
               | between implementations at run-time.
               | 
                | It's far from clear that doing that is, in general, worth
               | it.
        
           | [deleted]
        
           | arcticbull wrote:
            | Part of the problem may be that they re-implemented
            | `std::cmp::min` at the bottom of the file; I wonder if
            | there's a more optimized version in the stdlib.
        
             | gpm wrote:
             | One of the most frustrating parts of rust (for me), is that
             | `std::cmp::min` (and some other methods) require that their
              | arguments are `Ord` (totally ordered), and floats are only
              | partially ordered because of NaN, so you can't use
              | std::cmp::min on floats.
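              | 
              | For example:
              | 
              |     fn main() {
              |         // std::cmp::min(1.0_f32, 2.0_f32); // error: f32 is not Ord
              |         // The inherent float method handles NaN instead:
              |         assert_eq!(1.0_f32.min(f32::NAN), 1.0);
              |     }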
        
               | shepmaster wrote:
                | That's what
                | https://doc.rust-lang.org/std/primitive.f32.html#method.min
                | is for
        
               | Arnavion wrote:
               | You can use the ordered_float crate. It has a newtype
               | that cannot hold NaNs and thus impls Ord.
        
               | jfkebwjsbx wrote:
                | Yeah, it makes no sense for that to be the default. Code
                | that expects to handle NaNs is very rare.
        
               | arcticbull wrote:
               | Safety is the whole idea behind Rust, and if you draw the
               | line here, that's neither in line with the Rust ethos,
               | nor particularly valuable. After all, code that "expects
               | to handle" null was pretty rare too ;)
        
               | steveklabnik wrote:
               | Depending on what behavior you want, there's a bunch of
               | wrapper crates.
        
               | arcticbull wrote:
               | I do believe that's quite intentional, due to the
               | inexactness of floating-point values, it's not "right" to
               | do that. There should be a `partial_min` function though,
               | IMO, which has weaker guarantees.
        
               | gpm wrote:
                | It's absolutely intentional, and it's got being
                | technically correct on its side; it's just frustrating.
        
         | _____smurf_____ wrote:
         | Given this information, and for general parsing
         | functionalities, which one is faster, Go or Rust?
        
           | fluffything wrote:
           | You can write a parser as far as the limits of your
           | imagination allow in Rust, or C, or any other baremetal
           | language without a thick run-time.
           | 
           | In Go, beyond the limits of your imagination, you'll also hit
           | other limits, like those of the garbage collector.
        
           | arcticbull wrote:
           | As always, it depends on what your goals are. String
           | processing is usually not the long pole when you're building
           | something that consumes the output of a parser. Based on
           | micro and macro benchmarks, Rust is typically substantially
           | faster than Go and pretty much always uses less RAM [1].
           | 
           | But again, depends what you're doing with the output, and if
           | these deltas even matter in your context.
           | 
            | [1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
       | hu3 wrote:
       | I'm surprised that a naive implementation in Go can outperform a
       | naive implementation in Rust.
        
         | empath75 wrote:
         | I'm not. Hell, when I first started learning rust i frequently
         | wrote code that ran slower than _python_.
        
       | novocaine wrote:
       | It may be that the system allocator is making an excessive number
       | of syscalls to do work, whereas most custom allocators will
       | allocate in slabs to avoid this. You could try using dtruss or
       | strace to compare the differences.
        
       | savaki wrote:
       | This discussion seems to me like a microcosm of the differences
       | in philosophies between Rust and Go.
       | 
       | With Rust, you have much more control, but you also need a deep
        | understanding of the language to get the most out of it. With Go,
        | the way you think it should work is usually Good Enough(tm).
        
         | jeffdavis wrote:
         | I wouldn't put it that way. Both languages are fast at nice
         | straight-line code.
         | 
         | The main area I'd expect to see performance benefits for rust
         | (though I don't have experience here) is larger rust programs.
         | Rust's zero-cost abstractions have more benefits as the
         | abstractions nest more deeply. For a small program, you don't
         | really have a lot of abstractions, so Go will do just fine.
         | 
          | I think Go has a number of nice performance tricks up its
          | sleeve, though, so I wouldn't rule out Go on performance
          | grounds too quickly.
        
       | chrfrasco wrote:
       | Hey all, as some keen-eyed commenters have pointed out, it looks
       | like the rust program is not actually equivalent to the go
       | program. The go program parses the string once, while the rust
       | program parses it repeatedly inside every loop. It's quite late
        | in Sydney as I write this, so I'm not up for a fix right now, but
        | this post is probably Fake News. The perf gains from jemalloc are
        | real, but it's probably not the allocator's fault. I've updated
       | the post with this message as well.
       | 
       | The one-two combo of 1) better performance on linux & 2) jemalloc
       | seeming to fix the issue lured me into believing that the
       | allocator was to blame. I'm not sure what the lesson here is -
       | perhaps more proof of Cunningham's law?
       | https://en.wikipedia.org/wiki/Ward_Cunningham#Cunningham's_L...
        
         | arcticbull wrote:
          | Thanks for following up. Just as an FYI, there are a few bugs
          | in your implementation; the most obvious one is the use of
          | ".len()" in a number of places interspersed with
          | ".chars().count()". These two return different values. ".len()"
          | returns the number of UTF-8 bytes in the input string, which
          | for ASCII is the same as ".chars().count()" obviously, but if
          | you do attempt any Unicode characters, your function won't
          | work. ".chars()" provides Unicode Scalar Values (USVs) -- which
          | is a subset of code points, excluding surrogate pairs [1]. Note
          | also this is _not_ the same as a Go rune, which is a code point
          | _including_ surrogate pairs.
         | 
         | Secondly, you re-implemented "std::cmp::min" at the bottom of
         | the file, and I'm not sure if the stdlib version is more
         | optimized.
         | 
         | Lastly, well, you caught the issue with repeated passes over
         | the string.
         | 
         | I've fixed the issues if you're curious:
         | https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea...
         | 
         | Unrelated, I hate the term "fake news" as it's an intentional
         | attempt to destroy the world public's faith in news media. It's
         | a cancer on civilized society. Somewhere your civics teacher is
         | crying into some whiskey, even though of course you're joking.
         | 
         | [1] http://www.unicode.org/glossary/#unicode_scalar_value
        
           | arcticbull wrote:
           | This doesn't even begin to get into the question of what
           | Levenshtein Distance even means in a Unicode context. What's
           | the Levenshtein Distance of 3 emoji flags? I suppose we
           | should be segmenting by grapheme clusters and utilizing a
           | consistent normalization form when comparing, but Rust has no
           | native support for processing grapheme clusters -- or for
           | normalization, I believe. The unicode-segmentation crate
           | might help.
           | 
           | Based on some cursory research, the go version differs in a
           | more subtle way too. A Go rune is a code point, which is a
           | superset of the Rust "char" type; it includes surrogates.
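           | 
           | A sketch of why flags are tricky (my example, assuming
           | the unicode-segmentation crate):
           | 
           |     use unicode_segmentation::UnicodeSegmentation;
           | 
           |     fn main() {
           |         // Regional indicators A + U: one flag emoji,
           |         // but two scalar values.
           |         let flag = "\u{1F1E6}\u{1F1FA}";
           |         assert_eq!(flag.chars().count(), 2);
           |         assert_eq!(flag.graphemes(true).count(), 1);
           |     }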
        
             | derefr wrote:
             | Levenshtein (edit) distance is fundamentally an information-
             | theoretical concept defined on bitstreams, as
             | insertions/deletions/swaps _of individual bits_ within a
             | stream. It has a lot in common with error-correcting codes,
             | fountain codes, and compression, which all also operate on
             | bitstreams.
             | 
             | Any higher-level abstract mention of Levenshtein
             | distances (e.g. of Unicode codepoints) is properly taken
             | to refer to the Levenshtein distance of a conventional
             | (or explicitly specified) binary encoding of the two
             | strings.
        
               | grantwu wrote:
               | Can you point to a source that defines Levenshtein
               | distance as only referring to bitstreams?
               | 
               | A translation of the original article [1] that introduced
               | the concept notes in a footnote that "the definitions
               | given below are also meaningful if the code is taken to
               | mean an arbitrary set of words (possibly of different
               | lengths) in some alphabet containing r letters (r >= 2)".
               | 
               | And if you wish to strictly stick to how it was
               | originally defined, you'd need to only use strings of the
               | same length.
               | 
               | More recent sources [2] say instead "over some alphabet",
               | and even in the first footnote, describe results for
               | "arbitrarily large alphabets"!
               | 
               | [1]
               | https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf
               | 
               | [2] https://arxiv.org/pdf/1005.4033.pdf
        
               | arcticbull wrote:
               | And Unicode is the biggest alphabet haha.
        
               | klodolph wrote:
               | > Any higher-level abstract mention of Levenshtein
               | > distances (e.g. of Unicode codepoints) is properly
               | > taken to refer to the Levenshtein distance of a
               | > conventional (or explicitly specified) binary
               | > encoding of the two strings.
               | 
               | This doesn't match any definition of Levenshtein distance
               | that I've ever encountered. I've always seen it defined
               | in terms of strings over some alphabet, and the binary
               | case is just what happens when your alphabet only has two
               | symbols in it.
               | 
               | Quite naturally, the problem with Unicode strings is
               | that there are multiple ways to treat them as
               | sequences. One obvious way is a sequence of Unicode
               | scalar values, but that's not necessarily what you
               | want -- a sequence of grapheme clusters may be more
               | appropriate, and you may also wish to consider
               | normalization.
        
               | arcticbull wrote:
               | The problem is there's no "conventional" binary
               | encoding of Unicode strings. You can create the same
               | exact character many different ways, from a single scalar
               | value to a composition of multiple scalar values. There's
               | also multiple different ways of ordering different pieces
               | of a composite character that yield the same value. Would
               | we not want to utilize a consistent decomposition and
               | consistent logical segmentation for the string? It's no
               | longer enough to iterate over a string one byte at a time
               | and derive any meaning. Is it right that the
               | Levenshtein distance between 2 equal characters, "a"
               | and "a", might be 12, simply because the joiners were
               | ordered differently?
               | 
               | It seems like segmentation by grapheme cluster and
               | comparison using a consistent normalization would provide
               | the same logical answer as a classic byte-wise
               | Levenshtein distance on an ASCII string. [1]
               | 
               | Or are you suggesting that's too high level and we should
               | just consider this to be operating on bit strings that
               | happen to be grouped into bytes, and not worry about the
               | logical implications. Therefore we'd just use a
               | consistent normalization form on both input strings, and
               | it's okay that the distance is up to like 10-15 for a
               | single character difference in a composite character and
               | 1 in an ASCII character. That sounds totally reasonable
               | too, just different.
               | 
               | [1] http://unicode.org/reports/tr15/
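               | 
               | A sketch of the composition problem (my example,
               | assuming the unicode-normalization crate):
               | 
               |     use unicode_normalization::UnicodeNormalization;
               | 
               |     fn main() {
               |         let composed = "\u{00E9}"; // é, one char
               |         let decomposed = "e\u{0301}"; // e + accent
               |         // Byte-wise they differ...
               |         assert_ne!(composed, decomposed);
               |         // ...but NFC maps both to the same string.
               |         let a: String = composed.nfc().collect();
               |         let b: String = decomposed.nfc().collect();
               |         assert_eq!(a, b);
               |     }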
        
         | masklinn wrote:
         | > Hey all, as some keen-eyed commenters have pointed out, it
         | looks like the rust program is not actually equivalent to the
         | go program. The go program parses the string once, while the
         | rust program parses it repeatedly inside every loop. [...] The
         | perf gains from jemalloc are real, but it's probably not the
         | allocator's fault. I've updated the post with this message as
         | well.
         | 
         | I don't know that it would be a gain: Rust is pretty good at
         | decoding UTF8 quickly given how absolutely fundamental that
         | operation is, and "caching" the decoded data would increase
         | pressure on the allocator.
         | 
         | Unless you also changed the interface of the levenshtein to
         | hand it preallocated cache, source and destination buffers (or
         | the caller did that).
         | 
         | edit: to the downvoter, burntsushi did the legwork of actually
         | looking at this[0] and found caching the decoding to have no
         | effect at best, unless the buffers get lifted out of the
         | function entirely, which matches my comment's expectations.
         | 
         | [0] https://news.ycombinator.com/item?id=23059753
         | 
         | > But yes, I did benchmark this, even after reusing
         | allocations, and I can't tell a difference. The benchmark is
         | fairly noisy.
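         | 
         | For the curious, a hypothetical sketch of what "lifting the
         | buffers out" could look like (names and shape are mine, not
         | burntsushi's or the post's):
         | 
         |     // The caller owns the buffers and reuses them across
         |     // calls, so the hot loop allocates nothing.
         |     fn levenshtein_with(
         |         cache: &mut Vec<usize>,
         |         target_chars: &mut Vec<char>,
         |         source: &str,
         |         target: &str,
         |     ) -> usize {
         |         target_chars.clear();
         |         target_chars.extend(target.chars());
         |         cache.clear();
         |         cache.extend(0..=target_chars.len());
         |         for (i, sc) in source.chars().enumerate() {
         |             let mut prev = i; // diagonal, old cache[0]
         |             cache[0] = i + 1;
         |             for (j, &tc) in target_chars.iter().enumerate() {
         |                 let cur = cache[j + 1];
         |                 cache[j + 1] = if sc == tc {
         |                     prev
         |                 } else {
         |                     1 + prev.min(cur).min(cache[j])
         |                 };
         |                 prev = cur;
         |             }
         |         }
         |         cache[target_chars.len()]
         |     }
         | 
         |     fn main() {
         |         let (mut cache, mut tc) = (Vec::new(), Vec::new());
         |         let d = levenshtein_with(&mut cache, &mut tc,
         |                                  "kitten", "sitting");
         |         assert_eq!(d, 3);
         |     }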
        
           | [deleted]
        
         | otterley wrote:
         | > this post is probably Fake News
         | 
         | It's not Fake News. Fake News is the publication of
         | intentionally false stories. This is just erroneous.
         | 
         | There's a yawning chasm between the two.
        
           | arcticbull wrote:
           | "fake news" is an attempt at permanently damaging the
           | American public's faith in the fifth estate. Nobody should
           | ever use that term in the current political climate, ever.
        
             | FpUser wrote:
             | "damaging the _American public 's_ faith" - Really, just
             | the American? You ever realize there are other human
             | species on this planet?
        
               | arcticbull wrote:
               | That's a fair criticism if clumsily put. For the record,
               | I'm Canadian, British and Polish. I called out America as
               | by far the biggest perpetrator at the moment.
               | 
               | So to address the second half of your post, I have indeed
               | heard of "other countries" (just ask USCIS) although
               | being a US resident at the moment I want to speak to what
               | I know, not for anyone else.
        
             | [deleted]
        
           | will4274 wrote:
           | Says who?
           | 
           | When news organizations take other news organizations'
           | word for it and the story is false, that's fake news. We
           | called it something different back then, but fake news led
           | to the invasion of Iraq. Negligence is sufficient for fake
           | news; malice is not required.
        
             | otterley wrote:
             | Fake News was a term invented around 2015 during the run-up
             | to the 2016 election. It specifically referred to the
             | phenomenon of literally fake news stories, such as
             | made-up rallies and completely false stories about
             | politicians, being published by legitimate-sounding but
             | nonexistent news organizations, using platforms like
             | Facebook to disseminate themselves. Usually these were
             | run out of China and Russia.
        
               | empath75 wrote:
               | And then trump redefined it to be about legitimate news
               | sources exclusively. We've really been in upside down
               | world for four years.
        
           | Ar-Curunir wrote:
           | It's a joke mate
        
             | arcticbull wrote:
             | Of course, but it chips away at the Overton window,
             | normalizing it.
        
       | tromp wrote:
       | Missed chance to shorten title to "Making Rust Go Fast" :-)
        
         | katktv wrote:
         | Making Rust As Fast As It Can Go
        
           | codegladiator wrote:
           | Making the code rust as fast as it can go
        
           | kreetx wrote:
           | Go Rust!
        
             | arcticbull wrote:
             | GO!!!!
        
       | savaki wrote:
       | A few folks have commented that there were logic errors in the Go
       | version. Specifically that                 len("föö") = 5
       | 
       | should instead have returned                 len("föö") = 3
       | 
       | I submitted a pull request,
       | https://github.com/christianscott/levenshtein-distance-bench...,
       | that fixes these issues in the Go implementation.
       | 
       | Interestingly enough, when I re-ran the benchmark, the Go version
       | is roughly 19% faster than it was previously:
       | old: 1.747889s       new: 1.409262s (-19.3%)
        
       | molf wrote:
       | The Rust version uses `target.chars().count()` to initialise the
       | cache, while the Go version counts up to `len(target)`. These are
       | not equivalent: the Rust version counts Unicode scalar values,
       | the Go version counts bytes.
       | 
       | I am confused by the implementations, although I have not spent
       | any time testing them. Both versions contain a mix of code that
       | counts bytes (`.len()` and `len(...)`) and Unicode code points
       | (`chars()` and `[]rune(...)`). My guess is that the
       | implementation might not work correctly for certain non-ASCII
       | strings, but I have not verified this.
       | 
       | Of course, if only ASCII strings are valid as input for this
       | implementation then both versions will be a lot faster if they
       | exclusively operate on bytes instead.
        
         | steveklabnik wrote:
         | Yes, especially because that changes it from constant time
         | (strings know their length in bytes) to linear time
         | (counting chars means a loop through the text).
         | 
         | I was a bit suspicious of the conclusion, but didn't dig in
         | myself. I imagine this would be a much larger source of
         | difference.
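         | 
         | A minimal sketch of the distinction, plus an ASCII fast
         | path (my example, not from the post):
         | 
         |     // .len() is O(1): strings store their byte length.
         |     // .chars().count() is O(n): it decodes the UTF-8.
         |     // For ASCII input the two counts are equal anyway.
         |     fn char_len(s: &str) -> usize {
         |         if s.is_ascii() {
         |             s.len()
         |         } else {
         |             s.chars().count()
         |         }
         |     }
         | 
         |     fn main() {
         |         assert_eq!(char_len("hello"), 5);
         |         assert_eq!(char_len("föö"), 3);
         |     }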
        
         | eis wrote:
         | Yep.
         | 
         | Here's a Go playground example showing that the result is
         | indeed wrong:
         | 
         | https://play.golang.org/p/vmctMFUevPc
         | 
         | It should output 3 but outputs 5 because each ö is two
         | bytes, len("föö") = 5.
         | 
         | I would suggest using "range" to iterate over the unicode
         | characters.
        
           | savaki wrote:
           | Looks like a small bug in the go code, corrected here.
           | Original author should have used rune throughout.
           | https://play.golang.org/p/mGZZMFtMgHH
        
           | earthboundkid wrote:
           | If you diff föö and f though, it correctly gives an edit
           | distance of 2.
           | 
           | The code is weird because someone knew enough to convert the
           | strings to slices of runes but not enough to use the rune
           | slices consistently. :-/
        
             | arcticbull wrote:
             | Not to mention rune slices are insufficient for things
             | like flag emoji and family emoji, which are a ton of
             | separate runes put together. The latter, apparently,
             | deletes one family member at a time when you hit
             | "backspace".
        
         | chrfrasco wrote:
         | Nice catch, thanks for pointing this out. I've updated the
         | cache initialization to use `len(targetChars)` rather than
         | `len(target)`:
         | 
         |     cache := make([]int, len(targetChars)+1)
         |     for i := 0; i < len(targetChars)+1; i++ {
         |         cache[i] = i
         |     }
         | 
         | AFAIK this makes them equivalent (fingers crossed). It seems
         | not to have made much of a difference (-0.03s).
        
           | Thaxll wrote:
           | https://blog.golang.org/strings
        
           | molf wrote:
           | They might be equivalent (perhaps apart from other issues
           | pointed out in other comments), but both implementations are
           | still either
           | 
           | 1) incorrect if UTF-8 strings are supposed to be valid input,
           | or
           | 
           | 2) very inefficient if only ASCII strings are supposed to be
           | valid input.
        
         | [deleted]
        
         | maxmcd wrote:
         | Huh, yeah, if I switch the Rust version to target.len() the
         | execution time drops by more than 10%.
         | 
         | edit: and if I switch to source.bytes().enumerate() it drops
         | by 20% more.
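         | 
         | i.e. something like this sketch (mine) -- note the byte
         | route is only correct when the input is known to be ASCII:
         | 
         |     fn main() {
         |         let source = "hello"; // ASCII, so both agree
         |         let chars: Vec<char> = source.chars().collect();
         |         // .bytes() walks the buffer with no UTF-8 decoding
         |         for (i, b) in source.bytes().enumerate() {
         |             assert_eq!(b as char, chars[i]);
         |         }
         |     }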
        
         | arcticbull wrote:
         | Go's version and the Rust version differ in yet more subtle
         | ways. It appears that Go's "rune" type is a code point, but
         | Rust's "char" type is a Unicode Scalar Value, a subset of
         | code points that excludes surrogates. Neither version will
         | work with complex Unicode input unless you perform both
         | segmentation by grapheme cluster [1] and utilize a
         | consistent normalization [2] when comparing clusters.
         | 
         | Unicode is hard, fam, and it's rare that anything that looks
         | easy is actually what you want.
         | 
         | [1] https://unicode.org/reports/tr29/
         | 
         | [2] http://unicode.org/reports/tr15/
        
           | fluffything wrote:
           | > Both versions will not work with complex Unicode input
           | unless you perform both segmentation by Grapheme Cluster [1]
           | and utilize a consistent Normalization [2] when comparing
           | clusters
           | 
           | Doing this is quite easy from Rust.
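           | 
           | e.g. a sketch (mine) assuming the unicode-segmentation
           | and unicode-normalization crates:
           | 
           |     use unicode_normalization::UnicodeNormalization;
           |     use unicode_segmentation::UnicodeSegmentation;
           | 
           |     // Normalize first, then split into extended
           |     // grapheme clusters.
           |     fn graphemes_nfc(s: &str) -> Vec<String> {
           |         let n: String = s.nfc().collect();
           |         n.graphemes(true).map(|g| g.to_string()).collect()
           |     }
           | 
           |     fn main() {
           |         // man + ZWJ + woman: three scalar values, one
           |         // grapheme cluster.
           |         let s = "\u{1F468}\u{200D}\u{1F469}";
           |         assert_eq!(s.chars().count(), 3);
           |         assert_eq!(graphemes_nfc(s).len(), 1);
           |     }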
        
             | arcticbull wrote:
             | There's no standard library API in Rust for either grapheme
             | cluster segmentation or for normalization. You'd need a
             | third-party crate, at which point it's less "easy in
             | Rust" and more "someone did the hard work for you",
             | because it's really not easy anywhere haha.
        
               | fluffything wrote:
               | No, and there will probably never be a libstd
               | implementation of it, but there is a single crate that
               | everybody uses: unicode-segmentation [0].
               | 
               | > You'd need a third-party crate, at which point it's
               | less "easy in Rust" and more "someone did the hard work
               | for you"
               | 
               | "Someone did the work for you" is true of all code that
               | you did not write yourself, independently of whether that
               | code is easy or hard to write, or whether it is in the
               | standard library or not.
               | 
               | unicode-segmentation is pretty much the only library of
               | its kind in Rust, is super easy to discover (google "Rust
               | grapheme cluster", "Rust unicode segmentation", etc.),
               | and using it is as easy as just typing "cargo add
               | unicode-segmentation".
               | 
               | The library is maintained by a Rust core team member, a
               | Rust standard library team member, is used by servo and
               | firefox, and is the only unicode segmentation library
               | that people use.
               | 
               | Since many programs don't need to do any kind of unicode
               | segmentation, making it part of the standard library
               | sounds like a bad idea. In particular, given that unicode
               | is a moving standard, it would mean that people stuck on
               | old Rust toolchains (e.g. LTS linux distros) cannot
               | create binaries that do proper unicode segmentation,
               | which does not make sense.
               | 
               | The underlying problem is that many programmers do not
               | know that they need to do unicode segmentation in the
               | first place. Moving this into the standard library does
               | not fix that problem either.
               | 
               | [0]: https://crates.io/crates/unicode-segmentation
        
               | jfkebwjsbx wrote:
               | > Since many programs don't need to do any kind of
               | unicode segmentation, making it part of the standard
               | library sounds like a bad idea. In particular, given that
               | unicode is a moving standard, it would mean that people
               | stuck on old Rust toolchains (e.g. LTS linux distros)
               | cannot create binaries that do proper unicode
               | segmentation, which does not make sense.
               | 
               | That has nothing to do with it. You could still have a
               | library with the very latest Unicode standard support
               | for those who need it, and keep updating the stdlib
               | one as well.
               | 
               | It does not make sense either to expect someone to use
               | bleeding edge libraries from cargo yet use an old rustc
               | compiler. They can easily update it if needed.
        
               | fluffything wrote:
               | > It does not make sense either to expect someone to use
               | bleeding edge libraries from cargo yet use an old rustc
               | compiler.
               | 
               | Of course it does. Many software users are stuck on
               | multiple-year-old toolchains for various reasons, yet
               | these systems still need to be able to handle unicode
               | properly.
               | 
               | > They can easily update it if needed.
               | 
               | No, they cannot. Many users are stuck on older Windows
               | versions, Linux versions, LTS Linux versions, etc.
               | because of their organization's or their clients'
               | requirements.
               | 
               | Telling a client that you can't develop an app for them
               | because their 2 year old Ubuntu is too old is often not a
               | very successful business model.
               | 
               | > and keep updating the stdlib one.
               | 
               | These updates would only apply to newer Rust toolchains,
               | that many users cannot use. Unless you are suggesting
               | releasing patch versions for the soon-to-be 100 Rust
               | toolchain versions in existence every time the unicode
               | standard is updated.
               | 
               | This is too much trouble and work for little gain, given
               | that one can still use a Rust 1.0 compiler to compile the
               | latest version of the unicode-segmentation crate without
               | problems.
        
               | arcticbull wrote:
               | IMO it's not as clear-cut as you make it out to be. It's
               | a pretty arbitrary line to exclude full Unicode support
               | from the standard library. There's a ton of stuff in
               | libstd that could be supported as third-party crates. I
               | don't disagree with what the Rust team has done, and I
               | think there could be a world in which the compiler team
               | also releases first-party crates with "enhanced"
               | functionality beyond just libstd. I consider proper
               | Unicode support to be a "first party" thing, but I
               | also don't think it has to be in libstd per se.
        
       | arcticbull wrote:
       | There's a bunch of issues with the Rust implementation, not
       | least that in the base case where the source or target length
       | is zero, it returns the number of UTF-8 bytes of the other,
       | while all other computations are performed in terms of Unicode
       | chars -- except at the end: `cache[target.len()]`, which will
       | return the wrong value if any non-ASCII characters precede it.
       | 
       | Further, each time you call `.chars().count()` the entire
       | string is re-enumerated at Unicode character boundaries, which
       | is O(n) and hardly cheap -- better to decode the chars once
       | and reuse them.
       | 
       | Also, re-implementing std::cmp::min at the bottom there may
       | well lead to a missed optimization.
       | 
       | Anyways, I cleaned it up here in case the author is curious:
       | https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea...
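       | 
       | e.g. hoisting the count out and leaning on the stdlib (a
       | sketch, mine rather than the gist's):
       | 
       |     use std::cmp::min;
       | 
       |     fn main() {
       |         let target = "föö";
       |         // Collect once: .chars().count() inside a loop
       |         // re-decodes the whole string on every call.
       |         let target_chars: Vec<char> = target.chars().collect();
       |         assert_eq!(target_chars.len(), 3);
       |         // Prefer std::cmp::min over a hand-rolled helper.
       |         assert_eq!(min(2, min(3, 1)), 1);
       |     }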
        
       | submeta wrote:
       | Impressed to see four posts about Rust on the front page of HN
       | simultaneously.
        
       ___________________________________________________________________
       (page generated 2020-05-03 23:00 UTC)