[HN Gopher] Sentinels Can Be Faster
       ___________________________________________________________________
        
       Sentinels Can Be Faster
        
       Author : panic
       Score  : 39 points
       Date   : 2020-09-07 20:53 UTC (2 days ago)
        
 (HTM) web link (lemire.me)
 (TXT) w3m dump (lemire.me)
        
       | sild wrote:
       | I put the benchmark into quick-bench but could not replicate the
       | 40% result. The sentinel version was faster but only slightly.
       | 
       | https://quick-bench.com/q/314Z81FskTlcDqMCUHFVhWmDz8Q
       | 
       | Update 1: After moving some constants around, I get the 40%
       | result:
       | 
       | https://quick-bench.com/q/lPrpQTAyDQuOoKS9MBWCTBXk1TE
       | 
       | No idea why it made such a big difference to the benchmark.
       | 
       | Update 2: If the test order is reversed, the result goes back to
       | being only slightly faster for the sentinel version:
       | 
       | https://quick-bench.com/q/Ds7aqe5-6md_tTPndOK54ltYZmE
        
         | [deleted]
        
         | brandmeyer wrote:
         | In the first two links you sent, the 40% result looks like the
         | baseline case getting slower, not the unit under test getting
         | faster. The core assembly looks look identical in both cases.
        
           | ggrrhh_ta wrote:
           | Well spotted; great for sild! Looks like the 40% claim was
           | due to a bug in benchmarking (which makes sense, and can
           | happen).
        
         | ggrrhh_ta wrote:
         | The order in which the tests were run was the first thing I
         | checked in his implementation, but I looked too quickly and
         | thought he was generating the data for each variant, so I
         | assumed that was not the problem. [Actually, you need the same
         | data for both tests, but generated twice]
         | 
         | I was going to just point out that 40% percent difference would
         | mean that the version without the sentinel can be improved...
         | was going to check if there is something that is preventing the
         | branch prediction from actually taking care of that performance
         | drop - memory is only being read and nothing should be
         | invalidated...
        
           | ggrrhh_ta wrote:
           | I am pretty sure that the 40% original difference was due to
           | a bug in benchmarking - or food for thought to improve the
           | non-sentinel version.
        
       | dpc_pw wrote:
       | I never thought of null-termination as a "sentinel". I thought
       | sentinel means strategically putting somewhere a value to avoid
       | having to check for termination condition. Null-termination is
       | not a sentinel, IMO. It is the opposite - terminator, which has
       | to be explicitly checked for. Am I wrong here?
       | 
       | BTW. If you really want speed, SIMD will obliterate any non-SIMD
       | solution, and my bet would be that most of the time SIMD will be
       | easier to implement with length prefix because it will not have
       | to handle a special case of null-termination. Someone correct me
       | if I'm wrong.
       | 
       | Trailing zero is a a typical case of a short-signed hack. More
       | naive, simpler, natural solution (length prefix) actually works
       | better, especially long term, where other things change (hardware
       | architecture, performance and resource constraint, use cases).
        
       | lalaland1125 wrote:
       | This is a rather weird article. Of course sentinels can be faster
       | in some very particular algorithms. That's not the core
       | contention. The core contention is that sentinels in general
       | cause more speed losses than speed gains over the entirety of the
       | application. Importantly, many useful algorithms such as string
       | copying, string comparison, and string concatenation can be much
       | more efficient using a size field.
        
         | hinkley wrote:
         | Is that even the core contention? Or is the core contention
         | that within mutable data it's possible to destroy the sentinel
         | through bugs or malice?
        
         | colejohnson66 wrote:
         | I think this is something C++ (and other later ones) got right;
         | A size field _and_ a sentinel'd (null-terminated) char[]. So
         | getting the length only requires reading 4 or 8 bytes, but if
         | you want to pass that string to a function that expects a null
         | terminated string, just pass x.c_str().
        
           | throwaway894345 wrote:
           | This seems useful if you're expecting to pass your strings to
           | a lot of C libraries, which is probably sensible for C++
           | (especially whatever vintage of C++ happened to introduce
           | this string design); however, surely this is a negligible
           | concern for most mainstream languages?
        
             | pengaru wrote:
             | > This seems useful if you're expecting to pass your
             | strings to a lot of C libraries, which is probably sensible
             | for C++ (especially whatever vintage of C++ happened to
             | introduce this string design); however, surely this is a
             | negligible concern for most mainstream languages?
             | 
             | No, the system call interface of the operating system
             | expects NUL-terminated strings as parameters all over the
             | place, especially the filesystem APIs.
        
               | steveklabnik wrote:
               | Who came first, the chicken or the egg?
               | 
               | EDIT: Okay, this is at -1, maybe I shouldn't be slightly
               | obtuse. These questions are, in my mind, two sides of the
               | same coin. The OS expects null terminated strings, not
               | for some inherent reason, but because C and UNIX grew up
               | together. There's no _inherent_ reason you can 't have an
               | OS that does not use sentinels to demarcate strings.
        
               | throwaway894345 wrote:
               | Probably my fault for phrasing the question ambiguously,
               | but the parent interpreted my question as I intended it
               | to be interpreted. Interfacing with the system libraries
               | is a common enough case for using null-terminated
               | strings.
               | 
               | I'm actually a little curious how what these low-level
               | APIs look like and how "modern" languages handle
               | efficiently converting to null-terminated strings. Do
               | they just copy everything while C/C++/etc would pass a
               | straight reference? Also, what do the underlying syscall
               | APIs look like?
        
               | steveklabnik wrote:
               | It's all good. You are 100% right that it's still a good
               | use-case.
               | 
               | And yeah, in Rust we have a separate CString type that's
               | null terminated, so you may need a copy to move between
               | the two worlds. It just depends, if you're only reading,
               | for example, then you can create a pointer + len where
               | the length doesn't include the 0, and that's effectively
               | zero cost.
               | 
               | You got me curious about my days as a kid putting
               | "pascal" in front of Mac System 7 stuff; I believe that
               | it only accepted Pascal strings. I also vaguely remember
               | people doing funny stuff like null terminating pascal
               | strings anyway and then putting the length before the
               | actual pointer so that you get both in one...
        
               | colejohnson66 wrote:
               | Weren't "Pascal strings" limited in that they were only a
               | byte though? So, your string could only be 255 bytes long
               | (conveniently going to a nice round 256 when you add the
               | length byte).
        
               | steveklabnik wrote:
               | Yep!
        
             | colejohnson66 wrote:
             | For _most_ people, the time difference between reading 4 /8
             | bytes and scanning for a null terminator is negligible, but
             | there's the other benefit of not having to worry as much
             | about buffer overruns; If you append to a std::string and
             | its internal buffer is full, it'll realloc() its buffer.
             | This does have the problem of the possibility of a
             | realloc(), but I prefer that over buffer overruns.
        
           | twic wrote:
           | It would be even better if it was terminated by 8-len%8 null
           | bytes, as that would make blocked loops simpler.
        
         | throwaway894345 wrote:
         | I agree. The framing of the article could have been clearer
         | that (start, length) data structures are widely better for the
         | general case, but that null terminated strings can be faster
         | for certain niche cases.
        
         | dundarious wrote:
         | See Alexandrescu's Fastware talk on how amazingly sentinels can
         | improve even textbook algorithms, and how to apply the approach
         | intelligently. E.g., for a linear search, overwrite the last
         | element with the sought after value, and do a tiny bit of work
         | to make that safe and to undo it before return.
         | 
         | If you're skeptical, I really do encourage you to watch, he's
         | not sloppy.
         | 
         | https://youtu.be/AxnotgLql0k
        
           | [deleted]
        
       | augustk wrote:
       | Another classical example of sentinels is improving a linear
       | search by inserting the key as the last element.
        
         | Someone wrote:
         | Classical, but often not the best idea, nowadays.
         | 
         | You often cannot know whether that element is writable, and if
         | it is, that will cause problems when running concurrent
         | searches (even for the same key)
        
       | lowiqengineer wrote:
       | This is really less "sentinels can be faster" and more "assuming
       | invariants can be faster because you're reducing comparisons".
        
       | recursive wrote:
       | Any string representation is faster for some hand-picked task.
       | But still, I think it's reasonable to assert that certain ones
       | are better than others for general use.
        
       | bvrmn wrote:
       | I think most modern languages shifted towards slices due to
       | security issues with sentinels. C++ span, rust, zig, go. It's
       | very sad we need to use libc interfaces and deal with null
       | terminated byte sequences.
        
         | dnautics wrote:
         | Zig as of 0.6.0 supports both sentinels _and_ length.
        
         | steveklabnik wrote:
         | Not just security issues, it also makes substrings zero copy.
        
           | mehrdadn wrote:
           | It also makes it possible to have nulls inside a string.
        
           | Someone wrote:
           | That can be problematic, too.
           | 
           | If your language has a garbage collector, it may mean tiny
           | substrings keep huge strings alive. Java moved away from
           | using slices (https://dzone.com/articles/changes-
           | stringsubstring-java-7), I think for that reason.
           | 
           | If your language isn't garbage collected, it can cause
           | problems if the substring must outlive the string it was
           | extracted from. That means that, for some functions, you'll
           | have to decide whether to return the slice or make the copy.
           | 
           | (In Rust, that means the slice has a different 'type' (not
           | sure what they call it), that of a string whose lifetime is
           | bound to the original string, and the compiler will check
           | that you won't make the mistake of having the slice outlive
           | it's source)
        
             | steveklabnik wrote:
             | Yep, all good points!
             | 
             | (And yeah, you're right about Rust too; string slices are a
             | different type than owned strings.)
        
             | xfer wrote:
             | How huge typical strings are that this is a problem? It
             | could be a problem for arrays but seems unlikely for
             | strings.
        
       ___________________________________________________________________
       (page generated 2020-09-09 23:00 UTC)