[HN Gopher] Sentinels Can Be Faster ___________________________________________________________________ Sentinels Can Be Faster Author : panic Score : 39 points Date : 2020-09-07 20:53 UTC (2 days ago) (HTM) web link (lemire.me) (TXT) w3m dump (lemire.me) | sild wrote: | I put the benchmark into quick-bench but could not replicate the | 40% result. The sentinel version was faster but only slightly. | | https://quick-bench.com/q/314Z81FskTlcDqMCUHFVhWmDz8Q | | Update 1: After moving some constants around, I get the 40% | result: | | https://quick-bench.com/q/lPrpQTAyDQuOoKS9MBWCTBXk1TE | | No idea why it made such a big difference to the benchmark. | | Update 2: If the test order is reversed, the result goes back to | being only slightly faster for the sentinel version: | | https://quick-bench.com/q/Ds7aqe5-6md_tTPndOK54ltYZmE | [deleted] | brandmeyer wrote: | In the first two links you sent, the 40% result looks like the | baseline case getting slower, not the unit under test getting | faster. The core assembly looks look identical in both cases. | ggrrhh_ta wrote: | Well spotted; great for sild! Looks like the 40% claim was | due to a bug in benchmarking (which makes sense, and can | happen). | ggrrhh_ta wrote: | The order in which the tests were run was the first thing I | checked in his implementation, but I looked too quickly and | thought he was generating the data for each variant, so I | assumed that was not the problem. [Actually, you need the same | data for both tests, but generated twice] | | I was going to just point out that 40% percent difference would | mean that the version without the sentinel can be improved... | was going to check if there is something that is preventing the | branch prediction from actually taking care of that performance | drop - memory is only being read and nothing should be | invalidated... | ggrrhh_ta wrote: | I am pretty sure that the 40% original difference was due to | a bug in benchmarking - or food for thought to improve the | non-sentinel version. | dpc_pw wrote: | I never thought of null-termination as a "sentinel". I thought | sentinel means strategically putting somewhere a value to avoid | having to check for termination condition. Null-termination is | not a sentinel, IMO. It is the opposite - terminator, which has | to be explicitly checked for. Am I wrong here? | | BTW. If you really want speed, SIMD will obliterate any non-SIMD | solution, and my bet would be that most of the time SIMD will be | easier to implement with length prefix because it will not have | to handle a special case of null-termination. Someone correct me | if I'm wrong. | | Trailing zero is a a typical case of a short-signed hack. More | naive, simpler, natural solution (length prefix) actually works | better, especially long term, where other things change (hardware | architecture, performance and resource constraint, use cases). | lalaland1125 wrote: | This is a rather weird article. Of course sentinels can be faster | in some very particular algorithms. That's not the core | contention. The core contention is that sentinels in general | cause more speed losses than speed gains over the entirety of the | application. Importantly, many useful algorithms such as string | copying, string comparison, and string concatenation can be much | more efficient using a size field. | hinkley wrote: | Is that even the core contention? Or is the core contention | that within mutable data it's possible to destroy the sentinel | through bugs or malice? | colejohnson66 wrote: | I think this is something C++ (and other later ones) got right; | A size field _and_ a sentinel'd (null-terminated) char[]. So | getting the length only requires reading 4 or 8 bytes, but if | you want to pass that string to a function that expects a null | terminated string, just pass x.c_str(). | throwaway894345 wrote: | This seems useful if you're expecting to pass your strings to | a lot of C libraries, which is probably sensible for C++ | (especially whatever vintage of C++ happened to introduce | this string design); however, surely this is a negligible | concern for most mainstream languages? | pengaru wrote: | > This seems useful if you're expecting to pass your | strings to a lot of C libraries, which is probably sensible | for C++ (especially whatever vintage of C++ happened to | introduce this string design); however, surely this is a | negligible concern for most mainstream languages? | | No, the system call interface of the operating system | expects NUL-terminated strings as parameters all over the | place, especially the filesystem APIs. | steveklabnik wrote: | Who came first, the chicken or the egg? | | EDIT: Okay, this is at -1, maybe I shouldn't be slightly | obtuse. These questions are, in my mind, two sides of the | same coin. The OS expects null terminated strings, not | for some inherent reason, but because C and UNIX grew up | together. There's no _inherent_ reason you can 't have an | OS that does not use sentinels to demarcate strings. | throwaway894345 wrote: | Probably my fault for phrasing the question ambiguously, | but the parent interpreted my question as I intended it | to be interpreted. Interfacing with the system libraries | is a common enough case for using null-terminated | strings. | | I'm actually a little curious how what these low-level | APIs look like and how "modern" languages handle | efficiently converting to null-terminated strings. Do | they just copy everything while C/C++/etc would pass a | straight reference? Also, what do the underlying syscall | APIs look like? | steveklabnik wrote: | It's all good. You are 100% right that it's still a good | use-case. | | And yeah, in Rust we have a separate CString type that's | null terminated, so you may need a copy to move between | the two worlds. It just depends, if you're only reading, | for example, then you can create a pointer + len where | the length doesn't include the 0, and that's effectively | zero cost. | | You got me curious about my days as a kid putting | "pascal" in front of Mac System 7 stuff; I believe that | it only accepted Pascal strings. I also vaguely remember | people doing funny stuff like null terminating pascal | strings anyway and then putting the length before the | actual pointer so that you get both in one... | colejohnson66 wrote: | Weren't "Pascal strings" limited in that they were only a | byte though? So, your string could only be 255 bytes long | (conveniently going to a nice round 256 when you add the | length byte). | steveklabnik wrote: | Yep! | colejohnson66 wrote: | For _most_ people, the time difference between reading 4 /8 | bytes and scanning for a null terminator is negligible, but | there's the other benefit of not having to worry as much | about buffer overruns; If you append to a std::string and | its internal buffer is full, it'll realloc() its buffer. | This does have the problem of the possibility of a | realloc(), but I prefer that over buffer overruns. | twic wrote: | It would be even better if it was terminated by 8-len%8 null | bytes, as that would make blocked loops simpler. | throwaway894345 wrote: | I agree. The framing of the article could have been clearer | that (start, length) data structures are widely better for the | general case, but that null terminated strings can be faster | for certain niche cases. | dundarious wrote: | See Alexandrescu's Fastware talk on how amazingly sentinels can | improve even textbook algorithms, and how to apply the approach | intelligently. E.g., for a linear search, overwrite the last | element with the sought after value, and do a tiny bit of work | to make that safe and to undo it before return. | | If you're skeptical, I really do encourage you to watch, he's | not sloppy. | | https://youtu.be/AxnotgLql0k | [deleted] | augustk wrote: | Another classical example of sentinels is improving a linear | search by inserting the key as the last element. | Someone wrote: | Classical, but often not the best idea, nowadays. | | You often cannot know whether that element is writable, and if | it is, that will cause problems when running concurrent | searches (even for the same key) | lowiqengineer wrote: | This is really less "sentinels can be faster" and more "assuming | invariants can be faster because you're reducing comparisons". | recursive wrote: | Any string representation is faster for some hand-picked task. | But still, I think it's reasonable to assert that certain ones | are better than others for general use. | bvrmn wrote: | I think most modern languages shifted towards slices due to | security issues with sentinels. C++ span, rust, zig, go. It's | very sad we need to use libc interfaces and deal with null | terminated byte sequences. | dnautics wrote: | Zig as of 0.6.0 supports both sentinels _and_ length. | steveklabnik wrote: | Not just security issues, it also makes substrings zero copy. | mehrdadn wrote: | It also makes it possible to have nulls inside a string. | Someone wrote: | That can be problematic, too. | | If your language has a garbage collector, it may mean tiny | substrings keep huge strings alive. Java moved away from | using slices (https://dzone.com/articles/changes- | stringsubstring-java-7), I think for that reason. | | If your language isn't garbage collected, it can cause | problems if the substring must outlive the string it was | extracted from. That means that, for some functions, you'll | have to decide whether to return the slice or make the copy. | | (In Rust, that means the slice has a different 'type' (not | sure what they call it), that of a string whose lifetime is | bound to the original string, and the compiler will check | that you won't make the mistake of having the slice outlive | it's source) | steveklabnik wrote: | Yep, all good points! | | (And yeah, you're right about Rust too; string slices are a | different type than owned strings.) | xfer wrote: | How huge typical strings are that this is a problem? It | could be a problem for arrays but seems unlikely for | strings. ___________________________________________________________________ (page generated 2020-09-09 23:00 UTC)