[HN Gopher] "To be safe allocate 5 bytes more than you need" ___________________________________________________________________ "To be safe allocate 5 bytes more than you need" Author : luu Score : 165 points Date : 2020-11-23 05:16 UTC (17 hours ago) (HTM) web link (twitter.com) (TXT) w3m dump (twitter.com) | SilasX wrote: | Related: There was that story about how axle count on Swiss | trains can't be a multiple of 256 (2^8) [0], and a comment | mentioned that German has a limit of 252 "just to be safe": | | https://news.ycombinator.com/item?id=12071686 | | [0] Because there are only 8 bits to store the axle counter for | whether the train has passed, and you don't want overflow to | zero. | jbay808 wrote: | That's interesting! But a safety factor does make sense here if | you're worried about a small probability of double-counting an | axle. | ajarmst wrote: | Were it my team, this would get bounced with a request for a link | to the bug-report/documentation in cJSON that necessitates this | wart and at least a FIXME flag so that we periodically re-visit | the 'why does cJSON still not correctly estimate its memory and | if it can't then why the hell is it not returning the minimum | safe value rather than a guess?' unsightly hack in our beautiful | code. | asdfasgasdgasdg wrote: | We have a kind of varint-based delta decoder at work that can | read up to 15 bytes past the end of its buffer, because it uses | uint64s to read the buffer faster, than the buffer doesn't encode | byte-length information. Since it doesn't know how many bits wide | the last entry will be, and since it reads ahead a little, it can | end up 16 bytes starting at the last byte and proceeding 15 bytes | onward. It doesn't matter that these bytes are filled with | garbage, because a well formed program will only use the bits | from one of the valid bytes. The outer code knows how many | entries to look for, so it just calls the advance function the | correct number of times, and those bits past the end don't get | used. | | This can lead to a pretty subtle crash because, except in msan | mode, reading those fifteen bytes won't cause any problem except | at the page boundary. This is because you don't corrupt the | sentinel bytes, since they're never written. So to detect the bug | you either have to run the code in debug mode (which we rarely | do, because the code is deep inside a very expensive database | system), or you have to get unlucky. | | I tried fixing this, but adding a "past the end" test really hurt | performance. Instead, we continued with the preexisting solution: | copy the data into a buffer fifteen bytes longer than the data, | then decode it. T_T | Negitivefrags wrote: | We ran into the same problem, only in the AMD GPU driver. | | It would very occasionally crash during a draw that reads the | last few bytes of a vertex buffer on AMD cards only. | | Very annoying to find because it only triggered rarely. | | Our solution was the same, just allocate vertex buffers a | little longer on AMD cards and don't use the extra space. | huhtenberg wrote: | Looks suspiciously like an alignment issue, with driver | expecting something somewhere to be aligned to 4/8 bytes or | be a multiple of dword/qword? | ajarmst wrote: | And, of course, you carefully documented the reason for the | larger buffer and encoded the value into a constant or other | documentable source. One hopes you _didn 't_ hardcode '5 | bytes' with a comment that says 'more bytes safer' and | nothing else :-). | nsajko wrote: | If the uint64s are aligned, wouldn't it be better to use | alignas (to make the buffer aligned as soon as it's allocated)? | As far as I see that way you get a longer buffer only when | necessary (you never waste memory), plus you could get | autovectorized code if the compiler decides that makes sense. | pedrocr wrote: | The same is done for decoding camera raw formats. Variable | encoding of the number of bits in the stream means you never | know when the decode will end. Reading all the file into a | buffer longer than needed is fast, whereas the bit pump is | performance critical and having to check for end of buffer | every time you want to read would be painful. | asdfasgasdgasdg wrote: | Saddest thing is we're reading these values from a database, | and the database SDK does not have a mechanism for reading | values into user-controlled buffers without copying. That is | a conceptually neat solution to the problem, but since it's a | somewhat strange thing to want to do, we can't do it in our | case. Storing the extended values is something I also | considered, but there are tens or hundreds of billions of | them in production. Couldn't really ask for an extra PB of | space for the relatively small performance and elegance | benefit you get from not copying these things. | pedrocr wrote: | Extra copying is the downside if you don't control what | you're reading from. Thankfully memcpy is fast but if your | input is very big it's still painful. In rust I just used | an API that allows reading from both an input stream and | memory so that if you are reading directly from a file the | copy is avoided: | | https://github.com/pedrocr/rawloader/blob/c793a132fa03e336f | 8... | choeger wrote: | That's kind of spooky, it sounds like we work for the same | company, because we have a very similar thing. I am pretty sure | that no one at our company knows its details as well as you | just described yours... | | Maybe someone should open source an efficient yet safe varint | writer? | asdfasgasdgasdg wrote: | As far as I could tell the problem was somewhat inherent in | the encoding. One thing that could have been done is | something like for (int i = 0; i<n-15; ++I) | { dec.FastUnsafeRead(result); } for | (int j = n-15; j < n; ++j) { | dec.SlowSafeRead(result); } | woofie11 wrote: | The concept of a preprocessor is helpful here. | | Most modern languages no longer have preprocessors, which is a | pity. | | In cases like these, you want complex computed constants. You | want those computed at build time, and not at runtime. I can: | | * Run the math on a whiteboard, compute +5 or +15, hardcode | that, and have the world explode when the code changes | | * Run the math in code, at runtime, and have the system be | slow. | | * Code up the math, have it run at build time, and have a clear | documentation of what the constant is, why, how it's computed, | and have it change automatically if the code changes. | | There were awesome things done in C++ templates back in the | day, with some pretty magical performance. The preprocessor | would unroll loops to the appropriate level for a given | processor, align things to cache boundaries, and similar. | | I think the reason this might have gone away is that with GHz | CPUs, few people care as much about performance anymore. Most | code is no longer CPU-bound, but programmer-bound, and lower | cognitive complexity is more important than better performance. | ben509 wrote: | Option 4: | | * Code up a unit test that replicates end of buffer scenarios | and verifies that X bytes is enough and X-1 bytes fails. | zabzonk wrote: | Reading or writing past the end of a buffer in C is | undefined behaviour - you cannot therefore write a test in | C to check this, as you have no idea how such a test would | behave. | Ar-Curunir wrote: | As the sibling comment noted; you don't preprocessors for | these; what you want is compile-time computation. It's much | more principled than text-based preprocessing | asdfasgasdgasdg wrote: | I don't think a preprocessor can help in this situation, | since what's ultimately being read is the data, and the | offsets to be read can't be computed without looking at it. | scott_s wrote: | This sort of requirement is met by compile time computations. | I know C++11, D and Rust have them, and I imagine many more | languages do as well. The need for this sort of thing clearly | has not gone away, we're just moving away from a free-form, | textual pre-processing stage. | kenniskrag wrote: | constexpr in c++ is pretty magic. Loops, try catch etc. is | possible | (https://en.cppreference.com/w/cpp/language/constexpr) | filleokus wrote: | One of the few things I was truly amazed by when doing | some C++ courses during university. Contrived OOP or | extremely cumbersome string manipulation wasn't really my | thing. But being able to have compile time evaluation of | pretty complicated functions without using macros or | anything was so cool. Actually thought about wanting it | yesterday in Go | jdc wrote: | Nim too | benlivengood wrote: | How about reading N-15 fast (assuming varints can be as small | as a byte) and the rest with bounds-checking? Is N generally | too small for that to be useful? | | You might also be able to heuristically compare buffer address | modulo the page size to N to know whether you need to be safe | or not. | kazinator wrote: | > _It doesn 't matter that these bytes are filled with | garbage,_ | | That's only true if the strategy avoids hitting an invalid | page. | | Reading a single uint64_t which is aligned on an 8 byte | boundary will be fine. If any byte of that is within a valid | page, the other bytes have to be. | | If two uint64_t's are read, then as a whole, they have to be a | 16-byte-aligned block for that same assurance to hold. | | If the code is using misaligned reads, that itself will crash | on some processor architectures, or possibly trigger expensive | handling. | vesinisa wrote: | Did you actually read the comment you replied to to the end? | haberman wrote: | The latest optimized version of the protobuf parser does a | remarkably similar thing: | | https://github.com/protocolbuffers/protobuf/blob/master/src/... | | Protobufs are full of varints, and we want to perform only one | bounds-check per field. So we ensure that the input buffer | always has at least 16 bytes available to parse per field. If | there are fewer than 16 bytes remaining, we copy these last few | bytes of data into a patch buffer and parse from that instead. | asdfasgasdgasdg wrote: | Possible indication of common ancestry. | PixelOfDeath wrote: | x86_64 malloc already uses 16 byte alignment and portions to | support SSE commands. So why not go all the way and allocate full | cache lines? This also prevents accidental false sharing between | different malloc pointers. | SamReidHughes wrote: | I've used cJSON. Its problems have been far deeper and more | serious than this. | pantalaimon wrote: | Did you try jsmn? | SamReidHughes wrote: | No, but I haven't personally done a search of C/C++ JSON | libraries. | quelsolaar wrote: | Over allocating is often a very useful strategy. | | I wrote a fast Json parser a while ago and one of the largest | optimizations was to look at the size of the file, compute the | worst case of how much memory it would need, and then do one | allocation and then use the memory. | | Another example would be a dynamic list of values. If you use a | linked list, and have 64bit pointer and a 64bit value, each link | costs 16 bytes. If you have a list of say 8 values then that's 8 | allocations and 16*8 = 128 bytes. If you instead allocate | speculatively an array of 16 values also taking 128 bytes, and | then only use 8 of the values, you only need one allocation. If | the number of values you need grows beyond 16, you need to | reallocate, and expensive operation, but compared to making one | allocation each time a value is added its much faster. Also all | values are close to each other in memory so cache coherence will | be much better, and you may end up with an order of magnitude | faster access. This can be true even if you do expensive | operations like insert. Yes, you can improve the performance of a | liked list by allocating lots of links in one allocation, but the | added space of the pointers, and the out of order access will | still cost you. | | Obviously, when you do this you should provide functions that | always allocate the right number for the user of the code so that | there is no magic numbers visible to the end user.... | hzhou321 wrote: | > Over allocating is often a very useful strategy. | | It is not! | | * If you know exact reasons and exact how much space need | (over) allocate, then you should document the reason, and it is | no long _over_ allocating. | | * If you don't know the exact reasons and only know once a | while it requires additional space, then you are not solving | the problem. You are kicking the can down the road, and it is | likely to explode later, possibly with inappropriate timing. | | * If you know exactly how much the space you need, then the | over allocating is just misleading code. | | Fix Your Code! | | EDIT: Your comment apparently is for something else. Anyway, | the headline needs some counter correction. | quelsolaar wrote: | I'm talking about internal code that is not exposed to the | user. There are loads of situations where something will | fluctuate in size rapidly, you dont know beforehand to what | extent, and allocating memory every-time it changes is very | inefficient. In these cases speculatively over allocating is | a very good strategy. | dls2016 wrote: | Haha I just used this code. My little ESP32 data acquisition | project has been running two days since switching to the | PrintPreallocate function with essentially constant memory usage | so -\\_(tsu)_/-. Way too many other things to worry about. | elcritch wrote: | Exactly, if their inputs are fairly well known and tested, | adding 5 magic bytes is hackish but effective. It could take | days or weeks to debug that properly, if at all. | | Nice you were able to get constant memory. I was getting 40byte | max memory usage difference between identical RPC calls on the | esp32, but that seems due to Nim's ARC garbage collector. | Sometimes +20 bytes, sometimes -20 bytes but not leaking. | Though maybe it's an esp-idf thing. Took me a bit to just call | it good and focus on more important things. -\\_(tsu)_/- | [deleted] | 29athrowaway wrote: | Looks like a workaround rather than a fix, for whatever problem | they were having. | | This happens in science too. Many constants are placeholders for | stuff we do not understand. | pantalaimon wrote: | But science is a bit like reverse engineering the universe. In | this case however _they wrote the source code_. | alerighi wrote: | Despite the (horrible) name computer science, informatics is | not science, is math. There is a big difference: in science you | observe and try to understand something that already exists, | the universe. | | In math (and informatics) you build your own universe with your | set of rules. It means that you can and should understand | everything (well, of course if you assume the hardware works | perfectly as specified). | MaxBarraclough wrote: | Computer science is not entirely math. Compilers, for | instance, care about real-world performance, not theory. | | Physics is more mathematical than computer science, but | there's no doubt that physics is a science. | 29athrowaway wrote: | Discrete math, computer science, etc. are sciences (although | some people claim math is not a science... that's a | discussion for another day). Software engineering, like all | forms of engineering, is applied science plus processes based | on empirical evidence. | DannyB2 wrote: | 5 bytes? What about 5 characters? | | When I started almost four decades ago it was then true that | bytes and characters were the same thing. | aYsY4dDQ2NrcNzA wrote: | I don't think that was guaranteed, at least not in C. Wasn't | that implementation-dependent? | MereInterest wrote: | Sort of, yeah. In C, the size of the integer primitives isn't | defined. What is defined is that sizeof(char) will always be | 1 (one). However, it is entirely implementation defined what | that 1 refers to. On most systems, it is the number of 8-bit | bytes, but it could refer to a larger or smaller amount of | memory. | wahern wrote: | > but it could refer to a larger or smaller amount of | memory. | | Never smaller as the minimum limits for char require _at_ | _least_ 8 bits. But definitely can be wider than 8 bits, as | is common on DSPs or some old mainframes. Though, even on | systems that don 't permit direct addressing of 8-bit char, | C implementations will sometimes emulate it. | | FWIW, POSIX mandates a fixed, 8-bit char. | lambda_obrien wrote: | That's no longer true, in case you were wondering. People still | say `bytes <-"instead of"-> chars` sometimes but we should stop | that. | alphagrep12345 wrote: | This happened at my previous work place. They added 1 free byte | every time they saved some data to disk. When I asked my manager | about it, he said - ' There was a crash a while ago and this | fixed it. No one knows why, but this is what we all do after | that' | comboy wrote: | There are so many IT companies craving for coders, just saying. | cbsks wrote: | I abhor comments like these. It says that it needs to allocate 5 | more bytes, and I can see that in the code, but it doesn't say | why the magic number is 5. The comment explains the what but not | the _why_. The commit message is equally as useless: | https://github.com/DaveGamble/cJSON/commit/65541b900c740e1d5... | | What needs to be fixed so that this "+5" can be removed? What | happens if something changes and the magic number needs to be 6? | Also, where's the unit test? | Znafon wrote: | The Pull Request is at | https://github.com/DaveGamble/cJSON/pull/146 | [deleted] | Arnavion wrote: | Based on the original message that did + 64 and that this | revised to + 5, it might be that a float64 can be serialized to | at most 13 ASCII characters (not true in general but might be | true of cjson's serializer; I didn't check). The original | buffer would only have 8 bytes since that's the width of the | f64, so 5 more bytes are needed. | mark-r wrote: | If you've debugged the problem to the point where you know +5 | will be sufficient, you should know enough to fix the bug that | required it in the first place. | Arnavion wrote: | It's not a bug. It's a public API and the buffer is an input | from the caller. Nothing goes wrong if the buffer is smaller | than it ought to be. The doc comment is just warning the | caller they might need a larger buffer than they think they | do. | kfichter wrote: | Or, at the very least, know enough about the problem to leave | a comment that properly explains why you're unable to fix the | bug. | SkyPuncher wrote: | I disagree. | | You can certainly fix known knowns, but you can't fix unknown | unknowns. Sometimes its safer to simply apply a rule that you | have confidence will work everywhere than fix that may | introduce unsuspecting problems. | thaumasiotes wrote: | > Also, where's the unit test? | | What would you think if the following were true for a fairly | generalized problem: | | - It is easy to prove a particular upper bound on memory | requirements when calling a function. | | - It is very difficult to know exactly what the memory | requirements will be for particular input. | | - No known input achieves the known upper bound. | | Is it a problem if we allocate enough memory to satisfy the | bound we can prove, and then don't have a unit test showing we | can't do with less? | pmiller2 wrote: | If that's all the case, then, where's the proof? It might be | "easy to prove," but it's probably not an "I can prove this | trivially, on the fly" type of task. | cbsks wrote: | I'd be satisfied with that if I was a reviewer in the code | review. I'd want that (and the proofs) explicitly mentioned | in the comment, or at the very least in the commit message. | | At first glance the example seems contrived to me. Assuming | that the code is deterministic, wouldn't it be trivial to | measure the memory requirement for a particular input? | thaumasiotes wrote: | There are a lot of potential inputs. | pantalaimon wrote: | so why is 5 enough? could there be an input where the | number needs to be 15? or 1005? | cbsks wrote: | But surely it would be possible to measure the memory | requirements for one of those potential inputs... | [deleted] | a1369209993 wrote: | > What would you think if the following were true for a | fairly generalized problem: | | > - It is easy to prove a particular upper bound on memory | requirements | | That they should have included said proof in the source code | (in a comment if the type system isn't expressive enough to | run it). | bena wrote: | I think the real resolution would be change the function that | estimates the amount of memory needed would return it with | the additional 5 added already. | | That way you can trust the estimator to always tell you at | most how much memory you'd need. | raldi wrote: | In this case, the unit test would serve as excellent | documentation of how to trigger the bug this padding fixes. | [deleted] | [deleted] | WalterBright wrote: | This screams "time for a serious code review" to me. Also use an | array type that does overflow checking. | asveikau wrote: | I don't recall where, but I remember once encountering an API | that took a buffer and a length, and _required_ the buffer to | exceed the specified length by some n. Unlike this case, I think | the requirement was documented. Still seemed like a recipe for | bugs. If you need to implicitly adjust the length of the buffer | upwards, it 's not really the length of the buffer. | Someone wrote: | Recipe for bugs but possibly also a recipe for performance. | | I'm not making any claims for this particular case, but if a | function taking a large buffer to modify needs some extra | space, and the code can get faster by storing that at the end | of the buffer, requiring the caller to reserve that space can | mean not having to copy the buffer twice, increasing memory use | and execution time. | | A fairly common example is room for adding extra zero string | terminators, so that an vectorized algorithm walking the input | always hits one of them. | asveikau wrote: | I think there are better ways to encode that in API design | that are less ambiguous. | | For example perhaps having a function or some other means to | compute required buffer size, in a way that makes such | padding requirements relatively opaque to the caller. | spaetzleesser wrote: | Wouldn't you wrap the allocation with an alloc/free function that | does the allocation with whatever extra bytes that are needed? | Letting the API users deal with such details seems a recipe for | disaster. | zelly wrote: | Because then you're forcing clients to use malloc rather than | their own allocator | gfodor wrote: | I was going to trash this, but it actually made me think of | something non-obvious: where is the line between hacks like this | and smart behavior? For example, we have code in systems that | performs network retries, and I have code in my own software that | performs polling operations when a request/response "should" | work, but in practice, a variety of reasons cause it to be less | robust. It seems right that something like "allocating extra | memory just to be safe" falls on the wrong side of defensive | programming vs obvious kludge, but where is the line, exactly? It | doesn't seem to be first vs third party code. Is it network hops? | That seems extremely arbitrary, especially given that modern | networking has characteristics closer to a bus than say, | accessing a disk, where such defensiveness is less sane. If | network boundaries are the norm where we start assuming failure | is likely, maybe that's why microservices are considered | beneficial since they result in more defensive programming in | practice even though in principle there's no reason that needs to | be the boundary? Maybe I've had too much coffee this morning. | [deleted] | zamadatix wrote: | Depends what you call smart behaviour I suppose. For the vast | majority of projects though I'd assume the following to be | smart behaviour: | | - Is it robust | | - Is it fast (enough) | | - Is it the simplest way to achieve those levels of robustness | and speed | | If the answer to the first 2 is largely "yes" and the 3rd is | "no" then I'd call it a bad hack. OTOH if the answer to 3 is | also "yes" then it's a good hack. If the answer to 1 or 2 is | "no" then it's not ready to ship. | | In either case a comment saying "this is the simple/safe way, | performance opportunity via..." Or "this is the complicated way | because <reasons complicated way was needed, usually perf> and | it this is why it works..." is good to leave in case | performance requirements ever change or someone is trying to | figure out why you went the way you did. | kansface wrote: | > For example, we have code in systems that performs network | retries, and I have code in my own software that performs | polling operations when a request/response "should" work, but | in practice, a variety of reasons cause it to be less robust. | | Apples and oranges. The network is unreliable and all code | dealing with the network should treat it as such. The file | system and disks are also unreliable, but are probably more | reliable by a factor of 10K than the network. Engineers mostly | choose to ignore these potential errors via unintentional | decisions or do stuff like let the process crash because | restarting an API server once a year doesn't matter. Of course, | not everyone has that luxury. Determining how much memory to | allocate ought to be perfectly deterministic which is why this | is such a code smell. | | > where is the line, exactly? | | The line to me is pretty clear in this case! There are all | sorts of reasons why this code would be acceptable: if fixing | the code to prevent the bug[s] is sufficiently onerous as to | cause more bugs than the fix would prevent, or is so much work | that it would never be undertaken, or so complex that the code | is un-mergable, an emergency hot fix... all acceptable so long | as the code is documented as such. In other words, the line | crossed here was in the comment. Where did the number 5 come | from? Where's the link to an issue tracking memory | allocation/estimation? | | I believe most comments in code are worse than useless - that | we shouldn't merely document bad code, but write code that is | so blindingly obvious and idiomatic that comments detract from | its perfection. Comments exist when we deviate from the | platonic ideal -- the real world with leaky abstractions, | deadlines, bugs and dollars. | gfodor wrote: | I don't think you understood my comment - it was raising the | question where the line actually is, and if the line is drawn | more upon cultural norms than solid engineering principles. | That isn't an argument the code of the OP is a good idea, it | isn't, the question is if there are large categories of code | we write that ought to be defensive but aren't, or vice | versa. | WalterBright wrote: | > where is the line between hacks like this and smart behavior? | | Understanding exactly what the cause of the problem. Hacks | imply a fix that "seems" to work but not understanding. | ExcavateGrandMa wrote: | hmmmm... That's kind of a bean counter. krkRKRkrkrkrkRKR | floatingatoll wrote: | UTF-8 characters can be composed of at most 4 bytes, plus a NUL | terminator makes 5. I have no idea if this is true but it's | plausible enough for a Unicode parser! | SAI_Peregrinus wrote: | Not UTF-8 characters, UTF-8 code points. UTF-8 characters (more | properly "extended grapheme clusters") can be of any length. " | " is one character, 25 bytes long (0xf0,0x9f,0x91,0xa8,0xe2,0x8 | 0,0x8d,0xf0,0x9f,0x91,0xa8,0xe2,0x80,0x8d,0xf0,0x9f,0x91,0xa7,0 | xe2,0x80,0x8d,0xf0,0x9f,0x91,0xa7) (it won't show correctly on | HN, since HN strips some Unicode). If you assume that code | point = character, your parser is buggy and doesn't comply with | unicode specifications. | floatingatoll wrote: | I don't presume to know enough about Unicode to code parsers | it, but I'm happy to accept the correction! | zepearl wrote: | Uuuhh, this is an extremely important remark/correction for | me, thank you. | | So far I was thinking as well that any "character" could be | saved ("encoded"?) as UTF-8 by using max 4 bytes... . | rurban wrote: | On the character level yes, but this is for strings, so looks | totally broken to me. | rusk wrote: | The len is being passed to malloc which I recall takes number | of bytes, so this explanation seems plausible. I very much | doubt memory allocation is _"innacurate"_ , as per the | comment ... | avian wrote: | Some more clues where this comes from are in this issue: | https://github.com/DaveGamble/cJSON/issues/255 | IshKebab wrote: | Pretty good advert for not writing code in C. | GuB-42 wrote: | More like writing C properly. | | When you program in C, you should know exactly what each byte | of memory is for. If you don't do that, not only you will | eventually get crashes and leaks and overflows, but you also | miss the point of programming in C in the first place. | | C is fast, but C is not fast because it overclocks your CPU | or whatever magic you can think of. An good reason why is is | fast is that it gives you tight control over your memory. You | can reuse buffers, size them precisely, allocate them exactly | when you need them and free them when you are finished, | nothing funny happening behind your back. It generally | results in lower memory usage, better use of cache and | therefore better performance. | | If you don't manage your memory precisely, then sure, C is | probably not the language for you. ___________________________________________________________________ (page generated 2020-11-23 23:00 UTC)