[HN Gopher] A database without dynamic memory allocation ___________________________________________________________________ A database without dynamic memory allocation Author : todsacerdoti Score : 250 points Date : 2022-10-13 15:23 UTC (7 hours ago) (HTM) web link (tigerbeetle.com) (TXT) w3m dump (tigerbeetle.com) | pyrolistical wrote: | This implies that changing buffer sizes requires a process restart. | Which is a fine trade-off. | | This also implies all memory is allocated upfront. It would | require the operating system to be smart enough to compress/page | blocks or else the process will hold on to unused memory | haggy wrote: | Per the article: "(Currently it's determined even before | startup, at compile-time, but startup is the important point | since that is the first time that memory can be allocated.)" | | Sooo are we sure you can even change this via proc restart? | This is saying the limits are built in at compile time... | eatonphil wrote: | They are currently compile-time settings. That may change in | the future to be more user-friendly. :) But what won't change | is that the size is fixed at startup. | Thaxll wrote: | How does it work exactly? It's a memory pool that uses 85% of | available memory? If I have one row in my DB it's going to | preallocate 30GB of memory? I don't think it's a sane design; it | reminds me of the JVM args with min / max. | sakras wrote: | The OS doesn't actually map/allocate the pages to you until you | touch them, so if you preallocate 30 GB and only use one row, | you'd only be using 4 KB (or whatever your page size is). This | is the same as if you'd individually malloc'd the single row, | since deep down malloc just calls mmap. | jorangreef wrote: | Joran from the TigerBeetle team here! | | We have a secret plan for this too. ;) | allknowingfrog wrote: | The article differentiates between "memory" and "on disk" | storage, which is where the rows in the database would live.
| apavlo wrote: | This is pretty much how all DBMSs work right now. You have to | tell it how much memory it can use for its internal buffer | pool. | aidenn0 wrote: | I've been an embedded software developer for almost 2 decades, | and this style used to be near universal in embedded, but has | become less so, much to my dismay. I even saw a project not too | long ago where a large C++ STL list (i.e. a linked list) was | created at startup, then later freed piecemeal, causing severe | heap fragmentation in the poor embedded system's malloc | implementation. | hinkley wrote: | I think it was when I was reading the zlib source code, while | dinosaurs roamed outside my college window, that I first saw a | variant of this: a library that statically allocated memory | for a graceful shutdown routine before doing anything, so even | if memory were exhausted by a bug in other code later on it | could still exit cleanly. | melech_ric wrote: | It's still done. I did the code review for the guy who wrote | it for our server. If we're crashing due to an OOM situation | we still have enough memory to write out a minidump. | touisteur wrote: | The pain here is to test and keep testing these paths. Same | as some 'free disk space' monitoring... | melech_ric wrote: | I agree. However, in our case, that pain has proven to be | less than not having those minidumps. | touisteur wrote: | Oh for me too, but I wish it were easier, like something | supported seriously by my language of choice, my OS of | choice or my libc of choice, with the flick of a flag... | I don't retest the whole libc and kernel every time | (although, err, someone should, right? :-) | jorangreef wrote: | Joran from the TigerBeetle team here! | | TigerBeetle uses Deterministic Simulation Testing to test | and keep testing these paths.
Fuzzing and static | allocation are force multipliers when applied together, | because you can now flush out leaks and deadlocks in | testing, rather than letting these spill over into | production. | | Without static allocation, it's a little harder to find | leaks in testing, because the limits that would define a | leak are not explicit. | polskibus wrote: | Can you provide some pointers to more information about the | simulation testing? | jorangreef wrote: | Sure! | | Here's an overview with references to the simulator | source, and links to resources from FoundationDB and | Dropbox: https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/... | | We also have a $20k bounty that you can take part in, | where you can run the simulator yourself. | srcreigh wrote: | What language? What does the server do? | melech_ric wrote: | C++ on Windows. It's an old codebase that has been used | to run custom test hardware. | | We also target Linux with the same codebase, but the | environment is a bit different. On Linux we're using a | system slice and we're not doing minidumps. | kotlin2 wrote: | What environment is the server running in? Can you prevent | the Linux OOM killer from killing your process? | abeyer wrote: | The OOM killer can be disabled completely on a system, or | a process can be excluded from it by setting an OOM flag | under `/proc/<pid>`. It's not generally considered good | practice in production, but I think that's largely | because most software does not do what is being suggested | here... in the scenario where you have software that does, it seems like the | right thing to do. | throwaway09223 wrote: | It's definitely good practice in production and is often | necessary. | | The techniques mentioned above will (perhaps | surprisingly) not eliminate errors related to OOM, due to | the nature of virtual memory.
Your program can OOM at | runtime even if you malloc all your resources up front, | because allocating address space is not the same thing as | allocating memory. In fact, memory can be deallocated | (swapped out), and then your application may OOM when it | tries to access memory it has previously used | successfully. | | Without looking, I can confidently say that tigerbeetle | does in fact dynamically allocate memory -- even if it | does not call malloc at runtime. | abeyer wrote: | > It's definitely good practice in production and is | often necessary. | | I'd be curious if you have any resources/references on | what is considered good practice now, then. | | It's been a long time since I did much ops stuff outside | a few personal servers, so it may well be that my background | is out of date... but I've certainly heard the opposite | in the past. The argument tended to run along the lines | that most software doesn't even attempt to recover and | continue from a failed malloc, may not even be able to | shut down cleanly at all if it tried anyway, and the kernel | may have more insight into how to best recover... so just | let it do its thing. | jorangreef wrote: | We're aware of this, in fact, and do have a plan to | address virtual memory. To be fair, it's really the | kernel being dynamic here, not TigerBeetle. | throwaway09223 wrote: | Oh for sure. Not casting any shade at all on your | work - I am really happy to see what you're doing. This | kind of thing has a lot of value even in a virtual memory | environment. | colejohnson66 wrote: | IIRC, .NET does something similar for OutOfMemoryExceptions. | The exception is thrown _before_ actually running out of | memory. That allows just enough space for the | OutOfMemoryException object (and associated exception info) | to be allocated.
| Merad wrote: | I believe the runtime preallocates an OOME on startup, so | if one is encountered it can just fill in the blanks and | throw without needing any additional allocations. | znpy wrote: | IIRC Perl also does this. I'm on the phone right now, but | iirc $^O was a variable filled with about 20k of worthless | data that could be freed in case of an out-of-memory | condition. | LunaSea wrote: | Would you have some links to posts that talk about this style | of memory management and allocation? | malkia wrote: | Game development still revolves around this. For example, when | loading a level, it used to be that you have a huge arena: you load | your "serialized" map at one end (say the top), and your other | chunks at the bottom - you have some "flexibility" to pop | from the top or bottom, so early on you arrange for that. | Alternatively you split your memory into big fixed-size chunks, and | stream these in and out (works well for textures, and primed | sounds - e.g. the first 64kb or 128kb of a sound file). | [deleted] | aidenn0 wrote: | I don't have any links, but it's more straightforward than | you might think (and TFA covers some of it). | | 1. Bound every _whatever_ that requires memory to some value | N | | 2. For all _whatevers_ that can have a lifetime for your | entire program, do that | | 3. For any _whatevers_ that can't have a lifetime for your | entire program (e.g. for a database server like TFA, a | connection object would qualify), make a table of size N | | 4. When you need a _whatever_ (again using a connection | object as an example, when listen() returns) grab an unused | entry from the table. If there is no unused entry, do the | appropriate thing (for a connection, it could be sending an | error back to the client, just resetting the TCP connection, | blocking, or logging the error and restarting the whole | server[1]). | | 5. When you are done with a _whatever_ (e.g. the connection | is closed) return it to the table.
| | Those are the basics; there are some details that matter for | doing this well, e.g.: | | - You probably don't want to do a linear-probe for a free | item in the table if N is larger than about 12 | | - Don't forget that the call stack is a form of dynamic | allocation, so ensure that all recursion is bounded or you | can still OOM in the form of a stack overflow. | | I should also note that for certain values of N, doing | things this way is normal: | | - When N=1 then it's just a global (or C "static" local | variable). | | - When N=(number of threads) then it's just a thread-local | variable | | 1: The last one sounds extreme, but for a well understood | system, too many connections could mean something has gone | terribly wrong; e.g. if you have X client instances running | for the database, using connection pools of size Y, and you | can have 2*X*Y live connections in your database, exceeding | N=2*X*Y is a Bad Thing that it might not be possible to recover | from | hobs wrote: | Missing the end of "- You probably don't want to do a | linear-probe for a free item in the table if N is larger | than about" | aidenn0 wrote: | Thanks, fixed | ericbarrett wrote: | I was curious about that too :) My guess is 16 items | (edit: OP says 12) but I'm not an embedded developer, and | I'm sure it's hardware-specific. | | The approach I've seen for larger fixed-sized tables is a | free list: when the table is allocated, also initialize | an array of the slot numbers, with an associated length | index. To allocate, pop the last slot # off the array and | decrement the length index. To free, push the slot # onto | the end of the array and increment the length index. | Checking for a free slot is then just testing for length | index > 0. | aidenn0 wrote: | Yeah, a stack of free indexes is good. In the rare case | of "lots of indexes, but little (extra) memory" a bitmap | can be good because it's 1 bit per index rather than | ceil(log2(N))
bits per index and has a constant factor | speed up equal to the word-size of the architecture. | [deleted] | 01100011 wrote: | Nearly everything I learned about embedded systems | programming came from either Embedded Systems Magazine | articles(https://www.embedded.com/embedded-systems-design- | magazine-ar...) or just being a paranoid SoB. | | Writing reliable systems means spending a lot of time | thinking about what and how failures can occur, minimizing | them and dealing with the cases you can't get rid of. Simple | is nearly always better. | | It also helped to be an EE by education and not a CS major. | CS majors can get blinded by their abstractions and forget | that they're working on a real piece of HW with complex | limitations which require design tradeoffs. | blondin wrote: | game development has been doing it this way for a long time | too. the domain often dictates how we code. | aaaronic wrote: | Isn't this just a bunch of static object pools? That's extremely | common in embedded programming. I'm glad to see someone | recognized it could apply to a database. | | My students do this on the GameBoy Advance specifically so they | don't need to learn about dynamic memory allocation to get useful | things done early in my course. In fact, they only learn about | dynamic memory allocation for academic reasons later on and never | _need_ to use it in their games. | schemescape wrote: | Do you mind sharing the name of the course? It sounds | interesting. (I didn't see anything in your profile.) | victor106 wrote: | > Doing this was possible because we sketched out all resource | usage (network, disk, memory, CPU) in the design phase with back- | of-the-envelope calculations. | | I wish all programs did this | [deleted] | chubot wrote: | _Building a system that allocates memory once up-front requires | up-front thinking._ | | sqlite apparently has some support for this, although of course | you make it more predictable if you control all the code. 
| | https://www.sqlite.org/malloc.html | | They even provide some nice math for you to create the fixed size | heap! | | _The problem of dynamic memory allocation, and specifically the | problem of a memory allocator breakdown, has been studied by J. | M. Robson and the results published as_ J. M. | Robson. "Bounds for Some Functions Concerning Dynamic Storage | Allocation". Journal of the Association for Computing Machinery, | Volume 21, Number 8, July 1974, pages 491-499. | matheusmoreira wrote: | Thanks for the reference, that Robson proof is amazing. | Published in 1974... I wonder what other treasures are hidden | in the literature. | kristoff_it wrote: | The idea of having programs that don't crash under memory | pressure is one of the things that I really hope developers will | start striving for more in the near future. | | A database that doesn't OOM crash is great, and the same is also | true for more consumer-grade software. | | I have 32gb of RAM on my PC and whenever I'm encoding a video | with DaVinci Resolve, I always see Firefox and Discord crash | almost instantly. | slaymaker1907 wrote: | What would happen in your utopia is that DaVinci Resolve would | fail to start because Firefox or Discord would reserve too much | memory. Or maybe Firefox would demand you close open tabs | before allowing you to open a new one. We have a solution to | the problem you describe, it's called paging to disk and hoping | that we won't need to read or write to that memory very often | (which works great for something like Firefox where a bunch of | pages are sitting idly in the background). | kristoff_it wrote: | Allocating upfront is only one technique. | | Somehow DaVinci Resolve is capable of using all the free | memory it can find and not more than that, otherwise it would | never be able to complete any rendering job. That's one | "utopic" program that's pretty real. 
| | > We have a solution to the problem you describe, it's called | paging to disk and hoping | | No. The solution is to be smart about memory usage, and to | handle failures when trying to allocate memory. DaVinci would | become unusable if it started swapping when rendering. | | Maybe for some programs swapping is a fine solution, but the | argument that software engineers shouldn't concern themselves | with designing for OOM conditions is absolutely ridiculous. | | That's not a 'utopia', that's what we should concretely be | striving for. | mananaysiempre wrote: | It's difficult to see how that would be possible without a | hard break from legacy systems when fork/exec practically | forces overcommit and very dubious resource accounting on | us. (One consumer system that handled OOM conditions with | remarkable grace was Symbian, but of course it forced an | extremely peculiar programming style.) | kristoff_it wrote: | That's a fair point, but it's also true that we're never | going to break away from legacy systems until we start | writing software that has a chance of functioning | correctly when running out of resources. | | The good news is that we do have ecosystems where you | don't have overcommit, like WebAssembly and embedded | devices, and if you're writing libraries meant to also | work on those platforms, then you will have some good | building blocks that can be used to iterate towards more | robust software. | | That's how you're expected to approach library writing in | Zig. | ok_dad wrote: | Too bad we can't design some protocol for applications to | "work together" to ensure each has the memory they need. Like | some sort of "application collective memory management" where | each application has certain memory it needs/wants and, | depending on the priority of that need and the current | priorities of the other applications, it could know to reduce | memory usage during those times when some other application | has a higher priority.
For one, it would help in your case, | because background video processing might be a lower | priority, or perhaps you want it to finish as fast as | possible, so that Firefox etc. just "freeze" themselves | for a while and drop as much memory as is practicable (i.e. | deleting currently rendered pages from memory and | re-rendering them later, or paging those out to disk for a | while). | mananaysiempre wrote: | Related references: | | - Neal Walfield's papers on resource management[1,2] | | - The agoric papers[3] from the E people (who sadly now | seem to be trying to steer the whole thing in the | cryptocurrency direction) | | [1] http://walfield.org/papers/2009-walfield-viengoos-a-framewor... | | [2] http://walfield.org/papers/2011-walfield-smart-phones-need-s... | | [3] https://papers.agoric.com/papers/ | loriZyy wrote: | Windows has had this for over 20 years: | | https://learn.microsoft.com/en-us/windows/win32/api/memoryap... | kccqzy wrote: | iOS has had something similar but simpler since the very | beginning: https://developer.apple.com/documentation/uikit/uiapplicatio... | kristoff_it wrote: | Firefox should in theory be able to crash individual tabs | to reclaim memory, but in practice I just see it die. | littlestymaar wrote: | Isn't it likely because it's not crashing by itself, and | is being killed by the OS? | colejohnson66 wrote: | OOM is usually a _good_ reason to crash. Sure, a DB might be | able to do something creative with its cache, but if you can't | allocate, there's not much the average program can do anymore. | I guess browsers can kill their child processes, but that's not | a long-term solution. If you're out of memory, your computer | will be lagging from paging/swapping memory in and out, so | killing yourself helps relieve some pressure. | | Also, are you sure it's the programs failing and not some OOM | killer coming around? Could your pagefile/swap be full?
| advisedwang wrote: | The article says it rejects incoming connections. that seems | like a much better behaviour than OOM - clients can back-off | and retry and keep the overall system stable. | jorangreef wrote: | Thanks! | | Joran from the TigerBeetle team here. | | This was in fact one of our motivations for static | allocation--thinking about how best to handle overload from | the network, while remaining stable. The Google SRE book | has a great chapter on this called "Handling Overload" and | this had an impact on us. We were thinking, well, how do we | get this right for a database? | | We also wanted to make explicit what is often implicit, so | that the operator has a clear sense of how to provision | their API layer around TigerBeetle. | lmm wrote: | I can easily imagine that kind of design getting into a | state where it's "up" but not accepting any connections | indefinitely (if it's using just enough memory to run | itself, but doesn't have enough to accept any connections). | Crashing early is often a good way to reduce the state | space and ensure consistent behaviour, since you will | always have to handle crashing anyway. | wahern wrote: | Crashing and losing all existing connections often merely | leads to an avalanche of subsequent requests, | accelerating overload of your systems. I've seen this | countless times. | | Being crash tolerant is one thing. But "crash early, | crash often" is absolutely horrible advice. | lmm wrote: | > Crashing and losing all existing connections often | merely leads to an avalanche of subsequent requests, | accelerating overload of your systems. I've seen this | countless times. | | Sure, but again, that's a scenario that you need to | handle anyway. | | > Being crash tolerant is one thing. But "crash early, | crash often" is absolutely horrible advice. | | I've found it to be good advice. It's a big part of why | Erlang systems have been so reliable for decades. 
| wahern wrote: | In Erlang a "process" is more akin to a thread or fiber | in most other languages, and "crashing" more like an | exception that is caught just across a logical task | boundary. Importantly, in Erlang a process "crashing" | doesn't kill all other connections and tasks within the | kernel-level Erlang process. | | And that's why Erlang is so resilient, precisely because | the semantics of the language make it easier to _isolate_ | tasks and subtasks, _minimizing_ blast radius and the | risk of reaching an inconsistent or unrecoverable state. | I often use Lua (as a high-level glue language) to | accomplish something similar: I can run a task on a | coroutine, and if a constraint or assertion throws an | error (exceptions in Lua are idiomatically used | sparingly, similar to how they 're used in Go), it's much | easier to recover. This is true even for malloc failures, | which can be caught at the same boundaries (pcall, | coroutine.resume, etc) as any other error.[1] It's also | common to use multiple separate Lua VM states in the same | kernel process, communicating using sockets, data-copying | channels, etc, achieving isolation behaviors even closer | to what Erlang provides, along with the potential | performance costs. | | [1] While more tricky in a language like Lua, as long as | your steady-state--i.e. event loop, etc--is free of | dynamic allocations, then malloc failure is trivially | recoverable. The Lua authors are careful to document | which interfaces and operations might allocate, and to | keep a core set of operations allocation free. Of course, | you still need to design your own components likewise, | and ideally such that allocations for connection request, | subtasks, etc can be front-loaded (RAII style), or if not | then isolated behind convenient recovery points. Erlang | makes much of this discipline perfunctory. | yellowapple wrote: | > Could your pagefile/swap be full? 
| | This presumes that one is indeed using a pagefile or swap | partition. I know of quite a few folks who skip that on SSDs | out of fear of SSD wear. | | Personally, in this day and age I don't really give a damn | about SSD wear (that's what backups are for), so I'll happily | create a swap partition, but not everyone is as comfortable | with that. | perryizgr8 wrote: | Software for critical infrastructure often handles no/low | memory situations in a graceful manner. I've worked on | network switch software where we were expected to work | indefinitely with zero remaining memory. A single packet drop | is unacceptable, at 10-40G line rate. Any new malloc needs to | be justified with a design document. | | So yeah, average user software can do a lot, it's just that | we've given up on reliability and robustness as an industry. | That's why sometimes your phone fails to make a 911 call, and | sometimes you need to reset your car. | slaymaker1907 wrote: | When SQL Server goes OOM, it's often because the system can't | free memory quickly enough, not that there is literally not | enough memory to continue operating. Sure, it could be more | aggressive with freeing memory, but at some point it just | makes more sense to restart the entire system so that overall | performance doesn't nosedive. | pphysch wrote: | It would be nice if you could specify e.g. a Postgres table be | backed by a (really big) "ring buffer" rather than the usual | storage. | | Maybe that breaks some relational features on the table, but | being able to painlessly query the most recent X records for | some interesting transient log data would be nice. | ok_dad wrote: | I ended up using a ring buffer in Redis recently to store the | last N values of some database lookups that are slow. It | works great! Didn't take that much logic, either, since Redis | has easy transactions. 
I just have the "current buffer index | 'i'" as a key-value, the "buffer length 'N'" as a key-value, | then 'N' numbered entries as key-values. I prefix these with | the data identifier that it caches, so you can have several | ring buffers active. You can do this with a Postgres table | just as easily, I assume! | | I am also building a ring buffer data structure for my hobby | database that I use for personal stuff that uses FoundationDB | [though I have been unhappy with the state of the FDB Go | bindings, which are somewhat out of date versus the C client | library for FDB and aren't very Go-like in design (the C | libraries are the "gold standard" and they use FFI in all the | other bindings to connect to those)]. | | _Added later_: it was a bit harder to deal with resizing | the buffer, so when that happens I do some data | shifting/dropping depending on which way we're resizing, | effectively initializing the buffer anew because that was | easier (again, a Redis transaction saves me a lot of | concurrency work!), but lucky for me we would only resize | each buffer at most 1 time per minute due to how our configs | work. | jandrewrogers wrote: | Allocating all required memory during startup has been idiomatic | for database architectures as long as I can remember. The obvious | benefits to doing things this way are wide-ranging. | | There is a less obvious benefit in more schedule-driven database | architectures. By implication, the code knows exactly how much of | which resources are instantaneously available at all times since | it is directly managing those resources. This enables the | database to dynamically adapt its behavior and scheduling | decisions in response to specific resource pressures or excess | resource capacity. This is basically a fine-grained form of | internal micro-backpressure, dynamically prioritizing subsets of | the workload that minimize impact on the resources under pressure | at any moment in time.
In some versions, you can switch internal | execution strategies to take advantage of currently | under-utilized resources. If you outsource this to the OS, you have no | idea how close or far you are from the limits, and it doesn't really | work. | | This doesn't add much performance _per se_ but is great for | minimizing tail latencies. It helps to create a robust internal | resource equilibrium that handles transients and concurrency more | smoothly and consistently than if the database couldn't make | fine-grained decisions about what to do from a position of total | resource awareness. | com2kid wrote: | > Allocating all required memory during startup has been | idiomatic for database architectures as long as I can remember. | The obvious benefits to doing things this way are wide-ranging. | | Allocating all required memory is, IMHO, a great practice for | almost any type of program. | | Heap memory allocators are a type of performance optimization; | they allow re-use of existing memory, but only if you are | careful not to run out of memory. Heaps of course also allow | for more efficient utilization of memory: if some software | module isn't using that bit of RAM right now, let another bit | of code use it. | | But like all performance optimizations, they make code messier, | and they also lead to less reliable and harder-to-follow code. | | In the embedded world, code statically allocates all memory it | needs up front. This means that code that reads from the | external world has a fixed buffer size, and also likely a | related max throughput, regardless of system conditions (e.g. | lots of free memory lying around). | | But this is also a good thing! It forces programmers to think | about their limits up front, and it also forces thinking about | error handling when those limits are broken!
| | Sure, it can require creative coding; loading large blobs of | binary data requires more work when you stop assuming malloc | can return infinite memory. But, hot take here, programs should | stop assuming malloc can return infinite memory anyway. | girvo wrote: | > In the embedded world, code statically allocates all memory | it needs up front. | | Except for things like the ESP32, which uses malloc pretty | liberally within ESP-IDF. Networking is hard without it, it | seems? | astrange wrote: | If you're running on a regular OS, you can't escape it by | just not calling malloc. | | You'll still get lower performance if the system is low on | memory, because you got swapped out - it doesn't have much if | anything to do with how much you allocated. Wiring it is a | different story, and if you're an audio app maybe you should | allocate and mlock(), but not a database. | | > hot take here, programs should stop assuming malloc can | return infinite memory anyway. | | Assuming this is a perfectly fine way to make reliable | software. | | https://en.wikipedia.org/wiki/Crash-only_software | jorangreef wrote: | Static allocation has also made TigerBeetle's code cleaner, | by eliminating branching at call sites where before a message | might not always have been available. With static allocation, | there's no branch because a message is always guaranteed to | be available. | | It's also made TigerBeetle's code more reliable, because | tests can assert that limits are never exceeded. This has | detected rare leaks that might otherwise have only been | detected in production. | pierrebai wrote: | I don't get your characterization. The statically sized | arrays can never be fully used? That is doubtful. I suppose | that if data X is only ever used 1:1 with data Y and you | get a Y, then you are guaranteed to have an X, but that | requires complete static analysis of the code base _and_ | certainty that no future code change will _ever_ affect | that 1:1 relationship.
That seems fragile. | | I mean, it's not like memory exhaustion is a common | occurrence nowadays. These kinds of compromises, where one | trades an almost non-existent failure mode for a fragile, | assumption-ridden code base, do not sound like a wise | choice. | | I mean, if the particular process is so tightly integrated | that you can know that you have a 1:1 relationship between | X and Y, you can also tie those together with dynamic | allocations. I find it hard to believe that you can easily | statically show that X and Y allocations are tied in a | static allocation scheme but that it would not be so under | dynamic allocation? | Twisol wrote: | I think the grandparent was saying that dynamic allocation | is a form of optimization, which also makes the code harder | to follow. Your anecdote seems exactly in line with their | suggestion. | jorangreef wrote: | Ah, missed that, thanks! I've updated the comment. | parhamn wrote: | > memory during startup has been idiomatic for database | architectures as long as I can remember. | | How can this be true given most mainstream databases do not do | this? Maybe a pattern, sure, but idiomatic, not so much. I only | add this comment in defense of the poster, as this comment | seemed to diminish their achievement/efforts. | dilyevsky wrote: | Not just databases but really any kind of latency-sensitive | software, especially with "realtime" requirements (e.g. telecom | gear) | gavinray wrote: | I like this idea a lot. | | Are there any other databases which do a similar thing? | | This seems like one of those things that has massive benefit if | you're willing to invest in the effort + architect it like that | from the start. | | (Unrelated note: I discovered last night that Zig has io_uring | support and some nice wrappers implemented in "std". Which is | nice for use cases like databases or networked services) | | https://github.com/ziglang/zig/blob/42a3b60c331cf01d1ab80eb7...
| kristoff_it wrote: | > (Unrelated note: I discovered last night that Zig has | io_uring support and some nice wrappers implemented in "std". | Which is | nice for usecases like databases or networked | services) | | io_uring support was contributed to the Zig standard library by | Joran, who is also the creator of TigerBeetle :^) | SleepyMyroslav wrote: | rant at fixed memory systems: | | In many fixed-size systems, data "re-used" as a result of bugs | still looks like good data, and it will produce slightly | different results that cannot be distinguished from real ones | until the whole computation is redone outside the system. If the | data-reference mistake is stable, it can even pass | majority/stability tests for a long time. Filling destroyed | values with garbage can save a lot of time, simply because it | will produce nonsensical results that get noticed faster. | | Also, coming from the games industry, I don't recognize the praise | of fixed systems at all. Games prefer to fail fast. Restart can be | easy for players if they can re-connect without losing any | progress. The way to get players truly upset is to get player data | mixed with bad values after a long play time, or to leave no easy | way to recover progress/saved data. | [deleted] | slaymaker1907 wrote: | Should be titled "without using malloc" or something. It's pretty | much impossible to do anything with user input strings unless you | have some sort of memory management. Maybe you have one giant | buffer that you use for strings in a particular session, but that | still involves memory management because even if you allocate the | buffer at startup, using one buffer for multiple strings is just | a type-specific arena allocator. | | As someone who works on low-level database code, this approach is | going to be EXTREMELY wasteful. Instead, it's much better to | have a more sophisticated understanding of memory use by | subsystems and have some concept of required memory and caches.
| Each system allocates the minimum memory it needs at startup and | then treats the rest of memory as caches which can be emptied as | necessary to preserve fairness among all the various subsystems. | throw827474737 wrote: | > Should be titled "without using malloc" or something. It's | pretty much impossible to do anything with user input strings | unless you have some sort of memory management | | The title says "without dynamic memory allocation", which means | without malloc in this context: static memory allocation, where | you do your own kind of memory management in fixed-size arrays, | or something more sophisticated if you want. It didn't say | without memory management? (And the article also explains why | there are no strings or other unknown/arbitrary-length values.) | eatonphil wrote: | Also, the article mentions the only entities are two fixed- | size types: accounts and transfers. | | There are no user strings in TigerBeetle. :) | | If anyone wants to see what you can store in accounts and | transfers, check out the reference docs! | | https://docs.tigerbeetle.com/reference/accounts | | https://docs.tigerbeetle.com/reference/transfers | kortex wrote: | > using one buffer for multiple strings is just a type specific | arena allocator. | | Sure, but there are a lot of advantages to entirely controlling | memory allocation in-process. | | > this approach is going to be EXTREMELY wasteful. | | Could you elaborate why? Sure, you have to do a bit more | juggling in-process, but I would imagine this would be | potentially _more_ efficient than having conventional mallocs. | | In the environment it is designed to operate in, there is | usually a fixed memory limit per process, so you might as well | grab all that up front. | jorangreef wrote: | Thanks! | | Joran from the TigerBeetle team here. | | > I would imagine this would be potentially more efficient | than having conventional mallocs. | | Yes, in our experience, static allocation means we sometimes | use 10x less memory.
For example, TigerBeetle's storage | engine can theoretically address up to 100 TiB with less than | 1 GiB of RAM for the LSM in-memory manifest, which is | efficient. | | Because we think about memory so much upfront, in the design | phase, we tend not to waste it. | kristoff_it wrote: | I think it should be noted that TigerBeetle is an extremely | specialized database. It's not a general-purpose DB for storing | arbitrary data; all it stores is numeric financial account | information. | sh1mmer wrote: | Maybe I'm missing something, but I would imagine query planning | and caching is the hardest thing in terms of database memory | usage. This post doesn't seem to address that. | | Even if the messages are fixed-size, surely the memory cost of | executing the queries depends on the complexity of the query and | the expected number of results returned. | | Are they just doing get/set style queries only? | jorangreef wrote: | Thanks! | | Joran from the TigerBeetle team here. | | TigerBeetle's storage engine is also designed for range | queries, and we have some interesting ideas for our query | engine in the works. | | To be sure, there are some tricky things that crop up, such as | pipeline blockers for queries. However, we do have limits on | literally everything, so that makes it easier--static | allocation is best when it's done virally. | | We wanted to go into so much more detail on all this, but there | was only space for so much. Stay tuned. | pierrebai wrote: | One of the first claims of the article is plainly false. Static | allocation does not avoid use-after-free. It just avoids | accessing freed memory. You can still have bugs and access dead | data in those static arrays that are really dead objects, but | your buggy code still references and accesses them. | | Most benefits listed have nothing to do with memory allocations. | For example, "having a client open too many connections" can be | limited by... limiting the number of open connections.
Where and | how that check is done might differ, but it's not related to | memory allocations. | | There are BIG downsides to static allocation: - | You cannot support varied workloads. If one workload needs a lot | of X and another needs a lot of Y, well, you already decided the | maximum numbers of X and Y and cannot re-balance them. | - If you run out of anything you thought was "sufficient", | your program just fails. | | And I know, because I've worked on programs that did static | initialization at startup, and they failed in just that way. I | also tested the performance benefits: they were around 12%. So you | do get some performance out of it, but the added complexity of | having things fail outright where a normal program would just | chug along, having to deal with these failures, and the usual | roll-your-own data structures or libraries is not worth | 12%. Because, remember, if you go out of your way to have static | allocations, that means all 3rd-party libraries have to do it | too, otherwise the benefit is wasted. Is that worth 12%? IMO, | no. | | I mean, sure, your program is easier to analyze statically, just | like if you gave up dynamic threads and instead just pinned | different parts of your program to different CPU cores. Sure, that | solves some concurrency problems, but most programs don't do this | for obvious reasons. | AdamProut wrote: | I'm not sure why this is being downvoted? Dividing up memory | statically at startup/compile time has some nice benefits | described in the article, but it also makes the database less | flexible to changing workload requirements. For example, if a | workload wants to do a big hash join that needs 90% of the | memory at time X, and then a big data load that needs most of | the memory in a completely different subsystem at time Y, it's | nice to be able to dynamically shift the memory around to match | the workload.
Doing static up-front allocations seems to block | this from happening, and that is maybe a fine trade-off for | TigerBeetleDB based on the types of workloads they're | optimizing for, but it's definitely a valid disadvantage of | their approach. | sakras wrote: | I like this a lot! I've been trying to find ways to avoid memory | allocation as much as possible in my projects as well. So far | I've settled on mmap-ing large slabs at a time and arena | allocating (Linux's mremap makes growing the allocation really | efficient too!). | | However, come to think of it, just using a fixed amount is just as | good if not better, since I'm always going to be constrained by | memory in some way anyway. And if I allocate e.g. 80% of the | system's memory upfront, it won't _actually_ be mapped by the OS | until I touch it. There may be some interesting stuff with using | huge pages for the first few pages and using smaller and smaller | pages as you approach your memory limit. Definitely will give | this a try! | vbezhenar wrote: | > if I allocate e.g. 80% of the system's memory upfront, it | won't _actually_ be mapped by the OS | | What about people running vm.overcommit_memory=2? | enedil wrote: | Yes, this approach is followed as well, but now you're at the | problem of writing a memory allocator yourself (which might | of course turn out to be better for your use case). So the issue | of writing code that avoids allocation still prevails, as now | you probably want to know what to allocate memory for, since if | you were to make a general-purpose allocator, it is unlikely | to outperform libc malloc. | sakras wrote: | I agree on some level, but I think if you're intentional with | your design you can get by with a very simple arena allocator | where an allocation is a single subtraction from a pointer.
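An arena ("bump") allocator of the kind described above, where allocating is little more than one pointer adjustment over a fixed slab, might look like the following C sketch. It bumps upward by addition rather than subtracting downward, but the idea is the same; the slab size and names are illustrative, not from any project in the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One fixed slab, sized up front. In a real system this might be a
   large mmap'd region; here it is a static array for simplicity. */
#define ARENA_SIZE (1u << 20)

/* Over-align the slab so that aligned offsets imply aligned pointers. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_head = 0;

/* Allocate `size` bytes aligned to `align` (a power of two),
   or NULL when the fixed budget is exhausted. */
void *arena_alloc(size_t size, size_t align) {
    size_t head = (arena_head + align - 1) & ~(align - 1); /* align up */
    if (head + size > ARENA_SIZE) return NULL;
    arena_head = head + size;
    return &arena[head];
}

/* Roll the head back to a saved mark, freeing everything allocated
   after it in one step -- no per-object free, no fragmentation. */
void arena_reset(size_t mark) {
    assert(mark <= arena_head);
    arena_head = mark;
}
```

The trade-off sakras mentions is visible here: long-lived structures go at the start of the slab, and short-lived ones are reclaimed wholesale by resetting the head, so a stray pointer into the reset region is exactly the use-after-reuse hazard discussed further down the thread.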
| | In my experience, most programs have a few long-lived data | structures that can be initialized up front, and then lots of | small data structures that are created and destroyed | frequently. If you do it right, you can put the long-lived | guys at the beginning of the arena and then just wiggle your | arena head pointer back and forth for the small guys. | | This does take some more thought and consideration, and is | definitely a bit trickier than just using malloc when you | need it, but for me the trade off is worth it (for one I gain | some amount of satisfaction from imagining my data all neatly | in a row). | [deleted] | [deleted] | hinkley wrote: | Listen to me now and hear me later: | | Static allocation doesn't eliminate use-after-free bugs, it just | reduces them. | | If you have mutable data structures, it's still always possible | to refer to something that used to have one meaning and now means | something else. That's still a use-after-free bug. | | I was going to say that it's similar to how GC can still have | memory leaks, but in fact it's the exact same problem: | reachability of data that is no longer relevant/correct. | slaymaker1907 wrote: | Yep, I was thinking along the same lines. What if the database | kept a reference to a connection for some timer or something | and the connection memory later gets used for a different | connection? GC languages don't have this problem because you | would just allocate a fresh connection each time and the | language itself guarantees that said physical memory is not | reused unless it is unreachable. | notacoward wrote: | Even resource leaks are still possible. Marking an object in a | static array as used is still allocation. Forget to mark it as | unused later and it still remains unavailable. Do that enough | times and you'll still run out. What this approach does is | eliminate _unconstrained_ memory use. 
Leak-related failures | will tend to surface earlier and more predictably, instead of | at some distant unpredictable time when the system can't | satisfy your program's infinite appetite. And there are | resources besides memory. | | This approach is still useful, but it's no panacea. It can even | leave various kinds of capacity unused, or it can become a | tuning nightmare if you try to avoid that. It has long been | common in embedded (as others have pointed out) due to the | requirements and tradeoffs characteristic of that space, but | for general-purpose computation it might not be the tradeoff | you want to make when other techniques can solve the same | problem without some of the drawbacks. | anthomtb wrote: | Pedantically, you could say static allocations eliminate | use-after-free() bugs. | hinkley wrote: | use after free is really about the consequences of | use-after-reuse. The bug is always there, but you don't notice the | behavior until someone changes the data out from under you. | Until someone reuses the memory allocation, your app appears | to be working properly. | the_mitsuhiko wrote: | Based on my limited experience with using arenas and | preallocated memory I find debugging memory issues | significantly harder since it's super easy to accidentally | reuse allocations and only notice much later. | hinkley wrote: | The first pointer arithmetic bug I fixed, I only narrowed it | down once I realized that the function was getting garbage | data. Several others since have been the same way. This data | structure has a value that can't possibly be arrived at, | therefore this data is fake. | | When you point to real, dead objects, the bug can be anywhere | or everywhere. If you have a pattern of using impure | functions everywhere this can be a nightmare to identify and | fix at scale. | avgcorrection wrote: | Sounds like there should be some additional metadata memory | that is summed in as well (for the total allocation at | startup)?
To check for such issues. | gamegoblin wrote: | At the risk of shilling for Big Rust (or at least any language | with ownership semantics, really) -- well-designed Rust code | can truly eliminate use-after-free. I was the lead engineer on | a project that wrote an entire filesystem from scratch in Rust, | and we did exactly this. 100K lines of code without any | use-after-free bugs in the whole development process. | | One of the first bits of code we wrote in the project was a | statically allocated pool of buffers such that the only way to | free a buffer back to the pool is for all references to that | buffer to be dropped. | mytherin wrote: | If anything, it might make use-after-frees more painful. Now | the address sanitizer won't help you find them. | sillysaurusx wrote: | That jumped out at me too. The point of use after free is that | your program has a logic error. The logic error doesn't go away | just because it doesn't crash. | | There's a lot of wisdom in this post, but that made me wince. | It's a bit like saying "Our system doesn't crash when we | dereference a null pointer." Well, cool. What could possibly go | wrong. | | Still, I think it's best to just ignore that line and read the | rest of the post. It's an unfortunate distraction from the core | idea (which does have real value). For example: | | > here is how we calculate the amount of memory needed to store | messages for TigerBeetle's consensus protocol: <snip> | | > And if we get the calculation wrong? Well, since we enforce | static allocation, we can check these calculations with | assertions (i.e. that a free message is always available). If | we then get the static allocation calculation wrong, this can | be surfaced sooner through fuzzing, rather than having no | limits and eventual resource exhaustion (and cascading | failure!) in production. When combined with assertions, static | allocation is a force multiplier for fuzzing!
| | Being able to put a hard upper bound on the amount of memory | your system can use is generally a sign of good design. The | alternatives are somewhat messy but still work. (The HN clone I | maintain https://www.laarc.io/ runs via a while-loop bash | script, because it dies via OOM every couple of weeks. But it | works great, since every time it dies it just restarts.) | jorangreef wrote: | Thanks! | | Joran from the TigerBeetle team here. | | Appreciate your balanced comment. | | To be fair, we're certainly concerned about logic errors and | buffer bleeds. The philosophy in TigerBeetle is always to | downgrade a worse bug to a lesser one. For example, if it's a | choice between correctness and liveness, we'll downgrade the | potential correctness bug to a crash. | | In the specific case of message buffer reuse here, our last | line of defense then is also TigerBeetle's assertions, hash | chains and checksums. These exhaustively check all function | pre/post-conditions, arguments, processing steps and return | values. The assertion-function ratio is then also tracked for | coverage, especially in critical sections like our consensus | or storage engine. | | So--apologies for the wince! I feel it too, this would | certainly be a nasty bug if it were to happen. | the_mitsuhiko wrote: | So you only reuse memory for objects with a checksum? | Buffer bleed is scary if exploitable (see heartbleed) and | I'm curious how you are protecting against it in practice. | jorangreef wrote: | It's defense-in-depth. | | We use what we have available, according to the context: | checksums, assertions, hash chains. You can't always use | every technique. But anything that can possibly be | verified online, we do. | | Buffer bleeds also terrify me. In fact, I worked on | static analysis tooling to detect zero-day buffer bleed | exploits in the Zip file format [1].
| | However, to be clear, the heart of a bleed is a logic | error, and therefore even memory-safe languages such as | JavaScript can be vulnerable. | | [1] https://news.ycombinator.com/item?id=31852389 | 0xbadcafebee wrote: | For the first 5 years that I wrote C, I never used dynamic | memory. (It was a hobby.) Later I found that dynamic memory | allowed me to perform more operations. By only allocating what I | needed, I could process more operations in the same amount of | memory. I tripled the number of operations due to not needing to | restrict myself to an arbitrary size. | | The kernel has a feature that takes advantage of this same | principle: memory overcommit. Programs are always asking for way | more memory than they actually need. The kernel lies to those | programs, and says, "yes, yes, I'll give you all that memory... | sure thing..." even if it already said that to every other | program. This allows more programs to continue operating since | they aren't really using all that memory. When the system _does_ | run out of memory, it can swap rarely used memory out to a swap | file; when that doesn't work, the application can receive an | error, or the OOM killer can just kill the process. The result is | more programs doing more work, and an occasional app crash. | | Virtual machine hypervisors do a similar trick with memory | ballooning. If the guest OS has a balloon driver, the hypervisor | can overcommit memory for the VM guests, and you can provision | more VMs than there's actually memory for. | polio wrote: | It seems analogous to fractional reserve banking and the free | economic performance it enables by leaning on a mostly reliable | statistic about empirical peak concurrent usage of a shared | resource. | stephc_int13 wrote: | In some areas, especially hard real-time/embedded or safety- | critical, dynamic memory allocation is strongly frowned upon, and | for good reasons.
| | Even if your application does not have stringent | safety/predictability requirements, following the principles of | Data Oriented Design leads naturally to a radically different | approach to memory management. | | I am glad to see this kind of project and these design choices. | jorangreef wrote: | Thanks! | | Data Oriented Design runs like a river through TigerBeetle-- | it's always on our mind. | | By the way, have you seen Andrew Kelley's Handmade Seattle talk | on Practical DOD? [1] | | [1] https://vimeo.com/649009599 | stephc_int13 wrote: | Yes, I've seen this one, and a few others. | | I think DOD should be more of an engineering mindset than a | strict set of rules. | | From my experience it can be impractical/cumbersome, | especially when prototyping, because DOD usually pays off in | the long run. | habibur wrote: | > TigerBeetle is written in Zig | | This is the key part. | | Zig memory allocation is something like: bookmark, allocate, and | then free everything after the bookmark when done. Someone pushed | it to the limit. | eatonphil wrote: | Interestingly enough, the first version of TigerBeetle (which | still followed many of the same design decisions) was written | in Node! :) | | https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/... | throwawaymaths wrote: | *can be. You can do more traditional memory patterns too in | Zig. There is also an allocator in the stdlib that directly wraps | C's malloc. Likewise, if you're writing a plugin for other | software or an FFI for a VM language, you can wrap their | allocators (if they provide them). | commandlinefan wrote: | Without looking through the implementation, I suspect that the | consequence here is that it pre-allocates way more memory than it | needs and then does some custom dynamic allocation with that. | haggy wrote: | They explain pretty clearly that this is static memory | allocation.
In other words, a calculated amount of (mostly) | contiguous memory is allocated at startup and used very | carefully, relying on disk to supply the dynamic nature of | database load semantics. | | https://www.geeksforgeeks.org/difference-between-static-allo... | francasso wrote: | Welcome back to the 70s! Jokes aside, I think there are use cases | for this style of memory allocation even today (embedded systems, | for example). I just have never felt the need for it in a DB (or | rather, I have many more desiderata for a DB than "I don't have | to worry about OOM"). | Rochus wrote: | > back to the 70s | | The sixties, actually; that's how memory was managed in Fortran | from its inception until the eighties. | jorangreef wrote: | Good to be back--Joran from the TigerBeetle team here! | | Static allocation does make for some extremely hard guarantees | on p100 latency. For example, for a batch of 8191 queries, the | performance plot is like Y=10ms, i.e. a flat line. | | And memory doesn't increase as throughput increases--it's just | another flat line. | | I personally find it also to be a fun way of coding: everything | is explicit and limits are well-defined. | jokethrowaway wrote: | Cool article and an interesting concept. It's definitely good to | think outside the box - but I've rarely had to deal with a | database OOMing (I assume because they have strategies to write | to disk before that happens). | | There are plenty of strategies nowadays to avoid garbage | collection and use-after-frees (Rust comes to mind), and I'm not | convinced working with a fixed amount of resources is the most | friendly to operators. | lewisjoe wrote: | Can somebody eli5 what this really means? Specifically, how does | code without dynamic allocation bring about predictability?
Global memory allocators, | by their nature, can create allocation patterns and layouts that | are very sensitive to things like startup conditions, long | run-times, or exact inter-thread execution sequences. This can | cause items that get allocated near each other in one run to be | allocated very distantly, or chaotically in the heap, in | another run, for instance. | | Now, given that modern CPUs rely heavily on their cache | lines/hierarchies, branch predictors, and prefetchers, changes | in layout can have significant performance implications: | algorithms that run quickly when things are contiguous in | memory can run orders of magnitude slower on layouts that are | more random and have more cache line misses. And those layouts | are often the result of long run-times in dynamically allocated | systems, due to accumulated heap fragmentation. | | Now, static allocation neither completely solves these | problems, nor is it the only solution, but the techniques | that are used for static allocation often have | tremendous benefit for cache layout, and combined with other | added benefits, like reduced OOM conditions, it's a powerful | constraint. | | Edit: Not to mention that the previously noted heap | fragmentation can also cause eventual failure-to-allocate | conditions, even without true memory exhaustion. | 0x5345414e wrote: | It brings predictability because you know you won't run out of | memory. If everything is statically allocated, the total amount | of memory the application will use is known at compile time. | Very important for memory-restricted systems. | 01100011 wrote: | Dynamic allocation can fail. Memory can become fragmented, or | you could just exceed your budget. | tannhaeuser wrote: | What's so special about it?
Static heap allocation (without | malloc) was the norm with COBOL + ISAM; early SQL databases used | e.g. ESQL/C with static or quasi-static allocation (by which I mean | at most one malloc per field for the entire process runtime). | Process-per-request was (and is) recommended practice if you | value stability and security above all when using languages that | don't have GC and/or have issues with memory fragmentation. | felipelalli wrote: | I love/hate the bad humor of HN. | qwertox wrote: | Just today I was presented with an article in my "newsfeed" claiming | that coroutines in C++ are the hottest thing right now in | algorithmic trading, which I assume also relies on databases. | | It left me thinking, because of the unpredictability in timing | which arises from their use, but then again, the article closes | with the sentence: | | > Until then, it is important to note that no feature is given | support by C++ if it has not been fine-tuned and optimized and we | cannot expect to see total coroutine implementation until at | least C++26. Nonetheless, if you're an algorithmic trader or C++ | developer, watch this space and watch it very closely. | | https://www.efinancialcareers.com/news/2022/10/algorithmic-t... | valyagolev wrote: | > Restrict the length of function bodies to reduce the | probability of poorly structured code. We enforce a hard limit of | 70 lines per function. | | Reminds me of Carmack on the benefits of long functions: | http://number-none.com/blow/john_carmack_on_inlined_code.htm... | dktoao wrote: | "Clever" developer limitations like this almost invariably cause | developers to come up with even more convoluted solutions to | problems than needed. I don't disagree that long functions can be | an indication that someone was copy-pasting a bunch of stuff | around in the codebase, but not always! | jorangreef wrote: | Joran from the TigerBeetle team here.
| | The limit of 70 lines is actually a slight increase beyond | the 60 line limit imposed by NASA's Power of Ten Rules for | Safety Critical Software. | | In my experience, in every instance where we've refactored an | overlong function, the result has almost always been safer. | avgcorrection wrote: | 70 lines is generous compared to Clean Code dogma/opinion. | Gibbon1 wrote: | Obligatory Hank Hill. I'll tell you hwhat. I've seen some | poorly structured code with nothing but small functions. Like a | state machine implemented via chains of callbacks that could | all just live inside one big switch statement. ___________________________________________________________________ (page generated 2022-10-13 23:00 UTC)