[HN Gopher] Memory Allocation
___________________________________________________________________
Memory Allocation
Author : thecoppinger
Score : 927 points
Date : 2023-05-22 09:26 UTC (13 hours ago)
(HTM) web link (samwho.dev)
(TXT) w3m dump (samwho.dev)

| kayo_20211030 wrote:
| Thanks. Well written and presented.

| junon wrote:
| Seems to be a bug on the first interactive graph, at least for me. Unless I'm misunderstanding the point of the graph, `malloc(7)` only allocates 2 bytes.

| ozfive wrote:
| I came here to see if anyone else noticed this and am confirming that there is a bug in the first slider on malloc(7). Indeed it only allocates two bytes instead of seven.

| samwho wrote:
| Good spot! Thank you. Fix on its way out now. :)

| ozfive wrote:
| Thank you for building this! It's helped me wrap my head around the concept even more deeply than before.

| carlmr wrote:
| True, might it be cut off by the screen?

| samwho wrote:
| Nah, I just failed at basic arithmetic :D
|
| I wrote this:
|
|     <div class="memory" bytes="32">
|       <malloc size="4" addr="0x0"></malloc>
|       <malloc size="5" addr="0x4"></malloc>
|       <malloc size="6" addr="0x9"></malloc>
|       <malloc size="7" addr="0xa"></malloc>
|       <free addr="0x0"></free>
|       <free addr="0x4"></free>
|       <free addr="0x9"></free>
|       <free addr="0xa"></free>
|     </div>
|
| Instead of this:
|
|     <div class="memory" bytes="32">
|       <malloc size="4" addr="0x0"></malloc>
|       <malloc size="5" addr="0x4"></malloc>
|       <malloc size="6" addr="0x9"></malloc>
|       <malloc size="7" addr="0xf"></malloc>
|       <free addr="0x0"></free>
|       <free addr="0x4"></free>
|       <free addr="0x9"></free>
|       <free addr="0xf"></free>
|     </div>

| Mizoguchi wrote:
| Love this. It is so important to know these fundamental concepts. I would like to see a similar version for SQL database indexes, as 99% of the engineers I work with have no idea how to use them.

| Dudester230518 wrote:
| It's kind of not important though, except for a tiny group of developers.
| asnyder wrote:
| If more took the time to EXPLAIN ANALYZE they'd see indexes in action and hopefully learn from a few variations, especially pre/post variations of indexes. Or so I'd hope :).

| shepherdjerred wrote:
| Oh, if only I had this in college! This is a fantastic explanation.

| xpuente wrote:
| [1] is not very far from that. IMHO, [1] is better.
|
| [1] https://pages.cs.wisc.edu/~remzi/OSTEP/vm-freespace.pdf

| cromulent wrote:
| Nice article, but highly editorialized title, not sure what is intuitive about it.

| kaba0 wrote:
| Somewhat relevant: how GCs work, with very good visuals: https://spin.atomicobject.com/2014/09/03/visualizing-garbage...

| photochemsyn wrote:
| Nice article on general-purpose memory allocation approaches. While the article doesn't explicitly discuss it, these all seem to be list-allocation mechanisms?
|
| > "List allocators used by malloc() and free() are, by necessity, general purpose allocators that aren't optimized for any particular memory allocation pattern... To understand, let's examine commonly used list allocation algorithms: first-fit, next-fit, best-fit and quick-fit."
|
| That's from an article on custom memory management (aimed at embedded programming issues) I've found pretty useful, and it picks up where this leaves off:
|
| https://www.embedded-software-engineering.de/dynamic-memory-...
|
| You can practice writing custom memory managers in an application that runs on an operating system by only using the stack (just create a big array of int etc. and call that your memory space):
|
| > "For the safety-critical tasks, the developer can replace the standard allocator with a custom allocator that sets aside a buffer for the exclusive use of that task, and satisfies all memory allocation requests out of that buffer...
The remainder of this paper presents four examples of custom allocator algorithms: the block allocator, stack allocator, bitmap allocator and thread-local allocator. Custom memory managers can use any one of these algorithms or a combination of different algorithms."

| ModernMech wrote:
| This is wonderful! I'm definitely going to be sharing this with my students (college sophomores studying CS).
|
| If I were to make some suggestions, based on how I know they would receive this:
|
| - I would make explicit reference to heap and stack. Students who are learning this material are learning about the heap/stack dichotomy, and I think it would really improve the exposition to make clear that not all memory is allocated this way in a program.
|
| - I would remove this line: "At the end of this post, you should know everything you need to know to write your own allocator." I can confidently say that my students will not be able to write an allocator after reading this. It's nothing to do with your piece; it's just that the intersection of people who don't understand hex and people who could build an allocator after a short blog post is very, very small. Students who read this post and at the end still feel like they can't build an allocator will end up discouraged, with a feeling that maybe they are missing something.
|
| - Consider rethinking how you handle hex numbers throughout. You introduce them and say they are distinguished by a preceding "0x", but then you immediately omit the 0x to save space in the figure. In my experience, students have _a lot_ of trouble with understanding that hex and dec have the same underlying representation. They will not be able to distinguish between hex and dec bases implicitly, so from a pedagogical standpoint, it's better to be consistent throughout and include the prefix.
| - Finally, at the end I would mention that there are other strategies for memory management, to encourage further exploration. Other languages do it differently, and for students it's important to know which other avenues are out there. Otherwise they might think heap-based malloc/free are the only way to do things, the same way they often think imperative programming is the only way to program.
|
| Anyway, thank you for creating this, and I'm sure it will really help my students. In a just world, "seeing" the memory like this would ideally be first-class tooling for languages like C.

| samwho wrote:
| Really appreciate you taking the time to write this, thank you.
|
| I tried a couple of different ways to introduce the stack and the heap but it always felt like it made the post too long and complicated. In the end I decided to take a pure, idealistic view of memory in order to focus on the algorithms used to pack allocations effectively. You can see some of my abandoned efforts as HTML comments in the post :D
|
| Introducing the 0x prefix and immediately dropping it hurt me as well, but I didn't have a better way to make the visualisation work on mobile. I completely agree with you that it's not ideal.
|
| I'd like to do a post of this style about garbage collection at some point.

| wrycoder wrote:
| Please do, this excellent post is already a good start on the issues involved in creating compacting garbage collectors.

| ModernMech wrote:
| Maybe make this a series? I think another post just like this on the stack is in order. You could show allocating stack frames with the slider! Then you can put them both together in a third post and show the entire memory layout of a C program.
|
| Would definitely like to see more thoughts from those cute corgis.

| samwho wrote:
| A few people suggested a series, but there's a human problem: my attention jumps between topics quite a lot!
At the end of this post, and my load balancing post, my main feeling was relief that I could start looking into a new topic.
|
| Maybe after more time has gone by I can revisit earlier posts and do follow-ups :)

| spatter wrote:
| > _Others will return what's called a "null pointer", a special pointer that will crash your program if you try to read or write the memory it points to._
|
| This is not strictly true; it depends on the environment you're using. Some older operating systems and some modern embedded systems have memory mapped at the zero address.

| devit wrote:
| The article claims that an allocator that splits memory based on allocation size is called a "buddy allocator". That's misleading: an allocator that allocates an area for each size class is usually called a "slab allocator", while a "buddy allocator" is one that when needed subdivides a memory area with a power-of-two size into two half-sized areas that are "buddies", does so recursively to satisfy allocations, and coalesces them again when they are free.
|
| E.g. the Linux kernel used (not sure if it's still like this) a buddy allocator to allocate pages and power-of-two blocks of pages, and slab allocators to subdivide those pages and allocate data structures.
|
| Another important thing the article doesn't mention is that most production allocators make use of thread-local storage and either have per-thread caches of free blocks or sometimes whole per-thread memory regions. This is to reduce lock contention and provide memory that is more likely to be in the current core's cache.

| sylware wrote:
| Linux had a bitmap-based "buddy allocator" (power of two); now it is not bitmap-based anymore (the complexity was no longer worth it performance-wise, so simplicity was restored).
|
| Then Linux has various slab allocators (slub/slob/slab), built on top of the "buddy allocator".
| User-level code should use non-virtual-address-stable mmap-ed regions (slabs + offsets). Legacy "libc" services were built as virtual-address-stable services... which are kind of expensive to manage on a modern paginated system. Virtual-address-stable regions should be kept to a minimum (that horrible ELF static TLS). There is a workaround though (but Linux's default overcommit policy could kill your process): a user process would query the amount of RAM on the system and mmap a region of roughly (care of the overcommit policy) the same size, only once per process lifetime. Then you could have a virtual-address-stable region which could use most if not all the available RAM (excluding hot-memory addition...). Should be very easy to manage with lists.

| samwho wrote:
| You're absolutely right, I've corrected this. Thank you!
|
| I had originally written about threading and locality but it made the post too long and complicated, so I cut it out for the final draft. You can see remnants of it if you check the HTML comments in the post :D

| matheusmoreira wrote:
| So good! I wish I had an article like this to learn from when I implemented my language's memory allocator a few months ago! Wonderful, thank you!

| jamesgill wrote:
| This is incredibly well done. I've never seen malloc/memory allocation explained so clearly. I'd buy a book written like this.

| samwho wrote:
| I have some bad news for you about books.

| imtany wrote:
| Could you elaborate?

| ftxbro wrote:
| > "There's no shortage of information about memory allocators on the Internet, and if you've read this far you should be well-placed to dive in to it. Join the discussion on Hacker News! https://news.ycombinator.com/item?id=36029087 "
|
| Interesting to use Hacker News as the blog's own comment section in this way.

| samwho wrote:
| I've seen a few people doing it, seems to work well.
| SV_BubbleTime wrote:
| I am slightly embarrassed I didn't know overallocate was the term for that. TIL!

| cinntaile wrote:
| Can you highlight the specific malloc() calls in the interactive part? It confused me when it said malloc(5) because it looks almost exactly like malloc(4). Specifically highlighting the malloc(5) allocation would make that a bit clearer I think.

| samwho wrote:
| I completely agree with you on this, though I couldn't find a great way to do it. It was suggested to me to put an outline around the allocation or free that just happened, but I struggled to get the box to look good when the allocation was split across 2 lines.
|
| I've started writing my next post and have learnt a bit more about PixiJS/GSAP3, and think I know a way to do it that would work nicely but would require changing some of the underlying code. I can't promise I'll revisit this soon, but I would like to in future.

| cinntaile wrote:
| I understand. It's surprisingly difficult to do anything non-trivial on the web.

| sailorganymede wrote:
| This is a great article. I also recommend K&R Chapter 8 if people wanna know more. Do the exercises and it'll just click.

| shreyshnaccount wrote:
| Someone needs to make an awesome list for visual guides

| ocay wrote:
| Absolutely, this is so helpful

| tpoacher wrote:
| Great webpage, but it somehow left me more confused.
|
| It seems to imply that malloc/free works by boundary tag? Which I don't think is the case? (and if not, it leaves the reader confused as to how it then actually works).
|
| I know "some" languages use the tag technique (e.g. Julia), but the article seems to imply that this also applies to the C code above? Or am I wrong and C also makes use of such a scheme when you use pointer arithmetic?? (I don't see how that would work with arrays if that's the case though)

| samwho wrote:
| I'm sorry you're more confused than you were when you started!
| The post is intended to talk about the topic of memory allocation in a general sense. The way that malloc works on any given system is implementation-dependent; it's not possible to say "malloc works in this way" because Debian can differ from Arch can differ from BSD, etc.
|
| It's not my goal to tell you exactly how modern malloc/free implementations work. I could probably be more clear about this at the start of the post.

| davidgrenier wrote:
| The only thing that confused me is how it said we can know the location of the block after and before by calculating:
|
|     address + <value at address>
|     address - <value at address - 1>
|
| Shouldn't this be?
|
|     address + <value at address> + 3
|     address - <value at address - 1> - 3

| samwho wrote:
| Well shit. I think you're right.

| davidgrenier wrote:
| Well otherwise, I learned a lot and the basics are much simpler than I expected, thank you for the article.

| davidgrenier wrote:
| Oh another thing, I'm not a fan of the premise:
|
| "As a general-purpose memory allocator, though, we can't get away with having no free implementation."
|
| I have a belief that the future of software is short-lived programs that never free memory. Programs allocate and terminate. Short-lived programs communicate with each other via blocking CSP-style channels (see Reppy's Concurrent Programming in ML).
|
| If you could also educate me on why this is a bad idea I would appreciate it.

| colanderman wrote:
| Even if the programs don't free memory, something has to allocate and free memory for the programs and channels.

| e_y_ wrote:
| This would probably be something closer to actors (https://en.wikipedia.org/wiki/Actor_model) rather than programs, since programs are traditionally implemented as OS processes which are relatively expensive to spin up and terminate. At some level, though, _somebody_ has to deal with freeing the memory, and they may do it less efficiently than you can.
| junon wrote:
| I agree with your point but disagree with your reasoning. I think programs should always free memory at _some point_ because then it's easier to reason about debugging memory leaks.
|
| Practically speaking though, there _are_ arena allocators that do exactly this - you allocate a bunch of memory at once, assign like-typed instances to "slots" in that memory region, and then deallocate everything all at once. Thus, the individual instance `free()` is a no-op.

| saagarjha wrote:
| It's not general purpose, and lots of programs that were designed to be short-lived often end up not being so in the future. People used to point at compilers as a typical example of this kind of thing; well, now we have compilers as libraries sitting resident in every popular developer tool.

| vidarh wrote:
| My first large scale web application was a webmail service built in C++ (!) where I early on decided we'd ditch _nearly_ all freeing of memory, as it was running as a CGI, and it was much faster to just let the OS free memory on termination. The exception was for any particularly large buffers. Coupled with statically linking it, it reduced the overhead sufficiently that running it as a CGI performed well enough to save us the massive pain of guaranteeing sufficient isolation and ensuring we were free of memory leaks.
|
| _Especially_ in a request/reply style environment, long-running application servers are largely a workaround for high startup costs, and it's only a "bad idea" in the instances where removing that high startup cost is too difficult to be practical. Overall I love avoiding long-running programs.

| planede wrote:
| My take on this is that code should always match up malloc and free, but your application may use an allocator where free is a no-op, if that's appropriate for the application you write.
This way your code is more generic and can be reused in another application with different constraints.
|
| And as soon as you are replacing free, you can replace malloc as well to be optimized for your use case. No need to build difficult bookkeeping hierarchies when they will never get used.

| jandrese wrote:
| So basically garbage collection via just terminating and letting the OS handle it?

| samwho wrote:
| It's funny, I saw and retweeted this while writing this post: https://twitter.com/samwhoo/status/1650572915770036225?s=20
|
| Not sure the future you describe is where we'll end up, haven't given it a huge amount of thought. Would be interesting to see, though.
|
| Things like web servers could probably get away with doing some sort of arena allocation per request (I'd be surprised if some don't already do this).

| williamcotton wrote:
| Apache does this! And I do this in my own C web framework:
|
| https://github.com/williamcotton/express-c/blob/master/deps/...

| dzaima wrote:
| With a simple enough allocator, freeing things could maybe be beneficial even for short-lived programs, purely from the memory being in cache already, instead of needing to ask the OS for more (incl. page faulting & zeroing, besides being limited by RAM throughput). For a buddy allocator without coalescing, a free() that just adds the argument to the corresponding freelist can be as simple as 5 or so x86-64 instructions (the fast path of the allocator being ~8 or so instructions; certainly more than a bump allocator, but not by much, and the reuse benefits can easily be pretty big).

| ho_schi wrote:
| This also trapped me.

| robjwells wrote:
| Love the dog. A bit like the "Cool Bear" asides in Amos's articles (fasterthanli.me), a chance to pause & consolidate.

| samwho wrote:
| Don't tell Amos, but Cool Bear is the inspiration for Haskie, my little husky companion.

| bambax wrote:
| Excellent, excellent article!
I have a question though.
|
| > _Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some sort of defragmentation process?_
|
| > _Sadly not. Remember earlier we talked about how the return value of malloc is the address of a byte in memory? Moving allocations won't change the pointers we have already returned from malloc. We would change the value those pointers are pointed at, effectively breaking them. This is one of the downsides of the malloc/free API._
|
| But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new addresses transparently for the programs? Or would it be too much overhead for too little benefit?

| aidenn0 wrote:
| > But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new addresses transparently for the programs? Or would it be too much overhead for too little benefit?
|
| In languages where memory is managed for you, you can absolutely do this, since the runtime can find every single pointer to an object and rewrite it.
|
| Virtual memory _can_ let you do this, but would require a separate page for each allocation (since virtual memory operates on a page level). Given that the smallest page on modern architectures is 4k, this would mean using 4k of RAM for each allocation (and rounding up each allocation to a 4k page boundary).
|
| On top of that, it's an OS system call to map and unmap pages, which means you incur a system call on every allocation and deallocation, which is much slower than using a user-space allocator.
| alex7734 wrote:
| To do that you either need a structure that you update every time a pointer is created, copied, moved or deleted (too much overhead), or you need a way to scan the entire memory and get all the pointers. And at the point where you have a piece of code that knows where every pointer is, you already know which pointers aren't used anywhere anymore, so it's a waste to not have it also call free() for you.
|
| Once you have it call free() for you, your piece of code is now a compacting GC, like Java's for example.

| AshamedCaptain wrote:
| Most OSes today do that "transparently" with virtual memory, but usually with a coarse granularity (e.g. 4k). A page table is just a translation of "pointers" to "memory addresses"; the OS can rearrange physical memory as it sees fit without the program having to update its pointers.
|
| In OSes without virtual memory, one option is to do the same non-transparently: instead of returning pointers, malloc and friends work with "pointers to pointers" (called handles), so there is one extra level of indirection, and now the OS is free to rearrange this 2nd level as it sees fit. Whenever a program wants to read/write the data behind a handle, it must dereference the handle to get to the real pointer, but it also must let the OS know that it is currently using the real pointer -- this is to avoid the OS moving it around. This is usually called "locking/unlocking" the handle.
|
| Some classic examples are Windows 3.x, Mac OS (toolbox), PalmOS, etc.
|
| https://en.wikipedia.org/wiki/Classic_Mac_OS_memory_manageme...

| cconstantine wrote:
| In a language like C that isn't really possible, because the language can't keep track of all of the places that memory address is stored.
| If malloc were to return something like an address that holds the address of memory allocated, there is nothing preventing the program from reading that address, doing math on it, and storing it somewhere else.

| kaba0 wrote:
| Well, that's what some GCs do, and they do indeed defragment the heap.

| umanwizard wrote:
| > Some kind of virtual memory driver that would map old pointers to new addresses transparently for the programs
|
| You would need hardware support for this, since the hardware is what decides what gets returned when a program attempts to read from a memory location.
|
| Hardware already does support virtual memory, but the granularity is the page (which is a minimum of 4KiB in most OSs).

| samwho wrote:
| You'd need cooperation between your malloc implementation and the application code. It's possible, but tricky. If your malloc returned a pointer to a pointer, and promised to keep the first-level pointer up to date, and was able to lock your application whenever it moved things around, it could work.
|
| Someone else already mentioned, but garbage collected languages do actually do this. Because they're fully in control of memory (the language exposes no raw pointers), they're able to create the layer of indirection you've suggested and move things around as they please. I know at minimum the JVM does this. The term to search for is "heap compaction."
|
| There's also the weird and wonderful work of Emery Berger et al with their "Mesh" malloc implementation, which blows my mind: https://youtu.be/XRAP3lBivYM.

| eschneider wrote:
| Early MacOS did this with handles (basically pointers to pointers.) You'd lock them to read/write, and when they were unlocked the OS was free to move the underlying memory and change the pointer in the handle.

| markus_zhang wrote:
| Interesting. Is there a book that focuses on the evolution of allocators, so we can follow along and code allocators of different difficulties?
| samwho wrote:
| Good question! I'm not aware of one, but I also haven't looked.
|
| Most of the research for this post was done by reading papers for various malloc implementations (phkmalloc, dlmalloc, tcmalloc, mimalloc) and reading their source code.

| dexter89_kp3 wrote:
| this is awesome

| fmiras wrote:
| This is an example of why interactive blog posts are awesome, loved the illustrative sliders

| winrid wrote:
| I was having some fun recently just seeing how fast we can allocate large chunks of memory.
|
| There's something refreshing about firing up a C (or maybe now Zig, in my case) compiler and allocating a gigabyte of memory, and seeing that your process is using exactly 1GB.

| thecoppinger wrote:
| All credit goes to my wonderful friend Sam Who: https://twitter.com/samwhoo
|
| He recently published a similar guide on load balancing that is equally intuitive and insightful: https://samwho.dev/load-balancing/
|
| I can't wait to see what he puts out next!

| terrycody wrote:
| https://news.ycombinator.com/item?id=35588797
|
| We never missed good things lol!

| fzeindl wrote:
| Ahh yes. Back at university we had a class called efficient programming where we had to implement a Unix utility and optimize it for speed, meaning CPU cycles.
|
| While aggressively optimizing we replaced malloc in the end with a function we called "cooloc", that traded portability and security for speed. The debug tool here would have been useful.

| thangalin wrote:
| When writing C, I tend to avoid calling malloc and free directly.
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/memory....
|
| I then apply this same principle of "opening" and "closing" structures throughout the application. Generally, I can quickly verify that the calls are balanced:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/threads...
| What's nice about this pattern is that the underlying implementation of exactly how the memory is maintained for a particular data structure (e.g., whether malloc or gdImageCreateTrueColor is called) becomes an implementation detail:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/image.c
|
| The main application opens then closes structures in the reverse order:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/main.c
|
| This means there's only one call to malloc and one call to free throughout the entire application (third-party libraries notwithstanding), allowing them to be swapped out with ease.
|
| As an aside, logging can follow this same idea by restricting where any text is written to the console to a single location in the code base:
|
| * https://github.com/DaveJarvis/mandelbrot/blob/master/logging...

| FrankyHollywood wrote:
| This is a nice read too (the whole blog actually)
|
| http://igoro.com/archive/volatile-keyword-in-c-memory-model-...
|
| It's a bit old by now (2010), but I always remembered the mental model Igor created.

| bjt2n3904 wrote:
| The asides with the dog are great. I've seen a few blogs use this technique -- I'm not big on Greek, but I think Plato used this technique too.
|
| It instills the idea that you should be asking a question at this point. Some information has been provided that should generate more questions in your head, if you're keeping pace.

| samsquire wrote:
| Thank you for this, this is helpful.
|
| I wrote a JIT compiler and I didn't bother calling free much, I just let the operating system free up all allocated memory.
|
| I got into this situation often:
|
|     return_struct = do_something(mystruct);
|     return_struct->inner_struct = malloc(sizeof(struct my_inner_struct));
|
| Now, who owns inner_struct? Who is responsible for freeing it? Do I free it when I assign to it?
| I feel this ownership complicates cross-language FFI API calls, because who is responsible for freeing structures depends on the application and the platform you're running under. For example, Rust code being called from Erlang. You have to be extra careful when memory is managed by a different language runtime.
|
| I feel I am at the beginning of intuitively understanding what memory really is: memory is just a huge contiguous region of numbered locations. Bump allocators allocate in one direction and free all at once. Arena allocators allocate to a preallocated region, I think.
|
| Memory is a logistical problem of how you arrange and allocate finite resources.
|
| I am thinking of alternative visualizations for understanding memory; for example, I started writing an animation of binary search:
|
| https://replit.com/@Chronological/ProgrammingRTS
|
| The idea is that you see values and memory locations move around, with the final goal being able to command memory to move around and be computed, as in a real-time strategy game.
|
| I think if we could visualise memory as cars on a road, we would see obvious traffic jams.

| duped wrote:
| > You have to be extra careful when memory is managed by a different language runtime.
|
| While it would be _nice_ to have next to no overhead for FFI, it's not always tractable. That's why you have to serialize across boundaries, the same as if you're serializing across processes or the network. At least in a single virtual memory space you can have a caller allocate a buffer and the callee fill it, with the caller being responsible for deserializing and freeing later. That gets you pretty far, and is relatively safe.
|
| The alternative is to be callee-managed, and for the callee to return things by handle and not necessarily by pointer, but that is also fraught.
| dahart wrote:
| > I feel I am at the beginning of intuitively understanding what memory really is: memory is just a huge contiguous region of numbered locations.
|
| There might be an analogy here that could help you reason about your nested structure allocations...
|
| Memory is an array of bytes owned by the OS. While there are all kinds of implementation details about addressing and storage and performance and paging and virtual memory, it's really just an array. The OS gives you a way to reserve pieces of the array for your own use, and you're responsible for giving them back if you want to play nice and/or run for a long time; otherwise (as a safety net) the OS will take them back as soon as you exit.
|
| This is, in a sense, very similar to the question you posed. An outer routine owns the outer structure, and an inner routine allocates some inner structure. The simplest, most intuitive, and generally best advice is that whoever allocates is also responsible for freeing memory. In other words, one way to _define_ ownership of memory is by who allocates it. Implicitly and automatically, the responsibility to free that memory belongs to the owner that allocated it. It's okay to explicitly transfer ownership, but it can easily get complicated and unintuitive. You can also consider letting the parent free your struct to be similar to not calling free() in your JIT compiler - it's a 'lazy' optimization to have the parent clean up - and I don't mean that in a judgemental sense, I mean it's valid to let the parent handle it, if you know that it will, and this can be done without getting confused about who owns the memory and who was actually responsible for freeing it. Note that when you leave the parent to clean it up, you are foregoing the ability to re-use the memory - this is true in your JIT compiler and it's true for malloc() and free() as well.
If you | let the OS handle it, you're in effect declaring that you | believe you do not need to recycle the memory allocated in your | program during its execution. (This might be true, and it | might stay that way, but it's always worth asking if it will | remain true, since lots of people have been eventually bitten | and had to retroactively refactor for memory management when | their requirements change.) | samwho wrote: | Yeah, I hear you. I've not done a lot of FFI stuff directly, it | scares me. | | Arena allocators are cool. The idea is you allocate a large-ish | region of memory and sub-allocate into it (often with a fast, | simple allocator like a bump allocator) and then free the | large-ish block when you're done. It's a way to take knowing | how much memory you need as a whole and optimise that down to a | single call to malloc/free. | | You may enjoy looking through | https://www.cs.usfca.edu/~galles/visualization/Algorithms.ht.... | samsquire wrote: | Thanks for the link to the animations. | | I want an extremely performant deep copy solution, and I've been | thinking of using an allocator to implement it. | | If we have a tree data structure or a nested hashmap and we | want to copy it cheaply, there is copy-on-write. But most | copies of hashmaps are slow because they instantiate every | child object in a recursive loop. | | So I want to be able to memcpy a complicated data structure | for cheap copies. | secondcoming wrote: | > Now, who owns inner_struct? | | return_struct does since it is the only thing that knows the | address. | | > Who is responsible for freeing it? | | return_struct is, unless you hand that responsibility over to | something else. | | > Do I free it when I assign to it? | | Yes, unless you want leaks. | | > I think if we could visualise memory as cars on a road, we | would see obvious traffic jams. | | That visualisation is helpful for threads, where the program is | the road/map and the cars are the threads.
I don't see how it's | useful for memory. | terrycody wrote: | Glad there are always talented people and great guides like this! | Great people and guides help tremendous numbers of learners | along the way; timeless, priceless. | jxf wrote: | A great example of why I'll always read a post over watching a | video. | a257 wrote: | I think it would be helpful if it discussed internal | fragmentation, where the payload is smaller than the allocated | block. I observed that this was important in understanding the | purpose of various memory allocation algorithms when undertaking | the malloc lab in college. | patleeman wrote: | I love this so much, thank you for putting this together! | | My only piece of feedback would be for the "Inline Bookkeeping" | section (https://samwho.dev/memory-allocation/#inline-bookkeeping); | it took a while for me to grok the numbered list to | figure out which block corresponded to address + X. I wonder if | there is a better way to visualize the 4 numbered bullet points? | Maybe just arrows and text pointing to the visualization? | | Thanks again for this wonderful article! | samwho wrote: | Yes, this is one of the things I look back on as a weak point | in the writing. I wanted to make this a lot more clear, but by | the time I'd gotten to this point in the article, I'd sort of | coded myself into a corner with the visualisation code. | | In another universe I'd hook into the page scroll and | highlight each block being referred to as I talk about it in | the text. I'm probably not explaining that well here, but | imagine something like the way this article works: | https://pudding.cool/2017/05/song-repetition/. | wizzwizz4 wrote: | Couldn't you highlight each block being referred to as you | mouse over / click on relevant text? (The latter might not be | _terribly_ hard to do with <a> and CSS :target, if you can | give the blocks ids.) | samwho wrote: | The blocks aren't HTML elements; they're drawn onto a | canvas.
It _could_ be possible regardless, though. I may | revisit this. Thanks for the idea! :) | jesselawson wrote: | Sam, this is such a wonderful resource that you've put out into | the world. Thank you for the time and care you've put into each | paragraph and interactive component. You've not only given me a | lot to think about in terms of my basic assumptions about memory, | but also about how to write and teach better online. I'm really | excited to start following you! | alberth wrote: | While I do like the article, I wish it simply used multiple | images and/or animated gifs (instead of javascript) for the | pictorials. | | It would make the site much more accessible and clear in the | event you didn't realize you had to click forward. | samwho wrote: | The accessibility of this technique is something that greatly | worries me. I try and be quite thoughtful about things like | colour palette. I'm not sure how to achieve the level of | interactivity I want while still being accessible to screen | readers, without writing two completely separate articles with | and without JavaScript. | | Definitely open to ideas. | celeritascelery wrote: | This is absolute gold. When I use things like this, I am reminded | how powerful digital learning can be. So much more capable than | just text or video. But very little content is this well put | together. Probably because it is so time intensive. | dmd wrote: | If you're not yet familiar with it - | https://ciechanow.ski/archives/ is for you (and everyone!) | samwho wrote: | The master of this artform. | throwaway689236 wrote: | Agreed, I wish I had more things like this growing up. | samwho wrote: | This feedback has made my day. Thank you. | | I'm inspired by Bartosz Ciechanowski and Julia Evans. The web | is such a powerful toolbox. So many concepts are more complex | than text alone can hope to explain. Those two are so creative | and full of energy.
| | And you're right, it's incredibly time intensive to put these | together. Thousands of lines of code, plus the text content, | plus reaching out to domain experts for reviews (shout out | Chris Down, kernel wizard extraordinaire). | samwho wrote: | Also shout out Anton Verinov (https://anton.codes/): the only | reason this web page doesn't drain your battery before you | get to the end of it. | rozularen wrote: | why is that, mind you? | couchand wrote: | One more name I'd add to this list is Mike Bostock. The care | and attention he gives to data visualization examples comes | through in the finished product. | | Communicating complex subjects through interactive visual | displays is very effective -- it's worth the effort. Thank | you! | naillo wrote: | Great job Sam | ww520 wrote: | The writing is very clear and the concepts are explained | well. Not too much information and not too little. Excellent | post. | samwho wrote: | Deja vu. :D | spenczar5 wrote: | Such great work. I learned things and have a clearer | understanding that I think I will come back to in the future. | And I say that as someone who has programmed for 15 years! I | think your effort paid off, and the inspiration shows | through. | alex7734 wrote: | > When we free memory, we should make sure that if the block we | return to the free list is next to any other free blocks, we | combine them together. This is called "coalescing." | | A little offtopic, but the default Delphi 7 memory allocator did | this, except that it also merged blocks that it obtained from | different OS allocation calls. | | This worked fine for regular usage, but if that memory was ever | used for Bitmaps for UI stuff, it wouldn't work: since Windows | does some of its UI stuff in kernel mode, before doing anything | Windows would attempt to lock the entire allocation's VAD entry | to prevent you from messing with it in another thread while it | was using it.
If the Bitmap you were working on happened to | belong to two different OS-level allocations, this lock function | would fail since it wasn't meant to handle that case. | | This would lead to random, extremely cryptic errors such as | ERROR_NO_SCROLLBARS "The window does not have scroll bars." since | the lock call was deep in the stack and the callers replaced the | error with another one as it bubbled up. | | In the end we had to replace the allocator to avoid that issue. | That was a fun day I spent debugging that. | ilyt wrote: | I think the far weirder part of this was the kernel-side | handling of scrollbars | alex7734 wrote: | If I recall correctly the kernel part of things would return | an out of memory error which the user mode DLL translated to | that weird error (sometimes, other times it would just say | "out of system resources", it depended on what widget the | bitmap that overlapped the two memory regions belonged to). | | Here's a 2003 forum post from someone else having the same | problem: http://www.delphigroups.info/2/1/749064.html | derefr wrote: | Until Windows 95, Windows was essentially just a DOS | application that grabbed the framebuffer and ran an event | loop where it drew "controls" (which includes windows, | buttons, text views, and yes, scrollbars.) That was the whole | point of it. It wasn't an "OS" per se; DOS was the OS. | Windows was what a Linux-head would think of as a combination | of an X server and window manager. And Windows loaded your | "application" as essentially a DLL, with the Windows global | event loop calling into your application's event-loop | delegate handler (WndProc) whenever it has an interesting | event that your application might like to react to. | | (Your application wasn't even a "process" per se. Until | Windows 95, everything was just happening in one shared | address space, in real mode. In fact, it was only in Windows | 3.1 where user applications stopped running in ring 0!) 
| | If you think about it, this "the kernel is a game engine and | your application is the game" approach isn't necessarily a | _bad_ design... for a single-tasking OS's library exokernel, | like the Wii's https://wiibrew.org/wiki/IOS. | | But, of course, Windows claimed to be a multitasking OS. But | it actually wasn't! And I don't mean the obvious thing about | it not having pre-emption. Lots of multitasking OSes didn't | have pre-emption. | | No, what I mean is that the concurrency primitive for the | cooperative scheduling wasn't the "task" (i.e. the process or | thread. Which, again, there weren't any of.) Instead, the | concurrency primitive was the _window_. Until Windows 95, | Windows was a multi-_windowing_ OS. | | Each control was owned by a window. Each window had a | WndProc. If your Windows executable (i.e. application | delegate module) set up two windows, then each window | participated separately in the global Windows event loop, | up-to-and-including things like having its own set of loaded | fonts, its own clipboard state, and its own _interned strings | table_. In OOP terms+, your application was just a dead | "class object", running no logic of its own save for one-time | load and unload hooks. It was the windows themselves that | were the "instances" of your class. | | This might make you realize why MDI (or Multiple Document | Interface, where there are multiple small per-document | "windows" inside one big window) was so popular back then. | The MDI "windows" weren't actually windows -- they didn't | have their own WndProc. They were just controls, like a tab | view is a control. Only the big container window was a real | window, and so all the _resources_ within that big window | were shared between all the virtual windows. MDI was a | memory-saving trick! | | --- | | + The actual more interesting analogy is that Windows was | essentially a (single-threaded, cooperatively-scheduled) | _actor system_, where windows were the actors.
There is a | very close parallel between (Windows 3.1 executables, Windows | 3.1 windows) and (Erlang modules, Erlang processes). | Stratoscope wrote: | > _This might make you realize why MDI (or Multiple Document | Interface, where there are multiple small per-document | "windows" inside one big window) was so popular back then. | The MDI "windows" weren't actually windows -- they didn't | have their own WndProc. They were just controls, like a tab | view is a control. Only the big container window was a real | window, and so all the resources within that big window | were shared between all the virtual windows. MDI was a | memory-saving trick!_ | | MDI may have saved some memory - I can't say one way or the | other on that - but the mechanism you describe is | incorrect. | | Every MDI child window was a window of its own with its own | WndProc. Every _control_ inside those windows was also a | window with its own WndProc. Every dialog box was also - | yes - a window with its own WndProc. | | You wouldn't always be aware of the WndProc in your code, | but it was there. | | If you ran WinSight or Spy++, you could see the entire | window hierarchy including all the child windows, child | control windows, and so on. | | Later on, a few applications implemented "windowless | controls" to save memory, but this was uncommon, especially | in the early days. For example, there was an optional | windowless version of the Rich Edit control: | | https://learn.microsoft.com/en-us/windows/win32/controls/win... | | Fun fact: an early informal name for MDI was "Mac in a | box", because if you maximized the top level window, you | had a somewhat Mac-like environment, with one menu bar at | the top that was shared by the child windows. | | Source: I was the author of WinSight and the VBX (Visual | Basic eXtension) API. | AshamedCaptain wrote: | > Until Windows 95, everything was just happening in one | shared address space, in real mode.
In fact, it was only in | Windows 3.1 where user applications stopped running in ring | 0! | | Windows 3.0 and its predecessors ran on processors which had no | concept of "ring 0", so that should not be surprising at | all... | | > Your application wasn't even a "process" per se | | I think this is a bit of a "modernistic", "win32y" view of | the definition of a process. Surely there are | processes/tasks in 3.x -- you can launch multiple instances | of a module, each of them can allocate memory separately, | each of them has different resources/handles, and the OS | will clean up for you once each instance terminates | (cleanly). Obviously, without any type of virtual address | space or memory protection, any such process can write to | and destroy the memory of any other process, but they are | still processes. The existence of Yield/DirectedYield, | which do not take/need a window, is also a hint of that. | (Note there are no threads in 3.x.) | | Many platforms (that either predate VM or decided not to use | VM for e.g. power usage concerns) worked like this. MacOS, | Windows CE, PalmOS, etc. | meekaaku wrote: | You mean a bug this deep took one day to debug? | alex7734 wrote: | Yes and no. It took me from 8am to 3am once we decided it | needed to get fixed, but really it sat in the app for years; | it only happened in a background process that sent print jobs | on a timer, and since it used Windows GDI to compose the image | we sent to the printer it was affected (our "frontend" | should've been affected too but never was, I guess because it | had a different memory usage pattern). | | We just had it restart itself and try again whenever it got | one of those errors when printing, but eventually we wanted to | add a feature that required the process to not die, and by | that time I was already 99% sure that it wasn't something in | our code and I had already ruled out threading issues.
| | I ended up putting it in a VM with a kernel debugger attached | and having a script make a snapshot and make it print over | and over until it errored, then following along in IDA until | I saw what was going on. | | Having a way to trigger it (by restoring the snapshot) on | demand helped a lot, otherwise it would have taken forever to | make sense of it as it could sit without crashing for nearly | an hour. | zeusk wrote: | How do you attach kd to a VM? | | While I was at MS, it was such a big PITA - we just had a | bunch of IT managed machines with KVM console access and | KDNET for debugging. | EvanAnderson wrote: | In Hyper-V it's fairly easy. You make a virtual serial | port ("COMPort"), set the bootloader to enable kernel | debugging over serial, then connect to the virtual serial | port from the host via a named pipe. | | https://learn.microsoft.com/en-us/windows-hardware/drivers/d... | | I haven't tried it with vSphere but I suspect it'd be | similar. | bee_rider wrote: | Ouch, I'm guessing that wasn't -5 hours of debugging. | duped wrote: | It's somewhat incomplete without a discussion of how one actually | gets memory allocated to a program. | samwho wrote: | Could you elaborate on what you mean? | [deleted] | duped wrote: | It talks about how a simple malloc implementation would dole | out entries in a buffer of memory and manage free lists, but | not how the implementation actually gets that buffer from the | operating system, or returns it to the system when done (mmap, | munmap, for example). | samwho wrote: | Ahh yes, I'm with you now. | | I made a deliberate choice not to include that, mostly due | to post size/complexity reasons. I decided that the focus | of the piece was going to be on the task of effectively | managing the memory you have in the face of calls to | malloc/free. I decided against talking about any | environment-specific information (brk, mmap, cache | locality, multithreading, etc.)
| | May not have been the right call, but it kept the length | fairly reasonable and I think it gives people enough of a | nudge to dip their toes into the playground if they have | time. | | If you check the HTML comments on the page you'll find | quite a lot of cut content. I wanted to do deep dives on | real-world malloc implementations, but it ended up being | very difficult to cover without making the post take over | an hour to read. | OliverJones wrote: | Oh man, this brings back the days when I wrote special | debug-version malloc and free code to try to track down heap corruption | due to malloc / free misuse (in code I had contributed to). Stuff | like kbyte-long boundary buffers with bit patterns in them, and | all sorts of lookaside lists run in parallel with libc's default | code. Those bug-detectors worked OK. Hard-nosed code inspection | worked far better. | | As an old-timer in writing code, I think my generation's most | challenging legacies (=== the things we f**ked up) are the | non-robust malloc/free concept and null-terminated text strings. Bugs | in code using those conventions have been exploitable to a really | damaging extent. I learned to program in C from K&R. And getting | data-structure code right, and safe to deploy, in that language | and its derivatives is HARD. | | The C inventors are (were in Dennis Ritchie's case) brilliant and | Bell Labs was great. Their ideas shaped the stuff the global | internet runs on. But these parts of what they invented ..... | ouch. (All OSs had the same problems.) | | I wish the wonderful article presented here carried a more | prominent warning about this. Many will read it as they learn to | code. The history of our profession can teach about what NOT to | do as well as what to do. | junon wrote: | This is really, really well done. Also, the allocator | playground[0] is really cool.
Will be my go-to when teaching this | topic moving forward :) | | [0] https://samwho.dev/allocator-playground/ | samwho wrote: | Thanks so much, I really appreciate it. | | I'm glad you like the playground. If you don't mind me asking, | what/where/how do you teach? I was actually hoping to get the | attention of educators with this tool to see if it would make | sense in, e.g., undergrad CS courses. | junon wrote: | Just friends and colleagues, nothing formal. I wish you the | best of luck with it, though! | digitalianz wrote: | [dead] | CliffStoll wrote: | A delightful page, written in a wonderfully interactive way. | | My high congratulations for creating a most friendly, readable | and useful lesson! | RamiAwar wrote: | I love this! I wish it existed back when I was writing my first | memory allocators in university or when building a custom | EntityComponentSystem implementation. | | I'd love to also see applications of custom memory allocators. I | know about use cases in building game engines and the importance | of hitting cache there, but I'm not sure where else in the world | this would be as useful. | erwincoumans wrote: | Robotics, embedded systems, CUDA/gpgpu, AI/Deep | learning/Reinforcement Learning/LLM (PyTorch) for example. | delta_p_delta_x wrote: | I really like the website design, above all. I'd love to pen my | thoughts online, but haven't had the energy to learn static site | generation... | N-Krause wrote: | It even has a playground! | https://samwho.dev/allocator-playground/ | | How I wish I had something like that when I first learned C. | samwho wrote: | The playground was the inspiration for the post. I always | wanted to be able to fiddle with malloc implementations and see | how they work in this way. | | Admittedly the playground is hard to understand at first, and a | touch janky in places. But the workloads are taken from | for-real malloc/free traces on programs!
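For anyone heading into the playground, a toy first-fit allocator with the block splitting and "coalescing" discussed elsewhere in the thread might look like the sketch below. This is an invented illustration (forward coalescing only; real allocators also merge with the preceding block, typically via boundary tags):

```c
#include <stddef.h>
#include <stdint.h>

/* A toy first-fit allocator over a fixed buffer. Each block begins
   with an inline header recording its payload size and use flag. */
#define HEAP_SIZE 1024

typedef struct {
    size_t size; /* payload bytes in this block */
    int    used;
} header_t;

static _Alignas(max_align_t) uint8_t heap[HEAP_SIZE];

void heap_init(void) {
    header_t *h = (header_t *)heap;
    h->size = HEAP_SIZE - sizeof(header_t);
    h->used = 0;
}

static header_t *next_block(header_t *h) {
    uint8_t *p = (uint8_t *)h + sizeof(header_t) + h->size;
    return (p < heap + HEAP_SIZE) ? (header_t *)p : NULL;
}

void *toy_malloc(size_t size) {
    size = (size + 7) & ~(size_t)7; /* keep blocks 8-byte aligned */
    for (header_t *h = (header_t *)heap; h; h = next_block(h)) {
        if (h->used || h->size < size) continue; /* first fit */
        /* Split off the remainder if there's room for another block. */
        if (h->size >= size + sizeof(header_t) + 8) {
            header_t *rest =
                (header_t *)((uint8_t *)h + sizeof(header_t) + size);
            rest->size = h->size - size - sizeof(header_t);
            rest->used = 0;
            h->size = size;
        }
        h->used = 1;
        return (uint8_t *)h + sizeof(header_t);
    }
    return NULL; /* no free block is big enough */
}

void toy_free(void *p) {
    if (!p) return;
    header_t *h = (header_t *)((uint8_t *)p - sizeof(header_t));
    h->used = 0;
    /* Coalesce: merge any immediately-following free blocks. */
    for (header_t *n = next_block(h); n && !n->used; n = next_block(h))
        h->size += sizeof(header_t) + n->size;
}
```

Without the coalescing loop, repeated malloc/free cycles fragment the heap into small free blocks that can never satisfy a large request — exactly the failure mode the article's visualisations show.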
| N-Krause wrote: | When you consider that I (and probably the vast majority of | developers) used pen and paper for the first few years every | time I tried to visualize more complex memory, that's a big | upgrade. | | Especially because you can scroll through all the steps. ___________________________________________________________________ (page generated 2023-05-22 23:00 UTC)