[HN Gopher] MMU gang wars: the TLB drive-by shootdown
___________________________________________________________________

MMU gang wars: the TLB drive-by shootdown

Author : matt_d
Score  : 70 points
Date   : 2020-05-17 19:19 UTC (3 hours ago)

(HTM) web link (bitcharmer.blogspot.com)
(TXT) w3m dump (bitcharmer.blogspot.com)

| joe_the_user wrote:
| _Every once in a while I get involuntarily dragged into heated
| debates about whether reusing memory is better for performance
| than freeing it._
|
| I couldn't comment on all the instances the article talks about,
| but this way of asking the question seems to me to hide the
| problem. It seems simpler to ask "what memory-allocation
| algorithm should you use?" Which is to say, "does your knowledge
| of your application's memory needs and memory performance trump
| all the effort and knowledge that went into creating the memory
| allocator of the operating system you're using?" From there you
| get into the massive number of technical considerations the
| article and others might raise.
|
| Memory allocation is a weird thing: it's an algorithm, but it's
| often taken as a given in programming languages and in
| discussions of algorithms.
| [deleted]
| bitcharmer wrote:
| Hi, author here. I'm not sure I follow your argument. This
| article doesn't touch on allocators at all (i.e. SLUB vs SLAB).
| It focuses solely on the cost of _freeing_ memory, of which
| TLB shootdowns are a notable part.
|
| I even mention it at the beginning:
|
| > Regardless of the method by which your program acquired
| memory there are side effects of freeing/reclaiming it. This
| post focuses on the impact of so called TLB-shootdowns.
|
| Hope this helps.
| joe_the_user wrote:
| As I understand things, allocating and freeing memory pretty
| much form a single system.
| Especially: if I have a system where I "manually" allocate 10
| meg for my use, never free it, but use an internal method to
| mark the memory free or used, I will still have issues with
| caching and virtual memory based on my use of that memory.
| I.e., reusing memory effectively creates "roll your own" free
| and allocate functions.
|
| And in general, how contiguously you allocate memory plays a
| big part in whether freed memory can be easily discarded from
| the cache. If you get the heap to behave exactly like a stack,
| then the cache shouldn't have problems. But I'll admit I'm not
| an expert and I could be missing something.
| brandmeyer wrote:
| This whole rigmarole is necessary for a single reason: TLBs
| don't participate in the cache-coherency system.
|
| Uh, why is that? If they did participate, then the mere act of
| writing to the cache line(s) that change the mapping would
| implicitly invalidate all of the associated entries in all of
| the system's TLBs. (Handwave, handwave.) Maybe you still end up
| needing a barrier similar to the instruction barrier needed
| when altering the contents of executable pages.
|
| What's the downside? Is it just power? Or is there something
| more fundamental about the TLB structure that makes it
| impractical?
| ip26 wrote:
| I would love to hear from someone with the real answer, but
| I've always assumed it's legacy baggage from the old days. The
| impact on performance has probably never been big enough to
| compel the entire x86 ecosystem to make them coherent.
| monocasa wrote:
| I imagine it's because the hardware doesn't have enough context
| to know which virtual address spaces exist in which cores'
| TLBs. ASIDs were only added in the virtualization instructions
| on x86, and I can't think of an OS where those are the same
| namespace across cores.
| A side reason is that it'd probably heavily complicate the
| top-level TLBs to have "please flush yourself" requests coming
| from anywhere other than the core they're attached to, and
| those TLBs are in the critical path between L1 and L2.
|
| Totally spitballing here, FWIW; I haven't been part of the
| design process for a TLB.
___________________________________________________________________
(page generated 2020-05-17 23:00 UTC)