[HN Gopher] MMU gang wars: the TLB drive-by shootdown
       ___________________________________________________________________
        
       MMU gang wars: the TLB drive-by shootdown
        
       Author : matt_d
       Score  : 70 points
       Date   : 2020-05-17 19:19 UTC (3 hours ago)
        
 (HTM) web link (bitcharmer.blogspot.com)
 (TXT) w3m dump (bitcharmer.blogspot.com)
        
       | joe_the_user wrote:
       | _Every once in a while I get involuntarily dragged into heated
       | debates about whether reusing memory is better for performance
       | than freeing it._
       | 
       | I couldn't comment on all the instances the article talks about.
       | But this way of asking the question seems to me to hide the
       | problem. It seems simpler to say "what memory allocation
       | algorithm should you use?" Which is to say, "does your knowledge
       | of your application's memory needs and memory performance trump
       | all the effort and knowledge that went into creating the memory
       | allocator of the operating system you're using?". And so then you
       | get into the massive number of technical considerations the
       | article and others might raise.
       | 
        | Memory allocation is a weird thing: it's an algorithm, but
        | it's often taken as a given in programming languages and in
        | discussions of algorithms.
        
         | [deleted]
        
         | bitcharmer wrote:
          | Hi, author here. I'm not sure I follow your argument. This
          | article doesn't touch on allocators at all (i.e. SLUB vs
          | SLAB). It focuses solely on the cost of _freeing_ memory,
          | of which TLB shootdowns are a notable part.
         | 
         | I even mention it at the beginning:
         | 
          | > Regardless of the method by which your program acquired
          | memory, there are side effects of freeing/reclaiming it.
          | This post focuses on the impact of so-called TLB shootdowns.
         | 
         | Hope this helps.
        
           | joe_the_user wrote:
            | As I understand things, allocating and freeing memory
            | pretty much form a single system. In particular, if I
            | "manually" allocate 10 MB for my own use, never free it,
            | but use an internal method to mark the memory free or
            | used, I will still have issues with caching and virtual
            | memory based on how I use that memory. I.e., reusing
            | memory effectively creates "roll your own" free and
            | allocate functions.
           | 
           | And in general, how contiguously you allocate memory plays a
           | big part in whether freed memory can be easily discarded from
           | the cache. If you get the heap to be exactly like a stack,
           | then the cache shouldn't have problems. But I'll admit I'm
           | not an expert and I could be missing something.
        
       | brandmeyer wrote:
       | This whole rigmarole is necessary for a single reason: TLBs don't
       | participate in the cache coherency system.
       | 
       | Uh, why is that? If they did participate, then the mere act of
       | writing to the cache line(s) which change the mapping would
       | implicitly invalidate all of the associated entries in all of the
       | system's TLBs. (Handwave, handwave), maybe you still end up
       | needing a barrier similar to the instruction barrier needed when
       | altering the content of executable pages.
       | 
       | What's the downside? Is it just power? Or is there something more
       | fundamental about the TLB structure that makes it impractical?
        
         | ip26 wrote:
         | I would love to hear from someone with the real answer but I've
         | always assumed it is legacy baggage from the old days. The
         | impact on performance has probably never been big enough to
         | compel the entire x86 ecosystem to make them coherent.
        
           | monocasa wrote:
           | I imagine it's because the hardware doesn't have enough
           | context to know which virtual address spaces exist in which
           | cores' TLBs. ASIDs were only added in the virtualization
           | instructions on x86, and I can't think of an OS where those
           | are the same namespace across cores.
           | 
           | A side reason is that it'd probably heavily complicate the
           | top level TLBs to have "please flush yourself" coming from
           | anywhere other than the core they're attached to, and those
           | are in the critical path between L1 and L2.
           | 
           | Totally spitballing here FWIW, I haven't been part of the
           | design process for a TLB.
        
       ___________________________________________________________________
       (page generated 2020-05-17 23:00 UTC)