[HN Gopher] Optimizations can have unexpectedly large effects wh...
       ___________________________________________________________________
        
       Optimizations can have unexpectedly large effects when combined
       with caches
        
       Author : zdw
       Score  : 168 points
       Date   : 2022-11-14 16:30 UTC (6 hours ago)
        
 (HTM) web link (justinblank.com)
 (TXT) w3m dump (justinblank.com)
        
       | nerpderp82 wrote:
        | We really don't know what causes perf differences; layout can
        | have more of an effect than going from -O1 to -O3.
       | 
       | "Performance Matters" by Emery Berger
       | https://www.youtube.com/watch?v=r-TLSBdHe1A
        
         | tmtvl wrote:
         | I wonder if he mentions running from a different directory and
         | running as a different user to highlight how annoying the
         | effect from the environment is, as USER and PWD are part of the
         | environment.
        
       | lob_it wrote:
       | Caching db queries and media assets already got a big boost a
        | decade ago going from spinning rust to SSD to even faster NVMe.
       | 
       | Optimizing DB queries on spinning rust was an exciting experience
       | though. Chasing milliseconds with caching and warming caches up
       | still brings back good memories :)
       | 
        | Scale and architecture make new designs much easier. 1MB of
        | code... A 100MB database and 1GB of media assets is becoming a
        | relic like "big iron". Blank slate canvases should skip a lot of
        | it.
       | 
       | Technology debt only gets harder to manage over time.
        
       | ISL wrote:
       | This happens in life, too. When your counterparts lose continuity
       | with your work, they turn to other work and the context-switch to
       | return to working with you becomes increasingly hard.
       | 
       | Helps to be quick.
        
       | joosters wrote:
       | "Almost all programming can be viewed as an exercise in caching."
        
       | hinkley wrote:
       | About 15 years ago I was on a team that was having so many
       | problems with caching that I began to see it for the liability
       | that it is. It started with a 2-4 week delay each release cycle
       | while they fixed caching bugs the testers had found, but once I
       | joined the project I saw how deep the problem went.
       | 
       | A number of their features invited a full table scan, which is
       | fundamentally incompatible with an LRU cache, and in particular
        | exhibits disastrous stair-step behavior as the working set grows.
       | Note that this is also a couple of years before anyone had heard
       | of memcached so these were DIY implementations with multiple
       | potential sources of bugs.
       | 
       | About the time I found my feet on the project someone wanted to
       | add caching to yet another workflow that was taking 30 seconds to
       | run. It was an admin workflow so there was a little tolerance of
       | slow response times, but not 30 seconds. I volunteered for the
       | story because I thought we could do better than more caches.
       | 
       | Through conversation, step debugging, and profiling I found the
       | spot where they had wanted the caching. What the feature was
       | doing was two relatively expensive lookups of the same data,
       | filtering it on two very different dimensions, and returning the
       | intersection of the two. In typical ORM fashion, both of the
       | concerns were taking it upon themselves to fetch their own data
       | instead of having it handed to them. Since they were both called
       | from the same function, I hoisted the lookup out of the callee
       | into the caller, sending the same list to both. Then I changed
       | the code again to feed the result of call 1 to call 2, to make it
       | work more like consecutive filters.
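        | 
        | Roughly the shape of that change, with made-up names (the real
        | code was ORM-heavy; this is only a sketch):
        | 
        |   import java.util.ArrayList;
        |   import java.util.List;
        |   import java.util.function.Predicate;
        | 
        |   class HoistSketch {
        |       // Stands in for the expensive lookup each concern used to
        |       // perform for itself.
        |       static List<String> fetchAll() {
        |           return new ArrayList<>(List.of("a", "ab", "abc"));
        |       }
        | 
        |       // Before: two separate lookups, then an intersection.
        |       static List<String> before(Predicate<String> f1,
        |                                  Predicate<String> f2) {
        |           List<String> left = new ArrayList<>();
        |           for (String s : fetchAll()) if (f1.test(s)) left.add(s);
        |           List<String> right = new ArrayList<>();
        |           for (String s : fetchAll()) if (f2.test(s)) right.add(s);
        |           left.retainAll(right);   // intersection of the two lists
        |           return left;
        |       }
        | 
        |       // After: hoist the lookup into the caller, then chain the
        |       // filters so the second only sees the narrowed list.
        |       static List<String> after(Predicate<String> f1,
        |                                 Predicate<String> f2) {
        |           List<String> shared = fetchAll();   // fetched once
        |           List<String> narrowed = new ArrayList<>();
        |           for (String s : shared) if (f1.test(s)) narrowed.add(s);
        |           List<String> out = new ArrayList<>();
        |           for (String s : narrowed) if (f2.test(s)) out.add(s);
        |           return out;
        |       }
        |   }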
       | 
       | The end result was that the page now loaded in 3 seconds, and I
       | was able to write better unit tests for this code. One might
       | expect that most of that improvement came from running the second
       | filter on a much smaller list, but that's where you'd be wrong.
       | The majority of that improvement came from the hoisting, not the
       | chaining. Due to the amount of work in the filters, the initial
       | lookup shouldn't have accounted for more than a third of the page
       | load time, but just sharing the same data dropped the response
       | time to around 6 seconds. Chaining the filters ended up only
       | saving 3 seconds.
       | 
       | The problem with large data sets is that CPUs have caches too.
       | And in a garbage collected programming language, there are
       | additional costs to carrying around duplicated data, particularly
       | if it is in tree form like this was. Most likely we were smashing
       | the hell out of CPU caches and introducing extra GC pressure to
       | boot. By passing in the same object tree the second call followed
       | warm and/or hot paths through heap memory.
       | 
        | I'd had a similar but less dramatic experience a few years
       | before that, where I identified that a function, which accounted
       | for 10% of the runtime, was being invoked roughly twice as often
       | as the problem required. By that point 5% looked like low hanging
       | fruit, so I eliminated the duplicate calls and reran the
       | benchmarks. But the results didn't make sense so I tweaked and
       | prodded and fiddled. The new time was 20% lower than the old
       | time, for what should have been 5%. This is already way too long
       | so I won't go into the reasons here, but in addition to the
       | scenarios I mentioned for the previous issue, there's also
       | problems of timing accuracy in profilers, and cyclic use patterns
       | that can misattribute the expense of a call to siblings or the
       | parent. A function that creates many objects may never reach the
       | GC threshold, but a large allocation later in the sequence
       | diagram may. Or collide with cache lines that are needed again
       | later in the call graph.
        
         | bawolff wrote:
         | I feel like the moral of this story is less "caching is a
         | liability" and more garbage in garbage out. Caching can help
         | things when used appropriately, but its not magic.
        
           | nerpderp82 wrote:
           | Your performance topology is dictated by multiple hierarchies
           | of caches, many of which you might not be aware of. If they
           | disappear your application may no longer function. The cache
            | is no longer an optimization but a critical component.
           | Figuring out how to _fill_ it might be harder than how to
           | invalidate it.
        
             | hinkley wrote:
             | > Figuring out how to fill it might be harder than how to
             | invalidate it.
             | 
             | This always seems to be a problem for someone else or for
             | another day, which is part of the frustration.
             | 
             | Cold caches killing your app are a considerable problem,
             | doubly so if you deploy in the Cloud because now you aren't
             | necessarily guaranteed to be deploying into a hot data
              | center. I think there's an unspoken assumption we haven't
              | really shaken: if you came to work and the power was out,
              | you not only expected none of the software to work, you
              | also peer-pressured other people into not complaining too
              | loudly about it when the power came back on. Today is a
              | loss. Everyone can see it.
             | 
              | But for instance if I work on a SaaS project in the Cloud,
              | not only can I just lose a region, but for many problem
              | domains my customers have mostly local customers, so I
              | have diurnal cycles of traffic that are customer specific.
              | All the non-bot traffic is in New York until 6am GMT, say. I
             | might want to spin regions down pretty aggressively during
             | parts of the day, but if spinning back up results in a
             | thundering herd problem because 'the cache makes us fast'?
             | Well then the value of my circuit breaker alerts drops to
             | less than zero.
             | 
             | That's just the first time things break. Caches and
             | capacity planning are also at odds, because capacity
             | planning is about planning your worst case and caches are
             | almost always about the happy path.
        
           | hinkley wrote:
           | Silver bullets kill everyone, not just werewolves.
           | 
           | Garbage In, Garbage Out is not wrong per se, but we have in
           | the real world the concept of an attractive nuisance, where
           | the victim is not entirely at fault for their injuries. The
           | thing, being prone to invite misuse and misadventure, bears
           | some responsibility for protecting people from themselves.
           | 
           | Caching is an attractive nuisance. These days it's simply
           | replaced Global Shared State which used to be the boogeyman,
           | but sadly we haven't collectively clued in on them being
           | equivalent.
        
             | bawolff wrote:
              | In this situation it was stuff that would be terrible both
              | with and without caching; it didn't have much to do with
              | caching itself, the caching just provided a bit of a
              | band-aid. I think the more apt metaphor might be "seatbelts
             | make people drive recklessly".
             | 
              | Would fast computers also be an attractive nuisance
              | because you don't have to think about performance as much?
        
               | hinkley wrote:
               | > Would fast computers also be an attractive nuisance
               | because you don't have to think about performance as
               | much?
               | 
               | Sometime in the last week or two someone asserted just
               | that. I didn't dig into that discussion very far but on
               | the face of it? Sure. Certainly grabbing the fastest
               | hardware available to Man is an expensive proposition and
               | you should hold that in reserve. You don't get to play
               | that card very often, and much less so today than during
               | the Golden Era. Post Moore's Law for sure, but to an
               | extend post-Dennard as well, since power draw of a
               | facility is a significant cost center. And without
               | Dennard you can make them more powerful but keeping them
               | efficient gets expensive.
        
       | dathinab wrote:
        | Article: a speedup led to fewer cache misses, which led to a
        | further speedup, which led to even fewer cache misses.
       | 
       | This kind of interaction is important to keep in mind.
       | 
       | It applies to all kinds of caches in all kinds of situations. To
       | some degree even the instruction and data caches of modern cpus.
       | 
        | But most interesting is that in complex systems, in (very very)
        | rare cases, the "whole system effect" can go the other way around
        | too (due to a now-faster component thrashing the cache used by
        | other components in a way which makes the system as a whole
        | slower).
        
       | RicoElectrico wrote:
       | What kind of batch process could that be?
        
         | mananaysiempre wrote:
         | Judging by the off-hand mentions in the article, something
         | along the lines of ingesting the cargo manifest of a large
         | ship?
        
       | mjb wrote:
       | I really love this point.
       | 
        | For most applications the optimal size of a (time-based) cache is
        | around the point where the cost of keeping an item in the cache
        | equals the cost of fetching it again. This is a point that Jim
       | Gray made in the classic "The 5 Minute Rule for Trading Memory
       | for Disc Accesses"
       | (http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf) back in
       | 1986.
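        | 
        | The break-even point can be written down as a tiny function (a
        | sketch; the rearrangement is mine and the numbers are made up):
        | 
        |   class FiveMinuteRule {
        |       // Break-even residency time in seconds: keep a page cached
        |       // if it is re-read more often than this.
        |       static double breakEvenSeconds(double ramPagesPerDollar,
        |                                      double diskAccessesPerSecPerDollar) {
        |           return ramPagesPerDollar / diskAccessesPerSecPerDollar;
        |       }
        | 
        |       public static void main(String[] args) {
        |           // Illustrative inputs only, not real prices.
        |           System.out.println(breakEvenSeconds(100.0, 0.5));  // 200.0
        |       }
        |   }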
       | 
       | A side effect of that is that systems that run faster, and
        | therefore access each cached item more often, get more benefit out
       | of a cache by pushing that set point out further. Whether that
       | happens in any given system depends a lot on the access patterns,
       | and especially temporal locality within those access patterns,
       | but the effect is real in a lot of different kinds of systems.
       | 
       | > It's common wisdom that systems with caches can fail badly when
       | the caches are cold. When every request misses the cache, the
       | system is overloaded, and never recovers. What I observed was the
       | mirror image.
       | 
        | The bottom line is that both locality and memoization are huge
        | contributors to the performance of systems, both positive and
       | negative. When folks (including me) point out the risks of
       | caches, we're not implying that caches are always bad. Instead,
       | it's pointing out a risk that sometimes they are too _good_ ,
       | leading to a system that can't function when the normal level of
       | locality and memoization are not available.
        
         | bombcar wrote:
          | Reminds me of:
          | 
          |     There are only two hard things in Computer Science: cache
          |     invalidation, and naming things, and off-by-one errors.
         | 
         | What people don't realize is caches are much harder than just
         | invalidation.
         | 
         | I'm a fan of continually testing (and running) things without
         | cache at all just to verify what they do, caching can often
         | cover a multitude of sins that you really don't need to be
         | doing.
        
           | hinkley wrote:
           | Caches are global shared state. Correction: _lossy_ global
           | shared state.
           | 
           | 'Naming things' when they are in a global namespace is really
            | hard, to the point where I'm starting to think that cache
           | invalidation and naming things are not two problems but the
           | same problem in two manifestations.
        
             | kwhitefoot wrote:
             | > Caches are global shared state
             | 
             | Not necessarily. I designed an object oriented simulation
             | system which had a lot of calculated properties where the
             | calculation was a substantial amount of work but the result
              | would be queried many times before the inputs to the function
             | changed. Caching the results inside the object provided a
             | substantial speed up (orders of magnitude) while keeping
             | the cache completely local and hidden from the client code
             | and keeping the program very modular.
             | 
             | Keeping the cache inside the objects allowed us to keep the
             | parts of the program neatly separated and minimized the
             | amount of knowledge that client code, and more importantly
             | client coders, needed to have about the objects they were
             | using.
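              | 
              | Roughly this shape (a sketch, not the real system):
              | 
              |   class SimulatedPart {
              |       private double length, density;  // the inputs
              |       private Double cachedMass;       // cached result
              | 
              |       void setLength(double l)  { length = l;  cachedMass = null; }
              |       void setDensity(double d) { density = d; cachedMass = null; }
              | 
              |       // Expensive derived property: recomputed only after
              |       // an input has changed, otherwise served locally.
              |       double mass() {
              |           if (cachedMass == null) {
              |               cachedMass = expensiveCalculation(length, density);
              |           }
              |           return cachedMass;
              |       }
              | 
              |       private double expensiveCalculation(double l, double d) {
              |           return l * d;  // stands in for the real work
              |       }
              |   }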
        
               | bmm6o wrote:
               | If we're splitting hairs, your situation sounds more like
               | memoization, where you have saved the output of an
               | expensive but deterministic calculation. With the correct
               | assumptions, this can never result in incorrect behavior.
               | Worst case is excessive memory consumption. Whereas a
               | stale cache is something that needs to be taken into
               | account.
        
               | hinkley wrote:
               | What happens when the system of record changes the answer
               | halfway through the transaction?
               | 
               | By caching answers to calculations you've moved the
               | source of truth from the function to the cache. But the
               | cache is also acting as a proxy for the system of record,
               | which is a conflict of interests. That's substantially
               | part of the problem with cache invalidation.
               | 
               | ETA: There's also a common category error here that I see
               | with people who grossly misunderstand DP as 'caching'.
               | First that they think that caching and memoization are
               | the same thing, which they emphatically are not, and
               | second that DP is about memoization, which means they
               | listened for the first ten minutes and then fell asleep.
               | 
               | Tabulation is much more powerful than memoization, so
               | equating DP with memoization is impoverishing it and
               | yourself.
               | 
               | But importantly here, the distinction between memoization
               | and caching is that memoization is scoped while caching
               | is global. With memoization a single call tree gets the
               | same answers _every time_ , which means the mental
               | process of Local Reasoning is maintained. If you know the
               | call graph for this operation you know everything that
               | interacts with it. You know if 'eviction' is even a
               | thing. You also know that your unit tests are testing the
               | code as it will behave under load, not some fictional
               | happy path.
               | 
               | tl;dr: If your calculation is important it should show up
               | in your data flow, not arrive out of band from a magic
               | oracle.
        
         | hinkley wrote:
         | Caching is a weapon of last resort. In most cases it is the
         | last major performance improvement you will see from your code,
         | even if other options were available, and any information
         | architecture problems you have are set in stone or will get
         | worse.
         | 
         | Because of the locality aspect, people will assume that it's
         | 'free' to look up a value multiple times instead of passing
         | these data dependencies between the relevant methods. This
         | turns your cache into global shared state, and once that
         | happens you can't turn the cache off again. There are other
         | problems with this arrangement as well.
         | 
          | The easiest to understand (and yet one I have still run into on
          | at least one project) is that if the working set for a problem
          | ever becomes comparable to the size of the cache (a ratio > 1
          | for single-task systems, > 1/n for systems with n concurrent
          | tasks), then you
         | can end up evicting data in the middle of a transaction. The
         | obvious consequence is that it causes performance to jump off a
         | cliff. Particularly bad behavior with LRUs but switching to a
         | different statistical model simply reduces the height of the
         | cliff. There is still a cliff. The less obvious consequence is
         | now you have concurrency bugs, because the code assumes that
         | decisions made about the data will go the same way on each
         | call, but due to eviction, two halves of the transaction may
         | make conflicting decisions with undefined behavior.
         | 
         | But speaking as the guy who fixes the perf issues other people
         | gave up on: Once you introduce caching you poison the
         | performance well. Flame charts lie when cache is used, and
         | after people rely on the cache it is lying in the other
         | direction when the cache is off, because you're calling that
         | path 5 times now and it's swamping other bottlenecks. This is
         | what I really mean when I say it's a weapon of last resort.
         | Because once you use it, it attempts to take any other options
         | off the table. Caching will be most of the last order of
         | magnitude in scalability that your system ever sees. Every
         | 'win' from there on will take more time and energy than most
         | people are willing to invest.
        
           | brightball wrote:
           | I like applying the Circuit Breaker approach to caching.
            | Rather than applying a cache everywhere, set up a cache
           | wrapper that only triggers if the accessed resource is slower
           | than a certain threshold.
           | 
           | The goal is for the cache to go unused most of the time, but
           | during times of unexpected system stress the caches will kick
           | in, provide relief to the system and reduce impact to users.
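            | 
            | Something like this, as a sketch (names and the threshold
            | handling are made up; a real version would also want expiry):
            | 
            |   import java.util.Map;
            |   import java.util.concurrent.ConcurrentHashMap;
            |   import java.util.function.Supplier;
            | 
            |   class SlownessTriggeredCache<V> {
            |       private final long thresholdMillis;
            |       private final Map<String, V> cache = new ConcurrentHashMap<>();
            | 
            |       SlownessTriggeredCache(long thresholdMillis) {
            |           this.thresholdMillis = thresholdMillis;
            |       }
            | 
            |       V get(String key, Supplier<V> loader) {
            |           V hit = cache.get(key);
            |           if (hit != null) return hit;  // populated only under stress
            | 
            |           long start = System.nanoTime();
            |           V value = loader.get();
            |           long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            | 
            |           // Only start caching once the underlying resource is
            |           // slow; under normal load every call goes straight
            |           // through and users see fresh data.
            |           if (elapsedMs > thresholdMillis) cache.put(key, value);
            |           return value;
            |       }
            |   }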
        
             | zasdffaa wrote:
             | > The goal is for the cache to go unused most of the time
             | 
             | Then you have a useful cache sitting there unused unless at
             | peak time - why? Just enable it always. This makes no
             | sense.
        
               | brightball wrote:
               | Depends on how large the cache is and how much users
               | expect instant updates to their data. If real-time or
               | near real-time is the expectation, then avoiding the
               | cache usage unless necessary preserves their experience
               | without incurring the hurdles of cache invalidation that
               | grow more complex the more you cache.
               | 
               | EDIT:
               | 
               | Another area where this approach is beneficial is with a
               | multi-tenant system where larger tenants querying more
               | data could have a much more significant performance
               | impact than other tenants on the system.
               | 
               | Time of day/week/month usage spikes can also cause a
               | performance degrade for everyone even if it's not
               | typically stressing the system.
               | 
               | This approach is just a mitigation to prevent the system
               | from getting overload during those times. It's not the
               | right approach for every situation. Just a spin on the
               | Circuit Breaker Pattern.
               | 
                | https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern
        
               | zasdffaa wrote:
               | WTF?
               | 
               | > If real-time or near real-time is the expectation, then
               | avoiding the cache usage
               | 
               | Because looking up a precalculated result in a hash table
               | is so much slower than not looking it up? And what,
               | recalculating it instead is faster than an O(1) hash
                | table lookup? If so, why are you caching?
               | 
               | I am just getting more confused by what people are saying
               | here.
        
               | hinkley wrote:
               | > I am just getting more confused by what people are
               | saying here.
               | 
               | Maybe the takeaway is that this shit is hard and everyone
               | is a little bit confused in their own way.
               | 
               | I'm not saying the only way to win is not to play, but I
               | can neither confirm nor deny rumors that I may have at
               | some point muttered this under my breath.
        
               | addaon wrote:
               | I do think you've hit on a point of confusion in the
               | parent, but...
               | 
               | In a (hard) real-time system the only thing that matters
               | is the worst-case execution time -- if your worst-case
               | time fits in the allowance you win, if it doesn't you
               | lose.
               | 
               | Adding a cache improves the average case (typically), but
               | hurts the worst case (always) -- your worst case goes
               | from "do expensive operation" to "do lookup in cache,
               | fail, do expensive operation." Even if you parallelize
               | misses ("do lookup in cache | do expensive operation, on
               | cache success cancel expensive operation, on cache miss
               | wait for expensive operation" you're generally adding at
               | least a few operations in the critical path.
        
               | salawat wrote:
               | I'm beginning to wonder whether my personal moniker of
               | "signal propagation" is nominatively equivalent to what
               | others call "cache invalidation".
               | 
               | I.e. you cannot by definition have a state change until
               | the signal thereof has traversed a link from A to B. No
               | signal, no change. If you have a cache for one system
               | that delays writes to a backing data store referenced by
               | other systems that are unaware of that cache, then you
               | have necessarily a partition problem.
               | 
               | Every problem in a distributed computing context can
               | generally be reduced to a signal propagation problem.
        
               | ilyt wrote:
               | It's only a problem if you're breaking deadlines. Like
               | having someone's status in communication app update with
               | 10s delay is probably not a a problem, so as long as you
               | don't ever cache it for longer than that, it is entirely
               | fine.
               | 
               | > If you have a cache for one system that delays writes
               | to a backing data store referenced by other systems that
               | are unaware of that cache, then you have necessarily a
               | partition problem.
               | 
               | In most cases people mean caching reads so the problem
               | here is stale reads, not delayed writes. It's not
                | necessarily the same: some component caching (or
                | essentially, queuing) a write means that once the write is
                | done, all readers (assuming they don't have any extra
                | cache) see it at once, while with caching reads (and no
                | invalidation), multiple readers can have different views.
        
               | hinkley wrote:
               | The one that really jams me (and others) up is that
               | signal propagation and transaction lifetimes line up
               | badly. Caches, being cheap imitations of ACID 'compliant'
               | databases, which themselves cheat all day every day to
                | make pretty numbers, almost never have repeatable-read
               | semantics, so if your code splits responsibility for
               | fetching a piece of data you have one of the problems I
               | complained about at/near the top level: Half of a
               | decision can be made with one version of a piece of data
               | and the other half with another, resulting in undefined
               | behavior. In a system that asserts that A = !B you can
               | have an interaction that did some of the activities for A
               | and some of the ones for B (because we read A and A' and
               | assumed they were identical).
        
               | zasdffaa wrote:
               | You're making a lot of assertions without any backup, and
               | they don't make much sense.
               | 
               | > Caches, being cheap imitations of ACID 'compliant'
               | databases
               | 
               | No they're not, not remotely.
               | 
               | > which themselves cheat all day every day
               | 
               | I don't believe a word of this, have you any actual
               | evidence (other than I believe Oracle's serialisable is
               | actually snapshot isolation IIRC)
               | 
               | > Half of a decision can be made with one version of a
               | piece of data and the other half with another, resulting
               | in undefined behavior. In a system that asserts that A =
               | !B you can have an interaction that did some of the
               | activities for A and some of the ones for B
               | 
               | And this just makes no sense, you're claiming that
               | databases are not producing the results they should, if
               | that's the case then I don't believe you understand
               | transaction isolation levels or indeed how database
               | queries work, in which case you have problems other than
               | and larger than caching.
               | 
                | This whole thread is just getting weirder.
        
               | hinkley wrote:
               | I've always been a bit of a fan of debounce for this but
               | many of the obvious implementations are cache based which
               | is essentially inviting people to do the thing I just
               | said I hate.
               | 
               | The cost of counting upvotes or logged in users is not
               | worth the incremental value it affords the user. These
               | are examples of situations where 'eventual consistency'
               | can be embarrassingly easy. If you split the source of
               | truth (SoT) from the system of record (SoR), then the
               | cost of updating the 'truth' can be amortized across the
               | writes to the SoR. The reads do a straight lookup from
               | the SoT, which was the truth at some point but is now the
               | fiction everyone is agreeing to. The distinction is that
               | everyone agrees to the same source of truth. Nobody ever
               | goes asking the SoR for a second opinion, except the SoR,
               | which updates the SoT when appropriate. The SoT is dumb,
               | the clients are dumber, only the code that manages the
               | SoR has any smarts to it.
               | 
               | Read traffic can escalate pretty high in such a system
               | without much stress.
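                | 
                | A sketch of that shape, with made-up names, for
                | something like a vote count:
                | 
                |   import java.util.List;
                |   import java.util.concurrent.ConcurrentHashMap;
                |   import java.util.concurrent.CopyOnWriteArrayList;
                | 
                |   class VoteCounts {
                |       // System of record: every individual vote event.
                |       private final List<String> votes =
                |           new CopyOnWriteArrayList<>();
                |       // Source of truth that readers use: a precomputed
                |       // count, updated on the write path only.
                |       private final ConcurrentHashMap<String, Long> counts =
                |           new ConcurrentHashMap<>();
                | 
                |       void recordVote(String itemId, String userId) {
                |           votes.add(itemId + ":" + userId);     // SoR write
                |           counts.merge(itemId, 1L, Long::sum);  // SoT update
                |       }
                | 
                |       // Readers do a dumb lookup; nobody asks the SoR for
                |       // a second opinion.
                |       long count(String itemId) {
                |           return counts.getOrDefault(itemId, 0L);
                |       }
                |   }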
        
               | brightball wrote:
               | Exactly. Another situation is a multitenant system where
               | some tenants are much larger than others, triggering
               | larger queries which potentially have a larger
               | performance impact.
        
               | ilyt wrote:
               | You can queue cache refresh in the background if the
               | object in cache is close to the "limit" of how fresh you
               | want it to be. Need a bit of cleverness to avoid
               | thundering herd problem but otherwise works pretty well.
               | 
               | Also if you really want near-real time you might want to
               | actually push the updates directly to cache instead of
                | relying on the client to trigger cache filling. Cases
                | where you need a ton of data _and_ it served real-time
                | _and_ it's the same object that constantly changes (and
                | not, say, a pointer to a position in a video stream) are
                | pretty rare.
        
               | hinkley wrote:
               | My problem with the term 'cache' here is that people have
               | expectations about what cache is and when you violate
               | them you tend to discover that they've returned the
               | favor.
               | 
               | > push the updates directly to cache instead of relying
               | client to trigger cache filling
               | 
               | You can use cache software to fulfill such a role but
               | it's not exactly a cache anymore, and keeping people from
               | treating it like one is some kind of cat herding.
               | 
               | You're using it as a data store at this point, which for
               | one has very different eviction semantics - exactly 1
               | copy of each concept, no more, no less. You can store
               | such things in Consul, for instance, and get very low
               | stale read latency. The cache implementation scales
               | higher but breaks sooner. Pick your poison.
        
               | ilyt wrote:
                | Well, it depends on your data model. Like, if you write
                | only a little (compared to reads), a write-through model
                | of cache (write to the cache as well as to the backing
                | store) will lose you a tiny % of hit ratio in exchange
                | for never having to worry about stale data.
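                | 
                | A write-through sketch, roughly (made-up names):
                | 
                |   import java.util.Map;
                |   import java.util.concurrent.ConcurrentHashMap;
                | 
                |   class WriteThroughCache<K, V> {
                |       private final Map<K, V> store;  // backing store
                |       private final Map<K, V> cache = new ConcurrentHashMap<>();
                | 
                |       WriteThroughCache(Map<K, V> store) { this.store = store; }
                | 
                |       // Every write goes to the backing store *and* the
                |       // cache, so reads through here never see data staler
                |       // than the last write through here.
                |       void put(K key, V value) {
                |           store.put(key, value);
                |           cache.put(key, value);
                |       }
                | 
                |       V get(K key) {
                |           return cache.computeIfAbsent(key, store::get);
                |       }
                |   }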
               | 
               | > My problem with the term 'cache' here is that people
               | have expectations about what cache is and when you
               | violate them you tend to discover that they've returned
               | the favor.
               | 
                | That's true of everything that has any level of
                | complexity; just see how many people got transaction
                | isolation wrong just because they had some simplistic
                | view of how it works in the database they use.
        
             | pyrolistical wrote:
             | Sounds more like a dark buffer anti pattern
        
             | ilyt wrote:
             | ...instead of having better user experience all the time ?
             | And have more code to manage on top of that ?
             | 
             | Sorry but that sounds like terrible idea.
             | 
              | And when the system starts being slow _the cache is empty_ ,
              | because you waited for the system to slow down to start
              | filling it.
        
             | bawolff wrote:
             | Having a different code path that only triggers during high
             | load feels pretty risky to me.
        
               | bell-cot wrote:
               | _In theory_ , the best way to manage this situation may
               | be with a grayzone - if all is well, then there's only a
               | (say) 5% chance of going to cache. (To at least catch
               | "the cache is broken" errors sooner.) As things get
               | worse, that chance goes up as you cross-fade to mostly
               | using the cache.
               | 
               | Important Warning: If you do not have a very good
               | understanding of your overall system, or neglect to
               | carefully dampen the cross-fade functions - then the
               | whole thing can demonstrate fast and unpredictable swings
               | in behavior. Note that being subpoenaed to testify in a
               | "Why did the Crater City chemical plant blow up?"
               | investigation is usually bad for one's career.
        
               | hinkley wrote:
               | That's better but composition is always the devil in the
               | details. Which might be what you mean when you say:
               | 
               | > or neglect to carefully dampen the cross-fade functions
               | 
               | People with a 'caching problem' don't have one cache,
               | they have a bunch of them. So that 1 in 20 chance of
               | getting grey zoned on one cache, is 1 in 400 chance of
               | getting grey zoned on two caches, and a 1 in 8000 chance
               | to get grey zoned on 3. That is a very small fraction but
               | when you're dealing with the Law of Large Numbers that's
               | something that's happening at least once a second for
               | many services.
               | 
               | But if you're using this you probably have a lot more
               | than 3 caches, and then we have to talk about the odds
               | that any request gets grey zoned at least once. which is
               | going to be 1 - (19/20)^n. At six caches you're already
               | over 1 in 4, which is going to potentially change your
               | mean response time quite a bit.
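                | 
                | As a quick sanity check on that arithmetic (sketch):
                | 
                |   class GreyZoneOdds {
                |       // Chance a request hits the 5% grey zone on at least
                |       // one of n independent caches: 1 - 0.95^n.
                |       static double atLeastOnce(int n) {
                |           return 1 - Math.pow(0.95, n);
                |       }
                | 
                |       public static void main(String[] args) {
                |           System.out.println(atLeastOnce(6));  // ~0.265
                |       }
                |   }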
        
               | bell-cot wrote:
               | > "...which is going to potentially change your mean
               | response time quite a bit."
               | 
               | My experience is that the changing mean response time
               | isn't nearly as bad as the changing standard deviation.
               | 
               | The really big problems tend to arise when you have
               | unsuspected or under-appreciated positive feedback loops
               | - whether in the software, or in associated physical
               | systems and human behaviors. Or resonance effects. Those
               | situations are where good dampening gets really
               | important.
        
           | dist1ll wrote:
           | If I had to pick the two worst sources of bugs I can imagine,
           | it would be (1) concurrency and (2) caches. Those functions
           | often require extensive performance and load testing and have
           | tons of non-deterministic failure modes.
        
           | zasdffaa wrote:
           | In my _extensive_ experience of optimising stuff...
           | 
           | > Caching is a weapon of last resort. In most cases it is the
           | last major performance improvement
           | 
           | ...is wrong. It is one of the very first things I reach for
           | because it is the most effective and simplest, once you've
           | eliminated O(n^2) behaviour etc. It's not a magic wand but
           | nothing is.
        
             | notinfuriated wrote:
             | In my _extensive_ experience of optimizing stuff it is
             | right.
             | 
             | I cannot even count how many issues were created downstream
             | of the decision to start caching X, Y, or Z. At a point,
             | every other software engineer gets the bright idea to cache
             | their special little function or request or database call.
             | Soon enough, there are N layers of cache responsible for
             | delivering some special piece of data. The engineers build
             | tooling to clear those caches on updates, either manually
             | or automatically. Both suffer from issues, the latter
             | occasionally failing to perform and just causing general
             | confusion about why stale data is returning, and the former
             | being far worse, recruiting dozens of business stakeholders
             | into the problem of knowing-how-to-expire-cache. They start
             | making their own internal wiki pages with steps or various
             | cache tools to clear, often incorrect steps because they
             | got lucky one day by clearing cache X and not cache Y that
             | they did not know about.
             | 
             | Overall, in my experience, allowing software engineers to
             | do caching as they wish without any real guardrails has
             | caused massive problems most likely on the order of N!. I
             | cannot roll my eyes far enough into the back of my head
             | when I hear someone suggest caching as a solution any
             | longer, at least not on an application that has more than a
             | dozen business stakeholders managing data and expecting it
             | to be updateable in any reasonable way so that customers
             | see the most recent data when the business wants it.
        
             | PaulHoule wrote:
              | Caching can turn O(n^2) algorithms into much faster
             | algorithms. I would point to how transformative tabling is
             | in Prolog, not to mention memoization rescuing the
             | calculation of fibonacci numbers by recursion, which
             | otherwise would be gross malpractice.
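              | 
              | E.g. the textbook memoized version (a sketch):
              | 
              |   import java.util.HashMap;
              |   import java.util.Map;
              | 
              |   class Fib {
              |       private static final Map<Integer, Long> memo = new HashMap<>();
              | 
              |       // Naive recursion is exponential; memoizing makes it
              |       // linear because each fib(k) is computed at most once.
              |       static long fib(int n) {
              |           if (n < 2) return n;
              |           Long cached = memo.get(n);
              |           if (cached != null) return cached;
              |           long result = fib(n - 1) + fib(n - 2);
              |           memo.put(n, result);
              |           return result;
              |       }
              |   }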
        
               | fwip wrote:
                | Caching is fine if there's no state involved, e.g. pure
               | functions like the fibonacci sequence.
               | 
               | As soon as your cache depends on a mutable state, things
               | get really hairy. Cache invalidation is one of the 2 hard
               | problems in computer science, after all.
        
               | samus wrote:
               | These are very well-behaved instances of caching. They
               | are so well-behaved that they can be fully described with
               | a little bit of analysis. Once I/O is involved or the
               | load isn't well-known beforehand, it becomes much more
               | difficult. Lazy I/O in Haskell is a very divisive topic
               | for a reason.
        
             | hinkley wrote:
             | That's not a very convincing argument. "I use it because it
             | works" without refuting any reasons I presented as to why
             | it "works" but breaks your other options.
             | 
             | How do you teach people to effectively use a profiler on
             | code with pernicious caching?
             | 
             | Part of why I'm the "Cleaner" is because I've worked with
             | any number of people who will present "evidence" that
             | everything that can be done has been done. Here are the
             | metrics from a year ago when we turned on the cache. Here
             | are the flame charts. See? Nothing to see here. They get
             | really uncomfortable six months later when I'm reporting
             | 2-3x in code that can't be optimized further. And in each
             | case everything was made more difficult by the people ahead
             | of me chumming the waters.
             | 
             | They were partially correct. There was in fact nothing left
             | for them to see, so nothing that they knew to do. But
             | "there's nothing to see here" and "there's nothing here"
             | are two very, very different statements.
        
               | trgn wrote:
               | There's a somewhat tangentially related heuristic to your
                | points imho. When writing code, when in doubt, spend CPU
                | rather than memory; it will generally yield leaner, better
               | software. Easier to profile, debug, and optimize. Much
               | easier to scale in production environments too.
               | 
               | Code which optimizes for fast running time, tends to lean
               | heavily into using memory-bloating data structures and
               | caches. These are often poison for aggregate performance
               | in real world environments, and like you said among other
                | things, reduce visibility into the workings of the system.
               | 
               | There's some implications of this stance; e.g.
               | optimizations to reduce memory footprint are generally
               | never wasted, while optimizations to squeeze out a few ns
               | of runtime may have many more unexpected consequences.
        
               | ilyt wrote:
               | > How do you teach people to effectively use a profiler
               | on code with pernicious caching?
               | 
                | Profile the function that is called if the object is not
                | found in the cache? Have a pluggable cache with a plugin
                | that just doesn't cache? That is not a hard problem to
                | solve.
               | 
               | > They get really uncomfortable six months later when I'm
               | reporting 2-3x in code that can't be optimized further.
               | And in each case everything was made more difficult by
               | the people ahead of me chumming the waters
               | 
                | Well, they delivered (I'm guessing) a 10x improvement in a
                | month by adding a cache. Caching is an entirely reasonable
                | first step and it can be good enough to be the last one.
               | 
               | And frankly ability to use profiler and knowing some
               | architectural optimizations seems to be pretty rare
               | skill.
        
               | zasdffaa wrote:
               | Exactly! People here seem to be making a complex thing
               | out of a simple thing.
        
               | Aeolun wrote:
               | I kinda feel that your convincing counterargument is just
               | 'I have a different experience'.
               | 
               | I've never seen people reach for caches first, and then
               | discard all further optimizations.
               | 
               | Or rather, I've seen them do that, but not out of some
               | misplaced feeling that it was the only solution, just the
               | one that was easiest to accomplish in very little time.
        
               | cogman10 wrote:
               | The issue with caches is they are easy to apply. You can
               | easily drop a cache in any place you load a resource
               | without significant code changes.
               | 
               | I tend to think they are ok weapons of sometimes first
               | resort (rarely changing resource with heavy system
               | impact). But you'll definitely get better results simply
               | by rethinking and rearchitecting stuff. (Did you really
               | need to load the database just to lookup a single value?)
        
               | hinkley wrote:
               | > You can easily drop a cache in any place you load a
               | resource without significant code changes.
               | 
               | Only when the cache is brand new. The problem with people
                | with cache addiction is that they think caches are free. So
               | once they know a cache is there, they begin to lean on
               | it, and lean on it. I've seen it with many different
               | teams in a number of verticals. Sometimes while I'm
               | there, sometimes before I arrived. It's the people not
               | the company.
               | 
               | This illusion that you can always turn it off later is
               | dangerous. That's only true if you have an impeccable
               | information architecture when you start. And if you have
               | that, you've already done a lot of heavy lifting before
               | caching becomes your only option.
        
               | zasdffaa wrote:
                | It's difficult to refute your argument because I don't
               | understand your points:
               | 
               | > Because of the locality aspect, people will assume that
               | it's 'free' to look up a value multiple times instead of
               | passing these data dependencies between the relevant
               | methods
               | 
               | what does that even mean? What does 'data dependencies'
                | mean specifically?
               | 
               | > then you can end up evicting data in the middle of a
               | transaction
               | 
               | So in the middle of a transaction, instead of looking up
               | data from a key - very cheap - you should do ... what?
               | Recalculate it instead?
               | 
               | > The less obvious consequence is now you have
               | concurrency bugs, because the code assumes that decisions
               | made about the data will go the same way on each call,
               | but due to eviction, two halves of the transaction may
               | make conflicting decisions with undefined behavior
               | 
               | Assuming I even understand this, you are saying where
                | there's concurrency there's a risk of pulling stale data
               | out of a cache. Well, true. That's just something you
               | have to be careful of. Add a Tx reference to the cache
               | key.
               | 
               | > Once you introduce caching you poison the performance
               | well.
               | 
               | Not my experience at all. I can't understand why you're
               | saying this.
               | 
               | > Flame charts lie when cache is used
               | 
               | How so?
               | 
               | > and after people rely on the cache it is lying in the
               | other direction when the cache is off, because you're
               | calling that path 5 times now and it's swamping other
               | bottlenecks
               | 
               | I've no idea what this means
               | 
               | > This is what I really mean when I say it's a weapon of
               | last resort. Because once you use it, it attempts to take
               | any other options off the table
               | 
               | Heh! Absolutely not! Other optimisations are orthogonal
               | to caching IME:
               | 
               | Better algo - orthogonal
               | 
               | Go lower level - orthogonal
               | 
               | Go multicore - orthogonal
               | 
               | Noobish SQL query is begging for a rewrite - orthogonal
               | 
               | Use L1/2/3/ chip cache better, or RAM, by careful data
               | layout - orthogonal
               | 
               | Use a better library - orthogonal
               | 
               | Use better data structure eg. than linked list -
               | orthogonal
        
               | svieira wrote:
               | >> Because of the locality aspect, people will assume
               | that it's 'free' to look up a value multiple times
               | instead of passing these data dependencies between the
               | relevant methods
               | 
               | > what does that even mean? What does 'data dependencies'
               | mean specicially?
               | 
               | It means that instead of having a method like
               | `entryPoint(int)` that looks up a `Foo` by id and then
               | passes the `Foo` around (`verify(Foo)`, `Foo
               | update(Foo)`) you instead wind up with a system that
               | passes around an `int` (`verify(int)`, `/* Unused */ Foo
               | update(int)`) and each layer then looks up `Foo` in the
               | `FooService` (because `FooService` has a cache and
               | subsequent lookups should be essentially-free).
               | 
               | Even worse when `FooService` is abstracted behind a
               | thread-local like `ThreadLocal<Foo> currentFoo`, the
               | implementation of which is
               | `fooService.findById(currentFooId.get()) /* cheap,
               | because fooService has a cache */`.
               | 
               | My own take on this is that caches are global-by-default
               | when you really want local-by-default (e. g. dynamic
               | programming instead of a `static final Cache
               | theCacheForTheEntireProgram`). Local-by-default caches
               | are easy to make transactional (when the call stack
               | returns the cache is gone) and don't grow in an unbounded
               | manner in the general case.
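                | 
                | Concretely, a sketch (all names hypothetical):
                | 
                |   class Foo {
                |       final int id;
                |       Foo(int id) { this.id = id; }
                |   }
                | 
                |   class FooService {
                |       // Stands in for "has a cache, so lookups are free".
                |       Foo findById(int id) { return new Foo(id); }
                |   }
                | 
                |   class Handlers {
                |       static final FooService fooService = new FooService();
                | 
                |       // Cache-leaning style: each layer re-resolves by id.
                |       static void entryPointById(int fooId) {
                |           verify(fooId);
                |           update(fooId);
                |       }
                |       static void verify(int fooId) {
                |           Foo foo = fooService.findById(fooId);  // lookup #1
                |       }
                |       static void update(int fooId) {
                |           Foo foo = fooService.findById(fooId);  // lookup #2
                |       }
                | 
                |       // Data-dependency style: resolve once, pass it down.
                |       static void entryPoint(int fooId) {
                |           Foo foo = fooService.findById(fooId);  // only lookup
                |           verify(foo);
                |           update(foo);
                |       }
                |       static void verify(Foo foo) { /* use foo directly */ }
                |       static void update(Foo foo) { /* use foo directly */ }
                |   }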
        
               | hinkley wrote:
               | Yes!
               | 
               | Local by default also has the feature/problem of making
               | you look at the scale of the data involved in your query
               | up front, before it's having a 100-1000 req/s hitting it
               | in production. It's harder to pretend this large
               | dependency graph will just magically take care of itself
               | (or worse, be someone else's problem after you've
               | declared victory) when it's staring you in the face.
        
         | gav wrote:
         | > Instead, it's pointing out a risk that sometimes they are too
         | good, leading to a system that can't function when the normal
         | level of locality and memoization are not available.
         | 
         | And at that point cache management starts to be a huge problem.
         | Clearing caches becomes a death spiral because your system
         | can't actually support your normal load.
         | 
         | For example I worked with a large retailer, who had a giant
         | monolithic enterprise Java ecommerce system. It was so slow
         | that product detail pages took upwards of 30 seconds to render.
         | There was a cache in front of that system (Ehcache), then a
         | page-level cache (Varnish) in front of that, then finally
         | Akamai took 98% of the load.
         | 
         | Rollouts were terrifying, you cleared the cache and the site
         | might come back up. Changing Akamai configuration was also
         | similarly risky.
         | 
         | It's critical that systems don't become cache dependent for
         | load, they should be a performance boost, not a required
         | component.
        
       | nly wrote:
       | Caching is easy, cache invalidation is hard.
        
         | latchkey wrote:
         | "There are two hard things in computer science: cache
         | invalidation, naming things, and off-by-one errors."
         | 
         | https://twitter.com/codinghorror/status/506010907021828096
        
           | withinboredom wrote:
           | I have no idea why you're quoting that tweet. This saying is
           | much older than 2014.
        
       | PaulKeeble wrote:
       | Multiple times I have found the impact of cache performance to be
       | dramatic. I find that the usual optimisations in three aspects
       | are usually obvious to achieve and can bring surprisingly big
       | improvements.
       | 
        | The first is that RAM accessed randomly or in the wrong direction
        | performs very poorly compared to sequential access in order. The
        | same is true of cache because it's much easier to predict and
        | prefetch data when it's accessed serially. The effect is so
        | dramatic that it's sometimes worth "accessing" data you don't need
        | in order to keep the access pattern serial and benefit from this
        | hardware optimisation. At times, for example, I have found tree
        | maps and list maps perform better than hash maps, as the cheap
        | serial search in memory order surprisingly outweighs the O(log(n))
        | or O(n) cost and beats the overhead of the hash.
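        | 
        | A sketch of that first point: the same work, traversed in two
        | different orders (sizes and timings left to the reader):
        | 
        |   class AccessOrder {
        |       // Row-by-row walks memory sequentially and prefetches well.
        |       static long sumRowMajor(int[][] m) {
        |           long sum = 0;
        |           for (int i = 0; i < m.length; i++)
        |               for (int j = 0; j < m[i].length; j++)
        |                   sum += m[i][j];
        |           return sum;
        |       }
        | 
        |       // Column-by-column strides across rows and misses far more,
        |       // even though it does exactly the same additions.
        |       static long sumColumnMajor(int[][] m) {
        |           long sum = 0;
        |           for (int j = 0; j < m[0].length; j++)
        |               for (int i = 0; i < m.length; i++)
        |                   sum += m[i][j];
        |           return sum;
        |       }
        |   }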
       | 
        | If you have a data structure that is larger than optimal and
        | overflows cache lines, it can have a big impact on cache
        | utilisation and performance if the algorithm doesn't need all the
        | fields. Object orientation weighs quite heavily here: certain
        | designs can produce very large objects, and these often get poor
        | cache usage.
       | 
        | Unnecessary allocation and deallocation of memory in algorithms
        | often results in the cache getting thrashed; avoiding this and
        | reusing the same memory can result in better hot-cache usage and
        | reduced garbage collection pressure.
       | 
       | Most profilers can give you a good idea of the allocations but
       | the other two are often harder to find and mostly just show up in
       | cache misses.
        
       ___________________________________________________________________
       (page generated 2022-11-14 23:00 UTC)