[HN Gopher] Optimizations can have unexpectedly large effects wh...
___________________________________________________________________

Optimizations can have unexpectedly large effects when combined
with caches

Author : zdw
Score  : 168 points
Date   : 2022-11-14 16:30 UTC (6 hours ago)

(HTM) web link (justinblank.com)
(TXT) w3m dump (justinblank.com)

| nerpderp82 wrote:
| We really don't know what causes perf differences; layout can have
| more of an effect than going from -O1 to -O3.
|
| "Performance Matters" by Emery Berger
| https://www.youtube.com/watch?v=r-TLSBdHe1A

| tmtvl wrote:
| I wonder if he mentions running from a different directory and
| running as a different user to highlight how annoying the effect
| from the environment is, as USER and PWD are part of the
| environment.

| lob_it wrote:
| Caching db queries and media assets already got a big boost a
| decade ago going from spinning rust to SSD to even faster NVMe.
|
| Optimizing DB queries on spinning rust was an exciting experience,
| though. Chasing milliseconds with caching and warming caches up
| still brings back good memories :)
|
| Scale and architecture make new designs much easier. 1 MB of
| code... A 100 MB database and 1 GB of media assets are becoming
| relics like "big iron". Blank-slate canvases should skip a lot of
| it.
|
| Technical debt only gets harder to manage over time.

| ISL wrote:
| This happens in life, too. When your counterparts lose continuity
| with your work, they turn to other work, and the context-switch to
| return to working with you becomes increasingly hard.
|
| Helps to be quick.

| joosters wrote:
| "Almost all programming can be viewed as an exercise in caching."

| hinkley wrote:
| About 15 years ago I was on a team that was having so many
| problems with caching that I began to see it for the liability
| that it is. It started with a 2-4 week delay each release cycle
| while they fixed caching bugs the testers had found, but once I
| joined the project I saw how deep the problem went.
|
| A number of their features invited a full table scan, which is
| fundamentally incompatible with an LRU cache and in particular
| produces disastrous stair-step behavior as the working set grows.
| Note that this was also a couple of years before anyone had heard
| of memcached, so these were DIY implementations with multiple
| potential sources of bugs.
|
| About the time I found my feet on the project, someone wanted to
| add caching to yet another workflow that was taking 30 seconds to
| run. It was an admin workflow, so there was a little tolerance of
| slow response times, but not 30 seconds. I volunteered for the
| story because I thought we could do better than more caches.
|
| Through conversation, step debugging, and profiling I found the
| spot where they had wanted the caching. What the feature was doing
| was two relatively expensive lookups of the same data, filtering
| it on two very different dimensions, and returning the
| intersection of the two. In typical ORM fashion, both of the
| concerns were taking it upon themselves to fetch their own data
| instead of having it handed to them. Since they were both called
| from the same function, I hoisted the lookup out of the callees
| into the caller, sending the same list to both. Then I changed the
| code again to feed the result of call 1 to call 2, to make it work
| more like consecutive filters.
|
| The end result was that the page now loaded in 3 seconds, and I
| was able to write better unit tests for this code.
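|
| The shape of that change, roughly (a minimal sketch with
| hypothetical names, not the actual code; assumes Java 17+):
|
|     import java.util.List;
|     import java.util.function.Predicate;
|
|     class Report {
|         record Row(int id, String region, String status) {}
|
|         // Stand-in for the relatively expensive ORM lookup.
|         static List<Row> expensiveFetch() {
|             return List.of(new Row(1, "NY", "open"),
|                            new Row(2, "LA", "closed"));
|         }
|
|         // Before: each concern fetched its own copy of the data,
|         // and the caller intersected the two filtered results.
|         static List<Row> before(Predicate<Row> a, Predicate<Row> b) {
|             var byA = new java.util.ArrayList<>(
|                 expensiveFetch().stream().filter(a).toList());
|             var byB = expensiveFetch().stream().filter(b).toList();
|             byA.retainAll(byB);              // intersection of the two
|             return byA;
|         }
|
|         // After: hoist the fetch into the caller and chain the
|         // filters, so the second one runs on the narrowed list.
|         static List<Row> after(Predicate<Row> a, Predicate<Row> b) {
|             List<Row> rows = expensiveFetch();   // one shared lookup
|             return rows.stream().filter(a).filter(b).toList();
|         }
|     }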
| One might expect that most of that improvement came from running
| the second filter on a much smaller list, but that expectation
| would be wrong. The majority of the improvement came from the
| hoisting, not the chaining. Given the amount of work in the
| filters, the initial lookup shouldn't have accounted for more than
| a third of the page load time, but just sharing the same data
| dropped the response time to around 6 seconds. Chaining the
| filters ended up only saving 3 seconds.
|
| The problem with large data sets is that CPUs have caches too. And
| in a garbage-collected programming language, there are additional
| costs to carrying around duplicated data, particularly if it is in
| tree form like this was. Most likely we were smashing the hell out
| of the CPU caches and introducing extra GC pressure to boot. By
| passing in the same object tree, the second call followed warm
| and/or hot paths through heap memory.
|
| I'd had a similar but less dramatic experience a few years before
| that, where I identified that a function, which accounted for 10%
| of the runtime, was being invoked roughly twice as often as the
| problem required. By that point 5% looked like low-hanging fruit,
| so I eliminated the duplicate calls and reran the benchmarks. But
| the results didn't make sense, so I tweaked and prodded and
| fiddled. The new time was 20% lower than the old time, for what
| should have been 5%. This is already way too long so I won't go
| into the reasons here, but in addition to the scenarios I
| mentioned for the previous issue, there are also problems of
| timing accuracy in profilers, and cyclic use patterns that can
| misattribute the expense of a call to siblings or the parent. A
| function that creates many objects may never reach the GC
| threshold, but a large allocation later in the sequence diagram
| may. Or collide with cache lines that are needed again later in
| the call graph.

| bawolff wrote:
| I feel like the moral of this story is less "caching is a
| liability" and more garbage in, garbage out. Caching can help
| things when used appropriately, but it's not magic.

| nerpderp82 wrote:
| Your performance topology is dictated by multiple hierarchies of
| caches, many of which you might not be aware of. If they disappear
| your application may no longer function. The cache is no longer an
| optimization but a critical component. Figuring out how to _fill_
| it might be harder than how to invalidate it.

| hinkley wrote:
| > Figuring out how to fill it might be harder than how to
| invalidate it.
|
| This always seems to be a problem for someone else or for another
| day, which is part of the frustration.
|
| Cold caches killing your app are a considerable problem, doubly so
| if you deploy in the Cloud, because now you aren't necessarily
| guaranteed to be deploying into a hot data center. I think there's
| an unspoken assumption we haven't really shaken: if you came to
| work and the power was out, you not only expected none of the
| software to work, you peer-pressured other people into not
| complaining too loudly about it when the power came back on. Today
| is a loss. Everyone can see it.
|
| But if, for instance, I work on a SaaS project in the Cloud, not
| only can I just lose a region, but for many problem domains my
| customers have mostly local customers, so I have diurnal cycles of
| traffic that are customer-specific.
| All the non-bot traffic is in New York until 6am GMT, say. I might
| want to spin regions down pretty aggressively during parts of the
| day, but if spinning back up results in a thundering herd problem
| because "the cache makes us fast"? Well then the value of my
| circuit breaker alerts drops to less than zero.
|
| That's just the first time things break. Caches and capacity
| planning are also at odds, because capacity planning is about
| planning your worst case and caches are almost always about the
| happy path.

| hinkley wrote:
| Silver bullets kill everyone, not just werewolves.
|
| Garbage In, Garbage Out is not wrong per se, but we have in the
| real world the concept of an attractive nuisance, where the victim
| is not entirely at fault for their injuries. The thing, being
| prone to invite misuse and misadventure, bears some responsibility
| for protecting people from themselves.
|
| Caching is an attractive nuisance. These days it's simply replaced
| Global Shared State, which used to be the boogeyman, but sadly we
| haven't collectively clued in on them being equivalent.

| bawolff wrote:
| In this situation it was stuff that would be terrible both with
| and without caching, and that didn't have much to do with caching
| itself; the caching just provided a bit of a band-aid. I think the
| more apt metaphor might be "seatbelts make people drive
| recklessly".
|
| Would fast computers also be an attractive nuisance because you
| don't have to think about performance as much?

| hinkley wrote:
| > Would fast computers also be an attractive nuisance because you
| don't have to think about performance as much?
|
| Sometime in the last week or two someone asserted just that. I
| didn't dig into that discussion very far, but on the face of it?
| Sure. Certainly grabbing the fastest hardware available to Man is
| an expensive proposition, and you should hold that in reserve. You
| don't get to play that card very often, and much less so today
| than during the Golden Era. Post Moore's Law for sure, but to an
| extent post-Dennard as well, since the power draw of a facility is
| a significant cost center. And without Dennard scaling you can
| make machines more powerful, but keeping them efficient gets
| expensive.

| dathinab wrote:
| Article: a speedup led to avoided cache misses, which led to a
| further speedup, which led to still fewer cache misses.
|
| This kind of interaction is important to keep in mind.
|
| It applies to all kinds of caches in all kinds of situations - to
| some degree even the instruction and data caches of modern CPUs.
|
| But most interesting is that in complex systems, in (very, very)
| rare cases, the "whole system effect" can go the other way around
| too (due to a now-faster component thrashing the cache for other
| components in a way which makes the system as a whole slower).

| RicoElectrico wrote:
| What kind of batch process could that be?

| mananaysiempre wrote:
| Judging by the off-hand mentions in the article, something along
| the lines of ingesting the cargo manifest of a large ship?

| mjb wrote:
| I really love this point.
|
| For most applications, the optimal size of a (time-based) cache is
| around the point where the cost of keeping an item in the cache
| equals the cost of accessing it again. This is a point that Jim
| Gray made in the classic "The 5 Minute Rule for Trading Memory for
| Disc Accesses"
| (http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf) back in
| 1986.
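|
| Back-of-envelope, the break-even works out like this (a minimal
| sketch; every constant below is an illustrative assumption, not a
| value from the paper):
|
|     // Keep an item cached if it is re-read more often than once
|     // every breakEvenSeconds: below that interval, RAM is cheaper
|     // than the disk-arm time the repeated access would consume.
|     class FiveMinuteRule {
|         public static void main(String[] args) {
|             double pricePerDrive = 200.0;  // $/drive (assumed)
|             double iopsPerDrive = 100.0;   // random reads/s (assumed)
|             double pricePerMbRam = 0.002;  // $/MB of RAM (assumed)
|             double pageMb = 4.0 / 1024;    // 4 KB page
|
|             double dollarsPerAccessPerSec = pricePerDrive / iopsPerDrive;
|             double dollarsToCachePage = pricePerMbRam * pageMb;
|             double breakEvenSeconds =
|                 dollarsPerAccessPerSec / dollarsToCachePage;
|             // ~256,000 s (~3 days) with these made-up numbers;
|             // Gray's 1986 prices gave roughly five minutes.
|             System.out.printf("cache pages re-read within ~%.0f s%n",
|                               breakEvenSeconds);
|         }
|     }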
| A side effect of that is that systems that run faster, and
| therefore access each cached item more often, get more benefit out
| of a cache by pushing that set point out further. Whether that
| happens in any given system depends a lot on the access patterns,
| and especially temporal locality within those access patterns, but
| the effect is real in a lot of different kinds of systems.
|
| > It's common wisdom that systems with caches can fail badly when
| the caches are cold. When every request misses the cache, the
| system is overloaded, and never recovers. What I observed was the
| mirror image.
|
| The bottom line is that both locality and memoization are huge
| contributors to the performance of systems, both positive and
| negative. When folks (including me) point out the risks of caches,
| we're not implying that caches are always bad. Instead, it's
| pointing out a risk that sometimes they are too _good_, leading to
| a system that can't function when the normal level of locality and
| memoization is not available.

| bombcar wrote:
| Reminds me of:
|
|     There are only two hard things in Computer Science: cache
|     invalidation, and naming things, and off-by-one errors.
|
| What people don't realize is caches are much harder than just
| invalidation.
|
| I'm a fan of continually testing (and running) things without
| cache at all, just to verify what they do; caching can often cover
| a multitude of sins that you really don't need to be doing.

| hinkley wrote:
| Caches are global shared state. Correction: _lossy_ global shared
| state.
|
| "Naming things" when they are in a global namespace is really
| hard, to the point where I'm starting to think that cache
| invalidation and naming things are not two problems but the same
| problem in two manifestations.

| kwhitefoot wrote:
| > Caches are global shared state
|
| Not necessarily. I designed an object-oriented simulation system
| which had a lot of calculated properties where the calculation was
| a substantial amount of work but the result would be queried many
| times before the input to the function changed. Caching the
| results inside the object provided a substantial speedup (orders
| of magnitude) while keeping the cache completely local and hidden
| from the client code and keeping the program very modular.
|
| Keeping the cache inside the objects allowed us to keep the parts
| of the program neatly separated and minimized the amount of
| knowledge that client code, and more importantly client coders,
| needed to have about the objects they were using.

| bmm6o wrote:
| If we're splitting hairs, your situation sounds more like
| memoization, where you have saved the output of an expensive but
| deterministic calculation. With the correct assumptions, this can
| never result in incorrect behavior. Worst case is excessive memory
| consumption. Whereas a stale cache is something that needs to be
| taken into account.

| hinkley wrote:
| What happens when the system of record changes the answer halfway
| through the transaction?
|
| By caching answers to calculations you've moved the source of
| truth from the function to the cache. But the cache is also acting
| as a proxy for the system of record, which is a conflict of
| interests. That's substantially part of the problem with cache
| invalidation.
|
| ETA: There's also a common category error here that I see with
| people who grossly misunderstand DP as "caching".
| First, they think that caching and memoization are the same thing,
| which they emphatically are not; and second, they think DP is
| about memoization, which means they listened for the first ten
| minutes and then fell asleep.
|
| Tabulation is much more powerful than memoization, so equating DP
| with memoization is impoverishing it and yourself.
|
| But importantly here, the distinction between memoization and
| caching is that memoization is scoped while caching is global.
| With memoization a single call tree gets the same answers _every
| time_, which means the mental process of Local Reasoning is
| maintained. If you know the call graph for this operation, you
| know everything that interacts with it. You know if "eviction" is
| even a thing. You also know that your unit tests are testing the
| code as it will behave under load, not some fictional happy path.
|
| tl;dr: If your calculation is important it should show up in your
| data flow, not arrive out of band from a magic oracle.

| hinkley wrote:
| Caching is a weapon of last resort. In most cases it is the last
| major performance improvement you will see from your code, even if
| other options were available, and any information architecture
| problems you have are set in stone or will get worse.
|
| Because of the locality aspect, people will assume that it's
| "free" to look up a value multiple times instead of passing these
| data dependencies between the relevant methods. This turns your
| cache into global shared state, and once that happens you can't
| turn the cache off again. There are other problems with this
| arrangement as well.
|
| The easiest to understand (and yet I've failed on at least one
| project) is that if the working set for a problem ever becomes
| comparable to the size of the cache (a ratio > 1 for single-task
| systems, > 1/n for systems with n concurrent tasks), then you can
| end up evicting data in the middle of a transaction. The obvious
| consequence is that it causes performance to jump off a cliff. The
| behavior is particularly bad with LRUs, but switching to a
| different statistical model simply reduces the height of the
| cliff. There is still a cliff. The less obvious consequence is
| that now you have concurrency bugs, because the code assumes that
| decisions made about the data will go the same way on each call,
| but due to eviction, two halves of the transaction may make
| conflicting decisions with undefined behavior.
|
| But speaking as the guy who fixes the perf issues other people
| gave up on: once you introduce caching you poison the performance
| well. Flame charts lie when the cache is used, and after people
| rely on the cache they lie in the other direction when the cache
| is off, because you're calling that path 5 times now and it's
| swamping other bottlenecks. This is what I really mean when I say
| it's a weapon of last resort. Because once you use it, it attempts
| to take any other options off the table. Caching will be most of
| the last order of magnitude in scalability that your system ever
| sees. Every "win" from there on will take more time and energy
| than most people are willing to invest.

| brightball wrote:
| I like applying the Circuit Breaker approach to caching. Rather
| than applying a cache everywhere, set up a cache wrapper that only
| triggers if the accessed resource is slower than a certain
| threshold, as in the sketch below.
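|
| A minimal sketch of such a wrapper (hypothetical API; a real one
| would use rolling latency statistics, TTLs, and a size bound
| rather than a single sample):
|
|     import java.util.Map;
|     import java.util.concurrent.ConcurrentHashMap;
|     import java.util.function.Function;
|
|     class CircuitBreakerCache<K, V> {
|         private final Function<K, V> source;      // the real lookup
|         private final long thresholdNanos;        // "slow" cutoff
|         private final Map<K, V> cache = new ConcurrentHashMap<>();
|         private volatile boolean tripped = false; // breaker state
|
|         CircuitBreakerCache(Function<K, V> source, long thresholdMillis) {
|             this.source = source;
|             this.thresholdNanos = thresholdMillis * 1_000_000L;
|         }
|
|         V get(K key) {
|             if (tripped) {
|                 V hit = cache.get(key);
|                 if (hit != null) return hit;  // serve stale-but-fast
|             }
|             long start = System.nanoTime();
|             V value = source.apply(key);
|             // Trip (or reset) the breaker based on observed latency.
|             tripped = (System.nanoTime() - start) > thresholdNanos;
|             cache.put(key, value);   // refreshed on every real read
|             return value;
|         }
|     }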
| The goal is for the cache to go unused most of the time, but
| during times of unexpected system stress the caches will kick in,
| provide relief to the system, and reduce impact to users.

| zasdffaa wrote:
| > The goal is for the cache to go unused most of the time
|
| Then you have a useful cache sitting there unused except at peak
| time - why? Just enable it always. This makes no sense.

| brightball wrote:
| Depends on how large the cache is and how much users expect
| instant updates to their data. If real-time or near real-time is
| the expectation, then avoiding cache usage unless necessary
| preserves their experience without incurring the hurdles of cache
| invalidation, which grow more complex the more you cache.
|
| EDIT:
|
| Another area where this approach is beneficial is a multi-tenant
| system where larger tenants querying more data could have a much
| more significant performance impact than other tenants on the
| system.
|
| Time-of-day/week/month usage spikes can also cause a performance
| degradation for everyone even if they're not typically stressing
| the system.
|
| This approach is just a mitigation to prevent the system from
| getting overloaded during those times. It's not the right approach
| for every situation. Just a spin on the Circuit Breaker pattern.
|
| https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern

| zasdffaa wrote:
| WTF?
|
| > If real-time or near real-time is the expectation, then avoiding
| the cache usage
|
| Because looking up a precalculated result in a hash table is so
| much slower than not looking it up? And what, recalculating it
| instead is faster than an O(1) hash table lookup? If so, why are
| you caching?
|
| I am just getting more confused by what people are saying here.

| hinkley wrote:
| > I am just getting more confused by what people are saying here.
|
| Maybe the takeaway is that this shit is hard and everyone is a
| little bit confused in their own way.
|
| I'm not saying the only way to win is not to play, but I can
| neither confirm nor deny rumors that I may have at some point
| muttered this under my breath.

| addaon wrote:
| I do think you've hit on a point of confusion in the parent,
| but...
|
| In a (hard) real-time system the only thing that matters is the
| worst-case execution time - if your worst-case time fits in the
| allowance you win; if it doesn't, you lose.
|
| Adding a cache improves the average case (typically) but hurts the
| worst case (always) - your worst case goes from "do expensive
| operation" to "do lookup in cache, fail, do expensive operation."
| Even if you parallelize misses ("do lookup in cache | do expensive
| operation; on cache success cancel expensive operation; on cache
| miss wait for expensive operation") you're generally adding at
| least a few operations in the critical path.

| salawat wrote:
| I'm beginning to wonder whether my personal moniker of "signal
| propagation" is nominatively equivalent to what others call "cache
| invalidation".
|
| I.e. you cannot by definition have a state change until the signal
| thereof has traversed a link from A to B. No signal, no change. If
| you have a cache for one system that delays writes to a backing
| data store referenced by other systems that are unaware of that
| cache, then you necessarily have a partition problem.
|
| Every problem in a distributed computing context can generally be
| reduced to a signal propagation problem.

| ilyt wrote:
| It's only a problem if you're breaking deadlines.
| Like, having someone's status in a communication app update with a
| 10s delay is probably not a problem, so as long as you don't ever
| cache it for longer than that, it is entirely fine.
|
| > If you have a cache for one system that delays writes to a
| backing data store referenced by other systems that are unaware of
| that cache, then you necessarily have a partition problem.
|
| In most cases people mean caching reads, so the problem here is
| stale reads, not delayed writes. They're not necessarily the same:
| some component caching (or, essentially, queuing) writes means
| that once the write is done, all readers (assuming they don't have
| any extra cache) see it at once, while with cached reads (and no
| invalidation), multiple readers can have different views.

| hinkley wrote:
| The one that really jams me (and others) up is that signal
| propagation and transaction lifetimes line up badly. Caches, being
| cheap imitations of ACID-"compliant" databases (which themselves
| cheat all day every day to make pretty numbers), almost never have
| repeatable-read semantics, so if your code splits responsibility
| for fetching a piece of data you have one of the problems I
| complained about at/near the top level: half of a decision can be
| made with one version of a piece of data and the other half with
| another, resulting in undefined behavior. In a system that asserts
| that A = !B you can have an interaction that did some of the
| activities for A and some of the ones for B (because we read A and
| A' and assumed they were identical).

| zasdffaa wrote:
| You're making a lot of assertions without any backup, and they
| don't make much sense.
|
| > Caches, being cheap imitations of ACID 'compliant' databases
|
| No they're not, not remotely.
|
| > which themselves cheat all day every day
|
| I don't believe a word of this; have you any actual evidence
| (other than I believe Oracle's serialisable is actually snapshot
| isolation IIRC)?
|
| > Half of a decision can be made with one version of a piece of
| data and the other half with another, resulting in undefined
| behavior. In a system that asserts that A = !B you can have an
| interaction that did some of the activities for A and some of the
| ones for B
|
| And this just makes no sense. You're claiming that databases are
| not producing the results they should; if that's the case then I
| don't believe you understand transaction isolation levels or
| indeed how database queries work, in which case you have problems
| other than and larger than caching.
|
| This whole thread is just getting weirder.

| hinkley wrote:
| I've always been a bit of a fan of debounce for this, but many of
| the obvious implementations are cache-based, which is essentially
| inviting people to do the thing I just said I hate.
|
| The cost of exactly counting upvotes or logged-in users is not
| justified by the incremental value it affords the user. These are
| examples of situations where "eventual consistency" can be
| embarrassingly easy. If you split the source of truth (SoT) from
| the system of record (SoR), then the cost of updating the "truth"
| can be amortized across the writes to the SoR. The reads do a
| straight lookup from the SoT, which was the truth at some point
| but is now the fiction everyone is agreeing to. The distinction is
| that everyone agrees to the same source of truth. Nobody ever goes
| asking the SoR for a second opinion, except the SoR itself, which
| updates the SoT when appropriate.
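|
| A minimal sketch of that split (hypothetical names, nothing from
| the thread's actual systems):
|
|     import java.util.concurrent.atomic.AtomicLong;
|
|     // Writes land in the system of record; the published "truth"
|     // is refreshed only every N writes, amortizing the update
|     // across the write traffic.
|     class UpvoteCounter {
|         private final AtomicLong record = new AtomicLong(); // SoR
|         private volatile long truth = 0;  // SoT: what readers see
|         private static final int REFRESH_EVERY = 100;
|
|         void upvote() {
|             long exact = record.incrementAndGet();
|             if (exact % REFRESH_EVERY == 0) {
|                 truth = exact;  // only the SoR manager writes the SoT
|             }
|         }
|
|         long count() {
|             return truth;  // readers never consult the SoR directly
|         }
|     }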
| The SoT is dumb, the clients are dumber; only the code that
| manages the SoR has any smarts to it.
|
| Read traffic can escalate pretty high in such a system without
| much stress.

| brightball wrote:
| Exactly. Another situation is a multi-tenant system where some
| tenants are much larger than others, triggering larger queries
| which potentially have a larger performance impact.

| ilyt wrote:
| You can queue a cache refresh in the background if the object in
| cache is close to the "limit" of how fresh you want it to be. It
| needs a bit of cleverness to avoid the thundering herd problem,
| but otherwise works pretty well.
|
| Also, if you really want near real-time, you might want to
| actually push the updates directly to the cache instead of relying
| on the client to trigger cache filling. Cases where you need a ton
| of data _and_ it served in real time _and_ it's the same object
| that constantly changes (and not, say, a pointer to a position in
| a video stream) are pretty rare.

| hinkley wrote:
| My problem with the term "cache" here is that people have
| expectations about what a cache is, and when you violate them you
| tend to discover that they've returned the favor.
|
| > push the updates directly to the cache instead of relying on the
| client to trigger cache filling
|
| You can use cache software to fulfill such a role, but it's not
| exactly a cache anymore, and keeping people from treating it like
| one is some kind of cat herding.
|
| You're using it as a data store at this point, which for one has
| very different eviction semantics - exactly 1 copy of each
| concept, no more, no less. You can store such things in Consul,
| for instance, and get very low stale-read latency. The cache
| implementation scales higher but breaks sooner. Pick your poison.

| ilyt wrote:
| Well, it depends on your data model. If you write only a little
| (compared to reads), a write-through model of cache (write to the
| cache as well as to the backing store) will lose you a tiny % of
| hit ratio, with the benefit of never having to worry about stale
| data.
|
| > My problem with the term "cache" here is that people have
| expectations about what a cache is, and when you violate them you
| tend to discover that they've returned the favor.
|
| That's true of everything that has any level of complexity; just
| see how many people got transaction isolation wrong because they
| had some simplistic view of how it works in the database they use.

| pyrolistical wrote:
| Sounds more like a dark buffer anti-pattern.

| ilyt wrote:
| ...instead of having a better user experience all the time? And
| having more code to manage on top of that?
|
| Sorry, but that sounds like a terrible idea.
|
| And when the system starts being slow, _the cache is empty_,
| because you waited for the system to slow down before starting to
| fill it.

| bawolff wrote:
| Having a different code path that only triggers during high load
| feels pretty risky to me.

| bell-cot wrote:
| _In theory_, the best way to manage this situation may be with a
| grayzone - if all is well, then there's only a (say) 5% chance of
| going to cache. (To at least catch "the cache is broken" errors
| sooner.) As things get worse, that chance goes up as you
| cross-fade to mostly using the cache.
|
| Important Warning: If you do not have a very good understanding of
| your overall system, or neglect to carefully dampen the cross-fade
| functions, then the whole thing can demonstrate fast and
| unpredictable swings in behavior. Note that being subpoenaed to
| testify in a "Why did the Crater City chemical plant blow up?"
| investigation is usually bad for one's career.

| hinkley wrote:
| That's better, but composition is always the devil in the details.
| Which might be what you mean when you say:
|
| > or neglect to carefully dampen the cross-fade functions
|
| People with a "caching problem" don't have one cache, they have a
| bunch of them. So a 1 in 20 chance of getting grey-zoned on one
| cache is a 1 in 400 chance of getting grey-zoned on two caches,
| and a 1 in 8000 chance on 3. That is a very small fraction, but
| when you're dealing with the Law of Large Numbers it's something
| that happens at least once a second for many services.
|
| But if you're using this you probably have a lot more than 3
| caches, and then we have to talk about the odds that any request
| gets grey-zoned at least once, which is going to be
| 1 - (19/20)^n. At six caches you're already over 1 in 4, which is
| going to potentially change your mean response time quite a bit.

| bell-cot wrote:
| > "...which is going to potentially change your mean response time
| quite a bit."
|
| My experience is that the changing mean response time isn't nearly
| as bad as the changing standard deviation.
|
| The really big problems tend to arise when you have unsuspected or
| under-appreciated positive feedback loops - whether in the
| software, or in associated physical systems and human behaviors.
| Or resonance effects. Those situations are where good dampening
| gets really important.

| dist1ll wrote:
| If I had to pick the two worst sources of bugs I can imagine, it
| would be (1) concurrency and (2) caches. Those features often
| require extensive performance and load testing and have tons of
| non-deterministic failure modes.

| zasdffaa wrote:
| In my _extensive_ experience of optimising stuff...
|
| > Caching is a weapon of last resort. In most cases it is the last
| major performance improvement
|
| ...is wrong. It is one of the very first things I reach for,
| because it is the most effective and simplest, once you've
| eliminated O(n^2) behaviour etc. It's not a magic wand, but
| nothing is.

| notinfuriated wrote:
| In my _extensive_ experience of optimizing stuff, it is right.
|
| I cannot even count how many issues were created downstream of the
| decision to start caching X, Y, or Z. At some point, every other
| software engineer gets the bright idea to cache their special
| little function or request or database call. Soon enough, there
| are N layers of cache responsible for delivering some special
| piece of data. The engineers build tooling to clear those caches
| on updates, either manually or automatically. Both suffer from
| issues: the latter occasionally fails to perform, just causing
| general confusion about why stale data is returning, and the
| former is far worse, recruiting dozens of business stakeholders
| into the problem of knowing how to expire the cache. They start
| making their own internal wiki pages with steps or various cache
| tools to clear - often incorrect steps, because they got lucky one
| day by clearing cache X and not cache Y, which they did not know
| about.
|
| Overall, in my experience, allowing software engineers to do
| caching as they wish without any real guardrails has caused
| massive problems, most likely on the order of N!.
| I cannot roll my eyes far enough into the back of my head when I
| hear someone suggest caching as a solution any longer, at least
| not on an application that has more than a dozen business
| stakeholders managing data and expecting it to be updateable in
| any reasonable way so that customers see the most recent data when
| the business wants it.

| PaulHoule wrote:
| Caching can turn O(n^2) algorithms into much faster algorithms. I
| would point to how transformative tabling is in Prolog, not to
| mention memoization rescuing the calculation of Fibonacci numbers
| by recursion, which otherwise would be gross malpractice.

| fwip wrote:
| Caching is fine if there's no state involved, e.g. pure functions
| like the Fibonacci sequence.
|
| As soon as your cache depends on mutable state, things get really
| hairy. Cache invalidation is one of the 2 hard problems in
| computer science, after all.

| samus wrote:
| These are very well-behaved instances of caching. They are so
| well-behaved that they can be fully described with a little bit of
| analysis. Once I/O is involved or the load isn't well known
| beforehand, it becomes much more difficult. Lazy I/O in Haskell is
| a very divisive topic for a reason.

| hinkley wrote:
| That's not a very convincing argument. "I use it because it works"
| without refuting any reasons I presented as to why it "works" but
| breaks your other options.
|
| How do you teach people to effectively use a profiler on code with
| pernicious caching?
|
| Part of why I'm the "Cleaner" is because I've worked with any
| number of people who will present "evidence" that everything that
| can be done has been done. Here are the metrics from a year ago
| when we turned on the cache. Here are the flame charts. See?
| Nothing to see here. They get really uncomfortable six months
| later when I'm reporting 2-3x in code that can't be optimized
| further. And in each case everything was made more difficult by
| the people ahead of me chumming the waters.
|
| They were partially correct. There was in fact nothing left for
| them to see, so nothing that they knew to do. But "there's nothing
| to see here" and "there's nothing here" are two very, very
| different statements.

| trgn wrote:
| There's a somewhat tangentially related heuristic to your points,
| imho. When writing code, when in doubt, tax CPU over memory; it
| will generally yield leaner, better software - easier to profile,
| debug, and optimize. Much easier to scale in production
| environments too.
|
| Code which optimizes for fast running time tends to lean heavily
| into memory-bloating data structures and caches. These are often
| poison for aggregate performance in real-world environments and,
| like you said among other things, reduce visibility into the
| workings of the system.
|
| There are some implications of this stance; e.g. optimizations to
| reduce memory footprint are generally never wasted, while
| optimizations to squeeze out a few ns of runtime may have many
| more unexpected consequences.

| ilyt wrote:
| > How do you teach people to effectively use a profiler on code
| with pernicious caching?
|
| Profile the function that is called if the object is not found in
| the cache? Have a pluggable cache with a plugin that just doesn't
| cache? That is not a hard problem to solve.
|
| > They get really uncomfortable six months later when I'm
| reporting 2-3x in code that can't be optimized further.
| > And in each case everything was made more difficult by the
| people ahead of me chumming the waters
|
| Well, they delivered (I'm guessing) a 10x improvement in a month
| by adding a cache. Caching is an entirely reasonable first step,
| and it can be good enough to be the last one.
|
| And frankly, the ability to use a profiler and knowledge of some
| architectural optimizations seem to be pretty rare skills.

| zasdffaa wrote:
| Exactly! People here seem to be making a complex thing out of a
| simple thing.

| Aeolun wrote:
| I kinda feel that your convincing counterargument is just "I have
| a different experience".
|
| I've never seen people reach for caches first, and then discard
| all further optimizations.
|
| Or rather, I've seen them do that, but not out of some misplaced
| feeling that it was the only solution, just the one that was
| easiest to accomplish in very little time.

| cogman10 wrote:
| The issue with caches is they are easy to apply. You can easily
| drop a cache in any place you load a resource without significant
| code changes.
|
| I tend to think they are OK weapons of (sometimes) first resort,
| for a rarely changing resource with heavy system impact. But
| you'll definitely get better results simply by rethinking and
| rearchitecting stuff. (Did you really need to load the database
| just to look up a single value?)

| hinkley wrote:
| > You can easily drop a cache in any place you load a resource
| without significant code changes.
|
| Only when the cache is brand new. The problem with people with
| cache addiction is that they think caches are free. So once they
| know a cache is there, they begin to lean on it, and lean on it.
| I've seen it with many different teams in a number of verticals.
| Sometimes while I'm there, sometimes before I arrived. It's the
| people, not the company.
|
| This illusion that you can always turn it off later is dangerous.
| That's only true if you have an impeccable information
| architecture when you start. And if you have that, you've already
| done a lot of heavy lifting before caching becomes your only
| option.

| zasdffaa wrote:
| It's difficult to refute your argument because I don't understand
| your points:
|
| > Because of the locality aspect, people will assume that it's
| 'free' to look up a value multiple times instead of passing these
| data dependencies between the relevant methods
|
| What does that even mean? What does "data dependencies" mean,
| specifically?
|
| > then you can end up evicting data in the middle of a transaction
|
| So in the middle of a transaction, instead of looking up data from
| a key - very cheap - you should do... what? Recalculate it
| instead?
|
| > The less obvious consequence is now you have concurrency bugs,
| because the code assumes that decisions made about the data will
| go the same way on each call, but due to eviction, two halves of
| the transaction may make conflicting decisions with undefined
| behavior
|
| Assuming I even understand this, you are saying that where there's
| concurrency there's a risk of pulling stale data out of a cache.
| Well, true. That's just something you have to be careful of. Add a
| Tx reference to the cache key.
|
| > Once you introduce caching you poison the performance well.
|
| Not my experience at all. I can't understand why you're saying
| this.
|
| > Flame charts lie when the cache is used
|
| How so?
| > and after people rely on the cache they lie in the other
| direction when the cache is off, because you're calling that path
| 5 times now and it's swamping other bottlenecks
|
| I've no idea what this means.
|
| > This is what I really mean when I say it's a weapon of last
| resort. Because once you use it, it attempts to take any other
| options off the table
|
| Heh! Absolutely not! Other optimisations are orthogonal to caching
| IME:
|
| Better algo - orthogonal
|
| Go lower level - orthogonal
|
| Go multicore - orthogonal
|
| Noobish SQL query is begging for a rewrite - orthogonal
|
| Use L1/2/3 chip cache better, or RAM, by careful data layout -
| orthogonal
|
| Use a better library - orthogonal
|
| Use a better data structure, e.g. than a linked list - orthogonal

| svieira wrote:
| >> Because of the locality aspect, people will assume that it's
| 'free' to look up a value multiple times instead of passing these
| data dependencies between the relevant methods
|
| > What does that even mean? What does "data dependencies" mean,
| specifically?
|
| It means that instead of having a method like `entryPoint(int)`
| that looks up a `Foo` by id and then passes the `Foo` around
| (`verify(Foo)`, `Foo update(Foo)`), you instead wind up with a
| system that passes around an `int` (`verify(int)`, `/* Unused */
| Foo update(int)`) and each layer then looks up `Foo` in the
| `FooService` (because `FooService` has a cache and subsequent
| lookups should be essentially free).
|
| Even worse when `FooService` is abstracted behind a thread-local
| like `ThreadLocal<Foo> currentFoo`, the implementation of which is
| `fooService.findById(currentFooId.get()) /* cheap, because
| fooService has a cache */`.
|
| My own take on this is that caches are global-by-default when you
| really want local-by-default (e.g. dynamic programming instead of
| a `static final Cache theCacheForTheEntireProgram`).
| Local-by-default caches are easy to make transactional (when the
| call stack returns, the cache is gone) and don't grow in an
| unbounded manner in the general case.

| hinkley wrote:
| Yes!
|
| Local-by-default also has the feature/problem of making you look
| at the scale of the data involved in your query up front, before
| it's having 100-1000 req/s hitting it in production. It's harder
| to pretend this large dependency graph will just magically take
| care of itself (or worse, be someone else's problem after you've
| declared victory) when it's staring you in the face.

| gav wrote:
| > Instead, it's pointing out a risk that sometimes they are too
| good, leading to a system that can't function when the normal
| level of locality and memoization are not available.
|
| And at that point cache management starts to be a huge problem.
| Clearing caches becomes a death spiral because your system can't
| actually support your normal load.
|
| For example, I worked with a large retailer who had a giant
| monolithic enterprise Java ecommerce system. It was so slow that
| product detail pages took upwards of 30 seconds to render. There
| was a cache in front of that system (Ehcache), then a page-level
| cache (Varnish) in front of that, and finally Akamai took 98% of
| the load.
|
| Rollouts were terrifying: you cleared the cache, and the site
| might come back up. Changing Akamai configuration was similarly
| risky.
|
| It's critical that systems don't become cache-dependent for load;
| caches should be a performance boost, not a required component.
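|
| Spelling out the two shapes svieira contrasts above (a minimal
| sketch; the Foo / FooService names are that comment's
| hypotheticals, not a real API):
|
|     class FooExample {
|         record Foo(int id) {}
|
|         static class FooService {
|             Foo findById(int id) { return new Foo(id); } // imagine a cached lookup
|         }
|
|         static final FooService fooService = new FooService();
|
|         // Global-by-default: every layer re-resolves the int
|         // through the cached service, so the cache becomes
|         // load-bearing shared state.
|         static void entryPointGlobal(int fooId) {
|             verify(fooService.findById(fooId));  // lookup #1
|             update(fooService.findById(fooId));  // lookup #2,
|                                                  // "free" only while cached
|         }
|
|         // Local-by-default: resolve once and pass the data
|         // dependency down; the call tree sees one consistent Foo
|         // and the cache becomes optional.
|         static void entryPointLocal(int fooId) {
|             Foo foo = fooService.findById(fooId);  // one lookup
|             verify(foo);
|             update(foo);
|         }
|
|         static void verify(Foo foo) { /* checks invariants */ }
|         static void update(Foo foo) { /* persists changes */ }
|     }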
| nly wrote:
| Caching is easy, cache invalidation is hard.

| latchkey wrote:
| "There are two hard things in computer science: cache
| invalidation, naming things, and off-by-one errors."
|
| https://twitter.com/codinghorror/status/506010907021828096

| withinboredom wrote:
| I have no idea why you're quoting that tweet. This saying is much
| older than 2014.

| PaulKeeble wrote:
| Multiple times I have found the impact of cache performance to be
| dramatic. The usual optimisations in three areas are usually
| straightforward to achieve and can bring surprisingly big
| improvements.
|
| The first is that RAM accessed randomly, or in the wrong
| direction, performs very poorly compared to sequential access in
| order. The same is true of cache, because it's much easier to
| predict and prefetch data when it's accessed serially. The effect
| is so dramatic that it's sometimes worth "accessing" data you
| don't need, just to keep the access pattern serial and get this
| hardware optimisation. At times, for example, I have found tree
| maps and list maps perform better than hash maps, as the cheap
| serial, in-memory-order search surprisingly outdoes the O(log(n))
| or O(n) algorithms and the cost of the hash.
|
| Second, if you have a data structure that is larger than optimal
| and overflows cache lines, it can have big impacts on cache
| utilisation and performance if the algorithm doesn't need all the
| fields. Object orientation weighs quite heavily here: certain
| designs can produce very large objects, and these often get poor
| cache usage.
|
| Third, unnecessary allocation and deallocation of memory in
| algorithms often results in the cache getting thrashed; avoiding
| this and reusing the same memory can result in better hot-cache
| usage and reduced garbage-collection pressure.
|
| Most profilers can give you a good idea of the allocations, but
| the other two are often harder to find and mostly just show up as
| cache misses.
___________________________________________________________________
(page generated 2022-11-14 23:00 UTC)