[HN Gopher] Cache made consistent: Meta's cache invalidation sol... ___________________________________________________________________ Cache made consistent: Meta's cache invalidation solution Author : uvdn7 Score : 143 points Date : 2022-06-08 17:46 UTC (5 hours ago) (HTM) web link (engineering.fb.com) (TXT) w3m dump (engineering.fb.com) | Hnrobert42 wrote: | You can't even visit the FB blog while using NordVPN. Amazing. | [deleted] | gwbas1c wrote: | > Based on consistency tracing, we know the following happened... | | I'm a little confused: | | - Is the database updated notification going out while the | transaction is still in-progress? Doesn't it make more sense to | delay the notification until after the transaction commits? | | - If a cache tries to update itself while there's an open | transaction against the data it's trying to read, shouldn't the | database block the operation until the write is complete? And | then shouldn't another notification after the transaction commits | trigger the cache to update? | uvdn7 wrote: | > Is the database updated notification going out while the | transaction is still in-progress? | | No, that's not what happened. | | > If a cache tries to update itself while there's an open | transaction against the data it's trying to read, shouldn't the | database block the operation until the write is complete? | | Yes. Generally speaking, having a read transaction here would | be better. There are unrelated reasons why it's done this way. | The point of the example is that it's really intricate and we | can still identify the bug regardless. | pdevr wrote: | From the article: | | >>Polaris pretends to be a cache server | | De facto, it is a cache server, with the associated performance | decrease. Isn't the main purpose of caching to improve | performance? | | >>We deploy Polaris as a separate service so that it will scale | independently from the production service and its workload. | | Assuming the scaling is horizontal, so then, to synchronize among | the service instances, what do you do? Create another meta- | Polaris service? Not rhetorical or sarcastic - hoping for an open | discussion. | [deleted] | uvdn7 wrote: | > De facto, it is a cache server, with the associated | performance decrease. Isn't the main purpose of caching to | improve performance? | | Polaris only receives cache invalidation events, and doesn't | serve any client queries. | | > so then, to synchronize among the service instances | | It doesn't synchronize among the service instances. It's | actually basically stateless. It pretends to be a cache server | only for receiving invalidation events, and acts as a client, | and assumes no knowledge of the cache internals. | pdevr wrote: | Thanks for the clarification. However, in that case, won't | the querying of all the cache copies/replicas turn out to be | the bottleneck at high volumes? Because you are going to have | to check all the existing copies/replicas somewhere, right? | uvdn7 wrote: | Yep. Polaris checks can be sampled, but still cover pretty | much all types of workload and scenarios/races over time. | bklaasen wrote: | > Phil Karlton famously said, "There are only two hard things in | computer science: cache invalidation and naming things." | | ...and off-by-one errors. | [deleted] | Trufa wrote: | This seems to be a pure clickbait title; cache invalidation will | always be hard, since it's basically a balanced give-and-take | which, simple logic dictates, can't be avoided. | uvdn7 wrote: | Can you elaborate?
| | I went into details in the post about what I believe is the | root cause of why cache invalidation is hard, which seems | different from what you are saying. | orf wrote: | > Take TAO, for example. It serves more than one quadrillion | queries a day | | That's 11,574,074,074 requests per second. | irrational wrote: | Phil Karlton famously said, "There are only two hard things in | computer science: cache invalidation and naming things and off- | by-one errors." | ntoskrnl wrote: | Correct me if I'm wrong, but the title seems a little clickbaity. | "Cache invalidation might no longer be hard" --> "We built a | tracing library and manually fixed the bugs it uncovered" | kilburn wrote: | "A little clickbaity" if you are generous. Short-sighted and | grandiloquent if you are not. | | Short-sighted because it narrows the original problem (cache | invalidation) to a specific version ( _detecting_ cache | invalidation errors _past an arbitrary time-frame_ ). | | Grandiloquent because it then claims to have "solved" the | general problem, whereas they haven't even solved the narrowed | version. Notice their own premise: | | > For any anomaly in a stateful service, it is an anomaly only | if clients can observe it one way or the other. Otherwise, we | argue that it doesn't matter at all. | | This is a practical attitude that can yield great results, and | probably even good engineering practice. However, it will never | lead to _solving_ a problem, because a problem isn't solved | until you _prove_ that it is. | uvdn7 wrote: | > For any anomaly in a stateful service, it is an anomaly | only if clients can observe it one way or the other. | Otherwise, we argue that it doesn't matter at all. | | Thanks for the comment! I stand by this claim; I would love | to hear more about why an anomaly (any anomaly) matters if it | can't possibly be observed by any client of a stateful | service. | tofuahdude wrote: | It is just the difference between "matters" and "solved". | kilburn wrote: | The claim itself is fine because it is kind of | tautological. An error that _cannot be observed in any way_ | is probably not an error. However, you _have_ to prove that | it can't be observed. | | This is quite different from what the described system does | though. The system flags errors that _have_ been observed. | The correct claim to go with what the article does would | thus be: | | > For any anomaly in a stateful service, it is an anomaly | only if the current clients running their current | operations happen to observe it. Otherwise, we argue that | it doesn't matter at all. | | It's much more noticeable this way, isn't it? | uvdn7 wrote: | This is a fair point! | | The exact same argument applies to normal exceptions and | errors. It might be the case that out of quadrillions of | queries a day, just none of them hit the error path, and | hence we won't observe any issues. | | I can see that in this case, whether it matters or not | becomes somewhat subjective. It's a good point though! | uvdn7 wrote: | I am the author; so I am obviously biased here. | | I am serious when I say cache invalidation might no longer be a | hard thing in computer science. In the post, I explained why | cache invalidation is hard; and how we solve/manage its unique | challenge and complexity. By my definition, we are solving the | cache invalidation problem. | | The analogy I have is Paxos. It's notoriously hard to implement | Paxos correctly.
Google published a paper, Paxos Made Live, | just on how they managed the implementation and | productionization complexity. | continuational wrote: | Please be careful with such bold claims. You don't really | address the hard part of cache invalidation, which is to | figure out _when_ to do it. | uvdn7 wrote: | > which is to figure out when to do it. | | Can you elaborate? | continuational wrote: | Sure - you typically cache the result of some expensive | query. | | The hard part of cache invalidation is to detect _when_ | an update somewhere in your system is going to affect the | result of that query, such that you need to trigger an | invalidation. | uvdn7 wrote: | Good point! | | We have memcache, which is a look-aside cache that serves | this type of workload. What you described can be solved | by adding one level of abstraction and letting reads and | writes all go through it. | | Now on your read path, you can construct arbitrarily | complex SQL queries or whatnot, but it must take some | kind of input to filter on. Those become part of the | "keys". | | The invariant is that as long as the "context/filter" you | encode covers all the mutations which would impact your | cache data, you should be good. Based on our experience, | it has been fairly manageable. | irrational wrote: | >The invariant is that as long as the "context/filter" | you encode covers all the mutations which would impact | your cache data, you should be good. | | Well, isn't this the truly hard part of cache | invalidation? | uvdn7 wrote: | I can see that. One can argue that the "difficulty" is | front-loaded. Whether or not you can identify the list of | "contexts" is a local question. I guess you can probably come up with | complicated dependencies and argue it's hard to capture | the dependencies, and I would agree with you. | | Now getting back to the "what's really hard about cache | invalidation" part. Even with a much simpler model - say | you just have a k/v store, no joins, nothing - is cache | invalidation simple in that case? | | I went into details about why it's still insanely hard. | And the big example at the end might help make that | point. | | And this is one level beneath challenges from tracking | dependencies, and I argue that's what makes cache | invalidation hard. | | Now going back to your example with complicated | dependencies. Maybe TTL is a better solution. With many | dependencies, any change from the dependency list might | trigger invalidation. At some point, just doing TTL | would be simpler. | ahahahahah wrote: | Yeah, the entire discussion here is fucked because the | poster doesn't understand what the original quote even | meant. | darig wrote: | dang wrote: | Yes - we changed it now and I posted | https://news.ycombinator.com/item?id=31672562. | londons_explore wrote: | > In other words, 99.99999999 percent of cache writes are | consistent within five minutes. | | Those are the kind of rates that lead to very hard to find bugs. | You won't write code in downstream systems to properly | handle that failure case, and suddenly someone will get paged at | 3am because it has just happened years after the code was | deployed and nobody really understands how this corner case | suddenly happened! | | When I'm building systems, every codepath should either happen | frequently or never. | | Having a cache whose design allows it to be stale but only very | rarely seems like a bad idea. If I were building something on top | of this cache, I would have a layer on top to force staleness for | 1 in 1000 requests, just to force the application built on top to | be designed to handle that properly.
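A minimal sketch (not from the thread) of the forced-staleness layer londons_explore describes above. The wrapper name and rate are made up for illustration, and `inner` is assumed to be any cache client with get()/set():

    import random

    class StalenessInjectingCache:
        # Wraps a cache client and deliberately serves the previously
        # cached value for ~1 in 1000 reads, so the application on top
        # is forced to handle stale data as a routine event.
        def __init__(self, inner, stale_rate=0.001):
            self.inner = inner          # any object with get()/set()
            self.stale_rate = stale_rate
            self.previous = {}          # last overwritten value per key

        def set(self, key, value):
            old = self.inner.get(key)
            if old is not None:
                self.previous[key] = old  # remember what just went stale
            self.inner.set(key, value)

        def get(self, key):
            if key in self.previous and random.random() < self.stale_rate:
                return self.previous[key]  # intentionally stale read
            return self.inner.get(key)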
| uvdn7 wrote: | I agree with everything you said. | | > Having a cache whose design allows it to be stale but only | very rarely seems like a bad idea. | | And that's exactly why we first brought this number from 6 9's | to more than 10 9's; and we are not done yet. Also the cache is | not "designed" to be inconsistent. It's like Paxos: it is designed | to solve the consensus problem, but when implemented, most | Paxos implementations do not work properly. | VWWHFSfQ wrote: | As naive as it is, I always kinda liked MySQL's query cache | invalidation strategy: blow away the whole cache anytime a | mutating statement arrives at the server. | | Simple. Effective. But it obviously doesn't work for everything. | Trufa wrote: | Cause who needs scaling anyway!? Joking aside, the strategy seems | logical for smaller stuff. | judofyr wrote: | > At a high level, Polaris interacts with a stateful service as a | client and assumes no knowledge of the service internals. This | allows it to be generic. We have dozens of Polaris integrations | at Meta. "Cache should eventually be consistent with the | database" is a typical client-observable invariant that Polaris | monitors, especially in the presence of asynchronous cache | invalidation. In this case, Polaris pretends to be a cache server | and receives cache invalidation events. | | I'm a bit confused here. Polaris behaves as a regular cache | server which receives invalidation events, but isn't the typical | bug related to cache invalidation that a service _forgets_ to | contact the cache server for invalidation? So this will only | catch cases where (1) you remember to contact Polaris, but you | forgot to contact other cache servers [which Polaris happens to | know about], OR (2) you're not handling errors during | invalidation requests to the cache server [and the request to | Polaris was successful]? Or are your cache servers "smart", with | internal logic which might "ignore" an invalidation? | | What am I missing? | | EDIT: Reading through "A real bug we found and fixed this year" | and I'm still a bit confused. It seems like a very contrived bug | tied directly to how you deal with versioning (e.g. you allow | the latest version to be present with stale metadata, or | something?). My main concern with cache invalidation is _what_ to | invalidate at what time. | uvdn7 wrote: | > isn't the typical bug related to cache invalidation that a | service forgets to contact the cache server for invalidation? | | That's not the case based on our experience at FB. At-least- | once delivery is a solved problem basically. | | But you are absolutely right that if there's an issue in the | invalidation delivery, it's possible that Polaris won't receive | the event either. Polaris actually supports a separate event | stream (all the way from client-initiated writes) to cover this | case. | benlivengood wrote: | It helps that TAO is a write through cache so clients can't | really forget to invalidate, correct? If someone were to | directly write to MySQL shards there would be stale data | ~indefinitely. | | I'm assuming the versioning is what ensures proper write, | invalidation, and fetch ordering so that e.g. slow mysql | writes/replication don't cause remote TAO clusters to receive | an invalidation message and then read a stale value from a | replica?
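A minimal sketch (not Meta's actual protocol) of the general version-comparison idea behind benlivengood's ordering question: if fills and invalidations both carry the version of the underlying row, a cache host can apply whichever is newer regardless of arrival order. The article's caveat still applies - cached version markers are not durable and can be evicted, which is exactly where the hard bugs hide.

    class VersionedCacheHost:
        # Keeps (version, value) per key; applies a fill or an
        # invalidation only if it carries a newer version.
        TOMBSTONE = object()  # "invalidated as of version v"

        def __init__(self):
            self.entries = {}  # key -> (version, value)

        def apply_fill(self, key, version, value):
            cur = self.entries.get(key)
            if cur is None or version > cur[0]:
                self.entries[key] = (version, value)
            # else: drop it; a newer fill or invalidation already won

        def apply_invalidation(self, key, version):
            cur = self.entries.get(key)
            if cur is None or version > cur[0]:
                # keep the version so a late, stale fill can't
                # resurrect old data after the entry is dropped
                self.entries[key] = (version, self.TOMBSTONE)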
| uvdn7 wrote: | > It helps that TAO is a write through cache so clients | can't really forget to invalidate, correct? | | That's not the case. | | > If someone were to directly write to MySQL shards there | would be stale data | | For the sake of this discussion, you can assume this is the | setup, and everything should still stand. | | > I'm assuming the versioning is what ensures proper write, | invalidation, and fetch ordering so that e.g. slow mysql | writes/replication don't cause remote TAO clusters to | receive an invalidation message and then read a stale value | from a replica? | | You should join us :p This is getting into the details of | our cache invalidation protocol and our cache consistency | model. Maybe another post on another day! | uvdn7 wrote: | > EDIT: Reading through "A real bug we found and fixed this | year" and I'm still a bit confused. | | yeah that specific bug is convoluted. | | > It seems like a very contrived bug tied directly to how | you deal with versioning (e.g. you allow the latest version to | be present with stale metadata, or something?). | | This is mostly for performance reasons, which I won't go into | too much here. Version is mostly an internal concept | that clients don't care about. So it's OK-ish. | | > My main concern with cache invalidation is what to invalidate | at what time. | | Can you elaborate? Our experience on cache invalidation is that | you invalidate on mutation, and in terms of "what to | invalidate" it depends on the computation, for which you might | have to give me a little more detail. | benlivengood wrote: | Do you guarantee referential integrity in TAO? Last I heard the | answer is no; complex queries are not supported and clients would | have to do their own work with internal versions to achieve a | consistent view on the whole graph (if such a consistent view | even exists). But it seems to work fine since global referential | integrity doesn't seem to be a big deal; there aren't a lot of | cases where dependency chains are long enough or actions quick | enough that it matters (e.g. a group admin adds a new user, makes | them an admin, who adds an admin, and then everyone tries to | remove the admin who added them [always locally appearing as if | there is >=1 remaining admin], but causes the group to ultimately | have no admins when transactions settle). Run into any fun issues | like that? | | Contrasting that with Spanner, where cache coherency is solved by | reading from the recent past at consistent snapshots or by | issuing global transactions that must complete without conflict | within a brief truetime window. I am guessing the cost of Spanner | underneath the social graph would be a bit too much for whatever | benefits might be gained, but curious if anyone looked into using | something similar. | uvdn7 wrote: | For the fun issues you described, read-modify-write (using | either optimistic or pessimistic concurrency control) can work, | if I understand your question correctly. | | Spanner is awesome and one of my favorite systems/papers. I | think it would be very computationally and power expensive to run | the social graph workload on a Spanner-like system. Do you have | an estimate of how many megawatts (if not more) are needed to | support one quadrillion queries a day on Spanner? | lucideer wrote: | This would have been a really good article in itself, without the | frankly unbelievable hubris of the author.
| | TL;DR this is a blog post about some very interesting problems | with distributed cache consistency and observability. The article | doesn't address cache invalidation and - based on their replies | here - the author doesn't seem to understand the cache | invalidation problem at all. | uvdn7 wrote: | I am glad that you liked the content. | | On the definition of cache invalidation, and specifically why | it's hard. Can you send me a link to any definition of it? This | is what's in Wikipedia, and I think it's reasonable. | | > Cache invalidation is a process in a computer system whereby | entries in a cache are replaced or removed. | | And I think I am describing that process, and what's hard about | it. Some comments here explicitly talk about dependencies, and | I can see why that's hard. My point is that even without | dependencies, cache invalidation remains a hard problem. Now, | about dependency tracking, some of my thoughts are captured | here: https://news.ycombinator.com/item?id=31674933. | lucideer wrote: | Numerous replies to your comments here have pointed out why | cache invalidation is hard. You've responded to them by | saying "Good point!" (always with an exclamation point), and | then proceeded to demonstrate in your response that you | didn't understand their point. | | This comment describes the "hard" part of cache invalidation | best: https://news.ycombinator.com/item?id=31674251 - your | response to them makes very little sense. | | First, you acknowledge that they're correct, though you use | obtuse language to say so: you seem to like using the very | abstract term "dependency" to represent the very simple | concept of detecting updates. | | In your second paragraph you then go off on an unrelated | tangent by saying: | | > _Now getting back to the "what's really hard about cache | invalidation" part._ | | No. You're not "getting back" to that - you're changing the | subject back to the topic of your article, which is unrelated | to cache invalidation. | | > _Say you just have a k/v store, no joins, nothing._ | | Cache invalidation is about invalidation - it's not about | your store architecture. It's not about the implementation of | logic that processes the clearing/overwrite of values, it's | about the _when_ and nothing else. | | > _just doing TTL would be simpler._ | | Yes. It is simpler. If you want to avoid solving _the_ hard | problem, you can use a TTL. Now... how long should it be? | uvdn7 wrote: | First of all, I do think you folks make good points. Let me | try again. | | > The invariant is that as long as the "context/filter" you | encode covers all the mutations which would impact your | cache data, you should be good. | | I wrote this line, which is referred to as being exactly why | cache invalidation is hard in the comment you linked. | | > it's about the when and nothing else. | | Let's talk about that. Not to overgeneralize this: with a | simpler cache model (say you just cache a single item), do | you agree that solving the "when" problem is very | manageable? If not, I would like to be enlightened. | | Now, with this very simple cache model, where we have | "magically" solved the "when" problem: do you think cache | invalidation is solved? Or that it's simple? After knowing when, | you still need to actually update the cache, right, and not | leave it in an inconsistent state (against the source of | truth). Is that simple? | | Let's essentially break cache invalidation into a few parts: | 1. knowing when/who to invalidate; 2. actually processing | the invalidation. | | My argument is that #1 can be very manageable with simpler | data models. #2 can't be avoided; and #2 is very hard.
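A minimal Python illustration (a classic textbook scenario, not a transcript of Meta's system) of why #2 above stays hard even with one key and no dependencies: a cache fill that races with an invalidation can pin a stale value forever.

    def stale_fill_race():
        # One key, no joins, no dependency tracking - and still a race
        # between a concurrent cache fill and an invalidation.
        cache, db = {}, {"X": "v1"}

        read_value = db["X"]     # t1: reader misses, reads v1 from the DB

        db["X"] = "v2"           # t2: writer updates the DB...
        cache.pop("X", None)     # ...and invalidates; a no-op, since
                                 # nothing is cached yet

        cache["X"] = read_value  # t3: the reader's fill lands last

        # The cache now holds v1 indefinitely, even though every step
        # looked correct in isolation, and there is no TTL to save us.
        assert cache["X"] == "v1" and db["X"] == "v2"

Guarding fills with a version comparison or memcached-style leases is the usual fix; the point is that this half of the problem remains even when the "when/who" half is trivial.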
| pyrolistical wrote: | I don't get it. If you already have the version in the cache, | then when a client does GET X, the cache can reply X@v4. | | As long as the client submits transactions with "I calculated | this based off of X@v4" then the database can reject if there is | a newer version of X. | | The client can then replay the transactions against the cache, and | by then there will be an X@v5. | | With this scheme you can track the transaction replay rate | instead of having to build a new system to read back the cache | for consistency. | | To get traceability on which cache node is stale, the client | can forward a header from the cache node that identifies it. Then, | using the same transaction rejection rate, ops has visibility into | which cache node isn't being updated. | | No cache invalidation needed. Always just cache X@version forever; | it's just that the cache allows an unstable query for | GET X. | jitl wrote: | Tracking all data dependencies used for all writes in a large | system seems rather challenging. | | Plus what about read workloads, like "pull message from queue | and send it as a push notification to user Y"? I guess it's | fine if the push is re-delivered due to stale cache? | dragontamer wrote: | Versions of cache data aren't numbers, they're vectors of | numbers (at a minimum). | | Here's a bunch of versions of variable "X": | | X@Alice-V1, X@Alice-V2, X@Bob-V1, X@Alice-V2/Bob-V2, X@Carol-V1 | | Which version of "X" should you read? | | EDIT: Alice, Bob, and Carol are three different servers. What | has happened here is that Alice and Bob are closer together, so | their updates are faster between each other (synchronizing the | writes). Carol is slower for some reason (bad connection?), and | is going to update off of old data. | | In this case, the two valid caches are X@Alice-V2/Bob-V2 and | X@Carol-V1. The other cached data is old and can be discarded. | | Things get complicated when you have not only reads (such as in | your simplified question), but also writes that are cached | (such as what happens in practice).
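A minimal sketch (not from the article) of the version-check-and-replay scheme pyrolistical describes above, with a single integer version per key. As dragontamer notes, with multiple independent writers the version may really need to be a vector rather than one counter.

    class VersionCheckingStore:
        # Reads return (version, value); a write must name the version
        # it was computed from and is rejected if a newer one exists.
        def __init__(self):
            self.data = {}  # key -> (version, value)

        def get(self, key):
            return self.data.get(key, (0, None))

        def put(self, key, based_on_version, new_value):
            current_version, _ = self.data.get(key, (0, None))
            if based_on_version != current_version:
                return False  # stale basis: caller must re-read and replay
            self.data[key] = (current_version + 1, new_value)
            return True

    # The client replays rejected transactions until one lands:
    store = VersionCheckingStore()
    version, value = store.get("X")
    while not store.put("X", version, "new value"):
        version, value = store.get("X")  # fetch X@latest, recompute, retry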
| moralestapia wrote: | ??? | | We seem to have different concepts with regards to the cache | invalidation problem. | | The one I know has to do with "Is there a change in the source | data? How should I check this? How often?" and such things. | | Yours seems to be "In my distributed system, I cannot guarantee | the order of the messages that get exchanged, and thus, different | nodes may end up having different information at times". | | IMO you have a consensus/distributed data problem, not a cache | invalidation problem (the inconsistency between your cache nodes | is a consequence of the underlying design of your system). | throwdbaaway wrote: | Exactly. The race condition problem from the example below | still exists even without any caching, and perhaps still exists | even without any replica. | | > Imagine that after shuffling Alice's primary message store | from region 2 to region 1, two people, Bob and Mary, both sent | messages to Alice. When Bob sent a message to Alice, the system | queried the TAO replica in a region close to where Bob lives | and sent the message to region 1. When Mary sent a message to | Alice, it queried the TAO replica in a region close to where | Mary lives, hit the inconsistent TAO replica, and sent the | message to region 2. Mary and Bob sent their messages to | different regions, and neither region/store had a complete copy | of Alice's messages. | | A solution to this problem is to update Alice's primary message | store to be "region 2;region 1" for a period of time, and get | Alice to retrieve messages from both message stores during that | period. | | However, since you have a caching layer that doesn't have an | explicit TTL, such that any cache inconsistency can last | indefinitely, you now have a much bigger distributed data | problem. | tremon wrote: | AFAIK the original cache invalidation problem is always seen | from the point of view of a writer in a multiple-writer | distributed system. It's not about determining the freshness of | data as seen from a reader/consumer, but about atomically (!) | updating all data copies by a write action in a distributed | system. | | If you assume loosely-coupled, eventually-consistent data, you | haven't solved the problem at all -- you've just side-stepped | it. If you assume a single-writer, multiple-reader | architecture, you don't have the original problem at all. | uvdn7 wrote: | > the original cache invalidation problem | | Do you have any references or links about it? Thanks. | | > point of view of a writer in a multiple-writer distributed | system | | > If you assume a single-writer, multiple-reader | architecture, you don't have the original problem at all. | | It starts to sound like you are describing Paxos? | uvdn7 wrote: | I think https://news.ycombinator.com/item?id=31672541 might | address your question. | moralestapia wrote: | No, mine was not a question, it was a statement. | | You misnamed the problem you are trying to solve. | uvdn7 wrote: | OK. I am here to learn. Say we go with your definition of | the cache invalidation problem. Do you think that's THE hard | part of cache invalidation? | | Say you are just caching simple k/v data (no joins, | nothing). Are you claiming cache invalidation is simple in | that case? | | Also the reason why I didn't mention the cache invalidation | dependencies is that _I believe_ it's a solved problem (see | the link above, we do operate memcache at scale). I am | happy to discuss if and why it wouldn't work for your case. | moralestapia wrote: | >Are you claiming cache invalidation is simple in that | case? | | No. | | >I believe it's a solved problem | | It's not. | | Suppose you have source-of-truth A (it doesn't really matter | if it's a key-value store or whatever, it could even be a | function for all intents and purposes) and a few clients B1, | B2, B3, ... that rely on the data from A. You have to keep them | in sync. When should B* check if A has changed? Every | time they need it? Every minute? Every hour? Every day? | This is the cache invalidation problem, which IMO is not | even a problem but a tradeoff, but whatever. | | Epilogue: With all due respect, you have an unbelievable | career: IBM, Autodesk, Google, Box and now Meta. None of | those companies would give me five minutes of their time | because I am self-taught, yet here we are :) | unboxingelf wrote: | > Epilogue: With all due respect, you have an unbelievable | career: IBM, Autodesk, Google, Box and now Meta. None of | those companies would give me five minutes of their time | because I am self-taught, yet here we are :) | | Correlation does not imply causation. I'm also self- | taught, and some of those companies have given me a lot | more than 5 minutes of their time.
| | Here's some unsolicited feedback - I interpreted your | comment above as implying you know more than $peer, and | it comes off a little snarky with the emoji face. That | approach doesn't demonstrate professionalism to me. On | the other hand, $peer focuses on the technical challenge. | Most of the gig boils down to communicating ideas. | uvdn7 wrote: | > Suppose you have source-of-truth A (it doesn't really | matter if it's a key-value store or whatever, it could be | a function for all intents and purposes) and a few clients | B1, B2, B3, ... that rely on the data from A. You have to | keep them in sync. When should B* check if A has changed? | Every time they need it? Every minute? Every hour? Every day? | That is the cache invalidation problem. | | A few things to clarify here: what you are referring to as | clients are cache hosts (as they keep data). You seem to | imply that the cache is running on the client? I was | referring to cache servers (think memcache, Redis, etc.), | for which the membership can be determined. So on update | (e.g. when you mutate A), you know all the Bs to | invalidate. | | Now continuing with your example, with the cache running on | the client. Assuming we are talking about the same concept | when we say "client", the membership is non- | deterministic. Clients can come and go (connect and | disconnect as they wish). There are some attempts to do | invalidation-based caching on clients, but they are hard | because of the reason I just mentioned. So usually client | caches are TTL'ed. E.g. the very browser you are using to | see this comment has a lot of things cached. DNS is not | going to send an invalidate event to your browser. It's | TTL-based. | | I guess what I am saying is that cache invalidation | rarely applies to client-side caches as far as I know. | Maybe you have a different example, which we can discuss. | moralestapia wrote: | Caches, clients, Facebooks, hosts, Metas, Redises(?), ... | all those things don't really matter. | | What matters is: B* reads from A; how often should that | happen? That's it, literally. That's the whole problem. | teraflop wrote: | It doesn't matter whether the cache is co-located with | the "client" that ultimately uses the data. Say A is a | database, and B1, B2, B3... are memcached servers. The | exact same situation applies. | | > So on update (e.g. when you mutate A), you know all the | Bs to invalidate. | | But "knowing" this is a big part of what people mean when | they say cache invalidation is hard! If the value in B is | dependent on a complicated function of A's state, then it | may be difficult to automatically determine, for any | given mutation to A, which parts of B's cache need to be | invalidated. | | > There are some attempts to do invalidation-based caching | on clients, but they are hard because of the reason I | just mentioned. So usually client caches are TTL'ed. | | Given this statement, the original title of this | submission is even more baffling. If you recognize that | data can be cached in clients, and that invalidating | those caches is so hard that most systems -- including | yours -- just completely abandon the goal of being able | to do it correctly/reliably, then how can you claim your | system makes it no longer a hard problem? | uvdn7 wrote: | Good points. I will try to address them. | | > But "knowing" this is a big part of what people mean | when they say cache invalidation is hard! | | I can see that. Memcache is a look-aside cache we have at | scale.
There are abstractions on top to manage this | complexity; and it has been fairly manageable. I am sure | you can come up with a complicated dependency tree where | things are not obvious at all. But when you do have a | very large dependency tree, any change in it can | trigger cache invalidation, at which point caching with | TTL will be a better option IMO. I can see where you are | coming from. | | > If you recognize that data can be cached in clients, | and that invalidating those caches is so hard that most | systems | | My reasoning for this being hard is different from yours, | I think. In my comment, it's due to the non-determinism | of the cluster membership. I think in that case, we are | talking about different problems. | uvdn7 wrote: | It's a good observation that out-of-orderness and asynchrony | contribute a lot to the challenge. | | > the inconsistency between your cache nodes is a consequence | of the underlying design of your system | | I disagree with this. First of all, imposing order and | synchrony is expensive, slow and unreliable. Even if we do | that, cache fill and invalidation will never be externally | coordinated; and will collide on the cache host. | wildmanx wrote: | > > the inconsistency between your cache nodes is a | consequence of the underlying design of your system | | > I disagree with this. First of all, imposing order and | synchrony is expensive, slow and unreliable. Even if we do | that, cache fill and invalidation will never be externally | coordinated; and will collide on the cache host. | | This is a weird reply. GP writes "A implied B" and you | disagree by arguing that you had to do A. Sure. But it's | still A that implied B. Quite a fundamental logic flaw, no? | uvdn7 wrote: | I don't see that. Yes, I did say I had to do A. Then I said | that it doesn't imply B. Because even if the underlying | system is strongly ordered, we can't order external events. | Did I miss something? | wildmanx wrote: | A == "Design of your system" | | B == "Inconsistency between your caches" | | GP: A -> B | | You: "Imposing order and synchrony is [bad things]" == | "If you don't do A then [bad things]", i.e., (not A) -> | [bad things] | | Also you: "Even if we did, then [other issues]" == "Even | if you don't do A then [other issues]", i.e., (not A) -> | ...what? B? | | You are certainly not arguing against the A -> B | implication. | uvdn7 wrote: | I really appreciate your comment. I am learning as we | speak. | | I was essentially saying !A !-> !B - even if we have a | strongly ordered system, there will still be | inconsistencies/races/problems. | | And I can see why it's a bad/weird argument. Thank you | for pointing it out. Really appreciate it! | seqizz wrote: | Yeah I was also intrigued by the title and couldn't find what I | expected tbh. | cratermoon wrote: | Yeah it reads to me like they redefined "cache consistency" to | mean something different. | uvdn7 wrote: | That's not intended. What's your definition of cache | consistency? | rossmohax wrote: | How can both of these be true? | | > Data in cache is not durable, which means that sometimes | version information that is important for conflict resolution can | get evicted. | | > We also added a special flag to the query that Polaris sends to | the cache server. So, in the reply, Polaris would know whether | the target cache server has seen and processed the cache | invalidation event. | | To make the special flag work, the cache server needs to track | not only the current version state, but also past versions.
If it tracks past | versions, then conflicts can be resolved at the cache server | level, but the whole premise of the article is that cache servers | can't resolve conflicts by themselves. | uvdn7 wrote: | Great question! | | I will try to answer this one without too many implementation | details. | | First of all, cache items can be evicted and are not durable, | which is just a fact. But that doesn't mean we can't track | progress (in terms of "time" or "logical time"/"order" in | distributed-systems terms). The social graph is sharded (not | surprisingly). We can keep track of the progress of each cache | host per shard, which is just a map kept in memory, which doesn't | get evicted. | | I hope this answers your question. | isodev wrote: | yodsanklai wrote: | Totally off-topic, but... | | https://www.facebook.com/help/152637448140583 | | > No, we don't sell your information. Instead, based on the | information we have, advertisers and other partners pay us to | show you personalized ads on the Facebook family of apps and | technologies. | | A lot can be said about Meta but I find it counterproductive to | make up facts. It's not correct that they "sell personal | details". | gtirloni wrote: | Your comment does not contribute anything useful to this | technical subject. | uvdn7 wrote: | I am the author of the blog post. I believe the methodology | described should be applicable to most if not all invalidation- | based caches. I am serious when I say that cache invalidation | might no longer be a hard thing in computer science. AMA! | robmccoll wrote: | Saying a problem isn't hard to solve because you have tools to | analyze the correctness of your solution seems like a stretch. | uvdn7 wrote: | But that's not what I am saying though ... | | > you have tools to analyze the correctness of your solution | | That's half of it. Cache invalidation is not hard just | because of the complexity of cache coherence protocols - some | of which are not very complicated. But cache invalidation does | introduce countless races that manage to introduce cache | inconsistencies in ways that are just hard to imagine ahead | of time (in my experience). IMO, that's the harder part of | cache invalidation - when cache inconsistencies happen, | answering the "why" question is much harder than having a | cache invalidation protocol (you can have TLA+ for one if you | will). And answering the "why is my cache inconsistent" | question is the problem we solved, which I think is the harder | part of the cache invalidation problem. | latchkey wrote: | > answering the "why is my cache inconsistent" question is | the problem we solved | | That should be the title and focus of your post. Instead, | it feels like grandiose claims about solving cache | invalidation itself. | uvdn7 wrote: | > Instead, it feels like grandiose claims about solving | cache invalidation itself. | | That is definitely something I worried about. But at the | same time, I do think we solved the harder part of the | cache invalidation problem. TAO and Memcache serve | quadrillions of queries a day. Based on our experience, | answering the question of "why is my cache inconsistent" | is the hardest thing about cache invalidation. | | Cache invalidation protocols can be complicated, but some | are pretty manageable. You can also verify one using TLA+ | if you will. But once the rubber hits the road, some | cache entries tend to be inconsistent.
| | I definitely worry about people taking this the wrong | way, but at the same time, I stand by the claim of "cache | invalidation might no longer be a hard thing in computer | science". | latchkey wrote: | The gist of what I got from the post was that you created | a tool which monitors caches and that helped find bugs | that cause inconsistent cache issues. How is that solving | a computer science problem? | uvdn7 wrote: | The tool that monitors cache consistency is easy to | build. That by itself doesn't solve anything major. The | most important contribution is a novel approach to | consistency tracing that helps find out "why" caches are | inconsistent - pinpointing a bug is much harder than saying | "there is a bug". This is based on the key insight that a | cache inconsistency can only be introduced (for an | invalidation-based cache) in a short time window after | the write/mutation; this is what makes the tracing | possible. | | > How is that solving a computer science problem? | | It depends on your definition of a computer science | problem. I am definitely not solving P = NP. By your | definition, does Google's Paxos Made Live paper solve a | computer science problem? | | The claim is more a play on Phil Karlton's quote, as the | work here makes cache invalidation much easier (in my | opinion). Also, Phil Karlton's quote doesn't necessarily | _make_ a problem a computer science problem, don't you | think? I think it's a good quote and there's a lot of | truth in it. | jitl wrote: | How do you handle caching derived data assembled from multiple | individual records? For example, how would you maintain a cache | for a query like getFriendsOfFriends(userId)? | | My context is working on Notion's caches for page data. Our | pages are made out of small units called "blocks" that store | pointers to their child content. To serve all the blocks needed | to render a page, we need to do a recursive traversal of the | block tree. We have an inconsistent cache of "page chunks" | right now; how would you think about making a consistent cache? | uvdn7 wrote: | This is a fantastic question! We face something very similar | (if not identical). | | There are two parts to solving this problem: 1. monitor and | measure how consistent the cache is; 2. figure out why entries | are inconsistent. | | I will focus on #1 in this comment. You can build something | very similar to Polaris (mentioned in the blog) that tails | your database's binlog so it knows when e.g. "friendship" | data is mutated; it can then perform the computation to | figure out which cache entries "should have been" updated. | E.g. if Alice just friended Bob, then Alice's friends-of- | friends and Bob's friends-of-friends should reflect the | change. And your monitoring service will "observe" that and | alert on anomalies. | ahahahahah wrote: | So if you just implement the cache invalidation logic in | your monitoring system, you can tell if you got the cache | invalidation logic correct in your caching system. That | sounds really helpful! | uvdn7 wrote: | That's not the case though. Polaris acts as a client and | only monitors client-observable effects, and assumes no | knowledge of the server internals. | | I am trying to help.
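A minimal sketch (hypothetical hooks, not the real Polaris API) of the monitoring loop uvdn7 outlines above for jitl's derived-data case: tail the binlog, recompute what each affected cache entry should be, and only flag entries that are still wrong after invalidation has had time to land.

    import time

    def polaris_style_checker(binlog_events, cache_get, recompute,
                              derived_keys_for, settle_seconds=300):
        # binlog_events    : mutations, e.g. "Alice friended Bob"
        # derived_keys_for : event -> cache keys it should affect
        #                    (e.g. both users' friends-of-friends)
        # recompute        : key -> value recomputed from the database
        # cache_get        : key -> possibly stale cached value
        anomalies = []
        for event in binlog_events:
            for key in derived_keys_for(event):
                if cache_get(key) == recompute(key):
                    continue
                # Transient staleness right after a write is expected;
                # only report entries that stay wrong past the window.
                time.sleep(settle_seconds)
                if cache_get(key) != recompute(key):
                    anomalies.append((key, event))
        return anomalies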
| continuational wrote: | With the risk of stating the obvious - the hard part of cache | invalidation is to know _when_ to invalidate the cache. | uvdn7 wrote: | We invalidate the cache upon mutations. When else would you do | it? | continuational wrote: | Please see https://news.ycombinator.com/item?id=31672541 | rajesh-s wrote: | Off topic, but what tool did you use to create those cache | hierarchy diagrams? | simonw wrote: | Is your argument here that cache invalidation may no longer be | hard because you can implement systems like Polaris which | continually test your cache implementation to try and catch | invalidation problems, so you can then go and fix them? | | EDIT: Already answered in this reply: | https://news.ycombinator.com/item?id=31671794 | uvdn7 wrote: | Polaris is actually fairly simple to build. The harder | question to answer is "why is the cache inconsistent" and how | you debug it. The second half of the post talks about | consistency tracing, which tracks all cache data state | mutations. | | Distributed systems are state machines. With consistency | tracing keeping track of all the state transitions, debugging | cache inconsistencies in an invalidation-based cache is very | actionable and manageable, based on our experience. | politician wrote: | I read the article. It seems to be suggesting that cache | invalidation might no longer be a hard thing in computer | science because of the insights uncovered by your tracing and | observability solution. IOW, now that you can measure and audit | the system, you're able to find and fix bugs in the system. Is | that the correct take-away? | uvdn7 wrote: | Yep. You're exactly right. And the approach is generic, and I | think it should work for everyone. | | The idea is that with all these observability capabilities, | debugging cache inconsistencies is getting very actionable | and close to how we debug an error with a message telling us | exactly where an exception happened. | ArrayBoundCheck wrote: | I didn't understand the tracing part. Is tracing 100% inside of | Polaris? If not, does it start at the database? The first cache? | | Does the invalidation need to be predictable before you can use | tracing? What kind of parameters do you have so you don't have | too much logging or too little? | uvdn7 wrote: | Tracing is not in Polaris. It's a separate library that runs | in every cache host. Its main job is logging every cache | state mutation for traced writes. So when Polaris detects | cache inconsistencies, we know what happened and why the cache | is inconsistent. | | It starts all the way from the client-initiated write (where we | mint a unique id, and plumb it all the way through). | | > What kind of parameters do you have so you don't have too | much logging or too little? | | This is the key question! If you go to the second half of the | post, it talks about an insight about how we managed to log | only when and where cache inconsistencies _can_ be | introduced. There's only a small window after a mutation/write | where the cache can become inconsistent due to invalidation. So | it's very cheap to trace and provide the information we need. | [deleted]
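A minimal sketch (window length and interfaces assumed, not Meta's implementation) of the windowed tracing uvdn7 describes above: a write mints a trace id, and cache hosts log state mutations for a key only while that key is inside the short post-write window where inconsistencies can be introduced, which keeps the logging cheap.

    import time

    class ConsistencyTracer:
        WINDOW_SECONDS = 60  # assumed window length

        def __init__(self):
            self.traced = {}  # key -> (trace_id, window expiry)

        def on_write(self, key, trace_id):
            # Called on the client-initiated write path.
            self.traced[key] = (trace_id, time.time() + self.WINDOW_SECONDS)

        def on_cache_mutation(self, key, old_state, new_state):
            # Called by a cache host for every state mutation; logs only
            # inside the window, since inconsistencies can't start later.
            entry = self.traced.get(key)
            if entry is None or time.time() > entry[1]:
                return
            trace_id, _ = entry
            print(f"[trace {trace_id}] {key}: {old_state} -> {new_state}")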
| dang wrote: | The submitted title ("Cache invalidation might no longer be a | hard thing in Computer Science") broke the site guidelines badly. | They say: | | " _Please use the original title, unless it is misleading or | linkbait; don't editorialize._" - | https://news.ycombinator.com/newsguidelines.html | | Editorializing the original title to make it _more_ linkbait is | going the wrong way down a one-way street. Please don't. | | (If you're the author, then please either use the original title | or, if it's linkbait, change the HN title to make it less so.) | gjs278 wrote: | bigbillheck wrote: | As far as I'm concerned I haven't seen any change. | uvdn7 wrote: | Accepted. Just to be clear, it's not linkbait as far as I | understand it, for two reasons: 1. I am dead serious about making | cache invalidation a simpler problem; 2. "Cache invalidation | might no longer be a hard problem in Computer Science" is the | subtitle of the corresponding Systems @Scale talk I just gave. | | I respect the guidelines and I am OK with the rename. I just | want to clarify that I don't think the old title is misleading. | PKop wrote: | It's still massively clickbait until such time as the claim | isn't pure speculation. Also, casually optimistic claims | about complex and hard problems are the essence of clickbait. | There isn't much unique about _your_ instance of it. | uvdn7 wrote: | > until such time as the claim isn't pure speculation | | Which I don't think it is. | | > casually optimistic claims about complex and hard | problems are the essence of clickbait | | A group of people at Facebook worked on this for many | years. TAO and Memcache serve quadrillions of queries a | day. I wanted to make this claim - cache invalidation might | no longer be a hard problem in Computer Science - years | ago; but I didn't, because I didn't want to jump the gun. We | baked the solution for years. There's nothing casual about | this. | pvg wrote: | Just about all papers and write-ups that describe big | advances on important problems aren't titled 'Big advance | in Problem X' or 'X now an ex-problem'. | uvdn7 wrote: | This is a fair point. Lesson learned. Thank you :D | o_____________o wrote: | Small comment of appreciation for your attitude and quick | adjustment, nice to see on the internet! | tomcam wrote: | Agreed. Gracefully handled. I happened to agree with the | arguments, but treasure civil discourse. | AtNightWeCode wrote: | So basically, classic timestamp versioning with some consistency | checking. Might work. Cache invalidation is hard. The only | problem I ever faced while working in the biz that I could not | solve even at the theory level was cache related. | | What I thought the article would tackle is the dependencies | between stored objects. Some solve it with super complicated | graphs, others by just flushing the entire cache. | | At Meta, why not just use flat files or a store that acts as one, | pubsub the changes, listen on the changes and update aggregated | views as needed and store them. Then just short-term cache | everything on the edge. | uvdn7 wrote: | > What I thought the article would tackle is the dependencies | between stored objects. | | Like tracking dependencies, so it knows what to invalidate on | mutations? | AtNightWeCode wrote: | Yes. The Boss changes his name from Mark. It will directly | hit a lot of caches. It will need an update of all bosses | that materialized this. Then all the staff that materialized | the boss of the boss, and so on. This is a simple example | because it is a tree. There can be circular dependencies as | well. ___________________________________________________________________ (page generated 2022-06-08 23:00 UTC)