[HN Gopher] Ask HN: Options for handling state at the edge?
___________________________________________________________________

Ask HN: Options for handling state at the edge?

With Cloudflare workers able to be called single-digit ms away from
customers on much of the planet now, I wonder how I can keep state
as close to the workers / lambdas as possible.

What are the options we have for handling state at the edge? What
do you use in your business or service?

Author : CaptainJustin
Score  : 58 points
Date   : 2022-05-11 16:42 UTC (6 hours ago)

| powersurge360 wrote:
| I haven't done this, but I've been thinking about it lately.
| Fly.IO has had some very interesting ideas on this if you want to
| use a relational database. There was an article about Litestream
| that would allow you to replicate your SQLite database to an
| arbitrary number of nodes, which means that every application
| server would have a SQLite file sitting on it for read queries.
| You can then capture write queries and forward them to a write
| leader, and let that user continue talking to that server until
| the write replicates across your application servers.
|
| You can do basically the same idea with any relational database:
| have a write leader... somewhere and a bunch of read replicas
| that live close to the edge.
|
| There are also what you would call cloud-native data stores that
| purport to solve the same issue, but I don't know much about how
| they work, because I much prefer working w/ relational databases
| and most of those are NoSQL. And I haven't had to actually solve
| the problem yet for work, so I also haven't made any compromises
| yet in how I explore it.
|
| Another interesting way to go might be CockroachDB. It's wire-
| compatible w/ PostgreSQL and supposedly automatically clusters
| and shards data across the cluster. I don't know very much about
| it, but it seems to be becoming more and more popular, and many
| ORMs seem to have an adapter to support it. May also be worth
| looking into because, if it works as advertised, you get an RDBMS
| that you can deploy to an arbitrary number of places and then
| configure to talk to one another, without having to worry about
| replicating the data or routing correctly to write leaders and
| all that.
|
| And again, I'm technical, but I haven't solved these problems, so
| consider the above to be a jumping-off point and take nothing as
| gospel.
| adam_arthur wrote:
| Depends on your product, but I'm able to do everything via
| Cloudflare Workers, KV, and Durable Objects, and use JSON files
| stored in the Cloudflare CDN as the source of truth (hosted for
| free, btw).
|
| Cloudflare KV can store most of what you need in JSON form, while
| Durable Objects let you model updates with transactional
| guarantees.
|
| My app is particularly read-heavy though, and the backing data is
| mostly static (but gets updated daily).
|
| Honestly, after using Cloudflare I feel like they will easily
| become the go-to cloud for building small/quick apps. Everything
| is integrated much better than AWS, and it's way more user-
| friendly from a docs and dev-experience perspective. Also, their
| dev velocity on new features is pretty insane.
|
| Honestly didn't think that much of them until I started digging
| into these things.
| don-code wrote:
| I recently had an opportunity to build an application on top of
| Lambda@Edge (AWS's equivalent of Cloudflare Workers). The
| prevailing wisdom there was to make use of regional services,
| like S3 and DynamoDB, from the edge.
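|
| A minimal sketch of what that pattern looks like (the table and
| key names are hypothetical), using the AWS SDK v3 from a
| CloudFront origin-request handler:
|
|     import { DynamoDBClient, GetItemCommand }
|       from "@aws-sdk/client-dynamodb";
|     import type { CloudFrontRequestHandler } from "aws-lambda";
|
|     // The table lives in one home region, so every edge
|     // location pays a round trip back to it.
|     const db = new DynamoDBClient({ region: "us-east-1" });
|
|     export const handler: CloudFrontRequestHandler = async (event) => {
|       const request = event.Records[0].cf.request;
|       const res = await db.send(new GetItemCommand({
|         TableName: "edge-config", // hypothetical
|         Key: { pk: { S: request.uri } },
|       }));
|       // ...use res.Item to rewrite the request before it
|       // continues on to the origin.
|       return request;
|     };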
| That, of course, makes my edge application depend on calls to a
| larger, further-away point of presence.
|
| While it's possible to distribute state to many AWS regions and
| select the closest one, I ended up going a different route:
| packaging state alongside the application. Most of the
| application's state was read-only, so I ended up packaging the
| application state up as JSON alongside the deployment bundle. At
| startup, it'd then statically read the JSON into memory - this
| performance penalty only happens at startup, and as long as the
| Lambda functions are being called often (in our case they are),
| requests are as fast as a memory read.
|
| When the state does need to get updated, I just redeploy the
| application with the new state.
|
| That strategy obviously won't work if you need "fast" turnaround
| on your state being in sync at all points of presence, or if
| users can update that state as part of your application's
| workflow.
| geewee wrote:
| We do something similar in Climatiq on Fastly's Compute@Edge.
| When building the application we load a big chunk of read-only
| data into memory and serialize that memory to a file. When we
| spin up our instance, all we have to do is load that file into
| memory, and then we have tons of read-only data in just a few ms.
| chucky_z wrote:
| I think this is a really clear winner for something like
| Litestream, where you can have state far away but sync it locally
| with periodic syncs, if you can live with 'small wait on startup'
| and 'periodic state updates'.
| powersurge360 wrote:
| Isn't this effectively the same as using a static site generator?
| Could you potentially freeze or pre-bake your generated files and
| then just serve those?
| Elof wrote:
| Check out Macrometa, a data platform that uses CRDTs to manage
| state at N number of PoPs and also does real-time event
| processing. - https://macrometa.com (full disclosure, I work at
| Macrometa)
| rad_gruchalski wrote:
| Cloudflare Workers with the k/v store, R2, and their new D1
| database.
| crawdog wrote:
| I have used card database files before with success.
| https://cr.yp.to/cdb.html
|
| Have your process regularly update the CDB file from a blob store
| like S3. Any deltas can be pulled from S3, or you can use a
| message bus if the changes are small. Every so often, pull the
| latest CDB down and start aggregating deltas again.
|
| CDB performs great and can scale to multiple GBs.
| jhgb wrote:
| I thought it was "constant database"? Is it indeed meant to
| mean "card database"?
| crawdog wrote:
| my typo, thanks for catching that
| kevsim wrote:
| Cloudflare just announced their own relational DB for Workers
| today: https://blog.cloudflare.com/introducing-d1
|
| On HN: https://news.ycombinator.com/item?id=31339299
| bentlegen wrote:
| The convenience of this announcement makes me feel like the
| original post is an astroturfed marketing effort by Cloudflare.
| I'd love to know (I'm genuinely curious!), but unless OP admits
| as much, I don't know that anyone would put their hand up.
| jFriedensreich wrote:
| it really depends on the type of state:
|
| cloudflare kv store is great if the supported write pattern fits
|
| if you need something with more consistency between pops, durable
| objects should be on your radar
|
| i also found that cloudant/couchdb is a perfect fit for a lot of
| use cases, with heavy caching in the cf worker.
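|
| a minimal sketch of that caching pattern with the workers cache
| api (the couchdb url here is hypothetical):
|
|     export default {
|       async fetch(request: Request, env: unknown,
|                   ctx: ExecutionContext): Promise<Response> {
|         const cache = caches.default;
|         const hit = await cache.match(request);
|         if (hit) return hit; // served entirely from the local PoP
|
|         // cache miss: fetch the document from CouchDB, then
|         // keep a copy at this PoP for subsequent requests
|         const path = new URL(request.url).pathname;
|         const upstream = await fetch(
|           "https://couch.example.com/appdb" + path);
|         const response = new Response(upstream.body, upstream);
|         // tolerate up to 60s of staleness at the edge
|         response.headers.set("Cache-Control", "max-age=60");
|         ctx.waitUntil(cache.put(request, response.clone()));
|         return response;
|       },
|     };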
| it's also possible to have multi-master replication, with each
| couchdb cluster close to the local users, so you don't have to
| wait for writes to reach a single master on the other side of the
| world
| tra3 wrote:
| I never thought of this, but now I have a lot of questions. Do
| you have an application in mind where this would be useful? Most
| of my experience is with traditional webapps/SaaS, so I'd love to
| see an example.
| deckard1 wrote:
| Doesn't Cloudflare have a cache API and/or cache fetch calls for
| workers?
|
| A number of people are talking about Lambda or loading files,
| SQLite, etc. These aren't likely to work on CF. CF uses isolated
| JavaScript sandboxes. You're not guaranteed to have two workers
| accessing the same memory space.
|
| This is, in general, the problem with serverless. The model of
| computing is proprietary and very much about the fine-print
| details.
|
| edit: CF just announced their SQLite worker service/API today:
| https://blog.cloudflare.com/introducing-d1/
| ccouzens wrote:
| I've got a Fastly Compute@Edge service. My state is relatively
| small (less than a MB of JSON) and only changes every few hours,
| so I compile the state into the binary and deploy that.
|
| I can share a blog post about this if there is interest.
|
| It gives us very good performance (p95 under 1ms) as the function
| doesn't need to call an external service.
| anildash wrote:
| would love to hear more about how you're doing this, been
| poking at the idea but haven't seen a running example
| documented
| ccouzens wrote:
| https://medium.com/p/302f83a362a3
|
| There's a heading "This is how we made it fast" about 1/3 of
| the way down if you'd like to skip the introduction and
| background.
| asdf1asdf wrote:
| You just developed your application from the cache inwards,
| instead of from the application outwards.
|
| Now on to developing the actual application that will host/serve
| your data to said cache layer.
|
| If you learn basic application architecture concepts, you won't
| be fooled by salesperson lies again.
| fwsgonzo wrote:
| Just build a tiny application alongside an open-source Varnish
| instance, and use it as a local backend. It's "free" if you have
| decent latency to the area of the Internet you care about. For
| example, my latency is just fine to all of Europe, so I host
| things myself.
|
| If you want to go one step further, you can build a VMOD for
| Varnish to run your workloads inside Varnish, even with Rust:
| https://github.com/gquintard/vmod_rs_template
| F117-DK wrote:
| R2, KV, D1, and Durable Objects. Many options in the Cloudflare
| suite.
| efitz wrote:
| AWS Lambda functions have a local temp directory. I have
| successfully used that in the past to store state.
|
| In my application, I had a central worker process that would
| ingest state updates and would periodically serialize the data to
| a MySQL database file, adding indexes and so forth, and then
| uploading a versioned file to S3.
|
| My Lambda workers would check for updates to the database,
| downloading the latest version to the local temp directory if
| there was not a local copy or if the local copy was out of date.
|
| Then the work of checking state was just a database query.
|
| You can tune timings etc. to whatever your app can tolerate.
|
| In my case the problem was fairly easy, since state updates only
| occurred centrally; I could publish and pull updates at my
| leisure.
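|
| A sketch of the download-if-stale half of that pattern (bucket,
| key, and table names are hypothetical, and SQLite stands in here
| for the serialized database file):
|
|     import { S3Client, GetObjectCommand, HeadObjectCommand }
|       from "@aws-sdk/client-s3";
|     import Database from "better-sqlite3";
|     import { createWriteStream } from "node:fs";
|     import type { Readable } from "node:stream";
|     import { pipeline } from "node:stream/promises";
|
|     const s3 = new S3Client({});
|     const LOCAL = "/tmp/state.db"; // survives warm invocations
|     let cachedEtag: string | undefined;
|
|     async function openFreshDb(): Promise<Database.Database> {
|       const head = await s3.send(new HeadObjectCommand({
|         Bucket: "my-state-bucket", Key: "state.db" }));
|       // download only if there's no local copy or it's stale
|       if (head.ETag !== cachedEtag) {
|         const obj = await s3.send(new GetObjectCommand({
|           Bucket: "my-state-bucket", Key: "state.db" }));
|         await pipeline(obj.Body as Readable,
|           createWriteStream(LOCAL));
|         cachedEtag = head.ETag;
|       }
|       return new Database(LOCAL, { readonly: true });
|     }
|
|     export const handler = async (event: { id: string }) => {
|       const db = await openFreshDb();
|       // checking state is now just a local database query
|       return db.prepare(
|         "SELECT * FROM state WHERE id = ?").get(event.id);
|     };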
|
| If I had needed distributed state updates, I would have just made
| the change locally without bumping the version, and then sent a
| message (SNS or SQS) to the central state maintainer for commit,
| letting the publication process handle versioning and
| distribution.
| michaellperry71 wrote:
| There are many technical solutions to this problem, as others
| have pointed out. What I would add is that data at the edge
| should be considered immutable.
|
| If records are allowed to change, then you end up in situations
| where changes don't converge. But if you instead collect a
| history of unchanging events, then you can untangle these
| scenarios.
|
| Event Sourcing is the most popular implementation of a history of
| immutable events. But I have found that a different model works
| better for data at the edge. An event store tends to be centrally
| located within your architecture. That is necessary because the
| event store determines the one true order of events. But if you
| relax that constraint and allow events to be partially ordered,
| then you can have a history at the edge. If you follow a few
| simple rules, then those histories are guaranteed to converge.
|
| Rule number 1: A record is immutable. It cannot be modified or
| deleted.
|
| Rule number 2: A record refers to its predecessors. If the order
| between events matters, then it is made explicit with this
| predecessor relationship. If there is no predecessor
| relationship, then the order doesn't matter. No timestamps.
|
| Rule number 3: A record is identified only by its type, contents,
| and set of predecessors. If two records have the same stuff in
| them, then they are the same record. No surrogate keys.
|
| Following these rules, analyze your problem domain and build up a
| model. The immutable records in that model form a directed
| acyclic graph, with arrows pointing toward the predecessors. Send
| those records to the edge nodes and let them make those
| millisecond decisions based only on the records that they have on
| hand. Record their decisions as new records in this graph, and
| send those records back.
|
| Jeff Doolittle and I talk about this system on a recent episode
| of Software Engineering Radio: https://www.se-
| radio.net/2021/02/episode-447-michael-perry-o...
|
| No matter how you store it, treat data at the edge as if you
| could not update or delete records. Instead, accrue new records
| over time. Make decisions at the edge with autonomy, knowing that
| they will be honored within the growing partially-ordered
| history.
| weatherlight wrote:
| Fly.io
| lewisl9029 wrote:
| A lot of great info here already, but I just wanted to add my 2c
| as someone who's been chasing the fast-writes-everywhere dream
| for https://reflame.app.
|
| Most of the approaches mentioned here will give you fast reads
| everywhere, but writes are only fast if you're close to some
| arbitrarily chosen primary region.
|
| A few technologies I've experimented with for doing fast,
| eventually-consistent replicated writes: DynamoDB Global Tables,
| CosmosDB, Macrometa, KeyDB.
|
| None of them are perfect, but in terms of write latency, active-
| active replicated KeyDB in my fly.io cluster has everything else
| beat. It's the only solution that offered _reliable_ sub-5ms
| latency writes (most are close to 1-2ms).
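|
| Since KeyDB speaks the Redis protocol, the write path is just a
| plain Redis command against the nearest node - a minimal sketch
| with ioredis (the hostname is hypothetical):
|
|     import Redis from "ioredis";
|
|     // Each region connects to its nearest KeyDB node; with
|     // active-active replication, every node accepts writes.
|     const nearest = new Redis({ host: "keydb.internal", port: 6379 });
|
|     // Acknowledged by the local node in ~1-2ms; replication to
|     // the other regions happens asynchronously, so remote reads
|     // can briefly see stale data.
|     await nearest.set(
|       "session:abc123", JSON.stringify({ userId: 42 }), "EX", 3600);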
| Dynamo and Cosmos advertise sub-10ms, but in practice, while
| _most_ writes fall in that range, I've seen them fluctuate wildly
| to over 200ms (Cosmos was much worse than Dynamo IME), which is
| to be expected on the public internet with noisy neighbors.
|
| Unfortunately, I got too wary of the operational complexity of
| running my own global persistent KeyDB cluster with potentially
| unbounded memory/storage requirements, and eventually migrated
| most app state over to use Dynamo as the source of truth, with
| the KeyDB cluster as an auto-replicating caching layer so I don't
| have to deal with perf/memory/storage scaling and backups. So far
| that has been working well, but I'm still pre-launch, so it's not
| anywhere close to battle-tested.
|
| Would love to hear stories from other folks building systems with
| similar requirements/ambitions!
| lewisl9029 wrote:
| One thing I forgot to mention: Reflame requires fast writes
| globally only for a small subset of use cases. For everything
| else, it only needs fast reads globally, and for those I've
| been really liking FaunaDB.
|
| It's not SQL, but it offers strongly consistent global writes
| that allow me to reason about the data as if it lived in a
| regular strongly-consistent, non-replicated DB. This has been
| incredibly powerful, since I don't have to worry at all about
| reading stale data like I would with an eventually-consistent
| read-replicated DB.
|
| It comes at the cost of write latency of ~200ms, which is still
| perfectly serviceable for everything I'm using it for.
| rektide wrote:
| I don't have a whole lot to say on this right now (very WIP), but
| I have a strong belief that git is a core tool we should be using
| for data.
|
| Most data formats are thick formats that pack data into a single
| file. Part of the effort in switching to git would be a shift
| toward trying to unpack our data, to really make use of the file
| system to store fine-grained pieces of data.
|
| It's been around for a while, but Irmin[1] (written in OCaml) is
| a decent-enough almost-example of these kinds of practices. It
| lacks the version control aspect, but 9p is certainly another
| inspiration, as it encouraged state of all things to be held &
| stored in fine-grained files. Git I think is a superpower, but
| just as much: having data which can be scripted, which speaks the
| lingua franca of computing - that too is a superpower.
|
| [1] https://irmin.org/
| https://news.ycombinator.com/item?id=8053687 (147 points, 8 years
| ago, 25 comments)
| richardwhiuk wrote:
| You really want to use CRDTs, not data types subject to human-
| resolved merge conflicts.
| rektide wrote:
| I feel like crdts are sold as a panacea. I can easily imagine
| users making conflicting changes, so I don't really see or
| understand what the real value or weaknesses of CRDTs are.
|
| I'm also used to seeing them used for online synchronization, &
| far fewer examples of distributed crdts, which is, to me, highly
| important.
|
| Git by contrast has straightforward & good merge strategies. At
| this point, I feel like the problems are complex & that we need
| complex tools that leave users & devs in charge & steering. I'm
| so ready to be wrong, but I don't feel like these problems are
| outsmartable; crdts have always felt like they try to define too
| limited a world. For now, I feel like tools for managing files
| between different fs'es are more complex, but they're the minimum
| level of capability we need.
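|
| to be concrete about the claim i'm skeptical of: the textbook
| crdt example is a grow-only counter, where each node increments
| only its own slot & merging takes the per-node maximum, so any
| two replicas converge without a human resolving anything. a
| sketch (my own illustration, not any particular library):
|
|     // G-Counter: map of node id -> that node's local count
|     type GCounter = Record<string, number>;
|
|     const increment = (c: GCounter, node: string): GCounter =>
|       ({ ...c, [node]: (c[node] ?? 0) + 1 });
|
|     // merge is commutative, associative & idempotent, so
|     // replicas can sync in any order and still converge
|     const merge = (a: GCounter, b: GCounter): GCounter => {
|       const out: GCounter = { ...a };
|       for (const [node, n] of Object.entries(b))
|         out[node] = Math.max(out[node] ?? 0, n);
|       return out;
|     };
|
|     const value = (c: GCounter): number =>
|       Object.values(c).reduce((sum, n) => sum + n, 0);
|
| i get the math; what i doubt is that real user edits reduce to
| data types with merges this clean.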
| weego wrote:
| _I have a strong belief that git is a core tool we should be
| using for data_
|
| It isn't, we shouldn't, and you're not the first and won't be
| the last person to put time into this. It's neither a compelling
| solution nor even a particularly good one.
___________________________________________________________________
(page generated 2022-05-11 23:01 UTC)