[HN Gopher] Introducing ReadySet
___________________________________________________________________
Introducing ReadySet

Author : alanamarzoev4
Score : 107 points
Date : 2022-04-05 17:32 UTC (5 hours ago)

(HTM) web link (blog.readyset.io)
(TXT) w3m dump (blog.readyset.io)

| dkhenry wrote:
| I am really excited to see where ReadySet can take this technology. If deployment is as simple as they say, this sounds like an instant win for any high-throughput service.
|
| I am curious how they handle queries that would overflow local main memory. If I just had a PK lookup on a 10TB table, you obviously can't store all that in RAM, and would still need to do some form of cache invalidation.
| Jonhoo wrote:
| The trick is "partial view materialization" (https://jon.thesquareplanet.com/papers/phd-thesis.pdf). Basically, you only materialize results for commonly-accessed keys, and compute other keys on-demand.
| dkhenry wrote:
| Is there a way to federate which keys are commonly accessed? Like, if I commonly access the entire table, can I direct inbound traffic to different application servers and have them access different caches, so each cache can pull only a subset of the data into the cache and not worry about things like which keys are being written globally?
| glittershark wrote:
| We've thought about that, actually! We have an experimental mode where multiple copies of the same query can be created (actually just multiple copies of the leaf node in the dataflow graph, so intermediate state is reused) with different subsets of keys materialized. The idea is then that these separate readers would be run in different regions, so e.g. the reader in the EU region gets keys for EU users, and the reader in the NA region gets keys for NA users.
| ko27 wrote:
| Sounds great, until you realize you are switching to eventual consistency for your whole application.
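For readers unfamiliar with the idea, the "materialize only commonly-accessed keys, compute the rest on demand" approach described above can be illustrated with a toy sketch. This is not ReadySet's or Noria's implementation: the `PartialView` class, its methods, the vote-count query, and the LRU eviction policy are all assumptions made up for illustration.

```python
from collections import OrderedDict


class PartialView:
    """Toy partially materialized view: row count per key (hypothetical)."""

    def __init__(self, base_rows, capacity):
        self.base = list(base_rows)   # stand-in for the upstream table
        self.capacity = capacity      # max number of materialized keys
        self.state = OrderedDict()    # key -> cached count, in LRU order

    def _compute(self, key):
        # On-demand computation for a key that is not materialized.
        return sum(1 for row_key in self.base if row_key == key)

    def read(self, key):
        if key in self.state:
            self.state.move_to_end(key)     # mark as recently used
            return self.state[key]
        value = self._compute(key)          # miss: compute, then materialize
        self.state[key] = value
        if len(self.state) > self.capacity:
            self.state.popitem(last=False)  # evict least recently used key
        return value

    def write(self, key):
        self.base.append(key)
        if key in self.state:               # only materialized keys are updated
            self.state[key] += 1
```

Reads of hot keys are served from `state`, writes incrementally update only the keys that are currently materialized, and cold keys cost a recomputation the next time they are read. That is the trade that lets the cache stay smaller than the table.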
| sedev wrote:
| Digging down a couple layers of links from this, the underlying paper, "Partial State in Dataflow-Based Materialized Views" (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), is pretty intriguing. It sounds like a potential free lunch in specific performance areas, which means it also sounds too good to be true, but if it turns out to be a metaphorical 90%-off lunch that's still very promising.
| glittershark wrote:
| The remaining 10% of the 90%-off free lunch is pretty much just eventual consistency: it can occasionally be the case that you write something to the DB, and an immediately subsequent read doesn't see it. That said, there are escape hatches there (we'll proxy queries that happen inside of a transaction to the upstream MySQL/PostgreSQL database, and there's an experimental implementation of opt-in Read-Your-Writes consistency), and I'd wager that the vast majority of "traditional" web applications can tolerate slightly stale reads.
|
| Our official docs also have an aptly-titled "what's the catch?" section: https://docs.readyset.io/concepts/overview#whats-the-catch
| Jonhoo wrote:
| Oh hey, that's my thesis! Happy to answer any questions you may have about it :) There's also the OSDI'18 paper here which may be of interest: https://jon.tsp.io/papers/osdi18-noria.pdf
| adamgordonbell wrote:
| This is super exciting. Ever since I talked to you about Noria I've been telling people about this concept. I'm excited to see a production-ready implementation of it.
| BenoitP wrote:
| Big fan here!
|
| I've been following the space for a while, and I must say it's exciting. To me this is the future of apps where the Truth lives server-side, and everything reacts from there, with partial state evaluation lowering resource consumption to a minimum.
|
| Kafka Streams and Apache Flink seem to be focused on real-time analytics, and I wish they'd get there to stimulate the space.
|
| Are you affiliated with ReadySet?
| educaysean wrote:
| According to the linked article, Jon appears to be one of the co-founders.
| Jonhoo wrote:
| I'm pretty excited about it too! I remember when I initially started the research I was amazed that this didn't already exist.
|
| Some context: https://twitter.com/jonhoo/status/1511401461669720068
|
| Basically, I co-founded the company around the time I graduated, but had had my fill of database research after six years of PhD. So I joined AWS to work on Rust while Alana (the CEO) took on leading ReadySet.
| msvan wrote:
| I've used both of the suggested methods under "Current standards for scaling out databases", so I see where this is coming from. But I peeked at the AWS reference architecture, and it places a Consul and ReadySet deployment in my environment for me to run and maintain. I feel like any sales pitch for this really needs to convince me that having these things in my environments is going to be worth the hassle in terms of milliseconds and dollars, as opposed to just using RDS read replicas and paying a bit more. Then again, I can see this being an obvious choice if you're growing very quickly or have tight latency requirements.
|
| With that said, it looks like cool tech, and I read Jon's Rust for Rustaceans, which serves as a stamp of quality for this even if I haven't tried it yet!
| zeroonetwothree wrote:
| This feels like the sort of thing where it works great 90% of the time, but as soon as you want to do something more complicated/nontrivial it doesn't handle it properly and is impossible to debug/improve since you are using an opaque solution.
|
| At least with scaling replicas or having a dumb cache layer it's easy to understand the system.
| skyde wrote:
| Hi Jon Gjengset, I am so happy to see an attempt to make Noria usable in legacy applications. Am I right in assuming this needs to consume the replication log from the primary database?
| Or will write requests going through the ReadySet proxy produce their own change feed?
| thecompilr wrote:
| Hi, ReadySet engineer here. It does replicate using the replication log.
| zomglings wrote:
| This seems like it could be a good fit for my team (and we have been discussing this kind of caching).
|
| We frequently (~ once per second) run queries over relations that are increasing at a rate of ~100 rows per second (append only, no updates).
|
| Could this cause any performance concerns for ReadySet? How much control do we have over the frequency of reconstruction of cached data based on the flow graph?
| vinay_ys wrote:
| I can't think of real-world examples of apps that have a read-path scaling problem that this tech would solve well. It would be great if the authors could catalogue a bunch of real-world use-cases (real-world customer case studies would be even better).
|
| Today, machines are super huge (in terms of compute cores, memory and iops for storage and network) and a single MySQL or PostgreSQL database can do a lot of work. This makes it much, much easier to build apps that don't have as many users at Internet scale - that is pretty much all enterprise apps - without resorting to distributed databases.
|
| In Internet-scale consumer app domains like e-commerce/delivery or fintech, where relational databases are used heavily, most queries would have strict correctness requirements and won't tolerate staleness. Also, most query results would be highly specific to each user and won't have many cache hits. Also, apps in general are increasingly personalised and have fast-changing content.
|
| In terms of technology evolution, I see people moving from single large-machine databases to distributed SQL databases as their use-cases scale.
|
| And as distributed SQL databases mature, I expect they will get a built-in capability to generate user-defined materialised views, with flexibility to manage their placement w.r.t. class and number of machines to compute and serve them, etc.
| staticassertion wrote:
| My company is building a realtime analytics service for security. You have a lot of reads that can often be answered with stale data (thanks to our data modeling, which provides ACID 2.0 semantics).
|
| I think a lot of applications could probably benefit from this _if_ they were built with a data model in mind that leverages it properly. But if you 1:1 migrate your code that relies on ACID transactions over to something with Strong Eventual Consistency... yeah, that's gonna be a bad time.
| js4ever wrote:
| Is it open source?
| staticassertion wrote:
| Very cool tech here, excited for the future. Congrats on fundraising!
| sulam wrote:
| I was prepared to be super skeptical here, but this actually looks like it could be really good, without all the gotchas I was expecting. I think the only potential hole I see is if you do a lot of db-side code as stored procedures and what-not. I can't tell from the write-up if they can keep those consistent.
___________________________________________________________________
(page generated 2022-04-05 23:00 UTC)