[HN Gopher] Introducing ReadySet
       ___________________________________________________________________
        
       Introducing ReadySet
        
       Author : alanamarzoev4
       Score  : 107 points
       Date   : 2022-04-05 17:32 UTC (5 hours ago)
        
 (HTM) web link (blog.readyset.io)
 (TXT) w3m dump (blog.readyset.io)
        
       | dkhenry wrote:
       | I am really excited to see where ReadySet can take this
       | technology. If deployment is as simple as they say, this sounds
       | like an instant win for any high-throughput service.
       | 
       | I am curious how they handle queries that would overflow local
       | main memory, like if I just had a PK lookup on a 10TB table you
       | obviously can't store all that in RAM, and would still need to do
       | some form of cache invalidation.
        
         | Jonhoo wrote:
         | The trick is "partial view materialization"
         | (https://jon.thesquareplanet.com/papers/phd-thesis.pdf).
         | Basically, you only materialize results for commonly-accessed
         | keys, and compute other keys on-demand.
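
A minimal Python sketch of the idea in the comment above: keep results only for frequently read keys and compute the rest on demand. This is not ReadySet's or Noria's implementation. `PartialView`, `compute`, and `capacity` are hypothetical names, and LRU eviction is an assumption chosen purely for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class PartialView:
    """Toy partially materialized view: holds results only for hot keys."""

    def __init__(self, compute: Callable[[Hashable], Any], capacity: int) -> None:
        self._compute = compute      # stands in for running the query upstream
        self._capacity = capacity    # maximum number of materialized keys
        self._state: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.misses = 0

    def read(self, key: Hashable) -> Any:
        if key in self._state:
            self._state.move_to_end(key)      # mark key as recently used
            return self._state[key]
        self.misses += 1
        value = self._compute(key)            # compute the missing key on demand
        self._state[key] = value
        if len(self._state) > self._capacity:
            self._state.popitem(last=False)   # evict the least recently used key
        return value

    def on_write(self, key: Hashable, value: Any) -> None:
        # Only materialized keys are updated in place; absent keys stay
        # absent and are recomputed the next time they are read.
        if key in self._state:
            self._state[key] = value


# Usage: a 1000-row "table" served through a view that keeps at most 2 keys.
table = {i: i * i for i in range(1000)}
view = PartialView(table.__getitem__, capacity=2)
for k in (1, 2, 1, 3):
    view.read(k)
```

With a capacity far smaller than the table, memory use is bounded by the hot set rather than by table size, at the cost of a slower first read for cold keys.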
        
           | dkhenry wrote:
           | Is there a way to federate which keys are commonly accessed?
           | Like if I commonly access the entire table, can I direct
           | inbound traffic to different application servers and have
           | them access different caches so each cache can pull only a
           | subset of the data into the cache, and not worry about things
           | like which keys are being written globally?
        
             | glittershark wrote:
             | We've thought about that, actually! We have an experimental
             | mode where multiple copies of the same query can be created
             | (actually just multiple copies of the leaf node in the
             | dataflow graph, so intermediate state is reused) with
             | different subsets of keys materialized - the idea is then
             | that these separate readers would be run on different
             | regions, so eg the reader in the EU region gets keys for EU
             | users, and the reader in the NA region gets keys for NA
             | users.
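
A rough Python sketch of the arrangement described above, under stated assumptions: upstream work happens once, and each copy of the leaf ("reader") keeps only the keys its filter accepts. `Reader`, `fan_out`, and the `(region, order_count)` row shape are invented for illustration and do not reflect ReadySet's actual API.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Row = Tuple[str, int]  # (region, order_count); a made-up schema


class Reader:
    """One copy of the leaf node; materializes only the keys it accepts."""

    def __init__(self, accepts: Callable[[Row], bool]) -> None:
        self._accepts = accepts
        self._materialized: Dict[int, int] = {}

    def apply_update(self, user_id: int, row: Row) -> None:
        # Every reader sees the same upstream update but stores only its own keys.
        if self._accepts(row):
            self._materialized[user_id] = row[1]

    def lookup(self, user_id: int) -> Optional[int]:
        return self._materialized.get(user_id)


def fan_out(readers: Iterable[Reader], user_id: int, row: Row) -> None:
    """Deliver one upstream result to every reader copy."""
    for reader in readers:
        reader.apply_update(user_id, row)


eu_reader = Reader(lambda row: row[0] == "EU")
na_reader = Reader(lambda row: row[0] == "NA")
for user_id, row in {1: ("EU", 4), 2: ("NA", 7), 3: ("EU", 1)}.items():
    fan_out([eu_reader, na_reader], user_id, row)
```

Each reader ends up holding a disjoint subset of keys, so a reader placed in one region never stores rows for users in another.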
        
       | ko27 wrote:
       | Sounds great, until you realize you are switching to eventual
       | consistency for your whole application.
        
       | sedev wrote:
       | Digging down a couple layers of links from this, the underlying
       | paper, "Partial State in Dataflow-Based Materialized Views"
       | https://jon.thesquareplanet.com/papers/phd-thesis.pdf is pretty
       | intriguing. It sounds like a potential free lunch in specific
       | performance areas, which means it also sounds too good to be
       | true, but if it turns out to be a metaphorical 90%-off lunch
       | that's still very promising.
        
         | glittershark wrote:
         | the remaining 10% of the 90%-off free lunch is pretty much just
         | eventual consistency - it can occasionally be the case that you
         | write something to the DB, and an immediate subsequent read
         | doesn't see it. That said, there are escape-hatches there
         | (we'll proxy queries that happen inside of a transaction to the
         | upstream mysql/postgresql database, and there's an experimental
         | implementation of opt-in Read-Your-Writes consistency), and I'd
         | wager that the vast majority of "traditional" web applications
         | can tolerate slightly stale reads.
         | 
         | Our official docs also have an aptly-titled: "what's the
         | catch?" section:
         | https://docs.readyset.io/concepts/overview#whats-the-catch
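
The transaction escape hatch mentioned above can be pictured with a small Python sketch. It assumes a proxy that tracks whether a connection is inside a transaction and sends everything except out-of-transaction `SELECT`s upstream. `Router` and the `"cache"`/`"upstream"` labels are hypothetical, and real SQL parsing is far more involved than this string matching.

```python
class Router:
    """Toy per-connection router: in-transaction statements go upstream."""

    def __init__(self) -> None:
        self.in_txn = False

    def route(self, sql: str) -> str:
        stmt = sql.strip().rstrip(";").strip().upper()
        if stmt in ("BEGIN", "START TRANSACTION"):
            self.in_txn = True
            return "upstream"
        if stmt in ("COMMIT", "ROLLBACK"):
            self.in_txn = False
            return "upstream"
        # Writes, and any statement inside a transaction, bypass the cache.
        if self.in_txn or not stmt.startswith("SELECT"):
            return "upstream"
        return "cache"


router = Router()
plan = [router.route(q) for q in (
    "SELECT * FROM users WHERE id = 1",
    "BEGIN",
    "SELECT * FROM users WHERE id = 1",
    "COMMIT;",
    "INSERT INTO users VALUES (2)",
    "select * from users where id = 2",
)]
```

Reads inside the transaction see the upstream database's own consistency guarantees; only standalone reads may be served slightly stale.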
        
         | Jonhoo wrote:
         | Oh hey, that's my thesis! Happy to answer any questions you may
         | have about it :) There's also the OSDI'18 paper here which may
         | be of interest: https://jon.tsp.io/papers/osdi18-noria.pdf
        
           | adamgordonbell wrote:
           | This is super exciting. Ever since I talked to you about
           | Noria I've been telling people about this concept. I'm
           | excited to see a production ready implementation of it.
        
           | BenoitP wrote:
           | Big fan here!
           | 
           | I've been following the space for a while now, and I must
           | say it's exciting. To me this is the future of apps where the
           | Truth lives server-side and everything reacts from there,
           | with partial state evaluation lowering resource consumption
           | to a minimum.
           | 
           | Kafka Streams and Apache Flink seem to be focused on real-
           | time analytics, and I wish they'd get there to stimulate the
           | space.
           | 
           | Are you affiliated with ReadySet?
        
             | educaysean wrote:
             | According to the linked article, Jon appears to be one of
             | the co-founders
        
             | Jonhoo wrote:
             | I'm pretty excited about it too! I remember when I
             | initially started the research I was amazed that this
             | didn't already exist.
             | 
             | Some context:
             | https://twitter.com/jonhoo/status/1511401461669720068
             | 
             | Basically, I co-founded the company around the time I
             | graduated, but had had my fill of database research after
             | six years of PhD. So I joined AWS to work on Rust while
             | Alana (the CEO) took on leading ReadySet.
        
       | msvan wrote:
       | I've used both of the suggested methods under "Current standards
       | for scaling out databases" so I see where this is coming from.
       | But I peeked at the AWS reference architecture, and it places a
       | Consul and ReadySet deployment in my environment for me to run
       | and maintain. I feel like any sales pitch for this really needs
       | to convince me that having these things in my environments is
       | going to be worth the hassle in terms of milliseconds and
       | dollars, as opposed to just using RDS read replicas and paying a
       | bit more. Then again, I can see this being an obvious choice if
       | you're growing very quickly or have tight latency requirements.
       | 
       | With that said, it looks like cool tech and I read Jon's Rust for
       | Rustaceans which serves as a stamp of quality for this even if I
       | haven't tried it yet!
        
       | zeroonetwothree wrote:
       | This feels like the sort of thing where it works great 90% of the
       | time, but as soon as you want to do something more
       | complicated/nontrivial it doesn't handle it properly and is
       | impossible to debug/improve since you are using an opaque
       | solution.
       | 
       | At least with scaling replicas or having a dumb cache layer it's
       | easy to understand the system.
        
       | skyde wrote:
       | Hi Jon Gjengset, I am so happy to see an attempt to make Noria
       | usable in legacy applications. Am I right in assuming this needs
       | to consume the replication log from the primary database? Or
       | will write requests going through the ReadySet proxy produce
       | their own change feed?
        
         | thecompilr wrote:
         | Hi, ReadySet engineer here. It does replicate using the
         | replication log.
        
       | zomglings wrote:
       | This seems like it could be a good fit for my team (and we have
       | been discussing this kind of caching).
       | 
       | We frequently (~ once per second) run queries over relations that
       | are increasing at a rate of ~100 rows per second (append only, no
       | updates).
       | 
       | Could this cause any performance concerns for ReadySet? How much
       | control do we have over the frequency of reconstruction of cached
       | data based on the flow graph?
        
       | vinay_ys wrote:
       | I can't think of real-world examples of apps that have a read-path
       | scaling problem that this tech would solve well. It would be
       | great if the authors could catalogue a bunch of real-world use
       | cases (real-world customer case studies would be even better).
       | 
       | Today, machines are super huge (in terms of compute cores, memory
       | and IOPS for storage and network) and a single MySQL or
       | PostgreSQL database can do a lot of work. This makes it much
       | easier to build apps that don't have Internet-scale user
       | counts - that is pretty much all enterprise apps - without
       | resorting to distributed databases.
       | 
       | In Internet scale consumer app domains like e-commerce/delivery
       | or fintech where relational databases are used heavily, most
       | queries would have strict correctness requirements and won't
       | tolerate staleness. Also, most query results would be highly
       | specific to each user and won't get many cache hits. Also, apps
       | in general are increasingly personalised and have fast changing
       | content.
       | 
       | In terms of technology evolution, I see people moving from single
       | large machine databases to distributed sql databases as their
       | use-cases scale.
       | 
       | And as distributed sql databases mature, I expect they will get
       | built-in capability to generate user-defined materialised views
       | with flexibility to manage their placement w.r.t class and number
       | of machines to compute and serve them etc.
        
         | staticassertion wrote:
         | My company is building a realtime analytics service for
         | security. You have a lot of reads that can often be answered
         | with stale data (thanks to our data modeling, which provides
         | ACID 2.0 semantics).
         | 
         | I think a lot of applications could probably benefit from this
         | _if_ they were built with a data model in mind that leverages
         | it properly. But if you 1:1 migrate your code that relies on
         | ACID transactions over to something with Strong Eventual
         | Consistency... yeah, that's gonna be a bad time.
        
       | js4ever wrote:
       | Is it open source?
        
       | staticassertion wrote:
       | Very cool tech here, excited for the future. Congrats on
       | fundraising!
        
       | sulam wrote:
       | I was prepared to be super skeptical here, but this actually
       | looks like it could be really good, without all the gotchas I was
       | expecting. I think the only potential hole I see is if you do a
       | lot of db-side code as stored procedures and what-not. I can't
       | tell from the write-up whether they can keep those consistent.
        
       ___________________________________________________________________
       (page generated 2022-04-05 23:00 UTC)