[HN Gopher] How PlanetScale Boost serves SQL queries faster
       ___________________________________________________________________
        
       How PlanetScale Boost serves SQL queries faster
        
       Author : mrbbk
       Score  : 113 points
       Date   : 2022-11-15 16:58 UTC (6 hours ago)
        
 (HTM) web link (planetscale.com)
 (TXT) w3m dump (planetscale.com)
        
       | bsnnkv wrote:
       | PlanetScale is such a cool name, fits really well for a database
       | company. Just goes to show that even these days when I think that
       | naming something new is impossible, there is still a lot of room
       | to be creative.
        
         | bearjaws wrote:
         | They probably paid a pretty penny for it.
         | 
         | My first startup job was a skunks work project and we had
         | around 128 noun-adjective pairs we wanted to find a .com domain
         | for. All of them were taken.
         | 
         | We had to settle on a .io domain, and this was 7 years ago.
         | 
         | 2 year in we came up with a better name and managed to get a
         | .com... with a dash in the URL.
        
           | emptysea wrote:
           | Startup I worked at paid like 100k + equity for a two word
           | .com domain name and it wasn't anywhere near as nice of name
           | as planet scale is
        
           | swozey wrote:
           | I worked at a 4 letter .com startup and we paid $800k for the
           | domain. We never made revenue anywhere near our domain and
           | when the company eventually shuttered the most revenue we'd
           | ever made was selling the domain again.
        
             | laristine wrote:
             | I deal in domains and stories like this still never cease
             | to amaze me. In any case, I'm glad your company was able to
             | sell the domain for a good sum back.
        
           | ushakov wrote:
           | there are still plenty of great .com available, you just
           | gotta be more creative
        
       | vyrotek wrote:
       | This reminds me a little of "materialized views". But essentially
       | every query is potentially a view you can materialize (cache).
       | And with this being managed at the DB level it knows when new
       | data invalidated the previous results.
       | 
       | Traditionally, other materialized view implementations have very
       | strict query requirements though. The queries had to be
       | deterministic. No left joins, dates, etc. This is required in
       | order to properly detect when data changes "impact" the view. I
       | wonder how they get around it.
       | 
       | Update: Ah, ok! Here's a write up on how it works a bit. My last
       | startup built a system like this specifically to power a
       | gamification engine. Would have been nice to have this 10 years
       | ago.
       | 
       | https://planetscale.com/blog/how-planetscale-boost-serves-yo...
       | 
       | > The Boost cluster lives alongside your database's shards and
       | continuously processes the events relayed by the VStream. The
       | result of processing these events is a partially materialized
       | view that can be accessed by the database's edge tier. This view
       | contains some, but not all, of the rows that could be queried.
        
         | dang wrote:
         | (We've merged the threads so that writeup is now the URL at the
         | top)
        
       | obviyus wrote:
       | Has anyone who has used PlanetScale in production comment about
       | their experience? I was evaluating a few options a couple of
       | weeks ago but ended up going with just RDS due to lack of
       | feedback for PlanetScale here on HN.
        
         | gtCameron wrote:
         | We have been running PlanetScale as our production database for
         | about 6 months, migrated from Aurora Serverless. I love it,
         | their query insights tool has been a game changer for us and
         | has allowed us to optimize a ton of queries in our application.
         | Their support is always available and highly technical.
         | 
         | For a sense of scale, we have ~150gb of data running around 5
         | trillion row reads + 500 million row writes per month
        
           | revicon wrote:
           | We're you using the Aurora Serverless data APIs? Curious if
           | there is something equivalent on PlanetScale.
        
             | samlambert wrote:
             | https://github.com/planetscale/database-js
        
             | gtCameron wrote:
             | I was not, we are a Laravel PHP backend, using the standard
             | PHP stuff for connection management
        
               | wolfhumble wrote:
               | From what I understand your webserver and php
               | implementation is stored on different servers from
               | PlanetScale's DBs(?)
               | 
               | Just wonder: How are the DB queries from your php
               | implementation to the Planetscale DBs affected by network
               | latency (hops and length between servers) as well as
               | bandwidth (query results returned by PlanetScale DBs)?
               | 
               | Thanks! :-)
        
             | [deleted]
        
         | mythrwy wrote:
         | We looked at it, but it was a little "different" and we didn't
         | want the learning curve, so we went with ScaleGrid instead.
         | 
         | This caching does look cool, perhaps I'll revisit PlanetScale
         | later on my own time.
        
         | joshstrange wrote:
         | I left Aurora serverless (v1, v2 pricing was insane) for PS and
         | I've been extremely happy. It was way cheaper and less of a
         | headache than RDS and my actual usage was way less than I
         | anticipated (since it's hard to think in terms of row
         | reads/writes when working at a high level). With PS I get a
         | dev/qa/staging/prod DB for $30/mo vs spinning up multiple RDS
         | instances. Even with Aurora Serverless (v1) where you can spin
         | down to 0 it was cheaper to go with PS. 1 DB unit on Aurora
         | Serverless (v1) cost like $45/mo (for my 1 prod instance) so
         | for $15 less I got all my other environments without having to
         | wait for them to spin up after they went to sleep.
         | 
         | My usage is way under some of my sibling comments but it's a
         | been a joy to use and $360/yr to not have to worry about
         | scaling my DB, backups, schema migrations, and now caching is a
         | steal for me. Could I run my DB on a $5/mo DO box (or similar)?
         | Probably, though I'd probably want at least the $10/$15 size
         | box for when my software gets a little more load. Even if I
         | knew for sure I could run on the $5 box I'd still rather pay
         | $30/mo to never worry about my DB and the schema migration tool
         | is awesome.
        
       | xmorse wrote:
       | Query memoization with optimistic updates
        
       | dianfishekqi wrote:
       | It looks like it uses the same ideas as Noria
       | 
       | https://www.youtube.com/watch?v=s19G6n0UjsM
       | 
       | https://github.com/mit-pdos/noria
        
         | wwarner wrote:
         | Yes and good for Planetscale to build it out!
        
         | dang wrote:
         | Discussed in a few small past threads:
         | 
         |  _Noria: Dynamic, partially-stateful data-flow for high-perf
         | web applications_ -
         | https://news.ycombinator.com/item?id=29615085 - Dec 2021 (10
         | comments)
         | 
         |  _Noria: dynamic, partially-stateful data-flow for high-
         | performance web apps_ -
         | https://news.ycombinator.com/item?id=18330477 - Oct 2018 (1
         | comment)
         | 
         |  _Noria: dynamic, partially-stateful data-flow for high-
         | performance web apps_ -
         | https://news.ycombinator.com/item?id=18176135 - Oct 2018 (1
         | comment)
        
       | stalluri wrote:
       | Vstream looks super cool. Can we also use it create subscriptions
       | that can bind with ReactHooks on the front-end ? I think
       | PlanetScale can easily deliver amazing or better than firebase
       | subscriptions. All we need is React and NextJs SDKs to get
       | started with :-)
        
         | httgp wrote:
         | Supabase does real-time subscriptions really well!
         | 
         | And it does have great guides for use with React and Next.js
        
       | p10jkle wrote:
       | See also https://readyset.io/ for generic SQL support (not just
       | Planetscale)
        
       | hotdamnson wrote:
       | Why do these new big thing databases make SQL look like some
       | witchcraft?
       | 
       | Here is some proper SQL query:
       | 
       | SELECT DISTINCT                      r.id,
       | r.owner_id,                   r.name,
       | COUNT(r.id) OVER (PARTITION BY r.id) AS COUNT              FROM
       | repository r              JOIN star s ON s.repository_id = r.id
       | 
       | ORDER BY 4 DESC;
        
         | jakewins wrote:
         | This is not what the query in the post is doing.
         | 
         | You are counting all stars of all repos, they are counting
         | stars of one (parameterized) repo id.
        
           | hotdamnson wrote:
           | I just posted the essence of the query, add
           | 
           | Where r.id = :repo
           | 
           | and you will have the same thing.
        
       | Jonhoo wrote:
       | :wave: Author of the paper this work is based on here.
       | 
       | I'm so excited to see dynamic, partially-stateful data-flow for
       | incremental materialized view maintenance becoming more wide-
       | spread! I continue to think it's a _great_ idea, and the speed-
       | ups (and complexity reduction) it can yield are pretty immense,
       | so seeing more folks building on the idea makes me very happy.
       | 
       | The PlanetScale blog post references my original "Noria" OSDI
       | paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but
       | I'd actually recommend my PhD thesis instead
       | (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it
       | goes much deeper about some of the technical challenges and
       | solutions involved. It also has a chapter (Appendix A) that
       | covers how it all works by analogy, which the less-technical
       | among the audience may appreciate :) A recording of my thesis
       | defense on this, which may be more digestible than the thesis
       | itself, is also online at
       | https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter
       | talk from a few years earlier at
       | https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria
       | research prototype (written in Rust) is on GitHub:
       | https://github.com/mit-pdos/noria.
       | 
       | As others have already mentioned in the comments, I co-founded
       | ReadySet (https://readyset.io/) shortly after graduating
       | specifically to build off of Noria, and they're doing amazing
       | work to provide these kinds of speed-ups for general-purpose
       | relational databases. If you're using one of those, it's worth
       | giving ReadySet a look to get these kinds of speedups there! It's
       | also source-available @ https://github.com/readysettech/readyset
       | if you're curious.
        
         | exabrial wrote:
         | for readyset: Is there a deb package available or something
         | lighter weight than docker, kubernets, etc? I'd just like to
         | run it as a regular unix process and start/stop it with
         | systemd.
        
           | greg-m wrote:
           | yes! shoot me an email - greg@readyset.io - we're in the
           | process of building binaries for more platforms, lmk which
           | you need.
        
             | exabrial wrote:
             | I mean just the standard x86/ubuntu 22.04 would be nice.
             | It'd reduce a lot of friction to people try to evaluate
             | your product!
        
         | brancz wrote:
         | I don't really know either very well, but how does Noria
         | compare to Naiad? Are they comparable at all?
         | 
         | I already had Naiad on my reading list, definitely adding Noria
         | as well! Thank you very much for your work!
        
       | Nican wrote:
       | Awesome! I have seen PlanetScale hype up this release for weeks,
       | and glad to finally be reading about it.
       | 
       | My initial thoughts after reading the blog post, just to poke
       | holes in their new product:
       | 
       | 1. Costs. This can save time on read, but it is also introducing
       | additional writes to the database, that can be pretty expensive.
       | PlanetScale can scale horizontally, but have to watch out how
       | much it is going to be paying for the extra machines. (Albeit-
       | machines are usually always cheaper than developers)
       | 
       | 2. Consistency. It was not clear if it is going to make
       | committing transactions slower to keep all the views up to date,
       | or if the materialized view is running slightly behind real-time.
       | 
       | 2a. And how does the materialized view handle large/slow
       | transactions? Is there going to be any kind of serialization
       | locks? Are the views correct inside of the transaction?
       | 
       | 3. Predictability. Query planning is a necessary hell, and
       | different queries might have different patterns that might
       | introduce slightly different materialized views, that could have
       | been maybe served under the same view. Increasing the cost.
       | 
       | 3a. SQL Server took a slightly different route lately for
       | performance, in which queries will have different plans depended
       | on the table statistics. I wonder how such a feature would play
       | with Boost, and if slightly different query plans might generate
       | different materialized views.
        
         | mwarkentin wrote:
         | The docs indicated the cache may be behind by a few hundred MS:
         | https://planetscale.com/docs/concepts/query-caching-with-pla...
         | 
         | > There is a small delay between when these changes are
         | committed to the database and when the cache has been updated.
         | This delay is typically measured in hundreds of milliseconds.
         | Those of you familiar with MySQL replication can think of it as
         | reading from a replica. Typically we've found that most use
         | cases work perfectly fine, even when returning results that may
         | be slightly out-of-date.
        
         | tanoku wrote:
         | Hey Nican! Thanks for the feedback. It wasn't clear from the
         | blog post, but as the sibling poster points out, the system has
         | full eventual consistency: it behaves like a replica, but it
         | replicates a whole cluster of MySQL instances simultaneously
         | (i.e. your full PlanetScale database). Because of this design,
         | we never lock or affect the performance of writes to the main
         | database.
         | 
         | As for predictability, we're working on some interesting
         | optimizations that allow similar queries to reuse the internal
         | state of each other, so the system becomes more efficient the
         | more queries it's caching. Stay tuned!
        
       | joshstrange wrote:
       | > As rows are inserted, updated, and deleted in the database, the
       | cache is kept up-to-date in real-time, just like a read replica.
       | No TTLs, no invalidation logic, and no caching infrastructure to
       | maintain.
       | 
       | This is so freaking neat. Caching is one of the harder things to
       | get consistently right and even if this was a tool that had
       | TTLs+API to invalidate it would be cool but not even having to
       | worry about that is even better.
       | 
       | PlanetScale continues to be an awesome service that lets you not
       | worry about your DB and instead focus on your application.
       | 
       | My only wish for PlanetScale would be a few more (lower) tiers.
       | Their free tier is very generous but has a few little things
       | (like more than 1 dev/prod branch) that aren't supported and I
       | always feel antsy about not having a prod-like DB for qa/staging.
       | I normally use 3 branches and the free plan only supports 2,
       | which I think changed, I thought I used more than 1 dev branch
       | before I started paying.
       | 
       | I have a very burst-y application (it's for events, so it ramps
       | up a few months before the event, then is crazy for 2-7 days
       | during, then usage drops to pretty much 0 for the next ~9
       | months), I'd love to lower my costs for those 9 months (I could
       | look into downgrading to the free plan but I'd rather pay just a
       | little less and have my quotas drop accordingly). In the end
       | PlanetScale is still worth it for me at $360/yr so I'm not
       | complaining too much. For smaller projects I just worry about
       | using the PS free tier since if I go over those limits the jump
       | is steep ($0->$30/mo), that said I might be overthinking it.
        
         | datalopers wrote:
         | they don't say but I assume this is an implementation of
         | differential dataflow (edit: changed to a better link) [1]
         | 
         | [1] https://www.microsoft.com/en-us/research/wp-
         | content/uploads/...
        
           | killjoywashere wrote:
           | link is unresponsive. HN death hug?
        
             | rbranson wrote:
             | alternate: https://github.com/TimelyDataflow/differential-
             | dataflow/blob...
        
             | killjoywashere wrote:
             | Also available here: https://www.microsoft.com/en-
             | us/research/publication/differe...
        
           | ignoramous wrote:
           | See also https://readyset.io/ and https://materialize.com/
           | 
           | There's also the exotic https://dynimize.com/ (unsure of
           | their current state).
        
           | rorymalcolm wrote:
           | They acknowledge it is based off Noria[1] in the longer
           | blogpost
           | 
           | [1] https://pdos.csail.mit.edu/papers/noria:osdi18.pdf
        
             | juancampa wrote:
             | It's awesome to see Jon Gjengset's work being used in
             | production. He has one of the best Rust youtube channels
             | IMO and goes deep into Rust nuances. Highly recommended for
             | anyone trying to learn Rust or deepen their understanding.
        
             | dang wrote:
             | The longer blogpost is now the URL at the top (we merged
             | the threads)
        
         | hinkley wrote:
         | If there is one thing I'd like to have in this decade (well,
         | last decade really but that ship has sailed) it would be a
         | standardized format for WAL. There's been a whole bunch of tech
         | over the last ten years that was in large part plastering over
         | this hole. Having a reporting database that directly consumed
         | WAL data from an OLTP database for instance.
        
       | kevinburke wrote:
       | Seems neat, but why is this better than Hadoop?
        
         | LewisJEllis wrote:
         | Hadoop isn't a database, they don't do anything close to the
         | same thing. Nobody is cross-shopping PlanetScale vs Hadoop.
         | 
         | The cross-shop is PlanetScale vs Amazon RDS, Amazon Aurora,
         | Google Cloud SQL, Firebase, Supabase, self-hosting Vitess or
         | MySQL, etc.
        
         | gigatexal wrote:
         | Because Hadoop is super duper slow? Isn't that why the industry
         | moved away from it years ago?
        
       | marzoevam wrote:
       | It's super exciting to see Noria-based partially materialized
       | views get this well-deserved airtime! Eliminating error-prone
       | caching logic without any code or infrastructure changes in the
       | context of _any_ database is our core mission over at ReadySet,
       | and is the reason why Jon Gjengset and I spun the company out of
       | MIT research on Noria back in 2020. You can read more in our
       | initial announcement here: https://readyset.io/blog/introducing-
       | readyset
       | 
       | If you're reading this announcement post and want to play around
       | with instant query caching ala Noria in your existing Postgres or
       | MySQL database, shoot me a me an email and we'll bump you up on
       | our cloud waitlist :) alana@readyset.io
        
         | saybar wrote:
         | At PolyScale [1], we agree that eliminating error-prone caching
         | logic without any code or infrastructure changes is a worthy
         | goal. However, we have taken a different approach to caching,
         | zero configuration, fully automated.
         | 
         | You can get connected today in a few minutes, without code or
         | configuration. PolyScale supports Postgres, MySQL, MariaDB and
         | SQL Server, with GraphQL with others coming soon. You can also
         | try the live demo [2].
         | 
         | [1] https://www.polyscale.ai/ [2]
         | https://playground.polyscale.ai/
        
           | vyrotek wrote:
           | This looks really great. Happy to see SQL Server support
           | there.
        
       | emptysea wrote:
       | I'm really curious how this works and how it's implementation
       | compares to something like materialize -- I wonder if there are
       | any caveats around consistency
        
         | giovannibonetti wrote:
         | It seems similar to MIT's Noria [1]
         | 
         | > Noria is a new streaming data-flow system designed to act as
         | a fast storage backend for read-heavy web applications based on
         | Jon Gjengset's Phd Thesis, as well as this paper from OSDI'18.
         | It acts like a database, but precomputes and caches relational
         | query results so that reads are blazingly fast. Noria
         | automatically keeps cached results up-to-date as the underlying
         | data, stored in persistent base tables, change. Noria uses
         | partially-stateful data-flow to reduce memory overhead, and
         | supports dynamic, runtime data-flow and query change.
         | 
         | [1] https://github.com/mit-pdos/noria
        
         | nickvanw wrote:
         | Great question! We have a technical blog post about how
         | PlanetScale Boost is implemented:
         | https://planetscale.com/blog/how-planetscale-boost-serves-yo...
         | 
         | In short, it can be compared in consistency to an up-to-date
         | read replica; PlanetScale Boost uses Vitess' VStream to process
         | events as they happen and keep itself up to date. The blog has
         | much more information if you're curious.
        
           | [deleted]
        
           | dang wrote:
           | (We've merged the threads so that blog post is now the URL at
           | the top)
        
             | nickvanw wrote:
             | Thank you as always for everything that you do @dang!
        
       | kerblang wrote:
       | It appears the catch is that you have to use their managed
       | service; no DIY installation.
       | https://planetscale.com/docs/concepts/deployment-options
       | 
       | Acceptable for some, maybe not others
        
       | Eclyps wrote:
       | I just started using Planetscale for small projects here and
       | there. More and more of my projects are FE-heavy and don't
       | require a big dedicated database (NextJS apps, mostly hardcoded
       | designs or headless CMS like Sanity). There are times where I
       | need to store just small bits of data, maybe contact form
       | submissions or something. It's been super great to be able to
       | quickly hook up planetscale to a nextjs api function and have
       | that data persisted within a matter of minutes.
       | 
       | I've yet to use it on anything large-scale, though, so I can't
       | speak to performance when you're really pushing it.
        
       | aantix wrote:
       | Didn't MySQL implement query level caching a while back?
        
         | ryanisnan wrote:
         | I assume this is a bit of a joke, but query caching at least
         | was not good in 5.5-5.7, so it would often be disabled. I don't
         | know how 8 performs.
        
         | rbranson wrote:
         | It was removed from MySQL in 8.0 because it wasn't very useful.
         | MySQL query caching does exact matching on the query string and
         | any row update to a table used for a given cached query nukes
         | the entire cache. So it's only useful for a small set of niche
         | use cases where tables are essentially static.
        
       | CharlesW wrote:
       | Dupe: https://news.ycombinator.com/item?id=33610996
        
       | endisneigh wrote:
       | Anyone compare this and cockroachdb?
        
       | capableweb wrote:
       | Slightly off-topic but trying to understand something from the
       | landing page:
       | 
       | > Powered by open source tech - Built at Google to scale
       | YouTube.com to billions of users
       | 
       | Is this a Google project/business owned by Alphabet? The text
       | seems to indicate so, but I find no information about it when
       | doing some quick searching or browsing through the website.
        
         | aarondf wrote:
         | Nope! That part you quoted is referring to Vitess, which was
         | built at Google to scale YouTube.
         | 
         | See more: https://planetscale.com/vitess
        
           | capableweb wrote:
           | Aha, I see. Thanks for explaining!
           | 
           | So I'm guessing PlanetScale now helps maintain Vitess and
           | PlanetScale is somewhat of a hosted Vitess for people who
           | don't want to self-host?
        
             | aarondf wrote:
             | Yup! Vitess is at the core of PlanetScale and enables us to
             | add lots of cool stuff on top (branching, Boost, etc) but
             | Vitess itself is still open source!
        
       | _ben_ wrote:
       | For database caching outside of PlanetScale, PolyScale.ai [1]
       | provides a serverless database edge cache that is compatible with
       | Postgres, MySQL, MariaDB and MS SQL Server. Requires zero
       | configuration or sizing etc.
       | 
       | 1.https://www.polyscale.ai/
        
         | rbranson wrote:
         | I tried to use PolyScale in the past but had issues with
         | performance because updating a row would invalidate the entire
         | cache. I wonder if that has improved?
        
           | _ben_ wrote:
           | Yes, in the early versions of the automated invalidation, the
           | logic cleared all cached data based on tables. That is no
           | longer the case. The invalidations only remove the affected
           | data from the cache, globally. You can read more here:
           | https://docs.polyscale.ai/how-does-it-work#smart-
           | invalidatio...
        
             | rbranson wrote:
             | It didn't impact everything, I think I was hitting this
             | case:
             | 
             | > When a query is deemed too complex to determine what
             | specific cached data may have become invalidated, a
             | fallback to a simple but effective table level invalidation
             | occurs.
        
               | saybar wrote:
               | We've made a lot of changes - give it a try again or feel
               | free to reach out to support@polyscale.ai and we'd be
               | happy to assist you.
        
       | edmundsauto wrote:
       | I just started a small hobby project and selected supabase for my
       | db provider. Anyone with experience in both Supa and PlanetScale
       | care to comment about the differences?
       | 
       | To me, it looks like supabase is designed to take full advantage
       | of postgres features. plpgsql triggers + RLS + clientside auth +
       | streaming changes to subscribers (including via web hooks) are my
       | favorite features. (They also have js edge functions, but I use
       | lambda instead b/c I prefer python)
       | 
       | Supabase feels like the scrappy company with amazing focus, akin
       | to an early MailChimp (circa 2007). PlanetBase feels more like
       | early Snowflake - massive scale, focus on performance, can match
       | anything feature-by-feature. One is a master of their craft, the
       | other is a gorilla at scale.
       | 
       | Curious what others think. I haven't used PlanetBase extensively
       | so don't have much to go on except their marketing.
        
         | Jarwain wrote:
         | That sounds about right from my understanding. Supabase was
         | made as an alternative to firebase, acting as a data layer with
         | a lot of features simplifying application development.
         | 
         | Planet Base feels like Snowflake, or some aspects of fly.io, or
         | timescale's managed cloud offering; their focus is on the core
         | database tech and delivering that in a scalable manner.
        
         | greg-m wrote:
         | ReadySet (readyset.io) supports the same style of caching and
         | works with Supabase, if you want to check us out :)
         | 
         | I have a few extra cloud invites: greg@readyset.io
        
           | edmundsauto wrote:
           | Much appreciated, I will check you out for my next project.
           | Right now I'm not able to migrate as I'm trying to get an MVP
           | up and running and have spent a few days deeply integrating
           | w/ Supabase.
           | 
           | What are your core value prop differences between your
           | service and sb? Just curious how I should think about your
           | offering compared to what I'm familiar with.
        
             | greg-m wrote:
             | We work with supabase - so your DB still deploys out to
             | them, but we cache queries in memory so they're
             | significantly lower latency. Since we're offloading reads,
             | we also help handling traffic spikes, lower costs, etc.
        
       ___________________________________________________________________
       (page generated 2022-11-15 23:00 UTC)