[HN Gopher] How PlanetScale Boost serves SQL queries faster ___________________________________________________________________ How PlanetScale Boost serves SQL queries faster Author : mrbbk Score : 113 points Date : 2022-11-15 16:58 UTC (6 hours ago) (HTM) web link (planetscale.com) (TXT) w3m dump (planetscale.com) | bsnnkv wrote: | PlanetScale is such a cool name, fits really well for a database | company. Just goes to show that even these days when I think that | naming something new is impossible, there is still a lot of room | to be creative. | bearjaws wrote: | They probably paid a pretty penny for it. | | My first startup job was a skunks work project and we had | around 128 noun-adjective pairs we wanted to find a .com domain | for. All of them were taken. | | We had to settle on a .io domain, and this was 7 years ago. | | 2 year in we came up with a better name and managed to get a | .com... with a dash in the URL. | emptysea wrote: | Startup I worked at paid like 100k + equity for a two word | .com domain name and it wasn't anywhere near as nice of name | as planet scale is | swozey wrote: | I worked at a 4 letter .com startup and we paid $800k for the | domain. We never made revenue anywhere near our domain and | when the company eventually shuttered the most revenue we'd | ever made was selling the domain again. | laristine wrote: | I deal in domains and stories like this still never cease | to amaze me. In any case, I'm glad your company was able to | sell the domain for a good sum back. | ushakov wrote: | there are still plenty of great .com available, you just | gotta be more creative | vyrotek wrote: | This reminds me a little of "materialized views". But essentially | every query is potentially a view you can materialize (cache). | And with this being managed at the DB level it knows when new | data invalidated the previous results. | | Traditionally, other materialized view implementations have very | strict query requirements though. The queries had to be | deterministic. No left joins, dates, etc. This is required in | order to properly detect when data changes "impact" the view. I | wonder how they get around it. | | Update: Ah, ok! Here's a write up on how it works a bit. My last | startup built a system like this specifically to power a | gamification engine. Would have been nice to have this 10 years | ago. | | https://planetscale.com/blog/how-planetscale-boost-serves-yo... | | > The Boost cluster lives alongside your database's shards and | continuously processes the events relayed by the VStream. The | result of processing these events is a partially materialized | view that can be accessed by the database's edge tier. This view | contains some, but not all, of the rows that could be queried. | dang wrote: | (We've merged the threads so that writeup is now the URL at the | top) | obviyus wrote: | Has anyone who has used PlanetScale in production comment about | their experience? I was evaluating a few options a couple of | weeks ago but ended up going with just RDS due to lack of | feedback for PlanetScale here on HN. | gtCameron wrote: | We have been running PlanetScale as our production database for | about 6 months, migrated from Aurora Serverless. I love it, | their query insights tool has been a game changer for us and | has allowed us to optimize a ton of queries in our application. | Their support is always available and highly technical. | | For a sense of scale, we have ~150gb of data running around 5 | trillion row reads + 500 million row writes per month | revicon wrote: | We're you using the Aurora Serverless data APIs? Curious if | there is something equivalent on PlanetScale. | samlambert wrote: | https://github.com/planetscale/database-js | gtCameron wrote: | I was not, we are a Laravel PHP backend, using the standard | PHP stuff for connection management | wolfhumble wrote: | From what I understand your webserver and php | implementation is stored on different servers from | PlanetScale's DBs(?) | | Just wonder: How are the DB queries from your php | implementation to the Planetscale DBs affected by network | latency (hops and length between servers) as well as | bandwidth (query results returned by PlanetScale DBs)? | | Thanks! :-) | [deleted] | mythrwy wrote: | We looked at it, but it was a little "different" and we didn't | want the learning curve, so we went with ScaleGrid instead. | | This caching does look cool, perhaps I'll revisit PlanetScale | later on my own time. | joshstrange wrote: | I left Aurora serverless (v1, v2 pricing was insane) for PS and | I've been extremely happy. It was way cheaper and less of a | headache than RDS and my actual usage was way less than I | anticipated (since it's hard to think in terms of row | reads/writes when working at a high level). With PS I get a | dev/qa/staging/prod DB for $30/mo vs spinning up multiple RDS | instances. Even with Aurora Serverless (v1) where you can spin | down to 0 it was cheaper to go with PS. 1 DB unit on Aurora | Serverless (v1) cost like $45/mo (for my 1 prod instance) so | for $15 less I got all my other environments without having to | wait for them to spin up after they went to sleep. | | My usage is way under some of my sibling comments but it's a | been a joy to use and $360/yr to not have to worry about | scaling my DB, backups, schema migrations, and now caching is a | steal for me. Could I run my DB on a $5/mo DO box (or similar)? | Probably, though I'd probably want at least the $10/$15 size | box for when my software gets a little more load. Even if I | knew for sure I could run on the $5 box I'd still rather pay | $30/mo to never worry about my DB and the schema migration tool | is awesome. | xmorse wrote: | Query memoization with optimistic updates | dianfishekqi wrote: | It looks like it uses the same ideas as Noria | | https://www.youtube.com/watch?v=s19G6n0UjsM | | https://github.com/mit-pdos/noria | wwarner wrote: | Yes and good for Planetscale to build it out! | dang wrote: | Discussed in a few small past threads: | | _Noria: Dynamic, partially-stateful data-flow for high-perf | web applications_ - | https://news.ycombinator.com/item?id=29615085 - Dec 2021 (10 | comments) | | _Noria: dynamic, partially-stateful data-flow for high- | performance web apps_ - | https://news.ycombinator.com/item?id=18330477 - Oct 2018 (1 | comment) | | _Noria: dynamic, partially-stateful data-flow for high- | performance web apps_ - | https://news.ycombinator.com/item?id=18176135 - Oct 2018 (1 | comment) | stalluri wrote: | Vstream looks super cool. Can we also use it create subscriptions | that can bind with ReactHooks on the front-end ? I think | PlanetScale can easily deliver amazing or better than firebase | subscriptions. All we need is React and NextJs SDKs to get | started with :-) | httgp wrote: | Supabase does real-time subscriptions really well! | | And it does have great guides for use with React and Next.js | p10jkle wrote: | See also https://readyset.io/ for generic SQL support (not just | Planetscale) | hotdamnson wrote: | Why do these new big thing databases make SQL look like some | witchcraft? | | Here is some proper SQL query: | | SELECT DISTINCT r.id, | r.owner_id, r.name, | COUNT(r.id) OVER (PARTITION BY r.id) AS COUNT FROM | repository r JOIN star s ON s.repository_id = r.id | | ORDER BY 4 DESC; | jakewins wrote: | This is not what the query in the post is doing. | | You are counting all stars of all repos, they are counting | stars of one (parameterized) repo id. | hotdamnson wrote: | I just posted the essence of the query, add | | Where r.id = :repo | | and you will have the same thing. | Jonhoo wrote: | :wave: Author of the paper this work is based on here. | | I'm so excited to see dynamic, partially-stateful data-flow for | incremental materialized view maintenance becoming more wide- | spread! I continue to think it's a _great_ idea, and the speed- | ups (and complexity reduction) it can yield are pretty immense, | so seeing more folks building on the idea makes me very happy. | | The PlanetScale blog post references my original "Noria" OSDI | paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but | I'd actually recommend my PhD thesis instead | (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it | goes much deeper about some of the technical challenges and | solutions involved. It also has a chapter (Appendix A) that | covers how it all works by analogy, which the less-technical | among the audience may appreciate :) A recording of my thesis | defense on this, which may be more digestible than the thesis | itself, is also online at | https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter | talk from a few years earlier at | https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria | research prototype (written in Rust) is on GitHub: | https://github.com/mit-pdos/noria. | | As others have already mentioned in the comments, I co-founded | ReadySet (https://readyset.io/) shortly after graduating | specifically to build off of Noria, and they're doing amazing | work to provide these kinds of speed-ups for general-purpose | relational databases. If you're using one of those, it's worth | giving ReadySet a look to get these kinds of speedups there! It's | also source-available @ https://github.com/readysettech/readyset | if you're curious. | exabrial wrote: | for readyset: Is there a deb package available or something | lighter weight than docker, kubernets, etc? I'd just like to | run it as a regular unix process and start/stop it with | systemd. | greg-m wrote: | yes! shoot me an email - greg@readyset.io - we're in the | process of building binaries for more platforms, lmk which | you need. | exabrial wrote: | I mean just the standard x86/ubuntu 22.04 would be nice. | It'd reduce a lot of friction to people try to evaluate | your product! | brancz wrote: | I don't really know either very well, but how does Noria | compare to Naiad? Are they comparable at all? | | I already had Naiad on my reading list, definitely adding Noria | as well! Thank you very much for your work! | Nican wrote: | Awesome! I have seen PlanetScale hype up this release for weeks, | and glad to finally be reading about it. | | My initial thoughts after reading the blog post, just to poke | holes in their new product: | | 1. Costs. This can save time on read, but it is also introducing | additional writes to the database, that can be pretty expensive. | PlanetScale can scale horizontally, but have to watch out how | much it is going to be paying for the extra machines. (Albeit- | machines are usually always cheaper than developers) | | 2. Consistency. It was not clear if it is going to make | committing transactions slower to keep all the views up to date, | or if the materialized view is running slightly behind real-time. | | 2a. And how does the materialized view handle large/slow | transactions? Is there going to be any kind of serialization | locks? Are the views correct inside of the transaction? | | 3. Predictability. Query planning is a necessary hell, and | different queries might have different patterns that might | introduce slightly different materialized views, that could have | been maybe served under the same view. Increasing the cost. | | 3a. SQL Server took a slightly different route lately for | performance, in which queries will have different plans depended | on the table statistics. I wonder how such a feature would play | with Boost, and if slightly different query plans might generate | different materialized views. | mwarkentin wrote: | The docs indicated the cache may be behind by a few hundred MS: | https://planetscale.com/docs/concepts/query-caching-with-pla... | | > There is a small delay between when these changes are | committed to the database and when the cache has been updated. | This delay is typically measured in hundreds of milliseconds. | Those of you familiar with MySQL replication can think of it as | reading from a replica. Typically we've found that most use | cases work perfectly fine, even when returning results that may | be slightly out-of-date. | tanoku wrote: | Hey Nican! Thanks for the feedback. It wasn't clear from the | blog post, but as the sibling poster points out, the system has | full eventual consistency: it behaves like a replica, but it | replicates a whole cluster of MySQL instances simultaneously | (i.e. your full PlanetScale database). Because of this design, | we never lock or affect the performance of writes to the main | database. | | As for predictability, we're working on some interesting | optimizations that allow similar queries to reuse the internal | state of each other, so the system becomes more efficient the | more queries it's caching. Stay tuned! | joshstrange wrote: | > As rows are inserted, updated, and deleted in the database, the | cache is kept up-to-date in real-time, just like a read replica. | No TTLs, no invalidation logic, and no caching infrastructure to | maintain. | | This is so freaking neat. Caching is one of the harder things to | get consistently right and even if this was a tool that had | TTLs+API to invalidate it would be cool but not even having to | worry about that is even better. | | PlanetScale continues to be an awesome service that lets you not | worry about your DB and instead focus on your application. | | My only wish for PlanetScale would be a few more (lower) tiers. | Their free tier is very generous but has a few little things | (like more than 1 dev/prod branch) that aren't supported and I | always feel antsy about not having a prod-like DB for qa/staging. | I normally use 3 branches and the free plan only supports 2, | which I think changed, I thought I used more than 1 dev branch | before I started paying. | | I have a very burst-y application (it's for events, so it ramps | up a few months before the event, then is crazy for 2-7 days | during, then usage drops to pretty much 0 for the next ~9 | months), I'd love to lower my costs for those 9 months (I could | look into downgrading to the free plan but I'd rather pay just a | little less and have my quotas drop accordingly). In the end | PlanetScale is still worth it for me at $360/yr so I'm not | complaining too much. For smaller projects I just worry about | using the PS free tier since if I go over those limits the jump | is steep ($0->$30/mo), that said I might be overthinking it. | datalopers wrote: | they don't say but I assume this is an implementation of | differential dataflow (edit: changed to a better link) [1] | | [1] https://www.microsoft.com/en-us/research/wp- | content/uploads/... | killjoywashere wrote: | link is unresponsive. HN death hug? | rbranson wrote: | alternate: https://github.com/TimelyDataflow/differential- | dataflow/blob... | killjoywashere wrote: | Also available here: https://www.microsoft.com/en- | us/research/publication/differe... | ignoramous wrote: | See also https://readyset.io/ and https://materialize.com/ | | There's also the exotic https://dynimize.com/ (unsure of | their current state). | rorymalcolm wrote: | They acknowledge it is based off Noria[1] in the longer | blogpost | | [1] https://pdos.csail.mit.edu/papers/noria:osdi18.pdf | juancampa wrote: | It's awesome to see Jon Gjengset's work being used in | production. He has one of the best Rust youtube channels | IMO and goes deep into Rust nuances. Highly recommended for | anyone trying to learn Rust or deepen their understanding. | dang wrote: | The longer blogpost is now the URL at the top (we merged | the threads) | hinkley wrote: | If there is one thing I'd like to have in this decade (well, | last decade really but that ship has sailed) it would be a | standardized format for WAL. There's been a whole bunch of tech | over the last ten years that was in large part plastering over | this hole. Having a reporting database that directly consumed | WAL data from an OLTP database for instance. | kevinburke wrote: | Seems neat, but why is this better than Hadoop? | LewisJEllis wrote: | Hadoop isn't a database, they don't do anything close to the | same thing. Nobody is cross-shopping PlanetScale vs Hadoop. | | The cross-shop is PlanetScale vs Amazon RDS, Amazon Aurora, | Google Cloud SQL, Firebase, Supabase, self-hosting Vitess or | MySQL, etc. | gigatexal wrote: | Because Hadoop is super duper slow? Isn't that why the industry | moved away from it years ago? | marzoevam wrote: | It's super exciting to see Noria-based partially materialized | views get this well-deserved airtime! Eliminating error-prone | caching logic without any code or infrastructure changes in the | context of _any_ database is our core mission over at ReadySet, | and is the reason why Jon Gjengset and I spun the company out of | MIT research on Noria back in 2020. You can read more in our | initial announcement here: https://readyset.io/blog/introducing- | readyset | | If you're reading this announcement post and want to play around | with instant query caching ala Noria in your existing Postgres or | MySQL database, shoot me a me an email and we'll bump you up on | our cloud waitlist :) alana@readyset.io | saybar wrote: | At PolyScale [1], we agree that eliminating error-prone caching | logic without any code or infrastructure changes is a worthy | goal. However, we have taken a different approach to caching, | zero configuration, fully automated. | | You can get connected today in a few minutes, without code or | configuration. PolyScale supports Postgres, MySQL, MariaDB and | SQL Server, with GraphQL with others coming soon. You can also | try the live demo [2]. | | [1] https://www.polyscale.ai/ [2] | https://playground.polyscale.ai/ | vyrotek wrote: | This looks really great. Happy to see SQL Server support | there. | emptysea wrote: | I'm really curious how this works and how it's implementation | compares to something like materialize -- I wonder if there are | any caveats around consistency | giovannibonetti wrote: | It seems similar to MIT's Noria [1] | | > Noria is a new streaming data-flow system designed to act as | a fast storage backend for read-heavy web applications based on | Jon Gjengset's Phd Thesis, as well as this paper from OSDI'18. | It acts like a database, but precomputes and caches relational | query results so that reads are blazingly fast. Noria | automatically keeps cached results up-to-date as the underlying | data, stored in persistent base tables, change. Noria uses | partially-stateful data-flow to reduce memory overhead, and | supports dynamic, runtime data-flow and query change. | | [1] https://github.com/mit-pdos/noria | nickvanw wrote: | Great question! We have a technical blog post about how | PlanetScale Boost is implemented: | https://planetscale.com/blog/how-planetscale-boost-serves-yo... | | In short, it can be compared in consistency to an up-to-date | read replica; PlanetScale Boost uses Vitess' VStream to process | events as they happen and keep itself up to date. The blog has | much more information if you're curious. | [deleted] | dang wrote: | (We've merged the threads so that blog post is now the URL at | the top) | nickvanw wrote: | Thank you as always for everything that you do @dang! | kerblang wrote: | It appears the catch is that you have to use their managed | service; no DIY installation. | https://planetscale.com/docs/concepts/deployment-options | | Acceptable for some, maybe not others | Eclyps wrote: | I just started using Planetscale for small projects here and | there. More and more of my projects are FE-heavy and don't | require a big dedicated database (NextJS apps, mostly hardcoded | designs or headless CMS like Sanity). There are times where I | need to store just small bits of data, maybe contact form | submissions or something. It's been super great to be able to | quickly hook up planetscale to a nextjs api function and have | that data persisted within a matter of minutes. | | I've yet to use it on anything large-scale, though, so I can't | speak to performance when you're really pushing it. | aantix wrote: | Didn't MySQL implement query level caching a while back? | ryanisnan wrote: | I assume this is a bit of a joke, but query caching at least | was not good in 5.5-5.7, so it would often be disabled. I don't | know how 8 performs. | rbranson wrote: | It was removed from MySQL in 8.0 because it wasn't very useful. | MySQL query caching does exact matching on the query string and | any row update to a table used for a given cached query nukes | the entire cache. So it's only useful for a small set of niche | use cases where tables are essentially static. | CharlesW wrote: | Dupe: https://news.ycombinator.com/item?id=33610996 | endisneigh wrote: | Anyone compare this and cockroachdb? | capableweb wrote: | Slightly off-topic but trying to understand something from the | landing page: | | > Powered by open source tech - Built at Google to scale | YouTube.com to billions of users | | Is this a Google project/business owned by Alphabet? The text | seems to indicate so, but I find no information about it when | doing some quick searching or browsing through the website. | aarondf wrote: | Nope! That part you quoted is referring to Vitess, which was | built at Google to scale YouTube. | | See more: https://planetscale.com/vitess | capableweb wrote: | Aha, I see. Thanks for explaining! | | So I'm guessing PlanetScale now helps maintain Vitess and | PlanetScale is somewhat of a hosted Vitess for people who | don't want to self-host? | aarondf wrote: | Yup! Vitess is at the core of PlanetScale and enables us to | add lots of cool stuff on top (branching, Boost, etc) but | Vitess itself is still open source! | _ben_ wrote: | For database caching outside of PlanetScale, PolyScale.ai [1] | provides a serverless database edge cache that is compatible with | Postgres, MySQL, MariaDB and MS SQL Server. Requires zero | configuration or sizing etc. | | 1.https://www.polyscale.ai/ | rbranson wrote: | I tried to use PolyScale in the past but had issues with | performance because updating a row would invalidate the entire | cache. I wonder if that has improved? | _ben_ wrote: | Yes, in the early versions of the automated invalidation, the | logic cleared all cached data based on tables. That is no | longer the case. The invalidations only remove the affected | data from the cache, globally. You can read more here: | https://docs.polyscale.ai/how-does-it-work#smart- | invalidatio... | rbranson wrote: | It didn't impact everything, I think I was hitting this | case: | | > When a query is deemed too complex to determine what | specific cached data may have become invalidated, a | fallback to a simple but effective table level invalidation | occurs. | saybar wrote: | We've made a lot of changes - give it a try again or feel | free to reach out to support@polyscale.ai and we'd be | happy to assist you. | edmundsauto wrote: | I just started a small hobby project and selected supabase for my | db provider. Anyone with experience in both Supa and PlanetScale | care to comment about the differences? | | To me, it looks like supabase is designed to take full advantage | of postgres features. plpgsql triggers + RLS + clientside auth + | streaming changes to subscribers (including via web hooks) are my | favorite features. (They also have js edge functions, but I use | lambda instead b/c I prefer python) | | Supabase feels like the scrappy company with amazing focus, akin | to an early MailChimp (circa 2007). PlanetBase feels more like | early Snowflake - massive scale, focus on performance, can match | anything feature-by-feature. One is a master of their craft, the | other is a gorilla at scale. | | Curious what others think. I haven't used PlanetBase extensively | so don't have much to go on except their marketing. | Jarwain wrote: | That sounds about right from my understanding. Supabase was | made as an alternative to firebase, acting as a data layer with | a lot of features simplifying application development. | | Planet Base feels like Snowflake, or some aspects of fly.io, or | timescale's managed cloud offering; their focus is on the core | database tech and delivering that in a scalable manner. | greg-m wrote: | ReadySet (readyset.io) supports the same style of caching and | works with Supabase, if you want to check us out :) | | I have a few extra cloud invites: greg@readyset.io | edmundsauto wrote: | Much appreciated, I will check you out for my next project. | Right now I'm not able to migrate as I'm trying to get an MVP | up and running and have spent a few days deeply integrating | w/ Supabase. | | What are your core value prop differences between your | service and sb? Just curious how I should think about your | offering compared to what I'm familiar with. | greg-m wrote: | We work with supabase - so your DB still deploys out to | them, but we cache queries in memory so they're | significantly lower latency. Since we're offloading reads, | we also help handling traffic spikes, lower costs, etc. ___________________________________________________________________ (page generated 2022-11-15 23:00 UTC)