[HN Gopher] One million queries per second with MySQL
       ___________________________________________________________________
        
       One million queries per second with MySQL
        
       Author : aarondf
       Score  : 125 points
       Date   : 2022-09-01 18:35 UTC (4 hours ago)
        
 (HTM) web link (planetscale.com)
 (TXT) w3m dump (planetscale.com)
        
       | truth_seeker wrote:
       | Correct me if my knowledge is outdated,
       | 
        | Vitess deployment comes at a cost:
        | 
        | 1. You can't have single database view of all shards
        | 
        | 2. No distributed ACID
        | 
        | 3. No distributed JOINs
       | 
        | If I were to do it in PostgreSQL 14 natively, without any
        | third-party plugins, I could use table partitioning (on, say, a
        | root server) plus tables on foreign servers (the leaf servers)
        | through FDW. Parallel scans and JOINs are allowed. Also,
        | PostgreSQL 15 (in beta now) will allow parallel commits to
        | foreign-server tables through FDW.
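        | 
        | A minimal sketch of that layout (server and table names are made
        | up; each shard would get its own SERVER and partition, and the
        | physical table must already exist on the leaf):
        | 
        |     CREATE EXTENSION IF NOT EXISTS postgres_fdw;
        | 
        |     -- partitioned parent table on the "root" server
        |     CREATE TABLE orders (
        |         customer_id bigint NOT NULL,
        |         order_id    bigint NOT NULL,
        |         total       numeric
        |     ) PARTITION BY HASH (customer_id);
        | 
        |     -- one "leaf" server per shard
        |     CREATE SERVER leaf1 FOREIGN DATA WRAPPER postgres_fdw
        |         OPTIONS (host 'leaf1.internal', dbname 'app');
        |     CREATE USER MAPPING FOR CURRENT_USER SERVER leaf1
        |         OPTIONS (user 'app', password 'secret');
        | 
        |     -- each partition is a foreign table living on a leaf
        |     CREATE FOREIGN TABLE orders_p0
        |         PARTITION OF orders
        |         FOR VALUES WITH (MODULUS 4, REMAINDER 0)
        |         SERVER leaf1 OPTIONS (table_name 'orders_p0');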
        
         | harshitgangal wrote:
         | Disclaimer: I am a Vitess Maintainer.
         | 
          | Please find my answers inline.
         | 
         | 1. You can't have single database view of all shards
         | 
          | - Vitess does query planning and routing for the application,
          | so the application does not have to know about shards. Yes,
          | Vitess gives a view of a single database.
         | 
         | 2. No distributed ACID
         | 
          | - This has been a much-requested feature and it is on our
          | roadmap. Currently, we have a two-phase commit (2PC)
          | implementation which we do not recommend for production use.
          | Right now, Vitess users either choose their sharding key in
          | such a way that all the queries in a transaction go to a
          | single shard, or the application handles best-effort
          | multi-shard transactions (sketch at the end of this comment).
         | 
         | 3. No distributed JOINs
         | 
          | - Vitess does support distributed joins, aggregations,
          | subqueries and other complex queries. Vitess strives to be
          | 100% MySQL compatible. If anyone encounters an unsupported
          | query, a GitHub issue helps us know about it and prioritize.
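          | 
          | A rough sketch of the single-shard transaction pattern (table
          | and column names are made up; assume the keyspace is sharded
          | on customer_id):
          | 
          |     BEGIN;
          |     -- every statement carries the sharding key, so the whole
          |     -- transaction routes to a single shard
          |     UPDATE orders
          |        SET status = 'shipped'
          |      WHERE customer_id = 42 AND order_id = 1001;
          |     INSERT INTO order_events (customer_id, order_id, event)
          |     VALUES (42, 1001, 'shipped');
          |     COMMIT;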
        
           | truth_seeker wrote:
            | Thanks for the info.
        
         | mattlord wrote:
         | > If I were to do it in PostgreSQL 14 natively without any
         | other third party plugins, I can get help of Table Partitioning
         | (say root server) + Tables on Foreign Servers (say leaf
         | servers) through FDW. Parallel scans and JOINs are allowed.
         | Also, PostgreSQL 15 (in beta now) will allow parallel commits
         | to foreign server tables through FDW.
         | 
         | I wish you all the best in operating that at scale and
         | providing a reliable general purpose PostgreSQL DBaaS (even if
         | only internally) around it. :-) PostgreSQL is great, not going
         | to argue otherwise -- every solution has tradeoffs and if the
         | tradeoffs of one solution fit your usage best then go with it.
          | If you need to scale out and manage a large cluster of
          | PostgreSQL instances as if it were a single logical database,
          | then I'd recommend something in a similar space to Vitess and
          | PlanetScale -- Citus and Citus on Azure, respectively. At the
          | hand-waving level it's always easy to "do this, and that", but
          | it's a whole other matter to develop a general-purpose,
          | production-grade product that anyone can use and operate at
          | virtually infinite scale. :-)
        
           | truth_seeker wrote:
        
         | aeyes wrote:
          | FDW is not nearly as usable as you think. Joins of even 2 or 3
          | remote tables with more than a few hundred rows in the join
          | are enough to make simple queries take minutes to return, even
          | if you tune all the settings and have statistics on the remote
          | tables.
         | 
         | Also, how do you reshard without downtime?
        
           | truth_seeker wrote:
            | (Please excuse the brevity. I am replying from my phone.)
            | 
            | Have you read this page carefully, especially the section
            | "Remote Execution Options"?
            | 
            | https://www.postgresql.org/docs/current/postgres-fdw.html
            | 
            | If not, give it a try.
           | 
           | Also this one might help too.
           | 
           | https://www.percona.com/blog/2021/06/02/postgres_fdw-
           | enhance...
           | 
           | Now, about re-sharding:
           | 
            | I know this is not as carefree as in NewSQL DBs like
            | CockroachDB, but it's not that difficult either.
            | 
            | Choosing a shard key ultimately comes down to breaking down
            | your ER diagram and inspecting the incoming write flow to
            | the tables which are bound to grow.
            | 
            | In any case, you don't have to make your users suffer
            | downtime. At least I have not. Even if you chose the wrong
            | shard key, you can gracefully transfer or redistribute the
            | data by creating new tables and swapping their names with
            | the old ones. The window of apparent downtime here is a few
            | milliseconds, and this is not something which will happen
            | frequently.
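            | 
            | Roughly, that swap looks like this (hypothetical table
            | names; orders_resharded has already been backfilled):
            | 
            |     BEGIN;
            |     ALTER TABLE orders RENAME TO orders_old;
            |     ALTER TABLE orders_resharded RENAME TO orders;
            |     COMMIT;
            |     -- both renames commit atomically, so readers see either
            |     -- the old table or the new one, never neither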
        
         | gen220 wrote:
         | Postgres FDW [1] seems to me to be a crazy cool (i.e.
         | unexpected and good) feature of Postgres, but I haven't seen
         | much chatter about it on HN in the rdbms stories.
         | 
         | Has anyone here used it? What is/was your experience like?
         | 
         | [1]: https://www.postgresql.org/docs/current/postgres-fdw.html
        
           | sroussey wrote:
            | That's a great feature, but not at all comparable for a
            | deployment like this.
           | 
           | The closest thing for PostgreSQL would be Citus, I imagine.
        
       | 29athrowaway wrote:
       | Not all queries are the same.
        
       | andersonrkton wrote:
        | I would like to see a benchmark using the "common" setup and not
        | the enterprise one... what are the limits on those?
        
         | forgot_old_user wrote:
          | The article says this at the end:
         | 
         | >We ran this benchmark against a single-tenant environment,
         | with levels of resources that we reserve for our enterprise
         | customers.
        
       | nelsondev wrote:
       | How much hardware did you need?
        
       | tiffanyh wrote:
       | In 2018, folks were getting 4M queries per second using SQLite
       | (BedrockDB) [0].
       | 
        | This was also accomplished with just 1 server, not needing 40
        | shards like the blog post.
       | 
       | [0] https://blog.expensify.com/2018/01/08/scaling-sqlite-
       | to-4m-q...
        
         | mattlord wrote:
          | The Expensify test was targeted at demonstrating the upper
          | limits of SQLite itself. The hardware used:
          | 
          | > 1TB of DDR4 RAM
          | > 3TB of NVMe SSD storage
          | > 192 physical 2.7GHz cores (384 with hyperthreading)
         | 
          | And a very simple read-only query w/o overlapping rows, with
          | the full dataset already loaded in memory:
          | 
          | > The query itself just sums a range of 10 rows randomly
          | located inside a 10B row (447GB) two column database of random
          | integers, one of which is indexed and the other not. This
          | means that the test is fully cached in RAM, and the database
          | is precached prior to the test. No writes are done during this
          | test, only reads.
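          | 
          | That query is roughly of this shape (hypothetical table and
          | column names; :start is a random offset):
          | 
          |     SELECT SUM(unindexed_col)
          |       FROM random_ints
          |      WHERE indexed_col BETWEEN :start AND :start + 9;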
         | 
         | All that is good and fine for that purpose -- to demonstrate
         | the limits of what SQLite can do in scaling UP.
         | 
          | The purpose of the PlanetScale post was demonstrating linear
          | scalability by scaling OUT using a realistic workload (TPC-C
          | is a warehouse / ecommerce site simulation) on small commodity
          | VMs (EC2 instances).
         | 
         | As noted, you're comparing two very different tests that had
         | very different goals.
        
         | throwusawayus wrote:
         | i strongly dislike planetscale and even i think this is not a
         | useful or fair comparison .. sqlite works with local files,
         | much simpler, no network latency, no HA
        
           | kamikaz1k wrote:
            | why do you dislike it?
        
           | marginalia_nu wrote:
           | > i strongly dislike planetscale and even i think this is not
           | a useful or fair comparison .. sqlite works with local files,
           | much simpler, no network latency, no HA
           | 
            | To be fair, if you accomplish the same effect without
            | needing distributed computing, then the solution gets much
            | simpler: you cut out worrying about network latency and HA*
           | 
           | * the probability of node failure grows very rapidly with the
           | number of nodes. Same reason you probably don't need RAID for
           | 1 disk, but you definitely do for a dozen
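            | 
            | For a rough sense of scale: if each node independently has a
            | 1% chance of failing in some window, then
            | 
            |     P(at least one failure) = 1 - 0.99^n
            | 
            | which is 1% for a single node but about 33% for a 40-node
            | cluster.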
        
         | rbranson wrote:
         | Apples to oranges comparison. The benchmark in the PlanetScale
         | blog post is based on TPC-C.
        
         | [deleted]
        
         | [deleted]
        
         | SnowHill9902 wrote:
         | And with awk you can query 10M lines per second.
        
         | bitobserver wrote:
         | "Granted, it's not SQLite by itself. We've wrapped it in a
         | custom distributed transaction layer ..."
        
           | tiffanyh wrote:
           | How is this any different than the blog post that uses
           | Vitess, which is a custom wrapper around MySQL?
           | 
           | BedrockDB is a custom wrapper around SQLite.
        
       | hbrn wrote:
        | I'm unfamiliar with Vitess, but what exactly is the achievement
        | here?
        | 
        | If you have a shared-nothing architecture, you can keep adding
        | nodes indefinitely to get more throughput. You could easily do a
        | billion qps if you wanted to.
       | 
       | The downside is that you have to eliminate relations (or at least
       | don't expect them to be performant/scalable).
       | 
       | Am I missing something?
        
         | [deleted]
        
         | samlambert wrote:
        | You understand the point. It's a real point. Show me a modern
        | relational database that can scale this predictably on the cloud
        | with this level of operability.
        
           | tiffanyh wrote:
           | When you have to eliminate data relationships for it to
           | scale, you no longer have a "relational" database.
        
             | sroussey wrote:
             | You would be surprised then. Most SaaS companies can easily
             | shard by customer. All the customer data stays together,
             | relational and all.
             | 
             | Cross-customer queries will be somewhat slower.
        
               | tiffanyh wrote:
               | Sure, but what you are describing is no longer a multi-
               | tenant application/database. It's essentially a single-
               | tenant deployment of your tech stack _per_ customer.
               | Which is not very cost effective.
        
               | staticassertion wrote:
               | I would still consider it a multi-tenant system. It's a
               | single database to manage, which distributes your
               | customers using their identity as a partitioning key.
        
               | sroussey wrote:
                | It's multi-tenant --- you have 10,000,000 customers
                | spread across 10 servers, 1,000,000 on each, for example.
               | 
               | Or I don't understand what you mean...
        
               | [deleted]
        
           | hbrn wrote:
            | A relational database without relations is an oxymoron. As
            | folks have pointed out, you also have to throw ACID away. So
            | what's left of the original database, the SQL dialect? I bet
            | that gets limited too.
           | 
           | Look, I get it, you have to sell your product. Some folks
           | want semi-infinitely scalable storage, and they don't
           | understand that the only way to achieve it is turning their
           | DB into a key-value store. As a side effect they would have
           | to rewrite their whole application from scratch, but they
           | would only realize it after they get vendor locked in.
           | 
           | You can advertise your solution as MySQL-compatible. And I
           | can claim that it's dishonest.
        
             | mattlord wrote:
             | > A relational database without relations is an oxymoron.
             | 
             | OK. You're the only one talking to this straw man though.
             | :-) Every Vitess user that I'm aware of has a pretty
              | typical 2NF/3NF schema design. A small sampling of them is
              | listed here: https://vitess.io
             | 
              | You set up your data distribution/partitioning/sharding
              | scheme so that you have data locality for 99.9999+% of
              | your queries -- meaning that the query executes against a
              | data subset that lives on a single shard/node (e.g.
              | sharding by customer_id; example below) -- and you live
              | with the performance hit and consistency tradeoffs for
              | those very rare cases where cross-shard queries cannot be
              | avoided (Vitess does support this). You should do this
              | even if the solution you're using claims to have
              | distributed SQL with ACID and MVCC guarantees/properties.
              | There's no magic that improves the speed of light and
              | removes other resource constraints. In practice most
              | people say they want perfect security/consistency/<name
              | your desired property here> but then realize that the
              | costs (perf, resources, $$, etc.) are simply so high that
              | it is not practical for their business/use case.
             | 
              | I know MySQL fairly well (I started working at MySQL AB in
              | 2003) and you can certainly claim that "MySQL-compatible"
              | is dishonest, but I would offer a counterclaim: either you
              | don't know this space very well or you're not operating in
              | good faith here.
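              | 
              | As a made-up illustration of that data locality, assuming
              | a keyspace sharded on customer_id:
              | 
              |     -- routed to exactly one shard: the WHERE clause pins
              |     -- the sharding key
              |     SELECT * FROM orders
              |      WHERE customer_id = 42 AND status = 'open';
              | 
              |     -- scatter-gather: no sharding key, so every shard is
              |     -- queried and the results are merged by the routing
              |     -- layer
              |     SELECT COUNT(*) FROM orders WHERE status = 'open';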
        
               | hbrn wrote:
               | To be fair, I skimmed through your docs and did misread
               | them initially: I thought you don't allow foreign keys,
               | but you actually don't allow foreign key _constraints_.
               | 
               | If you are still allowing JOINs within a shard, then I
               | need to apologize.
        
               | hodgesrm wrote:
                | To pile on your answer a bit: the manual bucketing you
                | describe is exactly how ClickHouse works in most cases.
                | It won't allow joins / IN on multiple distributed tables
                | -- i.e., sharded/replicated tables -- unless you
                | explicitly set a property called
                | distributed_product_mode. [0] This is to prevent you
                | from shooting yourself in the foot, either through bad
                | performance or improperly distributed data.
               | 
               | This constraint will eventually be relaxed but most apps
               | are able to work around it just fine. The ones that can't
               | use Snowflake.
               | 
               | [0] https://clickhouse.com/docs/en/operations/settings/se
               | ttings/...
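                | 
                | A small sketch of that setting (table names are made
                | up; both are Distributed tables):
                | 
                |     -- 'local' rewrites the subquery to hit the
                |     -- shard-local table instead of fanning out again
                |     SET distributed_product_mode = 'local';
                | 
                |     SELECT count()
                |     FROM dist_orders
                |     WHERE user_id IN
                |         (SELECT user_id FROM dist_events
                |           WHERE kind = 'refund');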
        
             | closeparen wrote:
             | "Relational" actually refers to tables, not foreign keys.
             | https://en.wikipedia.org/wiki/Relation_(database)
        
               | johannes1234321 wrote:
               | More to "relational algebra" of which joins are a part.
               | (But not necessarily "foreign keys")
        
       | [deleted]
        
       | loxias wrote:
        | Is a shard a server? If so, how powerful a machine were they
        | using? I'm deeply unimpressed by only 1M TPS on a _40 node_
        | cluster.
        
       | ijidak wrote:
       | Very good job.
       | 
       | A few questions.
       | 
       | What is the response payload size? How large is the key space?
       | And how many unique keys?
       | 
       | I may have missed those details in the write up.
       | 
       | It's hard to judge queries per second without understanding those
       | pieces.
       | 
       | Genuinely curious.
        
       | chreniuc wrote:
        | I'm curious about the latency from a backend app that runs on a
        | different cloud provider than AWS (e.g. Hetzner), because I've
        | noticed that they run on AWS (it shows up when choosing the
        | region for your database). Has anyone used this and connected
        | from a different cloud provider? Have you noticed big latencies?
        
         | orware wrote:
          | You will generally run into two types of latency here: the
          | connection latency while connections are being established,
          | plus the regular physical/network latency between where the
          | database is located and your own servers.
         | 
          | For connections, since a TLS handshake is required, physical
          | distance has a greater impact on the connection time. The
          | following article:
          | https://sking7.github.io/articles/44961356.html actually
          | provides a good 3.5x-4x figure, which correlates with some
          | connection tests I've completed.
         | 
          | In other words, if an initial TCP packet takes ~100ms to get
          | from the database to your server, then establishing a TLS
          | connection to the database will probably be around 400ms (the
          | TCP handshake and the TLS handshake each add round trips on
          | top of that base latency).
         | 
         | Once the connection is established, running queries over an
         | open connection is generally going to be quicker, at least for
         | simpler queries. More complex queries will still take whatever
         | time they need to process on the database end before they can
         | start sending results back so results will generally vary
         | there.
         | 
          | But going back to that 100ms example... if the amount of data
          | being returned from a very simple query is minimal, then the
          | response time would be very close to that 100ms figure over an
          | already open connection, and would likely go up from there
          | depending on the complexity of the query and the amount of
          | data needing to be returned.
         | 
          | Since the connection hostnames are publicly accessible and TLS
          | is always required for connections, you can easily test from
          | your own provider's location. So long as the general physical
          | location isn't too far away from a supported region, the
          | overall latency shouldn't be unusable.
         | 
         | I may have mangled some terminology/analogies above but
         | hopefully that helps provide a bit of a ballpark for you. If
         | you have specific to/from regions in mind I might be able to
         | try and collect some specific numbers for you!
        
           | toast0 wrote:
            | I haven't spent time optimizing TLS between a database client
            | and server, but in HTTPS, using TLS 1.3 without early data
            | (or TLS 1.2 with somewhat optimistic handshake handling) gets
            | you to one added round trip, and TLS 1.3 early data gets you
            | down to zero added round trips. Early data isn't always
            | appropriate, because there's potential for replays, but the
            | latency improvement might be worth considering for some
            | selects.
        
         | [deleted]
        
         | minhazm wrote:
          | It depends on how far the two data centers are from one
         | another. But the big cloud providers have data centers in
         | similar locations since they usually want to be near an
         | internet exchange[1]. For example most companies that provide
         | cloud services have a data center in Virginia, USA. So if both
         | data centers are in the same region then you can get numbers in
         | the sub 5ms range. As long as you're not crossing the ocean the
         | latency should be fine.
         | 
         | 1.
         | https://en.wikipedia.org/wiki/List_of_Internet_exchange_poin...
        
       | leetrout wrote:
       | I know their devs (or the vitess devs) are on here...
       | 
        | No foreign keys are supported - are there any other schema-level
        | constraints that are not supported, and whose absence also helps
        | enable scaling out and achieving these numbers?
        
       | samsquire wrote:
       | HandlerSocket is an interesting approach to turn MySQL into a
       | NoSQL database and get 700,000 requests per second on a single
       | node.
       | 
       | http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-a...
       | 
        | When I benchmarked Redis on an Intel hexacore I got 800,000
        | requests a second across 8 terminal windows. So I assume the
        | MySQL data fit in RAM.
        
       | maxdemarzi wrote:
        | What were the specs of each shard? Was each shard running 4
        | cores? 8 cores? 32 cores? How much RAM? The blog post is light
        | on details.
        
       ___________________________________________________________________
       (page generated 2022-09-01 23:00 UTC)