[HN Gopher] The myth of "joins don't scale" ___________________________________________________________________ The myth of "joins don't scale" Author : forrestbrazeal Score : 40 points Date : 2020-07-08 20:28 UTC (2 hours ago) (HTM) web link (blog.dbi-services.com) (TXT) w3m dump (blog.dbi-services.com) | m0xte wrote: | It's not a myth. In the trite examples provided with tiny | datasets they scale. In stark reality when you've got 5000+ qps | of dog shit queries of several joins each and buggered table | statistics (thanks optimiser!) coming in on realistic datasets | they are a damage amplifier. | | Really all they do is repeat work in the most costly way possible | that should have been done up front once. But that incurs the | cost of knowing what you're looking for before you do it. | zozbot234 wrote: | If you want to avoid repeating work, you can set up a | materialized view. But really, quite often a few tailored | indexes are enough. | dman wrote: | People do not realize just how far industry practitioners take | joins :) So my take is that users will find a way to take joins | to the point where they do not scale. | crazygringo wrote: | Unfortunately, the author/article seems to _completely_ miss the | meaning of "joins don't scale". | | _Obviously_ indexed joins on a single database server scale just | fine, that 's the entire selling point of an RDMBS! | | The meaning of "joins don't scale" is that they don't scale | _across_ database servers, when your dataset is too big to fit in | a single database instance. Joins scale across _rows_ , they | don't scale across _servers_. | | Now a lot of people don't realize how insanely powerful single- | server DB's can be. A lot of people that assume they need to | architect for a multi-server DB don't realize they can get away | with a hugely-provisioned SSD single-server indexed database with | a backup, with performant queries. | | But if you're absolutely sure you need to be planning to run at | close to Twitter or Facebook scale one day, then yep, you'd | better architect from the beginning not to use joins. And then | whether you pick an RDBMS or NoSQL solution is mostly a tooling | issue. | redisman wrote: | > But if you're absolutely sure you need to be planning to run | at close to Twitter or Facebook scale one day, then yep, you'd | better architect from the beginning not to use joins. And then | whether you pick an RDBMS or NoSQL solution is mostly a tooling | issue. | | Even then you probably get a few years of use out of that | simple setup while you work on migrating to a horizontal | solution. And by then more likely than hitting Twitter-scale, | your company has gone bankrupt. | yelloweyes wrote: | reality is 99% of companies out there will be just fine with a | single db server | | all this nosql fad is simply because SQL is hard and newbies | would just rather call it old and slow because they can't be | bothered to learn it | DaiPlusPlus wrote: | Cross-database JOINs _should_ work if the database server is | written to handle that - take Spanner or CockroachDB: they're | both capable of high-performance cross-database and cross- | server joins. | | I think the poor reputation that cross-server joins have comes | from servers like MS SQL Server that do cross-server joins | through ODBC which is obviously going to be terrible for | performance (though I don't know exactly how - or even if - SQL | Server handles a highly-selective INNER JOIN differently | compared to a join with lower specificity on the remote table). | DaiPlusPlus wrote: | I get the impression that (crucially: outside of Facebook/Google- | scale applications) the strongest advocates for No-SQL databases | are people who had a poor experience with SQL databases as | juniors or students - or don't have the necessary experience and | skills to make the most of a SQL RDBMS, such as not understanding | relational-algebra or relational-calculus - or not even knowing | SQL at all - or having DBA skills go know how to define the best | indexes - or even what a cardinality-estimate in an execution | plan is. If you throw all of those topics at someone and also add | that they only have a week to build the thing - it's no-wonder | that people turn to No-SQL. | | I'm not putting those people down either - I would have opted for | No-SQL myself if I wasn't already familiar with RDBS theory and | management. What I am criticising is the large amount of manual- | management needed when you do run an RDBMS: things like automatic | index generation and pruning, nightly index rebuilds and | statistics computing, etc arguably SHOULD be part of the RDBS | out-of-the-box. To my knowledge only IaaS databases have features | like that (it's what justifies the subscription price...). Some | on-prem database server products will generate recommendations | but they won't automatically implement them on their own accord. | Another issue I have is with the design of the SQL language: it's | unnecessarily verbose and very confusing for beginners as its | basis in relational-calculus instead of relational-algebra throws | people off ("SELECT is evaluated _after_ the WHERE clause?!"). | | If SQL is like SGML and XML, which was replaced in many | applications by the faster and lighter JSON format, can we do the | same for SQL? Unfortunately there's too much institutional | inertia behind SQL for any of the DBMS vendors to go-it-alone, | and all of the SQL implementations are incompatible with each | other anyway. | ladberg wrote: | I totally agree with this. I'm relatively new to SQL but have | recently had to start messing around with it. In all other CS | domains I know exactly what will happen under the hood for any | given action, but for SQL I have zero clue. | | I would love it if I could just write queries and it would | automatically add an index or do something else for performance | if needed, and possibly report back to me if a query I wrote | has particularly bad performance. | Sohcahtoa82 wrote: | This page is ridiculous. | | It downloads 89 separate JS files and then breaks scrolling with | my scroll wheel on Firefox. | | This is why most non-technical users like AMP. | dfee wrote: | This is not reading material for non-technical users. | | It's also highly irrelevant to the point that's made (or | discussion about the validity of the claims). | y2bd wrote: | This kind of comment is now against the guidelines: | https://news.ycombinator.com/newsguidelines.html (third from | bottom). | twirlock wrote: | The guidelines are stupid. This website is another casualty | of spooks atroturfing the internet. At least you're so | pathetically officious about it that we get a laugh. | baddox wrote: | That's the first time I've seen that guideline. It seems like | the "too common to be interesting" principle would apply to | an enormous amount of comments here. ___________________________________________________________________ (page generated 2020-07-08 23:00 UTC)