[HN Gopher] The myth of "joins don't scale"
       ___________________________________________________________________
        
       The myth of "joins don't scale"
        
       Author : forrestbrazeal
       Score  : 40 points
       Date   : 2020-07-08 20:28 UTC (2 hours ago)
        
 (HTM) web link (blog.dbi-services.com)
 (TXT) w3m dump (blog.dbi-services.com)
        
       | m0xte wrote:
       | It's not a myth. In the trite examples provided with tiny
       | datasets they scale. In stark reality when you've got 5000+ qps
       | of dog shit queries of several joins each and buggered table
       | statistics (thanks optimiser!) coming in on realistic datasets
       | they are a damage amplifier.
       | 
       | Really all they do is repeat work in the most costly way possible
       | that should have been done up front once. But that incurs the
       | cost of knowing what you're looking for before you do it.
        
         | zozbot234 wrote:
         | If you want to avoid repeating work, you can set up a
         | materialized view. But really, quite often a few tailored
         | indexes are enough.
        
         | dman wrote:
         | People do not realize just how far industry practitioners take
         | joins :) So my take is that users will find a way to take joins
         | to the point where they do not scale.
        
       | crazygringo wrote:
       | Unfortunately, the author/article seems to _completely_ miss the
       | meaning of  "joins don't scale".
       | 
       |  _Obviously_ indexed joins on a single database server scale just
       | fine, that 's the entire selling point of an RDMBS!
       | 
       | The meaning of "joins don't scale" is that they don't scale
       | _across_ database servers, when your dataset is too big to fit in
       | a single database instance. Joins scale across _rows_ , they
       | don't scale across _servers_.
       | 
       | Now a lot of people don't realize how insanely powerful single-
       | server DB's can be. A lot of people that assume they need to
       | architect for a multi-server DB don't realize they can get away
       | with a hugely-provisioned SSD single-server indexed database with
       | a backup, with performant queries.
       | 
       | But if you're absolutely sure you need to be planning to run at
       | close to Twitter or Facebook scale one day, then yep, you'd
       | better architect from the beginning not to use joins. And then
       | whether you pick an RDBMS or NoSQL solution is mostly a tooling
       | issue.
        
         | redisman wrote:
         | > But if you're absolutely sure you need to be planning to run
         | at close to Twitter or Facebook scale one day, then yep, you'd
         | better architect from the beginning not to use joins. And then
         | whether you pick an RDBMS or NoSQL solution is mostly a tooling
         | issue.
         | 
         | Even then you probably get a few years of use out of that
         | simple setup while you work on migrating to a horizontal
         | solution. And by then more likely than hitting Twitter-scale,
         | your company has gone bankrupt.
        
         | yelloweyes wrote:
         | reality is 99% of companies out there will be just fine with a
         | single db server
         | 
         | all this nosql fad is simply because SQL is hard and newbies
         | would just rather call it old and slow because they can't be
         | bothered to learn it
        
         | DaiPlusPlus wrote:
         | Cross-database JOINs _should_ work if the database server is
         | written to handle that - take Spanner or CockroachDB: they're
         | both capable of high-performance cross-database and cross-
         | server joins.
         | 
         | I think the poor reputation that cross-server joins have comes
         | from servers like MS SQL Server that do cross-server joins
         | through ODBC which is obviously going to be terrible for
         | performance (though I don't know exactly how - or even if - SQL
         | Server handles a highly-selective INNER JOIN differently
         | compared to a join with lower specificity on the remote table).
        
       | DaiPlusPlus wrote:
       | I get the impression that (crucially: outside of Facebook/Google-
       | scale applications) the strongest advocates for No-SQL databases
       | are people who had a poor experience with SQL databases as
       | juniors or students - or don't have the necessary experience and
       | skills to make the most of a SQL RDBMS, such as not understanding
       | relational-algebra or relational-calculus - or not even knowing
       | SQL at all - or having DBA skills go know how to define the best
       | indexes - or even what a cardinality-estimate in an execution
       | plan is. If you throw all of those topics at someone and also add
       | that they only have a week to build the thing - it's no-wonder
       | that people turn to No-SQL.
       | 
       | I'm not putting those people down either - I would have opted for
       | No-SQL myself if I wasn't already familiar with RDBS theory and
       | management. What I am criticising is the large amount of manual-
       | management needed when you do run an RDBMS: things like automatic
       | index generation and pruning, nightly index rebuilds and
       | statistics computing, etc arguably SHOULD be part of the RDBS
       | out-of-the-box. To my knowledge only IaaS databases have features
       | like that (it's what justifies the subscription price...). Some
       | on-prem database server products will generate recommendations
       | but they won't automatically implement them on their own accord.
       | Another issue I have is with the design of the SQL language: it's
       | unnecessarily verbose and very confusing for beginners as its
       | basis in relational-calculus instead of relational-algebra throws
       | people off ("SELECT is evaluated _after_ the WHERE clause?!").
       | 
       | If SQL is like SGML and XML, which was replaced in many
       | applications by the faster and lighter JSON format, can we do the
       | same for SQL? Unfortunately there's too much institutional
       | inertia behind SQL for any of the DBMS vendors to go-it-alone,
       | and all of the SQL implementations are incompatible with each
       | other anyway.
        
         | ladberg wrote:
         | I totally agree with this. I'm relatively new to SQL but have
         | recently had to start messing around with it. In all other CS
         | domains I know exactly what will happen under the hood for any
         | given action, but for SQL I have zero clue.
         | 
         | I would love it if I could just write queries and it would
         | automatically add an index or do something else for performance
         | if needed, and possibly report back to me if a query I wrote
         | has particularly bad performance.
        
       | Sohcahtoa82 wrote:
       | This page is ridiculous.
       | 
       | It downloads 89 separate JS files and then breaks scrolling with
       | my scroll wheel on Firefox.
       | 
       | This is why most non-technical users like AMP.
        
         | dfee wrote:
         | This is not reading material for non-technical users.
         | 
         | It's also highly irrelevant to the point that's made (or
         | discussion about the validity of the claims).
        
         | y2bd wrote:
         | This kind of comment is now against the guidelines:
         | https://news.ycombinator.com/newsguidelines.html (third from
         | bottom).
        
           | twirlock wrote:
           | The guidelines are stupid. This website is another casualty
           | of spooks atroturfing the internet. At least you're so
           | pathetically officious about it that we get a laugh.
        
           | baddox wrote:
           | That's the first time I've seen that guideline. It seems like
           | the "too common to be interesting" principle would apply to
           | an enormous amount of comments here.
        
       ___________________________________________________________________
       (page generated 2020-07-08 23:00 UTC)