[HN Gopher] Demystifying Database Performance for Developers
       ___________________________________________________________________
        
       Demystifying Database Performance for Developers
        
       Author : winslett
       Score  : 78 points
       Date   : 2022-05-06 17:01 UTC (5 hours ago)
        
 (HTM) web link (www.crunchydata.com)
 (TXT) w3m dump (www.crunchydata.com)
        
       | magicalhippo wrote:
       | Generally I'd say do the same as you would with your compiler
       | when trying to optimize your code: spend time with the query
       | analyzer.
       | 
       | What the query optimizer will or won't do depends on your
       | database software and your data, in addition to your query of
       | course.
       | 
       | For example, the article states "publish_date < now() - interval
       | '3 days'" is always faster than "publish_date + interval '3 days'
       | < now()".
       | 
       | Well not for the database server we use, because it evaluates
       | now() per row turning both into table scans... yay.
       | 
       | In addition to the suggestions in the article, I'd say use common
       | table expressions (CTE), or sub-query if your database server
       | doesn't have CTEs or treats them as an optimization barrier.
       | 
       | I've found the optimizer tries to be smart but doesn't recognize
       | which of the joins will be the most specific filter, causing it
       | to join tons of rows, just to throw most of them away with a
       | later join. Using a CTE or sub-query can force the optimizers
       | hand.
       | 
       | Also I've often found it useful to use CTEs or sub-queries for
       | returning row values, typically aggregates. This can especially
       | work well if you're limiting the rows returned using TOP or
       | LIMIT. Again this can avoid the DB throwing away work.
        
         | Something1234 wrote:
         | what database server evaluates now() per row?
        
           | rileymat2 wrote:
           | That seems like it would cause a lot of problems, other than
           | the optimization.
        
       | eatonphil wrote:
       | This is a good intro! It's good to have quick articles like this
       | where you can easily read and pick up the basics.
       | 
       | It's also pretty easy to build your own basic in-memory database
       | if you have time/interest to show yourself these characteristics
       | through experiment. And doing it experimentally ingrains the
       | concepts better than just reading about it IMO.
        
         | bboylen wrote:
         | How might one go about this?
         | 
         | Do you have any relevant articles?
        
       | charcircuit wrote:
       | I feel like the use of indexes is one of the least mysterious
       | parts of databases performance.
        
       | leeoniya wrote:
       | i remember being surprised by how slow GROUP BY year(date),
       | month(date) was in MySQL, despite the date column being properly
       | indexed.
       | 
       | the solution was to store each of those in separate indexed
       | columns instead of re-computing them at query time. think i set
       | up an INSERT trigger to add these to a separate "index" table
       | that i then joined to get the GROUP BY to be fast.
        
         | macintux wrote:
         | That concept was a big part of the data modeling story when I
         | was helping people understand how to use Riak, a distributed
         | key/value store. Generate your results as you receive the data,
         | whenever possible, and run batch processes to generate them
         | when you can't.
        
       | spacemanmatt wrote:
       | Good article. Story of my career: Accepting as well as teaching
       | that a faster CPU won't make my database faster.
        
       ___________________________________________________________________
       (page generated 2022-05-06 23:00 UTC)