[HN Gopher] Demystifying Database Performance for Developers ___________________________________________________________________ Demystifying Database Performance for Developers Author : winslett Score : 78 points Date : 2022-05-06 17:01 UTC (5 hours ago) (HTM) web link (www.crunchydata.com) (TXT) w3m dump (www.crunchydata.com) | magicalhippo wrote: | Generally I'd say do the same as you would with your compiler | when trying to optimize your code: spend time with the query | analyzer. | | What the query optimizer will or won't do depends on your | database software and your data, in addition to your query of | course. | | For example, the article states "publish_date < now() - interval | '3 days'" is always faster than "publish_date + interval '3 days' | < now()". | | Well not for the database server we use, because it evaluates | now() per row turning both into table scans... yay. | | In addition to the suggestions in the article, I'd say use common | table expressions (CTE), or sub-query if your database server | doesn't have CTEs or treats them as an optimization barrier. | | I've found the optimizer tries to be smart but doesn't recognize | which of the joins will be the most specific filter, causing it | to join tons of rows, just to throw most of them away with a | later join. Using a CTE or sub-query can force the optimizers | hand. | | Also I've often found it useful to use CTEs or sub-queries for | returning row values, typically aggregates. This can especially | work well if you're limiting the rows returned using TOP or | LIMIT. Again this can avoid the DB throwing away work. | Something1234 wrote: | what database server evaluates now() per row? | rileymat2 wrote: | That seems like it would cause a lot of problems, other than | the optimization. | eatonphil wrote: | This is a good intro! It's good to have quick articles like this | where you can easily read and pick up the basics. | | It's also pretty easy to build your own basic in-memory database | if you have time/interest to show yourself these characteristics | through experiment. And doing it experimentally ingrains the | concepts better than just reading about it IMO. | bboylen wrote: | How might one go about this? | | Do you have any relevant articles? | charcircuit wrote: | I feel like the use of indexes is one of the least mysterious | parts of databases performance. | leeoniya wrote: | i remember being surprised by how slow GROUP BY year(date), | month(date) was in MySQL, despite the date column being properly | indexed. | | the solution was to store each of those in separate indexed | columns instead of re-computing them at query time. think i set | up an INSERT trigger to add these to a separate "index" table | that i then joined to get the GROUP BY to be fast. | macintux wrote: | That concept was a big part of the data modeling story when I | was helping people understand how to use Riak, a distributed | key/value store. Generate your results as you receive the data, | whenever possible, and run batch processes to generate them | when you can't. | spacemanmatt wrote: | Good article. Story of my career: Accepting as well as teaching | that a faster CPU won't make my database faster. ___________________________________________________________________ (page generated 2022-05-06 23:00 UTC)