[HN Gopher] The SQLite Index Suggester ___________________________________________________________________ The SQLite Index Suggester Author : polyrand Score : 147 points Date : 2022-07-05 18:02 UTC (4 hours ago) (HTM) web link (www.sqlite.org) (TXT) w3m dump (www.sqlite.org) | yvan wrote: | I find it interesting, as recently I was reading about how | complex it would/could be to create an index suggester. | | https://www.depesz.com/2021/10/22/why-is-it-hard-to-automati... | Someone wrote: | It's like writing a compiler or interpreter: writing one is | easy; writing a good one extremely hard. | | This suggester isn't very good. It takes a single query and | suggests indexes for it. A good one would take a mix of queries | and suggest a set of indexes, also considering the impact on | write speed of additional indexes (table updates often need to | update indexes, too) | | For the example in this article, if the table is large and the | average number of rows with a given 'a' value is close to 1 or | if most queries are for 'a' values that aren't in the database, | it may even be better to do CREATE INDEX x1a ON | x1(a); | | That gives you a smaller index, decreasing disk usage. | zimpenfish wrote: | > This suggester isn't very good. It takes a single query and | suggests indexes for it. | | The underlying API can analyse multiple queries - looks like | they've only coded up the test `.expert` command for one. | | From [1], "The sqlite3expert object is configured with one or | more SQL statements by making one or more calls to | sqlite3_expert_sql(). Each call may specify a single SQL | statement, or multiple statements separated by semi-colons." | then "sqlite3_expert_analyze() is called to run the | analysis." | | [1] https://www.sqlite.org/src/dir?ci=trunk&name=ext/expert | rileymat2 wrote: | >It takes a single query and suggests indexes for it. A good | one would take a mix of queries and suggest a set of indexes, | also considering the impact on write speed of additional | indexes (table updates often need to update indexes, too) | | This is my pet peeve with SQL Server SSMS will give you a | missing index suggestion and cost... the problem is | inexperienced people will take the suggestion as is and | create way too many highly specialized indexes over time. | lfittl wrote: | As one of the authors of the tool that was critiqued in | depesz' blog post from last year, I fully agree that looking | at this on a per-query basis is too simple. | | Since we wrote our initial index suggestion tool for | Postgres, we actually went back to the drawing board, | examined the concerns brought up, and developed a new per- | table Index Advisor for Postgres that we recently released | [1]. | | The gist of it: Instead of looking at the "perfect" index for | each query, its important to test out different "good enough" | indexes that cover multiple queries. Additionally, as you | note, the write overhead of indexes needs to be considered | (both from a table writes / second approach, as well as disk | space used at a given moment in time). | | I think this is a fascinating field and there is lots more | work to be done. I've also found the 2020 paper "Experimental | Evaluation of Index Selection Algorithms" [2] pretty useful, | that compares a few different approaches. | | [1] https://pganalyze.com/blog/automatic-indexing-system- | postgre... | | [2] https://www.vldb.org/pvldb/vol13/p2382-kossmann.pdf | krylon wrote: | This sounds really cool! | | I've sometimes wondered why server-based RDBMSs don't offer | something like this. Is it too hard to implement? Or did people | just not think of it? Or do they have something like this and I | just never learned about it? | pronoiac wrote: | You'd probably enjoy this other discussion on the front page: | https://news.ycombinator.com/item?id=31990836 | [deleted] | A321321 wrote: | Microsoft SQL Server definitely has suggestions for missing | indexes. The quality of the suggestions are debatable though | gfody wrote: | sql server has the index tuning wizard (itwiz) that will | sample your data and make suggestions, and more lately the | missing index stuff as a consequence of how the execution | planner is designed (iirc it tries to generate optimal plans | regardless of what indexes are available and when plans are | eliminated because the necessary indexes don't exist it emits | records for the various "missing index" dmvs) | dmitriid wrote: | Microsoft SQL Server query analyzer was was essential in | identifying missing indexes. Wherever you saw "full table | scan" on a table, you knew it was missing an index. | | I don't know if it's still around, but in mid-2000s it was | light years ahead of any other database. | striking wrote: | https://explain.dalibo.com/ and https://explain.depesz.com/ | are indispensable tools for visualizing Postgres' EXPLAIN | output in a similar way. I will agree that the MSSQL query | visualizer is more visually effective, though. | iasay wrote: | Indeed. It's even more fun when it starts generating really | bad execution plans for table statistics it has completely | got wrong, almost always while I'm eating my lunch. | krylon wrote: | Thank you! | abraae wrote: | Oracle (and likely others) have had this since well into the | last century. As you might expect since it can save a huge | amount of money on a large database, it's part of an | (expensive) add on. | realPubkey wrote: | I did something similar for a NoSQL database [1]. The biggest | surprise was how much the query performance can change for an | index when the data distribution changes slightly. For example | using a real distribution for an 'age' field instead of just | using a random number like in the test data. | | [1] https://rxdb.info/query-optimizer.html | contingencies wrote: | _Sin #5 - Unrepresentative workloads: The common assumption in | academic research systems is that the cluster workload is | relatively homogenous. Most research evaluations measure | performance by running a single job on an otherwise idle | cluster._ - Schwarzkopf et al, 'The seven deadly sins of cloud | computing research' (2014) | | .. via https://github.com/globalcitizen/taoup | jessaustin wrote: | It seems better for birthdate to be stored in the database and | age just to be calculated when needed? | realPubkey wrote: | Yes of course you better store the birthday normally. This | was just a projection of the real data set, used in my | testings. | HPsquared wrote: | The 'age' thing might be due to the difference between floats | and ints? | | Integer "age" has many repeats, but random floats are unique. | That, or random ints might be from a large pool, again not many | repeats. ___________________________________________________________________ (page generated 2022-07-05 23:01 UTC)