[HN Gopher] Index Merges vs. Composite Indexes in Postgres and M...
___________________________________________________________________
Index Merges vs. Composite Indexes in Postgres and MySQL
Author : Sirupsen
Score  : 132 points
Date   : 2022-11-27 19:02 UTC (9 hours ago)
(HTM) web link (sirupsen.com)
(TXT) w3m dump (sirupsen.com)
| magicalhippo wrote:
| A composite index can also be used for a partial index scan[1],
| so if you're frequently looking for just int100 as well as the
| int100,int1000 combination (as per the article's example), then a
| composite index int100,int1000 can be used for queries filtering
| only on int100.
|
| The order of the columns in the composite index might matter
| then. We got some nice savings by reordering columns in indexes
| and changing the join order in queries (our DB wasn't _that_
| smart), or by adding "useless" filters to the where clause,
| allowing us to consolidate multiple indexes into one composite
| one.
|
| [1]: I might be using the wrong terminology here; I'm not talking
| about a partial index[2], but about using only the first N
| dimensions of an M-dimensional index.
|
| [2]: https://www.postgresql.org/docs/current/indexes-partial.html
| winrid wrote:
| There are some DBs where the order of fields doesn't matter in
| the compound index if you're just searching by one field (but of
| course is less efficient). I think Oracle supports this.
| chrsig wrote:
| > (but of course is less efficient)
|
| This means the order _does_ matter
| winrid wrote:
| Yes...?
|
| Sorry my comment is not semantically accurate enough for you.
| dspillett wrote:
| All can search an index by just one field if it is the first
| field covered by the index; few can use an index if your
| predicate concerns the second field and not the first.
|
| IIRC Oracle does support this and calls it a skip-scan.
| If the first column in the index is not very selective this can
| be very efficient, though if the first column _is_ good in terms
| of selectivity then a skip-scan will be very inefficient and the
| DB might as well just scan the whole index. Given that a good
| choice of index usually has a fairly selective column as its
| first, and that unless storage space is very cramped you are
| likely better off just defining an extra index without the
| column you want to skip in some queries, a skip-scan isn't
| helpful as often as one might think, which is one of the reasons
| DB engine maintainers give for not spending time implementing
| and maintaining the feature.
|
| There are some use cases where having the option to skip-scan is
| definitely a significant bonus, but they are rare enough that I
| understand why most engine maintainers have not implemented
| them.
| winrid wrote:
| Yes, I'm talking about skip-scan. I know pretty much all DBs can
| use compound index predicates for querying :)
|
| I can think of several cases where it would have been nice to
| have; sometimes "good enough" for a particular read pattern is
| better than additional write overhead etc.
| Sirupsen wrote:
| MySQL can do skip-scans:
|
| https://dev.mysql.com/doc/refman/8.0/en/range-optimization.h...
| winrid wrote:
| Nice, I did not know MySQL supported this. Someday I'll work
| with MySQL again, good to know.
| magicalhippo wrote:
| Good point, I reworded it to be more general. Can't recall
| seeing our DB being that clever.
| thom wrote:
| Postgres is capable of doing this with some index types, but
| generally more slowly than a B-tree index being asked about its
| leftmost columns:
|
| https://www.postgresql.org/docs/current/indexes-multicolumn....
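The leftmost-prefix behaviour discussed here is easy to observe in any B-tree database. Below is a minimal sketch using Python's built-in sqlite3 module (SQLite rather than Postgres or MySQL, but its B-tree planner follows the same leftmost-prefix rule; the table and index names are made up for illustration):

```python
import sqlite3

# In-memory table with a composite index on (int100, int1000),
# mirroring the article's column names.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, int100 INTEGER, int1000 INTEGER)")
con.execute("CREATE INDEX t_int100_int1000 ON t (int100, int1000)")
con.executemany(
    "INSERT INTO t (int100, int1000) VALUES (?, ?)",
    [(i % 100, i % 1000) for i in range(10_000)],
)

# Filtering on the leftmost column alone can seek into the composite index.
plan_first = con.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM t WHERE int100 = 1"
).fetchall()
print(plan_first[0][3])   # a SEARCH using the composite index

# Filtering on the second column alone cannot seek; without Oracle-style
# skip-scan support it degrades to a scan.
plan_second = con.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM t WHERE int1000 = 1"
).fetchall()
print(plan_second[0][3])  # a SCAN (of the table or the covering index)
```

EXPLAIN QUERY PLAN reports SEARCH (a B-tree seek) for the leftmost column but only SCAN for the second column alone, which is exactly the gap a skip-scan tries to close.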
|
| Also worth noting that for very complex workloads that need to
| support arbitrary subsets of equality matches over columns, a
| Bloom filter might work best:
|
| https://www.postgresql.org/docs/current/bloom.html
| mattashii wrote:
| This depends on the index type, but PostgreSQL also supports
| queries based on arbitrary attributes of multi-attribute
| indexes: both GIN and BRIN can be used for queries on any of the
| indexed expressions.
|
| Maybe eventually PG will also get index skip scans for btrees,
| which could allow arbitrary column searches in the btree; but
| we're not there yet by a large margin.
| [deleted]
| bawolff wrote:
| Honestly I'm kind of surprised that they are even that close. I
| wonder if this changes at scale when the intersection is larger.
| Sirupsen wrote:
| Ideally, I would add three graphs to the post:
|
| (1) Table size on the x-axis, and time on the y-axis for index
| merge vs composite index
|
| (2) Number of columns on the x-axis, and time on the y-axis for
| both
|
| (3) Number of final matches on the x-axis, and time on the
| y-axis for both
|
| But I ran out of time and decided to test with a table size of
| 10M rows and a 100-ish result set. That's, in my experience, a
| decent representation of what you might be doing with a
| relational database.
| arynda wrote:
| A comparison on Clickhouse also runs in about 30-40ms; however,
| there's no indexing being used and this is a full-table scan.
|
|     create table if not exists test_table (
|         id      UInt64,
|         text1   String,
|         text2   String,
|         int1000 UInt64,
|         int100  UInt64,
|         int10   UInt64,
|         int10_2 UInt64
|     ) engine = MergeTree() order by (id);
|
|     insert into test_table
|     with repeat('b', 1024) as one_kib,
|          repeat('b', 255)  as bytes_255
|     select
|         number as id,
|         one_kib,
|         bytes_255,
|         rand() % 1000 as int1000,
|         rand() % 100  as int100,
|         rand() % 10   as int10,
|         rand() % 10   as int10_2
|     from numbers(10e6);
|
|     > select count(*) from test_table where int1000 = 1 and int100 = 1;
|     +-count()-+
|     |    9949 |
|     +---------+
|     1 row in set. Elapsed: 0.034 sec. Processed 10.00 million
|     rows, 160.00 MB (290.93 million rows/s., 4.65 GB/s.)
|
| The same table but with 1B rows instead runs in ~1800ms:
|
|     > select count(*) from test_table where int1000 = 1 and int100 = 1;
|     +-count()-+
|     |  999831 |
|     +---------+
|     1 row in set. Elapsed: 1.804 sec. Processed 1.00 billion
|     rows, 16.00 GB (554.24 million rows/s., 8.87 GB/s.)
|
| [1] Converted the table create and insert logic from here:
| https://github.com/sirupsen/napkin-math/blob/master/newslett...
| hodgesrm wrote:
| > however there's no indexing being used and this is a
| > full-table scan.
|
| That first statement about "no indexing being used" is not quite
| correct if the query is run exactly as you show in your nice
| example.
|
| ClickHouse performs what is known as PREWHERE processing, which
| will effectively use the int1000 and int100 columns as indexes.
| It scans those columns and knocks out any blocks (technically
| granules containing by default 8192 rows) that do not contain
| values matching the filter conditions. It then performs a scan
| on the remaining blocks to get the actual counts.
|
| PREWHERE is effective because columns are compressed and scans
| are fast. If there's any pattern to the filter columns (for
| example monotonically increasing counters) or their values have
| high cardinality, PREWHERE processing will remove a large number
| of blocks.
| This will make the rest of the scan far faster.
|
| In your dataset it may not be especially efficient because you
| use random values, which don't necessarily compress well, and
| the values will appear in many blocks. It works much better in
| real datasets where data are more correlated.
|
| EDIT: PREWHERE is _much_ faster in cases where you are doing
| more complex aggregation on many columns. Counts of course don't
| need to scan any extra values, so it's not helpful in this case.
|
| p.s. Scans are ridiculously fast.
| arynda wrote:
| > ClickHouse performs what is known as PREWHERE processing
| > p.s. Scans are ridiculously fast.
|
| Good point, I should have mentioned this was basically a
| worst-case scenario for Clickhouse: the data layout is totally
| random (same approach as OP used in their benchmark), so it
| isn't able to utilize any granule pruning, sorting, or skip
| indexing, yet it is still able to achieve such remarkable
| speeds.
| hodgesrm wrote:
| What's cool is that even in this case ClickHouse is still stupid
| fast compared to most other databases. ;)
| paulmd wrote:
| > p.s. Scans are ridiculously fast.
|
| this is really the lesson of SOLR. full-scan all the things,
| aggregate as you go, broadcast disk IO to multiple listeners.
|
| why do a bunch of 4K random IO when you could full-scan at bus
| speed? yeah, you can make the 4K random IO super fast, but
| that's not where hardware is going, and full scans are also
| scalable/clusterable, whereas an RDBMS caps out at one machine
| and clustering is kinda ugly.
| Sirupsen wrote:
| Are you aware of a good write-up on how Clickhouse/other
| columnar databases do the intersection?
| twoodfin wrote:
| One obvious way is to build a bitmap indexed by row position for
| each filter. Both the "&" intersect and the final bit count can
| be rocket fast on modern CPU vector units.
| arynda wrote:
| Not in particular, sorry; most of the good content I've found is
| on Altinity [1] and Alibaba's technical blogs [2][3].
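twoodfin's bitmap approach can be made concrete in a few lines. Here is a toy sketch in Python using arbitrary-precision ints as row-position bitmaps (real engines use chunked, SIMD-friendly structures such as Roaring bitmaps; the column shapes mirror the article's int1000/int100 benchmark and are otherwise made up):

```python
import random

random.seed(1)
N = 1_000_000  # a scaled-down version of the article's 10M-row table
rows = [(random.randrange(1000), random.randrange(100)) for _ in range(N)]

# One bitmap per filter: bit i is set iff row i matches that predicate.
# Building each bitmap from a 0/1 string keeps this O(N).
bm_int1000 = int("".join("1" if a == 1 else "0" for a, _ in rows), 2)
bm_int100  = int("".join("1" if b == 1 else "0" for _, b in rows), 2)

# The intersection is a single AND; the result count is a popcount.
# On hardware, both map to wide vector ops over the bitmap words.
both = bm_int1000 & bm_int100
print(bin(both).count("1"))  # roughly N / (1000 * 100) ~ 10 rows in expectation
```

Postgres's BitmapAnd node for index merges does essentially this, though over page-level bitmaps rather than exact row positions.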
| These tend to be mostly focused on how the data itself is stored
| and how to use Clickhouse, but don't really dive into the
| specifics of how query processing is performed.
|
| [1] https://altinity.com/blog/
|
| [2] https://www.alibabacloud.com/blog/clickhouse-kernel-analysis...
|
| [3] https://www.alibabacloud.com/blog/clickhouse-analysis-of-the...
| hodgesrm wrote:
| ClickHouse uses a single primary key index, which matches the
| sort order, plus skip indexes, which knock out blocks to scan.
| Here's a writeup that explains skip indexes:
|
| https://altinity.com/blog/clickhouse-black-magic-skipping-in...
|
| You can also check out the following webinar, which explains how
| ClickHouse indexes work in general. Here's a link to the
| discussion of indexes:
|
| https://youtu.be/1TGGCIr6dMY?t=1933
|
| p.s. The blog article is missing some images that WordPress
| seems to have lost, but you'll still get the idea. (Should be
| fixed shortly.)
|
| Disclaimer: I work for Altinity
| Lukas1994 wrote:
| Good stuff! What's the size difference between the composite
| index and the two separate indices?
| Sirupsen wrote:
| I've added this to the article, thanks!
|
| Composite index (int64, int64): ~70 MiB in Postgres, ~350 MiB in
| MySQL
|
| Single index (int64): ~70 MiB in Postgres, ~240 MiB in MySQL
|
| If you assume the majority of an index is index entries of
| (int64, int64, int64), where the third number is some sort of
| identifier for the record on the heap, you'd expect this to be
| ~230 MiB. So Postgres does some compression very well here, and
| MySQL, it seems, has a bit more overhead for its indexes.
| libraryofbabel wrote:
| As you mention in the article, this is small by modern hardware
| standards and means the indexes can be entirely held in memory.
| Curious how things look for much larger tables where the index
| has to be read from disk?
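Sirupsen's ~230 MiB figure is plain entry-size arithmetic (the 10M row count is the benchmark table size mentioned earlier in the thread); a quick back-of-envelope check in Python:

```python
# Naive size of a composite index over 10M rows, assuming each entry is
# (int64 key, int64 key, int64 heap identifier) and ignoring page
# headers, internal B-tree nodes, fill factor, and compression.
rows = 10_000_000
entry_bytes = 3 * 8                # three 64-bit values per entry
total_bytes = rows * entry_bytes
print(round(total_bytes / 2**20))  # ~229 MiB, i.e. the ~230 MiB estimate
```

Postgres landing at ~70 MiB against this naive estimate is what makes the compression (deduplication) observation so striking.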
| fabian2k wrote:
| Could be the deduplication newer Postgres versions have for
| B-Tree indexes:
|
| https://www.postgresql.org/docs/current/btree-implementation...
|
| > 67.4.3. Deduplication
|
| > A duplicate is a leaf page tuple (a tuple that points to a
| table row) where all indexed key columns have values that match
| corresponding column values from at least one other leaf page
| tuple in the same index. Duplicate tuples are quite common in
| practice. B-Tree indexes can use a special, space-efficient
| representation for duplicates when an optional technique is
| enabled: deduplication.
|
| > Deduplication works by periodically merging groups of
| duplicate tuples together, forming a single posting list tuple
| for each group. The column key value(s) only appear once in this
| representation. This is followed by a sorted array of TIDs that
| point to rows in the table. This significantly reduces the
| storage size of indexes where each value (or each distinct
| combination of column values) appears several times on average.
| The latency of queries can be reduced significantly. Overall
| query throughput may increase significantly. The overhead of
| routine index vacuuming may also be reduced significantly.
| Sirupsen wrote:
| Excellent, thank you! I'll add that to the article.
| fabian2k wrote:
| This is a guess on my part, though I think a plausible one. To
| verify you'd probably have to compare an index with unique
| values to one with many identical ones.
| masklinn wrote:
| Another thing which might be of interest: what if you use
| covering indexes for the two single indexes? Is postgres able to
| do an index-only scan then, thanks to the coverage?
|
| Or does it decide to use only one index and filter based on the
| covered values?
| pdhborges wrote:
| For these 64-bit index entries we'd expect to have to scan
| roughly: index_row_size · rows = 2 · 64 bit · 10^5 ≈ 1.5 MiB
|
| Where do the 10^5 rows come from?
| With a composite index and a point query, doesn't the database
| scan just the 100 returned rows?
| Sirupsen wrote:
| You're absolutely right!
|
| I forgot to move this around when I updated the article's
| structure. This is only relevant when doing the index merge.
| The article has been updated.
___________________________________________________________________
(page generated 2022-11-28 05:00 UTC)