[HN Gopher] Fast Counting with PostgreSQL and Haskell
___________________________________________________________________

Fast Counting with PostgreSQL and Haskell

Author : EntICOnc
Score  : 23 points
Date   : 2021-12-26 09:44 UTC (13 hours ago)

(HTM) web link (jezenthomas.com)
(TXT) w3m dump (jezenthomas.com)

| kjeetgill wrote:
| > Some clever abuse of the query planner output quickly gives us
| an estimate.
|
| Tl;dr: they estimate counts by parsing them out of the query
| plan, but get an exact count if the estimate is under a
| threshold.
|
| Clever! But it makes me nervous. I was expecting it to involve
| HLL or Count-Min Sketch or something.
|
| KarlKemp wrote:
| This is the ugliest hack I've seen since I wrote a PHP script
| that would output a Perl script that would create an Excel file.
|
| I have nothing but respect for the author, but if this is needed,
| maybe Postgres needs to invest some work here?
|
| Zababa wrote:
| How is that an ugly hack? It looks like a reasonable solution
| to a reasonable problem. Counting millions of things with user-
| defined filters is a hard problem.
|
| Rezwoodly wrote:
| Why can't he just use an index?
|
| WJW wrote:
| If I understand correctly, it is because they want to support
| arbitrary queries, so a single index would only help for
| certain queries but not others.
|
| DylanSp wrote:
| Given that users can apply arbitrary filters, what exactly
| would you index?
|
| Rezwoodly wrote:
| Surely you could use a compound index; depending on the number
| of fields to be indexed, it should be achievable.
|
| amichal wrote:
| Indexes and compound indexes tend to become slow themselves
| when you end up indexing a significant fraction of the
| columns, or when your result sets are a significant portion
| of the total row count. They cost extra time and space to
| update on write. If your database has concurrent writes
| going on, Postgres still needs to verify which row versions
| are visible to the transaction on read.
| The additional pressure on memory and disk means you have a
| bigger working set. The query planner has a combinatorial
| number of indexes to choose from, so the planning stage slows
| down. You also have to spend more time on ANALYZE to keep the
| estimates (which power this technique) accurate, etc. At tens
| of millions of rows, with more than a few low-cardinality
| columns in the filter, it's sometimes faster to just get as
| much of the table as you can into RAM and accept the scan.
___________________________________________________________________
(page generated 2021-12-26 23:01 UTC)
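The trick kjeetgill summarizes — read the planner's row estimate, and only run an exact COUNT(*) when the estimate is small enough to be cheap — can be sketched as follows. This is a minimal Python illustration, not the article's Haskell code; the function names, the 5,000-row threshold, and the stubbed query runners are all assumptions for the sake of the example. The only Postgres-specific fact it relies on is that `EXPLAIN (FORMAT JSON)` returns a JSON array whose first element has a "Plan" object with a "Plan Rows" estimate.

```python
import json

# Illustrative cutoff: below this, an exact COUNT(*) is assumed cheap enough.
THRESHOLD = 5_000

def estimated_rows(explain_json: str) -> int:
    """Extract the planner's row estimate from EXPLAIN (FORMAT JSON) output."""
    plan = json.loads(explain_json)
    return int(plan[0]["Plan"]["Plan Rows"])

def fast_count(run_explain, run_exact_count) -> int:
    """Return the planner's estimate, or the exact count when the estimate
    is under THRESHOLD. The two callables stand in for real database calls
    (e.g. EXPLAIN (FORMAT JSON) SELECT ... and SELECT COUNT(*) ...)."""
    estimate = estimated_rows(run_explain())
    if estimate < THRESHOLD:
        return run_exact_count()  # small result set: exact count is cheap
    return estimate               # large result set: the estimate is enough

# Stubbed EXPLAIN output, shaped like what Postgres returns for
#   EXPLAIN (FORMAT JSON) SELECT * FROM users WHERE <arbitrary filters>;
fake_plan = '[{"Plan": {"Node Type": "Seq Scan", "Plan Rows": 1234567}}]'
fast_count(lambda: fake_plan, lambda: 0)  # estimate is large, so it is returned as-is
```

Against a live database the two lambdas would execute the user's filtered query under EXPLAIN and as a COUNT(*) respectively; the threshold trades accuracy on large results for speed, which is exactly the compromise the thread is debating.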