[HN Gopher] Fast Counting with PostgreSQL and Haskell
       ___________________________________________________________________
        
       Fast Counting with PostgreSQL and Haskell
        
       Author : EntICOnc
       Score  : 23 points
       Date   : 2021-12-26 09:44 UTC (13 hours ago)
        
 (HTM) web link (jezenthomas.com)
 (TXT) w3m dump (jezenthomas.com)
        
       | kjeetgill wrote:
       | > Some clever abuse of the query planner output quickly gives us
       | an estimate.
       | 
        | TL;DR: They estimate counts by parsing the planner's row
        | estimate out of the query plan output, but fall back to an
        | exact count if the estimate is under a threshold.
       | 
       | Clever! But it makes me nervous. I was expecting it to involve
       | HLL or CountMinSketch or something.
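
        For concreteness, a minimal sketch of the approach described
        above, in Haskell with the postgresql-simple library. The table,
        filter, and threshold here are hypothetical stand-ins; the
        linked article's actual code and schema may differ.

            {-# LANGUAGE OverloadedStrings #-}
            module PlannerCount where

            import Data.List (stripPrefix)
            import Data.Maybe (listToMaybe, mapMaybe)
            import Database.PostgreSQL.Simple (Connection, Only (..), query_)
            import Text.Read (readMaybe)

            -- Ask the planner for its row estimate: EXPLAIN is cheap
            -- (the query itself never runs) and its top line looks like
            --   Seq Scan on orders  (cost=0.00..4185.00 rows=98734 width=8)
            -- so we scrape the "rows=" figure out of it.
            plannerEstimate :: Connection -> IO (Maybe Int)
            plannerEstimate conn = do
              plan <- query_ conn "EXPLAIN SELECT * FROM orders WHERE status = 'open'"
              pure . listToMaybe . mapMaybe rowsFigure $ map fromOnly plan
              where
                rowsFigure :: String -> Maybe Int
                rowsFigure line = listToMaybe
                  [ n | w <- words line
                      , Just digits <- [stripPrefix "rows=" w]
                      , Just n <- [readMaybe digits] ]

            -- Only pay for an exact count(*) when the planner thinks the
            -- result is small; above a (hypothetical) threshold the
            -- estimate is deemed good enough to display.
            fastCount :: Connection -> IO Int
            fastCount conn = do
              estimate <- plannerEstimate conn
              case estimate of
                Just n | n >= 10000 -> pure n
                _ -> do
                  [Only exact] <- query_ conn
                    "SELECT count(*) FROM orders WHERE status = 'open'"
                  pure exact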
        
       | KarlKemp wrote:
        | This is the ugliest hack I've seen since I wrote a PHP script
        | that would output a Perl script that would create an Excel
        | file.
       | 
        | I have nothing but respect for the author, but if this is
        | needed, maybe some work needs to go into Postgres itself?
        
         | Zababa wrote:
         | How is that an ugly hack? That looks like a reasonable solution
         | to a reasonable problem. Counting millions of things with user-
         | defined filters looks like a hard problem.
        
       | Rezwoodly wrote:
       | Why can't he just use an index?
        
         | WJW wrote:
          | If I understand correctly, it's because they want to
          | support arbitrary queries, so a single index would help
          | some queries but not others.
        
         | DylanSp wrote:
         | Given that the users can apply arbitrary filters, what exactly
         | would you index?
        
           | Rezwoodly wrote:
            | Surely you could use compound indexing; depending on the
            | number of fields to be indexed, it should be achievable.
        
             | amichal wrote:
              | Indexes and compound indexes tend to become slow
              | themselves once you're indexing a significant fraction
              | of the columns, or once the result sets you're counting
              | are a significant portion of the table. They cost extra
              | time and space to maintain on every write. If the
              | database has concurrent writes going on, Postgres still
              | has to check which row versions are visible to the
              | reading transaction. The extra pressure on memory and
              | disk means a bigger working set. The query planner has
              | a combinatorial number of index choices to consider, so
              | the planning stage slows down. You have to spend even
              | more time on ANALYZE to keep the estimates (which power
              | this technique) accurate, and so on. At tens of millions
              | of rows with more than a few low-cardinality columns in
              | the filter, it's sometimes faster to just get as much of
              | the table as you can into RAM and accept the scan.
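
        To make the point about arbitrary filters and compound indexes
        concrete, here is a toy sketch in the same vein (hypothetical
        table and column names, again assuming postgresql-simple; the
        article's real schema is not shown in the thread). With just
        three independent filters there are already seven non-empty
        combinations, and a compound index on, say, (status, region)
        only helps the combinations that constrain its leading column.

            module ArbitraryFilters where

            import Data.List (intercalate)
            import Data.String (fromString)
            import Data.Time.Calendar (Day)
            import Database.PostgreSQL.Simple (Connection, Only (..), query)
            import Database.PostgreSQL.Simple.ToField (Action, toField)

            -- Hypothetical filters a user might combine in the UI.
            data Filter
              = StatusIs String
              | RegionIs String
              | CreatedAfter Day

            -- Render one filter to a SQL fragment plus its bound parameter.
            render :: Filter -> (String, Action)
            render (StatusIs s)     = ("status = ?",     toField s)
            render (RegionIs r)     = ("region = ?",     toField r)
            render (CreatedAfter d) = ("created_at > ?", toField d)

            -- Count rows matching whatever combination the user picked.
            -- No single index covers every combination, which is the
            -- difficulty the thread is circling around.
            countFiltered :: Connection -> [Filter] -> IO Int
            countFiltered conn filters = do
              let (clauses, params) = unzip (map render filters)
                  whereSql
                    | null clauses = ""
                    | otherwise    = " WHERE " ++ intercalate " AND " clauses
                  sql = "SELECT count(*) FROM orders" ++ whereSql
              [Only n] <- query conn (fromString sql) params
              pure n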
        
       ___________________________________________________________________
       (page generated 2021-12-26 23:01 UTC)