[HN Gopher] An unexpected find that freed 20GB of unused index s...
       ___________________________________________________________________
        
       An unexpected find that freed 20GB of unused index space in
       PostgreSQL
        
       Author : haki
       Score  : 285 points
       Date   : 2021-02-01 14:17 UTC (8 hours ago)
        
 (HTM) web link (hakibenita.com)
 (TXT) w3m dump (hakibenita.com)
        
       | tantalor wrote:
       | Graphing "free storage" is meaningless and confusing; it should
       | be "used storage".
       | 
       | Available storage depends on usage and capacity.
       | 
       | Edit: I meant for this article; of course I believe it is useful
       | to track this in practice.
        
         | AnotherGoodName wrote:
         | Used makes sense for getting a feeling for pure performance
          | (the smaller the better, as it's likely to be in memory).
         | 
         | Available makes sense for knowing when things will just plain
         | break (reaching 0 = write failure for a DB).
         | 
         | >Every few months we get an alert from our database monitoring
         | to warn us that we are about to run out of space.
         | 
         | In this case they were avoiding their DB server breaking. They
         | didn't do this for performance reasons.
        
         | zufallsheld wrote:
         | If you pay for a fixed amount of storage and only use it
          | partially, you should also monitor free storage so you know
         | when you waste it.
        
         | fanf2 wrote:
         | Free storage is what matters because it makes it very obvious
         | when you are getting close to a disk-full outage.
        
       | brianberns wrote:
       | > REINDEX INDEX CONCURRENTLY index_name;
       | 
       | > If for some reason you had to stop the rebuild in the middle,
       | the new index will not be dropped. Instead, it will be left in an
       | invalid state and consume space.
       | 
        | Well, that sure sounds like a bug in PostgreSQL to me.
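        | 
        | At least the leftovers are easy to find and drop by hand; a
        | minimal sketch (the index name is hypothetical):
        | 
        |     -- List invalid indexes left behind by aborted builds:
        |     SELECT c.relname
        |     FROM pg_index i
        |     JOIN pg_class c ON c.oid = i.indexrelid
        |     WHERE NOT i.indisvalid;
        | 
        |     -- Then drop the leftover and retry the rebuild:
        |     DROP INDEX CONCURRENTLY my_invalid_index;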
        
         | striking wrote:
         | Well, you can't just _delete_ it. It is an object that was
          | created by some user and there's no good reason for the
         | database to get rid of it automatically. The database keeps a
         | record of the invalid thing, even though it is invalid.
        
           | brianberns wrote:
           | The good reason to get rid of it automatically: It takes up
           | space.
           | 
           | Is there any good reason to keep it? (The fact that it was
           | "created by some user" doesn't seem like much of a reason.)
           | 
           | IMHO, creating an index should be atomic: Either you end up
           | with a valid index, or you end up with nothing.
        
         | voganmother42 wrote:
          | Pretty well documented behavior as far as CONCURRENTLY goes:
         | https://www.postgresql.org/docs/current/sql-createindex.html...
        
       | ivoras wrote:
       | Is the partial index technique to avoid indexed NULL data as
       | effective for PostgreSQL 13+?
       | 
       | It looks like in v13+ PostgreSQL could create a single leaf for
       | NULL data and just store row pointers within it, which should
       | reduce data sizes at least a bit.
        
         | chrismeller wrote:
         | He actually mentioned index de-duplication earlier:
         | https://hakibenita.com/postgresql-unused-index-size#activati...
         | 
         | If I had to guess, I would say that it doesn't accomplish
         | anything (or as much as you'd think) for null values simply
         | because there is no real data to store in either approach, you
         | just have a bunch of pointers either way.
        
           | petergeoghegan wrote:
           | NULL values are not special as far as deduplication is
           | concerned. They use approximately as much disk space as a
           | non-NULL integer column without deduplication, and compress
           | just as well with deduplication. Deduplication is effective
           | because it eliminates per-tuple overhead, so you see most of
           | the benefits even with index tuples that naturally happen to
           | have physically small keys. You'll still get up to a 3x
           | decrease in storage overhead for the index provided there is
           | low cardinality data (and not necessarily that low
           | cardinality, ~10 or so tuples per distinct value will get you
           | there).
           | 
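            | For anyone experimenting: deduplication is on by default for
            | B-Tree indexes in Postgres 13, and can be toggled per index
            | via a storage parameter (names here are made up):
            | 
            |     CREATE INDEX ix_example ON some_table (some_col)
            |         WITH (deduplicate_items = on);
            | 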
           | The NULL issue is documented directly -- see the "Note" box
           | here:
           | 
           | https://www.postgresql.org/docs/devel/btree-
           | implementation.h...
        
         | mattashii wrote:
         | Not per se _as effective_, but it will still help a lot. NULL
         | tuples pre-pg13 take ~ 14 bytes each, and 18 bytes when
         | aligned. (= 2 (ItemID, location on page) + 6 (TID) + 2 (t_info)
          | + 4 (NULL bitmap) + 4 bytes alignment). When deduplication is
          | enabled for your index, your expected tuple size becomes just
          | a bit more than 6 bytes: ~50 TIDs* in one tuple => (2 (ItemId)
          | + 6 (alt TID) + 2 (t_info) + 4 (null bitmap) + 50 * 6 (heap
          | TIDs)) / 50 => ~6.28 bytes/tuple.
         | 
         | So, deduplication saves some 65% in index size for NULL-only
         | index-tuples, and the further 35% can be saved by using a
         | partial index (so, in this case, deduplication could have saved
         | 13GB).
         | 
         | *note: last time I checked, REINDEX with deduplication enabled
         | packs 50 duplicates in one compressed index tuple. This varies
         | for naturally grown indexes, and changes with column types and
         | update access patterns.
        
           | [deleted]
        
       | mulander wrote:
       | Partial indexes are amazing but you have to keep in mind some
        | peculiarities.
       | 
       | If your query doesn't contain a proper match with the WHERE
       | clause of the index - the index will not be used. It is easy to
       | forget about it or to get it wrong in subtle ways. Here is an
       | example from work.
       | 
       | There was an event tracing structure which contained the event
       | severity_id. Id values 0-6 inclusive are user facing events.
        | Severity 7 and up are debug events. In practice all debug events
       | were 7 and there were no other values above 7. This table had a
       | partial index with WHERE severity_id < 7. I tracked down a
       | performance regression, when an ORM (due to programmer error)
        | generated WHERE severity_id != 7. The database is obviously not
        | able to tell that there will never be any values above 7, so the
        | index was not used, slowing down event handling. Changing the
        | query to match < 7 fixed the problem. The database might also
        | not be able to infer that the index can indeed be used, for
        | example when prepared statements are involved: WHERE severity_id
        | < ?. The database cannot tell that all bindings of ? will
        | satisfy < 7, so it will not use the index (unless you are
        | running PG 12, where that might depend on the setting of
        | plan_cache_mode[1], but I have not tested that yet).
       | 
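        | To make that concrete (table name made up):
        | 
        |     CREATE INDEX ix_user_events ON events (id)
        |         WHERE severity_id < 7;
        | 
        |     -- Matches the index predicate, so the index can be used:
        |     EXPLAIN SELECT * FROM events WHERE severity_id < 7;
        | 
        |     -- Logically equivalent on this data, but the planner
        |     -- cannot prove it, so the index is not used:
        |     EXPLAIN SELECT * FROM events WHERE severity_id != 7;
        | 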
       | Another thing is that HOT updates in PostgreSQL can't be
       | performed if the updated field is indexed but that also includes
       | being part of a WHERE clause in a partial index. So you could
       | have a site like HN and think that it would be nice to index
       | stories WHERE vote > 100 to quickly find more popular stories.
        | That index, however, would nullify the possibility of a HOT update
       | when the vote tally would be updated. Again, not a problem but
       | you need to know the possible drawbacks.
       | 
       | That said, they are great when used for the right purpose. Kudos
       | to the author for a nice article!
       | 
       | [1] - https://postgresqlco.nf/doc/en/param/plan_cache_mode/
        
         | GordonS wrote:
         | > The database is obviously not able to tell that there will
         | never be any values above 7
         | 
         | You say "obviously", but with updated statistics this is the
         | exactly the kind of thing you might expect the planner to know
         | and aid index decisions.
         | 
         | I'm a huge fan of Postgres, coming to it around 5 years ago
         | from at least 10 previous years with SQL Server, but I have hit
         | a few things like this in that time. IME the planner is much
         | more fickle about how you specify your predicates than SQL
         | Server is.
        
           | londons_explore wrote:
            | All statistics in Postgres are considered best-effort
            | guidance. Even if the statistics are wrong, they can never
            | impact the correctness of the results.
        
           | mulander wrote:
           | No, I don't think statistics can let you get away with this.
           | Databases are concurrent, you can't guarantee that a
           | different session will not insert a record that invalidates
           | your current statistics.
           | 
            | You could argue that it should be able to use it if the table
            | has a check constraint preventing severity_id above 7 from
            | ever being inserted. That is something that could be done; I
            | don't know if PostgreSQL does it (I doubt it) or how feasible
            | it would be.
           | 
           | Is SQL Server able to make an assumption like that purely
           | based on statistics? Genuine question.
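            | 
            | For reference, the kind of constraint I mean would look
            | something like this (illustrative only; I'm not claiming
            | PostgreSQL uses it for partial index matching):
            | 
            |     ALTER TABLE events
            |         ADD CONSTRAINT severity_id_max CHECK (severity_id <= 7);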
        
             | GordonS wrote:
             | > No, I don't think statistics can let you get away with
             | this. Databases are concurrent, you can't guarantee that a
             | different session will not insert a record that invalidates
             | your current statistics.
             | 
             | Of course you can't make a guarantee like that, but why
             | would you need to? Statistics are there to guide planner
             | choices, not make cast iron predictions.
        
               | cmeacham98 wrote:
                | Because the database has to decide whether or not to use
                | the index. If it decides to use the index, and there
                | _are_ values above 7, then it will misbehave (the query
                | will miss those results). Now of course the database
                | could then scan the rows for values above 7 it missed,
                | but at that point there's no point in using the index
                | and you might as well have done a full scan for the
                | original query.
                | 
                | As a result, the database has to be 100% sure that there
                | are no values _at all_ above 7 to safely and efficiently
                | use the index, e.g. when there's a constraint.
        
               | mulander wrote:
               | Let us say that in our example table we have 100 000
                | records with severity_id < 7, 200 000 with severity_id = 7
               | and 3 records with severity_id = 8.
               | 
               | Statistics claim 100k id < 7, 200k id = 7 and 0 with id >
                | 7. The last 3 inserts could have happened right before
                | our query; the statistics haven't updated yet.
               | 
               | Let us assume that we blindly trust the statistics and
               | they currently state that there are absolutely no values
               | with severity_id > 7 and you have a query WHERE
               | severity_id != 7 and a partial index on severity_id < 7.
               | 
               | If you trust the statistics and actually use the index
               | the rows containing severity_id = 8 will never be
               | returned by the query even if they exist. So by using the
               | index you only scan 100 k rows and never touch the
               | remaining ~200k. However this query can't be answered
                | without scanning all ~300k records. This means that on
               | the same database you would get two different results for
               | the exact same query if you decided to drop the index
               | after the first run. The database can't fall back and
               | change the plan during execution.
               | 
               | Perhaps I misunderstood you originally. I thought you
               | suggested that the database should be able to know that
               | it can still use the index because currently the
               | statistics claim that there are no records that would
               | make the result incorrect. You are of course correct,
               | that the statistics are there to guide the planner
               | choices and that is how they are used within PostgreSQL -
               | however some plans will give different results if your
                | assumptions about the data are wrong.
        
             | mattb314 wrote:
              | I doubt it? At least, the number of times the "last
              | updated" column appears in SQL Server stats [1] leads me
              | to believe it collects stats asynchronously with updates
              | to the table.
             | 
             | The only system I've heard of that relies on up-to-date
              | statistics for correctness is Snowflake (long but
             | interesting talk here [2]), where having accurate max/mins
             | for each micro partition is really helpful for cutting down
             | the amount of data in the large range scan queries common
              | in BI. I'd guess that being a BI system, Snowflake can get
             | away with higher row update latency too.
             | 
             | [1] https://www.sqlshack.com/sql-server-statistics-and-how-
             | to-pe...
             | 
             | [2] https://www.youtube.com/watch?v=CPWn1SZUZqE
        
       | lucian1900 wrote:
        | Partial indexes can be useful in any case where one value is
        | far more common than the others.
       | 
       | Indexing boolean columns is often only useful if one of the
       | values is uncommon and the index is partial to only include those
       | uncommon rows.
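        | 
        | For example (names made up):
        | 
        |     -- Only the rare unprocessed rows enter the index:
        |     CREATE INDEX ix_jobs_pending ON jobs (id)
        |         WHERE NOT processed;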
        
         | mnw21cam wrote:
          | Agreed. To explain why this is the case, consider the table in
          | the story that had 99% NULL values. If you were to try to run
          | "SELECT * FROM table WHERE column IS NULL", then PostgreSQL
          | wouldn't use the index anyway, because it would be faster to
          | just read sequentially through the entire table and filter out
          | the 1% that don't match.
        
           | maweki wrote:
           | That would highly depend on what you select. If the query
           | could be answered by index only, like COUNT(*), it would
           | probably use the index. You are right if you want to query
            | any data from those rows that's not in the index.
        
             | mnw21cam wrote:
             | I might be out of touch a little with Postgres (I last used
             | it in 2010), but my impression was that COUNT(*) still
             | needed to scan the actual table in order to exclude rows
             | that had been deleted in a transaction, due to the way
             | multi-version concurrency worked. Is this something that
             | has been improved since then?
        
               | zeroimpl wrote:
               | They support index-only scans now, so there is some sort
               | of optimization which bypasses the table lookup, at least
               | in certain cases.
        
       | mjw1007 wrote:
       | Summary: if you have an index on a column which is mostly NULL,
       | consider using a partial index covering only the records where
       | it's non-NULL.
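        | 
        | A minimal sketch of the technique (names made up):
        | 
        |     CREATE INDEX ix_orders_cancelled ON orders (cancelled_at)
        |         WHERE cancelled_at IS NOT NULL;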
        
         | latch wrote:
         | Another benefit of partial indexes is to limit a unique
         | constraint:
         | 
          | create unique index users_email on users(email)
          | where status != 'delete'
        
           | SomeHacker44 wrote:
           | Be very careful, then, as the optimizer will (usually?) not
            | use the index if the condition is not part of the query.
        
             | jackTheMan wrote:
              | You can create views for that; then it will always be part
              | of the query.
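              | 
              | Something like this (names made up):
              | 
              |     CREATE VIEW active_users AS
              |         SELECT * FROM users WHERE status != 'delete';
              | 
              |     -- Queries through the view always carry the
              |     -- predicate, so the partial index can match:
              |     SELECT * FROM active_users WHERE email = 'a@b.com';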
        
       | malinens wrote:
       | Too bad MySQL does not have partial indexes.
       | 
        | We have one huge table I want to add some indexes to for
        | specific cases (for at most 1% of records), but the server will
        | not have enough memory if I add those indexes for all records :/
        
         | fraktl wrote:
         | MySQL has pluggable storage engines. TokuDB does what you're
          | after (adds indexes on the fly, as well as altering tables on
          | the fly without overloading the server).
        
           | wolf550e wrote:
            | Altering a table online without using pt-online-schema-change
           | doesn't help if they want an index that covers only some of
           | the keys but not others.
        
           | hu3 wrote:
           | This page about TokuDB reads:
           | 
           | > TokuDB has been deprecated by its upstream maintainer. It
            | is disabled from MariaDB 10.5 and has been removed in
           | MariaDB 10.6 - MDEV-19780. We recommend MyRocks as a long-
           | term migration path.
           | 
           | https://mariadb.com/kb/en/tokudb/
           | 
           | Is MyRocks comparable?
        
         | crazygringo wrote:
         | As long as you've got primary keys on the huge table, there's a
          | hacky solution -- create a second table containing just the
          | first table's primary key and the columns you're indexing,
          | with your desired index, and ensure you always
          | write/update/delete both tables simultaneously using
          | transactions. Then when needed, use the index on the second
          | table and join it to your first with the primary key.
         | 
          | Annoying, but I'd expect it to work for most queries without
          | too much extra SQL.
         | 
         | I've definitely "rolled my own indexing" like this in the past,
         | though it's more often been duplicating strings into a custom
         | "collation" or other transformations.
         | 
         | Another solution is simply to split your table in two, with the
         | same columns in both, and the index only on one of the tables.
         | But of course that really depends on your business logic --
         | queries that need to retrieve data from both tables together
         | can get pretty hairy/slow, and if you've got auto-incrementing
          | PKEYs then avoiding collisions between the two tables can be
         | tricky on its own. So this is definitely the less general
         | solution.
         | 
          | Of course it certainly would be nicer if MySQL supported partial
         | indexes. It seems so useful, I'm surprised it didn't happen
         | long ago.
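          | 
          | A rough sketch of the first approach (MySQL, all names made
          | up):
          | 
          |     -- Side table acting as a hand-rolled "partial index"
          |     -- over only the rows that need it:
          |     CREATE TABLE big_table_flagged (
          |         id         BIGINT PRIMARY KEY,
          |         flagged_at DATETIME NOT NULL,
          |         KEY ix_flagged_at (flagged_at)
          |     );
          | 
          |     -- Maintain it in the same transaction as big_table,
          |     -- then join back by primary key when querying:
          |     SELECT b.*
          |     FROM big_table_flagged f
          |     JOIN big_table b ON b.id = f.id
          |     WHERE f.flagged_at >= '2021-01-01';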
        
           | lucian1900 wrote:
           | The first approach is one of the steps towards normalising a
           | database.
        
             | crazygringo wrote:
             | Actually it's the opposite of database normalization.
             | 
             | Normalizing removes data redundancy. This adds data
             | redundancy.
             | 
             | When I design a database structure, it's common to start
             | with the most normalized representation possible. And then
             | to denormalize the minimum necessary for performance
             | reasons -- duplicating rows and/or columns just like here
             | so certain data can be retrieved more quickly, whenever
             | indexes aren't powerful or featured enough.
        
               | fauigerzigerk wrote:
                | I think what lucian1900 may be thinking is that instead
                | of
                | 
                |     create table purchase_order (
                |         id int primary key,
                |         ordered_on timestamptz not null,
                |         customer_id int not null references customer,
                |         canceled_on timestamptz
                |     );
                | 
                | you could have
                | 
                |     create table purchase_order (
                |         id int primary key,
                |         ordered_on timestamptz not null,
                |         customer_id int not null references customer
                |     );
                | 
                |     create table order_cancelation (
                |         order_id int primary key references purchase_order,
                |         canceled_on timestamptz not null
                |     );
               | 
               | This is indeed a better normalised schema and it allows
               | you to index order_cancelation.canceled_on without
               | worrying about nulls.
        
               | lucian1900 wrote:
               | Exactly, that's what I thought was being described.
        
               | crazygringo wrote:
               | Oh then, absolutely. I was assuming a constraint that
               | columns couldn't be removed from the original table. But
               | if you can, then your example is an even better solution.
        
         | abfan1127 wrote:
          | Could you create a temporary high-memory slave MySQL server,
          | sync it with the master, add the index, sync back to the
          | master, and decommission the temporary high-memory server? I
          | don't know enough about master/slave operations to know if it
          | would work.
        
           | wolf550e wrote:
           | 'malinens doesn't have the storage space to store an index
           | over a column if all values in the column are indexed. They
           | want to index only some values, but not others. This feature
           | does not exist in MySQL.
        
       | matsemann wrote:
       | > _Clear bloat in tables_
       | 
       | Ohh, we've had issues with this. We have this table that's mostly
       | ephemeral data, so rows are constantly inserted and then deleted
        | after a certain amount of time. Due to a bug, the deletion
        | didn't work for a while and the db grew very large. We fixed the
        | deletion, but no amount of vacuuming actually allows us to fully
        | reclaim that space so we don't have to pay for it.
       | 
        | At the same time, the extra cost is probably negligible compared
        | to spending more energy fixing it...
        
         | hinkley wrote:
          | The problem we always ran into with deletes is that they
          | triggered full table scans because our indexes weren't set up
          | correctly to test foreign key constraints properly. A constant
          | game of whack-a-mole that everyone quickly grew tired of. Also,
          | more indexes increase the slope of the line for insert
          | operations as data size grows.
          | 
          | Another solution is tombstoning data so you never actually do a
          | DELETE, and partial indexes go a long way to making that scale.
          | It removes the log(n) cost of all of the dead data on every
          | subsequent insert.
        
           | mulander wrote:
            | > The problem we always ran into with deletes is that they
            | triggered full table scans because our indexes weren't set
            | up correctly to test foreign key constraints properly.
           | 
            | This is a classic case where partitioning shines. Let's say
           | those are logs. You partition it monthly and want to retain 3
           | months of data.
           | 
            | - M1
            | - M2
            | - M3
           | 
           | When M4 arrives you drop partition M1. This is a very fast
           | operation and the space is returned to the OS. You also don't
           | need to vacuum after dropping it. When you arrive at M5 you
           | repeat the process by dropping M2.
           | 
            | > Another solution is tombstoning data so you never actually
            | do a DELETE, and partial indexes go a long way to making that
            | scale. It removes the log(n) cost of all of the dead data on
            | every subsequent insert.
           | 
           | If you are referring to PostgreSQL then this would actually
            | be worse than outright doing a DELETE. PostgreSQL is
            | copy-on-write, so an UPDATE to an is_deleted column will
            | create a new copy of the record and a new entry in all its
            | indexes. The
           | old one would still need to be vacuumed. You will accumulate
           | bloat faster and vacuums will have more work to do.
           | Additionally, since is_deleted would be part of partial
           | indexes like you said, a deleted record would also incur a
           | copy in all indexes present on the table.
           | 
            | Compare that to just doing the DELETE, which would just
            | store the transaction ID of the query that deleted the row
            | in cmax, and a subsequent vacuum would be able to mark it as
            | reusable by further inserts.
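            | 
            | A minimal sketch of the retention scheme above, using
            | declarative partitioning (PostgreSQL 10+, names made up):
            | 
            |     CREATE TABLE logs (
            |         logged_at timestamptz NOT NULL,
            |         message   text
            |     ) PARTITION BY RANGE (logged_at);
            | 
            |     CREATE TABLE logs_2021_01 PARTITION OF logs
            |         FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
            | 
            |     -- Retention expired: near-instant, returns the space
            |     -- to the OS, no vacuum needed.
            |     DROP TABLE logs_2021_01;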
        
       | alexfromapex wrote:
       | It seems like this is an optimization that Postgres should handle
       | internally, doesn't it?
        
       | boomer918 wrote:
       | Partial indexes can flip query plans if the covered part becomes
       | so small that it won't be represented when sampled by the stats
       | collector. The planner could then decide that the index scan
       | isn't worth it and could try an alternative less efficient index
       | if one exists.
        
         | tbrock wrote:
          | Yeah, and sadly using the index in those scenarios could be
          | even more worthwhile due to its high selectivity.
         | 
         | Is PG smart enough to avoid that if the query patterns are
         | frequently or exclusively covered by the index?
        
       | gangstead wrote:
       | The included query for finding which indexes in your database
       | could benefit from a partial index is amazing. Thanks for putting
       | the extra effort into this post.
        
       | nieve wrote:
        | The article includes a couple of useful queries unrelated to the
        | "find" and led me to these bloat-detection resources:
       | https://wiki.postgresql.org/wiki/Show_database_bloat
       | https://github.com/ioguix/pgsql-bloat-estimation
        
       | pierrebai wrote:
       | The chart seems to show an uptick of 2GB, not 20GB. Am I missing
       | something?
        
       | pottertheotter wrote:
       | This has nothing to do with the content, but the design of this
       | page really stuck out to me. It's very easy to read and doesn't
       | have fluff. But it still feels modern (in the good way). It's
       | perfectly balanced.
        
         | paxys wrote:
         | Agreed! It's a perfect example of how you can make a website
         | with "modern" features (responsive design, accessible, mobile
         | friendly, dark mode, static pages, embeds) without it being a
         | bloated mess.
        
       | de6u99er wrote:
       | When I did my Oracle DBA training 15 years ago, I learnt about
       | database reorgs.
       | 
        | It basically means exporting your database (or tables) and
        | importing it again. What happens is that deleted data, which
        | doesn't necessarily free up space (Oracle sometimes reuses the
        | freed-up space), doesn't get exported.
       | 
       | https://www.iri.com/blog/vldb-operations/database-reorgs-why...
       | 
       | https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUEST...
        
         | brianwawok wrote:
         | A vacuum full basically does this for a table, copying the data
         | from location A to location B, cleaning up junk. I think index
         | rebuilding may take a separate command?
        
           | Tostino wrote:
            | Vacuum full does an index rebuild automatically. Since a
            | vacuum full builds an entirely new heap table, the old
            | indexes all point to incorrect locations for all tuples, so
            | it has no choice but to rebuild.
        
             | brianwawok wrote:
              | Excellent, there you go.
             | 
             | Is there a way to just do the index build part, short of
             | dropping an index and adding it back?
        
               | jvolkman wrote:
               | I believe pg_repack can do that.
        
               | Jedd wrote:
                | This is described about a quarter of the way into TFA.
               | 
               | > REINDEX INDEX CONCURRENTLY index_name;
               | 
                | (A vanilla REINDEX will lock the table, preventing
                | writes, while it runs. The CONCURRENTLY variant creates
                | a new index, replicates any updates to the original
                | while it does so, and then does an atomic switcheroo at
                | the end.)
        
         | jeffbee wrote:
          | Dumping and reloading databases used to be mandatory for major
          | PostgreSQL upgrades, which is one of the reasons PostgreSQL
          | wasn't suitable for production workloads until recently, and
          | also why it resisted fixing bugs in vacuum, index, and
          | compaction for many years.
        
           | AbacusAvenger wrote:
           | Whoah, that's news to me.
           | 
           | I used PostgreSQL fairly recently (a year or so ago?) and
           | ended up abandoning it after I was forced to do the
           | export/import dance through a few version upgrades.
           | 
           | When did that requirement go away?
        
             | dialamac wrote:
             | It never did.
             | 
              | The difference is that since version 10 you can use
              | logical replication to prevent downtime during an upgrade.
              | 
              | Which, if you were using it a year ago, could have been
              | done.
        
             | jskrablin wrote:
              | Ever since pg_upgrade became available - since 8.4 or so -
             | https://www.percona.com/blog/2019/04/12/fast-upgrade-of-
             | lega...
             | 
             | Dump and reload is ok if you have a small database or can
             | afford hours of downtime... if not, use pg_upgrade.
        
             | Uberphallus wrote:
              | Since 9 there's pg_upgrade. Personally I never had an
              | issue and it was very fast, so the downtime is on the
              | order of a few minutes, which is OK for my use case. YMMV.
        
               | avereveard wrote:
               | "pg_upgrade does its best to make sure the old and new
               | clusters are binary-compatible, e.g., by checking for
               | compatible compile-time settings, including 32/64-bit
               | binaries. It is important that any external modules are
               | also binary compatible, though this cannot be checked by
               | pg_upgrade."
               | 
                | This makes me very nervous though. I have at least two
                | extensions (trigrams and GiST); maybe they work, maybe
                | not. I just prefer the peace of mind of an old-fashioned
                | dump.
        
       ___________________________________________________________________
       (page generated 2021-02-01 23:01 UTC)