[HN Gopher] An unexpected find that freed 20GB of unused index s...
       ___________________________________________________________________
        
       An unexpected find that freed 20GB of unused index space (2021)
        
       Author : skadamat
       Score  : 212 points
       Date   : 2023-08-28 14:42 UTC (8 hours ago)
        
 (HTM) web link (hakibenita.com)
 (TXT) w3m dump (hakibenita.com)
        
       | voiper1 wrote:
       | (2021) I thought the picture looked familiar.
        
       | pletnes wrote:
       | Does this principle apply if you have one or a few values that
        | are very common? E.g. an integer column with 90% of them being 0?
        
         | convivialdingo wrote:
         | An index is most efficient when it represents a unique set of
         | values. It's still very useful to have an index for grouping,
         | but if the groups represent a severe minority then you will end
         | up wasting a lot of space and cycles searching through the
         | index.
        
         | lozenge wrote:
         | Yes, Postgres keeps track of the most common values. If it
         | knows 0 is common and "where val = 0" will keep 90% of rows it
         | might choose to use a table scan instead of the index.
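          | 
          | A minimal sketch of how to peek at those statistics (the table
          | and column names here are made up): pg_stats exposes the
          | sampled common values and their frequencies, which is what the
          | planner weighs when choosing between an index scan and a
          | sequential scan.
          | 
          |     -- Most common sampled values and their estimated fractions
          |     SELECT most_common_vals, most_common_freqs
          |     FROM pg_stats
          |     WHERE tablename = 'orders' AND attname = 'status_code';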
        
         | teddyh wrote:
         | Couldn't you create separate indexes for the 0 and non-0 cases?
        
           | recursive wrote:
            | If zeroes are by far the most common value, it could be
            | slower to use the index when searching for zeroes. How so?
            | Reading values that aren't included in the index requires
            | following a reference back to the table row. If 90% of the
            | rows in your query are zeroes, you'd be better off not using
            | that column in your query planning. A naive filter, possibly
            | even after a table scan, is likely to be faster than using
            | the index.
        
             | teddyh wrote:
             | Would an index of the non-0 rows still help?
        
               | recursive wrote:
                | Yes. If your WHERE clause is searching for only the
                | ones, for instance.
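                | 
                | A minimal sketch of that idea, with made-up table and
                | column names: a partial index skips the 90% of rows
                | that are zero, so it stays small and is only used for
                | the selective non-zero lookups.
                | 
                |     -- Index only the rows where the value is non-zero
                |     CREATE INDEX measurements_val_nonzero_idx
                |         ON measurements (val)
                |         WHERE val <> 0;
                | 
                |     -- A query whose predicate implies the index
                |     -- condition can use it, e.g.:
                |     SELECT * FROM measurements WHERE val = 1;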
        
         | ChristianGeek wrote:
         | Only if you used null to represent the common value, which
         | generally isn't a good idea.
        
       | hinkley wrote:
       | The first large scale project I worked on, my team couldn't
       | figure out why operations had slowed down as the data set grew.
       | 
       | Indexes have log(n) insertion time per record. If you had 1000
       | records in your test database, as you approach 65k your insertion
       | time goes up by 60% (2^10 vs 2^16 records). Success slows
       | everything down, and there are only so many server upgrades
       | available.
       | 
       | Add a couple new indexes for obscure features the business was
       | looking for, and now you're up to double.
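        | 
        | The arithmetic behind that 60% figure, spelled out (treating an
        | insert as roughly log2(n) key comparisons, per the log(n) claim
        | above):
        | 
        |     -- roughly 10 vs 16 comparisons per insert
        |     SELECT ceil(log(2, 1000))  AS comparisons_at_1k,
        |            ceil(log(2, 65536)) AS comparisons_at_65k;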
        
         | winrid wrote:
         | Double a very small number. If that's seriously an issue use a
          | faster disk. So many people are trying to run a DB on EBS with
          | fewer IOPS than my PC from 2015.
         | 
         | I manage plenty of DBs with hundreds of millions of records and
         | 40+ indexes per table/collection...
        
         | vjerancrnjak wrote:
         | Insertion should still be extremely fast for such a small
         | index, right?
         | 
          | Doing binary search over a B-tree page is <100 cycles. A
          | B-tree traversal over 100M records should still be measured in
          | microseconds, and binary search over that should be in
          | microseconds too, if not in 100s of nanoseconds.
        
           | hinkley wrote:
           | This conversation is about unrepresentative sample data.
        
             | vjerancrnjak wrote:
             | Are you saying that the index is made on columns which only
             | have few unique values so you need to scan everything all
             | the time?
             | 
              | Given that you've included the log(n) claim I thought you
              | were referring to a normally functioning B-tree.
             | 
             | A situation where 65k rows end up slowing down things
             | severely should have more to do with other database
             | features (foreign keys and similar).
        
           | lazide wrote:
           | 'It depends' - don't forget about locking overhead, needing
            | to flush multiple pages/leaves as thresholds get hit, disk I/O
           | wait times, etc.
           | 
           | At the beginning it's rarely noticeable, but it does exist
           | and under heavy load or with a lot of index writes needing to
           | happen it can cause very noticeable overhead.
        
             | vjerancrnjak wrote:
              | Yes, but those costs don't scale with the log(n) of the
              | index tree. They depend on something other than the index
              | data structure.
        
       | efxhoy wrote:
       | I spent last week freeing up 200GB from our 600GB db with just
       | reindex and pg_repack. The worst offender was a 17GB (of data)
       | table that had 142GB of indexes. Reindex took it down to 21GB.
       | The table indexing is crazy and has multiple indexes over
       | different sets of columns.
       | 
        | A contributing factor to the huge indexes, I think, was the
        | distribution of the data. It's had inserts, updates and deletes
       | continuously since 2015. Data is more likely to be deleted the
       | older it gets so there's more data from recent years, but about
       | 0.1% of the data is still from 2015. I think maybe this skewed
       | distribution with a very long tail meant vacuum had a harder time
       | dealing with that index bloat.
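        | 
        | For anyone wanting to try the same, a rough sketch of both steps
        | (table and database names are placeholders; REINDEX ...
        | CONCURRENTLY needs PostgreSQL 12+, and pg_repack is a separate
        | extension that has to be installed first):
        | 
        |     -- Rebuild all indexes on one table without blocking writes
        |     REINDEX TABLE CONCURRENTLY big_table;
        | 
        |     -- Compare index size before and after
        |     SELECT pg_size_pretty(pg_indexes_size('big_table'));
        | 
        | and, from the shell, to rewrite the bloated table itself:
        | 
        |     pg_repack --dbname=mydb --table=public.big_table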
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _An unexpected find that freed 20GB of unused index space in
       | PostgreSQL_ - https://news.ycombinator.com/item?id=25988871 - Feb
       | 2021 (78 comments)
        
       | halayli wrote:
       | I highly recommend pganalyze.com to discover unused indexes,
       | optimization opportunities, and high latency queries.
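        | 
        | If you'd rather check by hand first, a rough query against the
        | statistics views surfaces indexes that have never been scanned
        | since the stats were last reset (unique/primary key indexes are
        | skipped because they enforce constraints):
        | 
        |     SELECT s.schemaname,
        |            s.relname      AS table_name,
        |            s.indexrelname AS index_name,
        |            pg_size_pretty(pg_relation_size(s.indexrelid)) AS size
        |     FROM pg_stat_user_indexes s
        |     JOIN pg_index i ON i.indexrelid = s.indexrelid
        |     WHERE s.idx_scan = 0        -- never used for a scan
        |       AND NOT i.indisunique     -- skip constraint-backing indexes
        |     ORDER BY pg_relation_size(s.indexrelid) DESC;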
        
       | aftbit wrote:
        | Making indexes smaller is nice even when you have a ton of
        | storage, as then more of it can fit into the hot set. However, as
        | someone who runs TBs of databases, I find "just provision more
        | storage" is always a valid option, especially if you are outside
        | the cloud. If you have your own hardware, new enterprise NVMe SSDs
       | are about $80/TB, and DDR4 RAM is around $1.20/GB. Four hours of
       | engineering time (very roughly $1000) buys either 800 GB of RAM
       | or 12 TB of storage.
        
         | paulddraper wrote:
          | Neither one will save you from slower writes due to unnecessary
         | indices.
        
         | dchftcs wrote:
          | Depending on the scale and complexity, if you make no effort
          | to control resource usage, costs grow exponentially, sometimes
          | even without growth in the business, because the requirements
          | just become more complex. For a given optimization, you may
          | save 1TB today but end up saving 2TB some years down the road;
          | with just a few of these decisions you could be looking at a
          | difference of an order of magnitude or more, sometimes even
          | greater at larger scale. Overall there's always a balance to
          | be struck.
        
         | rockostrich wrote:
          | "Just provision more storage" is usually the answer on the
          | cloud as well, unless someone has a concrete idea for how to
          | use less storage. The math is a bit trickier, though, since
          | it's a monthly price rather than a one-time cost (although if
          | your database is growing at a constant rate, then that one-time
          | provisioning is really just a monthly price on-prem as well).
        
         | crabbone wrote:
         | This is not at all a good way to compare.
         | 
         | Engineering hours: there's a good chance you pay for that once,
         | and that solves the problem.
         | 
         | 10 SSDs: will require rack space, electricity, PCIe slots,
         | timely replacement, management software... most of these
         | expenses will be recurring expenses. If done once, sometimes
         | the existing infrastructure can amortize these expenses (i.e.
         | you might have already had empty space in a rack, you might
          | have already had spare PCIe slots, etc.), but that amortization
          | only works while the numbers stay small.
         | 
         | Another aspect of this trade-off: systems inevitably lose
         | performance per unit of equipment as they grow due to
         | management expenses and increased latency. So, if you keep
         | solving problems by growing the system, overall, the system
         | will become more and more "sluggish" until it becomes
         | unserviceable.
         | 
          | On the other hand, solutions which minimize the system
          | resources necessary to accomplish a task increase overall
          | performance per unit. In other words, they create a higher-
          | quality system, which is an asset in its own right.
        
         | krembo wrote:
          | Even on RDS, the price of 20GB is negligible for most companies
          | and isn't worth the added effort and engineer salaries of
          | looking into this. From a DBA perspective, that's a cool find.
          | Thanks for sharing.
        
         | pgwhalen wrote:
         | I'm not super in the know about it, but I believe the _total_
         | cost of SAN storage at my employer (on prem) is ~10X that.
        
         | codetrotter wrote:
         | > Four hours of engineering time (very roughly $1000)
         | 
         | Why $1000 for four hours?
         | 
         | Does it cost your company $150,000 per month per engineer?
        
         | vikingerik wrote:
         | Well, there's a multiplier between the nominal capacity and the
         | capacity you actually need to purchase for your whole system.
         | You're not buying just that 1 TB. You probably want at least
          | two live failover servers, maybe more. Plus however
         | many layers of backups and disaster recovery; a year of weekly
         | backups makes your 1 TB into 50, even if it's offline storage.
         | 
         | My own company struggles with that - throwing more storage on
         | the live db is easy, so we've kept doing that for years, but
         | pushing around multi-terabyte backups is getting cumbersome and
         | we're going to have to slim down the data in prod even at the
         | cost of engineer effort.
        
           | danielheath wrote:
           | Weekly full backups are quite large vs eg ZFS snapshots,
           | right?
        
           | Dylan16807 wrote:
           | Are you putting indexes into your backups?
        
             | krab wrote:
             | pg_basebackup is, like any physical replication of
             | Postgres.
        
             | nine_k wrote:
             | Backing up indexes makes sense if you care about restoring
             | the DB _quickly_.
        
             | [deleted]
        
         | _a_a_a_ wrote:
         | > new enterprise NVMe SSDs are about $80/TB, and DDR4 RAM is
         | around $1.20/GB
         | 
         | sounds incredibly cheap
        
       | telios wrote:
        | The post mentions the B-tree deduplication that is present in
        | PostgreSQL 13, but not in 12, the version they're using; at the
        | same time, they note that the vast majority of values in some of
        | their foreign key indexes are NULL.
        | 
        | I have to wonder whether B-tree deduplication would have helped
        | in that particular case. The PostgreSQL 13 documentation seems to
        | imply it would, as far as I can tell[0] (under 63.4.2):
       | 
       | > B-Tree deduplication is just as effective with "duplicates"
       | that contain a NULL value, even though NULL values are never
       | equal to each other according to the = member of any B-Tree
       | operator class.
       | 
        | I don't think it would be as effective as the partial index
        | applied in the post; I'm just curious.
       | 
       | [0]: https://www.postgresql.org/docs/13/btree-implementation.html
        
         | luhn wrote:
          | In a previous discussion of this article, an HN user did the
          | math: pg12 is 16 bytes per NULL and pg13 is 6.32 bytes per NULL
          | (https://news.ycombinator.com/item?id=25989467). So definitely
          | some pretty significant savings there.
        
       | zzzeek wrote:
        | Wow, they are not kidding, this is truly "we saved 20GB with one
        | weird trick". We get a lot of requests from users wanting to use
        | these unusual PostgreSQL index forms that are largely unheard of
        | in old school Oracle / SQL Server shops. I didn't realize
        | PostgreSQL was indexing NULL values either.
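        | 
        | For reference, the trick from the post boils down to something
        | like this (index and column names here are illustrative): B-tree
        | indexes include NULLs by default, so a mostly-NULL foreign key
        | column can be re-indexed with a predicate that leaves the NULLs
        | out entirely.
        | 
        |     -- Replace the full index with one that skips the NULLs
        |     CREATE INDEX CONCURRENTLY orders_cancellation_id_partial
        |         ON orders (cancellation_id)
        |         WHERE cancellation_id IS NOT NULL;
        | 
        |     DROP INDEX CONCURRENTLY orders_cancellation_id_full;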
        
         | aidos wrote:
         | The fact that this is news to you makes me feel better about
         | not considering it myself. I suspect we have a few large
         | indexes that could be given the same treatment.
        
       | hn_throwaway_99 wrote:
        | Wow, thanks for this great writeup! I thought this was really
        | useful not just for the "partial index find" that is the focus of
        | the post (though that was a great point to be aware of), but for
        | its general overview of good techniques and things to watch for
        | in Postgres if you're worried about using space inefficiently.
        | Saving this one for reference!
       | 
        | One minor "word to the wise", because this is a great post but
        | one that also has the potential to be misused: if you work at a
       | startup or early stage company, it is nearly _always_ the better
       | decision to throw more disk space at a storage problem like this
       | than worry about optimizing for size. Developers are expensive,
       | disk space is cheap.
        
         | nico wrote:
         | > if you work at a startup or early stage company, it is nearly
         | always the better decision to throw more disk space at a
         | storage problem like this than worry about optimizing for size.
         | Developers are expensive, disk space is cheap
         | 
         | This is great advice. In general at the beginning it is better
         | to keep things as simple as possible.
         | 
          | At one fast-growing startup I worked at, one of the founders
          | insisted we keep upgrading just one machine (plus redundancy
          | and backups). It was a great strategy! It kept the architecture
         | super simple, easy to manage, easy to debug and recover. For
         | the first 5 years of the company, the whole thing ran on one
         | server, while growing exponentially and serving millions of
         | users globally.
         | 
         | After seeing that, it's clear to me you should only upgrade
         | when needed, and in the simplest, most straightforward way
         | possible.
        
         | jtc331 wrote:
          | There's a lot of emphasis here on just adding storage, but
          | that misses the fact that unnecessary indexing also affects
          | write and read performance (and potentially quite
          | significantly).
          | 
          | Using a partial index where it obviously matches the use case
          | (like when a majority of values are null) is just _correct
          | modeling_ and should not be considered a premature optimization
          | or a waste of developer time.
        
       | hyperman1 wrote:
       | In a similar vein, I got good value from these scripts:
       | 
       | https://github.com/NikolayS/postgres_dba
       | 
        | I managed to free 10%, about 100 GB, of storage by reordering the
        | columns of a big table.
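        | 
        | The column-ordering trick, roughly sketched (made-up table;
        | the idea is that Postgres pads smaller columns out to the
        | alignment of the next, wider column, so ordering columns from
        | widest to narrowest avoids most of the padding):
        | 
        |     -- Padding-heavy: each 8-byte column forces the smaller
        |     -- column before it to be padded to an 8-byte boundary.
        |     CREATE TABLE events_padded (
        |         is_active  boolean,       -- 1 byte + 7 bytes padding
        |         user_id    bigint,        -- 8 bytes
        |         kind       smallint,      -- 2 bytes + 6 bytes padding
        |         created_at timestamptz    -- 8 bytes
        |     );
        | 
        |     -- Same columns ordered by descending alignment:
        |     -- no padding between columns.
        |     CREATE TABLE events_packed (
        |         user_id    bigint,
        |         created_at timestamptz,
        |         kind       smallint,
        |         is_active  boolean
        |     );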
        
       ___________________________________________________________________
       (page generated 2023-08-28 23:00 UTC)