[HN Gopher] PostgreSQL: No More Vacuum, No More Bloat
       ___________________________________________________________________
        
       PostgreSQL: No More Vacuum, No More Bloat
        
       Author : pella
       Score  : 78 points
       Date   : 2023-07-15 21:03 UTC (1 hours ago)
        
 (HTM) web link (www.orioledata.com)
 (TXT) w3m dump (www.orioledata.com)
        
       | ccleve wrote:
       | Is there any documentation on the "extensibility patches"? What
       | did you have to do to core Postgres to get this new approach to
       | work?
        
       | oaiey wrote:
       | I read object relational? Can someone enlighten me? Entity
       | relational fine but what makes it object relational? Has someone
       | flipped on the buzzword in the years I did not pay attention
        
         | zetalyrae wrote:
         | I looked this up the other day because I was similarly
         | surprised, I think it refers to Postgres' ability to do table
         | inheritance: https://www.postgresql.org/docs/current/tutorial-
         | inheritance...
        
         | feike wrote:
         | PostgreSQL has used this term for decades!
         | 
         | The oldest I can find is from 1998 (PostgreSQL 6.3), but it was
         | probably in use even before.
         | 
         | > Postgres offers substantial additional power by incorporating
         | the following four additional basic concepts in such a way that
         | users can easily extend the system:
         | 
         | classes inheritance types functions
         | 
         | Other features provide additional power and flexibility:
         | 
         | constraints triggers rules transaction integrity
         | 
         | These features put Postgres into the category of databases
         | referred to as object-relational
         | 
         | https://www.postgresql.org/docs/6.3/c0101.htm
        
         | brazzy wrote:
         | https://stackoverflow.com/questions/45865961/what-does-postg...
        
         | grzm wrote:
         | It's been object relational (and described as such) going _way_
         | back. I think the most visible (if infrequently used) object-
         | oriented feature that it has is inheritance:
         | https://www.postgresql.org/docs/current/tutorial-inheritance...
        
         | nieve wrote:
         | It's because PostgreSQL has inheritance and has almost
         | certainly used the term object relational since before you
         | heard of it.
        
       | glogla wrote:
       | I with people would stop with the "Uber migrated from Postgres to
       | MySQL" thing. Uber migrated from Postgres used as relational
       | database to something that is basically their own non-relational
       | database using MySQL as distributed key-value store. It is not
       | really situation applicable to most users of Postgres.
       | 
       | Anyway, this design of MVCC which moves older data into undo logs
       | / segments is used by Oracle DB, so it definitely works. The
       | common challenge with it is that reading older versions of data
       | is slower, because you have to look it up in a log, and sometimes
       | the data is removed from the log before your transactions
       | finishes, getting the dreaded "Snapshot Too Old" error.
       | 
       | E: I don't see in the article when rows get evicted from the undo
       | logs. If when they are no longer needed, I'm not sure where the
       | improvement comes from because it should be similar amount of
       | bookkeeping? If it's a circular buffer that can ran out of space
       | like Oracle does it that would mean under high write load long-
       | running transactions starts to fail which is pretty unpleasant.
        
         | paulddraper wrote:
         | SQL Server avoids vacuum as well, it might be this way, I can't
         | recall.
         | 
         | And of course MySQL avoids vacuum by giving a giant middle to
         | concurrency considerations.
        
           | glogla wrote:
           | I'm pretty sure SQL Server and MySQL use locking instead of
           | MultiVersion Concurrency Control so they don't keep more
           | copies of data around. No vacuum needed but there's a
           | possibility of things blocking.
           | 
           | But I might be out of date.
        
             | paulddraper wrote:
             | Yes, MySQL has read locks.
             | 
             | But I don't believe SQL Server does.
        
             | evanelias wrote:
             | InnoDB (MySQL's default storage engine) implements MVCC
             | using undo logging and background purge threads. It scales
             | to highly concurrent OLTP workloads quite well. It doesn't
             | work well with OLAP workloads / long-running transactions
             | though. The oldest active transaction will block purging of
             | anything newer than that transaction's snapshot.
        
       | ruuda wrote:
       | Vacuum does more than removing dead tuples though, there is still
       | a need to update statistics and summarize BRINs.
        
         | javajosh wrote:
         | Yes. The (psql 15) docs are well written:
         | https://www.postgresql.org/docs/15/routine-vacuuming.html
        
         | wild_egg wrote:
         | You're thinking of ANALYZE which is a separate operation that's
         | commonly run during vacuuming but can be invoked independently
        
       | rickette wrote:
       | The article contains a link with the rather curious title "10
       | things that Richard Branson hates about PostgreSQL".... Turns out
       | the guy who wrote that blog is called Rick Branson, not Richard.
        
         | rcme wrote:
         | Rick is a nickname for Richard.
        
       | daenney wrote:
       | I love the whole "2.3x less CPU overhead per transaction" where
       | Postgres scales from 5% to 65% CPU usage and Oriole sits
       | constantly at 90%. That doesn't seem like a huge success to me?
       | The predictability sure is nice, but moving the lower end up by
       | 85% is something I'd be rather worried about
        
         | waterproof wrote:
         | Eyeballing the tps graph, OrioleDB is doing 5x tps while using
         | 2x the CPU. So about 5/2=2.5x the CPU per transaction.
         | 
         | Checks out.
        
           | gary_0 wrote:
           | 5x tps with 2x CPU is 2/5 = 0.4x the CPU (ie. it's more
           | efficient per transaction).
        
         | acjohnson55 wrote:
         | You generally want to keep your CPU fully utilized. It looks
         | like Oriole is doing significantly more transactions and is
         | CPU-bound, due to much lower IO requirements. The good news is
         | that it implies you could get even more performance out of
         | Oriole by vertically scaling to a more powerful CPU, whereas
         | Postgres would not continue to increase in performance this
         | way.
         | 
         | Those idle times on the Postgres server _could_ be used for
         | something else, if you 're thinking in a desktop OS mindset.
         | But for servers, you tend to want machines that are doing one
         | thing and are optimized for that thing.
        
           | raggi wrote:
           | It's hard to generalize on these points. In a situation where
           | the throughput was inverted but the proportional system usage
           | was the same, you would instead say "you can still vertically
           | scale by adding more disks", rather than saying adding bigger
           | cpu. It's not meaningful in isolation.
           | 
           | It may be reasonable to suggest that for a new code base that
           | is cpu bound there's a good chance there is low hanging fruit
           | for cpu optimizations that may further increase the
           | throughput gap. It's also the case however that the prior
           | engines tuning starting life on much older computer
           | architectures, drastically different proportional syscall
           | costs and so on, it very often means that there's low hanging
           | fruit in configuration to improve baseline benchmarks such as
           | these. High io time suggests poor caching which in many
           | scenarios you'd consider a suboptimal deployed configuration.
           | 
           | It's not just the devil that's in the details, it's
           | everything.
        
             | acjohnson55 wrote:
             | That makes sense. I'm mostly just trying to explain the
             | counterintuitive reason that the high CPU usage shouldn't
             | be interpreted as a flaw.
        
             | raggi wrote:
             | To be a little more clear on what the detail of the
             | benchmark in question is: it's a benchmark that explicitly
             | exercises a pathological use case for postgresqls current
             | design, one that nonetheless functions, and demonstrates
             | that the advertised engine does not have that pathology. A
             | key takeaway should probably be, if you're a Postgres user:
             | if your workload looks exactly like this (sparse upserts
             | into a large data set at a high rate) then you might want
             | to evaluate your the runway of your architecture before the
             | geometric costs or latency stalls become relevant - just as
             | for cost analysis of any other system. What is somewhat
             | interesting in this article, and not super clearly
             | presented, is that this workload is actually fairly
             | pathological for most existing engines offering this set of
             | structural and query facilities, and that's interesting, if
             | this is the niche you need. Most people do some amount of
             | this, but not always at a super high rate, and there are
             | ways to get the same effective writable/readable data using
             | a different schema, while avoiding it. Nice thing here is
             | you can do the one-liner version.
        
             | gary_0 wrote:
             | > you can still vertically scale by adding more disks
             | 
             | Parallelizing IO is a lot different from scaling up CPU
             | power, though. I'd imagine DB server IO performance has a
             | lot less lower-hanging fruit than CPU/software performance.
        
           | tanelpoder wrote:
           | > You generally want to keep your CPU fully utilized.
           | 
           | Not in real life concurrent systems where latency matters. In
           | addition to the queuing/random request arrival rate reasons,
           | all kinds of funky latency hiccups start happening both at
           | the DB and OS level when you run your CPU _average_
           | utilization near 100%. Spinlocks, priority inversion, etc.
           | Some bugs show up that don't manifest when running with lower
           | CPU utilization etc.
        
         | adsharma wrote:
         | It's not clear if the CPU cost per tx is any worse. Was
         | OrioleDB doing 5x the transactions at this CPU usage?
        
         | [deleted]
        
         | jklehm wrote:
         | My read is that it's at 90% because they are saturating the CPU
         | to that point with the TPS threshold they use for comparison,
         | the TPS of Oriole is constant and way higher than pg in these
         | charts at least.
         | 
         | I'd think the CPU will drop proportionally to the TPS, they
         | just want to show how high it can go here.
        
         | pella wrote:
         | With the same equipment, your performance is now five times
         | better. (5X higher TPS) We need to test again with more
         | hardware, but if you can maintain 3 times the performance at
         | the lower end, it could be a good alternative for some users.
         | 
         |  _" As the cumulative result of the improvements discussed
         | above, OrioleDB provides:
         | 
         | - 5X higher TPS,
         | 
         | - 2.3X less CPU load per transaction,
         | 
         | - 22X less IOPS per transaction,
         | 
         | - No table and index bloat."_
        
       | pella wrote:
       | simple OrioleDB docker build tutorial :
       | 
       | https://github.com/orioledb/orioledb/blob/main/doc/docker_us...
        
       | mynonameaccount wrote:
       | [dead]
        
       ___________________________________________________________________
       (page generated 2023-07-15 23:00 UTC)