[HN Gopher] PostgreSQL: No More Vacuum, No More Bloat
___________________________________________________________________

PostgreSQL: No More Vacuum, No More Bloat

Author : pella
Score  : 78 points
Date   : 2023-07-15 21:03 UTC (1 hour ago)

(HTM) web link (www.orioledata.com)
(TXT) w3m dump (www.orioledata.com)

| ccleve wrote:
| Is there any documentation on the "extensibility patches"? What
| did you have to do to core Postgres to get this new approach to
| work?

| oaiey wrote:
| I read object relational? Can someone enlighten me? Entity
| relational is fine, but what makes it object relational? Has
| someone flipped on the buzzword in the years I did not pay
| attention?

| zetalyrae wrote:
| I looked this up the other day because I was similarly
| surprised. I think it refers to Postgres' ability to do table
| inheritance: https://www.postgresql.org/docs/current/tutorial-
| inheritance...

| feike wrote:
| PostgreSQL has used this term for decades!
|
| The oldest mention I can find is from 1998 (PostgreSQL 6.3),
| but it was probably in use even before that.
|
| > Postgres offers substantial additional power by incorporating
| the following four additional basic concepts in such a way that
| users can easily extend the system:
|
|     classes
|     inheritance
|     types
|     functions
|
| Other features provide additional power and flexibility:
|
|     constraints
|     triggers
|     rules
|     transaction integrity
|
| These features put Postgres into the category of databases
| referred to as object-relational.
|
| https://www.postgresql.org/docs/6.3/c0101.htm

| brazzy wrote:
| https://stackoverflow.com/questions/45865961/what-does-postg...

| grzm wrote:
| It's been object relational (and described as such) going _way_
| back. I think the most visible (if infrequently used) object-
| oriented feature it has is inheritance:
| https://www.postgresql.org/docs/current/tutorial-inheritance...
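(For readers unfamiliar with the feature the comments above point
to: the PostgreSQL inheritance tutorial linked twice here boils
down to roughly the following sketch, with the table names taken
from that tutorial.)

```sql
-- Parent table.
CREATE TABLE cities (
    name       text,
    population numeric
);

-- Child table: inherits all columns of cities and adds its own.
CREATE TABLE capitals (
    state char(2)
) INHERITS (cities);

-- By default, a query on the parent also scans child tables,
-- so this returns rows from both cities and capitals.
SELECT name FROM cities;

-- ONLY restricts the scan to the parent table itself.
SELECT name FROM ONLY cities;
```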
| nieve wrote:
| It's because PostgreSQL has inheritance and has almost
| certainly used the term object relational since before you
| heard of it.

| glogla wrote:
| I wish people would stop with the "Uber migrated from Postgres
| to MySQL" thing. Uber migrated from Postgres used as a
| relational database to something that is basically their own
| non-relational database using MySQL as a distributed key-value
| store. It is not really a situation applicable to most users of
| Postgres.
|
| Anyway, this design of MVCC, which moves older data into undo
| logs / segments, is used by Oracle DB, so it definitely works.
| The common challenge with it is that reading older versions of
| data is slower, because you have to look them up in a log, and
| sometimes the data is removed from the log before your
| transaction finishes, giving the dreaded "Snapshot Too Old"
| error.
|
| E: I don't see in the article when rows get evicted from the
| undo logs. If it's when they are no longer needed, I'm not sure
| where the improvement comes from, because it should be a
| similar amount of bookkeeping. If it's a circular buffer that
| can run out of space, like Oracle does it, that would mean that
| under high write load long-running transactions start to fail,
| which is pretty unpleasant.

| paulddraper wrote:
| SQL Server avoids vacuum as well; it might be this way, I can't
| recall.
|
| And of course MySQL avoids vacuum by giving a giant middle
| finger to concurrency considerations.

| glogla wrote:
| I'm pretty sure SQL Server and MySQL use locking instead of
| MultiVersion Concurrency Control, so they don't keep more
| copies of data around. No vacuum needed, but there's a
| possibility of things blocking.
|
| But I might be out of date.

| paulddraper wrote:
| Yes, MySQL has read locks.
|
| But I don't believe SQL Server does.

| evanelias wrote:
| InnoDB (MySQL's default storage engine) implements MVCC using
| undo logging and background purge threads. It scales to highly
| concurrent OLTP workloads quite well.
| It doesn't work well with OLAP workloads / long-running
| transactions though. The oldest active transaction will block
| purging of anything newer than that transaction's snapshot.

| ruuda wrote:
| Vacuum does more than removing dead tuples, though; there is
| still a need to update statistics and summarize BRINs.

| javajosh wrote:
| Yes. The (psql 15) docs are well written:
| https://www.postgresql.org/docs/15/routine-vacuuming.html

| wild_egg wrote:
| You're thinking of ANALYZE, which is a separate operation
| that's commonly run during vacuuming but can be invoked
| independently.

| rickette wrote:
| The article contains a link with the rather curious title "10
| things that Richard Branson hates about PostgreSQL". Turns out
| the guy who wrote that blog is called Rick Branson, not
| Richard.

| rcme wrote:
| Rick is a nickname for Richard.

| daenney wrote:
| I love the whole "2.3x less CPU overhead per transaction" where
| Postgres scales from 5% to 65% CPU usage and Oriole sits
| constantly at 90%. That doesn't seem like a huge success to me?
| The predictability sure is nice, but moving the lower end up by
| 85% is something I'd be rather worried about.

| waterproof wrote:
| Eyeballing the tps graph, OrioleDB is doing 5x tps while using
| 2x the CPU. So about 5/2 = 2.5x the CPU per transaction.
|
| Checks out.

| gary_0 wrote:
| 5x tps with 2x CPU is 2/5 = 0.4x the CPU (i.e. it's more
| efficient per transaction).

| acjohnson55 wrote:
| You generally want to keep your CPU fully utilized. It looks
| like Oriole is doing significantly more transactions and is
| CPU-bound, due to much lower IO requirements. The good news is
| that it implies you could get even more performance out of
| Oriole by vertically scaling to a more powerful CPU, whereas
| Postgres would not continue to increase in performance this
| way.
|
| Those idle times on the Postgres server _could_ be used for
| something else, if you're thinking in a desktop OS mindset.
| But for servers, you tend to want machines that are doing one
| thing and are optimized for that thing.

| raggi wrote:
| It's hard to generalize on these points. In a situation where
| the throughput was inverted but the proportional system usage
| was the same, you would instead say "you can still vertically
| scale by adding more disks", rather than suggesting a bigger
| CPU. It's not meaningful in isolation.
|
| It may be reasonable to suggest that for a new code base that
| is CPU-bound there's a good chance there is low-hanging fruit
| for CPU optimizations that may further increase the throughput
| gap. On the other hand, since the prior engine's tuning started
| life on much older computer architectures, with drastically
| different proportional syscall costs and so on, there's very
| often also low-hanging fruit in configuration to improve
| baseline benchmarks such as these. High IO time suggests poor
| caching, which in many scenarios you'd consider a suboptimal
| deployed configuration.
|
| It's not just the devil that's in the details, it's everything.

| acjohnson55 wrote:
| That makes sense. I'm mostly just trying to explain the
| counterintuitive reason that the high CPU usage shouldn't be
| interpreted as a flaw.

| raggi wrote:
| To be a little more clear on what the detail of the benchmark
| in question is: it's a benchmark that explicitly exercises a
| pathological use case for PostgreSQL's current design, one that
| nonetheless functions, and demonstrates that the advertised
| engine does not have that pathology. A key takeaway should
| probably be, if you're a Postgres user: if your workload looks
| exactly like this (sparse upserts into a large data set at a
| high rate), then you might want to evaluate the runway of your
| architecture before the geometric costs or latency stalls
| become relevant - just as for cost analysis of any other
| system.
| What is somewhat interesting in this article, and not super
| clearly presented, is that this workload is actually fairly
| pathological for most existing engines offering this set of
| structural and query facilities, and that's interesting, if
| this is the niche you need. Most people do some amount of this,
| but not always at a super high rate, and there are ways to get
| the same effective writable/readable data using a different
| schema, while avoiding it. The nice thing here is you can do
| the one-liner version.

| gary_0 wrote:
| > you can still vertically scale by adding more disks
|
| Parallelizing IO is a lot different from scaling up CPU power,
| though. I'd imagine DB server IO performance has a lot less
| low-hanging fruit than CPU/software performance.

| tanelpoder wrote:
| > You generally want to keep your CPU fully utilized.
|
| Not in real-life concurrent systems where latency matters. In
| addition to the queuing/random request arrival rate reasons,
| all kinds of funky latency hiccups start happening both at the
| DB and OS level when you run your CPU _average_ utilization
| near 100%. Spinlocks, priority inversion, etc. Some bugs show
| up that don't manifest when running with lower CPU utilization.

| adsharma wrote:
| It's not clear if the CPU cost per tx is any worse. Was
| OrioleDB doing 5x the transactions at this CPU usage?

| [deleted]

| jklehm wrote:
| My read is that it's at 90% because they are saturating the CPU
| to that point with the TPS threshold they use for comparison;
| the TPS of Oriole is constant and way higher than pg in these
| charts, at least.
|
| I'd think the CPU will drop proportionally to the TPS; they
| just want to show how high it can go here.

| pella wrote:
| With the same equipment, your performance is now five times
| better (5X higher TPS). We need to test again with more
| hardware, but if you can maintain 3 times the performance at
| the lower end, it could be a good alternative for some users.
|
| _"As the cumulative result of the improvements discussed
| above, OrioleDB provides:
|
| - 5X higher TPS,
|
| - 2.3X less CPU load per transaction,
|
| - 22X less IOPS per transaction,
|
| - No table and index bloat."_

| pella wrote:
| simple OrioleDB docker build tutorial:
|
| https://github.com/orioledb/orioledb/blob/main/doc/docker_us...

| mynonameaccount wrote:
| [dead]
___________________________________________________________________
(page generated 2023-07-15 23:00 UTC)
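
(A footnote to the VACUUM vs. ANALYZE exchange in the thread: as
wild_egg notes, they are separate commands that are often run
together, and BRIN summarization has its own maintenance function.
A sketch of the relevant commands follows; the table and index
names are hypothetical.)

```sql
-- Reclaim space held by dead tuples; this is what autovacuum
-- automates in the background.
VACUUM mytable;

-- Update planner statistics - a separate operation from VACUUM.
ANALYZE mytable;

-- Both in a single pass.
VACUUM (ANALYZE) mytable;

-- Summarize page ranges of a BRIN index that have not yet been
-- summarized; returns the number of ranges processed.
SELECT brin_summarize_new_values('my_brin_index');
```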