[HN Gopher] Postgres scaling advice
       ___________________________________________________________________
        
       Postgres scaling advice
        
       Author : BrentOzar
       Score  : 310 points
       Date   : 2021-01-26 12:25 UTC (10 hours ago)
        
 (HTM) web link (www.cybertec-postgresql.com)
 (TXT) w3m dump (www.cybertec-postgresql.com)
        
       | orlovs wrote:
        | I am totally pro hosted solutions. However, hosted Postgres
        | never lives up to that. In my experience, on a machine of the
        | same size, some queries took 30x longer on hosted vs.
        | self-hosted.
        
       | StreamBright wrote:
       | >> A single PostgreSQL instance can easily do hundreds of
       | thousands of transactions per second
       | 
       | For example, on my (pretty average) workstation, I can do ca. 25k
       | simple read transactions per 1 CPU core on an "in memory" pgbench
       | dataset...with the default config for Postgres v13! With some
       | tuning (by the way, tuning reads is much harder in Postgres than
       | tuning writes!) I was able to increase it to ~32k TPS per core,
       | meaning: a top-notch, dedicated hardware server can do about 1
       | million short reads! With reads, you can also usually employ
       | replicas - so multiply that by 10 if needed! You then need to
       | somehow solve the query routing problem, but there are tools for
       | that. In some cases, the new standard LibPQ connection string
       | syntax (target_session_attrs) can be used - with some shuffling.
       | By the way, Postgres doesn't limit the number of replicas, though
       | I personally have never witnessed more than 10 replicas. With
       | some cascading, I'm sure you could run dozens without bigger
       | issues.
       | 
       | This sort of hand-wavy "benchmark" is not really good for anybody
        | other than the author's satisfaction. Real-world scenarios are
       | not like that.
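        | 
        | To put the quoted numbers in context: the "simple read
        | transaction" measured by pgbench's select-only mode is
        | essentially a single-row primary-key lookup, roughly:
        | 
        |     -- what pgbench -S runs per transaction; :aid is a
        |     -- random account id within the scaled dataset
        |     SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
        | 
        | Real queries with joins, sorts or writes won't come anywhere
        | near those per-core numbers.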
        
       | kevincox wrote:
       | I wonder when using a distributed database (like CockroachDB)
       | will be the default for new applications. Right now it seems that
        | they have fewer features and are harder to set up than traditional
       | RDBMSes but I can only assume that this gap will narrow and at
       | some point in the future things will be "scalable by default".
       | (Of course no DB is going to prevent all ways to shoot yourself
       | in the foot)
        
         | andreimatei1 wrote:
         | I think the "default" will evolve with whatever offers the best
         | "serverless" experience in the public clouds. In particular,
         | the cheapest and most granularly-billed option.
        
         | WJW wrote:
         | This question is similar to asking on a car forum when using a
         | 40 foot lorry will be the default starter car for everyone. The
         | answer is "probably never" because while it does offer superior
         | cargo transport scalability, the tradeoffs are not worth it for
         | the vast majority of users. The question is posed like
         | distributed databases have no disadvantages over non-
         | distributed databases, but that is simply not the case.
          | Clustering introduces all sorts of problems, from network splits
         | to slow(er) joins across nodes.
        
           | __s wrote:
           | To back up how far one server can go, Stack Overflow used a
           | single database for a long time
           | 
           | https://nickcraver.com/blog/2016/02/17/stack-overflow-the-
           | ar... (2013 post had much less redundancy, but even their
           | 2016 architecture is pretty undistributed in terms of being
           | able to recreate everything from the single source of truth
           | database)
        
             | bcrosby95 wrote:
             | Yeah, it's been a while since I've been in a high traffic
             | situation, but back in the late 00s we had a couple sites
             | with 3-5 million daily unique users. Each site had a single
             | master, a couple read replicas, and some queries were
             | cached in memcached.
             | 
              | The problem we eventually ran into with MySQL was that
              | the read replicas fell behind the master during peak load
              | (by 30-60 minutes depending upon the day). So we still
              | had to hit the
             | master for certain queries. I left before we took the next
             | step to fix this issue.
        
           | kevincox wrote:
           | I don't think it is quite the same.
           | 
           | - Switching from a car to a van to a lorry is fairly low
           | cost. You don't need to recreate your product (probably).
           | 
           | - You don't need to run distributed databases in a cluster to
           | start.
           | 
           | But I think most importantly the decrease in dev speed and
           | performance is an investment in future scalability. And I
            | can only imagine that this difference will shrink over time to
           | where for example a 1 node "distributed" DB isn't that
           | different to work with than a 1 node "traditional" DB. And
           | that small difference pays off because adding a second node
           | doesn't have huge tradeoffs.
           | 
           | I agree that right now it doesn't make sense. If I was
           | starting a new product I would still fire up a PostgreSQL
           | instance. However I think that the day will come where the
           | difference is small enough and software support is complete
           | enough that we will start with something distributed, much
           | like people don't often start with SQLite today, even though
           | it is quicker to get going (also not a great comparison
            | because it has fewer features).
        
             | andreimatei1 wrote:
             | > I agree that right now it doesn't make sense.
             | 
             | This CRDB engineer respectfully disagrees. This thread
             | takes it as a given that a non-distributed DB is better if
             | you don't need to scale up (i.e. if you run a single
             | "node"). Let me offer a couterpoint: it's easier to embed
             | CRDB into some software you're distributing than it is to
             | embed Postgres. This is to say, we do try to compete at
             | every scale (well, perhaps not at SQLite scale). CRDB
             | doesn't have all the SQL features of PG, but it does have
             | its pluses: CRDB does online schema changes, it is
             | generally simpler to run, comes with a UI, comes with more
             | observability, can backup to the cloud, can be more easily
             | embedded into tests.
             | 
             | Online schema changes are a big deal; the other thing that
             | I hope will help us win small-scale hearts and minds is the
             | ever-improving observability story. I hope CRDB will
             | develop world-class capabilities here. Other open-source
             | databases traditionally have not had many capabilities out
             | of the box.
        
             | yuribro wrote:
             | > But I think most importantly the decrease in dev speed
             | and performance is an investment in future scalability.
             | 
             | It makes sense only if you'll ever need this scalability.
              | And you take a hit on other fronts too: infra cost and
              | deployment complexity. Both deployment complexity and
              | code complexity also increase QA cost and instability,
              | and hurt product and company reputation.
             | 
             | >> much like people don't often start with SQLite today
             | 
             | Maybe they should start with SQLite by default
        
               | lrem wrote:
               | For personal projects, I start with SQLite by default.
                | Never came close to its scalability limit either ;)
        
             | WJW wrote:
             | Making investments in future scalability at the cost of dev
             | speed and performance is exactly the wrong tradeoff given
             | that 90% of startups fail. At the start, when you have very
             | little income, you want to allow for as much speed and
             | flexibility as possible so that you can get to
             | product/market fit ASAP. By the time your company gets big
             | enough that plain MySQL/Postgres can't handle the load any
             | longer, you will have more than enough money to afford a
             | few experts that can help you migrate.
        
         | cuu508 wrote:
         | To me it looks like there are not many affordable options right
         | now.
         | 
         | CockroachDB understandably wants you to use their Cloud or
         | Enterprise products: the OSS version is quite limited. For
         | example it doesn't support row-level partitioning
         | (https://www.cockroachlabs.com/docs/stable/configure-
         | replicat...). Which means, if I understand correctly, it is not
         | of much help for scaling writes to a single big table.
        
           | nvanbenschoten wrote:
           | Hi cuu508, CockroachDB engineer here. You are correct that
           | row-level partitioning is not supported in the OSS version of
           | CRDB. However, it sounds like there's a bit of confusion
           | about where manual table partitioning is and is not needed.
           | The primary use-case for row-level partitioning is to control
           | the geographic location of various data in a multi-region
           | cluster. Imagine a "users" table where EU users are stored on
           | European servers and NA users are stored on North American
           | servers.
           | 
           | If you are only looking to scale write throughput then manual
            | partitioning is not needed. This is because CRDB
           | transparently performs range partitioning under-the-hood on
           | all tables, so all tables scale in response to data size and
            | load automatically. If you are interested in learning more,
            | https://www.cockroachlabs.com/docs/stable/architecture/distr...
            | discusses these concepts in depth.
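            | 
            | For illustration, geo-partitioning looks roughly like the
            | sketch below (an enterprise feature; assumes "region" is a
            | prefix of the primary key, and the table/region/zone names
            | are made up):
            | 
            |     ALTER TABLE users PARTITION BY LIST (region) (
            |         PARTITION europe VALUES IN ('eu'),
            |         PARTITION north_america VALUES IN ('us', 'ca')
            |     );
            |     ALTER PARTITION europe OF TABLE users
            |         CONFIGURE ZONE USING
            |         constraints = '[+region=eu-west-1]';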
        
             | cuu508 wrote:
             | Thanks for explaining, and sorry -- looks like I jumped to
             | conclusions too quickly!
        
           | kevincox wrote:
           | Oh, I didn't realize that the free version was that limited.
           | I guess I need another name to use as the default open source
           | distributed database.
        
           | berns wrote:
           | No, you didn't understand correctly. The feature that isn't
           | supported is row level _replication zones_. Replication zones
            | allow you to define the location of replicas.
        
         | kryptiskt wrote:
         | Why would it ever be default? Very few applications need a
         | distributed database. These days you can get a single machine
         | with hundreds of terabytes of storage and terabytes of RAM. Add
         | a spare machine for failover and you have a reliable setup for
         | any but the biggest tasks. And the tasks that that setup isn't
         | sufficient for will certainly demand more thought than a
         | cookie-cutter setup of some random distributed DB.
        
         | jandrewrogers wrote:
         | What we need is a database that can both scale-up _and_ scale-
         | out. Most distributed databases offer poor efficiency and
         | performance on a per node basis, which has a high operational
         | cost. This is why people avoid using distributed databases
         | unless they need it. A scale-up database can serve as much
          | workload as a pretty large scale-out database in practice. This
         | discontinuity creates the market for scale-up systems.
         | 
         | There is literally nothing preventing distributed databases
         | from having excellent scale-up performance too. Unfortunately,
         | people who design distributed databases have a strong bias
         | toward unnecessarily throwing hardware at performance and
         | scalability problems. This is partly because very few people
         | know _how_ to design a modern scale-up database; making
         | something (trivially) distributed is easier.
        
       | FpUser wrote:
       | >"try all the common vertical scale-up approaches and tricks. Try
       | to avoid using derivative Postgres products, or employing
       | distributed approaches, or home-brewed sharding at all costs -
       | until you have, say, less than 1 year of breathing room
       | available."
       | 
       | Very healthy approach. I've always followed the idea of vertical
       | scalability when writing my modern C++ app servers with local
        | Postgres. Since I do not sell those to FAANG, I've never failed
        | to find a decent, very reasonably priced piece of hardware, be
        | it dedicated hosting or on-prem, that would satisfy the
        | client's needs for any foreseeable future. More than that, I've
        | never needed even top-of-the-line hardware for it. I
        | concentrate on features and robustness instead. Using C++ also
        | gives a nice speedup.
        
       | isoprophlex wrote:
       | Pretty solid advice, nice article.
       | 
       | One thing:
       | 
       | > For example, on my (pretty average) workstation, I can do ca.
       | 25k simple read transactions per 1 CPU core on an "in memory"
       | pgbench dataset...with the default config for Postgres v13!
       | 
       | Forget about reaching those numbers on managed DB-as-a-service
       | instances, specifically Azure managed postgres. In my experience
       | these have comparatively poor peak performance, with high
       | variability in latency to boot.
       | 
       | Bare metal all the way, if you can spare the allocation of a
       | dedicated DBA.
       | 
       | Also... if rapid reads and steady performance is what you're
       | after, provision a replica for the occasional analytical
       | workloads.
        
         | samf wrote:
         | Good info; I came here looking for experience with managed
         | databases. Does anyone have experience with managed instances
         | on other platforms?
        
       | jandrewrogers wrote:
       | The assertion that PostgreSQL can handle dozens of TB of data
       | needs to be qualified, as this is definitely not the case in some
       | surprising and unexpected cases that are rarely talked about.
       | 
       | PostgreSQL's statistics collection, which is used by the query
       | planner, _doesn 't scale with storage size_. For some ordinary
       | data distributions at scale, the statistical model won't reflect
       | any kind of reality and therefore can produce pathological query
       | plans. It is quite difficult to get around this scaling behavior.
       | Consequently, I've moved away from using PostgreSQL for data
       | models with distributions such that these query planner
       | limitations will occur. These pathologies occur well before the
       | "dozens of TB" range.
       | 
       | The statistics collector is not a scalable design generally, but
       | that is another matter. In its current state it does not degrade
       | gracefully with scale.
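        | 
        | The way this typically shows up is row estimates in EXPLAIN
        | ANALYZE that are off by orders of magnitude, e.g. (names are
        | placeholders):
        | 
        |     -- compare estimated rows with actual rows returned;
        |     -- wildly different numbers are the symptom above
        |     EXPLAIN (ANALYZE, TIMING OFF)
        |     SELECT * FROM big_table WHERE skewed_col = 'rare_value';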
        
         | aeyes wrote:
          | Vacuum is a way bigger problem: if you have a few million
         | updates per day on a 5TB table you are going to have a hard
         | time keeping index and page bloat down.
         | 
         | Sure, if your tables are insert only you might be fine but
         | doing any kind of DDL or maintenance (analyze after version
         | upgrades) is going to ruin your day.
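          | 
          | You can make autovacuum far more aggressive per table,
          | something like this (thresholds made up, tune for the
          | workload), but it only goes so far:
          | 
          |     ALTER TABLE big_table SET (
          |         autovacuum_vacuum_scale_factor  = 0.01,
          |         autovacuum_analyze_scale_factor = 0.005,
          |         autovacuum_vacuum_cost_delay    = 2
          |     );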
        
         | laurenz_albe wrote:
         | This seems to be unfounded criticism. When statistics are
         | gathered, PostgreSQL samples a certain percentage of the table,
         | so that obviously scales. The number of "most common values"
         | and histogram buckets scales up to 10000, which should be good
         | even for large tables. While I'll readily admit that not all
         | aspects of cross-column dependencies are dealt with, and cross-
         | table distributions are not considered, that has nothing to do
         | with size. I guess you hit a problem somewhere, couldn't solve
         | it and jumped to unwarranted conclusions.
        
           | jandrewrogers wrote:
           | It is ironic that you accuse me of "unwarranted conclusions".
           | I've been customizing and modifying PostgreSQL internals for
           | almost two decades, I know how to read the source. You aren't
           | as familiar with PostgreSQL as you think you are.
           | 
           | This wasn't my problem, I was asked by a well-known company
           | with many large PG installations and enterprise support
           | contracts to look at the issue because no one else could
           | figure it out. The limitations of the statistics collector
           | are not only evident in the source _but they are documented
            | there_. There are also deep architectural reasons why you
            | can't trivially modify the statistics collector -- I looked
            | into
           | this option -- to work for larger tables without introducing
           | other serious issues.
           | 
           | If you have a uniform distribution of values, the statistics
           | collector will work fine. In the above case, the values
           | followed a power law distribution which created extreme
           | biases in the statistical model due to necessary restrictions
           | on sampling. Other distributions can have similar effects
           | once you exceed the ability of the statistics collector to
           | acquire a representative sample.
        
             | brasetvik wrote:
             | > I know how to read the source. You aren't as familiar
             | with PostgreSQL as you think you are.
             | 
             | Oh, maybe you have read some of Laurenz Albe's many
              | contributions to Postgres, then.
              | https://git.postgresql.org/gitweb/?p=postgresql.git&a=search...
        
             | srcreigh wrote:
             | Naive question. Wouldn't sampling a power law dataset be
             | straightforward? The idea is there's only a few outlier
             | values, and the rest are uncommon. This distribution seems
             | extremely common. Ie column with mostly NULL values and the
             | rest somewhat unique non null strings.
             | 
             | I'm curious what data you saw and why the sampling didn't
             | work?
        
         | fabian2k wrote:
         | You can change the statistics target, see
         | https://www.postgresql.org/docs/current/runtime-config-query...
         | 
         | You can also create more advanced statistics over multiple
          | columns:
          | https://www.postgresql.org/docs/current/planner-stats.html
         | 
         | But if your statistics are bad, it will certainly mess up some
         | of your query plans.
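          | 
          | Concretely, the two knobs above look like this (table and
          | column names are just placeholders):
          | 
          |     -- sample more rows for a skewed column (max is 10000)
          |     ALTER TABLE orders ALTER COLUMN customer_id
          |         SET STATISTICS 5000;
          |     -- let the planner see cross-column correlation
          |     CREATE STATISTICS orders_dep (dependencies)
          |         ON customer_id, region FROM orders;
          |     ANALYZE orders;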
        
           | jandrewrogers wrote:
           | Yes, everyone knows these things, it does not address the
           | issue. Due to its architecture, there are difficult to change
           | internal limits on how PostgreSQL samples the data.
           | 
           | Under some conditions, it is not possible for the sampling
           | mechanics to build a representative model -- the actual
           | statistics used for query planning will be quasi-random. One
           | of the ways this manifests is that every time you rebuild the
           | statistics you get a completely different statistical model
           | even if the data has barely changed.
        
         | gher-shyu3i wrote:
         | > PostgreSQL's statistics collection, which is used by the
         | query planner, doesn't scale with storage size.
         | 
         | Out of curiosity, do statistics collectors for other offerings
         | (e.g. MySQL, SQL Server, Oracle) scale with storage size?
        
         | boomer918 wrote:
         | Theoretically your argument would make sense, but practically
         | your data distribution should be uniform enough for the
         | statistics collector to work for terabytes and terabytes.
         | 
         | I've seen query plans get messed up with partial indexes, but
         | if you have a regular index, you needn't worry.
        
         | natmaka wrote:
          | Even when using as many statistics samples as possible (a
          | statistics target of 10000, i.e. 3,000,000 sampled rows) for
          | each of those tables:
          | 
          |     ALTER TABLE table_name ALTER column_name
          |         SET STATISTICS 10000;
          | 
          | See https://www.postgresql.org/docs/current/sql-altertable.html
          | 
          | AFAIK PostgreSQL's approach is based upon
          | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53....
        
         | fanf2 wrote:
         | Does statistics collection scale better with partitioned
         | tables? TFA mentions that partitioning helps with vacuum, and
         | AIUI vacuum workers also collect statistics.
        
         | jajool wrote:
         | I have seen many PostgreSQL benchmarks having solid performance
         | with TB data but my real world experience is the complete
         | opposite.
         | 
         | Here are some of the main issues that I have encountered so
         | far:
         | 
         | 1. Queries on large tables (around 10 GB) are slow even when
         | "index only scan" is used because of MVCC and the way
          | PostgreSQL manages concurrency.
         | 
         | 2. Hot-standby instances can't be used for anything serious
         | since all queries are dropped regularly (I believe it's not
         | safe to use "hot_standby_feedback" config to overcome this
         | issue).
         | 
         | 3. It is not possible to have tables with heavy "update"
          | workloads (because of simultaneous autovacuum execution).
         | 
         | I would be very happy if anyone could show me that I am wrong.
        
           | YorickPeterse wrote:
           | > 2. Hot-standby instances can't be used for anything serious
           | since all queries are dropped regularly (I believe it's not
           | safe to use "hot_standby_feedback" config to overcome this
           | issue).
           | 
           | Hot-standby instances are fine. We've been using these for
           | GitLab.com for a number of years now, and never have they
           | caused any issues. We do have some code in place that retries
           | queries when they are cancelled by the hot-standby, but IIRC
           | this doesn't happen often.
        
           | tommyzli wrote:
           | 1. You'll have to define "slow" - I have a 3TB table where an
           | index only scan takes under 1ms
           | 
           | 2. hot_standby_feedback is absolutely safe. I've got 5 hot
           | standbys in prod with that flag enabled
           | 
           | 3. Again, it depends on how "heavy" your update throughput
           | is. It is definitely tough to find the right balance to
           | configure autovacuum between "so slow that it can't keep up"
           | and "so fast that it eats up all your I/O"
        
           | singron wrote:
           | 1 mostly applies to update heavy tables since index only
           | scans use the visibility map, which would be frequently
           | invalidated.
           | 
           | 3 is definitely true, especially the larger the table. I've
           | had success splitting frequently updated columns out into
           | their own much smaller table, or any other trick to
           | concentrate updates into a small table. Also MVCC bookkeeping
            | requires updating pages, so both an UPDATE that doesn't
            | change any field and a SELECT FOR UPDATE will cause the
            | same problem.
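            | 
            | A sketch of the column-splitting trick (names are
            | hypothetical):
            | 
            |     -- keep the hot counter out of the wide table
            |     CREATE TABLE item_stats (
            |         item_id bigint PRIMARY KEY REFERENCES items (id),
            |         view_count bigint NOT NULL DEFAULT 0
            |     );
            | 
            |     UPDATE item_stats SET view_count = view_count + 1
            |     WHERE item_id = 42;
            | 
            | Updates then churn a small table instead of rewriting rows
            | in the wide one.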
        
         | andy_ppp wrote:
          | This sounds really in-depth and useful; are there more
          | in-depth articles about how to avoid these types of queries?
        
         | claytonjy wrote:
         | This is intriguing; could you give an example of a distribution
         | Postgres struggles with at scale, and an application that
         | produces such a distribution?
        
           | mkilling wrote:
           | We've recently been surprised by wrong estimates made by the
           | planner resulting in inefficient query plans.
           | 
           | It seems that power law distributions over multiple tables
           | aren't handled very well:
            | https://stackoverflow.com/questions/65861780/is-there-a-way-...
        
         | silon42 wrote:
         | Is it easy to disable it? Personally I would not want any
         | unpredictable behavior from query planner anyway, especially at
         | scale.
        
           | Tostino wrote:
           | You would get predictably bad behavior and performance
           | without the statistics collected.
        
             | ben509 wrote:
             | Predictably bad won't result in a sudden spike in IO usage
             | that brings down a production system. This is why Oracle
             | has SQL profiles.
        
               | Tostino wrote:
               | Then use something like pg_hint_plan:
               | https://pghintplan.osdn.jp/pg_hint_plan.html (never used
               | it myself)
               | 
               | I've been bitten by bad optimizer choices which took way
               | too long to figure out how to debug / fix, so I know the
               | pain. I do hope the PG optimizer continues improving.
               | It's one of the weakest areas in comparison to commercial
               | DB's IMO.
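                | 
                | If it works as advertised, hints are just special
                | comments in front of the statement, something like
                | (index name hypothetical):
                | 
                |     /*+ IndexScan(orders orders_created_idx) */
                |     SELECT * FROM orders
                |     WHERE created_at > now() - interval '1 day';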
        
         | eatonphil wrote:
          | Neither of you has given a reproducible example one way or
          | the other! I can share that I know of TBs of core analytics
          | data running on PostgreSQL and it does alright. But it would
          | be nicer if there were a more tangible study to talk about.
        
       | jjice wrote:
       | In the opinion of a last semester CS student who has never
       | written an application from scratch that needed more than a
       | SQLite DB (so take me with a half grain of salt), it seems like
       | premature optimization, while always talked about, is very
       | common. I see people talking about using Kubernetes for internal
       | applications and I just can't figure out why. If it's a hobby
       | project and you want to learn Kubernetes, that's a different
       | situation, but in the case of making a real application that
       | people will use, it seems like a lot of us can get away with a
       | single DB, a few workers of our app, and maybe a cache.
       | 
       | I'm speaking out of little experience though. I just think that a
       | lot of us can get away with traditional vertical scaling and not
       | think too much about it.
        
         | pg_bot wrote:
         | What you're describing is called resume driven development. It
         | happens every few years when people want to cash in on
         | trends/buzzwords that people believe will be disruptive to all
         | industries but are just tools to have in the toolbox for most.
         | New tools pop up all the time that fit this mould. Over the
         | past ten years I can think of Hadoop (Big data), MongoDB
         | (NoSQL), Kubernetes, "Serverless" computing, and TensorFlow.
         | While all these tools have legitimate use cases, they are often
         | overused due to marketing or industry hype.
         | 
         | Adding artificial intelligence to your recipe application is
         | unlikely to make any sense, but people do it because they want
         | to have AI software engineer on their resume.
        
           | nforest wrote:
           | For artificial intelligence, I think it's more often
           | marketing driven development. It's easier to seem disruptive
           | if you claim to have AI in your product. Easier to get
           | funding and have people talk about your company. I feel like
           | it comes more often from business executives than technical
           | people.
        
         | kodah wrote:
         | > but in the case of making a real application that people will
         | use, it seems like a lot of us can get away with a single DB, a
         | few workers of our app, and maybe a cache.
         | 
         | A couple of points:
         | 
         | 1. Kubernetes can run monoliths. It's certainly not exclusive
         | to microservices or SOA. It's just a compute scheduler, quite
         | similar to AWS's EC2 reservations and auto-scaling groups
         | (ASG's).
         | 
         | 2. I can't speak for every corporation, but if you already have
         | patterns for one platform (note: "platform" in this context
         | means compute scheduling. eg: AWS, GCP, Kubernetes, Serverless)
         | then you will inevitably try to copy patterns you already
         | implement internally. A lot of times, for better or for worse,
         | it's not what fits best unless what fits best and what you have
         | available are highly conflicting.
         | 
         | 3. A lot of times "scaling" is actually code for multi-tenancy.
         | As an industry, we should probably be explicit when we're
         | scaling for throughput, redundancy, and/or isolation. They are
         | not the same thing and at times at odds with each other.
         | 
         | 4. I don't really like your use of "real application" here as
         | it implies some level of architectural hierarchy. My main
         | takeaway after 10+ years of professional development is that
         | architectures are often highly contextual to resource
         | availability, platform access, and personal preferences.
         | Sometimes there's a variable of languages too, because some
         | languages make microservice architecture quite easy while
         | others make it a royal PITA.
        
         | wejick wrote:
          | I know one of the biggest e-commerce shops in Asia was using
          | 1 big DB with multiple read-only slaves in a monolithic
          | architecture for more than 5 years.
          | 
          | However, driven not only by DB performance but also by the
          | need to organize hundreds of engineers, they adopted a
          | microservice architecture. Then they slowly migrated to
          | domain-specific DBs; it is just the classic microservice
          | migration story.
          | 
          | While a single DB may take us a pretty long way, designing
          | the system with disciplined, domain-level logical segregation
          | will help when there's a need to move to microservices.
          | 
          | *Looks like HN readers are quite sensitive to microservice
          | migration comments; this kind of comment usually gets
          | downvoted easily.
        
           | collyw wrote:
           | Stack Overflow runs what is essentially a monolithic
           | architecture. Though they do have a few services, it isn't
           | what I would describe as a micro-service architecture.
           | 
           | https://stackexchange.com/performance
        
           | pas wrote:
           | Monzo (UK bank) has 1600+ microservices, but mandates a
           | common framework/library and uses Cassandra. (Which is
           | basically a shared nothing, bring your own schema
           | "database".)
           | 
           | So it makes sense to combine advantages of different
           | approaches.
        
         | mumblemumble wrote:
         | It's resume-driven development, and it's also entertainment-
         | driven development. Bringing in new technologies gives you a
         | chance to play with a new toy. That's an effective way to make
         | your job more interesting when the thing you're supposed to be
         | working on is boring. Which, in business applications, is more
         | often than not the case.
        
           | spaetzleesser wrote:
           | In today's job market resume driven development is a very
           | rational choice. I work in medical devices so we are pretty
           | conservative and generally way behind the cutting edge. This
           | makes it really hard to find jobs at non medical companies. I
           | would recommend anybody who has the chance to use the latest
            | and shiniest stuff to do so because it's good for your
            | career.
        
             | mumblemumble wrote:
             | Very good point. Seems like yet another example of how
             | carefully optimizing all the individual parts of a system
             | can paradoxically de-optimize the overall system.
        
         | zdragnar wrote:
         | In big enough organizations, it is very easy to lose track of
         | who owns what, especially when it is those little ad-hoc
         | internal tools. Manually managing the infrastructure for them
         | is a recipe for them to become permanently enshrined in the
         | wasteland of "services we think we use, but do not maintain
         | because we don't remember who needed it or put it up or how to
         | configure it".
         | 
         | K8s isn't the only answer, but if you are already using it for
         | your large applications, it isn't much work to reuse the
          | existing tooling and infrastructure, and now you at least have
         | the dockerfile as a reference if nothing else.
         | 
         | OTOH, if you have an existing tooling setup / pipeline that is
          | _not_ K8s, there isn't a good reason to use it for a small
         | application.
        
           | corty wrote:
           | Having a Dockerfile that copies a few binary blobs into an
           | age-old distro image isn't an improvement, it's a huge
           | liability. And most of that stuff that no one knows anything
           | about anymore is like that. Same as with an old VM or PM.
           | 
           | I'd rather have that old crap as a physical machine. Why?
           | Because the hardware lifetime "naturally" limits the lifetime
           | of such applications. If the hardware dies, it forces a
           | decision to spend some money to keep it running or throw it
           | away, which, given that hardware is expensive, usually
           | results in throwing it away.
        
             | vlovich123 wrote:
             | Set up your docker file to be part of your CI so that your
             | binary blobs are built from source with regularity? That's
             | typically the solution I've seen work well. Manually
             | maintained stuff (especially for stuff that may not be the
             | thing everyone is primarily doing) generally doesn't scale
             | well without automation (speaking as someone who's seen
             | organizations grow). This is also true of "getting started"
             | guides. Can't tell you how much maintenance and run time
             | I've saved converting those wikis to Python scripts.
        
               | corty wrote:
               | Yes, of course. That would be ideal. That's what we do
               | for everything we can control.
               | 
               | But as someone in the IT dept., far too often you get
               | some container that either was built by someone who long
               | left the company or an external consultant who got paid
               | to never return. Sourcecode is usually unavailable, and
               | if it is available, will only build on that one laptop
               | that the consultant used. The IT department gets left
               | with the instruction "run it, it was expensive" and "no,
               | you don't get any budget to fix it". That results in the
               | aforementioned containers of doom...
               | 
               | Yes, I'm bitter and cynical. Yes, I'm leaving as soon as
               | I can :)
        
               | striking wrote:
               | I hate to raise a seemingly obvious point, but this
               | doesn't seem like a problem with Docker.
        
             | wdb wrote:
              | Can still remember the multinational that wanted to host
              | the web application we developed for them at their data
              | centre. They wanted to charge us (not the business unit,
              | for some odd reason) nearly EUR 20,000 per year to host a
              | RoR web application.
              | 
              | Ended up hosting it ourselves, and in the last year they
              | were paying EUR 100k per year for it, as we would just
              | sell the same setup for each deployment for their
              | customers. They would probably have been cheaper off
              | hosting it themselves.
        
             | giantrobot wrote:
             | In theory that works. In practice it rarely does. Docker et
             | al gained popularity because they made it way more
             | practical for projects to be managed as the world works
             | rather than as it should be, for good or ill. Before Docker
             | it was moving old applications to VMs and before that it
             | was running them in chroot horror shows.
        
             | yowlingcat wrote:
             | > the hardware lifetime "naturally" limits the lifetime of
             | such applications
             | 
             | Oh, my sweet summer child. Of all the things that
             | "naturally" limits the lifetime of such applications, that
             | is not it. Consider the case of the mainframes running the
             | US banking system, for example.
        
           | waynesonfire wrote:
           | docker and k8s adoption can force a company that over years
           | has developed and perfected a standard practice of deploying
            | applications, with a half-ass'd solution that ends up solving
           | the wrong problems and costing way more. The "shipping beats
           | perfecting" mantra is very much at play here. This is due to
           | the amount of time it would take to achieve parity. At the
           | end of the day, the new solution ends up looking like a step-
           | back.
           | 
           | Such a practice combined with the mentality that software
           | engineers should do their own dev-ops can easily lead to an
           | environment of spaghetti applications where every developer
           | working on the new platform does things slightly different
           | because the replacement solution wasn't complete and had to
           | be addressed by countless band-aids by engineers across the
           | band of talent.
           | 
           | Furthermore, for the features that were able to achieve
           | parity, you're now forcing your entire organization to re-
            | learn the new development process. The docs are abysmal and
           | the software engineers that developed the original solution
           | have moved on since the hard work of "productionalization"
           | remains and they're not interested in that.
        
             | akiselev wrote:
             | _> docker and k8s adoption can force a company that over
             | years has developed and perfected a standard practice of
              | deploying applications, with a half-ass'd solution that
             | ends up solving the wrong problems and costing way more._
             | 
             | docker and k8s adoption can also force a company that over
             | years has developed and perfected a half-ass'd solution to
             | deploying applications, with a single(-ish) source of truth
             | that ends up solving the right organizational problems
             | instead of the wrong technical ones (and ends up costing
             | way more, at least in the short term).
        
         | avereveard wrote:
         | Kubernetes isn't only about scaling. The repeatability of
          | the deployment process is a great asset to have as well.
        
           | Silhouette wrote:
           | But there are much simpler ways than K8s to achieve
           | automated/repeatable deployments, if that is your goal.
        
             | objektif wrote:
             | Can you please name a few?
        
         | kitd wrote:
         | _I see people talking about using Kubernetes for internal
          | applications and I just can't figure out why._
         | 
         | There is benefit in having established platforms for running
         | your code, and this is especially true for large orgs where the
         | people who run the systems are an entirely different group from
         | those that developed or assembled it. And people (+ their
         | skills) are what cost the most money in any business.
         | 
         | It's true that many/most systems don't require a full
         | Kubernetes stack (for instance), but if a critical mass of the
         | business IT is going that way, doing the same with your own
         | makes sense from an economies of scale PoV.
        
           | mvanbaak wrote:
           | > There is benefit in having established platforms for
           | running your code
           | 
           | You do know k8s is very new, there's a constant stream of
            | changes and updates to it, etc etc? It's not established.
           | It's known, but that's it.
        
             | thraxil wrote:
             | I think what they may mean is more that there is an
             | established platform within the organization. One that you
             | have expertise and experience with,
             | monitoring/backup/security tools that work with it, etc.
             | K8s might not have as long a pedigree as VMs, but if you
             | already have a setup to run K8s, people who know how to use
             | it, documentation and tooling that allow devs to run their
             | apps on it securely and efficiently, etc. it's pretty
              | reasonable to want to encourage devs to "just" use k8s
             | if they want to stand up a new service rather than spinning
             | up whatever random collection of technologies that dev
             | happens to know better.
        
             | pas wrote:
             | 1.0 was released in 2015. There are stable LTS vendors for
             | it.
             | 
             | It's pretty established. And much saner than cobbling
             | together Ansible/Puppet/Chef playbooks for everything.
        
               | majewsky wrote:
               | Saying that 2021's Kubernetes is established because 1.0
               | was released in 2015 is like saying that 1991's Linux is
               | stable because Unix had existed for 20 years at that
               | point. Kubernetes 1.0 and 1.20 share the same name,
               | design principles and a certain amount of API
               | compatibility, but it's impossible to take a nontrivial
               | application running on 1.20 and just `kubectl apply` it
               | on 1.0. Too much has changed.
               | 
               | Kubernetes is _just now_ entering the realm of  "becoming
               | stable". Maybe in five years or so it'll finally be
               | boring (in the best sense of the word) like Postgres.
        
         | collyw wrote:
         | Speaking as someone with 20 years in the industry, what you say
          | is correct. Most applications would be fine on a single server
         | and a classic LAMP stack. But that ain't cool these days.
        
         | ccmcarey wrote:
         | I think it's easier to run a small k8s cluster than it is to
         | attempt to recreate a lot of the functionality provided
         | manually, especially if you're running in a cloud where the
         | control plane is handled for you.
         | 
         | It provides unified secrets management, automatic service
         | discovery and traffic routing, controllable deployments,
         | resource quotas, incredibly easy monitoring (with something
         | like a prometheus operator).
         | 
         | Being able to have your entire prod environment defined in
         | declarative yml is just so much better than running something
         | like a mishmash of ansible playbooks.
         | 
         | If your application runs on a single host and you don't really
         | care about SLAs or zero downtime deploys, sure, use some adhoc
         | deploy scripts or docker compose. Any more than that, and I
         | think k8s pays for itself.
        
           | jayd16 wrote:
           | Agreed. What is it that people are doing (or not doing) where
           | a simple managed k8s cluster is more work than the minimum
           | way to do this?
           | 
           | Are teams not even setting up automated builds and just
           | sshing into a box?
        
             | hectormalot wrote:
             | For me that's Heroku. I just push my app (RoR) and done.
             | I've actually moved it once to k8s for a few months and
             | decided to move it back after I understood how easy heroku
              | made the ops side of the business.
             | 
             | Note: it's a side project, 50 business users, $5k annual
             | revenue, 4h per month as target for the time spent on
             | customer support, admin and maintenance. So it's both easy
             | to pay heroku and important for me to not spend too much
             | time on Ops.
        
         | Silhouette wrote:
         | Very many successful applications can indeed run on a single DB
         | server (modulo redundancy in case of failures). Vertical
         | scaling isn't trendy, but it is effective, until it's not.
         | 
         | I have yet to encounter a real situation where it _suddenly_
         | became impossible to run a production DB on a single, high-spec
         | server, without knowing far enough in advance to plan a careful
         | migration to a horizontally scaled system if and when it was
         | necessary.
        
         | vlovich123 wrote:
         | I'm not saying in all companies, but as you grow you have lots
         | of different teams with different needs. So then you spin up a
         | tools team to manage the engineering infrastructure since you
          | can't do it ad-hoc anymore (CI, source control, etc). So to
         | make that team more efficient, you let them force one size fits
         | all solutions. While this may feel constraining for a given
         | problem domain, it actually makes engineers more portable
         | between projects within the company which is valuable. Thus
         | having one DB or cloud thing that's supported for all teams for
         | all applications is valuable even if sometimes it isn't
         | necessarily the absolute best fit (and the complexity is
         | similarly reduced as good companies will ensure there's tooling
         | to make those complex things easy to configure in consistent
         | ways). Your tools team and the project team will work together
         | to smooth out any friction points. Why? Because for larger
         | numbers of engineers collaborating this is an efficient
         | organization that takes advantage of specialization. A
         | generalist may know a bit about everything (useful when
         | starting) but a domain expert will be far more equipped to
         | develop solutions (better results when you have the headcount).
        
         | gowld wrote:
         | Kubernetes is a container system, mostly orthogonal to 3- or
         | 4-tier application design.
        
           | h0l0cube wrote:
           | And typically a single DO droplet would suffice for a toy
           | project or POC, for which Ansible is probably the more
           | expedient option. But maybe they're not in a rush, and
            | learning K8s is just another feather in their cap.
        
         | barrkel wrote:
         | Kubernetes is for when you need to allocate CPU like you
          | allocate RAM, _and_ you don't want to be tied to a higher
         | level API sold by a vendor.
        
         | smoyer wrote:
         | > I see people talking about using Kubernetes for internal
         | applications.
         | 
         | I think the important issue when first starting a project is to
         | create a "12 Factor App" so that if and when you create a
         | Docker image and/or run the application in Kubernetes, you
         | don't have to rewrite the entire application. Most of the tools
         | I write run on the CLI but I am in fact a fan of Kubernetes for
         | services, message processing and certain batch jobs simply
         | because I don't have to manage their life-cycles.
        
           | Deadron wrote:
           | 12 factor apps sacrifice performance and simplicity of your
           | environment for scalability. Unless you are guaranteed to
           | start with a worldwide audience its complete overkill. A
           | better solution is to write your application with the rules
           | in mind with the goal of making it easy to transition to a 12
           | factor style app when its needed. Scale up then scale out
           | will result in the best performance for your users.
        
             | vp8989 wrote:
             | The thinking on efficiency vs scalability is largely
             | influenced by the US tech company meta where founders give
             | up equity so they don't have to worry about profitability
             | for a very long time.
             | 
             | In that case, it is preferred to burn piles of cash on AWS
             | instead of potentially needing to sacrifice revenue because
             | you can't scale quickly enough.
             | 
             | An architecture that is not scalable is considered a
             | failure whereas one that is complex and inefficient is much
             | more tolerated (as long as it can scale out) ... at least
             | until the funding dries up or Wall Street activists get
             | involved.
        
             | sudhirj wrote:
             | The 12 factors are mostly common sense that apply in pretty
             | much any situation - they help with fancy deployments but
             | also with single servers on DO or even on-Prem servers.
        
         | Turbots wrote:
         | For one application, kubernetes is obviously giant overkill.
          | Companies with hundreds and thousands of applications need a
         | platform that can offer standardised deployment pattern,
         | runtime pattern and scaling pattern for those apps. Kubernetes
         | is a great start for managing all those containers in a
         | consistent, secure, multi tenant environment. I know customers
         | with literally 10000s of VMs that are moving to more cloud like
         | environments, either on premise in their own datacenter or in
         | the public cloud. They need these kind of platforms to automate
         | away the repetitive stuff and have a secure posture, throughout
         | the company.
        
         | rc_hackernews wrote:
         | I haven't worked at a FAANG or any other company even close to
         | that level of scale, so you can take me with half a grain of
         | salt too.
         | 
         | But what you said is absolutely true. It's also something you
         | will very much experience once you start working
         | professionally.
         | 
         | I'm in no position to give you advice, and I think I might be
         | giving advice to myself...just don't let it get to you.
        
           | btilly wrote:
           | I have worked at FAANGs before.
           | 
           | I am in firm agreement. I think that far too many people are
           | trying to solve the problems that they wish that they had,
           | rather than making it easy to solve the ones that they do
           | have. Going to something like Kubernetes when there is no
           | particular reason for it is a good example of that trend.
           | 
           | When you really need distributed, there is no substitute. But
           | far more think that they need it than do.
        
         | laurencerowe wrote:
         | Micro-service architectures are an absurd overcomplexity for
         | smaller internal apps. They only really make sense when a
         | monolithic system becomes too large for a single team to
         | manage, at which point the micro-service boundaries reflect
         | team boundaries.
        
       | SPBS wrote:
       | If you are going to allocate sharded databases per client with
       | identical schemas, might as well give each of them an sqlite
       | database? Since you're massively cutting down the writes if it's
       | one database per customer.
        
         | brandmeyer wrote:
         | And then _just one_ query comes along where you need to make an
         | update that should be globally visible, and not just visible to
         | the shard.
         | 
         | I can see why you would stick with full Postgres for as long as
         | practicable.
        
       | bmcahren wrote:
        | Postponing sharding and complex replication as long as possible
        | is very smart with any database (mysql, postgres,
       | mongodb). It can be very fragile or fail in unexpected or unusual
       | ways and most importantly it can take _much_ longer to fix. E.g.
       | 18 hours instead of 2 hours of downtime.
        
         | evanelias wrote:
         | Once your data grows very large, a successfully-implemented
         | sharding solution actually improves availability, rather than
         | reducing it.
         | 
         | With a huge monolithic database, a failure causes downtime for
         | your entire product/company. Replica cloning and backups are
         | slow. Major version upgrades are stressful because it's all-or-
         | nothing.
         | 
         | With a sharded environment, a single shard failure only impacts
         | a portion of your userbase, and smaller databases are faster to
         | perform operational actions on.
         | 
         | There are definitely major downsides to sharding, but they tend
         | to be more on the application side in my experience.
        
           | codyb wrote:
           | Wouldn't sharding generally result in overlap between the key
           | ranges so that a database shard going down doesn't have to
           | result in any downtime?
           | 
           | Then your issue is replication of writes I suppose. Probably
           | depends on use case what configuration you choose.
        
             | Tostino wrote:
              | Totally depends on how you shard. In my case (b2b), I'd be
             | sharding by tenant. Having a single tenant go down would
             | not have the same impact as every single tenant going down.
        
             | wejick wrote:
              | Honestly, I've never found a case where this happens:
              | data always falls into 1 shard according to the key. Then
              | comes the concept of shard replicas, where the shard can
              | live on several nodes to provide redundancy.
              | 
              | However, I'm not sure how it's usually set up on Postgres.
        
             | evanelias wrote:
             | With a typical sharded relational database setup, each
             | sharding key value maps to exactly one shard.
             | 
             | There should be replicas of that shard, which can be
             | promoted in case of a master failure. But in rare cases all
             | replicas may also be degraded or inconsistent and therefore
             | non-promotable. When this happens to a giant monolithic
             | non-sharded database, the impact is far more catastrophic
             | than when it happens to a single smaller shard.
             | 
             | In any case, replication is a separate concern from
             | sharding. Each sharded database replica set (traditionally
             | 1 master + N replicas) has a data set that is unique /
             | independent of other sharded database replica sets.
             | 
             | That said, some of the "NewSQL" distributed databases may
             | arrange their data in more complex ways. [Edit to clarify:
             | I mean in respect to the _combination_ of both sharding and
             | replication. With a traditional sharded relational
             | database, you have replica sets where each replica contains
             | the exact same shard or shards; with a NewSQL distributed
             | DB, data may be replicated in a more complex arrangement]
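              | 
              | To make that mapping concrete, here is a minimal sketch
              | (the shard count, table, and key are made up, not from any
              | particular system) of the two common approaches, a static
              | hash and a directory table:
              | 
              |   -- Static hash: every key lands on exactly one of N
              |   -- shards, here via Postgres' hashtext():
              |   SELECT abs(hashtext('cust_42')::bigint) % 16 AS shard_id;
              | 
              |   -- Directory table: an explicit key-to-shard mapping,
              |   -- which also lets you move one key to another shard.
              |   CREATE TABLE shard_directory (
              |       sharding_key text PRIMARY KEY,
              |       shard_id     int  NOT NULL
              |   );
              |   SELECT shard_id FROM shard_directory
              |    WHERE sharding_key = 'cust_42';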
        
               | zinclozenge wrote:
               | Most, if not all, commercial newsql distributed databases
               | do range based splitting of the data, with each range
               | managed by a raft group. Raft groups get migrated between
               | nodes to avoid hot spots, among other scheduling
                | criteria. TiDB does this for sure; I'd be surprised if
                | CockroachDB (and yugabyte and dgraph) didn't do it too.
        
           | bmcahren wrote:
           | Shards increase the number of failure modes and increase the
           | complexity of those failure modes. For _most_ businesses, the
            | recommendation holds true... keep it simple, don't shard
            | until you _need_ to.
           | 
           | I find it somewhat concerning that MongoDB has a better
           | architecture for upgrades than Postgres. You add a replica to
           | the cluster running the new major version and then switch
           | that replica over as your primary once you've replaced enough
            | instances in the cluster. Having worked in Oracle and MySQL
            | for years and then switched to a company with a MongoDB
            | framework, I had forgotten how stressful upgrades could be
            | with such archaic limitations.
        
             | evanelias wrote:
             | > Shards increase the number of failure modes and increase
             | the complexity of those failure modes.
             | 
             | I would only agree with this during the initial
             | implementation of sharding. Once deployed and stable, I
             | have not found this to be the case, at all.
             | 
             | I say this as someone who has directly architected a
             | sharded database layer that scaled to over a trillion rows,
             | and later worked on core automation and operations for
             | sharded databases that scaled to an incalculable number of
             | rows (easily over 1 quadrillion).
             | 
             | In both cases, each company's non-sharded databases were
             | FAR more operationally problematic than the sharded ones.
             | The sharded database tiers behave in common ways with
             | relatively uniform workloads, and the non-sharded databases
             | were each special snowflakes using different obscure
             | features of the database.
             | 
             | > keep it simple, don't shard until you need
             | 
             | I would have wholeheartedly agreed with this until a few
             | years ago. Cloud storage now permits many companies to run
             | monster 10+ TB monolithic relational databases. Technically
             | these companies no longer "need" to shard, possibly ever.
             | But at these data sizes, many operations become extremely
             | painful, problematic, and slow.
        
               | throwdbaaway wrote:
               | One of those companies is facebook right? I think this
               | blog post provides a much more balanced view:
               | http://yoshinorimatsunobu.blogspot.com/2017/11/towards-
               | bigge...
        
       | numlock86 wrote:
       | I like how the article starts with the metrics of what a single
       | node can already do. The trend appears to be to scale right at
       | the start, even before you have your first real customer ... the
       | implications and results are obvious.
        
       | AdrianB1 wrote:
       | I have experience with other relational DB products, but the
       | principle should be similar; I have a few databases over 1 TB,
       | but scaling to tens of TB would require more RAM than what a
       | typical 1-socket or 2-socket server can support. CPUs are not
       | that problematic with the huge number of cores in AMD Epyc, but
       | RAM is a serious limitation: in my world I need a RAM-to-
       | database-size ratio between 15% and 60%, depending on whether
       | the workload is transactional, reporting, or somewhere in
       | between. Taking a 50 TB database as an example, even the 15% low
       | end means ~7.5 TB of RAM, which is too much for a 4 TB RAM,
       | 2-socket Epyc system.
        
       | beck5 wrote:
       | When it comes to replication & automatic failover in PG what is
       | the 2021 gold standard setup?
        
         | __s wrote:
         | pg_auto_failover is good
         | https://www.citusdata.com/blog/2019/05/30/introducing-pg-aut...
         | 
         | Disclosure: I work for Citus
        
         | ksec wrote:
          | In 2016 I was expecting that sane defaults like MySQL's would
          | have arrived by 2018 or 2019.
          | 
          | Looks like 2021 isn't that much different from 2016. There is
          | work being done, but it doesn't seem to be anywhere close to
          | the level of MySQL.
        
           | dijit wrote:
           | wow, are you kidding? MySQL replication is possibly the worst
           | I've ever seen.
           | 
            | There is almost no consideration for the target being up to
            | date: I have personally experienced missed inserts,
            | replication lag, and the replica being set to read/write.
           | 
            | PG's integrated replication is far superior; it even has
            | support for bootstrapping a new replica node without rsync
            | and a command to `promote` the replica to write master.
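            | 
            | For reference, the two pieces alluded to above look roughly
            | like this (host and data directory are placeholders):
            | 
            |   -- Bootstrap a replica without rsync (shell command):
            |   --   pg_basebackup -h primary -D /pgdata/13/main -R
            |   -- Later, promote that replica to accept writes (PG 12+):
            |   SELECT pg_promote();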
        
             | claytonjy wrote:
             | What about Percona? I haven't used it, but a common
             | sentiment on HN has been that Postgres doesn't have
             | something quite on par with it.
             | 
             | Citus and, more recently, Patroni, seem to be the dominant
             | Postgres analogues; have they caught up? Where do they
             | dominate?
        
             | evanelias wrote:
             | Replication issues in modern MySQL are caused by user
             | error, not inherent bugs. For example if you're configuring
             | replication manually, you must ensure that replicas are set
             | to read_only. Clustering options introduced over the past
             | few years do this automatically though.
             | 
             | > PGs integrated replication is far superior, it even has
             | support for bootstrapping a new replica node without rsync
             | and a command to `promote` the replica to write master
             | 
             | MySQL has this functionality too, e.g.
             | https://dev.mysql.com/doc/refman/8.0/en/clone-plugin.html
             | and https://dev.mysql.com/doc/mysql-shell/8.0/en/working-
             | with-re...
             | 
             | To be clear, I am not claiming MySQL is superior to PG.
             | However, I have found that many PG users are unfamiliar
             | with the huge improvements introduced in MySQL over the
             | past ~8 years.
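              | 
              | For example, the usual guard on a manually configured
              | MySQL replica is a single setting (illustrative):
              | 
              |   -- On the replica: reject writes, even from SUPER users.
              |   SET GLOBAL super_read_only = ON;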
        
         | hintbits wrote:
         | Use built-in physical replication, it works and it's good.
         | 
          | Patroni is widely used for automatic failover, at least where
          | you want to avoid any possibility of split brain.
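          | 
          | Whichever tooling sits on top, the built-in replication is
          | easy to keep an eye on from SQL, for example:
          | 
          |   -- On the primary: one row per connected standby, with lag.
          |   SELECT client_addr, state, replay_lag
          |     FROM pg_stat_replication;
          |   -- On a standby: true while it is following a primary.
          |   SELECT pg_is_in_recovery();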
        
       | mmacedo_en wrote:
       | I'm building an app using Postgres for the first time. Naturally
       | I was a bit worried about performance and scaling if the not-
       | yet-launched app becomes a major success.
       | 
       | I began simulating a heavy-use scenario: 100k users creating 10
       | records daily for three years straight.
       | 
       | 100000 x 10 x 365 x 3 ~= 1.1 billion rows, or about 200 GB at
       | 200 bytes per record. This is peanuts for modern databases and
       | hardware.
       | 
       | Seems like a single node can support it for a long way before I
       | have to worry about performance...
        
         | majewsky wrote:
         | You classifying 11 writes per second as "heavy use" reminds me
         | of how people on average completely underestimate how fast
         | computers actually are (when they're not bogged down by crappy
         | programs).
        
           | mixmastamyk wrote:
            | I don't believe the grandparent's simulation actually took
            | three years; it was more likely a test of operations against
            | a dataset of that size.
           | 
           | Still, your main point stands. Around 2001 I wrote a
            | C program to record every file and its size on a large hard
            | disk.
           | We were all amazed that it finished (seemingly) before the
           | enter key had come back up. Must be a bug somewhere, right?
           | Nope.
           | 
           | Much earlier I wrote a Pascal program on a 486 in school that
           | did some calculations over and over again, writing the output
           | to the screen. It blew my mind then how fast the computer
           | could do it.
        
       | sharadov wrote:
       | Over- engineering has become a disease, and has reached pandemic
       | -level proportions, primarily driven by resume-padders and
       | clueless management folk.
        
       | marcus_holmes wrote:
       | TLDR: As always, don't fix a problem until just before you have
       | it.
        
       | fabian2k wrote:
       | One point I found very interesting was the following paragraph:
       | 
       | > For example, on my (pretty average) workstation, I can do ca.
       | 25k simple read transactions per 1 CPU core on an "in memory"
       | pgbench dataset...with the default config for Postgres v13!
       | 
       | In my own very unscientific experiments I never got values as
       | high as that, only somewhere around 4k transactions per second
       | total across multiple cores. Of course I'm comparing very
       | different things: I was looking at more than just the database,
       | and there are lots of other aspects I probably did differently.
       | 
       | I find this interesting, as it probably means that my bottleneck
       | wasn't entirely where I thought it was. I'll have to play around
       | with that again some time.
       | 
       | If I wanted to find out the baseline Postgres performance on my
       | hardware for trivial read queries like looking up individual rows
       | by primary key with a database that fits entirely in RAM, how
       | would I do that? I know there is pgbench, but that performs more
       | complex queries as far as I understand.
        
         | ahachete wrote:
         | As this is a public reference: GitLab's Postgres cluster
          | handles peaks of 300K tps, where the master node alone supports
          | around 60K-80K. And this is not in-memory (data size on the
          | order of 8 TB, RAM 600 GB).
         | 
         | https://about.gitlab.com/blog/2020/09/11/gitlab-pg-upgrade/
         | 
         | And there's still room for vertical scaling.
         | 
         | Disclaimer: we provide Postgres support for GitLab.
        
           | mmacedo_en wrote:
            | HN is missing a user-follow and user-tagging button :-)
        
             | ahachete wrote:
             | If that's for me, I really appreciate that :)
             | 
             | There's obviously that option on Twitter (same username as
             | here), if you want to follow there ;)
        
         | __s wrote:
          | You can give pgbench a script file to have it run the query
          | you want (see the -f flag).
         | 
         | https://www.postgresql.org/docs/current/pgbench.html
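          | 
          | For trivial primary-key lookups, the script can be as small as
          | this (table name and ranges are made up); the built-in
          | -S/--select-only mode does much the same against pgbench's own
          | pgbench_accounts table:
          | 
          |   -- pk_lookup.sql, run with something like:
          |   --   pgbench -n -f pk_lookup.sql -c 8 -j 8 -T 60 mydb
          |   \set id random(1, 1000000)
          |   SELECT * FROM my_table WHERE id = :id;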
        
       | wilsonrocks wrote:
       | This was an interesting read for a database novice. It seems like
       | a lot of the quoted stats are about in memory datasets - is that
       | realistic?
        
         | dap wrote:
         | It depends a lot on what you're doing. In my last role, keeping
         | the Postgres database in memory was absolutely not an option,
         | and we ran into major issues related to physical I/O.
        
         | bmcahren wrote:
          | Yes. Memory can reach 768GB on a single instance today, and I
          | imagine that will keep expanding. From there you can scale by
          | sharding.
          | 
          | Keeping the working set in memory gives you real-time
          | transaction latencies that you can't guarantee with disk-based
          | storage.
        
           | ahachete wrote:
           | EC2's u-24tb1.metal is 224/448 cores/hyperthreads and 24TB
           | RAM ;)
           | 
           | https://aws.amazon.com/sap/instance-types/
        
             | wiredfool wrote:
             | And nearly 2 million for a 3 year reserved instance.
        
               | atonse wrote:
               | This reminded me of a story from 15 years ago.
               | 
               | I once worked for a company that was writing a proposal
               | for a US Homeland Security IT system. This was 2006. I
               | wasn't involved in it but my office-mate was. He randomly
               | turned his chair around and said "hey, can you go on
               | Dell.com and see if you can build the most expensive
               | server imaginable" - so I did and I ended up at around
               | $350k. I don't remember what it was, but at the time the
               | stats just felt absolutely obscene. I do remember that,
               | at that scale we were buying a closet, not just one box.
               | 
               | And I told him the price, and he said, "They have set
               | aside $20 million for server hardware, and what you
               | configured is way way overkill for what they think they
               | need" - so we were both so damn perplexed because, at
               | least from what he told me, this didn't need that much.
               | And it wasn't one of those classified projects where they
               | couldn't tell you what it was for. It was a pretty boring
               | line of business type thing (like HR or building access
               | control or something).
               | 
                | Maybe that's more a story of just how much cash was
                | being poured into that agency during the years after
                | 9/11.
        
               | jarym wrote:
               | They got to the end of their financial year and had $$$
               | left to spend... happens all the time.
        
           | dekimir wrote:
           | > In memory provides real-time transactions you can't
           | guarantee when using disk-based storage.
           | 
           | This is changing as we discover better ways of coding IO
           | operations.
           | 
           | https://www.globenewswire.com/news-
           | release/2019/11/05/194114...
           | 
           | https://itnext.io/modern-storage-is-plenty-fast-it-is-the-
           | ap...
        
           | bcrosby95 wrote:
           | How long does it take to get the data off disk and into
           | memory after coming online? A decade ago filling just 64GB of
           | memory with our hot dataset was a painful process. I can't
           | imagine it's any nicer with 768GB.
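            | 
            | One partial mitigation (a sketch; the table name is
            | invented): the pg_prewarm extension loads chosen relations
            | into the buffer cache, and its autoprewarm worker can reload
            | whatever was cached before the restart.
            | 
            |   CREATE EXTENSION pg_prewarm;
            |   -- Pull a hot table (or index) into shared_buffers:
            |   SELECT pg_prewarm('orders');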
        
           | fabian2k wrote:
           | You can get more than that on common servers, see e.g. the
           | recent post by Let's Encrypt on their new database server:
           | 
           | https://letsencrypt.org/2021/01/21/next-gen-database-
           | servers...
           | 
           | They have 2TB RAM in there, and I suspect that is not the
           | largest possible amount if you're willing to spend more money
           | (though probably the largest possible amount for that
           | particular server).
           | 
           | I played around a bit on the Dell website, and the largest
           | server I could find supported up to 6TB RAM, with a price at
           | ~300k EUR (I assume nobody pays list price for these).
        
             | tyingq wrote:
             | 4TB seems to be the limit for normal-ish servers. 32 slots
             | filled with 128GB DIMMS. You can find servers with more
             | slots (48 is common, 6TB), but you're going past most
             | people's definition of "commodity".
        
           | [deleted]
        
         | 0x10c0fe11ce wrote:
          | Yes. That's why the OP talked about separating hot/cold data
          | and scaling up instead of scaling out.
        
       | f430 wrote:
       | Anybody know if this applies to Temporal Tables? I've thought of
       | using Datomic, but it seems like Temporal Tables do the job. I'm
       | not sure about immutability and how that might impact
       | performance, because you are now keeping a timestamped record of
       | every transaction.
        
       | ComodoHacker wrote:
       | It's not surprising to hear such advice from PostgreSQL
       | consultancy shop: don't bother and relax until the day you
       | desperately need PostgreSQL consultancy! /s
       | 
       | Seriously though, the main point stands. PostgreSQL isn't
       | designed with "distributed" in mind, so try vertical scaling
       | tricks first.
        
         | [deleted]
        
         | mumblemumble wrote:
         | The skepticism is founded, but there's some truth.
         | 
         | PostgreSQL (and traditional RDBMSes) aren't built with
         | horizontal scaling in mind. That's both a blessing and a curse.
         | The curse is obvious. The blessing is that it means that they
         | will let you get _much_ further with vertical scaling.
         | Especially if you take the time to learn how to tune them.
         | 
         | Support for horizontal scaling comes at a cost. When I've done
         | performance analyses, I regularly find that parallel and
          | distributed implementations spend most of their wall clock
          | time on synchronization and communication, occasionally more
          | than 90%.
         | That creates a big up-front cost you need to overcome before
         | you see a net benefit. It also leads to a sort of horizontal
         | scaling version of the rocket equation where returns can
         | diminish rapidly.
        
       | ccleve wrote:
       | The difficulty with this advice is that it assumes that you have
       | many small transactions.
       | 
       | Yes, of course, you should try to build your application so
       | queries and transactions are very short. That solves a great many
       | problems.
       | 
       | But sometimes you can't. Sometimes you just have to do joins
       | across large tables. There just isn't any other way. Your query
       | is going to run for 5, 10, maybe 30 seconds. That's a huge burden
       | on your server, and it will slow down the other tenants. In that
       | case, the _only_ answer is to distribute the queries across other
       | boxes.
       | 
       | I agree with the advice in general -- delay moving to a
       | distributed system as long as possible -- but sometimes you just
       | can't.
        
         | btilly wrote:
         | Point #1, there is a world of difference between a reporting
         | database and a transactional database. If you need a reporting
         | database, ship logs and set up a reporting database separate
         | from your transactional one. That solves most of the problem.
         | 
         | Point #2, the fact that you've hit performance problems does
         | not mean that you need to distribute. Every real system that
         | I've seen has had order of magnitude performance improvements
         | left when it first looked like it was topping out. Add a
         | caching layer, have better indexes, more targeted queries,
         | appropriate use of window functions, etc. Give those a try
         | before biting the bullet on distributed data.
         | 
         | Point #3, here is a good rule of thumb for distributed systems.
         | Whatever scale you hit on a single machine, you can probably
         | gain 1-2 orders of magnitude of performance by switching to a
         | faster language and carefully optimizing. If you switch to
          | distributed, you'll LOSE at least an order of magnitude of
          | performance due to the overhead of RPCs, but are able to scale
          | indefinitely afterwards.
         | 
         | If you're distributing for reliability, great. But if you're
         | distributing for performance and you have less than 20 machines
         | in your system, either your problem is embarrassingly parallel
         | or you likely aren't achieving a net win.
         | 
         | I've seen a lot of people prematurely distribute, run into
         | performance challenges, solve them, then pat themselves on the
         | back for being smart enough to have distributed their code.
         | While failing to recognize that they were addressing a self-
         | inflicted injury.
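          | 
          | On point #1, one Postgres-native way to feed that separate
          | reporting copy is logical replication; a rough sketch (tables
          | and connection details are invented):
          | 
          |   -- On the transactional primary (needs wal_level = logical):
          |   CREATE PUBLICATION reporting_pub
          |       FOR TABLE orders, order_items;
          | 
          |   -- On the separate reporting server:
          |   CREATE SUBSCRIPTION reporting_sub
          |       CONNECTION 'host=primary dbname=app user=repl'
          |       PUBLICATION reporting_pub;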
        
           | ajsharp wrote:
           | My thoughts exactly on point #1. Nothing in a hot path should
           | take multiple seconds.
        
         | giantrobot wrote:
         | Postgres has materialized views for the case of expensive
         | queries. Instead of doing a view query in real-time it
         | physically caches the results and has the same accessibility as
         | a regular view.
         | 
         | They don't solve all situations with expensive queries but they
         | help a lot. The fact they behave like dynamic views means you
         | can migrate a dynamic view to materialized if you run into
         | performance issues without making changes to the client
         | application. With views in general you can shuffle around
         | tables and relationships without the client application knowing
         | or caring.
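          | 
          | A minimal sketch of that pattern (the schema is made up):
          | 
          |   CREATE MATERIALIZED VIEW daily_sales AS
          |       SELECT order_date, sum(total) AS revenue
          |         FROM orders GROUP BY order_date;
          | 
          |   -- A unique index lets the refresh run CONCURRENTLY, so the
          |   -- view stays readable while it recomputes.
          |   CREATE UNIQUE INDEX ON daily_sales (order_date);
          |   REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;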
        
         | mmacedo_en wrote:
          | I've seen queries running for one or two minutes and raising
          | user complaints. Then we looked at them, and a few changes to
          | indexes and query hints brought them down to sub-second
          | execution.
         | 
         | Before thinking about distributed systems, there is an entire
         | database optimization toolkit to make use of: primary key
         | review, secondary index creation, profiling, view or stored
         | procedure creation, temporary tables, memory tables and so on.
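          | 
          | The usual first step for that kind of fix, sketched against an
          | invented table:
          | 
          |   -- See what the planner actually does and where time goes:
          |   EXPLAIN (ANALYZE, BUFFERS)
          |     SELECT * FROM orders
          |      WHERE customer_id = 42
          |      ORDER BY created_at DESC LIMIT 20;
          | 
          |   -- If that shows a sequential scan, a matching index is
          |   -- often the whole fix:
          |   CREATE INDEX ON orders (customer_id, created_at DESC);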
        
           | ozkatz wrote:
           | This. Using any database requires building up an expertise
           | and understanding how to use its capabilities properly. If
           | you hit a wall with a non-distributed database and your
           | solution is to replace it with a distributed one - you will
            | have a bad time. The surface area of what you need to know to
            | use it properly still includes all the data modeling,
            | indexing, and troubleshooting that was required before, plus
            | a whole lot of networking, consensus protocols, and
            | consistency models to worry about (just to name a few).
        
           | CharlesW wrote:
           | > _Then we looked at it, and with a few changes in indexes
           | and query hints brought it down to sub-second execution._
           | 
           | This is exactly what the parent comment said: "you should try
           | to build your application so queries and transactions are
           | very short".
           | 
           | If you're claiming that the parent is incorrect about
           | "sometimes, the only answer is to distribute the queries
            | across other boxes", my guess is that you probably don't
            | work at a scale where you've learned that query optimization
            | can't solve every database performance problem.
        
             | btilly wrote:
             | Based on my past experiences, I'd be happy to take an even
             | money bet that more than 95% of organizations that go
             | distributed for performance reasons actually caused
             | themselves more problems than they solved.
             | 
             | This does NOT mean that there are no use cases for
             | distributed - I've seen Google's internals and it would be
             | impossible to do that any other way. But it does mean that
             | when someone is telling you with a straight face that they
             | needed horizontal scalability for performance, you should
             | assume that they were probably wrong. (Though probably
             | saying that is not the wisest thing - particularly if the
             | person you're talking to is the architect whose beautiful
             | diagrams you'd be criticizing.)
             | 
             | So yes, there are problems that require a distributed
             | architecture for performance problems. That doesn't
             | contradict the point that every other option should be
             | explored first.
        
               | CharlesW wrote:
               | > _Based on my past experiences, I 'd be happy to take an
               | even money bet that more than 95% of organizations that
               | go distributed for performance reasons actually caused
               | themselves more problems than they solved._
               | 
               | Oh, I'm absolutely positive you're right! And as you
               | know, even in distributed architectures there's a huge
               | range of solutions from "let's do these _n_ reasonably
                | simple things" to "let's just rebuild our system to
               | solve every conceivable future problem". I don't know of
               | a scenario where the latter has ever worked.
        
           | jeffbee wrote:
           | Indexes are not free, they take up space and they make
           | mutations more costly. Also, building the index may not even
           | be possible while your application is running, because
           | postgresql and other RDBMS have inadequate facilities for
           | throttling index construction such that it doesn't harm the
           | online workload. You might have to build indexes at midnight
           | on Sundays, or even take your whole system offline. It can be
           | a nightmare.
           | 
           | This isn't just a problem for SQL databases. Terrible-but-
           | popular NoSQL systems like MongoDB also rely heavily on
           | indexes while providing zero or few safety features that will
           | prevent the index build from wrecking the online workload.
           | 
           | I personally prefer databases that simply do not have
           | indexes, like bigtable, because they require more forethought
           | from data and application architects, leading to
           | fundamentally better systems.
        
             | btilly wrote:
             | I've seen https://www.postgresql.org/docs/12/sql-
             | createindex.html#SQL-... work well in practice on busy
             | transactional databases. I'd be interested in knowing about
             | cases where it doesn't work well.
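              | 
              | That is, something along these lines, which builds the
              | index without blocking writes to the table, at the cost of
              | a slower build (table name invented):
              | 
              |   CREATE INDEX CONCURRENTLY idx_orders_customer
              |       ON orders (customer_id);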
             | 
             | My experience on systems without indexes differs strongly
             | from yours. Yes, they can work well. But if you have
             | multiple use cases for how your data is being queried, they
             | push you into keeping multiple copies of your data. And
             | then it is very, very easy to lose consistency. And yes, I
             | know about "eventual consistency" and all that - I've found
             | that in practice it is a nightmare of special cases that
             | winds up nowhere good.
        
               | jeffbee wrote:
               | Just from personal experience, if they can build gmail on
               | top of a database (bigtable) having neither indexes nor
               | consistency, then probably it will also be suitable for
               | the purposes of my far smaller, much less demanding
               | products.
               | 
               | On the other hand I've seen, and am currently suffering
               | through, products that have desperate performance
               | problems with trivial amounts (tens of GB) of data in
               | relational databases with indexes aplenty.
        
         | mcherm wrote:
         | > Sometimes you just have to do joins across large tables.
         | There just isn't any other way. Your query is going to run for
         | 5, 10, maybe 30 seconds.
         | 
         | While that is true, I would speculate (based on my own
         | experience, not any actual research) that it is far more common
         | that users have an unoptimized query or database schema
         | (perhaps it just needs an index to be created) which is taking
         | tens of seconds to run but doesn't NEED to, than it is that
         | users have a need for a complex query that cannot run faster
         | than that.
         | 
         | So for MOST users, the best advice is to learn how to use
          | EXPLAIN ANALYZE and other database tools to optimize execution.
         | Only if you ALREADY know that should you be considering moving
         | to a read replica or some sort of distributed database.
        
         | cliftonk wrote:
          | You could always start with postgresql and use the transaction
          | log to materialize views for particular use cases (i.e.
          | differential data flow). This post is clickbaity in that it
          | doesn't give any administration advice, but it is right in the
          | broad sense that postgres/mysql are still the best low-latency
          | single sources of truth available.
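          | 
          | Reading the transaction log that way goes through logical
          | decoding; a bare-bones sketch using the built-in test_decoding
          | plugin (the slot name is invented):
          | 
          |   SELECT pg_create_logical_replication_slot(
          |            'mv_feed', 'test_decoding');
          |   -- Each call returns the changes committed since last call:
          |   SELECT * FROM pg_logical_slot_get_changes(
          |            'mv_feed', NULL, NULL);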
        
       | petr25102018 wrote:
       | For anyone interested in general advice regarding scaling
       | traditional SQL databases, check out my article [0] where I
       | collected various techniques when I investigated this topic in my
       | last workplace.
       | 
       | [0] https://stribny.name/blog/2020/07/scaling-relational-sql-
       | dat...
        
       ___________________________________________________________________
       (page generated 2021-01-26 23:00 UTC)