[HN Gopher] Squeeze the hell out of the system you have
       ___________________________________________________________________
        
       Squeeze the hell out of the system you have
        
       Author : sbmsr
       Score  : 269 points
       Date   : 2023-08-11 18:18 UTC (4 hours ago)
        
 (HTM) web link (blog.danslimmon.com)
 (TXT) w3m dump (blog.danslimmon.com)
        
       | javajosh wrote:
       | I'll probably get down-voted for saying this (again), but a key
       | way to squeeze unimaginable amounts of performance is to _lean
       | into stored procedures_.
       | 
        | Look, I get it, the devx sucks. And it feels like a proprietary,
        | icky, COBOL-like experience. It means you have to _dwell_ in the
       | database. What are you, a db admin?!
       | 
       | But I'm telling you, the payoff is worth it. (and also, if you
       | ship it you own it so yes you're a db admin). My company ran for
        | many years on 3 machines, despite its extremely heavy page
        | weight, because the original author wrote it with stored
        | procs from the
       | beginning. (He also liberally threw away data, which was great,
       | but that's another post.) Part of my job was to migrate away from
       | .NET and to Java and JavaScript - and another engineer wrote an
       | ingenious tool that would generate Java bindings to SQL Server
       | stored procs that made it really nice to work with them. And the
       | performance really was outrageous - 100x better than any system
       | I've worked with before or since. Those 3 boxes handled 300k very
        | data-intensive monthly actives, and that was like 10 years ago.
       | 
       | Don't worry - even if you lean into SPs there is still plenty of
       | engineering to do! It's just that your data layer will simplify,
       | and your troubleshooting actually gets easier, not harder. I
       | liked the custom bindings - a bit like ActiveRecord, and no ORM.
       | But really, truly: if you want to squeeze, move some queries into
       | SPs and prepare to be amazed.
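A minimal sketch of the binding idea described above, using sqlite3 as a stand-in (sqlite3 has no real stored procedures, and all names here are illustrative, not the SQL Server tool the comment describes): each named, parameterized query becomes a plain function, and the set logic runs inside the database engine rather than in application code.

```python
import sqlite3

def make_binding(conn, sql):
    """Wrap a parameterized query as a plain function, binding-style."""
    def call(*params):
        return conn.execute(sql, params).fetchall()
    return call

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (1, 2.5), (2, 99.0)])

# The aggregation runs in the engine, not in application code.
user_order_summary = make_binding(
    conn, "SELECT COUNT(*), SUM(total) FROM orders WHERE user_id = ?")

count, total = user_order_summary(1)[0]
```

In a real system the wrapper would call `CALL proc(...)` (or `EXEC`) against server-side procedures; the shape of the generated binding layer is the same.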
        
         | giantrobot wrote:
         | I can't disagree with the results, SPs can change your life.
         | HOWEVER, they require significant discipline and regular
         | audits. All the code for them needs to be in source control
         | with a Process for deployment to the DB. You also need a test
         | suite as part of the Process which runs against a staging
         | server with a comparable configuration to prod. The SPs need to
         | be regularly dumped and compared against what's in source
         | control and marked as what's supposed to be released in prod.
        
           | [deleted]
        
       | romafirst3 wrote:
       | TLDR. We were going to completely rewrite our architecture but
       | instead we optimized a few Postgres queries. LMAO
        
       | endisneigh wrote:
       | The bit on the database performance issues leads me to my
       | hottest, flamiest take for new projects:
       | 
       | - Design your application's hot path to never use joins. Storage
       | is cheap, denormalize everything and update it all in a
       | transaction. It's truly amazing how much faster everything is
       | when you eliminate joins. For your ad-hoc queries you can
       | replicate to another database for analytical purposes.
       | 
        | On this note, I have mixed feelings about Amazon's DynamoDB, but
        | one thing about it is that to use it properly you need to plan
        | your usage first, and your schema second. I think there's
        | something you can take from this even with an RDBMS.
       | 
        | In fact, I'd go as far as to say that joins are unnecessary for
        | non-analytical purposes these days. Storage is so mind-bogglingly
        | cheap and the major DBs have ACID properties. Just denormalize,
        | for real.
       | 
       | - Use something more akin to UUIDs to prevent hot partitions.
       | They're not a silver bullet and have their own downsides, but
       | you'll already be used to the consistently "OK" performance that
       | can be horizontally scaled rather than the great performance of
       | say integers that will fall apart eventually.
       | 
       | /hottakes
       | 
        | my sun-level take would also be to just index all columns. but
       | that'll have to wait for another day.
        
         | [deleted]
        
         | feoren wrote:
         | There are "tall" applications and "wide" applications. Almost
         | all advice you ever read about database design and optimization
         | is for "tall" applications. Basically, it means that your
         | application is only doing one single thing, and everything else
         | is in service of that. Most of the big tech companies you can
         | think of are tall. They have only a handful of really critical,
         | driving concepts in their data model.
         | 
         | Facebook really only has people, posts, and ads.
         | 
         | Netflix really only has accounts and shows.
         | 
         | Amazon (the product) really only has sellers, buyers, and
         | products, with maybe a couple more behind the scene for
         | logistics.
         | 
          | The reason for this is that tall applications are _easy_.
         | Much, much easier than wide applications, which are often
         | called  "enterprise". Enterprise software is bad because it's
         | _hard_. This is where the most unexplored territory is. This is
         | where untold riches lie. The existing players in this space are
         | abysmally bad at it (Oracle, etc.). You will be too, if you
         | enter it with a tall mindset.
         | 
          | Advice like "never use joins" and "design around a single
         | table" makes a lot of sense for tall applications. It's awful,
         | terrible, very bad, no-good advice for wide applications. You
         | see this occasionally when these very tall companies attempt to
         | do literally anything other than their core competency: they
         | fail miserably, because they're staffed with people who hold
         | sacrosanct this kind of advice that _does not translate_ to the
         | vast space of  "wide" applications. Just realize that: your
         | advice is for companies doing easy things who are already
         | successful and have run out of low-hanging fruit. Even tall
         | applications that aren't yet victims of their own success do
         | not need to think about butchering their data model in service
         | of performance. Only those who are already vastly successful
         | and are trying to squeeze out the last juices of performance.
         | But those are the people who _least need advice_. This kind of
         | tall-centered advice, justified with  "FAANG is doing it so you
         | should too" and "but what about when you have a billion users?"
         | is poisoning the minds of people who set off to do something
         | more interesting than serve ads to billions of people.
        
           | xyzzy123 wrote:
           | Thanks I think this is a really interesting way to look at
           | things.
           | 
           | What is the market for "wide" applications though? It seems
           | like any particular business can only really support one or
           | two of them, for some that will be SAP and for others it
           | might be Salesforce (if they don't need much ERP), or (as you
           | mentioned) some giant semi homebrewed Oracle thing.
           | 
           | Usually there is a legacy system which is failing but still
           | runs the business, and a "next gen" system which is not ready
           | yet (and might never be, because it only supports a small
           | number of use cases from the old software and even with an
           | army of BAs it's difficult to spec out all the things the old
           | software is actually doing with any accuracy).
           | 
           | Or am I not quite getting the idea?
        
           | endisneigh wrote:
           | I agree with your sentiment, but even enterprises work on
           | multiple "tall" features.
           | 
            | If they didn't, then I'd change my advice to simply be
            | multi-tenant per customer and replicate into a column store
            | for
           | cross customer analytics.
        
           | pipe_connector wrote:
           | I agree with the characterization of applications you've laid
           | out and think everyone should consider whether they're
           | working on a "tall" (most users use a narrow band of
           | functionality) or a "wide" (most users use a mostly non-
           | overlapping band of functionality) application.
           | 
           | I also agree with your take that tall applications are
           | generally easier to build engineering-wise.
           | 
           | Where I disagree is that I think in general wide applications
           | are failures in product design, even if profitable for a
           | period of time. I've worked on a ton of wide applications,
           | and each of them eventually became loathed by users and
           | really hard to design features for. I think my advice would
           | be to strive to build a tall application for as long as you
           | can muster, because it means you understand your customers'
           | problems better than anyone else.
        
             | feoren wrote:
             | > I've worked on a ton of wide applications, and each of
             | them eventually became loathed by users and really hard to
             | design features for.
             | 
             | Yes, I agree that this is the fate of most. But I refuse to
             | believe it's inevitable; rather, I think it comes from
             | systemic flaws in our design thinking. Most of what we
             | learn in a college database course, most of what we read
             | online, most all ideas in this space, transfer poorly to
             | "wide" design. People don't realize this because those
             | approaches do work well for tall applications, and because
             | they're regarded religiously. This is why I call them so
             | much harder.
        
         | veave wrote:
         | >Design your application's hot path to never use joins. Storage
         | is cheap, denormalize everything and update it all in a
         | transaction. It's truly amazing how much faster everything is
         | when you eliminate joins.
         | 
          | Does anybody have documentation about this, with examples?
        
           | newlisp wrote:
           | Duplicate data to avoid joins, use serializable transactions
           | to update all the duplicated data.
        
           | joshstrange wrote:
           | See "Single Table Design" which I talked about in this
           | comment above: https://news.ycombinator.com/item?id=37093357
        
             | deely3 wrote:
             | And if you don't want to spend money, you can get basic
             | idea from this article:
             | https://www.alexdebrie.com/posts/dynamodb-single-table/
        
         | tibbetts wrote:
          | Premature denormalization is expensive complexity.
         | Denormalization is a great tool, maybe an under-used tool. But
         | you should wait until there are hot paths before using it.
        
           | endisneigh wrote:
           | I agree. To be clear I'm not suggesting anyone start
           | denormalizing everything. I'm saying if you're fortunate
            | enough to be on a greenfield project, you should design the
            | schema around the access patterns, which will surely be
            | "denormalized", as opposed to designing a normalized schema
            | and designing your access patterns around it.
        
         | latchkey wrote:
         | > _Design your application 's hot path to never use joins._
         | 
          | Grab (the Uber of Asia) did this religiously, creating a ton
         | of friction within the company due to the way the teams were
         | laid out. It always required one team to add some sort of API
         | that another team could take advantage of. Since the first team
         | was so busy always implementing their own features, it created
         | roadblocks with other teams and everyone started pointing
         | fingers at each other to the point that nothing ever got done
         | on time.
         | 
         | Law of unintended consequences
        
           | tedunangst wrote:
           | Hard to follow the link. How would you join two tables
           | between teams that don't communicate?
        
             | latchkey wrote:
             | You don't, that's the problem.
        
           | endisneigh wrote:
           | yes, this is a fair point. there's no free lunch after all.
           | without knowing more about what happened with Grab I'd say
           | you could mitigate some of that with good management and
           | access patterns, though.
        
             | latchkey wrote:
             | All in all though, I don't think that 'never use joins' is
             | a good solution either since it does create more developer
             | work almost every way you slice it.
             | 
             | I think the op's solution of looking more closely at the
             | hot paths and solving for those is a far better solution
             | than re-architecting the application in ways that could, or
             | can, create unintended consequences. People don't consider
             | that enough, at all.
             | 
             | Don't forget that hot path resolution is the antithesis of
             | 'premature optimization'.
             | 
             | > you could mitigate some of that with good management and
             | access patterns
             | 
             | the CTO fired me for making those sorts of suggestions
             | about better management, and then got fired himself a
              | couple months later... ¯\\_(ツ)_/¯ ... even with the macro
             | events, their stock is down 72% since it opened, which
             | doesn't surprise me in the least bit having been on the
             | inside...
        
         | taylodl wrote:
         | My hot take: always use a materialized view or a stored
          | procedure. _Hide the actual, physical tables from the
          | application's account!_
         | 
         | The application doesn't need to know how the data is physically
         | stored in the database. They specify the logical view they need
         | of the data. The DBAs create the materialized view/stored
         | procedure that's needed to implement that logical view.
         | 
         | Since the application is _never_ directly accessing the
         | underlying physical data, it can be changed to make the
          | retrieval more efficient without affecting any of the
          | database's users. You're also getting the experts to create the
         | required data access for you in the fastest, most efficient way
         | possible.
         | 
         | We've been doing this for years now and it works great. It's
         | alleviated so many headaches we used to have.
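The interface-contract idea above can be sketched with a plain view; this sqlite3 example is illustrative only (sqlite3 has ordinary views, not materialized ones, and the names are made up). The application queries only `app_orders`; the physical layout underneath can change without touching application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_raw (id INTEGER, cents INTEGER);
    INSERT INTO orders_raw VALUES (1, 1999);

    -- The logical contract the application codes against.
    CREATE VIEW app_orders AS
        SELECT id, cents / 100.0 AS dollars FROM orders_raw;
""")

# The app never names orders_raw; the DBA is free to repartition,
# rename, or restructure it as long as the view keeps its shape.
dollars = conn.execute(
    "SELECT dollars FROM app_orders WHERE id = 1").fetchone()[0]
```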
        
           | walterbell wrote:
           | Interface contracts and indirection FTW.
           | 
           | 2011, "Materialized Views" by Rada Chirkova and Jun Yang,
           | https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf
           | 
           |  _> We cover three fundamental problems: (1) maintaining
           | materialized views efficiently when the base tables change,
           | (2) using materialized views effectively to improve
           | performance and availability, and (3) selecting which views
           | to materialize. We also point out their connections to a few
           | other areas in database research, illustrate the benefit of
           | cross-pollination of ideas with these areas, and identify
           | several directions for research on materialized views._
        
           | downWidOutaFite wrote:
           | This doesn't work because DBAs are rarely on the dev team's
           | sprint schedule. If the DBAs are blocking them devs can and
           | will figure out how to route around the gatekeepers. In
           | general, keep the logic in the app not the db.
        
           | alfor wrote:
            | But for saves, isn't the structure visible?
        
             | taylodl wrote:
             | You can update underlying data via a materialized view.
        
         | Scarbutt wrote:
         | Normalization is not only about data storage but most
         | importantly, data integrity.
        
           | endisneigh wrote:
           | Yes, but I assert that it's possible to use transactions to
           | update everything consistently. Serializable transactions
           | weren't really common when MySQL/Postgres _first_ came out,
            | but now that they're common in new DBs + ACID, I think it's
            | now possible to do with reasonable difficulty. If you agree
            | with this, then it's easy to prove that the performance
            | increase from denormalized tables is well worth the annoyance
            | of transactionally updating all the dependencies.
           | 
           | I won't say that it's trivial to update all of your business
           | logic to do this, but I think it's definitely worth it for a
           | new project at least.
        
             | Guvante wrote:
             | You always need to compare write vs read performance.
             | 
             | Turning a single table update into a 10 table one could tip
             | your lock contention to the point where you are write bound
             | or worse start hitting retries.
             | 
              | Certainly it makes sense to move rarely updated fields to
              | where they are used.
             | 
             | Similarly "build your table against your queries not your
             | ideal data model" is always sage advice.
        
             | Bognar wrote:
             | Denormalized transactions are not trivial unless you are
             | using serializable isolation level which will kill
             | performance. If you don't use serializable isolation level,
             | then you risk either running into deadlocks (which will
             | kill performance) or inconsistency.
             | 
             | Decent SQL databases offer materialized views, which
             | probably give you what you want without all the headache of
             | maintaining denormalized tables yourself.
        
               | endisneigh wrote:
                | all fair points, but to be fair I don't necessarily think
                | this makes the most sense for an existing project, for
                | the reasons you state. I do think a new project would
                | best be able to design around the access patterns in a
                | way that eliminates most of the downsides.
        
             | williamdclt wrote:
             | Transactions are not only (actually mainly not) about
             | atomicity. Of course it's possible to keep data integrity
             | without normalisation, but that means you need to maintain
              | the invariants yourself at application level, and a bug
              | could result in data inconsistency. Normalisation isn't
             | there to make integrity possible, it's there to make (some)
             | non-integrity impossible.
             | 
             | Nobody says you have to have only one view of your data
             | though. You can have a normalised view of your data to
             | write, and another denormalised for fast reads (you usually
             | have to, at scale). Something like event sourcing is
             | another way (which is actually pushing invariants to
             | application level, in a structured way)
        
         | wizofaus wrote:
         | Can't say I've ever come across a scenario where a join itself
         | was the performance bottleneck. If there's any single principle
         | I have observed is "don't let a table get too big". More often
         | than not it's historical-record type tables that are the issue
         | - but the amount of data you need for day-to-day operations is
         | usually a tiny fraction of what's actually in the table, and
         | you're bound to start finding operations on massive tables get
            | slow no matter what indexes you have (and even the act of
            | adding more indexes becomes problematic). And just indexing
            | all columns isn't enough, for traditional RDBMSes at least -
            | you have to index the right combinations of columns for them
            | to be used. Might be different for DynamoDB.
        
           | 8note wrote:
           | Dynamo is quick for that, so long as you are picking good
           | partition keys.
           | 
           | Instead, it'll throw you hot key throttling if you start
           | querying one partition too much
        
         | wtetzner wrote:
         | I'd say that probably depends on what your hot path is. If it's
         | write-heavy, then you'll probably end up with performance
         | issues when you need to write the same data to multiple tables
         | in a single transaction. And if all of those columns are
         | indexed, it'll be even worse.
        
         | iamwil wrote:
         | If you don't use joins, how do you associate records from two
         | different tables when displaying the UI? Do you just join in
         | the application? Or something else?
        
           | endisneigh wrote:
           | this has opinionated answers.
           | 
           | if you ask Amazon, they might suggest that you design around
           | a single table
           | (https://aws.amazon.com/blogs/compute/creating-a-single-
           | table...).
           | 
            | in my opinion it's easier to use join tables - essentially
            | what a join temporarily creates anyway.
           | in this case, you permanently create table1, table2, and
           | table1_join_table2, and keep all three in sync
           | transactionally. when you need a join you just select on
           | table1_join_table2. you might think this is a waste of space,
           | but I'd argue storage is too cheap for you to be thinking
           | about that.
           | 
           | that being said, you really have to design around your access
           | patterns, don't design your application around your schema.
           | most people do the latter because it seems more natural. what
           | this might mean in practice is that you do mockups of all of
           | the expected pages and what data is necessary on each one.
           | _then_ you design a schema that results in you never having
           | to do joins on the majority, if not all, of them.
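A minimal sqlite3 sketch of the permanent join-table scheme described above (the `table1`/`table2`/`table1_join_table2` names come from the comment; everything else is illustrative): all three tables are written in one transaction, and the hot-path read is a single-table select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER, note TEXT);
    -- The precomputed, permanently stored join result.
    CREATE TABLE table1_join_table2 (t1_id INTEGER, t2_id INTEGER,
                                     name TEXT, note TEXT);
""")

# One transaction keeps all three tables in sync.
with conn:
    conn.execute("INSERT INTO table1 VALUES (1, 'widget')")
    conn.execute("INSERT INTO table2 VALUES (10, 1, 'in stock')")
    conn.execute(
        "INSERT INTO table1_join_table2 VALUES (1, 10, 'widget', 'in stock')")

# The hot-path read needs no JOIN at all.
name, note = conn.execute(
    "SELECT name, note FROM table1_join_table2 WHERE t1_id = 1").fetchone()
```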
        
             | sainez wrote:
             | > what this might mean in practice is that you do mockups
             | of all of the expected pages and what data is necessary on
             | each one. then you design a schema that results in you
             | never having to do joins on the majority, if not all, of
             | them.
             | 
             | Great suggestion! I had a role where I helped a small team
             | develop a full stack, data-heavy application. I felt pretty
             | good about the individual layers but I felt we could have
             | done a better job at achieving cohesion in the big picture.
             | Do you have any resources where people think about these
             | sorts of things deeply?
        
               | walterbell wrote:
               | 2001, "Denormalization effects on performance of RDBMS",
               | by G. L. Sanders and Seungkyoon Shin,
               | https://www.semanticscholar.org/paper/Denormalization-
               | effect...
               | 
               |  _> We have suggested using denormalization as an
               | intermediate step between logical and physical modeling,
               | to be used as an analytic procedure for the design of the
               | applications requirements criteria ... The guidelines and
               | methodology presented are sufficiently general, and they
               | can be applicable to most databases ... denormalization
               | can enhance query performance when it is deployed with a
               | complete understanding of application requirements._
               | 
               | PDF: https://web.archive.org/web/20171201030308/https://p
               | dfs.sema...
        
               | endisneigh wrote:
               | yeah, exactly. in my experience the vast majority of
               | access patterns are designed around a normalized schema,
               | where it really should be that the schema is designed
               | around the access patterns and generously "denormalize"
               | (which doesn't make sense in this context of a new
               | database) as necessary.
        
           | joshstrange wrote:
           | Single Table Design is the way forward here. I can highly
           | recommend The DynamoDB Book [0] and anything (talks, blogs,
           | etc) that Rick Houlihan has put out. In previous discussions
           | the author shared a coupon code ("HACKERNEWS") that will take
           | $20-$50 off the cost depending on the package you buy. It
           | worked earlier this year for me when I bought the book. It
           | was very helpful and I referred back to it a number of times.
           | This github repo [1] is also a wealth of information
           | (maintained by the same guy who wrote the book).
           | 
           | As an added data point I don't really like programming books
           | but bought this since the data out there on Single Table
           | Design was sparse or not well organized, it was worth every
           | penny for me.
           | 
           | [0] https://www.dynamodbbook.com/
           | 
           | [1] https://github.com/alexdebrie/awesome-dynamodb
        
             | deely3 wrote:
              | And if you don't want to spend money, you can get the idea
              | from
             | this article:
             | 
             | https://www.alexdebrie.com/posts/dynamodb-single-table/
             | 
              | I'm really curious about real-life performance on different
             | databases, especially in situation where RAM is smaller
             | than database size.
        
               | wizofaus wrote:
                | That article didn't appear to be suggesting single-table
                | design was appropriate for general-purpose RDBMSes (or
                | any database other than DynamoDB).
        
         | i_like_apis wrote:
         | Yes I like the zero joins on hot paths approach. It can be hard
         | to sell people on it. It's a great decision for scaling though.
        
         | skybrian wrote:
         | I'm wondering if indexes and materialized views can be used to
         | do basically the same thing? That is, assuming they contain all
         | the columns you want.
        
           | latchkey wrote:
           | The issue is writes, not reads.
        
           | giantrobot wrote:
           | There's always money in the banana sta...materialized views.
           | Materialized views will get you quite a ways on read heavy
           | workloads.
        
         | macNchz wrote:
         | Over the years I think I've encountered more pain from
         | applications where the devs leaned on denormalization than from
         | those that developed issues with join performance on large
         | tables.
         | 
         | You can mash those big joins into a materialized view or ETL
         | them into a column store or whatever you need to fix
         | performance later on, but once someone has copied the
         | `subtotal_cents` column onto the Order, Invoice, Payment,
         | NotificationEmail, and UserProfileRecentOrders models, and
         | they're referenced and/or updated in 296 different
         | places...it's a long road back to sanity.
        
       | klodolph wrote:
       | I have personally witnessed the "let's build microservices to get
       | better performance" argument. I definitely want to nip that in
       | the bud.
       | 
       | It's easy to fall in love with complexity, especially since you
       | see a lot of complexity in existing systems. But those systems
       | became complex as they evolved to meet user needs, or for other
       | reasons, over time. Complex systems are impressive, but you need
       | to make sure that your team has people who recognize the heavy
       | costs of complexity, and who can throw their engineering efforts
       | directly against the most important problems your team faces.
        
       | jsight wrote:
       | I blame the easy availability of additional resources in the
       | cloud for a lot of problems here. Prod db slow? Get a bigger EC2
       | instance. Still slow? Hmm, maybe bigger again! Why bother tuning.
       | 
       | Now... Who knows why our AWS bill is so high?
       | 
       | With real hardware in a DC, you'd have to justify large capital
       | expenditures to do something that stupid.
        
       | gary_0 wrote:
       | No mention of caching? If your database is getting hammered with
       | SELECTs, isn't putting a cache in front of it something that
       | should at least be considered?
        
         | deathanatos wrote:
         | I've been in the OP's situation, and this exact suggestion was
         | made in my case. Welcome to one of the hardest problems in CS:
         | cache invalidation.
         | 
         | If you have a dataset for which cache invalidation is easy
         | (e.g., data that is written and never updated), yeah,
         | absolutely go for this.
         | 
         | In our case, and most cases I've seen, it wasn't so simple, and
         | "split this off to a DB better suited to it" was less complex
          | (maybe still a lot of work, but conceptually _simple_) than
         | figuring out cache invalidation.
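The invalidation problem above can be shown in a few lines; this is a toy read-through cache (all names illustrative, `db` standing in for the real database): every write path must remember to evict, and any writer that bypasses the eviction serves stale reads forever.

```python
db = {"user:1": "Ada"}   # stand-in for the real database
cache = {}
db_reads = 0             # counts how often we actually hit the "database"

def get(key):
    global db_reads
    if key in cache:           # cache hit: no database read
        return cache[key]
    db_reads += 1
    cache[key] = db[key]       # read-through: populate on miss
    return cache[key]

def put(key, value):
    db[key] = value
    cache.pop(key, None)       # forget this line and reads go stale

get("user:1"); get("user:1")   # second read served from cache
put("user:1", "Grace")
fresh = get("user:1")          # re-fetched after invalidation
```

For append-only data the `put` path never runs against existing keys, which is why the comment calls that case easy.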
        
           | lern_too_spel wrote:
           | There are systems that will do that for you like
           | https://readyset.io/.
        
         | Scarbutt wrote:
         | They mentioned adding a DB replica for reads.
        
       | jakey_bakey wrote:
       | [The Grug Brained Developer](https://grugbrain.dev/)
        
       | sssspppp wrote:
       | Love this post. I've been trying to tell my manager the same
       | message for the last few months (with little success). We're
       | about to embark on a massive migration to "next-gen
       | infrastructure" (read: three different Redshift clusters managed
       | by CDK) because our overloaded Redshift cluster (already maxed
       | out with RA3 nodes) has melted down one too many times. The next-
       | gen infra is significantly more complex than our existing setup
       | and I'm not convinced this migration will be the silver bullet
       | everyone is hoping for.
        
       | iamwil wrote:
        | Ugh. I had a colleague who addressed any scaling problem by
       | putting a cache in front of the DB. Praised for solving the
       | immediate problem, but shouldered none of the costs. </rant>
       | 
        | I admit in the face of finding product/market fit, you do the
       | expedient thing, but damned if I'm not often at the receiving end
       | of these sorts of decisions.
        
         | aidos wrote:
         | Interestingly, I often ask candidates about optimising a slow
         | running db query and the majority of people jump to adding
         | caching and very few ask if they can run an explain or see the
         | indexes.
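For what it's worth, the "run an explain" answer takes only a few lines to demonstrate; here SQLite's EXPLAIN QUERY PLAN stands in for Postgres's EXPLAIN ANALYZE, and the table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT)")

# Without an index the planner falls back to a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_before[0][3])   # e.g. "SCAN orders"

# Adding the index turns the scan into an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_after[0][3])    # e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

No cache needed if the second plan is what production is missing.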
        
           | tedunangst wrote:
           | "I would make the slow query faster" seems too obvious an
           | answer for an interview question.
        
       | andrewstuart wrote:
       | Isn't Rails wasteful in its database access patterns?
        
         | iamwil wrote:
         | Generally No. But it can be easy to write bad queries using
         | ActiveRecord ORM if you're not aware of N + 1 problems.
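A bare-bones sketch of what N + 1 looks like under the hood (plain sqlite3 here rather than ActiveRecord; the tables are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE comments (post_id INT, body TEXT)")
for i in range(1, 4):
    conn.execute("INSERT INTO posts VALUES (?)", (i,))
    conn.execute("INSERT INTO comments VALUES (?, 'hi')", (i,))

# The N+1 pattern: 1 query for the posts, then 1 more per post.
post_ids = [r[0] for r in conn.execute("SELECT id FROM posts")]
queries = 1
for pid in post_ids:
    conn.execute("SELECT body FROM comments WHERE post_id = ?", (pid,)).fetchall()
    queries += 1
print(queries)  # 4 queries for 3 posts; grows linearly with the data

# Eager loading: 2 queries total, no matter how many posts there are.
placeholders = ",".join("?" for _ in post_ids)
rows = conn.execute(
    "SELECT post_id, body FROM comments WHERE post_id IN (%s)" % placeholders,
    post_ids).fetchall()
```

An ORM's lazy association loading generates the first shape by default; `includes`/eager loading generates the second.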
        
           | romafirst3 wrote:
            | 100%, it makes it easy for bad programmers to write poorly
            | performing queries, but you can easily write performant code.
           | Btw that's a feature - letting people ramp up to full db
           | knowledge is beneficial, you don't want to be spending your
           | time writing performant queries before you need to.
        
             | topspin wrote:
             | > it makes it easy for bad programmers to write bad
             | performing queries
             | 
             | That is true of every ORM in existence. The easiest thing
             | to do is naively follow the object graph in code, because
              | that's what the ORM gives you. If the ORM were to somehow
             | add friction here to encourage some other approach it would
             | be panned as "too hard!!1" and fade away into obscurity.
        
       | Joel_Mckay wrote:
       | The Monolith is often a marker of several naive assumptions.
       | 
       | Yet some interesting patterns will emerge if teams accept some
       | basic constraints:
       | 
        | 1. A low-cpu-power client-process is identical to a
        | resource-taxed server-process
       | 
        | 2. A system's client-server pattern will inevitably become
        | functionally equivalent to inter-server traffic. Thus, the
        | assumption that all high-performance systems degenerate into a
        | hosted peer-to-peer model will counterintuitively generalize.
        | Accordingly, if you accept this fact early, then one may avoid
        | re-writing a code-base 3 times and trying to reconcile a bodged
        | API.
       | 
       | 3. Forwarding meaningful information does not mean collecting
       | verbose telemetry, then trying to use data-science to fix your
       | business model later. Assume you will eventually either have
       | high-latency queuing, or start pooling users into siloed
       | contexts. In either case, the faulty idea of a single database
        | shared-state will need to be seriously reconsidered at around 40k
       | users, and later abandoned after around 13m users.
       | 
        | 4. Sharding only buys time at the cost of reliability. You may
       | disagree, but one will need to restart a partitioned-cluster
       | under heavy-load to understand why.
       | 
       | 5. All complex systems fail in improbable ways. Eventually
       | consistent is usually better than sometimes broken. Thus,
        | solutions like Erlang/Elixir have been around for a while...
       | perhaps the OTP offers a unique set of tradeoffs.
       | 
        | 6. Everyone thinks these constraints don't apply at first. Thus,
        | they will repeat the same tantalizing... yet terrible design
       | choices... others have repeated for 40+ years.
       | 
       | Good luck, =) J
        
       | 39 wrote:
       | Strangely obvious advice?
        
         | iamwil wrote:
         | That no one likes to follow.
        
         | sfink wrote:
         | Well, the advice is rarely taken in practice. It is (in my
         | experience, and it seems common from others based on what I've
         | heard) very very common to jump to the complicated solution at
         | the first hint of capacity issues "because we'll need to do it
         | eventually anyway."
         | 
         | The advice is obvious when you're thinking at that level of
         | abstraction. Which suggests that, in practice, people who are
         | architecting such systems rarely think at that level of
         | abstraction. Which is why it is nice to have posts like this,
         | that periodically remind us to get our heads out of the daily
         | minutiae and consider the bigger picture (of complexity
         | tradeoffs, realistic projections, staffing and availability,
         | etc.)
        
         | bryanlarsen wrote:
         | "Common sense is not so common."
         | 
         | - Voltaire
        
         | joelshep wrote:
         | It might be obvious as far as it goes, but it's also incomplete
          | in at least two ways. One is that tweaks and optimizations and
          | "supplementing the system in some way" often involve increasing
          | its complexity, even if just a little bit at a time. It adds up
          | over time. The more important thing is this: if
         | you're already constrained on vertical scaling, and you don't
         | have a firm grip on how fast your system is scaling, then you
         | can't just stop with making the db more efficient. That's just
         | postponing the inevitable, and possibly not for more than a
         | couple of years. If you're in the position the author portrays,
         | get the database under control first -- for sure -- but then
         | get started on figuring out how you're going to stay in front
         | of your scaling problem, whether that's rearchitecture, off-
         | loading work to systems better suited for it, or whatever.
         | Speaking as a former owner of a very large Amazon database that
         | fought this battle many times, trying to buy enough dev time to
         | build away from it before it completely collapsed. We were too
         | content with performance improvements just like the ones
         | described in this article, before finally recognizing we were
         | just racing the clock.
        
       | huijzer wrote:
       | > We should always put off significant complexity increases as
       | long as possible.
       | 
        | Reminds me of the mantra I've read here: go for reversible things
        | easily, and be very careful when going for irreversible things.
        
         | sssspppp wrote:
         | Amazon's one way vs two way door decisions echo the sentiment
        
         | sainez wrote:
         | It is mentioned in this article about the inception of AWS's
         | custom silicon: https://semiconductor.substack.com/p/on-the-
         | origins-of-aws-c...
         | 
         | > "We use the terms one-way door and two-way door to describe
         | the risk of a decision at Amazon. Two-way door decisions have a
         | low cost of failure and are easy to undo, while one-way door
         | decisions have a high cost of failure and are hard to undo. We
         | make two-way door decisions quickly, knowing that speed of
         | execution is key, but we make one-way door decisions slowly and
         | far more deliberately."
        
       | TX81Z wrote:
       | Really curious how much can be attributed to using an ORM.
        
       | kunalgupta wrote:
        | I would definitely do the opposite of this - 3 months is a while,
        | and I think it would take a long time before the cost of
        | complexity compared.
        
       | phirschybar wrote:
       | I agree with this approach. the other added benefit is that when
       | they decided to optimize the app by eliminating or tuning queries
       | and utilizing replicas for reads, they ultimately made the app
       | much more performant while possibly reducing complexity. the
       | "squeeze" mindset pays off in the long-run here. the continued
       | optimization over time is infinitely better than adding the
       | complexity of microservices or expanded infrastructure because
       | the latter will simply bury and compound the potential
       | optimizations which could AND SHOULD have been made. squeeze
       | squeeze squeeze until you just can't squeeze any more!
        
       | nathias wrote:
        | Complexity in software is bad, but things can be both bad and
        | necessary. It's bad in itself, yet sometimes it can provide new
       | functionality...
        
       | alfalfasprout wrote:
       | The problem is this is also a myopic way of looking at things.
       | What you should be looking at is also operational complexity.
       | What's the current burden on your org/team maintaining what you
       | currently have? What about when you need to scale even higher?
       | 
       | A lot of teams that think this way end up with really high oncall
       | burdens and then never have the time to even iterate on their
       | infrastructure.
        
       | iblaine wrote:
        | TL;DR: do the easy things first; in this case it was to fix bad
        | SQL.
       | 
       | Given the options to optimize SQL, move read operations to
       | replicas, shard data or go towards micro services, optimizing SQL
       | is the easy choice.
        
         | bayindirh wrote:
         | Actually, I disagree. The "TL;DR:" in the article is "first
         | outgrow, then upgrade". In today's software development
         | practice, efficiency is second class citizen, because moving
         | fast and breaking things is the way to keep the momentum and be
         | hip.
         | 
         | However, sometimes everyone needs to chill and sharpen the tool
         | they have at hand. It might prove much more capable than first
         | anticipated. Or you may be holding the tool wrong to a degree.
        
       | maxboone wrote:
       | Relevant blog on improving PostgreSQL performance on ZFS:
       | https://news.ycombinator.com/item?id=29647645
        
       | notnmeyer wrote:
       | haha, when i read their initial thoughts were write-sharding and
       | microservices i whispered "wtf?" to myself.
       | 
       | glad to see there was a better ending to the story though.
        
       | discussDev wrote:
        | It's the boring solution. It should only be the default answer if
        | you are not building a system critical to life and limb. But it
        | certainly gives a much lower total cost of ownership. When teams
        | don't have the resources for some big redundant system, I've too
        | often seen the complexity added by the redundant system become
        | the issue, rather than a focus on simplicity. If you need to add
        | a bunch of people to support complexity but neither the money nor
        | the risk assessment calls for it, simpler is much better. I won't
        | say I haven't seen the case where eventually a huge project was
        | the only way forward, but I tend to think sometimes even that is
        | less than the sum of having dealt with complexity up to that
        | point. It depends a lot on what you are building.
        
       | alfor wrote:
        | I wonder if moving the db onto beefy dedicated hardware with tons
        | of ram and nvme would solve the problem. Preferably physically
        | connected to the web servers.
        | 
        | Cost: a fraction of the developer cost.
        | 
        | I see so many things done on the cloud that 10X their complexity
        | because of it. Modern hardware is incredibly powerful.
        
       | sakopov wrote:
       | I thought I was going to read something insightful. Instead it
       | was a post about how to completely ignore your database
       | performance and then consider overcomplicating everything with
       | sharding and microservices because you didn't care to do basic
       | profiling on your queries. I'm glad common sense prevailed, but
       | this is really some junior-level stuff and it's being celebrated
       | as some kind of novelty.
        
       | account-5 wrote:
       | I suppose it seems obvious in hindsight that your first move
       | should always be to investigate potential causes before a
       | wholesale redesign that adds potentially unnecessary complexity
       | to your system.
        
       | exabrial wrote:
       | This is amazing advice. A side note is to use the hell out of
        | replication. These things don't have to be complicated. Set up a
       | readonly and a readwrite datasource/connection pool in your app
       | if you have to.
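A rough sketch of that split, assuming nothing beyond two connection handles (sqlite3's shared-cache in-memory database stands in for a primary and its replica; all names are made up):

```python
import sqlite3

class RoutingDB:
    """Route reads to the replica handle and writes to the primary."""
    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def query(self, sql, params=()):
        # Reads go to the read-only datasource.
        return self.replica.execute(sql, params).fetchall()

    def execute(self, sql, params=()):
        # Writes go to the read-write datasource.
        cur = self.primary.execute(sql, params)
        self.primary.commit()
        return cur.rowcount

# Two handles onto the "same" database, as a primary/replica stand-in.
primary = sqlite3.connect("file:demo?mode=memory&cache=shared", uri=True)
replica = sqlite3.connect("file:demo?mode=memory&cache=shared", uri=True)
db = RoutingDB(primary, replica)
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
print(db.query("SELECT name FROM users WHERE id = ?", (1,)))
```

With real replication you'd also account for replica lag on read-after-write paths, which this toy ignores.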
        
       | i_like_apis wrote:
       | I'm reminded of one of my favorite sayings:
       | 
       |  _You go to war with the army you have, not the army you might
       | want or wish to have at a later time._
       | 
        | You may want to ignore that this comes from Donald Rumsfeld
       | (he has some great ones though: "unknown unknowns ...", etc.)
       | 
       | I think about this a lot when working on teams. Everyone is not
       | perfectly agreeable or has the same understanding or collective
       | goals. Some may be suboptimal or prone to doing things you don't
       | prefer. But having a team is better than no team, so find the
       | best way to accomplish goals with the one you have.
       | 
       | It applies to systems well too.
        
         | fuzztester wrote:
         | "No battle plan survives contact with the enemy."
         | 
         | https://www.google.com/search?q=no+battle+plan+survives
        
           | sbuk wrote:
           | Mike Tyson said it more simply: "Everybody has a plan until
           | you get hit in the face."
        
           | fuzztester wrote:
           | https://en.m.wikipedia.org/wiki/Helmuth_von_Moltke_the_Elder
           | 
           | Moltke's thesis was that military strategy had to be
           | understood as a system of options, since it was possible to
           | plan only the beginning of a military operation. As a result,
           | he considered the main task of military leaders to consist in
           | the extensive preparation of all possible outcomes.[3] His
           | thesis can be summed up by two statements, one famous and one
           | less so, translated into English as "No plan of operations
           | extends with certainty beyond the first encounter with the
           | enemy's main strength" (or "no plan survives contact with the
           | enemy") and "Strategy is a system of expedients".[18][8]
           | Right before the Austro-Prussian War, Moltke was promoted to
           | General of the Infantry.[8]
        
         | makeitdouble wrote:
          | I've been thinking about this quote for a while but have a hard
          | time squeezing the meaning, or really the actionable part, out
          | of it.
          | 
          | The unknown unknowns quote brings the concept that however
          | confident you are in a plan, you absolutely need margin. The
          | other quote though...what do you do differently when
          | understanding that your team is not perfect?
         | 
         | On one side, outside of VC backed startups I don't see
          | companies trying to reinvent linux with a team of 4 new
         | graduates. On the other side companies with really big goals
         | will hire a bunch until they feel comfortable with their talent
         | before "going to war". You'll see recruiting posts seeking
         | specialists in a field before a company bets the farm on that
         | specific field (imagine Facebook renaming itself to Meta before
         | owning Oculus...nobody does that[0])
         | 
         | Edit: sorry, I forgot some guy actually just did that 2 weeks
         | ago with a major social platform. And I kinda wanted to forget
         | about it I think.
        
         | sainez wrote:
         | Great point about working on teams. For the vast majority of
         | tasks, people are only marginally better or worse than each
         | other. A few people with decent communication will outpace a
         | "star" any day of the week.
         | 
         | I try to remind myself of this fact when I'm frustrated with
         | other people. A bit of humility and gratitude go a long way.
        
         | tedunangst wrote:
         | Mattis "the enemy gets a vote" is another good reminder of
         | reality, although people get very angry about it. Useful in
         | terms of security, privacy, DRM, etc.
        
           | walterbell wrote:
           | Product management outside the box.
        
         | Buttons840 wrote:
         | I like a similar quote from Steven Pressfield:
         | 
         | "The athlete knows the day will never come when he wakes up
         | pain-free. He has to play hurt."
         | 
         | This applies to ourselves more than our systems though.
        
         | roughly wrote:
         | Rumsfeld's got some great quotes, most of which were delivered
         | in the context of explaining how the Iraq war turned into such
         | a clusterfuck, and boy could that whole situation have used the
         | kind of leadership Donald Rumsfeld's quotes would lead you to
         | believe the man could've provided.
        
           | xapata wrote:
           | > could've
           | 
           | If someone is 83.7% likely to provide good leadership, how
           | would you evaluate the choice to hire that person as a leader
           | in the hindsight that the person failed to provide good
           | leadership -- was it a bad choice, or was it a good choice
           | that was unlucky?
           | 
           | (Likelihood was selected arbitrarily.)
        
             | hluska wrote:
             | Like everything in politics, I think this is a function of
             | what team you cheer for. If your goal was to come up with
             | an excuse to invade Iraq, that person was an excellent
             | choice. If you're on the other team, what a clusterfuck.
             | 
             | Then you add in a party system and it gets more
             | complicated. Realistically, you don't get to be the United
             | States Secretary of Defense (twice) if you're the kind of
             | person who will ignore the will of the party and whoever is
             | President.
        
             | whatshisface wrote:
             | > _quotes would lead you to believe_
        
           | dragonwriter wrote:
           | > Rumsfeld's got some great quotes, most of which were
           | delivered in the context of explaining how the Iraq war
           | turned into such a clusterfuck
           | 
           | If by "explaining how" you mean "deflecting (often
           | preemptively) responsibility for", yes.
        
           | marcosdumay wrote:
           | If I remember it correctly (it was a long time ago), he never
           | fully supported the war. It didn't take a genius to notice
           | that the goals set by the presidency were (literally)
            | impossible, and not the kind of thing you achieve with a war.
           | 
           | But whatever position he had, Iraq turning into a clusterfuck
            | wasn't a sign of bad leadership on his part. It was a sign of
           | bad ethics, but not leadership. His options were all of
           | getting out of his position, disobeying the people above him,
           | or leading the US into a clusterfuck.
        
             | mickdeek86 wrote:
             | Rumsfeld personally advanced the de-baathification
             | directive - the lynchpin of the clusterfuckery - all on his
             | own, and he certainly would have known to expect the
             | 'unexpected' results to be similar to de-nazification. This
             | was absolutely his choice. Another point you have
             | (unintentionally?) brought up is the dignified resignation
             | option. While it is often a naive, self-serving gesture, we
             | can reasonably imagine that the Defense Secretary publicly
             | resigning over opposition to a war during the public
             | consideration of that war, might have had some effect on
             | whether that war was started. I want to like him too, with
              | his grandfatherly demeanor and genuine funniness ("My god,
             | were there so many vases?!") but, come on.
        
           | moffkalast wrote:
           | Could've at least given them some motivational quotes.
        
           | hluska wrote:
           | I like to remind myself that very few people reach positions
           | of great power after mediocre lives. Rather there's a thread
           | of talent that runs through government.
           | 
           | Once they're in, the predilections that led to power often
           | rear their dark long tails. But they're all (even the ones I
           | disagree with) talented.
        
             | patmcc wrote:
             | They're talented at getting into power, and may be talented
             | at any number of other things.
             | 
             | They're not always talented at the things we may want them
             | to be, unfortunately. And that's true of both the ones I
             | agree and disagree with.
        
         | KnobbleMcKnees wrote:
         | That was Donald Rumsfeld!? I always assumed this came from some
         | techie or agile guru given how much it's used as a concept in
         | project planning.
        
           | a_seattle_ian wrote:
           | That it came from Donald Rumsfeld in the context of what we
           | know now and what he surely knew then is why it's such a good
           | quote. The words basically say nothing but are also true
            | about everything. So it can implicitly be a warning that there
           | is probably some bullshit going on or someone has a sense of
           | humor and is also warning people while also avoiding the
           | subject - of course just my opinion. How people actually use
           | it will depend what the audience agrees it to mean.
        
             | [deleted]
        
           | midasuni wrote:
           | And unknown unknowns is a great way to communicate with
           | stakeholders too
        
             | roughly wrote:
             | Zizek has a followup to that quote:
             | 
             | "What he forgot to add was the crucial fourth term: the
             | "unknown knowns," the things we don't know that we know."
             | 
             | I've found it's really critical during the project planning
             | phase to get to not just where the boundaries of our
             | knowledge are, but also where are the things we're either
             | tacitly assuming or not even aware that we've assumed. An
             | awful lot of postmortems I've been a part of have come down
             | to "It didn't occur to us that could happen."
        
               | munificent wrote:
                | _> An awful lot of postmortems I've been a part of have
               | come down to "It didn't occur to us that could happen." _
               | 
               | Would that not be an unknown unknown?
        
               | roughly wrote:
               | Usually there's a tacit assumption of how the system
               | works, how the users are using the system, or something
               | else about the system or the environment that causes that
               | - it's not that the answer wasn't known, it's that it was
               | assumed to be something it wasn't and nobody realized
               | that was an assumption and not a fact.
        
               | thfuran wrote:
               | That's just an unknown unknown masquerading as a known
               | known.
        
               | waprin wrote:
               | I really enjoy the concept of unknown knowns, but I don't
               | agree with your example, which is an unknown unknown.
               | 
               | To me the corporate version of the unknown known is when
                | a project is certainly doomed, for reasons everyone on
               | the ground knows about, yet nobody wants to say anything
               | and be the messenger that inevitably gets killed, as long
                | as the paycheck keeps clearing. An exec ten thousand feet
               | from the ground sets a "vision" which can't be blown off
               | course by minor details such as reality, until the day it
               | does.
               | 
               | Theranos is a famous example of this but I've had less
               | extreme versions happen to me many times throughout my
               | career.
               | 
               | Another example of unknown knowns might be the conflict
                | between companies' stated values (Focus on the User) and
               | the unstated values that are often much more important
               | (Make Lots of Money)
        
           | killjoywashere wrote:
           | As a military officer who was watching CNN live from inside
           | an aircraft carrier (moored) when he said that, being in
           | charge of anti-terrorism on the ship at the time, it was
           | absolutely foundational to my approach to so many things
           | after that. Here's the actual footage:
           | https://www.youtube.com/watch?v=REWeBzGuzCc
           | 
           | Rumsfeld was complicated, but there's no doubt he was very
           | effective at leading the Department. I think most people fail
           | to realize how sophisticated the Office of the Secretary of
            | Defense is. Their resources make the mind reel, most of all
           | human capital, many with PhDs, many very savvy political
           | operators with stunning operational experiences. As a small
           | example, as I recall, Google's hallowed SRE system was
           | developed by an engineer who had come up through the ranks of
           | Navy nuclear power. That's but one small component reporting
           | into OSD.
           | 
           | Not a Rumsfeld apologist, by any means. Errol Morris did a
           | good job showing the man for who he is, and it's not pretty
           | (1). But reading HN comments opining about the leadership
           | qualities of a Navy fighter pilot who was both the youngest
           | and oldest SECDEF makes me realize how the Internet lets
           | people indulge in a Dunning-Kruger situation the likes of
           | which humanity has never seen.
           | 
           | https://www.amazon.com/Known-Donald-Rumsfeld/dp/B00JGMJ914
        
             | michael1999 wrote:
             | I'll support you there. In any sensible reading of
             | Nuremberg, they all deserve to hang from the neck until
             | dead. But the central moral failure was Bush. Letting
             | Cheney hijack the vp search, and then pairing him up with
             | Rumsfeld was a bad move, and obviously bad at the time.
             | Those two had established themselves as brilliant but
             | paranoid kooks with their Team B fantasies in the 70s, and
             | should never have been allowed free rein.
        
         | oDot wrote:
         | Every time I hear the name Rumsfeld, I am reminded of the time
         | when, for over 10 minutes, he refused to deny being a lizard:
         | 
         | https://www.youtube.com/watch?v=XH_34tqxAjA
        
       | macNchz wrote:
       | In my experience, in web apps built on top of ORMs there is often
       | a TON of low hanging fruit for query optimization when database
       | load becomes an issue. Beyond the basics of "do we have N+1
       | issues", ORMs sometimes just don't generate optimal queries. I
        | wouldn't want to build a complex production web app _without_ an
       | ORM, but being able to eject from it sometimes is key.
       | 
       | Profile real world queries being run in production that use the
       | most resources. Take a look at them. Get a sense of the shape of
       | the tables that they're running against. Sometimes the ORM will
       | be using a join where you actually want a subquery. Sometimes the
       | opposite. Sometimes you'll want to aggregate some results
       | beforehand, or adjust the WHERE conditions in a complex join.
       | I've seen situations where a semi-frequent ORM-generated query
       | was murdering the DB, taking 20+ seconds to run, and with a few
       | minor tweaks it would run in less than a second.
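One cheap way to start that profiling step, sketched with sqlite3's trace callback standing in for pg_stat_statements or a slow-query log (the schema is hypothetical):

```python
import collections
import sqlite3

conn = sqlite3.connect(":memory:")
seen = collections.Counter()
# Count every statement the "app" actually runs, keyed by its first word.
# (The driver's implicit BEGIN shows up here too.)
conn.set_trace_callback(lambda sql: seen.update([sql.split()[0].upper()]))

conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INT)")
for i in range(5):
    conn.execute("INSERT INTO t VALUES (?, ?)", (i, i))
conn.execute("SELECT sum(v) FROM t").fetchone()

print(seen.most_common())  # INSERT dominates -- that's where to look first
```

In production you'd rank by total time rather than count, but the principle is the same: measure what actually runs before rewriting anything.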
        
         | nerdponx wrote:
         | I'm working on something right now with the Python ORM
         | SQLAlchemy. It turns out that getting it to use RETURNING with
         | INSERT is not trivial and requires you to set the non-obvious
         | option `expire_on_commit=False`, which doesn't _guarantee_ use
         | of RETURNING, but is supposed to use it if your db driver and
         | database happen to support it and the ORM happens to support it
         | for that particular combination of driver and database. And
          | there's no API to actually inspect the generated SQL even
         | though it's emitted in the logs, so there's no way to enforce
         | the use of RETURNING in your test suite without capturing and
         | scraping your own logs (which fortunately is very easy within
         | the Pytest framework).
         | 
         | I like ORMs but this is just frustratingly complicated on so
         | many levels. I also understand that SQLAlchemy is an enormous
         | library and not everything will be easy. But I think this case
         | exemplifies the trade-offs involved with using an ORM.
         | 
         | (Yes I am aware that using insert() itself in Core does what I
         | want, I'm talking about .add()-ing an ORM object to an
         | AsyncSession).
        
           | bootsmann wrote:
           | There is certainly an API to inspect your query, you can just
           | call print() on the object iirc.
        
       | sheepz wrote:
       | Agree wholeheartedly with the conclusion of the article.
       | 
       | But the post makes it seem that there was no real query-level
       | monitoring for the Postgres instance in place, other than perhaps
       | the basic CPU/memory ones provided by the cloud provider. Using
        | an ORM without this kind of monitoring is a sure way to shoot
       | yourself in the foot with n+1 queries, queries not using
       | indexes/missing indexes etc
       | 
        | The other thing that is amazing is that everyone immediately
        | reached for redesigning the system without analyzing the cause of
        | the issues. A single postgres instance can do a lot!
        
         | PeledYuval wrote:
         | What's your recommended way of implementing this in a simple
         | App Server <> Postgres architecture? Is there a good Postgres
         | plugin or do you utilize something on the App side?
        
           | clintonb wrote:
           | We use Datadog, which centralizes logs and application
           | traces, allowing us to better pinpoint the exact request/code
           | path making the slow query.
        
           | sheepz wrote:
           | I've used pganalyze which is a non-free SaaS tool. Gives you
           | a very good overview of where the DB time is spent with index
           | suggestions etc. There are free alternatives, but require
           | more work from you.
        
       | gillh wrote:
       | Prioritized load shedding works well as a last resort [0]. The
       | idea is simple -
       | 
       | - Detect overload/congestion build-up at the database
       | 
       | - Apply queueing at the gateway service and schedule requests
       | based on their priority
       | 
       | - Shed excess requests after a timeout
       | 
       | [0]: https://docs.fluxninja.com/blog/protecting-postgresql-
       | with-a...
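Those three steps can be sketched as a toy in-process scheduler (the class, names, and timeout policy here are invented for illustration; real implementations live in a gateway or proxy):

```python
import heapq
import time

class LoadShedder:
    """Toy priority scheduler: requests queue up, the highest-priority
    one is served first, and anything that has waited past its timeout
    is shed instead of being served."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._heap = []  # (-priority, seq, enqueued_at, request)
        self._seq = 0    # tie-breaker so heap never compares requests

    def enqueue(self, request, priority):
        heapq.heappush(
            self._heap, (-priority, self._seq, time.monotonic(), request)
        )
        self._seq += 1

    def next_request(self, now=None):
        """Pop the next request to serve; silently shed expired ones."""
        if now is None:
            now = time.monotonic()
        while self._heap:
            _, _, enqueued_at, request = heapq.heappop(self._heap)
            if now - enqueued_at <= self.timeout_s:
                return request
            # waited longer than the timeout: shed (drop) it
        return None
```

Detecting "overload build-up at the database" is the hard part in practice; connection-pool wait time or replication lag are common proxies.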
        
       | Xeoncross wrote:
       | > The real cost of increased complexity - often the much larger
       | cost - is attention.
       | 
       | ...or just mental load. I'm tired of working on micro-service
       | systems that still have downtime, but no one knows how it all
       | works. Most are actually just distributed monoliths so changes
       | often touch multiple services and have to be rolled out in order.
       | Data has to be duplicated, tasks have to be synchronized, state
       | has to be shared, etc...
       | 
       | https://www.youtube.com/watch?v=y8OnoxKotPQ
        
         | javajosh wrote:
         | This is a very common architectural smell, when you have
         | uservices and "no-one knows how they all work". The whole point
         | is that no-one can or should know how they all work; the fact
         | that someone does in order to fix or modify the system is a
         | strong signal that you've violated some of the rules - like
         | single responsibility, and proper abstraction through API. But,
         | in my experience, this is extremely common - debugging a
         | pipeline of N microservices often requires running and building
         | all N services locally. This is, strictly speaking, a monolith
         | + network partitions + (infinite) build/deploy variation. An
         | extremely challenging work environment that is ultimately
         | beyond any mortal programmer's ability, IMO.
        
       | mandevil wrote:
        | He says to avoid complexity, and the team he was on (cleaning up
        | some bad queries) was probably improving along that axis (or at
        | worst orthogonal to complexity). But, having done exactly this:
        | adding an optional 'query the read-replica' path for queries -
        | and determining whether a given query can safely go there - is
        | definitely extra complexity which will now need to be managed
        | into the future. It's definitely less overall than a complete
        | re-architecting of the system, but this is where engineering judgement
       | and experience come into play: would you be better off getting
       | those select queries resolved with some other data store or with
       | a pg read-replica? If your query can survive against the read-
       | replica (so stale data is at least sometimes acceptable) would
       | you be better off caching results in redis?
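A minimal sketch of what that "optional read-replica" routing can look like at the application layer (class and parameter names are hypothetical); the extra complexity he describes is exactly the `allow_stale` judgement every call site now has to make:

```python
class RoutedDB:
    """Route reads that tolerate staleness to a replica; everything
    else goes to the primary. `primary` and `replica` are any objects
    with an execute(sql) method (e.g. DB-API connections)."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def execute(self, sql, allow_stale=False):
        # Only SELECTs that the caller explicitly marks as
        # staleness-tolerant may hit the replica; writes and
        # read-your-own-write reads stay on the primary.
        is_read = sql.lstrip().lower().startswith("select")
        target = self.replica if (allow_stale and is_read) else self.primary
        return target.execute(sql)
```

Note the replica lag is unbounded from the app's point of view, so "sometimes stale is fine" has to be a per-query decision, not a global one.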
        
         | gwbas1c wrote:
         | > If your query can survive against the read-replica (so stale
         | data is at least sometimes acceptable) would you be better off
         | caching results in redis?
         | 
         | Caching adds a lot of complexity. It denormalizes the data, and
         | now you "need to know" when to update the cache. Because "the
         | single source of truth" is no longer maintained, it's easy to
         | accidentally add regressions.
         | 
         | If it's a matter of adding a read replica, that's a much better
         | solution, long-term, because you don't have the effort of "does
         | this query also need to update the cache?"
         | 
         | (I'd think by now there would be a way to expose events in a DB
         | when certain tables are updated; and then (semi) automatically
         | invalidate the cache.)
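There is, at least for Postgres: a trigger plus LISTEN/NOTIFY can push change events to the app, which can then invalidate cache entries by key. A minimal sketch (trigger, function, and channel names are made up):

```sql
CREATE OR REPLACE FUNCTION notify_users_changed() RETURNS trigger AS $$
BEGIN
  -- NOTIFY payloads are limited (8000 bytes), so send a key, not the row
  PERFORM pg_notify('users_changed', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_changed_trg
AFTER INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION notify_users_changed();
```

An app-side connection then runs `LISTEN users_changed` and evicts the matching cache key. Logical decoding / CDC tools (e.g. Debezium) are the heavier-weight version of the same idea.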
        
       | pluto_modadic wrote:
       | ah... rails easy mode discovers rails is only performant if you
       | don't stray too far from hello world...
        
       | dbg31415 wrote:
       | I feel like a lot of people do this instead of upgrading to newer
       | versions, even maintenance patches.
       | 
       | And I get that upgrades can be scary, but often they are
       | relatively low cost.
       | 
       | Leaving everyone on the old system unhappy... means they will
       | eventually push to re-platform, or rebuild, instead of just doing
        | suggested maintenance along the way to keep the system they have
       | in good shape.
       | 
       | My advice... do the maintenance. Do all the maintenance! Don't
       | just drive it into the ground and get mad when it breaks; change
       | the oil and tires and spring for a car wash and some new wiper
       | blades every now and then and you'll be happier in the long run.
        
       | zengid wrote:
       | The solution they went with, squeezing juice out of the system by
       | finding performance optimizations, brings me so much joy.
       | 
       | It reminds me of Richard L. Sites's book
       | _Understanding_Software_Dynamics_ where he basically teaches how
       | to measure and fix latency issues, and how at large scales,
       | reducing latency can have tremendous savings.
       | 
       | Measuring and reasoning about those issues are hard, but the
       | solutions are often simple. For example, on page 9 he mentions
       | that _" [a] simple change paid for 10 years of my salary."_
       | 
       | I hope to someday make such an impactful optimization!
        
         | canucker2016 wrote:
         | The problem I have with their eventual solution is that they
         | only optimized their queries AFTER they had upgraded their
         | instance to the largest config available.
         | 
         | They couldn't upgrade their config with a few clicks in the
          | admin console anymore (I'm guessing that's what's involved
          | here), so now they had to use actual grey matter to fix their
          | capacity
         | problem. Maybe if they had spent more time optimizing specific
         | parts of their code, they wouldn't even need such a large
         | config instance.
        
       | fritzo wrote:
       | > since our work touched many parts of the codebase and demanded
       | collaboration with lots of different devs, we now have a strong
       | distributed knowledge base about the existing system
       | 
       | Great to see this cultural side-effect called out.
        
       | WallyFunk wrote:
       | > Of course, I'm not saying complexity is bad. It's necessary.
       | 
       | Weird thing about computers, even after a fresh install of your
       | favorite OS, the whole thing is sitting on a mountain of
        | complexity, and that's _before_ you start installing programs,
        | browsing the web, etc.
       | 
       | Only the die-hard use things like MINIX[0] to do their computing.
       | Correction: MINIX is in the Intel Management Engine so you have
       | /two/ computers.
       | 
       | [0] https://en.wikipedia.org/wiki/Minix
        
       | deathanatos wrote:
       | > _Split up the monolith into multiple interconnected services,
       | each with its own data store that could be scaled on its own
       | terms._
       | 
        | Just to note: you don't have to split out all the possible
        | microservices at this juncture. You can ask, "what split would
       | have the most impact?"
       | 
       | In my case, we split out some timeseries data from Mongo into
       | Cassandra. Cass's table structure was a much better fit -- that
       | dataset had a well defined schema, so Cass could pack the data
       | much more efficiently; for that subset, we didn't need the
       | flexibility of JSON docs. And it was the bulk of our data, and so
       | Mongo was quite happy after that. Only a single split was
       | required. (And technically, we were a monolith before and after:
       | the same service just ended up writing to two databases.)
       | 
        | Ironically, later, an armchair architect wanted to merge all the
       | data into a JSON document store, which resulted in numerous
       | "we've been down that road, and we know where it goes" type
       | discussions.
        
         | kedean wrote:
         | Funny enough I frequently have the opposite problem, justifying
         | repeatedly why Cassandra is a bad fit for relatively short
         | lived, frequently updated data (tombstones, baby).
        
           | deathanatos wrote:
           | I'd agree with you there.
           | 
           | The specific data that went into Cassandra in our case was
           | basically immutable. (And somehow, IIRC, we _still_ had
            | issues around tombstones. I am not a fan of them.)
            | Cassandra's tooling left much to be desired around
            | inspecting the exact state of tombstones within the cluster.
        
         | kreetx wrote:
         | In a way, in the article they also did a split: specific heavy
         | select queries were offloaded to a replica.
        
           | agentultra wrote:
           | They could probably squeeze more depending on their workload
            | patterns. RDBMSs typically optimize for fast/convenient
            | writes. If your write load can tolerate a small increase in
            | latency, you can do a lot of de-normalization so that your
            | reads avoid using tonnes of joins, aggregates, windows,
            | etc. at read-time. Update the write path so that all of the
            | de-normalized views are refreshed at write time.
           | 
           | Depending on your read load and application structure you can
           | get a lot more scale with caching.
           | 
           | Decent article.
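A toy sketch of that denormalization trade (table and column names invented): the hot read becomes a single-row lookup, paid for by extra work on the write path inside the same transaction:

```sql
-- read path: one indexed lookup instead of a COUNT(*) over a join
CREATE TABLE author_stats (
  author_id     bigint PRIMARY KEY,
  comment_count bigint NOT NULL DEFAULT 0
);

-- write path: inserting a comment also maintains the summary row
BEGIN;
INSERT INTO comments (author_id, body) VALUES (42, 'hi');
INSERT INTO author_stats (author_id, comment_count)
VALUES (42, 1)
ON CONFLICT (author_id)
DO UPDATE SET comment_count = author_stats.comment_count + 1;
COMMIT;
```

The cost, as the parent says, is that every write path must now remember to maintain every derived view.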
        
         | mamcx wrote:
          | It's interesting that microservices get thrown around as the
          | obvious "solution".
          | 
          | They're not.
          | 
          | "Scale-up" MUST be the "obvious" solution. What many miss, and
          | this article touches on (despite calling micro-services a
          | "solid" choice), is that "scale-up" is "scale-out" without
          | breaking the consistency of the DB.
          | 
          | There is a lot you can do to squeeze, and it's rare that you
          | need to give up joins, data validations, and the other things
          | casually branded anti-patterns when performance problems
          | happen.
        
           | deathanatos wrote:
           | I don't know what to tell you other than I've seen vertical
           | scaling hit its ceiling, several times. The OP lists "scale
           | vertically first" as a given; to an extent, I agree with it,
           | and the comment you're responding to is made with that as a
           | base assumption.
           | 
           | There are sometimes diminishing returns to simple scaling;
           | e.g., in my current job, each new disk we add adds 1/n disks'
           | worth of capacity. Each scaling step happens quicker and
           | quicker (assuming growth of the underlying system).
           | Eventually, you hit the wall in the OP, in that you need
           | design level changes, not just quick fixes.
           | 
           | The situation I mention in my comment was one of those: we'd
           | about reached the limits of what was possible with the setup
           | we had. We were hitting things such as bringing in new nodes
           | was difficult: the time for the replica to replicate was
           | getting too long, and Mongo, at the time, had some bug that
           | caused like a ~30% chance that the replica would SIGSEGV and
           | need to restart the replication from scratch. Operationally,
           | it was a headache, and the split moved a _lot_ of data out
           | that made these cuts not so bad. (Cassandra did bring its own
           | challenges, but the sum of the new state was that it was
           | better than where we were.)
           | 
           | Consistency is something you must pay attention to. In our
           | case, the old foreign key between the two systems was the
           | user ID, and we had specific checks to ensure consistency of
           | it.
        
       ___________________________________________________________________
       (page generated 2023-08-11 23:00 UTC)