[HN Gopher] Squeeze the hell out of the system you have ___________________________________________________________________ Squeeze the hell out of the system you have Author : sbmsr Score : 269 points Date : 2023-08-11 18:18 UTC (4 hours ago) (HTM) web link (blog.danslimmon.com) (TXT) w3m dump (blog.danslimmon.com) | javajosh wrote: | I'll probably get down-voted for saying this (again), but a key | way to squeeze unimaginable amounts of performance is to _lean | into stored procedures_. | | Look, I get it, the devx sucks. And it feels like a proprietary, | icky, COBOL-like experience. It means you have to _dwell_ in the | database. What are you, a db admin?! | | But I'm telling you, the payoff is worth it. (And also, if you | ship it you own it, so yes, you're a db admin.) My company ran for | many years on 3 machines, despite its extremely heavy page | weight, because the original author wrote it with stored procs from | the beginning. (He also liberally threw away data, which was great, | but that's another post.) Part of my job was to migrate away from | .NET and to Java and JavaScript - and another engineer wrote an | ingenious tool that would generate Java bindings to SQL Server | stored procs that made it really nice to work with them. And the | performance really was outrageous - 100x better than any system | I've worked with before or since. Those 3 boxes handled 300k very | data-intensive monthly actives, and that was like 10 years ago. | | Don't worry - even if you lean into SPs there is still plenty of | engineering to do! It's just that your data layer will simplify, | and your troubleshooting actually gets easier, not harder. I | liked the custom bindings - a bit like ActiveRecord, and no ORM. | But really, truly: if you want to squeeze, move some queries into | SPs and prepare to be amazed. | giantrobot wrote: | I can't disagree with the results, SPs can change your life. | HOWEVER, they require significant discipline and regular | audits.
All the code for them needs to be in source control | with a Process for deployment to the DB. You also need a test | suite as part of the Process which runs against a staging | server with a configuration comparable to prod. The SPs need to | be regularly dumped and compared against what's in source | control and marked as what's supposed to be released in prod. | [deleted] | romafirst3 wrote: | TLDR. We were going to completely rewrite our architecture but | instead we optimized a few Postgres queries. LMAO | endisneigh wrote: | The bit on the database performance issues leads me to my | hottest, flamiest take for new projects: | | - Design your application's hot path to never use joins. Storage | is cheap, denormalize everything and update it all in a | transaction. It's truly amazing how much faster everything is | when you eliminate joins. For your ad-hoc queries you can | replicate to another database for analytical purposes. | | On this note, I have mixed feelings about Amazon's DynamoDB, but | one thing about it is that to use it properly you need to plan your | usage first, and schema second. I think there's something you can | take from this even with an RDBMS. | | In fact, I'd go as far as to say that joins are unnecessary for | nonanalytical purposes these days. Storage is so mind-bogglingly | cheap and the major DBs have ACID properties. Just denormalize, | forreal. | | - Use something more akin to UUIDs to prevent hot partitions. | They're not a silver bullet and have their own downsides, but | you'll already be used to the consistently "OK" performance that | can be horizontally scaled, rather than the great performance of, | say, integers that will fall apart eventually. | | /hottakes | | my sun-level take would be to also just index all columns. but | that'll have to wait for another day. | [deleted] | feoren wrote: | There are "tall" applications and "wide" applications.
Almost | all advice you ever read about database design and optimization | is for "tall" applications. Basically, it means that your | application is only doing one single thing, and everything else | is in service of that. Most of the big tech companies you can | think of are tall. They have only a handful of really critical, | driving concepts in their data model. | | Facebook really only has people, posts, and ads. | | Netflix really only has accounts and shows. | | Amazon (the product) really only has sellers, buyers, and | products, with maybe a couple more behind the scenes for | logistics. | | The reason for this is that tall applications are _easy_. | Much, much easier than wide applications, which are often | called "enterprise". Enterprise software is bad because it's | _hard_. This is where the most unexplored territory is. This is | where untold riches lie. The existing players in this space are | abysmally bad at it (Oracle, etc.). You will be too, if you | enter it with a tall mindset. | | Advice like "never use joins" and "design around a single | table" makes a lot of sense for tall applications. It's awful, | terrible, very bad, no-good advice for wide applications. You | see this occasionally when these very tall companies attempt to | do literally anything other than their core competency: they | fail miserably, because they're staffed with people who hold | sacrosanct this kind of advice that _does not translate_ to the | vast space of "wide" applications. Just realize that: this | advice is for companies doing easy things who are already | successful and have run out of low-hanging fruit. Even tall | applications that aren't yet victims of their own success do | not need to think about butchering their data model in service | of performance. Only those who are already vastly successful | and are trying to squeeze out the last juices of performance. | But those are the people who _least need advice_.
This kind of | tall-centered advice, justified with "FAANG is doing it so you | should too" and "but what about when you have a billion users?", | is poisoning the minds of people who set off to do something | more interesting than serve ads to billions of people. | xyzzy123 wrote: | Thanks, I think this is a really interesting way to look at | things. | | What is the market for "wide" applications though? It seems | like any particular business can only really support one or | two of them; for some that will be SAP and for others it | might be Salesforce (if they don't need much ERP), or (as you | mentioned) some giant semi-homebrewed Oracle thing. | | Usually there is a legacy system which is failing but still | runs the business, and a "next gen" system which is not ready | yet (and might never be, because it only supports a small | number of use cases from the old software, and even with an | army of BAs it's difficult to spec out all the things the old | software is actually doing with any accuracy). | | Or am I not quite getting the idea? | endisneigh wrote: | I agree with your sentiment, but even enterprises work on | multiple "tall" features. | | If they didn't, then I'd change my advice to be simply multi- | tenant per customer and replicate into a column store for | cross-customer analytics. | pipe_connector wrote: | I agree with the characterization of applications you've laid | out and think everyone should consider whether they're | working on a "tall" (most users use a narrow band of | functionality) or a "wide" (most users use a mostly non- | overlapping band of functionality) application. | | I also agree with your take that tall applications are | generally easier to build engineering-wise. | | Where I disagree is that I think in general wide applications | are failures in product design, even if profitable for a | period of time.
I've worked on a ton of wide applications, | and each of them eventually became loathed by users and | really hard to design features for. I think my advice would | be to strive to build a tall application for as long as you | can muster, because it means you understand your customers' | problems better than anyone else. | feoren wrote: | > I've worked on a ton of wide applications, and each of | them eventually became loathed by users and really hard to | design features for. | | Yes, I agree that this is the fate of most. But I refuse to | believe it's inevitable; rather, I think it comes from | systemic flaws in our design thinking. Most of what we | learn in a college database course, most of what we read | online, almost all ideas in this space, transfer poorly to | "wide" design. People don't realize this because those | approaches do work well for tall applications, and because | they're regarded religiously. This is why I say wide | applications are so much harder. | veave wrote: | >Design your application's hot path to never use joins. Storage | is cheap, denormalize everything and update it all in a | transaction. It's truly amazing how much faster everything is | when you eliminate joins. | | Does anybody have documentation about this with examples? | newlisp wrote: | Duplicate data to avoid joins, use serializable transactions | to update all the duplicated data. | joshstrange wrote: | See "Single Table Design" which I talked about in this | comment above: https://news.ycombinator.com/item?id=37093357 | deely3 wrote: | And if you don't want to spend money, you can get the basic | idea from this article: | https://www.alexdebrie.com/posts/dynamodb-single-table/ | tibbetts wrote: | Premature denormalization is expensive complexity. | Denormalization is a great tool, maybe an under-used tool. But | you should wait until there are hot paths before using it. | endisneigh wrote: | I agree. To be clear I'm not suggesting anyone start | denormalizing everything.
I'm saying if you're fortunate | enough to be on a green-field project, you should design the schema | around the access patterns, which will surely be | "denormalized," as opposed to designing a normalized schema | and designing your access patterns around it. | latchkey wrote: | > _Design your application 's hot path to never use joins._ | | Grab (the Uber of Asia) did this religiously and it created a ton | of friction within the company due to the way the teams were | laid out. It always required one team to add some sort of API | that another team could take advantage of. Since the first team | was so busy always implementing their own features, it created | roadblocks with other teams and everyone started pointing | fingers at each other to the point that nothing ever got done | on time. | | Law of unintended consequences | tedunangst wrote: | Hard to follow the link. How would you join two tables | between teams that don't communicate? | latchkey wrote: | You don't, that's the problem. | endisneigh wrote: | yes, this is a fair point. there's no free lunch after all. | without knowing more about what happened with Grab I'd say | you could mitigate some of that with good management and | access patterns, though. | latchkey wrote: | All in all though, I don't think that 'never use joins' is | a good solution either, since it does create more developer | work almost every way you slice it. | | I think the op's solution of looking more closely at the | hot paths and solving for those is a far better solution | than re-architecting the application in ways that could | create unintended consequences. People don't consider that | enough, at all. | | Don't forget that hot path resolution is the antithesis of | 'premature optimization'. | | > you could mitigate some of that with good management and | access patterns | | the CTO fired me for making those sorts of suggestions | about better management, and then got fired himself a | couple months later...
¯\\_(ツ)_/¯ ... even with the macro | events, their stock is down 72% since it opened, which | doesn't surprise me in the least bit having been on the | inside... | taylodl wrote: | My hot take: always use a materialized view or a stored | procedure. _Hide the actual, physical tables from the | Application 's account!_ | | The application doesn't need to know how the data is physically | stored in the database. They specify the logical view they need | of the data. The DBAs create the materialized view/stored | procedure that's needed to implement that logical view. | | Since the application is _never_ directly accessing the | underlying physical data, it can be changed to make the | retrieval more efficient without affecting any of the database | 's users. You're also getting the experts to create the | required data access for you in the fastest, most efficient way | possible. | | We've been doing this for years now and it works great. It's | alleviated so many headaches we used to have. | walterbell wrote: | Interface contracts and indirection FTW. | | 2011, "Materialized Views" by Rada Chirkova and Jun Yang, | https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf | | _> We cover three fundamental problems: (1) maintaining | materialized views efficiently when the base tables change, | (2) using materialized views effectively to improve | performance and availability, and (3) selecting which views | to materialize. We also point out their connections to a few | other areas in database research, illustrate the benefit of | cross-pollination of ideas with these areas, and identify | several directions for research on materialized views._ | downWidOutaFite wrote: | This doesn't work because DBAs are rarely on the dev team's | sprint schedule. If the DBAs are blocking them, devs can and | will figure out how to route around the gatekeepers. In | general, keep the logic in the app, not the db. | alfor wrote: | But for the saves, the structure is visible?
| taylodl wrote: | You can update underlying data via a materialized view. | Scarbutt wrote: | Normalization is not only about data storage but most | importantly, data integrity. | endisneigh wrote: | Yes, but I assert that it's possible to use transactions to | update everything consistently. Serializable transactions | weren't really common when MySQL/Postgres _first_ came out, | but now that they 're common in new DBs + ACID, I think it's | now possible to do with reasonable difficulty. If you agree | with this, then it's easy to argue that the performance | increase from denormalized tables is well worth the annoyance | of updating everything to transactionally update the | dependencies. | | I won't say that it's trivial to update all of your business | logic to do this, but I think it's definitely worth it for a | new project at least. | Guvante wrote: | You always need to compare write vs read performance. | | Turning a single table update into a 10 table one could tip | your lock contention to the point where you are write bound | or worse start hitting retries. | | Certainly it makes sense to move rarely updated fields to | where they are used. | | Similarly "build your table against your queries, not your | ideal data model" is always sage advice. | Bognar wrote: | Denormalized transactions are not trivial unless you are | using serializable isolation level, which will kill | performance. If you don't use serializable isolation level, | then you risk either running into deadlocks (which will | kill performance) or inconsistency. | | Decent SQL databases offer materialized views, which | probably give you what you want without all the headache of | maintaining denormalized tables yourself. | endisneigh wrote: | all fair points, but to be fair I don't necessarily think | this makes the most sense for an existing project for the | reasons you state.
I do think a new project would | best be able to design around the access patterns in a | way that eliminates most of the downsides. | williamdclt wrote: | Transactions are not only (actually mainly not) about | atomicity. Of course it's possible to keep data integrity | without normalisation, but that means you need to maintain | the invariants yourself at application level and a bug | could result in data inconsistency. Normalisation isn't | there to make integrity possible, it's there to make (some) | non-integrity impossible. | | Nobody says you have to have only one view of your data | though. You can have a normalised view of your data to | write, and another denormalised for fast reads (you usually | have to, at scale). Something like event sourcing is | another way (which is actually pushing invariants to | application level, in a structured way) | wizofaus wrote: | Can't say I've ever come across a scenario where a join itself | was the performance bottleneck. If there's any single principle | I have observed, it's "don't let a table get too big". More often | than not it's historical-record type tables that are the issue | - but the amount of data you need for day-to-day operations is | usually a tiny fraction of what's actually in the table, and | you're bound to start finding operations on massive tables get | slow no matter what indexes you have (and even the act of | adding more indexes becomes problematic. And just indexing all | columns isn't enough for traditional RDBMSes at least - you | have to index the right combinations of columns for them to be | used. Might be different for DynamoDB). | 8note wrote: | Dynamo is quick for that, so long as you are picking good | partition keys. | | Instead, it'll throw you hot-key throttling if you start | querying one partition too much | wtetzner wrote: | I'd say that probably depends on what your hot path is.
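The "duplicate data, then update every copy in one transaction" pattern debated above can be sketched with plain sqlite3. The schema and the duplicated `user_name` column are hypothetical, and a real system would need a serializable isolation level as Bognar notes:

```python
import sqlite3

# Hypothetical schema: the user's display name is duplicated onto
# their orders so order listings never need a join.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER,
                         user_name TEXT, total_cents INTEGER);
    INSERT INTO users  VALUES (1, 'ada');
    INSERT INTO orders VALUES (10, 1, 'ada', 500), (11, 1, 'ada', 700);
""")

def rename_user(con, user_id, new_name):
    # Update the source row and every denormalized copy in one
    # transaction, so readers never see a mix of old and new names.
    with con:
        con.execute("UPDATE users SET name = ? WHERE id = ?",
                    (new_name, user_id))
        con.execute("UPDATE orders SET user_name = ? WHERE user_id = ?",
                    (new_name, user_id))

rename_user(con, 1, 'lovelace')
rows = con.execute("SELECT DISTINCT user_name FROM orders").fetchall()
print(rows)  # [('lovelace',)]
```

The write cost is exactly what Guvante warns about: one logical update fans out to every table holding a copy.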
If it's | write-heavy, then you'll probably end up with performance | issues when you need to write the same data to multiple tables | in a single transaction. And if all of those columns are | indexed, it'll be even worse. | iamwil wrote: | If you don't use joins, how do you associate records from two | different tables when displaying the UI? Do you just join in | the application? Or something else? | endisneigh wrote: | this has opinionated answers. | | if you ask Amazon, they might suggest that you design around | a single table | (https://aws.amazon.com/blogs/compute/creating-a-single- | table...). | | in my opinion it's easier to use join tables, which are | sometimes what's temporarily created when you do a join anyway. | in this case, you permanently create table1, table2, and | table1_join_table2, and keep all three in sync | transactionally. when you need a join you just select on | table1_join_table2. you might think this is a waste of space, | but I'd argue storage is too cheap for you to be thinking | about that. | | that being said, you really have to design around your access | patterns; don't design your application around your schema. | most people do the latter because it seems more natural. what | this might mean in practice is that you do mockups of all of | the expected pages and what data is necessary on each one. | _then_ you design a schema that results in you never having | to do joins on the majority, if not all, of them. | sainez wrote: | > what this might mean in practice is that you do mockups | of all of the expected pages and what data is necessary on | each one. then you design a schema that results in you | never having to do joins on the majority, if not all, of | them. | | Great suggestion! I had a role where I helped a small team | develop a full-stack, data-heavy application. I felt pretty | good about the individual layers but I felt we could have | done a better job at achieving cohesion in the big picture.
| Do you have any resources where people think about these | sorts of things deeply? | walterbell wrote: | 2001, "Denormalization effects on performance of RDBMS", | by G. L. Sanders and Seungkyoon Shin, | https://www.semanticscholar.org/paper/Denormalization- | effect... | | _> We have suggested using denormalization as an | intermediate step between logical and physical modeling, | to be used as an analytic procedure for the design of the | applications requirements criteria ... The guidelines and | methodology presented are sufficiently general, and they | can be applicable to most databases ... denormalization | can enhance query performance when it is deployed with a | complete understanding of application requirements._ | | PDF: https://web.archive.org/web/20171201030308/https://p | dfs.sema... | endisneigh wrote: | yeah, exactly. in my experience the vast majority of | access patterns are designed around a normalized schema, | where it really should be that the schema is designed | around the access patterns and generously "denormalize" | (which doesn't make sense in this context of a new | database) as necessary. | joshstrange wrote: | Single Table Design is the way forward here. I can highly | recommend The DynamoDB Book [0] and anything (talks, blogs, | etc) that Rick Houlihan has put out. In previous discussions | the author shared a coupon code ("HACKERNEWS") that will take | $20-$50 off the cost depending on the package you buy. It | worked earlier this year for me when I bought the book. It | was very helpful and I referred back to it a number of times. | This github repo [1] is also a wealth of information | (maintained by the same guy who wrote the book). | | As an added data point I don't really like programming books | but bought this since the data out there on Single Table | Design was sparse or not well organized, it was worth every | penny for me. 
| | [0] https://www.dynamodbbook.com/ | | [1] https://github.com/alexdebrie/awesome-dynamodb | deely3 wrote: | And if you don't want to spend money, you can get the idea from | this article: | | https://www.alexdebrie.com/posts/dynamodb-single-table/ | | I'm really curious about real-life performance on different | databases, especially in situations where RAM is smaller | than the database size. | wizofaus wrote: | That article didn't appear to be suggesting single-table | design was appropriate for general-purpose RDBMSes (or | any database other than DynamoDB). | i_like_apis wrote: | Yes I like the zero-joins-on-hot-paths approach. It can be hard | to sell people on it. It's a great decision for scaling though. | skybrian wrote: | I'm wondering if indexes and materialized views can be used to | do basically the same thing? That is, assuming they contain all | the columns you want. | latchkey wrote: | The issue is writes, not reads. | giantrobot wrote: | There's always money in the banana sta...materialized views. | Materialized views will get you quite a ways on read-heavy | workloads. | macNchz wrote: | Over the years I think I've encountered more pain from | applications where the devs leaned on denormalization than from | those that developed issues with join performance on large | tables. | | You can mash those big joins into a materialized view or ETL | them into a column store or whatever you need to fix | performance later on, but once someone has copied the | `subtotal_cents` column onto the Order, Invoice, Payment, | NotificationEmail, and UserProfileRecentOrders models, and | they're referenced and/or updated in 296 different | places...it's a long road back to sanity. | klodolph wrote: | I have personally witnessed the "let's build microservices to get | better performance" argument. I definitely want to nip that in | the bud. | | It's easy to fall in love with complexity, especially since you | see a lot of complexity in existing systems.
But those systems | became complex as they evolved to meet user needs, or for other | reasons, over time. Complex systems are impressive, but you need | to make sure that your team has people who recognize the heavy | costs of complexity, and who can throw their engineering efforts | directly against the most important problems your team faces. | jsight wrote: | I blame the easy availability of additional resources in the | cloud for a lot of problems here. Prod db slow? Get a bigger EC2 | instance. Still slow? Hmm, maybe bigger again! Why bother tuning. | | Now... Who knows why our AWS bill is so high? | | With real hardware in a DC, you'd have to justify large capital | expenditures to do something that stupid. | gary_0 wrote: | No mention of caching? If your database is getting hammered with | SELECTs, isn't putting a cache in front of it something that | should at least be considered? | deathanatos wrote: | I've been in the OP's situation, and this exact suggestion was | made in my case. Welcome to one of the hardest problems in CS: | cache invalidation. | | If you have a dataset for which cache invalidation is easy | (e.g., data that is written and never updated), yeah, | absolutely go for this. | | In our case, and most cases I've seen, it wasn't so simple, and | "split this off to a DB better suited to it" was less complex | (maybe still a lot of work, but conceptually _simple_ ) than | figuring out cache invalidation. | lern_too_spel wrote: | There are systems that will do that for you like | https://readyset.io/. | Scarbutt wrote: | They mentioned adding a DB replica for reads. | jakey_bakey wrote: | [The Grug Brained Developer](https://grugbrain.dev/) | sssspppp wrote: | Love this post. I've been trying to tell my manager the same | message for the last few months (with little success). 
We're | about to embark on a massive migration to "next-gen | infrastructure" (read: three different Redshift clusters managed | by CDK) because our overloaded Redshift cluster (already maxed | out with RA3 nodes) has melted down one too many times. The next- | gen infra is significantly more complex than our existing setup | and I'm not convinced this migration will be the silver bullet | everyone is hoping for. | iamwil wrote: | Ugh. I had a colleague that addressed any scaling problem by | putting a cache in front of the DB. Praised for solving the | immediate problem, but shouldered none of the costs. </rant> | | I admit in the face of finding product/market fit, you do the | expedient thing, but damned if I'm not often at the receiving end | of these sorts of decisions. | aidos wrote: | Interestingly, I often ask candidates about optimising a slow- | running db query and the majority of people jump to adding | caching and very few ask if they can run an explain or see the | indexes. | tedunangst wrote: | "I would make the slow query faster" seems too obvious an | answer for an interview question. | andrewstuart wrote: | Isn't Rails wasteful in its database access patterns? | iamwil wrote: | Generally no. But it can be easy to write bad queries using | the ActiveRecord ORM if you're not aware of N+1 problems. | romafirst3 wrote: | 100%, it makes it easy for bad programmers to write bad | performing queries, but you can easily write performant code. | Btw that's a feature - letting people ramp up to full db | knowledge is beneficial, you don't want to be spending your | time writing performant queries before you need to. | topspin wrote: | > it makes it easy for bad programmers to write bad | performing queries | | That is true of every ORM in existence. The easiest thing | to do is naively follow the object graph in code, because | that's what the ORM gives you.
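The N+1 pattern mentioned above — one query for a list, then one more per element as code walks the object graph — can be illustrated with plain sqlite3 rather than any real ORM; the schema and data here are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    INSERT INTO posts    VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO comments VALUES (1, 1, 'x'), (2, 2, 'y'), (3, 3, 'z');
""")

# N+1: one query for the posts, then one query *per post* for its
# comments -- what a naive object-graph walk tends to emit.
posts = con.execute("SELECT id, title FROM posts").fetchall()
n_plus_1 = sum(1 for pid, _ in posts
               for _ in con.execute(
                   "SELECT body FROM comments WHERE post_id = ?", (pid,)))

# Fix: fetch all comments in one query (ORMs call this eager loading).
ids = [pid for pid, _ in posts]
placeholders = ",".join("?" * len(ids))
one_query = con.execute(
    f"SELECT body FROM comments WHERE post_id IN ({placeholders})",
    ids).fetchall()

print(n_plus_1, len(one_query))  # 3 3
```

Both paths return the same three comments; the first issues four queries, the second issues two no matter how many posts there are.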
If the ORM were to somehow | add friction here to encourage some other approach it would | be panned as "too hard!!1" and fade away into obscurity. | Joel_Mckay wrote: | The Monolith is often a marker of several naive assumptions. | | Yet some interesting patterns will emerge if teams accept some | basic constraints: | | 1. A low-cpu-power client-process is identical to a resource- | taxed server-process | | 2. A system's client-server pattern will inevitably become | functionally equivalent to inter-server traffic. Thus, the | assumption that all high-performance systems degenerate into a | hosted peer-to-peer model will counterintuitively generalize. | Accordingly, if you accept this fact early, then one may avoid | rewriting a code-base 3 times, and trying to reconcile a bodged | API. | | 3. Forwarding meaningful information does not mean collecting | verbose telemetry, then trying to use data science to fix your | business model later. Assume you will eventually either have | high-latency queuing, or start pooling users into siloed | contexts. In either case, the faulty idea of a single-database | shared state will need to be seriously reconsidered at around 40k | users, and later abandoned after around 13m users. | | 4. Sharding only buys time at the cost of reliability. You may | disagree, but one will need to restart a partitioned cluster | under heavy load to understand why. | | 5. All complex systems fail in improbable ways. Eventually | consistent is usually better than sometimes broken. Thus, | solutions like Erlang/Elixir have been around for a while... | perhaps the OTP offers a unique set of tradeoffs. | | 6. Everyone thinks these constraints don't apply at first. Thus, | they will repeat the same tantalizing... yet terrible design | choices... others have repeated for 40+ years. | | Good luck, =) J | 39 wrote: | Strangely obvious advice? | iamwil wrote: | That no one likes to follow. | sfink wrote: | Well, the advice is rarely taken in practice.
It is (in my | experience, and it seems common from others based on what I've | heard) very, very common to jump to the complicated solution at | the first hint of capacity issues "because we'll need to do it | eventually anyway." | | The advice is obvious when you're thinking at that level of | abstraction. Which suggests that, in practice, people who are | architecting such systems rarely think at that level of | abstraction. Which is why it is nice to have posts like this, | that periodically remind us to get our heads out of the daily | minutiae and consider the bigger picture (of complexity | tradeoffs, realistic projections, staffing and availability, | etc.) | bryanlarsen wrote: | "Common sense is not so common." | | - Voltaire | joelshep wrote: | It might be obvious as far as it goes, but it's also incomplete | in at least two ways. One is that tweaks and optimizations and | "supplementing the system in some way" often involve increasing | its complexity, even if just a little bit at a time. | It adds up with time. The more important thing is this: if | you're already constrained on vertical scaling, and you don't | have a firm grip on how fast your system is scaling, then you | can't just stop with making the db more efficient. That's just | postponing the inevitable, and possibly not for more than a | couple of years. If you're in the position the author portrays, | get the database under control first -- for sure -- but then | get started on figuring out how you're going to stay in front | of your scaling problem, whether that's rearchitecture, off- | loading work to systems better suited for it, or whatever. | Speaking as a former owner of a very large Amazon database that | fought this battle many times, trying to buy enough dev time to | build away from it before it completely collapsed. We were too | content with performance improvements just like the ones | described in this article, before finally recognizing we were | just racing the clock.
| huijzer wrote: | > We should always put off significant complexity increases as | long as possible. | | Reminds me of the mantra I've read here: readily go for | reversible things and be very careful when going for | irreversible things. | sssspppp wrote: | Amazon's one-way vs two-way door decisions echo the sentiment | sainez wrote: | It is mentioned in this article about the inception of AWS's | custom silicon: https://semiconductor.substack.com/p/on-the- | origins-of-aws-c... | | > "We use the terms one-way door and two-way door to describe | the risk of a decision at Amazon. Two-way door decisions have a | low cost of failure and are easy to undo, while one-way door | decisions have a high cost of failure and are hard to undo. We | make two-way door decisions quickly, knowing that speed of | execution is key, but we make one-way door decisions slowly and | far more deliberately." | TX81Z wrote: | Really curious how much can be attributed to using an ORM. | kunalgupta wrote: | I would definitely do the opposite of this - 3 months is a while, | and I think it would take a long time before the cost of | complexity compares | phirschybar wrote: | I agree with this approach. The other added benefit is that when | they decided to optimize the app by eliminating or tuning queries | and utilizing replicas for reads, they ultimately made the app | much more performant while possibly reducing complexity. The | "squeeze" mindset pays off in the long run here. The continued | optimization over time is infinitely better than adding the | complexity of microservices or expanded infrastructure, because | the latter will simply bury and compound the potential | optimizations which could AND SHOULD have been made. Squeeze, | squeeze, squeeze until you just can't squeeze any more! | nathias wrote: | Complexity in software is bad; things can be bad and necessary. | It's bad in itself, but sometimes it can provide new | functionality...
| alfalfasprout wrote: | The problem is this is also a myopic way of looking at things. | What you should be looking at is also operational complexity. | What's the current burden on your org/team maintaining what you | currently have? What about when you need to scale even higher? | | A lot of teams that think this way end up with really high oncall | burdens and then never have the time to even iterate on their | infrastructure. | iblaine wrote: | TL;DR; do the easy things first, in this case it was to fix bad | SQL | | Given the options to optimize SQL, move read operations to | replicas, shard data or go towards micro services, optimizing SQL | is the easy choice. | bayindirh wrote: | Actually, I disagree. The "TL;DR:" in the article is "first | outgrow, then upgrade". In today's software development | practice, efficiency is a second-class citizen, because moving | fast and breaking things is the way to keep the momentum and be | hip. | | However, sometimes everyone needs to chill and sharpen the tool | they have at hand. It might prove much more capable than first | anticipated. Or you may be holding the tool wrong to a degree. | maxboone wrote: | Relevant blog on improving PostgreSQL performance on ZFS: | https://news.ycombinator.com/item?id=29647645 | notnmeyer wrote: | haha, when i read their initial thoughts were write-sharding and | microservices i whispered "wtf?" to myself. | | glad to see there was a better ending to the story though. | discussDev wrote: | It's the boring solution. It should also only be the default | answer if you are not building a system super critical to life | and limb. But it certainly gives a much lower total cost of | ownership. If you don't have the resources for some big redundant | system, simpler is better; I've too often seen the complexity | added by the redundant system become the issue, rather than a | focus on simplicity.
If you need to | add a bunch of people to support complexity but both the money | and the risk assessment don't call for it, simpler is much | better. I won't say I haven't seen the issue where eventually the | only way forward was a huge project, but I tend to think | sometimes even that is less than the sum of having dealt with | complexity up to that point; it depends a lot on what you are | building. | alfor wrote: | I wonder if moving the db onto beefy dedicated hardware with tons of | ram and nvme would solve the problem. Preferably physically | connected to the web servers. | | Cost: a fraction of the developer cost. | | I see so many things done on the cloud that 10X their complexity | because of it. Modern hardware is incredibly powerful. | sakopov wrote: | I thought I was going to read something insightful. Instead it | was a post about how to completely ignore your database | performance and then consider overcomplicating everything with | sharding and microservices because you didn't care to do basic | profiling on your queries. I'm glad common sense prevailed, but | this is really some junior-level stuff and it's being celebrated | as some kind of novelty. | account-5 wrote: | I suppose it seems obvious in hindsight that your first move | should always be to investigate potential causes before a | wholesale redesign that adds potentially unnecessary complexity | to your system. | exabrial wrote: | This is amazing advice. A side note is to use the hell out of | replication. These things don't have to be complicated. Set up a | readonly and a readwrite datasource/connection pool in your app | if you have to. | i_like_apis wrote: | I'm reminded of one of my favorite sayings: | | _You go to war with the army you have, not the army you might | want or wish to have at a later time._ | | You may want to ignore that this comes from Donald Rumsfeld | (he has some great ones though: "unknown unknowns ...", etc.)
| | I think about this a lot when working on teams. Everyone is not | perfectly agreeable or has the same understanding or collective | goals. Some may be suboptimal or prone to doing things you don't | prefer. But having a team is better than no team, so find the | best way to accomplish goals with the one you have. | | It applies to systems well too. | fuzztester wrote: | "No battle plan survives contact with the enemy." | | https://www.google.com/search?q=no+battle+plan+survives | sbuk wrote: | Mike Tyson said it more simply: "Everybody has a plan until | you get hit in the face." | fuzztester wrote: | https://en.m.wikipedia.org/wiki/Helmuth_von_Moltke_the_Elder | | Moltke's thesis was that military strategy had to be | understood as a system of options, since it was possible to | plan only the beginning of a military operation. As a result, | he considered the main task of military leaders to consist in | the extensive preparation of all possible outcomes.[3] His | thesis can be summed up by two statements, one famous and one | less so, translated into English as "No plan of operations | extends with certainty beyond the first encounter with the | enemy's main strength" (or "no plan survives contact with the | enemy") and "Strategy is a system of expedients".[18][8] | Right before the Austro-Prussian War, Moltke was promoted to | General of the Infantry.[8] | makeitdouble wrote: | I've been thinking about this quote for a while but have a hard time | squeezing the meaning, or really the actionable part, out of it. | | The unknown unknowns quote brings the concept that however | confident you are in a plan you absolutely need margin. The | other quote though...what do you do differently when | understanding that your team is not perfect? | | On one side, outside of VC backed startups I don't see | companies trying to reinvent linux with a team of 4 new | graduates.
On the other side companies with really big goals | will hire a bunch until they feel comfortable with their talent | before "going to war". You'll see recruiting posts seeking | specialists in a field before a company bets the farm on that | specific field (imagine Facebook renaming itself to Meta before | owning Oculus...nobody does that[0]) | | Edit: sorry, I forgot some guy actually just did that 2 weeks | ago with a major social platform. And I kinda wanted to forget | about it I think. | sainez wrote: | Great point about working on teams. For the vast majority of | tasks, people are only marginally better or worse than each | other. A few people with decent communication will outpace a | "star" any day of the week. | | I try to remind myself of this fact when I'm frustrated with | other people. A bit of humility and gratitude go a long way. | tedunangst wrote: | Mattis "the enemy gets a vote" is another good reminder of | reality, although people get very angry about it. Useful in | terms of security, privacy, DRM, etc. | walterbell wrote: | Product management outside the box. | Buttons840 wrote: | I like a similar quote from Steven Pressfield: | | "The athlete knows the day will never come when he wakes up | pain-free. He has to play hurt." | | This applies to ourselves more than our systems though. | roughly wrote: | Rumsfeld's got some great quotes, most of which were delivered | in the context of explaining how the Iraq war turned into such | a clusterfuck, and boy could that whole situation have used the | kind of leadership Donald Rumsfeld's quotes would lead you to | believe the man could've provided. | xapata wrote: | > could've | | If someone is 83.7% likely to provide good leadership, how | would you evaluate the choice to hire that person as a leader | in the hindsight that the person failed to provide good | leadership -- was it a bad choice, or was it a good choice | that was unlucky? | | (Likelihood was selected arbitrarily.) 
| hluska wrote: | Like everything in politics, I think this is a function of | what team you cheer for. If your goal was to come up with | an excuse to invade Iraq, that person was an excellent | choice. If you're on the other team, what a clusterfuck. | | Then you add in a party system and it gets more | complicated. Realistically, you don't get to be the United | States Secretary of Defense (twice) if you're the kind of | person who will ignore the will of the party and whoever is | President. | whatshisface wrote: | > _quotes would lead you to believe_ | dragonwriter wrote: | > Rumsfeld's got some great quotes, most of which were | delivered in the context of explaining how the Iraq war | turned into such a clusterfuck | | If by "explaining how" you mean "deflecting (often | preemptively) responsibility for", yes. | marcosdumay wrote: | If I remember it correctly (it was a long time ago), he never | fully supported the war. It didn't take a genius to notice | that the goals set by the presidency were (literally) | impossible and not the kind of thing you achieve with a war. | | But whatever position he had, Iraq turning into a clusterfuck | wasn't a sign of bad leadership on his part. It was a sign of | bad ethics, but not leadership. His options were all of | getting out of his position, disobeying the people above him, | or leading the US into a clusterfuck. | mickdeek86 wrote: | Rumsfeld personally advanced the de-baathification | directive - the linchpin of the clusterfuckery - all on his | own, and he certainly would have known to expect the | 'unexpected' results to be similar to de-nazification. This | was absolutely his choice. Another point you have | (unintentionally?) brought up is the dignified resignation | option.
While it is often a naive, self-serving gesture, we | can reasonably imagine that the Defense Secretary publicly | resigning over opposition to a war during the public | consideration of that war, might have had some effect on | whether that war was started. I want to like him too, with | his grandfatherly demeanor and genuine funniness ("My god, | were there so many vases?!") but, come on. | moffkalast wrote: | Could've at least given them some motivational quotes. | hluska wrote: | I like to remind myself that very few people reach positions | of great power after mediocre lives. Rather there's a thread | of talent that runs through government. | | Once they're in, the predilections that led to power often | rear their dark long tails. But they're all (even the ones I | disagree with) talented. | patmcc wrote: | They're talented at getting into power, and may be talented | at any number of other things. | | They're not always talented at the things we may want them | to be, unfortunately. And that's true of both the ones I | agree and disagree with. | KnobbleMcKnees wrote: | That was Donald Rumsfeld!? I always assumed this came from some | techie or agile guru given how much it's used as a concept in | project planning. | a_seattle_ian wrote: | That it came from Donald Rumsfeld in the context of what we | know now and what he surely knew then is why it's such a good | quote. The words basically say nothing but are also true | about everything. So it can implicitly be a warning that there | is probably some bullshit going on, or that someone has a sense of | humor and is also warning people while also avoiding the | subject - of course just my opinion. How people actually use | it will depend on what the audience agrees it to mean.
| [deleted] | midasuni wrote: | And unknown unknowns is a great way to communicate with | stakeholders too | roughly wrote: | Zizek has a followup to that quote: | | "What he forgot to add was the crucial fourth term: the | "unknown knowns," the things we don't know that we know." | | I've found it's really critical during the project planning | phase to get to not just where the boundaries of our | knowledge are, but also where are the things we're either | tacitly assuming or not even aware that we've assumed. An | awful lot of postmortems I've been a part of have come down | to "It didn't occur to us that could happen." | munificent wrote: | _> An awful lot of postmortems I've been a part of have | come down to "It didn't occur to us that could happen." _ | | Would that not be an unknown unknown? | roughly wrote: | Usually there's a tacit assumption of how the system | works, how the users are using the system, or something | else about the system or the environment that causes that | - it's not that the answer wasn't known, it's that it was | assumed to be something it wasn't and nobody realized | that was an assumption and not a fact. | thfuran wrote: | That's just an unknown unknown masquerading as a known | known. | waprin wrote: | I really enjoy the concept of unknown knowns, but I don't | agree with your example, which is an unknown unknown. | | To me the corporate version of the unknown known is when | a project is certainly doomed, for reasons everyone on | the ground knows about, yet nobody wants to say anything | and be the messenger that inevitably gets killed, as long | as the paycheck keeps clearing. An exec ten thousand feet | from the ground sets a "vision" which can't be blown off | course by minor details such as reality, until the day it | is. | | Theranos is a famous example of this but I've had less | extreme versions happen to me many times throughout my | career.
| | Another example of unknown knowns might be the conflict | between companies' stated values (Focus on the User) and | the unstated values that are often much more important | (Make Lots of Money) | killjoywashere wrote: | As a military officer who was watching CNN live from inside | an aircraft carrier (moored) when he said that, being in | charge of anti-terrorism on the ship at the time, it was | absolutely foundational to my approach to so many things | after that. Here's the actual footage: | https://www.youtube.com/watch?v=REWeBzGuzCc | | Rumsfeld was complicated, but there's no doubt he was very | effective at leading the Department. I think most people fail | to realize how sophisticated the Office of the Secretary of | Defense is. Their resources make the mind reel, most of all the | human capital, many with PhDs, many very savvy political | operators with stunning operational experiences. As a small | example, as I recall, Google's hallowed SRE system was | developed by an engineer who had come up through the ranks of | Navy nuclear power. That's but one small component reporting | into OSD. | | Not a Rumsfeld apologist, by any means. Errol Morris did a | good job showing the man for who he is, and it's not pretty | (1). But reading HN comments opining about the leadership | qualities of a Navy fighter pilot who was both the youngest | and oldest SECDEF makes me realize how the Internet lets | people indulge in a Dunning-Kruger situation the likes of | which humanity has never seen. | | https://www.amazon.com/Known-Donald-Rumsfeld/dp/B00JGMJ914 | michael1999 wrote: | I'll support you there. In any sensible reading of | Nuremberg, they all deserve to hang from the neck until | dead. But the central moral failure was Bush. Letting | Cheney hijack the vp search, and then pairing him up with | Rumsfeld was a bad move, and obviously bad at the time.
Those two had established themselves as brilliant but | paranoid kooks with their Team B fantasies in the 70s, and | should never have been allowed free rein. | oDot wrote: | Every time I hear the name Rumsfeld, I am reminded of the time | when, for over 10 minutes, he refused to deny being a lizard: | | https://www.youtube.com/watch?v=XH_34tqxAjA | macNchz wrote: | In my experience, in web apps built on top of ORMs there is often | a TON of low hanging fruit for query optimization when database | load becomes an issue. Beyond the basics of "do we have N+1 | issues", ORMs sometimes just don't generate optimal queries. I | wouldn't want to build a complex production web app _without_ an | ORM, but being able to eject from it sometimes is key. | | Profile real world queries being run in production that use the | most resources. Take a look at them. Get a sense of the shape of | the tables that they're running against. Sometimes the ORM will | be using a join where you actually want a subquery. Sometimes the | opposite. Sometimes you'll want to aggregate some results | beforehand, or adjust the WHERE conditions in a complex join. | I've seen situations where a semi-frequent ORM-generated query | was murdering the DB, taking 20+ seconds to run, and with a few | minor tweaks it would run in less than a second. | nerdponx wrote: | I'm working on something right now with the Python ORM | SQLAlchemy. It turns out that getting it to use RETURNING with | INSERT is not trivial and requires you to set the non-obvious | option `expire_on_commit=False`, which doesn't _guarantee_ use | of RETURNING, but is supposed to use it if your db driver and | database happen to support it and the ORM happens to support it | for that particular combination of driver and database.
And | there's no API to actually inspect the generated SQL even | though it's emitted in the logs, so there's no way to enforce | the use of RETURNING in your test suite without capturing and | scraping your own logs (which fortunately is very easy within | the Pytest framework). | | I like ORMs but this is just frustratingly complicated on so | many levels. I also understand that SQLAlchemy is an enormous | library and not everything will be easy. But I think this case | exemplifies the trade-offs involved with using an ORM. | | (Yes I am aware that using insert() itself in Core does what I | want, I'm talking about .add()-ing an ORM object to an | AsyncSession). | bootsmann wrote: | There is certainly an API to inspect your query, you can just | call print() on the object iirc. | sheepz wrote: | Agree wholeheartedly with the conclusion of the article. | | But the post makes it seem that there was no real query-level | monitoring for the Postgres instance in place, other than perhaps | the basic CPU/memory ones provided by the cloud provider. Using | an ORM without this kind of monitoring is a sure way to shoot | yourself in the foot with n+1 queries, queries not using | indexes/missing indexes etc | | The other amazing thing is that everyone immediately reached | for redesigning the system without analyzing the cause of the | issues. A single postgres instance can do a lot! | PeledYuval wrote: | What's your recommended way of implementing this in a simple | App Server <> Postgres architecture? Is there a good Postgres | plugin or do you utilize something on the App side? | clintonb wrote: | We use Datadog, which centralizes logs and application | traces, allowing us to better pinpoint the exact request/code | path making the slow query. | sheepz wrote: | I've used pganalyze which is a non-free SaaS tool. Gives you | a very good overview of where the DB time is spent with index | suggestions etc. There are free alternatives, but require | more work from you.
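The n+1 pattern sheepz warns about is easy to demonstrate: one query for the parent rows, then one more query per row, versus a single join. A self-contained sketch using stdlib sqlite3 (the schema and data are made up), counting statements with a trace callback, which is also a cheap stand-in for the query-level monitoring being discussed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

queries = []
conn.set_trace_callback(queries.append)  # record every statement issued

# N+1: one query for the authors, then one per author for their posts.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
for author_id, _name in authors:
    conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,)
    ).fetchall()
n_plus_1 = len(queries)

queries.clear()
# The fix: a single join, constant query count regardless of row count.
conn.execute("""
    SELECT a.name, p.title FROM authors a
    JOIN posts p ON p.author_id = a.id
""").fetchall()
joined = len(queries)

print(n_plus_1, joined)  # 3 1
```

With two parent rows the difference is trivial; with ten thousand it is the gap between one round trip and ten thousand and one, which is exactly the kind of thing an ORM hides until the database starts struggling.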
| gillh wrote: | Prioritized load shedding works well as a last resort [0]. The | idea is simple - | | - Detect overload/congestion build-up at the database | | - Apply queueing at the gateway service and schedule requests | based on their priority | | - Shed excess requests after a timeout | | [0]: https://docs.fluxninja.com/blog/protecting-postgresql-with-a... | Xeoncross wrote: | > The real cost of increased complexity - often the much larger | cost - is attention. | | ...or just mental load. I'm tired of working on micro-service | systems that still have downtime, but no one knows how it all | works. Most are actually just distributed monoliths so changes | often touch multiple services and have to be rolled out in order. | Data has to be duplicated, tasks have to be synchronized, state | has to be shared, etc... | | https://www.youtube.com/watch?v=y8OnoxKotPQ | javajosh wrote: | This is a very common architectural smell, when you have | microservices and "no-one knows how they all work". The whole point | is that no-one can or should know how they all work; the fact | that someone has to in order to fix or modify the system is a | strong signal that you've violated some of the rules - like | single responsibility, and proper abstraction through API. But, | in my experience, this is extremely common - debugging a | pipeline of N microservices often requires running and building | all N services locally. This is, strictly speaking, a monolith | + network partitions + (infinite) build/deploy variation. An | extremely challenging work environment that is ultimately | beyond any mortal programmer's ability, IMO.
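gillh's three steps (detect overload, queue by priority, shed on timeout) can be sketched with a priority queue. Everything here is hypothetical, a toy model of the queueing-and-shedding part only, with injectable timestamps so the behavior is easy to see:

```python
import heapq
import time

class PriorityShedder:
    """Queue requests by priority; shed those that wait past a deadline."""

    def __init__(self, max_wait_seconds):
        self.max_wait = max_wait_seconds
        self.heap = []     # (priority, seq, enqueue_time, request); lower priority value = served first
        self.counter = 0   # tie-breaker so equal priorities stay FIFO

    def enqueue(self, priority, request, now=None):
        now = time.monotonic() if now is None else now
        heapq.heappush(self.heap, (priority, self.counter, now, request))
        self.counter += 1

    def next_request(self, now=None):
        """Pop the highest-priority request, silently shedding expired ones."""
        now = time.monotonic() if now is None else now
        while self.heap:
            _priority, _seq, enqueued, request = heapq.heappop(self.heap)
            if now - enqueued <= self.max_wait:
                return request
            # else: shed it - the client has likely timed out by now anyway
        return None

shedder = PriorityShedder(max_wait_seconds=1.0)
shedder.enqueue(2, "analytics-query", now=0.0)
shedder.enqueue(1, "checkout", now=0.0)
print(shedder.next_request(now=0.5))  # checkout
print(shedder.next_request(now=5.0))  # None: analytics-query was shed
```

The part this sketch leaves out is the detection signal (queue depth, DB latency, connection saturation) that decides when to start running requests through such a queue at all.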
| mandevil wrote: | He says to avoid complexity, and the team he was on (cleaning up | some bad queries) was probably improving along that axis (or at | worst orthogonal to complexity) but, from having done exactly | this, adding a 'query the read-replica' option for | queries - and determining whether a given query can safely go there - | is definitely extra complexity which will now need to be managed | into the future. Definitely less overall than a complete | re-architecting of the system, but this is where engineering judgement | and experience come into play: would you be better off getting | those select queries resolved with some other data store or with | a pg read-replica? If your query can survive against the read-replica (so stale data is at least sometimes acceptable) would | you be better off caching results in redis? | gwbas1c wrote: | > If your query can survive against the read-replica (so stale | data is at least sometimes acceptable) would you be better off | caching results in redis? | | Caching adds a lot of complexity. It denormalizes the data, and | now you "need to know" when to update the cache. Because "the | single source of truth" is no longer maintained, it's easy to | accidentally add regressions. | | If it's a matter of adding a read replica, that's a much better | solution, long-term, because you don't have the effort of "does | this query also need to update the cache?" | | (I'd think by now there would be a way to expose events in a DB | when certain tables are updated; and then (semi) automatically | invalidate the cache.) | pluto_modadic wrote: | ah... rails easy mode discovers rails is only performant if you | don't stray too far from hello world... | dbg31415 wrote: | I feel like a lot of people do this instead of upgrading to newer | versions, even maintenance patches. | | And I get that upgrades can be scary, but often they are | relatively low cost. | | Leaving everyone on the old system unhappy...
means they will | eventually push to re-platform, or rebuild, instead of just doing | suggested maintenance along the way to keep the system they have | in good shape. | | My advice... do the maintenance. Do all the maintenance! Don't | just drive it into the ground and get mad when it breaks; change | the oil and tires and spring for a car wash and some new wiper | blades every now and then and you'll be happier in the long run. | zengid wrote: | The solution they went with, squeezing juice out of the system by | finding performance optimizations, brings me so much joy. | | It reminds me of Richard L. Sites's book | _Understanding_Software_Dynamics_ where he basically teaches how | to measure and fix latency issues, and how at large scales, | reducing latency can have tremendous savings. | | Measuring and reasoning about those issues are hard, but the | solutions are often simple. For example, on page 9 he mentions | that _" [a] simple change paid for 10 years of my salary."_ | | I hope to someday make such an impactful optimization! | canucker2016 wrote: | The problem I have with their eventual solution is that they | only optimized their queries AFTER they had upgraded their | instance to the largest config available. | | They couldn't upgrade their config with a few clicks in the | admin console anymore (I'm guessing what's involved here) so | now they had to use actual grey matter to fix their capacity | problem. Maybe if they had spent more time optimizing specific | parts of their code, they wouldn't even need such a large | config instance. | fritzo wrote: | > since our work touched many parts of the codebase and demanded | collaboration with lots of different devs, we now have a strong | distributed knowledge base about the existing system | | Great to see this cultural side-effect called out. | WallyFunk wrote: | > Of course, I'm not saying complexity is bad. It's necessary.
| | Weird thing about computers, even after a fresh install of your | favorite OS, the whole thing is sitting on a mountain of | complexity, and that's _before_ you start installing programs, | browsing the web, etc | | Only the die-hard use things like MINIX[0] to do their computing. | Correction: MINIX is in the Intel Management Engine so you have | /two/ computers. | | [0] https://en.wikipedia.org/wiki/Minix | deathanatos wrote: | > _Split up the monolith into multiple interconnected services, | each with its own data store that could be scaled on its own | terms._ | | Just to note: you don't have to split out all the possible | microservices at this juncture. You can ask, "what split would | have the most impact?" | | In my case, we split out some timeseries data from Mongo into | Cassandra. Cass's table structure was a much better fit -- that | dataset had a well defined schema, so Cass could pack the data | much more efficiently; for that subset, we didn't need the | flexibility of JSON docs. And it was the bulk of our data, and so | Mongo was quite happy after that. Only a single split was | required. (And technically, we were a monolith before and after: | the same service just ended up writing to two databases.) | | Ironically, later, an armchair architect wanted to merge all the | data into a JSON document store, which resulted in numerous | "we've been down that road, and we know where it goes" type | discussions. | kedean wrote: | Funny enough I frequently have the opposite problem, justifying | repeatedly why Cassandra is a bad fit for relatively short | lived, frequently updated data (tombstones, baby). | deathanatos wrote: | I'd agree with you there. | | The specific data that went into Cassandra in our case was | basically immutable. (And somehow, IIRC, we _still_ had | issues around tombstones. I am not a fan of them.) Cassandra's | tooling left much to be desired around inspecting the | exact state of tombstones within the cluster.
| kreetx wrote: | In a way, in the article they also did a split: specific heavy | select queries were offloaded to a replica. | agentultra wrote: | They could probably squeeze more depending on their workload | patterns. RDBMSs typically optimize for fast/convenient | writes. If your write load would be fine with a small | increase in latency then you can do a lot of de-normalization | so that your reads can avoid using tonnes of joins, | aggregates, windows, etc at read-time. Update the write path so | that you update all of the de-normalized views at write time. | | Depending on your read load and application structure you can | get a lot more scale with caching. | | Decent article. | mamcx wrote: | It's interesting that the idea of microservices gets thrown out as | an obvious "solution". | | It's not. | | "Scale-up" MUST be the "obvious" solution. What is missed by | many, and this article touches on (despite calling | microservices a "solid" choice), is that "scale-up" is "scale-out" | without breaking the consistency of the DB. | | There is a lot you can do to squeeze, and it is rare that you | need to give up joins, data validation, and the like; those | anti-patterns get thrown around casually when performance | problems happen. | deathanatos wrote: | I don't know what to tell you other than I've seen vertical | scaling hit its ceiling, several times. The OP lists "scale | vertically first" as a given; to an extent, I agree with it, | and the comment you're responding to is made with that as a | base assumption. | | There are sometimes diminishing returns to simple scaling; | e.g., in my current job, each new disk we add adds 1/n disks' | worth of capacity. Each scaling step happens quicker and | quicker (assuming growth of the underlying system). | Eventually, you hit the wall in the OP, in that you need | design level changes, not just quick fixes. | | The situation I mention in my comment was one of those: we'd | about reached the limits of what was possible with the setup | we had.
We were hitting problems such as: bringing in new nodes | was difficult, since the time for the replica to replicate was | getting too long, and Mongo, at the time, had some bug that | caused like a ~30% chance that the replica would SIGSEGV and | need to restart the replication from scratch. Operationally, | it was a headache, and the split moved a _lot_ of data out, | which made these cuts not so bad. (Cassandra did bring its own | challenges, but the sum of the new state was that it was | better than where we were.) | | Consistency is something you must pay attention to. In our | case, the old foreign key between the two systems was the | user ID, and we had specific checks to ensure consistency of | it. ___________________________________________________________________ (page generated 2023-08-11 23:00 UTC)