[HN Gopher] SRE Doesn't Scale
___________________________________________________________________
Author : kiyanwang
Score  : 46 points
Date   : 2021-10-11 06:22 UTC (1 day ago)

(HTM) web link (bravenewgeek.com)
(TXT) w3m dump (bravenewgeek.com)

| pram wrote:
| "Hiring experienced, qualified SREs is difficult and costly.
| Despite enormous effort from the recruiting organization, there
| are never enough SREs to support all the services that need their
| expertise."
|
| Uh huh. Maybe what isn't scaling is their onerous recruiting
| filters.
| anonporridge wrote:
| A different element of this is that software professionals are
| often paid so well that a lot of people can take early
| retirement relatively easily if they're good about saving and
| investing. And the FIRE movement is only growing.
|
| I sometimes wonder if the software world has relatively greater
| 'leakage' via early retirement than other fields, creating a
| constant problem of not enough highly experienced people who
| remain stuck as wage slaves throughout a 40-year career.
| bostonsre wrote:
| It's good to be an SRE. From my experience, there is high
| demand, low supply, and schools don't really teach for SRE roles,
| so the pipeline stays small.
| outworlder wrote:
| Yes.
|
| And then there are the unrealistic demands from companies that
| didn't even understand what DevOps was supposed to be, and
| are misunderstanding SRE even more. Many companies still
| treat this the same way they did "ops".
| zaptheimpaler wrote:
| Listen, if you can't write a working text justification
| algorithm in 45 minutes a decade after college, you have no
| business doing completely unrelated SRE stuff like monitoring
| services or scaling databases. Everybody knows leetcode is the
| key to software. If you can't do that, maybe you just don't
| have the right IQ to build the next chat app @ Google.
| GauntletWizard wrote:
| This but unironically.
| If you can't do a simple programming task (we can argue over
| what text justification is, because "left-pad" could qualify,
| as could a full TrueType renderer), you don't belong in SRE.
| Much of the point is to have cross-functional people who can
| interface with developers on their level, _and_ do the math
| and grunt work around operating a database cluster.
| zaptheimpaler wrote:
| I once met a man who was intimately familiar with the
| details of the Linux kernel and how the new chiplet
| architecture in AMD processors resembles a NUMA
| architecture and thus impacts VM performance. He was well
| versed in shell scripting, k8s, Docker, the principles of
| observability, and infrastructure as code. He could explain
| the difference between READ COMMITTED and REPEATABLE READ or
| LSTMs or distributed consistency models off the top of his
| head. He didn't have a CS degree so obviously he wasn't as
| intelligent as me, yet I found him a little
| intimidating for some reason.
|
| But then I asked him:
|
| "Given an array of positive integers target and an array
| initial of same size with all zeros.
|
| Return the minimum number of operations to form a target
| array from initial if you are allowed to do the following
| operation: Choose any subarray from initial and increment
| each value by one."
|
| He was stumped. As I had suspected, he wasn't quite up to
| the job of an SRE. I immediately failed him and went about
| editing my networking.yaml file. Someone has to maintain
| the bar around here..
| [deleted]
| karmakaze wrote:
| To expand on the brief title
|
| > Google [...] says the SRE model [...] does not scale with
| microservices.
| Instead, they go on to describe a more tractable,
| framework-oriented model to address this through things like
| codified best practices, reusable solutions, standardization of
| tools and patterns
|
| Basically, anyone planning on microservices should define and
| monitor bounds on the frameworks, tools, and diversity of design
| patterns in use. Good advice at any scale.
| igetspam wrote:
| > Google enforces standards and opinions around things like
| programming languages, instrumentation and metrics, logging, and
| control systems surrounding traffic and load management.
|
| I think the author read this as more of a problem than a
| solution. This concept is supported by the DevOps model too. Your
| infrastructure is just as much a part of your product, and the
| teams providing the infrastructure are just as responsible for
| the service levels and API contracts as any customer-facing
| product team.
| klodolph wrote:
| It sounds like the lesson here is that tacking on SRE to an out-
| of-control development process, churning out new services by the
| boatload, doesn't scale.
|
| This is caused by the typical attitude software companies have
| towards development. The most common model for software
| development is simple... rush to push out new features to the
| market, and pay the costs later, and then everyone is balking at
| the costs.
|
| The solution is kind of brilliant, IMO. You don't have to pay
| technical debt on projects if you shut them down and delete the
| code from your repository. Migrate to industry-standard
| solutions. Use off-the-shelf programs & libraries. Delete all
| your custom stuff. Replace good solutions with "good enough"
| solutions.
|
| The SREs can help you with that, but they can't help with out-of-
| control development. As your code base gets larger, the cost of
| supporting that code base gets larger too.
| The difficulty of scaling your SRE to match development reflects
| your out-of-control development process, not a problem with SRE.
| Keep the costs under control by keeping your code base under
| control.
| bcrosby95 wrote:
| Funny, I read it differently. They talk about frameworks,
| libraries, and best practices.
|
| Effectively, they're talking about standardization across your
| teams/services so they don't fuck things up. Essentially,
| you're taking away some of the purported freedoms of
| microservices (complete independence, e.g. I can write this
| service in Brainfuck if I want!) and reining them in a bit so
| you don't build a pile of trash.
| klodolph wrote:
| I think of that kind of standardization kind of like deleting
| code. Stuff like, "We are deprecating support for Python in
| SRE, no new projects may be shipped in Python."
| lykr0n wrote:
| Yep. SRE is not a substitute for high-level, overarching
| architects and designers.
|
| One pattern I see is that, as the company grows, development
| gets split into different product groups which will organically
| diverge unless there is rigid enforcement of design patterns.
| In some places, SRE does this implicitly because they will only
| support X, Y, or Z, but in others each product group will have
| their own group of SREs.
|
| There comes a point when you need one or a small group of
| people who are the opinionated developers who can make design
| decisions and who have the authority to cause everyone else to
| course correct. If you don't have this, you'll wind up with
| long migrations and legacy stuff that never seems to go away.
| rektide wrote:
| My read on the article was that much more was related to each
| team being on their own to set up & drive their pipelines,
| operate their own services, and there being a lack of
| commonality/shared experience.
|
| A vast number of the software engineers hardly get the ops
| (running software) stuff at all & half of them can sort
| of play along, hack stuff into place. The engineers on product
| teams who do know how to do things meanwhile don't get all the
| constraints, best practices, ideas that various other DevOps
| folk have worked out & have their own wants/desires/expected
| ways of doing things, so they end up creating their own very
| unique sub-ways of doing things within the org. None of these
| practices converge on regularity or consistency with whatever
| DevOps machinery ends up being built.
|
| What we do have often is just a random pile of containers and
| scripts that a couple people sort of know decently & everyone
| else suffers through & survives within. Almost never does it
| look like any other company's devops kitchen.
|
| SRE doesn't scale because it's an every now and then thing, and
| few people notice or care whether a service is a well-built
| corporate citizen that runs well & is monitored & operated
| according to whatever the in-power SRE cabal wants.
| People start to care only if things are going bad, either via
| services not building/integrating/deploying/running as well as
| they should, or from too much confoundedness/general head
| scratching by either the SRE or regular engineers. SRE is not a
| priority, it's not practiced regularly, it's only an every-now-
| and-then thing, so we don't have the chance to get good, to
| institutionalize the right ways of doing things. That's what
| the article is discussing. Not the rest of the everyday normal
| software development rushing-bedlam you describe.
| klodolph wrote:
| > SRE doesn't scale because it's an every now and then thing,
| ...
|
| That's the part that doesn't scale... tacking on SRE at the
| end, or doing it every now and then. The reason people don't
| care about the software being a "well-built corporate
| citizen" is that they care more about shipping features.
| If you have an SRE team that will say "no" to you when you
| try to ship new stuff, you'll eventually figure out a way to
| build new things in a way that the SRE team will say "yes" to.
| When I say "no", that could be a hard pushback like "no,
| that's not getting shipped" or it could be an answer like,
| "no, the SRE team will not support that, yet."
|
| These kinds of decisions need to be made at a high level,
| because everyone in the institution is typically operating
| with the wrong incentives. That's why you end up with a
| random pile of containers and scripts. It doesn't have to end
| up that way, even when you have microservices.
|
| > That's what the article is discussing. Not the rest of the
| everyday normal software development rushing-bedlam you
| describe.
|
| I disagree with the article, so necessarily there are going
| to be differences between what I'm saying and what the
| article is saying.
| [deleted]
| gautamdivgi wrote:
| > And that move to microservices--in combination with cloud--
| unleashes a whole new level of autonomy and empowerment for
| developers who, often coming from a more restrictive ops-
| controlled environment on prem, introduce all sorts of new
| programming languages, compute platforms, databases, and other
| technologies.
|
| You need standards; without them SRE is pointless. Everything
| needs a standard method of monitoring. For example, stick to
| Java/Spring Boot, MariaDB and K8s. That will generally cover 85%
| of your use cases.
|
| The automation and advantage of SRE are derived through standards
| and familiarity with the tool chain.
| mbesto wrote:
| Isn't this more of a comment about microservices than it is about
| SRE? It reads to me like "once you hit a certain number of
| microservices it ends up looking like a monolith":
|
| http://highscalability.com/blog/2020/4/8/one-team-at-uber-is...
| iamstupidsimple wrote:
| Forgive me but aren't 'macroservices' just... services? I
| don't see the difference.
| wara23arish wrote:
| Dumb question time, but what exactly makes something a
| microservice?
|
| Does separating a specific functionality from a wider
| array of functions into its own VM make it a microservice?
|
| When does something stop being a microservice, I guess?
| thecleaner wrote:
| My definition is separation of infrastructure and deployment
| cycles. Everything that's always in one deployment is one
| service, and stuff that's part of your code base is definitely
| not a different service.
| igetspam wrote:
| It stops being a microservice when a developer starts saying,
| "oh! We can do X in service Y too! It already does ${similar
| work} and reads/writes from/to ${data source}, so why not?"
|
| The intended model is to do one thing, thus enabling surgical
| changes to functionality without having to rebuild
| everything. As long as you stick to your API contracts, you
| can muck around with the internals without affecting anything
| else.
| forty wrote:
| I remember asking a candidate whether they were doing
| microservices at her current job.
|
| She answered, "I don't know if we have microservices, but we
| do have services that don't do much."
|
| Since then, that's been my definition of a microservice :)
| notyourday wrote:
| > Dumb question time, but what exactly makes something a
| microservice?
|
| This: leftpad as a service, over HTTPS
___________________________________________________________________
(page generated 2021-10-12 23:00 UTC)