[HN Gopher] SRE Doesn't Scale
       ___________________________________________________________________
        
       SRE Doesn't Scale
        
       Author : kiyanwang
       Score  : 46 points
       Date   : 2021-10-11 06:22 UTC (1 day ago)
        
 (HTM) web link (bravenewgeek.com)
 (TXT) w3m dump (bravenewgeek.com)
        
       | pram wrote:
       | "Hiring experienced, qualified SREs is difficult and costly.
       | Despite enormous effort from the recruiting organization, there
       | are never enough SREs to support all the services that need their
       | expertise."
       | 
       | Uh huh. Maybe what isn't scaling is their onerous recruiting
       | filters.
        
         | anonporridge wrote:
         | A different element of this is that software professionals are
         | often paid so well that a lot of people can take early
         | retirement relatively easily if they're good about saving and
         | investing. And the FIRE movement is only growing.
         | 
         | I sometimes wonder if the software world has relatively greater
         | 'leakage' via early retirement than other fields, creating a
         | constant problem of not enough highly experienced people who
         | remain stuck as wage slaves throughout a 40 year career.
        
         | bostonsre wrote:
         | It's good to be an sre. From my experience, there is high
         | demand, low supply and schools don't really teach for sre roles
         | so the pipeline stays small.
        
           | outworlder wrote:
           | Yes.
           | 
           | And then there are the unrealistic demands from companies that
           | didn't even understand what DevOps was supposed to be, and
           | are misunderstanding SRE even more. Many companies still
           | treat this the same way they did "ops".
        
         | zaptheimpaler wrote:
         | Listen, if you can't write a working text justification
         | algorithm in 45 minutes a decade after college, you have no
         | business doing completely unrelated SRE stuff like monitoring
         | services or scaling databases. Everybody knows leetcode is the
         | key to software. If you can't do that, maybe you just don't
         | have the right IQ to build the next chat app @ Google.
        
           | GauntletWizard wrote:
           | This but unironically. If you can't do a simple programming
           | task, (we can argue over what text justification is, because
           | "left-pad" could qualify, as could a full TrueType renderer)
           | you don't belong in SRE - Much of the point is to have cross-
           | functional people, who can interface with developers on their
           | level, _and_ do the math and grunt work around operating a
           | database cluster.
        
             | zaptheimpaler wrote:
             | I once met a man who was intimately familiar with the
             | details of the linux kernel and how the new chiplet
             | architecture in AMD processors resembles a NUMA
             | architecture and thus impacts VM performance. He was well
             | versed in shell scripting, k8s, docker, the principles of
             | observability, and infrastructure as code. He could explain
             | the difference between READ COMMITTED and REPEATABLE READ or
             | LSTMs or distributed consistency models off the top of his
             | head. He didn't have a CS degree so obviously he wasn't as
             | intelligent as me, but yet I found him a little
             | intimidating for some reason.
             | 
             | But then I asked him -
             | 
             | "Given an array of positive integers target and an array
             | initial of same size with all zeros.
             | 
             | Return the minimum number of operations to form a target
             | array from initial if you are allowed to do the following
             | operation: Choose any subarray from initial and increment
             | each value by one."
             | 
             | He was stumped. As I had suspected, he wasn't quite up to
             | the job of an SRE. I immediately failed him and went about
             | editing my networking.yaml file. Someone has to maintain
             | the bar around here..
        
       | karmakaze wrote:
       | To expand on the brief title
       | 
       | > Google [...] says the SRE model [...] does not scale with
       | microservices. Instead, they go on to describe a more tractable,
       | framework-oriented model to address this through things like
       | codified best practices, reusable solutions, standardization of
       | tools and patterns
       | 
       | Basically anyone planning on microservices should define and
       | monitor bounds on the frameworks, tools and diversity of design
       | patterns in use. Good advice at any scale.
        
       | igetspam wrote:
       | > Google enforces standards and opinions around things like
       | programming languages, instrumentation and metrics, logging, and
       | control systems surrounding traffic and load management.
       | 
       | I think the author read this as more of a problem than a
       | solution. This concept is supported by the DevOps model too. Your
       | infrastructure is just as much a part of your product and the
       | teams providing the infrastructure just as responsible for the
       | service levels and API contracts as any customer facing product
       | team.
        
       | klodolph wrote:
       | It sounds like the lesson here is that tacking on SRE to an out-
       | of-control development process, churning out new services by the
       | boatload, doesn't scale.
       | 
       | This is caused by the typical attitude software companies have
       | towards development. The most common model for software
       | development is simple... rush to push out new features to the
       | market, and pay the costs later--and then everyone is balking at
       | the costs.
       | 
       | The solution is kind of brilliant, IMO. You don't have to pay
       | technical debt on projects if you shut them down and delete the
       | code from your repository. Migrate to industry-standard
       | solutions. Use off-the-shelf programs & libraries. Delete all
       | your custom stuff. Replace good solutions with "good enough"
       | solutions.
       | 
       | The SREs can help you with that, but they can't help with out-of-
       | control development. As your code base gets larger, the cost of
       | supporting that code base gets larger too. The difficulty of
       | scaling your SRE to match development reflects your out-of-
       | control development process, not a problem with SRE. Keep the
       | costs under control by keeping your code base under control.
        
         | bcrosby95 wrote:
         | Funny, I read it differently. They talk about frameworks,
         | libraries, and best practices.
         | 
         | Effectively, they're talking about standardization across your
         | teams/services so they don't fuck things up. Essentially,
         | you're taking away some of the purported freedoms of
         | microservices (complete independence - eg I can write this
         | service in brainfuck if I want!) and reining it in a bit so
         | you don't build a pile of trash.
        
           | klodolph wrote:
           | I think of that kind of standardization kind of like deleting
           | code. Stuff like, "We are deprecating support for Python in
           | SRE, no new projects may be shipped in Python."
        
         | lykr0n wrote:
         | Yep. SRE is not a substitute for high level, overarching
         | architects and designers.
         | 
         | One pattern I see is that, as the company grows the development
         | gets split into different product groups which will organically
         | diverge unless there is rigid enforcement of design patterns.
         | In some places, SRE does this implicitly because they will only
         | support X, Y, or Z but in others each product group will have
         | their own group of SREs.
         | 
         | There comes a point when you need one or a small group of
         | people who are the opinionated developers who can make design
         | decisions and who have the authority to cause everyone else to
         | course correct. If you don't have this, you'll wind up with
         | long migrations and legacy stuff that never seems to go away.
        
         | rektide wrote:
         | My read on the article was that much more was related to each
         | team being on their own to set up & drive their pipelines,
         | operate their own services, and there being a lack of
         | commonality/shared experience.
         | 
         | A vast number of the software engineers hardly get the ops
         | (running software) stuff at all & half of them can sort of
         | play along, hack stuff into place. The engineers on product
         | teams who do know how to do things meanwhile don't get all the
         | constraints, best practices, ideas that other various DevOps
         | folk have done & have their own wants/desires/expected ways of
         | doing things, so they end up creating their own very unique
         | sub-ways of doing things within the org. None of these
         | practices converge on regularity or consistency with what
         | DevOps machinery ends up being built.
         | 
         | What we do have often is just a random pile of containers and
         | scripts that a couple people sort of know decently & everyone
         | else suffers through & survives within. Almost never does it
         | look like any other company's devops kitchen.
         | 
         | SRE doesn't scale because it's an every now and then thing, and
         | few people notice or care about the difference between a well-
         | built corporate citizen that runs well & is monitored &
         | operated according to whatever the in-power SRE cabal wants,
         | and one that isn't.
         | People start to care only if things are going bad, either via
         | services not building/integrating/deploying/running as well as
         | they should, or from too much confoundedness/general head
         | scratching by either the SRE or regular engineers. SRE is not a
         | priority, it's not practiced regularly, it's only an every-now-
         | and-then thing, so we don't have the chance to get good, to
         | institutionalize the right ways of doing things. That's what
         | the article is discussing. Not the rest of the everyday normal
         | software development rushing-bedlam you describe.
        
           | klodolph wrote:
           | > SRE doesn't scale because it's an every now and then thing,
           | ...
           | 
           | That's the part that doesn't scale... tacking on SRE at the
           | end, or doing it every now and then. The reason people don't
           | care about the software being a "well-built corporate
           | citizen" is because they care more about shipping features.
           | If you have an SRE team that will say "no" to you when you
           | try to ship new stuff, you'll eventually figure out a way to
           | build new things in a way that the SRE team will say "yes".
           | When I say "no", that could be a hard pushback like "no,
           | that's not getting shipped" or it could be an answer like,
           | "no, the SRE team will not support that, yet."
           | 
           | These kinds of decisions need to be made at a high level,
           | because everyone in the institution is typically operating
           | with the wrong incentives. That's why you end up with a
           | random pile of containers and scripts. It doesn't have to end
           | up that way, even when you have microservices.
           | 
           | > That's what the article is discussing. Not the rest of the
           | everyday normal software development rushing-bedlam you
           | describe.
           | 
           | I disagree with the article, so necessarily there are going
           | to be differences between what I'm saying and what the
           | article is saying.
        
       | gautamdivgi wrote:
       | > And that move to microservices--in combination with cloud--
       | unleashes a whole new level of autonomy and empowerment for
       | developers who, often coming from a more restrictive ops-
       | controlled environment on prem, introduce all sorts of new
       | programming languages, compute platforms, databases, and other
       | technologies.
       | 
       | You need standards; without them SRE is pointless. Everything
       | needs a standard method of monitoring. For example, stick to
       | Java/Spring Boot, MariaDB and K8S. That will generally cover 85%
       | of your use cases.
       | 
       | The automation and advantage of SRE is derived through standards
       | and familiarity with the tool chain.
        
       | mbesto wrote:
       | Isn't this more of a comment about microservices than it is about
       | SRE? It reads to me like "once you hit a number of microservices
       | it ends up looking like a monolith":
       | 
       | http://highscalability.com/blog/2020/4/8/one-team-at-uber-is...
        
         | iamstupidsimple wrote:
         | Forgive me but aren't 'macroservices' just... services? I don't
         | see the difference.
        
         | wara23arish wrote:
         | dumb question time but what exactly makes something a
         | microservice?
         | 
         | Does separating a specific functionality from a wider array of
         | functions into its own VM make it a microservice?
         | 
         | When does something stop being a microservice, I guess?
        
           | thecleaner wrote:
           | My definition is separation of infrastructure and deployment
           | cycles. Everything that is always in one deployment is one
           | service, and stuff that's part of your code-base is definitely
           | not a different service.
        
           | igetspam wrote:
           | It stops being a microservice when a developer starts saying,
           | "oh! We can do X in service Y too! It already does ${similar
           | work} and reads/writes from/to ${data source}, so why not?"
           | 
           | The intended model is to do one thing, thus enabling surgical
           | changes to functionality without having to rebuild
           | everything. As long as you stick to your API contracts, you
           | can muck around with the internals without affecting anything
           | else.
        
           | forty wrote:
           | I remember asking a candidate whether they were doing
           | microservices at her current job.
           | 
           | She answered "I don't know if we have microservices, but we
           | do have services that don't do much"
           | 
           | Since then, that's been my definition of a microservice :)
        
           | notyourday wrote:
           | > dumb question time but what exactly makes something a micro
           | service.
           | 
           | This: leftpad as a service, over HTTPS
        
       ___________________________________________________________________
       (page generated 2021-10-12 23:00 UTC)