[HN Gopher] Oncall Compensation for Software Engineers
       ___________________________________________________________________
        
       Oncall Compensation for Software Engineers
        
       Author : kiyanwang
       Score  : 145 points
       Date   : 2022-08-07 18:52 UTC (4 hours ago)
        
 (HTM) web link (blog.pragmaticengineer.com)
 (TXT) w3m dump (blog.pragmaticengineer.com)
        
       | rb2k_ wrote:
       | One thing that this article is also kinda missing is the base
       | compensation. Some companies (e.g. FB) are in the top 90th
       | percentile or so of the industry but then don't pay on-call
       | compensation.
       | 
       | But that means that in one case you make 200k base, have an on-
       | call and don't get any extra on-call money. In the other case you
       | make 140k base, have an on-call and get 10k extra on-call money.
       | 
       | Ultimately you end up doing the same work but one of them gets
       | paid less.
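       | 
       | As a rough sketch of that arithmetic, using only the two example
       | figures above (the helper name `total_comp` is made up for
       | illustration, and equity and bonuses are ignored):

```python
def total_comp(base: int, oncall_pay: int = 0) -> int:
    """Total annual cash compensation: base salary plus any on-call pay."""
    return base + oncall_pay


# Figures taken from the comparison above (annual, USD).
high_base_unpaid_oncall = total_comp(200_000)          # no extra on-call money
low_base_paid_oncall = total_comp(140_000, 10_000)     # 10k extra on-call money

# The explicit on-call pay does not close the gap in base salary.
gap = high_base_unpaid_oncall - low_base_paid_oncall
print(gap)  # 50000
```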
        
       | almog wrote:
       | Do Amazon and Meta get away with zero oncall compensation policy
       | even in Western European countries?
        
       | rb2k_ wrote:
       | > Some companies hire dedicated tech people whose only job is to
       | be oncall, handle alerts, and improve the oncall infrastructure.
       | This role is called 'DevOps Engineer' at some companies, SRE
       | (Site Reliability Engineer) at others, and may also be called
       | 'Operations Engineer.'
       | 
       | Any company that I've seen put SRE/"DevOps"/... as the sole
       | primary on-call rotation basically just created a glorified
       | operations team.
       | 
       | Unless you have shared pain for botched releases, you will never
       | get rid of these problems.
        
       | [deleted]
        
       | skeeter2020 wrote:
       | I don't think most people here understand how labour laws work.
       | "On call" is a well defined concept and IME universally allowed.
       | Any Western jurisdiction DOES require you to pay someone when
       | they actually get called. If it's overtime or not follows the
       | regular rules for how that's calculated, same with time off and
       | maximum work periods.
       | 
       | What we're discussing here is how companies encourage and reward
       | (or don't) for the inconvenience and impact of staying near your
       | computer, not going out of town, or being woken up in the middle
       | of the night. None are going to pay you time and a half on
       | regular work commitments because you might get called.
       | 
       | Jobs like a fire fighter are completely different. They work a
       | scheduled shift and either respond to calls OR do other work
       | during that time. They're not really on-call as much as
       | prioritizing work. They also don't get 1.5x for their regular
       | scheduled work.
        
       | [deleted]
        
       | wpietri wrote:
       | I feel weirdly split on this. With my founder/leader hat on and
       | thinking about my own on-call time, I think of myself as always
       | on call. I also think it's my job to make it so that on-call
       | incidents are very rare.
       | 
       | But when I think about arbitrary companies and people having
       | regular jobs, I think that of course people should be
       | compensated. It's labor and we pay people for that. And
       | especially when it's more than just a few people, having on-call
       | time and incidents be uncompensated means a broken feedback loop.
       | The company should have strong incentives to make sure that on-
       | call people don't suffer for the sloppiness of others.
       | 
       | Part of that is small vs big, or startup vs established. But
       | there's some part of me that seems too reluctant to insist on
       | proper compensation for my own on-call time. Clearly something I
       | need to chew on before I next take a job at someplace larger than
       | a few people.
        
         | WFHRenaissance wrote:
         | My thinking is that if you're giving someone equity in a
         | company, they have a market incentive to ensure that the
         | product and company succeed. This extends to on-call.
         | Handcuffs.
        
           | almost_usual wrote:
           | As someone who has played this game the equity is probably
           | worthless unless it's a public company. I'll grind for
           | hundreds of thousands annually in RSUs, nothing less.
        
       | falcolas wrote:
       | FWIW, the magic HR word is "accommodation". Neither your managers
       | nor HR themselves will tell you this magical word. And you'll want
       | to have a psychiatrist to back you up.
       | 
       | Being on call is super stressful, and if it's causing burnout,
       | you don't need to keep doing it. Does this increase the burden on
       | your teammates? Yes. But so would you burning out.
        
         | babyshake wrote:
         | As in, "I have a health issue that requires the reasonable
         | accommodation of only working 9-5 hours, here is a doctor's
         | note."? Is it really that simple?
        
           | falcolas wrote:
           | Yes. Reasonable accommodations for physical issues are
           | mandated by the ADA, and recently they have started applying
           | it to mental issues as well.
           | 
           | It's worth going through; the worst that can happen is they
           | say no (or, admittedly, fabricate a reason to fire you), but
           | you'll know where you stand.
        
         | matheusmoreira wrote:
         | > Being on call is super stressful
         | 
         | Absolutely. Being on call means we have to be ready to respond.
         | Can't ever fully relax, can't make plans that compromise that
         | readiness. People need to be compensated for that. Where I live
         | doctors get paid when they're on call.
        
           | azornathogron wrote:
           | You need to be ready, yes. Being unable to relax is IMHO more
           | of a function of how well/poorly managed your systems are and
           | your own level of experience and psychological profile.
        
       | anon22334556 wrote:
       | At USAA they pay $70 a week for on call. If you're on call till
       | 2am? Still gotta be in on time. At least when I was there
        
       | wgjordan wrote:
       | Relevant US case law is Berry v. County of Sonoma, 30 F.3d 1174
       | (9th Cir.1994) [1], holding that county coroners' on-call time
       | (requiring carrying pagers and responding by telephone within 15
       | minutes) was not compensable under the Fair Labor Standards Act.
       | 
       | Two key factors are "(1) the degree to which the employee is free
       | to engage in personal activities; and (2) the agreements between
       | the parties." Beyond these general factors, no universal rule
       | applies since the details matter (frequency of calls, response-
       | time requirements and geographical limitations, etc) in the
       | degree to which they limit personal activities, as do any
       | agreements laid out in a contract or company policy (e.g., how
       | specific 'on-call' requirements and compensation are defined and
       | agreed upon in advance).
       | 
       | [1] https://casetext.com/case/berry-v-county-of-sonoma
        
       | kodah wrote:
       | > Some companies hire dedicated tech people whose only job is to
       | be oncall, handle alerts, and improve the oncall infrastructure.
       | This role is called 'DevOps Engineer' at some companies, SRE
       | (Site Reliability Engineer) at others, and may also be called
       | 'Operations Engineer.'
       | 
       | I really wish this wasn't stated so matter-of-factly. Neither of
       | these is actually _supposed_ to be true. A lot of times on-call
       | gets stuck on these folks because they're often treated as
       | second-class citizens in the softwarescape. There is really great
       | structure for doing these roles _right_ that doesn't involve
       | making them full-time on-call.
        
       | 64StarFox64 wrote:
       | I just joined a company that does formal but unpaid oncall,
       | coming from a prior co that had implicit oncall. I'm very much in
       | the "if you built it, you run it" camp. This said, I think:
       | 
       | - if oncall is a part of the gig, you compensate _somehow_
       | (demonstrably above market salaries, explicit extra pay, time in
       | lieu, etc); oncall culture (or the lack thereof) should be
       | explicitly mentioned in any hiring process and employment
       | contracts
       | 
       | - the team should be striving for 8 or more engineers in the
       | steady state; temporary vacancies should be temporary
       | 
       | - primary should be handling 80+% of pages in the steady state;
       | if this is not the case on average across the team, you are not
       | building enough resiliency into your oncall culture, or relevant
       | tech debt should be high priority
       | 
       | - relatedly, kpis/incentives should be structured such that as
       | oncall gets worse, progressively more immediate investments are
       | made to address technical root causes (a la SRE error budget)
       | 
       | I'm tinkering with that last one in my head. It's easy to say,
       | hard to execute.
        
       | WFHRenaissance wrote:
       | I actually love being on-call, especially as a part of a "you
       | build it, you run it" kind of team.
       | 
       | You essentially get a week to find bugs, fortify the application,
       | and make it so that the next person has an easier on-call.
       | 
       | If everyone goes into it with this mindset, eventually on-call
       | becomes a quasi-freebie week where you can either work on "fun
       | stuff", or it becomes invisible.
       | 
       | Not to mention that no product can survive without love and
       | support from its devs.
        
         | anonymoushn wrote:
         | It's atypical for on-call to come with permission to be any
         | less productive at your normal duties.
        
           | WFHRenaissance wrote:
           | Not in my experience, but maybe I'm just lucky to have worked
           | at companies with good on-call cultures.
        
         | mehphp wrote:
         | I don't understand, is on-call not in addition to your normal
         | duties where you work?
         | 
         | I don't understand how it could turn into extra time to work on
         | fun stuff.
        
           | dilyevsky wrote:
           | On well run teams I've been a part of it was always
           | understood that you're not getting much done on your main
           | projects while being primary oncall
        
             | mehphp wrote:
             | Interesting, I've never been somewhere where on call
             | entailed more than maybe an incident or two to handle
             | during the week. I can only recall a single instance that
             | interfered with my current sprint.
             | 
             | From the replies, it sounds like a lot of places have
             | constant fires to be put out by those on call?
             | 
             | That doesn't sound "well run" to me...
        
               | dilyevsky wrote:
               | Imo oncall engineer should seize the opportunity and fix
               | that warning alert that's been firing for ages _before_
               | it becomes an active fire. So consequently there's
               | usually no lack of things to do even if no fires occurred
        
           | WFHRenaissance wrote:
           | I can't imagine telling an engineer to temporarily take on more
           | work in a given week. Every company I've worked at has done
           | it like this:
           | 
           | Not on-call engineers work on 20 story points a sprint (2
           | week sprints). On-call engineers (if you're on-call for a
           | week) get 10 points of work + on-call.
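           | 
           | A minimal sketch of that split, assuming planned points scale
           | linearly with the weeks not spent on call (the function name
           | and the linear rule are illustrative assumptions, not a
           | standard formula):

```python
def sprint_points(full_points: int, sprint_weeks: int, oncall_weeks: int) -> int:
    """Planned story points for one engineer after setting aside on-call weeks.

    Assumes capacity scales linearly with the weeks not spent on call,
    which is a simplification for illustration.
    """
    if not 0 <= oncall_weeks <= sprint_weeks:
        raise ValueError("oncall_weeks must be between 0 and sprint_weeks")
    return full_points * (sprint_weeks - oncall_weeks) // sprint_weeks


# The numbers from the comment: 20 points per 2-week sprint.
print(sprint_points(20, 2, 0))  # 20 (not on call)
print(sprint_points(20, 2, 1))  # 10 (on call for one of the two weeks)
```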
        
         | __bjoernd wrote:
         | That depends. If your oncall is spent fixing bugs in your
         | product - great. If you're just chasing whatever "upgrade
         | campaign" some other corp team came up with, but forgot to
         | properly document - not so great.
        
         | almost_usual wrote:
         | I think my wife would divorce me if I told her I loved being
         | on-call.
        
           | WFHRenaissance wrote:
           | I'm young and unmarried, and I work at a company that I like
           | and on a product that I use, so maybe I'm a bit too
           | passionate about my work to be unbiased here lol.
        
             | [deleted]
        
         | yooloo wrote:
         | Unless of course you didn't build the system, just inherited it
        
           | WFHRenaissance wrote:
           | Yeah, this is sort of hell on earth in many ways. Hence why I
           | specified "you build it, you run it". On-call for a legacy
           | system with no owner and few active devs is hell, and I don't
           | recommend it.
        
         | lmarcos wrote:
         | > Not to mention that no product can survive without love and
         | support from its devs.
         | 
         | That's independent from being oncall.
         | 
         | I prefer to spend my non 9-5 time with my wife and daughter.
         | Sadly, many 'innovative' companies out there don't like this
         | mindset of mine and reject people just because they don't want
         | to do oncall rotations.
        
           | powerhour wrote:
           | I'd prefer that as well, however I end up being the one woken
           | up because your (the generic you) code has errors. If you're
           | not willing to be on call I hope you're at least "willing" to
           | be terminated if your code wakes someone up every day of
           | their rotation (yeah, been there).
        
             | WFHRenaissance wrote:
             | This is a management/hiring issue. You (generic you) should
             | stop hiring engineers who don't give a frick about their
             | teammates.
        
             | lmarcos wrote:
             | If you are not willing to work outside 9-5, if you are not
             | willing to sacrifice your scarce free time, then you must
             | produce perfect bugfree code. Is that it?
             | 
             | I have to give it to the companies and to the whole
             | devops/agile movement. They have truly convinced us that
             | being oncall is the right thing somehow. And that non
             | oncall engineers are a somewhat inferior race.
        
               | dvtrn wrote:
               | Maybe it's not about wanting to write perfect code but a
               | higher business unit that starts wildfires because an
               | important and influential stakeholder from that one group
               | of high paying customers complained loudly about
               | something not working at 2am and next thing you know
               | there's a "planning for on call" meeting on your calendar
               | 
               | Ok simplification of affairs here but...I mean...
        
               | powerhour wrote:
               | When the alternative is expecting someone else to give up
               | _their_ scarce free time, yeah, you'd better be
               | producing perfect code or work somewhere that doesn't
               | care about overnight outages.
        
           | alex3305 wrote:
           | Besides that, I simply need my rest and sleep to relax and be
           | able to perform again. I love working in a team, as a team.
           | But work is still work for me. I don't really care about some
           | paid company holiday weekend or something. I'd rather do
           | something nice with family or friends.
        
         | Mikushi wrote:
         | > Not to mention that no product can survive without love and
         | support from its devs.
         | 
         | If the business wants a stable and well working application,
         | prioritise it as part of regular dev work. As a dev this is
         | certainly not my problem.
        
           | WFHRenaissance wrote:
           | There's definitely an incentives issue here. Product just
           | wants more features. I feel like product managers (bad
           | product managers at least) need a countervailing tendency in
           | the form of a resiliency manager or something.
           | 
           | Actually, a great way of managing this sort of stuff is
           | implementing error budgets and SLOs. If your app isn't
           | performant, the next sprint is dedicated to fixing issues, et
           | cetera.
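           | 
           | A hedged sketch of how such a rule could look, assuming a
           | request-based SLO; the function names, the 99.9% target and
           | the traffic numbers are all made up for illustration:

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    The budget is the number of failures the SLO tolerates:
    (1 - slo_target) * total_requests.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("need slo_target < 1.0 and total_requests > 0")
    return 1.0 - failed_requests / allowed_failures


def next_sprint_focus(slo_target: float, total_requests: int,
                      failed_requests: int) -> str:
    """Hypothetical policy: once the budget is spent, the next sprint fixes issues."""
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    return "features" if remaining > 0 else "reliability"


# Hypothetical month: 1,000,000 requests against a 99.9% availability SLO,
# so roughly 1,000 failed requests are tolerated.
print(next_sprint_focus(0.999, 1_000_000, 400))    # features
print(next_sprint_focus(0.999, 1_000_000, 1_500))  # reliability
```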
        
       | deeptote wrote:
       | This is why I started my own contracting/consulting company, I
       | _hate_ on call and it always gets abused.
       | 
       | I mean, really, what's going to happen if you can't see the score
       | of a baseball game until tomorrow?
        
         | askafriend wrote:
         | > This is why I started my own contracting/consulting company,
         | I hate on call and it always gets abused.
         | 
         | Hate to break it to you, but starting your own
         | contracting/consulting company means you are forever on-call.
         | It just so happens that it's not called that explicitly, and
         | the people you have to answer to are your customers (aka your
         | bosses).
        
           | lrvick wrote:
           | That is not true at all. I have active retainer contracts
           | with several companies providing security engineering and
           | support. All of them understand I am only reachable when I am
           | physically in my home office.
           | 
           | I get back to clients typically within one business day or I
           | will show up to any meetings scheduled a week in advance.
           | This has never been an issue.
           | 
           | I do not carry a cell phone and I make sure every client
           | knows this. If I am outside my office I am living my life.
        
         | happyopossum wrote:
         | > I mean, really, what's going to happen if you can't see the
         | score of a baseball game until tomorrow?
         | 
         | Well, nothing. Unless you're MLB.com and have tens of millions
         | of people paying you >$100/yr to have that information readily
         | available. If that's the case, you're issuing credits (which is
         | a huge time and money sink) _and_ losing customers.
        
           | geraldwhen wrote:
           | Then maybe they should pay people to be ready to fix bugs
           | 24/7.
           | 
           | They won't. No one will. They want their salaried employees
           | to also be firefighters and the premise is absurd.
           | 
           | I don't take calls at night; you cannot reach me. What are
           | they gonna do, fire me?
           | 
           | Good joke. I run interviews and staffing someone who knows
           | their left hand from their right is nearly impossible.
           | Leverage works wonders.
        
         | tyingq wrote:
         | Bookies might care :)
        
         | lrvick wrote:
         | Having been on call as a sysadmin for several companies over
         | 15+ years, starting my own company was mainly so no one can
         | ever demand I do this again.
        
       | teeray wrote:
       | Keep the extra cash for on-call. If I wanted to trade my nights
       | and weekends for more money (and weren't contractually forbidden
       | from it), I would moonlight. I really want a delayed start for
       | any night incidents to catch up on sleep, and extra vacation.
        
       | dvtrn wrote:
       | _Some companies hire dedicated tech people whose only job is to
       | be oncall, handle alerts, and improve the oncall infrastructure.
       | This role is called 'DevOps Engineer' at some companies, SRE
       | (Site Reliability Engineer) at others, and may also be called
       | 'Operations Engineer.'_
       | 
       | Someone finally said the quiet part out loud about 'Devops
       | Engineer' as a job title. Only a matter of time before we wise up
       | about SRE as well, I suppose.
        
         | kodah wrote:
         | I disagree. DevOps Engineer, as much as I hate that title, is
         | really a sysadmin who can do orchestration code (like ansible
         | or terraform). They're not supposed to be responsible for any
         | application code. In a lot of ways they're more like systems
         | integrators these days, but most of them carry some pretty fine
         | OS and distributed system chops.
         | 
         | SRE-SE and SRE-SWE's _are_ responsible for application code and
         | often embed on application teams to bolster either code or
         | system performance or both.
         | 
         | Please do not take companies bastardizing these practices as
         | truth to what they are. There are companies who do this right
         | and we should champion them above the garbage.
        
         | babyshake wrote:
         | No you have it all wrong. Regular mid-level software engineers
         | need to have expertise in dozens of different deep subject
         | matter areas, but they get a "flexible" vacation policy and a
         | $50 monthly gym stipend so they're actually getting a pretty
         | sweet deal.
        
         | dijit wrote:
         | Sysadmins. Those people are sysadmins.
         | 
         | I don't know why we need to have a job title treadmill for
         | this; I hate not knowing what _your_ definition of "devops" or
         | "SRE" is when interviewing. (Both as a person who interviews
         | others and is interviewed by others).
         | 
         | Before anyone says it: Sysadmins could code (not to the same
         | level as feature folk), shitty operators pretending to be
         | sysadmins couldn't.
        
           | BurritoAlPastor wrote:
           | We didn't make software engineer money when we didn't have
           | "engineer" in our titles. I would be perfectly happy to be a
           | "senior systems administrator" or similar if it didn't impact
           | my earnings potential.
        
             | dijit wrote:
             | That's not my experience. The feature folks used to come to
             | sysadmin because the money was better.
        
           | Aperocky wrote:
           | So.. what do you call feature folks that also do sysadmin
           | work?
        
             | Jach wrote:
             | Amazon-style full-ownership software engineer teams?
        
               | Aperocky wrote:
               | Does everyone else not have this? I would be surprised
               | if Amazon is the only company that has full-ownership
               | teams.
        
               | babyshake wrote:
               | You only are allowed to be "Amazon-style" if your stock
               | grants are demonstrably returning Amazon-style results
               | for employees.
        
             | LegitShady wrote:
             | Underpaid?
        
             | dijit wrote:
             | Not sure. What would you call a doctor that also fulfils
             | the duties of a nurse?
        
           | madrox wrote:
           | I don't think there's a way to resolve a semantic argument
           | like this. Most roles are pretty amorphous, and thinking any
           | title can totally encapsulate job requirements is prone to
           | error. Even as an EM, I have to find out what a job's
           | expectations are during an interview. SWEs are probably the
           | only engineering role that doesn't have this problem
           | (mostly). It's been very different everywhere I've worked.
           | It's different from other fields that have far more rigorous
           | structure.
           | 
           | DevOps started as an idea that the development team should be
           | responsible for operations. Before this, most dev teams
           | created artifacts that got handed to an ops team to deploy
           | and be on call for. That idea went to corporations that
           | wanted to modernize, but you can't just disappear an entire
           | workforce of admins used to doing things differently. It's a
           | similar situation to where graphic designers started being UX
           | designers. These people didn't magically develop a different
           | set of skills...just a different set of expectations.
        
           | GauntletWizard wrote:
           | The problem is that sysadmin has the baggage of twenty years
           | of that dude who deals with exchange and active directory.
           | The rules for interaction with servers under the sysadmin
           | label were _terrible_ and quite frankly, so were and are a
           | lot of the people.
           | 
           | There is a legal requirement (regulatory, but carrying force
           | of law) for some industries to implement ITSM practices (and
           | similar, don't quote me on specifics) . There is a
           | requirement in those practices that Developers not have
           | access to production, and that Operations have access to the
           | code. That's incredibly wrong. It's misguided in the worst
           | possible way - The point is to make sure the two audit each
           | other, but it requires black box auditing, when you actually
           | want whitebox auditing. (Note that allowbox and denybox are
           | not acceptable substitutes here).
           | 
           | SRE is called SRE because of a difference in those practices.
           | DevOps is an inexpert redevelopment of those practices.
           | Sysadmin practices evolved into both, but what's modernly
           | called Sysadmin is descended from the AD and Exchange people,
           | and have bad practices. You can't walk back the evolution of
           | words, you can fix them through evolution as well, but it's
           | as slow or slower than getting there, because the ecological
           | niche is already "filled"
        
             | dijit wrote:
             | SRE actually aligns pretty neatly with systems
             | administration (and thus, in principle ITSM)
             | 
             | DevOps itself as a concept was born in nebulous
             | circumstances ("dev-ops days" being where the verbiage
             | comes from but the founder of that conference called the
             | job "agile systems administration"; and the concepts
             | espoused by the devops movement being almost exclusively
             | borne out of the "10+ deploys a day" talk from Flickr).
             | 
             | Anyway, SRE is not materially different than Sysadmins
             | _except_ in three dimensions:
             | 
             | 1. Hire only programmers, none of those operators who click
             | buttons.
             | 
             | 2. Treat reliability as if it is its own feature.
             | 
             | 3. Solidify the contract between feature folks and people
             | focusing on reliability.
             | 
             | I'd like Ben Treynor-Sloss to weigh in here as he likely
             | knows best, but that's the most condensed version of what I
             | understood.
             | 
             | You're right about the exchange people, but they too
             | suffered title inflation, the exchange folks used to be
             | called IT technicians.
             | 
             | The people automating AD deployments across sites and
             | managing reliability were sysadmins, and they programmed in
             | the ugliest of languages to achieve that,
             | autounattended.xml and bat files for days.
             | 
             | The tools are better now, but the work that devops/SRE's do
             | in most companies today is what sysadmins used to do in
             | 2008-
        
         | no_wizard wrote:
         | If I understand you correctly you mean that they are really
         | _Operations Engineers_ right?
        
           | dvtrn wrote:
           | I don't know anymore, and honestly I don't care anymore. If
           | the job wants to call me an SRE fine, if they want to call me
           | Devops, sure.
           | 
           | I'm more focused nowadays on "what problems are you hiring me
           | to solve?" since it feels more and more like the Venn diagram
           | of the three job titles has nearly completely coalesced into
           | a perfect circle.
           | 
           | Difference for me is I'm scrutinizing far more intentionally
           | in job interviews about why an org is hiring for SRE/Devops
           | before accepting any offers. Too often orgs are hiring for
           | this talent and turning them into kitchen sinks for anything
           | and everything the SWEs aren't doing.
           | 
           | Compliance? Send to Devops.
           | 
           | Upcoming audit and need a pen test done in 3 days? Send to
           | Devops.
           | 
           | Did a bad job prioritizing bug fixes and now shit's crashing?
           | Devops.
           | 
           | Etc. once you go through that a few times you start to figure
           | out the right questions to ask in an interview and figure out
           | if you're about to join a company with Devops practitioners
           | or pretenders.
        
             | dilyevsky wrote:
             | > "what problems are you hiring me to solve?"
             | 
             | Interviewed with dozens of companies over my career - never
             | been able to get a straight or truthful answer to this
        
             | scottyah wrote:
             | I've experienced similar. What are some of the questions
             | you ask?
        
               | dvtrn wrote:
               | Take what works for you, ignore what doesn't, good luck.
               | 
               | - Why are you hiring Devops/SRE?
               | 
               | - What is a Devops/SRE going to bring that isn't being or
               | can't be done by engineers presently?
               | 
               | - Why isn't it being done presently? What have you tried
               | so far?
               | 
               | - How many other SREs/Devops do you have? When will I get
               | to interview with them (if applicable)
               | 
               | - Who is responsible for platform? Infrastructure?
               | Deployments? How are they involved? _When_ are they
               | involved?
               | 
               | etc. As mentioned in my last comment, a lot of it comes
               | through the baptism of working at a lot of really crummy
               | shops to know the kind of bullshit you don't want to put
               | up with. You gotta deal with some of it no matter where
               | you go, but you sure ain't gotta deal with it all.
               | 
               | This is a lot of boilerplate stuff, sometimes you're
               | lucky and these questions get answered before you can ask
               | them, sometimes they're in the job description. So let me
               | talk about _that_ for a minute.
               | 
               | You really want to take your interviewing to the next
               | step? Learn how to inquisitively, but tactfully challenge
               | what you're reading in job descriptions. The answers I've
               | gotten have been far more revealing than "what will I be
               | doing day to day?" if you ask for more details about a
               | bullet point or two and why those bullet points matter,
               | or who they matter to. That includes, yep, on-call.
               | 
               | Most of my other questions are very probing questions
               | about things in the job description; not necessarily
               | because I'm looking for a specific answer, I want to see
               | how the hiring managers and others describe those topics.
               | Can they actually talk about why they're looking for
               | someone to do x, y and z? Can they have a meaningful
               | dialogue about what those responsibilities mean for the
               | team or are they just parroting back what the job
               | description says, like someone in a zoom call just
               | reading words off a powerpoint slide?
               | 
               | Here's an example:
               | 
               | Job says they want a Devops to come in and also be
               | responsible for security, risk and compliance in the
               | infrastructure? Okay, here's my counter-inquiry about
               | that: if Devops has the responsibility for security, risk
               | and compliance, talk to me about the authority Devops has
               | to recommend or deny certain actions in the platform if
               | it is assessed to be too risky or costly to maintain a
               | compliant and secure posture were we to do it anyway (if
               | you've ever been in that unenviable position, you
               | probably know _exactly_ what I'm getting at with this
               | question).
               | 
               | Interviews are two way streets, and in my thirties with a
               | family where "family time" has no fungible cost, I'm
               | driving very defensively on my side of the street.
        
           | wpietri wrote:
           | That's certainly what I've seen! I think the DevOps paradigm
           | was a possible revolution in how we worked. But pretty
           | quickly a lot of places just slapped the new label on the old
           | sour wine.
        
         | notesinthefield wrote:
         | And suddenly I understand why the worst tech job I've ever had
         | as an Ops engineer was so bad. We really only existed to
         | improve alerting, pipeline and wake up other engineers at 3am.
        
           | dilyevsky wrote:
           | Majority of companies I talk to are really poorly run wrt
           | software operations. Case in point - misusing devops term to
           | mean sysadmins/operators
        
           | lmarcos wrote:
           | A sincere thanks. As a software engineer I couldn't care less
           | about what happens to my company's services/products outside
           | the 9-5 time range. Don't get me wrong, I give myself 100% at
           | my job, keep myself educated regularly and I'm rather on the
           | "boring and stable stuff" side of things (instead of the
           | "shiny/trendy and unstable" side). I have commitments outside
           | work and no amount of money is going to make me give more
           | than the (already exhausting) 40h/week my contract states.
           | The "you build it, you run it" may work for people in their
           | 20s (they usually are excited to earn "easy money" by being
           | oncall). For people in their 30s and above the extra oncall
           | money is not worth it at all.
        
             | wpietri wrote:
             | I certainly believe that's true for you. But in the case
             | where engineers choose not to ever run what they build, how
             | do you reconnect the feedback loop?
             | 
             | Put differently, I think one of the ways somebody goes from
             | the "shiny/trendy and unstable" side to the "boring and
             | stable stuff" side is by experiencing the operational pain
             | of their choices. If the pain falls on others, will they
             | still learn?
             | 
             | Of course, the way you talk about your job makes me wonder
             | if you are already experiencing so many systemic/managerial
             | issues that the feedback loops are already pretty
             | broken, so this one may not make a ton of practical
             | difference.
        
               | lmarcos wrote:
               | > If the pain falls on others, will they still learn?
               | 
               | I think that depends on the seniority of the
               | individual/team. In my experience, of course one can
               | still learn.
               | 
               | To give you a real example: years ago one of our systems
               | went down on a Sunday morning and our team had no oncall
               | people. The infrastructure team was the one who fixed the
               | issue (don't remember the exact underlying issue, but it
               | did make clear one aspect of our service we didn't handle
               | properly: signal handling). Next morning the team wrote
               | down a Jira issue to improve the way we handle signals.
               | Ticket got prioritized very high and was fixed the very
               | same day.
               | 
               | Now, what would have happened if the issue that Sunday
               | morning was due to a bug in the software our team wrote?
               | The same thing. The difference is that infra team would
               | have no clue on how to fix the thing and would have to
               | revert the service to a previous stable version. Would
               | the business be fine with it? In our case, yeah. As a
               | matter of fact, they didn't want to spend the extra money
               | hiring ops people for each team to be on call. You see,
               | if the business really cared, they would immediately have
               | hired a software engineer willing to be on call... They
               | just didn't care that much (and they couldn't force the
               | current team to be oncall because our contracts didn't
               | specify so and the average age in our team was around 35,
               | and nobody wanted to be on call).
        
               | wpietri wrote:
               | I believe they _can_ still learn if they are senior
               | enough and compassionate enough. And if they have
               | management competent enough to let that work. But what
               | percentage of teams would you guess fit that? I suspect
               | that leaves a lot of on-call staff suffering from bad
               | software.
        
               | bfung wrote:
               | I hear that it can work w/good senior engineers at the
               | helm - I'd prefer the scheme you described as well.
               | 
               | But how did the senior engineer learn to handle those
               | situations in the first place?
        
               | michaelt wrote:
               | _> But in the case where engineers choose not to ever run
               | what they build, how do you reconnect the feedback loop?_
               | 
               | Personally, if I get paged at 3am due to a bug, I'm going
               | to fix it regardless of what the 'backlog' and
               | 'prioritisation' and 'sprint goals' and 'feature roadmap'
               | and 'product owner' say I should be doing.
               | 
               | But some would say I should not be bypassing the process
               | in that way, and that the feedback loop of external
               | stakeholders making requests to the product owner is more
               | than sufficient.
        
               | trombone5000 wrote:
               | Engineers can run what they build during normal working
               | hours.
               | 
               | Oncall is a scourge not because of the experience of
               | technical problems, but because people already working
               | full time have to arrange their lives outside of work
               | around a second "oncall job". A job which occurs after
               | hours, one out of every X weeks.
               | 
               | A dedicated, pure "Ops" night shift (perhaps in another
               | time zone) would be more humane.
        
               | dilyevsky wrote:
               | Then build it in a way you almost never have to plug in
               | outside of business hours.
        
               | trombone5000 wrote:
               | Even if it were built perfectly, if engineers are still
               | on-call, they would have to arrange their after-hours
               | time around the _possibility_ of an incident.
        
               | dilyevsky wrote:
               | That's true but it's just a reality of being employed by
               | a saas company these days. Customer support, sales, etc
               | have those too (and usually less formalized and unpaid)
               | so why are engineers immune to this? You can still
               | probably find some shops that ship an offline
               | distribution but that's becoming more rare.
        
               | kqr wrote:
               | > Engineers can run what they build during normal working
               | hours.
               | 
               | In my experience, this leads to design that pushes
               | problems to outside of working hours.
               | 
               | "We don't need to fix that edge case, just have the off-
               | hours ops team do a manual workaround every now and
               | then."
               | 
               | Or "What does it matter that the deployment is error-
               | prone? We can just schedule it with the off-hours ops
               | team."
        
             | arwhatever wrote:
             | "How about the whole team makes engineering decisions as
             | though you're unable to contact us after hours, or as
             | though doing so were particularly costly."
        
               | dvtrn wrote:
               | What, and break down all the monitoring and alerting
               | silos we built by hiring a Devops engineer to come in and
               | break down the development and infrastructure silos that
               | were built when the company went ham adopting "Capital A"
               | Agile?
        
             | danielheath wrote:
             | "You build it, you run it" works just fine if you're
             | building something that doesn't fail all the damn time.
             | 
             | Work has had three out of hours pages in the last two
             | years, all self resolved within a few minutes.
        
       | cbarrick wrote:
       | Google's oncall compensation structure is phenomenal.
       | 
       | For tier 1 oncall (5m response time), for each hour oncall
       | outside of working hours, you are compensated for 40 minutes,
       | which you can either take as time off or at your current pay rate
       | (i.e. you are compensated at 2/3 your usual pay).
       | 
       | For tier 2 oncall (30m response time), the compensation is 20
       | minutes per hour outside of working hours.
       | 
       | For a tier 1 rotation, the team has a staffing requirement of 12
       | people, split between two sites. There's a max of 80h oncall,
       | outside of working hours, per person per quarter. Because oncall
       | is split between sites, you are never oncall overnight.
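
A minimal sketch of the credit arithmetic described in the comment above, using
only the figures quoted there (40 or 20 minutes credited per off-hours hour, and
an 80-hour quarterly cap). The function and constant names are hypothetical, and
nothing here is confirmed as any company's actual payroll logic.

```python
from fractions import Fraction

# Minutes credited per off-hours oncall hour, as quoted in the comment above.
CREDIT_PER_HOUR = {
    1: Fraction(40, 60),  # tier 1: 5m response time
    2: Fraction(20, 60),  # tier 2: 30m response time
}

# Quoted cap on off-hours oncall per person per quarter (tier 1 rotation).
QUARTERLY_CAP_HOURS = 80


def credited_hours(oncall_hours, tier):
    """Return hours of pay (or time off) credited for off-hours oncall time."""
    if oncall_hours < 0:
        raise ValueError("oncall_hours must be non-negative")
    return oncall_hours * CREDIT_PER_HOUR[tier]


# At the quoted quarterly cap, a tier 1 engineer is credited 160/3 hours.
max_tier1_credit = credited_hours(QUARTERLY_CAP_HOURS, tier=1)
print(float(max_tier1_credit))
```

Under these assumptions the most a tier 1 engineer could be credited per quarter
is a little over 53 hours of pay or time off.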
        
         | nighthawk454 wrote:
         | Better than Amazon, where you get nothing extra. And often do
         | regular duties during on call as well. Kind of nuts.
         | 
         | The saving grace is a lot of teams aren't really doing anything
         | that critical, so the on call is more a formality bc that's
         | what real teams do. Still pointlessly stressful but less
         | serious.
        
           | [deleted]
        
         | Cyph0n wrote:
         | > Because oncall is split between sites, you are never oncall
         | overnight.
         | 
         | Doesn't this only apply to SRE rotations? The dev teams I know
         | of are definitely oncall overnight.
        
           | fishywang wrote:
           | that only applies to tier 1 oncall. if they are oncall
           | overnight they are most definitely tier 2.
        
             | Cyph0n wrote:
             | Ah, that makes sense!
        
         | soneca wrote:
         | I always assumed that pay for hours outside of regular working
         | hours would be higher than regular pay.
        
           | hbhakhra wrote:
           | The pay for outside working hours applies whether or not you
           | are getting paged. That's 128 hours / 3 = 42.67 hours of
           | extra pay during an on call week. The on call week also gives
           | incentive to fix technical debt and build a more stable
           | production system so you don't get paged.
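
To check the arithmetic in the comment above: a sketch assuming a 40-hour working
week and a shift that covers all seven days (both assumptions, not stated in the
thread). The 128 hours are the off-hours in such a week, and dividing by 3
matches the 20-minutes-per-hour (tier 2) rate quoted earlier; at the
40-minutes-per-hour (tier 1) rate the same week would come to roughly twice
that.

```python
HOURS_PER_WEEK = 7 * 24      # 168
WORKING_HOURS_PER_WEEK = 40  # assumption: standard 40-hour week


def weekly_credit(rate):
    """Hours credited for one full week of off-hours oncall at the given rate."""
    off_hours = HOURS_PER_WEEK - WORKING_HOURS_PER_WEEK  # 128
    return off_hours * rate


tier2_week = weekly_credit(1 / 3)  # 20 minutes credited per hour
tier1_week = weekly_credit(2 / 3)  # 40 minutes credited per hour

print(round(tier2_week, 2), round(tier1_week, 2))
```

Note that the thread also says tier 1 rotations are split between sites and
capped at 80 off-hours per quarter, so a full 128-hour tier 1 week would not
occur under the rules as described.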
        
             | soneca wrote:
             | Yeah, makes sense. Forgot the detail that most on-call
             | hours are not strictly working. So Google's scheme seems fair.
        
               | geraldwhen wrote:
               | Waiting to work is working. Would a hospital surgeon only
               | charge for time holding a knife? Don't be absurd.
        
               | joshuamorton wrote:
               | If I'm at home cooking dinner, I'm not "waiting to work"
               | though.
               | 
               | Yes, you cannot go on a hike, which is why you get paid.
               | You don't get paid more than you do for your normal time
               | working though.
        
               | shadowofneptune wrote:
               | Hospitals do not pay more than normal work hours for on-
               | call, though they do usually pay some amount.
               | 
               | https://physiciansthrive.com/physician-compensation/on-
               | call-...
        
           | yegle wrote:
           | For non-business hour oncall, you usually only need to
           | mitigate with minimum effort. E.g. for a typical overload
           | situation, up sizing the pool or getting an emergency ceiling
           | loan is enough, and you can offload further preventative
           | measures or root cause investigation to the next oncaller
           | when they are in business hours, or wait until next Monday.
        
         | sidlls wrote:
         | "Phenomenal"? Hardly. The base expectation outside of tech is
         | time-and-a-half for each hour over 8 in a day, or 40 in a week.
        
           | leetcrew wrote:
           | it's an apples to oranges comparison. not many jobs that pay
           | time and a half for overtime have mid-level ICs making $250k+
           | before overtime.
        
             | sidlls wrote:
             | The base salary is literally irrelevant to this discussion,
             | which is about compensation for hours worked outside of
             | normal business hours at whatever the rate is.
        
           | dasil003 wrote:
           | For actual work, not for being on call with the expectation
           | that most of the time nothing will go wrong
        
             | sidlls wrote:
             | On-call requires you to more or less not plan anything
             | _other_ than being available for work. Sure most of the
             | time nothing goes wrong--but that isn 't the constraint,
             | here. The whole point is that something _might_ go wrong
             | and that the person on call _must_ respond within a given
             | window of time (5-15 minutes, generally). That effectively
             | makes even mundane things like going to the grocery store a
             | potential trade-off in favor of work. I definitely consider
             | every hour of the day I'm on call (all 24 of them) as a
             | working hour, and so should every other engineer. Since
             | tech companies get away with not paying for this service, I
             | take off from normal working hours at a rate of 1.5 times
             | the time I spend resolving an on-call alert. I'd rather be
             | compensated with cash for it.
        
             | vageli wrote:
             | Do firemen also not work given that a considerable amount
             | of their time is spent waiting for a call?
        
               | dasil003 wrote:
               | Do you sleep at the office for your oncall shift?
        
               | skeeter2020 wrote:
               | they don't work on-call over night.
        
               | khuey wrote:
               | It's common for firefighters to work 24 hour shifts.
        
               | stickfigure wrote:
               | Professional firefighters spend a lot of their "waiting"
               | time training, writing reports, fixing the apparatus,
               | sharpening shovels, cleaning chainsaws, etc. It isn't the
               | same.
        
               | ok_dad wrote:
               | For a 5m on call time, I would literally have to be
               | sitting at my computer with slack open reading hacker
               | news. Yes, it's basically the same thing.
        
               | noodleman wrote:
               | I'm currently on call.
               | 
               | A 5 minute response time means to respond to the call out
               | and start working on it. If you're on call, you should
               | have a suitable WFH setup and it should be on standby, so
               | 5 minutes is ample time. It doesn't mean you have to
               | have it resolved within 5 minutes of being called out,
               | that would be absurd.
        
               | ipsi wrote:
               | In the EU Working Time Directive, it differentiates
               | between the concept of "On Call Duty" and "Standby Duty,"
               | where the former is what this post is about, and the
               | latter is generally reserved for when an employee is
               | required to remain on the premises of their employer
               | (e.g., being on-site overnight to immediately respond to
               | emergencies). The primary difference is that On Call does
               | not count as working time unless you get paged, whereas
               | Standby Duty _does_ count as working time, even if
               | nothing happens. Within the EU, that means that Standby
               | Duty counts against working hours allowed by the EU
               | Working Time Directive and does not count as rest - e.g.,
               | the German Arbeitszeitgesetz limits workers to 10 hours
               | per day (hard limit), and requires 11 hours between
               | working periods (some exceptions that I don't believe
               | are relevant here).
               | 
               | However, according to recent ECJ decisions[1][2][3],
               | "Standby Duty" is not reserved _exclusively_ for when the
               | employee is required to remain on-premises, and it also
               | depends on the degree to which the freedom of the
               | employee is curtailed, specifically stating in one
               | ruling[2]:
               | 
               | > ...
               | 
               | > 32 In the third place, and as regards more specifically
               | periods of stand-by time, it is apparent from the case-
               | law of the Court that a period during which no actual
               | activity is carried out by the worker for the benefit of
               | his or her employer does not necessarily constitute a
               | 'rest period' for the application of Directive 2003/88.
               | 
               | > ...
               | 
               | > 36 Second, the Court has held that a period of stand-by
               | time according to a stand-by system must also be
               | classified, in its entirety, as 'working time' within the
               | meaning of Directive 2003/88, even if a worker is not
               | required to remain at his or her workplace, where, having
               | regard to the impact, which is objective and very
               | significant, that the constraints imposed on the worker
               | have on the latter's opportunities to pursue his or her
               | personal and social interests, it differs from a period
               | during which a worker is required simply to be at his or
               | her employer's disposal inasmuch as it must be possible
               | for the employer to contact him or her (see, to that
               | effect, judgment of 21 February 2018, Matzak, C-518/15,
               | EU:C:2018:82, paragraphs 63 to 66).
               | 
               | And while I'm very definitely not a lawyer, I think it's
               | possible (likely, even) that having to be at a computer
               | and working within 5 minutes of a page, even at 3AM,
               | would constitute significant constraints on the worker
               | and turn it from "On Call" to "Standby Duty", although
               | the exact implications of that will vary from country to
               | country.
               | 
               | All of that to say that I think that 5 minutes is
               | absolutely bonkers as an expected response time. If I
               | were subject to that, I wouldn't be able to leave my
               | apartment for the duration I was on call - it takes me a
               | lot more than 5 minutes to get to and from the
               | supermarket or even the coffee place just outside. Even
               | taking out the trash could take > 5 minutes (and with no
               | cell reception, due to being underground).
               | 
               | [1] https://home.kpmg/xx/en/home/insights/2021/03/flash-
               | alert-20...
               | 
               | [2] https://curia.europa.eu/juris/document/document.jsf;j
               | session...
               | 
               | [3] https://eur-lex.europa.eu/legal-
               | content/EN/TXT/HTML/?uri=CEL...
               | 
               | [4] (WARNING: auto-download PDF) https://ec.europa.eu/soc
               | ial/BlobServlet?docId=6474&langId=en
        
               | ok_dad wrote:
               | I understand that, my point is: you're still sitting at
               | home when you could be out doing other things. It then
               | should be paid as regular or OT hours, not 2/3 or 1/3 of
               | regular pay or anything like that.
        
               | ramraj07 wrote:
               | It's not 5 seconds! There are definitely a few activities
               | I do at my home that I can't drop on a few minutes' notice
               | (extended toilet break?) but I can think of a ton of
               | things I can do that would still let me be able to start
               | working on my pc with a few minutes heads up.
        
               | ok_dad wrote:
               | I have a kid, so sometimes I can't drop what I'm doing.
               | If I am required to be on-call at 5min response, that
               | means I'm hiring a nanny/babysitter. That's what you all
               | don't get here, people have complex lives outside of work
               | and workplaces should not be shortchanging you or me in
               | order to scrimp and save on customer support.
               | 
               | If it is important for the application to be up 24/7, the
               | company needs to pay for it at the usual rate!
        
           | remus wrote:
           | That's not the same as on call though, that's working extra
           | hours.
           | 
           | With the pay structure described above I assume this is
           | applied outside your normal working hours, where you're not
           | doing anything other than being on call.
        
             | R0b0t1 wrote:
             | Oncall is working. I expect to bill oncall hours at at
             | least time and a half.
        
               | trimbo wrote:
               | Have you ever successfully billed oncall hours as
               | overtime when you weren't called?
        
               | Jabbles wrote:
               | It's not. It's ridiculous to expect to charge _more_ than
               | normal work for oncall. And your expectations are
               | misplaced, as TFA shows.
        
               | R0b0t1 wrote:
               | Disagree, as do others. If my movement and activities
               | will be restricted then it is full
               | employment/utilization, not some quasi-employment or
               | utilization. I didn't pull this out of thin air.
               | 
               | Someone has conned you into accepting less. I'm sorry.
        
               | Jabbles wrote:
               | Ah, I see you are talking about an alternative universe.
        
               | ok_dad wrote:
               | Thanks for trying, I think some people take pride in
               | living to work and they take offense at the idea they
               | might have been suckers for life.
               | 
               | I agree with you fully, on call time should be
               | compensated at the usual rates, including overtime.
        
               | Jabbles wrote:
               | But why? Why do you think oncall should be paid the same
               | as full work? Perhaps you have a different definition of
               | oncall than me, where you expect to be paged once or
               | twice a week, and spend maybe an hour or so fixing it
               | each time?
               | 
               | Why would I _not_ charge less for this than real work? It
               | involves much less actual work.
        
               | decebalus1 wrote:
               | I've been doing on-call for more than a decade and I feel
               | I need to offer my perspective here. I worked in teams in
               | which I would never get paged and also teams in which I'd
               | get 100 alerts per week.
               | 
               | > But why? Why do you think oncall should be paid the
               | same as full work? Perhaps you have a different
               | definition of oncall than me, where you expect to be
               | paged once or twice a week, and spend maybe an hour or so
               | fixing it each time?
               | 
               | When I'm oncall, I need to cancel all my social
               | engagements for that week and delegate all my errands and
               | such to my partner. Also not drink or take any mind
               | altering substances. I must be 'ready' at any time of day
               | or night. I (as well as others) sleep in the same bed
               | with my partner. If my phone rings due to an alert, my
               | partner is also woken up. So I need to sleep in the
               | living room for a week. From the start, this affects my
               | personal life to the extent that it would be unfair NOT
               | to compensate me extra. It also affects my family way
               | more than a regular desk job should.
               | 
               | You're mentioning the expectation to be paged once or
               | twice a week. If those pages come at odd hours and you
               | need to fix them on the spot, no exceptions, failure is
               | not an option, etc.. it's still very disturbing to your
               | personal life. Additionally, that's a parameter which is
               | well outside of your control. I've seen oncall shifts
               | which turned from '1-2 pages a week' to '5-10 pages a
               | day' after the product finally got in the hands of
               | regular users or after the team grows in size and code
               | contributions grow suddenly. Or even better, when you're
               | doing such a great job that your boss promotes you in the
               | oncall tier and now you also get to do triage for alerts
               | coming for the whole organization.
               | 
               | The volume of the alerts doesn't and shouldn't matter. If
               | you're oncall, you're oncall, you have a responsibility
               | to be available at all times, rain or snow, night or day.
               | This deserves compensation. Some companies (some I've
               | been lucky to work at) implement some sort of follow-the-
               | sun oncall shift and you at least get to have your sleep
               | and generally minimal impact on your personal life. That
               | is great and does not deserve extra compensation, because
               | your work hours aren't altered at all.
               | 
               | I'm sad that labor rights in the US don't consider this a
               | norm. But it's not surprising, considering we did have
               | dedicated engineers at one time who were paid to watch
               | and maintain the health of the livesite 24/7. But then we
               | figured we'd make regular engineers fuck their sleep
               | cycles by adding oncall to the list of responsibilities,
               | because it would be cheaper this way. And everybody
               | agreed, because 'full-service ownership'.
        
               | ok_dad wrote:
               | I'm arguing that the "5 minute response" on-call should
               | be at regular or OT rates. If your on-call rotation is
               | like a 1 or 2 hour response time, then I could see it
               | being less, but the problem is that I've been at a
               | company where the on-call was previously "whenever you
               | get around to it" and later they changed it to "within 30
               | minutes" and I was not compensated any further even
               | though it killed my life anytime I was on-call.
               | 
               | Why _I_ believe it should be at the full-rate: because I
               | don't trust the company culture to stay the same over my
               | tenure there. My expectations for a "shit company" have
               | to be the same as my expectations for a "good company",
               | because a good one can turn to shit quickly.
        
               | Tao3300 wrote:
               | > Someone has conned you into accepting less. I'm sorry.
               | 
               | The Kool-Aid was _really_ good though! XD
        
               | ramraj07 wrote:
               | Start a company, make this a policy and advertise. If
               | engineers truly care about this, they'll come to you.
               | Perhaps they just care about total compensation and their
               | RSUs more than this minutiae?
        
               | skeeter2020 wrote:
               | >> I didn't pull this out of thin air.
               | 
               | Except you did. There are pretty specific legal
               | definitions of "on call", what it means and when you get
               | paid for it in almost every jurisdiction. I've never
               | seen one that pays you time and a half for being "on
               | call". This is not the same if you get called and
               | actually work overtime; that's regular rules. How a
               | company entices (or doesn't) for taking a shift is up to
               | them.
        
           | joshuamorton wrote:
           | You are paid 2/3 for time spent at home playing with your
           | kids on the weekend.
           | 
           | Unless you are working half the weekend, every weekend you
           | are oncall, the tier-1 OCC policy wins over time-and-a-half
           | for time worked.
        
             | thfuran wrote:
             | I'm having a bit of trouble reproducing your results. How
             | exactly are you coming up with 2/3 > 3/2?
        
               | joshuamorton wrote:
               | Because, and I can't stress this enough, if I am at home
               | cooking dinner or reading or playing video games, _I am
               | not working_, so 2/3 of my entire weekend is more than
               | 3/2 of time worked unless I am working 9-5 all day Sunday
               | responding to pages, which no one is.
               | 
               | Time and a half for hours worked is only > that 2/3 for
               | time not worked if you're working 50% of the time, which
               | you aren't, at least not regularly.
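The comparison above can be checked with a few lines of arithmetic. This is only a sketch under assumptions not stated in the thread: both schemes use the same base hourly rate, the 2/3 rate is paid on every on-call hour, and time-and-a-half is paid only on hours actually worked, with nothing for idle standby. The function names are made up for illustration. Under those assumptions the break-even share of time worked comes out at 4/9 (about 44%), a little below the rough 50% figure.

```python
from fractions import Fraction

# Assumed rates, taken from the two schemes argued about in the thread.
STANDBY_RATE = Fraction(2, 3)   # paid on every on-call hour, worked or not
OVERTIME_RATE = Fraction(3, 2)  # paid only on hours actually worked


def standby_pay(on_call_hours, hourly_rate=1):
    """Pay when the whole on-call window is compensated at 2/3 rate."""
    return STANDBY_RATE * on_call_hours * hourly_rate


def overtime_pay(worked_hours, hourly_rate=1):
    """Pay when only worked hours are compensated, at time and a half."""
    return OVERTIME_RATE * worked_hours * hourly_rate


def break_even_fraction():
    """Share of the on-call window you must work for both schemes to pay the same.

    Solves STANDBY_RATE * T == OVERTIME_RATE * w for w / T.
    """
    return STANDBY_RATE / OVERTIME_RATE


if __name__ == "__main__":
    window = 48  # hypothetical weekend on-call window, in hours
    for worked in (2, 8, 24):
        print(worked, standby_pay(window), overtime_pay(worked))
    print("break-even share of time worked:", break_even_fraction())
```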
        
               | Tao3300 wrote:
               | > home cooking dinner
               | 
               | What if the call comes right then? Now dinner is fucked.
               | They'd better pay a lot to go messing with my outside
               | life.
        
               | joshuamorton wrote:
               | I would not cook a risotto while on call, but most
               | dinners are not "fucked" immediately if you have to walk
               | away with a few minutes notice (esp if you have a
               | partner/roommate, but even if not)
               | 
               | This is like the equivalent of saying dinner (or your
               | day) is ruined if someone knocks on your front door
               | unexpectedly. No it's not.
               | 
               | And yes, you're getting paid 2/3 of your (large) salary
               | for the possibility of this inconvenience.
        
               | Tao3300 wrote:
               | Should be more if you ask me. I'm only going to get so
               | many risottos in my life, but software will always be
               | busted. If that's what employee lives are worth to
               | Google, well, I guess that explains some things.
        
         | babyshake wrote:
         | I would say it makes sense to make the oncall pay based on the
         | number of pages you get or some other metric but that would
         | just create some unwanted incentives and problems. It's
         | probably good to think of paying engineers for oncall time they
         | are not spending putting out fires as a form of reward for
         | setting up their systems to be reliable.
        
           | cbarrick wrote:
           | That's a pretty perverse incentive. Why fix the thing if I
           | get paid more if it's broken? Why tune the pager to be quiet
           | if noise equals cash?
           | 
           | The pager should be tuned to your SLOs, and you should be
           | incentivized to exceed those SLOs.
        
         | dbcurtis wrote:
         | My data is pretty old, old enough that the person on call
         | carried a pager (remember those?), but very similar comp
         | structure. I remember, because the on-call costs hit my budget
         | directly.
         | 
         | 1. Time on-call was paid at 25% of normal hourly rate. (Maybe
         |    holiday premium boosted the base rate? I can't remember.)
         | 2. Issue-resolution pay was the normal overtime rate, including
         |    shift premium and holiday premium, from the time the pager
         |    went off until the issue was cleared.
         | 3. The person on call had to be able to get to the plant in 20
         |    minutes, if necessary, but remote support was perfectly fine
         |    and paid the same. Only resolving the issue mattered, not
         |    where you did it.
         | 4. The person on call had to remain sober and work-ready the
         |    entire time on call.
         | 
         | It's 3 & 4 that justify the 25% pay for carrying a pager.
         | Friend having a party? I'll have cranberry juice, thanks. Fresh
         | snow in Tahoe? I'll have to miss it this weekend.
         | 
         | Restricting someone's movements and social life without
         | compensation is simply abusive. As an industry, we need to
         | stop.
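A rough sketch of how a structure like the one described above might be computed. Only the 25% standby rate comes from the comment. The 1.5x overtime multiplier, the choice to stop standby pay while resolution time is being paid, and the omission of shift and holiday premiums are all assumptions made for illustration, as are the function and parameter names.

```python
def on_call_pay(hourly_rate, on_call_hours, resolution_hours,
                standby_fraction=0.25, overtime_multiplier=1.5):
    """Estimate pay for one on-call shift.

    standby_fraction: share of the hourly rate paid for carrying the pager
        (25% in the comment above).
    overtime_multiplier: ASSUMED value; the comment only says "normal
        overtime rate" and does not give a number.
    Shift and holiday premiums are deliberately left out.
    """
    if resolution_hours > on_call_hours:
        raise ValueError("resolution time cannot exceed the on-call window")
    # Assumption: hours spent resolving issues are paid at the overtime
    # rate instead of, not in addition to, the standby rate.
    standby_hours = on_call_hours - resolution_hours
    standby = hourly_rate * standby_fraction * standby_hours
    resolution = hourly_rate * overtime_multiplier * resolution_hours
    return standby + resolution


if __name__ == "__main__":
    # Hypothetical numbers: 40/hour, a 48-hour weekend, 2 hours of pages.
    print(on_call_pay(hourly_rate=40, on_call_hours=48, resolution_hours=2))
```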
        
       | cletus wrote:
       | Facebook's oncall compensation is really simple: it's zero.
       | 
       | Having also worked at Google, I found this situation ridiculous.
       | Facebook treats oncall as something you're just expected to do
       | _on top of everything else you're meant to do_. So if you have
       | 50 alerts fire, 40 tasks created, 5 UBNs (UBN = Unblock Now, which
       | should be responded to immediately and will probably be a SEV)
       | and 3 SEVs, well you just have to do all that and your job.
       | 
       | Google oncalls (IME) tended to be fairly light. You'd often do
       | releases too but there tended to be a lot of automated processes
       | around this (ie building binaries, packaging MPMs, release to
       | staging, release to canary, regression detection, push to
       | production).
       | 
       | Facebook's releases (other than Web) were (again, IME) a dumpster
       | fire.
       | 
       | Web was a special case because of continuous push. Push a commit
       | and automated processes would build the (very large) www binary
       | and handle the push to C1/C2/C3 (these are sort of analogous to
       | internal testing, canary aka 1% and prod). Automated processes
       | would verify a commit by deciding what tests to run. This wasn't
       | explicit and would miss relevant tests for various reasons. This
       | could (and often did) break trunk. This could back up pushes for
       | hours. First thing in the morning it may take as little as 2
       | hours to push to prod. Later in the day it might take 8+ hours.
       | 
       | Facebook works around this by using conditional code, like... _a
       | lot_, meaning certain code would only run if you're in the right set
       | of GKs (gatekeepers) and QEs (quick experiments). Behaviour would
       | be flipped on by a separate GK/QE push, which is much quicker.
       | 
       | But this means when something of yours breaks (which it often
       | does) you have no idea why. Is it a bad code push? A bad GK/QE
       | push? By you? Or some infra you depend on?
       | 
       | I mention this because you had to deal with this sort of thing
       | oncall _a lot_.
       | 
       | The problem with not giving oncall compensation is that the
       | burden is never shared equally. The person or persons who do more
       | than their fair share are never going to do it for the money
       | because it is annoying but at least the money is some form of
       | recognition or, dare I say it, _compensation_.
       | 
       | Disclaimer: Xoogler, Ex-Facebooker.
        
         | [deleted]
        
       | michaelt wrote:
       | This article is missing the single most important question about
       | being on call: _how often you get called_.
       | 
       | It's one thing to be on call where you get called 2-3 times a
       | year, because you're working on a quality system where bugs get
       | fixed more often than they get introduced. Then the pay, if any,
       | is mostly compensation for hurting your social life.
       | 
       | It's another to be on call where you get called 2-3 times a week,
       | because the organisation has decided calling you is cheaper than
       | fixing the underlying problems. In that case, the compensation
       | better be worth messing up your sleep cycle and upsetting your
       | partner.
        
         | leetcrew wrote:
         | disagree. I need to get paged a lot before it becomes more
         | impactful than planning an entire week around an engagement
         | SLA.
        
           | ironmagma wrote:
           | So that's your personality. What about the rest?
        
         | GauntletWizard wrote:
         | Google had fantastic software quality and still had SRE teams
         | expecting to be paged twice a week. They had that because they
         | had tremendous software quality; they paged well before there
         | was impact that users would care about, and proactively spent
         | time fixing their problems. Being paged, usually during
         | daylight hours, allowed good bugs to be filed.
        
         | iasay wrote:
         | Oh yes nailed it.
         | 
         | That's one problem with a fixed on call rate that some
         | organisations offer. It's a hefty chunk of cash and sounds
         | generous to the engineers. But the cost is already known and
         | sunk up front and not proportional to the amount of call outs
         | so the business sees it as a fixed operational expenditure
         | rather than an appraisal of how fucked things are.
         | 
         | The performance metric quickly becomes how many people you
         | still have on cover who haven't quit to work somewhere else
         | because they are burned out.
        
         | ripper1138 wrote:
         | Not always that simple. 2-3 times a week is nothing! Try being
         | on call in AWS, or any product/service at that scale. How often
         | you get paged has less to do with your organization and more to
         | do with the scale of your systems and business.
        
           | dilyevsky wrote:
           | I've been a part of Borg oncall at Google - software that
           | manages 90+% of the hardware there (and there is a lot of
           | hardware). There were week-long stretches without any pages.
           | Don't ship garbage software and it'll be alright at any scale.
        
           | Tao3300 wrote:
           | Yeah, but what the hell is possibly important enough to wake
           | up someone's family more than 2-3 times a week?
        
       | morelisp wrote:
       | This data is presented in a really frustrating way.
       | 
       | First, I suspect (but I'm not certain) most companies do it as X%
       | of salary. So I have no idea if I'm looking at truly different
       | on-call policies or rather salary spreads.
       | 
       | Second, there's no associated estimation of how much work "being
       | on-call" is. For us, a small team with SWEs doing voluntary on-
       | call, any out-of-hours page is _immediately_ top priority for
       | work the next day. The person on-call also gets the final say
       | over risky deployments after lunch / on Friday. I know that's
       | not universally true, and we've worked with companies that
       | consider a page a week or even more normal (still without a
       | separate SRE/OpsEng team). If any of us was getting paged once a
       | week, we'd refuse.
        
         | ipsi wrote:
         | Well, Google is explicitly listed (about halfway down) as
         | paying a percentage, and they're the _only_ ones that are. In
         | my (very limited) experience, it's generally been a flat rate
         | regardless of salary, so I'd go the other way and believe that
         | the majority do, indeed, do that.
         | 
         | Your second point is definitely a major concern, though - the
         | author talks about it (calling out Amazon and Twilio as
         | particularly bad), but doesn't provide any sort of hard data on
         | what the workload is like, possibly because it varies heavily
         | even between teams or groups within the same company.
        
           | morelisp wrote:
           | I know the rates for three other German companies are
           | percentages. I think time and a half for "activation" is
           | relatively common. I'm less sure about inactive time.
        
       | cnj wrote:
       | In my experience, a week-long rotation is much more grueling than
       | a daily rotation. Having to stay home for a single evening/night
       | has much less impact for me than having to do it for a whole
       | week.
       | 
       | Additionally, the impact on personal life of being Oncall on the
       | weekend is bigger. At commercetools, we recognize this by paying
       | more for an Oncall day on the weekend (200 EUR on Fri/Sat/Sun)
       | vs. a day during the week (150 EUR).
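To make the day rates above concrete, here is a small sketch that totals them over a rotation. The two rates (200 EUR for Fri/Sat/Sun, 150 EUR otherwise) are from the comment. The function name and the idea of summing per calendar day are assumptions for illustration, not a description of how commercetools actually runs payroll.

```python
from datetime import date, timedelta

WEEKDAY_RATE_EUR = 150  # Mon-Thu, per the comment above
WEEKEND_RATE_EUR = 200  # Fri/Sat/Sun, per the comment above


def rotation_pay(start, days):
    """Sum the per-day on-call rate over `days` consecutive days from `start`."""
    total = 0
    for offset in range(days):
        day = start + timedelta(days=offset)
        # date.weekday(): Monday is 0, Friday is 4, Sunday is 6.
        if day.weekday() >= 4:
            total += WEEKEND_RATE_EUR
        else:
            total += WEEKDAY_RATE_EUR
    return total


if __name__ == "__main__":
    # A full week starting Monday 2022-08-08: 4 weekdays plus Fri/Sat/Sun.
    print(rotation_pay(date(2022, 8, 8), 7))
```

Any seven consecutive days contain four weekday-rate days and three weekend-rate days, so a week-long rotation under these rates totals the same amount regardless of the start day.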
        
       | dboreham wrote:
       | Quick note that as an employer you may be subject to local
       | employment laws in this space. Particularly true in the US.
        
       | Lukas_Skywalker wrote:
       | In chapter 3, the table labelled as "Companies paying 600-1,000
       | USD/EUR/GBP per week." includes German KfW Bank which apparently
       | pays EUR875 per _day_. Is this a typo (they are in reality paying
       | this amount per week) or are the engineers on call only one day
       | per week (making the amounts per day and per week the same)?
        
         | ju-st wrote:
         | It is probably 875EUR/week normal salary for an IT operations
         | job at a German bank.
        
       | yewenjie wrote:
       | Why is part 1 of this article paywalled and not this?
        
       ___________________________________________________________________
       (page generated 2022-08-07 23:00 UTC)