[HN Gopher] Oncall Compensation for Software Engineers ___________________________________________________________________ Oncall Compensation for Software Engineers Author : kiyanwang Score : 145 points Date : 2022-08-07 18:52 UTC (4 hours ago) (HTM) web link (blog.pragmaticengineer.com) (TXT) w3m dump (blog.pragmaticengineer.com) | rb2k_ wrote: | One thing that this article is also kinda missing is the base | compensation. Some companies (e.g. FB) are in the top 90th | percentile or so of the industry but then don't pay on-call | compensation. | | But that means that in one case you make 200k base, have an on- | call and don't get any extra on-call money. In the other case you | make 140k base, have an on-call and get 10k extra on-call money. | | Ultimately you end up doing the same work but one of them gets | paid less. | almog wrote: | Do Amazon and Meta get away with zero oncall compensation policy | even in Western European countries? | rb2k_ wrote: | > Some companies hire dedicated tech people whose only job is to | be oncall, handle alerts, and improve the oncall infrastructure. | This role is called 'DevOps Engineer' at some companies, SRE | (Site Reliability Engineer) at others, and may also be called | 'Operations Engineer.' | | Any company that I've seen put SRE/"DevOps"/... as the sole | primary on-call rotation basically just created a glorified | operations team. | | Unless you have shared pain for botched releases, you will never | get rid of these problems. | [deleted] | skeeter2020 wrote: | I don't think most people here understand how labour laws work. | "On call" is a well defined concept and IME universally allowed. | Any Western jurisdiction DOES require you to pay someone when | they actually get called. If it's overtime or not follows the | regular rules for how that's calculated, same with time off and | maximum work periods. 
| | What we're discussing here is how companies encourage and reward | (or don't) for the inconvenience and impact of staying near your | computer, not going out of town, or being woken up in the middle | of the night. None are going to pay you time and a half on | regular work commitments because you might get called. | | Jobs like a fire fighter are completely different. They work a | scheduled shift and either respond to calls OR do other work | during that time. They're not really on-call as much as | prioritizing work. They also don't get 1.5x for their regular | scheduled work. | [deleted] | wpietri wrote: | I feel weirdly split on this. With my founder/leader hat on and | thinking about my own on-call time, I think of myself as always | on call. I also think it's my job to make it so that on-call | incidents are very rare. | | But when I think about arbitrary companies and people having | regular jobs, I think that of course people should be | compensated. It's labor and we pay people for that. And | especially when it's more than just a few people, having on-call | time and incidents be uncompensated means a broken feedback loop. | The company should have strong incentives to make sure that on- | call people don't suffer for the sloppiness of others. | | Part of that is small vs big, or startup vs established. But | there's some part of me that seems too reluctant to insist on | proper compensation for my own on-call time. Clearly something I | need to chew on before I next take a job at someplace larger than | a few people. | WFHRenaissance wrote: | My thinking is that if you're giving someone equity in a | company, they have a market incentive to ensure that the | product and company succeed. This extends to on-call. | Handcuffs. | almost_usual wrote: | As someone who has played this game the equity is probably | worthless unless it's a public company. I'll grind for | hundreds of thousands annually in RSUs, nothing less. 
| falcolas wrote: | FWIW, the magic HR word is "accommodation". Neither your managers | nor HR themselves will tell you this magical word. And you'll want | to have a psychiatrist to back you up. | | Being on call is super stressful, and if it's causing burnout, | you don't need to keep doing it. Does this increase the burden on | your teammates? Yes. But so would you burning out. | babyshake wrote: | As in, "I have a health issue that requires the reasonable | accommodation of only working 9-5 hours, here is a doctor's | note."? Is it really that simple? | falcolas wrote: | Yes. Reasonable accommodations for physical issues are | mandated by the ADA, and recently they have started applying | it to mental issues as well. | | It's worth going through; the worst that can happen is they | say no (or, admittedly, fabricate a reason to fire you), but | you'll know where you stand. | matheusmoreira wrote: | > Being on call is super stressful | | Absolutely. Being on call means we have to be ready to respond. | Can't ever fully relax, can't make plans that compromise that | readiness. People need to be compensated for that. Where I live | doctors get paid when they're on call. | azornathogron wrote: | You need to be ready, yes. Being unable to relax is IMHO more | of a function of how well/poorly managed your systems are and | your own level of experience and psychological profile. | anon22334556 wrote: | anon22334556 wrote: | At USAA they pay $70 a week for on call. If you're on call till | 2am? Still gotta be in on time. At least when I was there | wgjordan wrote: | Relevant US case law is Berry v. County of Sonoma, 30 F.3d 1174 | (9th Cir. 1994) [1], holding that county coroners' on-call time | (requiring carrying pagers and responding by telephone within 15 | minutes) was not compensable under the Fair Labor Standards Act. 
| | Two key factors are "(1) the degree to which the employee is free | to engage in personal activities; and (2) the agreements between | the parties." Beyond these general factors, no universal rule | applies since the details matter (frequency of calls, response- | time requirements and geographical limitations, etc) in the | degree to which they limit personal activities, as do any | agreements laid out in a contract or company policy (e.g., how | specific 'on-call' requirements and compensation are defined and | agreed upon in advance). | | [1] https://casetext.com/case/berry-v-county-of-sonoma | kodah wrote: | > Some companies hire dedicated tech people whose only job is to | be oncall, handle alerts, and improve the oncall infrastructure. | This role is called 'DevOps Engineer' at some companies, SRE | (Site Reliability Engineer) at others, and may also be called | 'Operations Engineer.' | | I really wish this wasn't stated so matter-of-factly. Neither of | these is actually _supposed_ to be true. A lot of times on-call | gets stuck on these folks because they 're often treated as | second class citizens in the softwarescape. There is really great | structure for doing these roles _right_ that doesn 't involve | them making them full time on-call. | 64StarFox64 wrote: | I just joined a company that does formal but unpaid oncall, | coming from a prior co that had implicit oncall. I'm very much in | the "if you built it, you run it" camp. 
This said, I think: | | - if oncall is a part of the gig, you compensate _somehow_ | (demonstrably above market salaries, explicit extra pay, time in | lieu, etc); oncall culture (or the lack thereof) should be | explicitly mentioned in any hiring process and employment | contracts | | - the team should be striving for 8 or more engineers in the | steady state; temporary vacancies should be temporary | | - primary should be handling 80+% of pages in the steady state; | if this is not the case on average across the team, you are not | building enough resiliency into your oncall culture, or relevant | tech debt should be high priority | | - relatedly, kpis/incentives should be structured such that as | oncall gets worse, progressively more immediate investments are | made to address technical root causes (a la SRE error budget) | | I'm tinkering with that last one in my head. It's easy to say, hard | to execute | WFHRenaissance wrote: | I actually love being on-call, especially as a part of a "you | build it, you run it" kind of team. | | You essentially get a week to find bugs, fortify the application, | and make it so that the next person has an easier on-call. | | If everyone goes into it with this mindset, eventually on-call | becomes a quasi-freebie week where you can either work on "fun | stuff", or it becomes invisible. | | Not to mention that no product can survive without love and | support from its devs. | anonymoushn wrote: | It's atypical for on-call to come with permission to be any | less productive at your normal duties. | WFHRenaissance wrote: | Not in my experience, but maybe I'm just lucky to have worked | at companies with good on-call cultures. | mehphp wrote: | I don't understand, is on-call not in addition to your normal | duties where you work? | | I don't understand how it could turn into extra time to work on | fun stuff. 
| dilyevsky wrote: | On well run teams I've been a part of it was always | understood that you're not getting much done on your main | projects while being primary oncall | mehphp wrote: | Interesting, I've never been somewhere where on call | entailed more than maybe an incident or two to handle | during the week. I can only recall a single instance that | interfered with my current sprint. | | From the replies, it sounds like a lot of places have | constant fires to be put out by those on call? | | That doesn't sound "well run" to me... | dilyevsky wrote: | Imo oncall engineer should seize the opportunity and fix | that warning alert that's been firing for ages _before_ | it becomes an active fire. So consequently there's | usually no lack of things to do even if no fires occurred | WFHRenaissance wrote: | I can't imagine telling an engineer to temporarily take on more | work in a given week. Every company I've worked at has done | it like this: | | Not on-call engineers work on 20 story points a sprint (2 | week sprints). On-call engineers (if you're on-call for a | week) get 10 points of work + on-call. | __bjoernd wrote: | That depends. If your oncall is spent fixing bugs in your | product - great. If you're just chasing whatever "upgrade | campaign" some other corp team came up with, but forgot to | properly document - not so great. | almost_usual wrote: | I think my wife would divorce me if I told her I loved being | on-call. | WFHRenaissance wrote: | I'm young and unmarried, and I work at a company that I like | and on a product that I use, so maybe I'm a bit too | passionate about my work to be unbiased here lol. | [deleted] | yooloo wrote: | Unless of course you didn't build the system, just inherited it | WFHRenaissance wrote: | Yeah, this is sort of hell on earth in many ways. Hence why I | specified "you build it, you run it". On-call for a legacy | system with no owner and few active devs is hell, and I don't | recommend it. 
| lmarcos wrote: | > Not to mention that no product can survive without love and | support from its devs. | | That's independent from being oncall. | | I prefer to spend my non 9-5 time with my wife and daughter. | Sadly, many 'innovative' companies out there don't like this | mindset of mine and reject people just because they don't want | to do oncall rotations. | powerhour wrote: | I'd prefer that as well, however I end up being the one woken | up because your (the generic you) code has errors. If you're | not willing to be on call I hope you're at least "willing" to | be terminated if your code wakes someone up every day of | their rotation (yeah, been there). | WFHRenaissance wrote: | This is a management/hiring issue. You (generic you) should | stop hiring engineers who don't give a frick about their | teammates. | lmarcos wrote: | If you are not willing to work outside 9-5, if you are not | willing to sacrifice your scarce free time, then you must | produce perfect bugfree code. Is that it? | | I have to give it to the companies and to the whole | devops/agile movement. They have truly convinced us that | being oncall is the right thing somehow. And that non | oncall engineers are a somewhat inferior race. | dvtrn wrote: | Maybe it's not about wanting to write perfect code but a | higher business unit that starts wildfires because an | important and influential stakeholder from that one group | of high paying customers complained loudly about | something not working at 2am and next thing you know | there's a "planning for on call" meeting on your calendar | | Ok simplification of affairs here but...I mean... | powerhour wrote: | When the alternative is expecting someone else to give up | _their_ scarce free time, yeah, you'd better be | producing perfect code or work somewhere that doesn't | care about overnight outages. | alex3305 wrote: | Besides that, I simply need my rest and sleep to relax and be | able to perform again. I love working in a team, as a team. 
| But work is still work for me. I don't really care about some | paid company holiday weekend or something. I'd rather do | something nice with family or friends. | Mikushi wrote: | > Not to mention that no product can survive without love and | support from its devs. | | If the business wants a stable and well working application, | prioritise it as part of regular dev work. As a dev this is | certainly not my problem. | WFHRenaissance wrote: | There's definitely an incentives issue here. Product just | wants more features. I feel like product managers (bad | product managers at least) need a countervailing tendency in | the form of a resiliency manager or something. | | Actually, a great way of managing this sort of stuff is | implementing error budgets and SLOs. If you app isn't | performant, the next sprint is dedicated to fixing issues, et | cetera. | deeptote wrote: | This is why I started my own contracting/consulting company, I | _hate_ on call and it always gets abused. | | I mean, really, what's going to happen if you can't see the score | of a baseball game until tomorrow? | askafriend wrote: | > This is why I started my own contracting/consulting company, | I hate on call and it always gets abused. | | Hate to break it to you, but starting your own | contracting/consulting company means you are forever on-call. | It just so happens that it's not called that explicitly, and | the people you have to answer to are your customers (aka your | bosses). | lrvick wrote: | That is not true at all. I have active retainer contracts | with several companies providing security engineering and | support. All of them understand I am only reachable when I am | physically in my home office. | | I get back to clients typically within one business day or I | will show up to any meetings scheduled a week in advance. | This has never been an issue. | | I do not carry a cell phone and I make sure every client | knows this. If I am outside my office I am living my life. 
| happyopossum wrote: | > I mean, really, what's going to happen if you can't see the | score of a baseball game until tomorrow? | | Well, nothing. Unless you're MLB.com and have tens of millions | of people paying you >$100/yr to have that information readily | available. If that's the case, you're issuing credits (which is | a huge time and money sink) _and_ losing customers. | geraldwhen wrote: | Then maybe they should pay people to be ready to fix bugs | 24/7. | | They won't. No one will. They want their salaried employees | to also be firefighters and the premise is absurd. | | I don't take calls at night; you cannot reach me. What are | they gonna do, fire me? | | Good joke. I run interviews and staffing someone who knows | their left hand from their right is nearly impossible. | Leverage works wonders. | tyingq wrote: | Bookies might care :) | lrvick wrote: | Having been on call as a sysadmin for several companies over | 15+ years, starting my own company was mainly so no one can | ever demand I do this again. | teeray wrote: | Keep the extra cash for on-call. If I wanted to trade my nights | and weekends for more money (and weren't contractually forbidden | from it), I would moonlight. I really want a delayed start for | any night incidents to catch up on sleep, and extra vacation. | dvtrn wrote: | _Some companies hire dedicated tech people whose only job is to | be oncall, handle alerts, and improve the oncall infrastructure. | This role is called 'DevOps Engineer' at some companies, SRE | (Site Reliability Engineer) at others, and may also be called | 'Operations Engineer.'_ | | Someone finally said the quiet part out loud about 'Devops | Engineer' as a job title. Only a matter of time before we wise up | about SRE as well, I suppose. | kodah wrote: | I disagree. DevOps Engineer, as much as I hate that title, is | really a sysadmin who can do orchestration code (like ansible | or terraform). 
They're not supposed to be responsible for any | application code. In a lot of ways they're more like systems | integrators these days, but most of them carry some pretty fine | OS and distributed system chops. | | SRE-SE and SRE-SWE's _are_ responsible for application code and | often embed on application teams to bolster either code or | system performance or both. | | Please do not take companies bastardizing these practices as | truth to what they are. There are companies who do this right | and we should champion them above the garbage. | babyshake wrote: | No you have it all wrong. Regular mid-level software engineers | need to have expertise in dozens of different deep subject | matter areas, but they get a "flexible" vacation policy and a | $50 monthly gym stipend so they're actually getting a pretty | sweet deal. | dijit wrote: | Sysadmins. Those people are sysadmins. | | I don't know why we need to have a job title treadmill for | this; I hate not knowing what _your_ definition of "devops" or | "SRE" is when interviewing. (Both as a person who interviews | others and is interviewed by others). | | Before anyone says it: Sysadmins could code (not to the same | level as feature folk), shitty operators pretending to be | sysadmins couldn't. | BurritoAlPastor wrote: | We didn't make software engineer money when we didn't have | "engineer" in our titles. I would be perfectly happy to be a | "senior systems administrator" or similar if it didn't impact | my earnings potential. | dijit wrote: | That's not my experience. The feature folks used to come to | sysadmin because the money was better. | Aperocky wrote: | So.. what do you call feature folks that also do sysadmin | work? | Jach wrote: | Amazon-style full-ownership software engineer teams? | Aperocky wrote: | Does everyone else not have this? I would be surprised | Amazon is the only company that have full ownership | teams. 
| babyshake wrote: | You only are allowed to be "Amazon-style" if your stock | grants are demonstrably returning Amazon-style results | for employees. | LegitShady wrote: | Underpaid? | dijit wrote: | Not sure. What would you call a doctor that also fulfils | the duties of a nurse? | madrox wrote: | I don't think there's a way to resolve a semantic argument | like this. Most roles are pretty amorphous, and thinking any | title can totally encapsulate job requirements is prone to | error. Even as an EM, I have to find out what a job's | expectations are during an interview. SWEs are probably the | only engineering role that doesn't have this problem | (mostly). It's been very different everywhere I've worked. | It's different from other fields that have far more rigorous | structure. | | DevOps started as an idea that the development team should be | responsible for operations. Before this, most dev teams | created artifacts that got handed to an ops team to deploy | and be on call for. That idea went to corporations that | wanted to modernize, but you can't just disappear an entire | workforce of admins used to doing things differently. It's a | similar situation to where graphic designers started being UX | designers. These people didn't magically develop a different | set of skills...just a different set of expectations. | GauntletWizard wrote: | The problem is that sysadmin has the baggage of twenty years | of that dude who deals with exchange and active directory. | The rules for interaction with servers under the sysadmin | label were _terrible_ and quite frankly, so were and are a | lot of the people. | | There is a legal requirement (regulatory, but carrying force | of law) for some industries to implement ITSM practices (and | similar, don't quote me on specifics) . There is a | requirement in those practices that Developers not have | access to production, and that Operations have access to the | code. That's incredibly wrong. 
It's misguided in the worst | possible way - The point is to make sure the two audit each | other, but it requires black box auditing, when you actually | want whitebox auditing. (Note that allowbox and denybox are | not acceptable substitutes here). | | SRE is called SRE because of a difference in those practices. | DevOps is an inexpert redevelopment of those practices. | Sysadmin practices evolved into both, but what's modernly | called Sysadmin is descended from the AD and Exchange people, | and have bad practices. You can't walk back the evolution of | words, you can fix them through evolution as well, but it's | as slow or slower than getting there, because the ecological | niche is already "filled" | dijit wrote: | SRE actually aligns pretty neatly with systems | administration (and thus, in principle ITSM) | | DevOps itself as a concept was born in nebulous | circumstances ("dev-ops days" being where the verbiage | comes from but the founder of that conference called the | job "agile systems administration"; and the concepts | espoused by the devops movement being almost exclusively | borne out of the "10+ deploys a day" talk from Flickr). | | Anyway, SRE is not materially different than Sysadmins | _except_ in three dimensions: | | 1. Hire only programmers, none of those operators who click | buttons. | | 2. Treat reliability as if it is its own feature. | | 3. Solidify the contract between feature folks and people | focusing on reliability. | | I'd like Ben Treynor-Sloss to weigh in here as he likely | knows best, but that's the most condensed version of what I | understood. | | You're right about the exchange people, but they too | suffered title inflation, the exchange folks used to be | called IT technicians. | | The people automating AD deployments across sites and | managing reliability were sysadmins, and they programmed in | the ugliest of languages to achieve that, | autounattended.xml and bat files for days. 
| | The tools are better now, but the work that devops/SRE's do | in most companies today is what sysadmins used to do in | 2008- | no_wizard wrote: | If I understand you correctly you mean that they are really | _Operations Engineers_ right? | dvtrn wrote: | I don't know anymore, and honestly I don't care anymore. If | the job wants to call me an SRE fine, if they want to call me | Devops, sure. | | I'm more focused nowadays on "what problems are you hiring me | to solve?" since it feels more and more like the Venn diagram | of the three job titles has nearly completely coalesced into | a perfect circle. | | Difference for me is I'm scrutinizing far more intentionally | in job interviews about why an org is hiring for SRE/Devops | before accepting any offers. Too often orgs are hiring for | this talent and turning them into kitchen sinks for anything | and everything the SWEs aren't doing. | | Compliance? Send to Devops. | | Upcoming audit and need a pen test done in 3 days? Send to | Devops. | | Did a bad job prioritizing bug fixes and now shit's crashing? | Devops. | | Etc. once you go through that a few times you start to figure | out the right questions to ask in an interview and figure out | if you're about to join a company with Devops practitioners | or pretenders. | dilyevsky wrote: | > "what problems are you hiring me to solve?" | | Interviewed with dozens of companies over my career - never | been able to get a straight or truthful answer to this | scottyah wrote: | I've experienced similar. What are some of the questions | you ask? | dvtrn wrote: | Take what works for you, ignore what doesn't, good luck. | | - Why are you hiring Devops/SRE? | | - What is a Devops/SRE going to bring that isn't/can't | be done by engineers presently? | | - Why isn't it being done presently? What have you tried | so far? | | - How many other SREs/Devops do you have? When will I get | to interview with them (if applicable) | | - Who is responsible for platform? Infrastructure? 
| Deployments? How are they involved? _When_ are they | involved? | | etc. As mentioned in my last comment, a lot of it comes | through the baptism of working at a lot of really crummy | shops to know the kind of bullshit you don't want to put | up with. You gotta deal with some of it no matter where | you go, but you sure ain't gotta deal with it all. | | This is a lot of boilerplate stuff, sometimes you're | lucky and these questions get answered before you can ask | them, sometimes they're in the job description. So let me | talk about _that_ for a minute. | | You really want to take your interviewing to the next | step? Learn how to inquisitively, but tactfully challenge | what you're reading in job descriptions. The answers I've | gotten have been far more revealing than "what will I be | doing day to day?" if you ask for more details about a | bullet point or two and why those bullet points matter, | or who they matter to. That includes, yep, on-call. | | Most of my other questions are very probing questions | about things in the job description; not necessarily | because I'm looking for a specific answer, I want to see | how the hiring managers and others describe those topics. | Can they actually talk about why they're looking for | someone to do x, y and z? Can they have a meaningful | dialogue about what those responsibilities mean for the | team or are they just parroting back what the job | description says, like someone in a zoom call just | reading words off a powerpoint slide? | | Here's an example: | | Job says they want a Devops to come in and also be | responsible for security, risk and compliance in the | infrastructure? 
Okay, here's my counter-inquiry about | that: if Devops has the responsibility for security, risk | and compliance, talk to me about the authority Devops has | to recommend or deny certain actions in the platform if | it is assessed to be too risky or costly to maintain a | compliant and secure posture were we to do it anyway (if | you've ever been in that unenviable position, you | probably know _exactly_ what I 'm getting at with this | question). | | Interviews are two way streets, and in my thirties with a | family where "family time" has no fungible cost, I'm | driving very defensively on my side of the street. | wpietri wrote: | That's certainly what I've seen! I think the DevOps paradigm | was a possible revolution in how we worked. But pretty | quickly a lot of places just slapped the new label on the old | sour wine. | notesinthefield wrote: | And suddenly I understand why the worst tech job ive ever had | as an Ops engineer was so bad. We really only existed to | improve alerting, pipeline and wake up other engineers at 3am. | dilyevsky wrote: | Majority of companies i talk to are really poorly run wrt to | software operations. Case in point - misusing devops term to | mean sysadmins/operators | lmarcos wrote: | A sincere thanks. As a software engineer I couldn't care less | about what happens to my company's services/products outside | the 9-5 time range. Don't get me wrong, I give myself 100% at | my job, keep myself educated regularly and I'm rather on the | "boring and stable stuff" side of things (instead of the | "shiny/trendy and unstable" side). I have commitments outside | work and no amount of money is going to make me give more | than the (already exhausting) 40h/week my contract states. | The "you build it, you run it" may work for people on their | 20s (they usually are excited to earn "easy money" by being | oncall). For people on their 30s and above the extra oncall | money is not worth at all. 
| wpietri wrote: | I certainly believe that's true for you. But in the case | where engineers choose not to ever run what they build, how | do you reconnect the feedback loop? | | Put differently, I think one of the ways somebody goes from | the "shiny/trendy and unstable" side to the "boring and | stable stuff" side is by experiencing the operational pain | of their choices. If the pain falls on others, will they | still learn? | | Of course, the way you talk about your job makes me wonder | if you are already experiencing so many systemic/managerial | issues that the feedback loops are already pretty | broken, so this one may not make a ton of practical | difference. | lmarcos wrote: | > If the pain falls on others, will they still learn? | | I think that depends on the seniority of the | individual/team. In my experience, of course one can | still learn. | | To give you a real example: years ago one of our systems | went down on a Sunday morning and our team had no oncall | people. The infrastructure team was the one who fixed the | issue (don't remember the exact underlying issue, but it | did make clear one aspect of our service we didn't handle | properly: signal handling). Next morning the team wrote | down a Jira issue to improve the way we handle signals. | Ticket got prioritized very high and was fixed the very | same day. | | Now, what would have happened if the issue that Sunday | morning was due to a bug in the software our team wrote? | The same thing. The difference is that infra team would | have no clue on how to fix the thing and would have to | revert the service to a previous stable version. Would | the business be fine with it? In our case, yeah. As a | matter of fact, they didn't want to spend the extra money | hiring ops people for each team to be on call. You see, | if the business really cared, they would immediately have | hired a software engineer willing to be on call... 
They | just didn't care that much (and they couldn't force the | current team to be oncall because our contracts didn't | specify so and the average age in our team was around 35, | and nobody wanted to be on call). | wpietri wrote: | I believe they _can_ still learn if they are senior | enough and compassionate enough. And if they have | management competent enough to let that work. But what | percentage of teams would you guess fit that? I suspect | that leaves a lot of on-call staff suffering from bad | software. | bfung wrote: | I hear that it can work w/good senior engineers at the | helm - I'd prefer the scheme you described as well. | | But how did the senior engineer learn to handle those | situations in the first place? | michaelt wrote: | _> But in the case where engineers choose not to ever run | what they build, how do you reconnect the feedback loop?_ | | Personally, if I get paged at 3am due to a bug, I'm going | to fix it regardless of what the 'backlog' and | 'prioritisation' and 'sprint goals' and 'feature roadmap' | and 'product owner' say I should be doing. | | But some would say I should not be bypassing the process | in that way, and that the feedback loop of external | stakeholders making requests to the product owner is more | than sufficient. | trombone5000 wrote: | Engineers can run what they build during normal working | hours. | | Oncall is a scourge not because of the experience of | technical problems, but because people already working | full time have to arrange their lives outside of work | around a second "oncall job". A job which occurs after | hours, one out of every X weeks. | | A dedicated, pure "Ops" night shift (perhaps in another | time zone) would be more humane. | dilyevsky wrote: | Then build it in a way you almost never have to plug in | outside of business hours. 
| trombone5000 wrote: | Even if it were built perfectly, if engineers are still | on-call, they would have to arrange their after-hours | time around the _possibility_ of an incident. | dilyevsky wrote: | That's true but it's just a reality of being employed by | a saas company these days. Customer support, sales, etc | have those too (and usually less formalized and unpaid) | so why are engineers immune to this? You can still | probably find some shops that ship an offline | distribution but that's becoming more rare. | kqr wrote: | > Engineers can run what they build during normal working | hours. | | In my experience, this leads to design that pushes | problems to outside of working hours. | | "We don't need to fix that edge case, just have the off- | hours ops team do a manual workaround every now and | then." | | Or "What does it matter that the deployment is error- | prone? We can just schedule it with the off-hours ops | team." | arwhatever wrote: | "How about the whole team makes engineering decisions as | though you're unable to contact us after hours, or as | though doing so were particularly costly." | dvtrn wrote: | What, and break down all the monitoring and alerting | silos we built by hiring a Devops engineer to come in and | break down the development and infrastructure silos that | were built when the company went ham adopting "Capital A" | Agile? | danielheath wrote: | "You build it, you run it" works just fine if you're | building something that doesn't fail all the damn time. | | Work has had three out of hours pages in the last two | years, all self resolved within a few minutes. | cbarrick wrote: | Google's oncall compensation structure is phenomenal. | | For tier 1 oncall (5m response time), for each hour oncall | outside of working hours, you are compensated for 40 minutes, | which you can either take as time off or at your current pay rate | (i.e. you are compensated at 2/3 your usual pay). 
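A minimal sketch of the tier 1 arithmetic just described, assuming only the 40-minutes-per-hour figure from this comment. The function name `credited_hours` and the 10-hour shift are hypothetical illustrations, not Google's actual payroll logic.

```python
from fractions import Fraction

# Minutes credited per off-hours oncall hour for tier 1,
# taken from the comment above (40 min per hour = 2/3 pay).
TIER1_MINUTES_PER_HOUR = 40


def credited_hours(offhours_oncall, minutes_per_hour=TIER1_MINUTES_PER_HOUR):
    """Return the hours of pay (or time off) credited for off-hours oncall time.

    Hypothetical helper: uses exact fractions so 2/3 does not pick up
    floating-point rounding along the way.
    """
    return Fraction(minutes_per_hour, 60) * offhours_oncall


# Hypothetical example: a 10-hour off-hours shift at the tier 1 rate.
print(float(credited_hours(10)))
```

Passing a different `minutes_per_hour` covers any other tier that credits a fixed number of minutes per oncall hour.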
| | For tier 2 oncall (30m response time), the compensation is 20 | minutes per hour outside of working hours. | | For a tier 1 rotation, the team has a staffing requirement of 12 | people, split between two sites. There's a max of 80h oncall, | outside of working hours, per person per quarter. Because oncall | is split between sites, you are never oncall overnight. | nighthawk454 wrote: | Better than Amazon, where you get nothing extra. And often do | regular duties during on call as well. Kind of nuts. | | The saving grace is a lot of teams aren't really doing anything | that critical, so the on call is more a formality bc that's | what real teams do. Still pointlessly stressful but less | serious. | [deleted] | Cyph0n wrote: | > Because oncall is split between sites, you are never oncall | overnight. | | Doesn't this only apply to SRE rotations? The dev teams I know | of are definitely oncall overnight. | fishywang wrote: | that only applies to tier 1 oncall. if they are oncall | overnight they are most definitely tier 2. | Cyph0n wrote: | Ah, that makes sense! | soneca wrote: | I always assumed that pay for hours outside of regular working | hours would be higher than regular pay. | hbhakhra wrote: | The pay for outside working hours applies whether or not you | are getting paged. That's 128 hours / 3 = 42.67 hours of | extra pay during an on call week. The on call week also gives | incentive to fix technical debt and build a more stable | production system so you don't get paged. | soneca wrote: | Yeah, makes sense. Forgot the detail that most of on call | hours are not strictly working. So Google scheme seems fair | geraldwhen wrote: | Waiting to work is working. Would a hospital surgeon only | charge for time holding a knife? Don't be absurd. | joshuamorton wrote: | If I'm at home cooking dinner, I'm not "waiting to work" | though. | | Yes, you cannot go on a hike, which is why you get paid. 
| You don't get paid more than you do for your normal time | working though. | shadowofneptune wrote: | Hospitals do not pay more than normal work hours for on- | call, though they do usually pay some amount. | | https://physiciansthrive.com/physician-compensation/on- | call-... | yegle wrote: | For non-business hour oncall, you usually only need to | mitigate with minimum effort. E.g. for a typical overload | situation, upsizing the pool or getting an emergency ceiling | loan is enough, and you can offload further preventative | measures or root cause investigation to the next oncaller | when they are in business hours, or wait until next Monday. | sidlls wrote: | "Phenomenal"? Hardly. The base expectation outside of tech is | time-and-a-half for each hour over 8 in a day, or 40 in a week. | leetcrew wrote: | it's an apples to oranges comparison. not many jobs that pay | time and a half for overtime have mid-level ICs making $250k+ | before overtime. | sidlls wrote: | The base salary is literally irrelevant to this discussion, | which is about compensation for hours worked outside of | normal business hours at whatever the rate is. | dasil003 wrote: | For actual work, not for being on call with the expectation | that most of the time nothing will go wrong | sidlls wrote: | On-call requires you to more or less not plan anything | _other_ than being available for work. Sure most of the | time nothing goes wrong--but that isn't the constraint, | here. The whole point is that something _might_ go wrong | and that the person on call _must_ respond within a given | window of time (5-15 minutes, generally). That effectively | makes even mundane things like going to the grocery store a | potential trade-off in favor of work. I definitely consider | every hour of the day I'm on call (all 24 of them) as a | working hour, and so should every other engineer.
Since | tech companies get away with not paying for this service, I | take off from normal working hours at a rate of 1.5 times | the time I spend resolving an on-call alert. I'd rather be | compensated with cash for it. | vageli wrote: | Do firemen also not work given that a considerable amount | of their time is spent waiting for a call? | dasil003 wrote: | Do you sleep at the office for your oncall shift? | skeeter2020 wrote: | they don't work on-call over night. | khuey wrote: | It's common for firefighters to work 24 hour shifts. | stickfigure wrote: | Professional firefighters spend a lot of their "waiting" | time training, writing reports, fixing the apparatus, | sharpening shovels, cleaning chainsaws, etc. It isn't the | same. | ok_dad wrote: | For a 5m on call time, I would literally have to be | sitting at my computer with Slack open reading Hacker | News. Yes, it's basically the same thing. | noodleman wrote: | I'm currently on call. | | A 5 minute response time means to respond to the call out | and start working on it. If you're on call, you should | have a suitable WFH setup and it should be on standby, so | 5 minutes is ample time. It doesn't mean you have to | have it resolved within 5 minutes of being called out; | that would be absurd. | ipsi wrote: | The EU Working Time Directive differentiates | between the concept of "On Call Duty" and "Standby Duty," | where the former is what this post is about, and the | latter is generally reserved for when an employee is | required to remain on the premises of their employer | (e.g., being on-site overnight to immediately respond to | emergencies). The primary difference is that On Call does | not count as working time unless you get paged, whereas | Standby Duty _does_ count as working time, even if | nothing happens.
Within the EU, that means that Standby | Duty counts against working hours allowed by the EU | Working Time Directive and does not count as rest - e.g., | the German Arbeitszeitgesetz limits workers to 10 hours | per day (hard limit), and requires 11 hours between | working periods (some exceptions that I don't believe | are relevant here). | | However, according to recent ECJ decisions[1][2][3], | "Standby Duty" is not reserved _exclusively_ for when the | employee is required to remain on-premises, and it also | depends on the degree to which the freedom of the | employee is curtailed, specifically stating in one | ruling[2]: | | > ... | | > 32 In the third place, and as regards more specifically | periods of stand-by time, it is apparent from the case- | law of the Court that a period during which no actual | activity is carried out by the worker for the benefit of | his or her employer does not necessarily constitute a | 'rest period' for the application of Directive 2003/88. | | > ... | | > 36 Second, the Court has held that a period of stand-by | time according to a stand-by system must also be | classified, in its entirety, as 'working time' within the | meaning of Directive 2003/88, even if a worker is not | required to remain at his or her workplace, where, having | regard to the impact, which is objective and very | significant, that the constraints imposed on the worker | have on the latter's opportunities to pursue his or her | personal and social interests, it differs from a period | during which a worker is required simply to be at his or | her employer's disposal inasmuch as it must be possible | for the employer to contact him or her (see, to that | effect, judgment of 21 February 2018, Matzak, C-518/15, | EU:C:2018:82, paragraphs 63 to 66).
| | And while I'm very definitely not a lawyer, I think it's | possible (likely, even) that having to be at a computer | and working within 5 minutes of a page, even at 3AM, | would constitute significant constraints on the worker | and turn it from "On Call" to "Standby Duty", although | the exact implications of that will vary from country to | country. | | All of that to say that I think that 5 minutes is | absolutely bonkers as an expected response time. If I | were subject to that, I wouldn't be able to leave my | apartment for the duration I was on call - it takes me a | lot more than 5 minutes to get to and from the | supermarket or even the coffee place just outside. Even | taking out the trash could take > 5 minutes (and with no | cell reception, due to being underground). | | [1] https://home.kpmg/xx/en/home/insights/2021/03/flash- | alert-20... | | [2] https://curia.europa.eu/juris/document/document.jsf;j | session... | | [3] https://eur-lex.europa.eu/legal- | content/EN/TXT/HTML/?uri=CEL... | | [4] (WARNING: auto-download PDF) https://ec.europa.eu/soc | ial/BlobServlet?docId=6474&langId=en | ok_dad wrote: | I understand that, my point is: you're still sitting at | home when you could be out doing other things. It then | should be paid as regular or OT hours, not 2/3 or 1/3 of | regular pay or anything like that. | ramraj07 wrote: | It's not 5 seconds! There are definitely a few activities | I do at my home that I can't drop on a few minutes' notice | (extended toilet break?) but I can think of a ton of | things I can do that would still let me be able to start | working on my pc with a few minutes' heads-up. | ok_dad wrote: | I have a kid, so sometimes I can't drop what I'm doing. | If I am required to be on-call at 5min response, that | means I'm hiring a nanny/babysitter.
That's what you all | don't get here, people have complex lives outside of work | and workplaces should not be shortchanging you or me in | order to scrimp and save on customer support. | | If it is important for the application to be up 24/7, the | company needs to pay for it at the usual rate! | remus wrote: | That's not the same as on call though, that's working extra | hours. | | With the pay structure described above I assume this is | applied outside your normal working hours, where you're not | doing anything other than being on call. | R0b0t1 wrote: | Oncall is working. I expect to bill oncall hours at | least time and a half. | trimbo wrote: | Have you ever successfully billed oncall hours as | overtime when you weren't called? | Jabbles wrote: | It's not. It's ridiculous to expect to charge _more_ than | normal work for oncall. And your expectations are | misplaced, as TFA shows. | R0b0t1 wrote: | Disagree, as do others. If my movement and activities | will be restricted then it is full | employment/utilization, not some quasi-employment or | utilization. I didn't pull this out of thin air. | | Someone has conned you into accepting less. I'm sorry. | Jabbles wrote: | Ah, I see you are talking about an alternative universe. | ok_dad wrote: | Thanks for trying, I think some people take pride in | living to work and they take offense at the idea they | might have been suckers for life. | | I agree with you fully, on call time should be | compensated at the usual rates, including overtime. | Jabbles wrote: | But why? Why do you think oncall should be paid the same | as full work? Perhaps you have a different definition of | oncall than me, where you expect to be paged once or | twice a week, and spend maybe an hour or so fixing it | each time? | | Why would I _not_ charge less for this than real work? It | involves much less actual work. | decebalus1 wrote: | I've been doing on-call for more than a decade and I feel | I need to offer my perspective here.
I worked in teams in | which I would never get paged and also teams in which I'd | get 100 alerts per week. | | > But why? Why do you think oncall should be paid the | same as full work? Perhaps you have a different | definition of oncall than me, where you expect to be | paged once or twice a week, and spend maybe an hour or so | fixing it each time? | | When I'm oncall, I need to cancel all my social | engagements for that week and delegate all my errands and | such to my partner. Also not drink or take any mind | altering substances. I must be 'ready' at any time of day | or night. I (as well as others) sleep in the same bed | with my partner. If my phone rings due to an alert, my | partner is also woken up. So I need to sleep in the | living room for a week. From the start, this affects my | personal life to the extent that it would be unfair NOT | to compensate me extra. It also affects my family way | more than a regular desk job should. | | You're mentioning the expectation to be paged once or | twice a week. If those pages come at odd hours and you | need to fix them on the spot, no exceptions, failure is | not an option, etc., it's still very disturbing to your | personal life. Additionally, that's a parameter which is | well outside of your control. I've seen oncall shifts | which turned from '1-2 pages a week' to '5-10 pages a | day' after the product finally got in the hands of | regular users or after the team grows in size and code | contributions grow suddenly. Or even better, when you're | doing such a great job that your boss promotes you in the | oncall tier and now you also get to do triage for alerts | coming for the whole organization. | | The volume of the alerts doesn't and shouldn't matter. If | you're oncall, you're oncall: you have a responsibility | to be available at all times, rain or snow, night or day. | This deserves compensation.
Some companies (some I've | been lucky to work at) implement some sort of follow-the- | sun oncall shift and you at least get to have your sleep | and generally minimal impact on your personal life. That | is great and does not deserve extra compensation, because | your work hours aren't altered at all. | | I'm sad that labor rights in the US don't consider this a | norm. But it's not surprising, considering we did have | dedicated engineers at one time who were paid to watch | and maintain the health of the livesite 24/7. But then we | figured we'd make regular engineers fuck their sleep | cycles by adding oncall to the list of responsibilities, | because it would be cheaper this way. And everybody | agreed, because 'full-service ownership'. | ok_dad wrote: | I'm arguing that the "5 minute response" on-call should | be at regular or OT rates. If your on-call rotation is | like a 1 or 2 hour response time, then I could see it | being less, but the problem is that I've been at a | company where the on-call was previously "whenever you | get around to it" and later they changed it to "within 30 | minutes" and I was not compensated any further even | though it killed my life anytime I was on-call. | | Why _I_ believe it should be at the full-rate: because I | don't trust the company culture to stay the same over my | tenure there. My expectations for a "shit company" have | to be the same as my expectations for a "good company", | because a good one can turn to shit quickly. | Tao3300 wrote: | > Someone has conned you into accepting less. I'm sorry. | | The Kool-Aid was _really_ good though! XD | ramraj07 wrote: | Start a company, make this a policy and advertise. If | engineers truly care about this, they'll come to you. | Perhaps they just care about total compensation and their | RSUs more than this minutiae? | skeeter2020 wrote: | >> I didn't pull this out of thin air. | | Except you did.
There are pretty specific legal | definitions of "on call", what it means and when you get | paid for it in almost every jurisdiction. I've never | seen one that pays you time and a half for being "on | call". This is not the same if you get called and | actually work overtime; that's regular rules. How a | company entices (or doesn't) for taking a shift is up to | them. | joshuamorton wrote: | You are paid 2/3 for time spent at home playing with your | kids on the weekend. | | Unless you are working half the weekend, every weekend you | are oncall, the tier-1 OCC policy wins over time-and-a-half | for time worked. | thfuran wrote: | I'm having a bit of trouble reproducing your results. How | exactly are you coming up with 2/3 > 3/2? | joshuamorton wrote: | Because, and I can't stress this enough, if I am at home | cooking dinner or reading or playing video games, _I am | not working_, so 2/3 of my entire weekend is more than | 3/2 of time worked unless I am working 9-5 all day Sunday | responding to pages, which no one is. | | Time and a half for hours worked is only > that 2/3 for | time not worked if you're working 50% of the time, which | you aren't, at least not regularly. | Tao3300 wrote: | > home cooking dinner | | What if the call comes right then? Now dinner is fucked. | They'd better pay a lot to go messing with my outside | life. | joshuamorton wrote: | I would not cook a risotto while on call, but most | dinners are not "fucked" immediately if you have to walk | away with a few minutes' notice (esp if you have a | partner/roommate, but even if not). | | This is like the equivalent of saying dinner (or your | day) is ruined if someone knocks on your front door | unexpectedly. No it's not. | | And yes, you're getting paid 2/3 of your (large) salary | for the possibility of this inconvenience. | Tao3300 wrote: | Should be more if you ask me. I'm only going to get so | many risottos in my life, but software will always be | busted.
If that's what employee lives are worth to | Google, well, I guess that explains some things. | babyshake wrote: | I would say it makes sense to make the oncall pay based on the | number of pages you get or some other metric but that would | just create some unwanted incentives and problems. It's | probably good to think of paying engineers for oncall time they | are not spending putting out fires as a form of reward for | setting up their systems to be reliable. | cbarrick wrote: | That's a pretty perverse incentive. Why fix the thing if I | get paid more if it's broken? Why tune the pager to be quiet | if noise equals cash? | | The pager should be tuned to your SLOs, and you should be | incentivized to exceed those SLOs. | dbcurtis wrote: | My data is pretty old, old enough that the person on call | carried a pager (remember those?), but very similar comp | structure. I remember, because the on-call costs hit my budget | directly. | | 1. Time on-call was paid at 25% of normal hourly rate. (Maybe | holiday premium boosted the base rate? I can't remember.) 2. | Issue-resolution pay was the normal overtime rate, including | shift premium and holiday premium, from the time the pager went | off until the issue was cleared. 3. The person on call had to: | a) be able to get to the plant in 20 minutes, if necessary, but | remote support was perfectly fine and paid the same. Only | resolving the issue mattered, not where you did it. 4. The | person on call had to remain sober and work-ready the entire | time on call. | | It's 3 & 4 that justify the 25% pay for carrying a pager. | Friend having a party? I'll have cranberry juice, thanks. Fresh | snow in Tahoe? I'll have to miss it this weekend. | | Restricting someone's movements and social life without | compensation is simply abusive. As an industry, we need to | stop. | cletus wrote: | Facebook's oncall compensation is really simple: it's zero. | | Having also worked at Google, I found this situation ridiculous. 
| Facebook treats oncall as something you're just expected to do | _on top of everything else you're meant to do_. So if you have | 50 alerts fire, 40 tasks created, 5 UBNs (UBN = Unblock Now, which | should be responded to immediately and will probably be a SEV) | and 3 SEVs, well you just have to do all that and your job. | | Google oncalls (IME) tended to be fairly light. You'd often do | releases too but there tended to be a lot of automated processes | around this (i.e. building binaries, packaging MPMs, release to | staging, release to canary, regression detection, push to | production). | | Facebook's releases (other than Web) were (again, IME) a dumpster | fire. | | Web was a special case because of continuous push. Push a commit | and automated processes would build the (very large) www binary | and handle the push to C1/C2/C3 (these are sort of analogous to | internal testing, canary aka 1% and prod). Automated processes | would verify a commit by deciding what tests to run. This wasn't | explicit and would miss relevant tests for various reasons. This | could (and often did) break trunk. This could back up pushes for | hours. First thing in the morning it may take as little as 2 | hours to push to prod. Later in the day it might take 8+ hours. | | Facebook works around this by using conditional code, like... _a | lot_, meaning certain code would only run if you're in the right set | of GKs (gatekeepers) and QEs (quick experiments). Behaviour would | be flipped on by a separate GK/QE push, which is much quicker. | | But this means when something of yours breaks (which it often | does) you have no idea why. Is it a bad code push? A bad GK/QE | push? By you? Or some infra you depend on? | | I mention this because you had to deal with this sort of thing | oncall _a lot_. | | The problem with not giving oncall compensation is that the | burden is never shared equally.
The person or persons who do more | than their fair share are never going to do it for the money | because it is annoying but at least the money is some form of | recognition or, dare I say it, _compensation_. | | Disclaimer: Xoogler, Ex-Facebooker. | [deleted] | michaelt wrote: | This article is missing the single most important question about | being on call: _how often you get called_. | | It's one thing to be on call where you get called 2-3 times a | year, because you're working on a quality system where bugs get | fixed more often than they get introduced. Then the pay, if any, | is mostly compensation for hurting your social life. | | It's another to be on call where you get called 2-3 times a week, | because the organisation has decided calling you is cheaper than | fixing the underlying problems. In that case, the compensation | better be worth messing up your sleep cycle and upsetting your | partner. | leetcrew wrote: | disagree. I need to get paged a lot before it becomes more | impactful than planning an entire week around an engagement | SLA. | ironmagma wrote: | So that's your personality. What about the rest? | GauntletWizard wrote: | Google had fantastic software quality and still had SRE teams | expecting to be paged twice a week. They had that because they | had tremendous software quality; they paged well before there | was impact that users would care about, and proactively spent | time fixing their problems. Being paged, usually during | daylight hours, allowed good bugs to be filed. | iasay wrote: | Oh yes nailed it. | | That's one problem with a fixed on call rate that some | organisations offer. It's a hefty chunk of cash and sounds | generous to the engineers. But the cost is already known and | sunk up front and not proportional to the number of call outs | so the business sees it as a fixed operational expenditure | rather than an appraisal of how fucked things are.
| | The performance metric quickly becomes how many people you | still have on cover who haven't quit to work somewhere else | because they are burned out. | ripper1138 wrote: | Not always that simple. 2-3 times a week is nothing! Try being | on call in AWS, or any product/service at that scale. How often | you get paged has less to do with your organization and more to | do with the scale of your systems and business. | dilyevsky wrote: | I've been a part of Borg oncall at Google - software that | manages 90+% of the hardware there (and there is a lot of | hardware). There were week-long stretches without any pages. | Don't ship garbage software and it'll be alright at any scale. | Tao3300 wrote: | Yeah, but what the hell is possibly important enough to wake | up someone's family more than 2-3 times a week? | morelisp wrote: | This data is presented in a really frustrating way. | | First, I suspect (but I'm not certain) most companies do it as X% | of salary. So I have no idea if I'm looking at truly different | on-call policies or rather salary spreads. | | Second, there's no associated estimation of how much work "being | on-call" is. For us, a small team with SWEs doing voluntary on- | call, any out-of-hours page is _immediately_ top priority for | work the next day. The person on-call also gets the final say | over risky deployments after lunch / on Friday. I know that's | not universally true, and we've worked with companies that | consider a page a week or even more normal (still without a | separate SRE/OpsEng team). If any of us was getting paged once a | week, we'd refuse. | ipsi wrote: | Well, Google is explicitly listed (about halfway down) as | paying a percentage, and they're the _only_ ones that are. In | my (very limited) experience, it's generally been a flat rate | regardless of salary, so I'd go the other way and believe that | the majority do, indeed, do that.
| | Your second point is definitely a major concern, though - the | author talks about it (calling out Amazon and Twilio as | particularly bad), but doesn't provide any sort of hard data on | what the workload is like, possibly because it varies heavily | even between teams or groups within the same company. | morelisp wrote: | I know the rates for three other German companies are | percentages. I think time and a half for "activation" is | relatively common. I'm less sure about inactive time. | cnj wrote: | In my experience, a week-long rotation is much more grueling than | a daily rotation. Having to stay home for a single evening/night | has much less impact for me than having to do it for a whole | week. | | Additionally, the impact on personal life of being Oncall on the | weekend is bigger. At commercetools, we recognize this by paying | more for an Oncall day on the weekend (200 EUR on Fri/Sat/Sun) | vs. a day during the week (150 EUR). | dboreham wrote: | Quick note that as an employer you may be subject to local | employment laws in this space. Particularly true in the US. | Lukas_Skywalker wrote: | In chapter 3, the table labelled as "Companies paying 600-1,000 | USD/EUR/GBP per week" includes German KfW Bank which apparently | pays EUR875 per _day_. Is this a typo (they are in reality paying | this amount per week) or are the engineers on call only one day | per week (making the amounts per day and per week the same)? | ju-st wrote: | It is probably 875EUR/week normal salary for an IT operations | job at a German bank. | yewenjie wrote: | Why is part 1 of this article paywalled and not this? ___________________________________________________________________ (page generated 2022-08-07 23:00 UTC)