[HN Gopher] Tell HN: I salute everyone on call/working support t...
       ___________________________________________________________________
        
       Tell HN: I salute everyone on call/working support through the
       holidays
        
       Thank you for keeping systems available and safe. I've been there
       many times in the past, including having to fly at the last minute
       to a non-internet-connected data center in NJ to babysit an
       emergency production bug fix that took the entire holiday to
       create, install, verify, and monitor.
        
       Author : waynesoftware
       Score  : 370 points
       Date   : 2023-12-21 18:54 UTC (4 hours ago)
        
       | maerF0x0 wrote:
       | Yes, absolutely thanks to all who keep our world running when no
       | one is looking. To keep the yule log on YouTube, to keep our
       | Christmas tree lights on, to keep a fresh glass of water from the
       | tap, warm natural gas to keep the freezing cold outside, etc.
       | Thank you for keeping society ticking away :)
        
         | akomtu wrote:
         | Let's not confuse on-call firefighters or water facility
         | staff with the on-call admins that maintain money-making
         | machines monetizing the attention of billions. The latter are a
         | net negative on society.
        
           | kridsdale1 wrote:
           | Not a fan of the YouTube Yule Log, I see.
        
           | krallja wrote:
           | Yeah, how dare Netflix provide entertainment on-demand and
           | for cheaper than the other entertainment companies?
        
             | op00to wrote:
             | I am currently viewing this on ethically sourced rfc1149
             | (birds gave consent via a scientifically proven "brain
             | electrode interface"), manually decoding packets using an
             | abacus made out of various animal droppings foraged on the
             | forest floor. If I can't view your content this way, it
             | should not be on the internet.
        
           | oceanplexian wrote:
           | That firefighter is probably using YouTube or scrolling
           | through Instagram to unwind while they're stuck at the
           | station waiting for a call. Just because someone works in
           | entertainment or ads doesn't mean that the economic puzzle
           | piece they represent isn't valuable to society.
        
         | kridsdale1 wrote:
         | I'll give a shout out too to everyone in the military
         | monitoring warning systems and maintaining stance to protect us
         | from being killed while we're with our families.
        
       | sleazebreeze wrote:
       | My wakeup alarm this morning was 9am when OpsGenie let me know
       | I'm on-call today. Praying for peace.
        
       | frakt0x90 wrote:
       | In a similar vein, I'm grateful for the people who maintain the
       | foundational pieces of our digital world that often go unnoticed
       | like date & time systems.
        
       | isoprophlex wrote:
       | Big up the on call heroes! Hope you're getting paid well, hope
       | you get no red lights on the bug hotlines.
        
       | chasd00 wrote:
       | > Thank you for keeping systems available and safe.
       | 
       | There's that word "safe" again. What systems are dangerous
       | otherwise? Do you mean like traffic lights or something? The API
       | serving ads to your mobile game isn't dangerous.
        
         | dotnet00 wrote:
         | 'Safe' in the context of systems can mean safe from hacking
         | attempts, data leaks, and other emergencies that may arise for
         | the system. It can refer to things that are dangerous for the
         | system itself.
        
       | tecleandor wrote:
       | On call till 31st, so please don't hit refresh too much these days
       | ;-)
        
         | Scoundreller wrote:
         | Me too, but they pay a few bucks an hour to carry the phone so
         | at least that adds up.
         | 
         | Ultimate trick is to have a diverse team. Someone that doesn't
         | care about Christmas but absolutely needs some random day off
         | in March (cool with us!). Someone that celebrates New Year's
         | some other time.
        
           | kridsdale1 wrote:
           | Speaking of diversity, if you don't do Christmas dinner I
           | strongly recommend ordering takeout from a Chinese place on
           | the 25th! There are lots of happy photos of Chinese chefs and
           | Jewish customers doing a Christmas fist-bump.
        
       | rollcat wrote:
       | Two live video productions, including one on the evening of 31st.
       | I managed to push back on last minute infra/workflow changes (:
        
       | mkhnews wrote:
       | Thanks, been there many seasons, and same to you all.
        
       | datpuz wrote:
       | For some of us, we look forward to the peace and quiet
        
       | tetha wrote:
       | Yeah, we discourage production changes starting the first or
       | second week of December, and start freezing changes the third
       | week of December until it's frozen solid from the fourth week of
       | December until the second week of January.
       | 
       | December tends to be hell for our customers, so stability should
       | be a priority there.
       | 
       | And honestly, no one wants to work on holidays. So let's just wrap
       | everything starting in December, maybe use the third week for
       | some unnoticed issues and then just lay down the tools. Use that
       | time for documentation, or shorter days, quite frankly.
       | 
       | That way we minimize the on-call situations occurring. Let's hope
       | it goes well for the engineer this year as well. We have a streak
       | to keep.
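        
The staged freeze described above could be encoded as a small deploy gate. This is only a sketch: the exact cut-off dates (discouraged from Dec 1, fixes only from Dec 15, frozen from Dec 22 through Jan 7) and the names `change_policy` and `may_deploy` are assumptions made for illustration, not details given in the comment.

```python
from datetime import date


def change_policy(day: date) -> str:
    """Map a calendar day to a change policy (cut-off dates are assumed)."""
    md = (day.month, day.day)
    if md >= (12, 22) or md <= (1, 7):
        return "frozen"       # assumed: fourth week of December into early January
    if md >= (12, 15):
        return "restricted"   # assumed: third week of December, fixes only
    if md >= (12, 1):
        return "discouraged"  # assumed: first two weeks of December
    return "open"


def may_deploy(day: date, is_fix: bool = False) -> bool:
    """Decide whether a production change may go out on the given day."""
    policy = change_policy(day)
    if policy == "frozen":
        return False          # assumption: no exceptions while frozen solid
    if policy == "restricted":
        return is_fix         # only fixes for previously unnoticed issues
    return True               # "discouraged" is advisory, not enforced


if __name__ == "__main__":
    for day in (date(2023, 12, 5), date(2023, 12, 18), date(2023, 12, 27)):
        print(day, change_policy(day), may_deploy(day, is_fix=True))
```

A gate like this would typically run as an early step in a release pipeline, so the calendar is enforced in one place and nobody has to remember it.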
        
         | ainiriand wrote:
         | We do the same. I work in logistics software, and we usually
         | freeze from early November up until Christmas.
        
         | bertil wrote:
         | I think that's a great policy as it's clearly intended to help
         | people when they need it, and get people to unplug when it's
         | valued by their loved ones.
         | 
         | _However_ (that part is probably best bookmarked until Jan
         | 2nd), it also betrays that your system is brittle and can be
         | broken by a bad commit. Don't do it because you want people to
         | grind until Dec 24th at 6 pm. Do it because it's great the rest
         | of the year, too. I'd recommend you look into (or ask me about)
         | feature flags, alerting, and automated roll-backs.
         | 
         | The short version is: there's a meta-system on top of your
         | release process that can tell (if you are using roll-back, not
         | feature flags):
         | 
         | - commits until xyzsdf are fine;
         | - roll-outs starting from commit abcdef have a 2% error rate,
         |   80% on Android;
         | - revert to xyzsdf, send a message (low-priority, email) to the
         |   DevOps on call and the author of abcdef that it happened;
         | - for all commits after abcdef: if there are no conflicts with
         |   xyzsdf, re-try to roll them out;
         | - if there is a conflict because they were on top of abcdef,
         |   send a message (low-priority email) to the authors that there
         |   is a conflict.
         | 
         | There are more sophisticated versions that can do things like,
         | if you use feature flags, flagging Android users to use the
         | previous version. Another way to do this is to scale who has
         | access to abcdef gradually: say 1% every hour, and revert if
         | you detect issues.
         | 
         | All those seem daunting to teams that haven't worked like this
         | before, but in my experience, they come to love it very quickly.
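        
The rollback steps in the comment above can be sketched as a small planning function. This is a minimal illustration under stated assumptions, not any particular tool's behavior: `Commit`, `RollbackPlan`, `plan_rollback`, the 1% threshold, and the `devops-on-call` recipient are all hypothetical names and values, and "conflict" is approximated by overlapping file paths rather than by a real merge check.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    files: frozenset  # paths touched; used here as a crude conflict proxy


@dataclass
class RollbackPlan:
    revert_to: Optional[str] = None             # last known-good commit
    retry: list = field(default_factory=list)   # shas to roll out again
    notify: list = field(default_factory=list)  # (recipient, message) pairs


def plan_rollback(commits, error_rates, threshold=0.01, on_call="devops-on-call"):
    """Build a rollback plan from commits (oldest first) and observed error rates."""
    bad_index = next(
        (i for i, c in enumerate(commits) if error_rates.get(c.sha, 0.0) > threshold),
        None,
    )
    plan = RollbackPlan()
    if bad_index is None:
        return plan  # nothing exceeded the threshold
    if bad_index == 0:
        raise ValueError("no known-good commit to revert to")

    bad = commits[bad_index]
    plan.revert_to = commits[bad_index - 1].sha
    message = f"{bad.sha} exceeded the error threshold; reverted to {plan.revert_to}"
    plan.notify += [(on_call, message), (bad.author, message)]

    tainted = set(bad.files)  # files whose later edits may build on the bad commit
    for commit in commits[bad_index + 1:]:
        if commit.files & tainted:
            tainted |= commit.files  # anything stacked on a conflict is suspect too
            plan.notify.append(
                (commit.author, f"{commit.sha} conflicts with the revert of {bad.sha}")
            )
        else:
            plan.retry.append(commit.sha)
    return plan


if __name__ == "__main__":
    history = [
        Commit("xyzsdf", "ann", frozenset({"api.py"})),
        Commit("abcdef", "bob", frozenset({"checkout.py"})),
        Commit("c1", "cat", frozenset({"ui.py"})),
        Commit("c2", "dan", frozenset({"checkout.py"})),
    ]
    plan = plan_rollback(history, {"abcdef": 0.02})
    print(plan.revert_to, plan.retry)  # xyzsdf ['c1']
```

Keeping the decision separate from the side effects (reverting, redeploying, sending email) makes the policy easy to test on its own. A real system would replace the file-overlap check with an actual attempt to re-apply each commit on top of the reverted state.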
        
           | yardstick wrote:
           | How do you detect errors like this?
           | 
           | What is an error? Is a business logic bug going to be picked
           | up by this process automatically, or are some manual steps
           | involved?
           | 
           | I.e., a point of sale app releases an update that automatically
           | halves the amount to charge, but displays the full amount to
           | the merchant in the UI. Unit tests pass (because an engineer
           | made a human mistake). Backend calls are correctly used, no
           | errors thrown, simply the wrong amount is used.
           | 
           | How would this be automatically detected and reverted?
           | 
           | Would anyone writing point of sale software want to risk this
           | over one of the biggest trading periods of the year?
        
             | codebolt wrote:
             | Yeah, that model may work for many public facing apps, but
             | probably less so for enterprise systems that are heavy in
             | business logic.
        
             | bertil wrote:
             | As you point out, it really depends on what is an error.
             | Most of the companies I know of that have a holiday freeze
             | make video games, casual ones, even. Changes are minor fixes and
             | optimization--glitches that a player likely won't notice,
             | but you want to detect them early to avoid losing your
             | ability to detect more.
             | 
             | Back-end tools are different, and I definitely see reasons
             | other than bugs to not change business logic this month.
        
           | tetha wrote:
           | We use these systems liberally at other times of the year and
           | no one notices, usually. If they do, downtime and
           | interruption budgets handle this.
           | 
           | /However/, let me counter with the point: Just one of our
           | customers has 8000 FTEs working with our system. During
           | hell-time (aka, December and Christmas shopping and shipping),
           | each of those dudes spends their shift taking customer calls
           | lasting 2-4 minutes, which in turn require a few requests
           | into our systems.
           | 
           | Due to the stress of their customers^2 (because it's
           | Christmas and holidays and such), if an agent of a customer
           | is unable to access our systems, they cannot handle the use
           | case of the customer^2 and that will piss off the customer of
           | the customer.
           | 
           | So if we push a bad change during this time, we're going to
           | piss off hundreds of customers^2 per minute for that one
           | customer alone. Even with a fast automatic rollback, that's a
           | long time during hell-time. And they have people who know how
           | to yell at vendors in nasty ways who don't like that.
           | 
           | I enjoy moving software fast and enabling moving software
           | quickly, but customer focus and customer orientation means
           | understanding when to move slowly as well.
           | 
           | And hey, if that means more quiet holidays for the hard
           | working operators on my team, who's gonna complain?
        
             | bertil wrote:
             | You are a lot further ahead than most companies.
             | 
             | I've worked for too many places where the Christmas break
             | was because of a lack of tooling. I'm glad you are two
             | steps ahead.
        
         | ok_dad wrote:
         | The place I work for pushed v2 of their software, a full
         | rewrite (nothing from the old system, not even databases) by a
         | new team, into production this week for several customers.
         | Mostly they did it so they could say they met their made up
         | 2023 KPIs for the v2 rewrite. There was no good reason to push
         | it out now other than that, and there were several reasons not
         | to, such as it wasn't well tested and it's fucking December
         | 20th. Anyways, I'm not really on call so I can't complain much,
         | but my poor coworkers have to support this over the holidays
         | now.
        
           | hotsauceror wrote:
           | Ugh. Several years ago I spent an entire Christmas vacation,
           | including all day Christmas Day, putting out fires because a
           | team couldn't be bothered to do five minutes of cursory load
           | testing. As a consequence, multiple production systems went
           | down under load.
           | 
           | Later, after we regrouped after a month of this brutality,
           | they wandered around the office bragging like they'd hung the
           | fucking moon after they fixed the crippling, obvious design
           | issue they'd released. I confronted the dev lead with the
           | fact that they would have seen this after 30s of load testing
           | and he just laughed, I think he literally said "LOL". A giant
           | middle finger, that's what Ops got from Dev for Christmas
           | that year.
           | 
           | Here's to the people who KTLO. My people.
        
       | lynx23 wrote:
       | I announced a downtime for a smallish GPU Cluster starting from
       | Christmas Eve just a few hours ago. It is just the perfect time
       | to schedule a day or two of downtime for a system like that. And
       | if IPMI doesn't fail me, I can get a lot of things done without
       | leaving the comfort of my home. I scheduled this without pressure
       | from my boss. It was a totally voluntary decision... While being
       | raised as a Christian, this time of the year is for me more about
       | solstice than about the Christian celebration. A time to enjoy
       | the comfort of a heated home. A time to celebrate that the days
       | are going to be longer from now on again. A time to reflect on
       | the past year. And all of this is easily done while having a few
       | terminals open and waiting for remote stuff to complete...
        
       | wavemode wrote:
       | Just barely started my current job too recently to be in the on-
       | call rotation yet. Lucked out! Props to those keeping the wheels
       | turning.
        
       | RationPhantoms wrote:
       | As if this week had attempted to take a measure of blood from my
       | body, I'll be on-call next week. Looking forward to all things
       | quiet on the HEP network front.
        
       | 6stringmerc wrote:
       | Agreed, there are some gigs that just really require support to
       | exist - I know this first-hand from working at a Zoo (very large
       | exotic animal rescue basically). Animals do not take holidays.
       | They need to eat and do animal things in spite of our customs
       | that day.
       | 
       | On the flip side, having worked Cinema on Christmas Day two years
       | I think, there is no amount of Grace and Patience I can give that
       | is enough to those earning their living. Still have a hat and
       | polo. Why? I had to buy them!
        
       | blast wrote:
       | https://www.youtube.com/watch?v=zB1T3zgne5Y
        
       | bertil wrote:
       | Always be kind, and say it's your fault.
       | 
       | If you don't do it for the sake of the person you are asking for
       | help, do it because it works better. That's the most practical
       | advice [0] ever given by Hans Rosling [1], the Fact master
       | himself:
       | 
       | > In fact, I have the secret to how to get the best help
       | immediately from any customer service, like the phone company or
       | the bank or anything. I have the best line, it always works. You
       | want to know what it is? When I call, I say, "Hello. I am Hans
       | Rosling and I have made a mistake." People immediately want to
       | help you when you put it this way. You get much more when you
       | don't offend people.
       | 
       | [0]: Unless you are in charge of a developing country's budget
       | and have to decide between education and healthcare.
       | 
       | [1]: https://blog.ted.com/qa_with_hans_ro_1/
        
         | chunkymilk wrote:
         | > Always be kind, and say it's your fault.
         | 
         | I do this with internal teams at work. I've found approaching
         | other teams with issues with their library/framework in a "this
         | could be our mistake" manner really helps in keeping them from
         | getting defensive and stonewalling.
        
           | steve_adams_86 wrote:
           | I do something similar. Hey, I'm pretty sure I'm doing
           | something wrong. Can you help me figure it out?
           | 
           | Then be grateful for the help, because it truly isn't granted
           | or a given that people have to drop everything and figure
           | things out for you, even if you work together. And even if
           | the mistake was actually theirs. Gratitude is huge.
        
         | smoyer wrote:
         | I'm going to try that, but will need more information about
         | Hans Rosling to get through the identity verification ...
        
         | caminante wrote:
         | You forgot the rest of the story!
         | 
         |  _> "Hello. I am Hans Rosling and I have made a mistake."_
         | 
         | continues on as...
         | 
         |  _> "I foolishly chose to rely on <insert your service>. I just
         | spent 7 minutes hopping around your phone tree with dead-end
         | voicemail terminuses and an outdated monologue starting with
         | "Due to high call volumes..." that has been running since
         | before Covid19. Finally, I found the right combination to talk
         | to you. A human! There's no option to cancel my service online
         | and the help menu threw an error after filling out a detailed
         | form. Can I please reset my password?"_
        
       | whalesalad wrote:
       | Meanwhile a huge number of us (non-religious? introverted kernel
       | compiling cave dwellers?) treat this period no differently than
       | any other week in the year. I'll be here keepin the servers
       | runnin :horns:
       | 
       | It's actually my favorite time of the year. Everyone is gone, it
       | is quiet, and I can get shit done.
        
         | kaashif wrote:
         | > non-religious
         | 
         | Or a member of one of the religions that don't celebrate
         | Christmas.
        
           | muzani wrote:
           | It's me. But we still have a holiday period at the end of
           | year - normally financial targets are hit and it's a 4 day
           | leave to get 10 days off.
        
         | sneak wrote:
         | Holidays are special because they're special; both the winter
         | solstice festival (rebranded for Christianity) and the spring
         | equinox one (same deal) can be treated differently for cultural
         | variety by the non-observant.
         | 
         | I'm a militant proselytizing atheist raised by a Jew and I
         | still have a tree with pretty lights, give presents, and drink
         | and eat some things I only drink/eat once per year (never make
         | homemade eggnog if you ever want to enjoy it guilt free again,
         | you're basically drinking a megacalorie of heavy cream, yum).
         | It's fun to celebrate the generic concept of "holiday" - a time
         | that is different from other times.
         | 
         | You're allowed to feel nice about peppermint candy (and/or
         | chocolate gelt, I go for both) at the end of December without
         | bringing the supernatural into the equation. :)
         | 
         | \m/
        
           | whalesalad wrote:
           | Oh ya same I love the smell of evergreen wreaths and trees
           | and enjoy partaking in festive activities. A 4K cracklin'
           | Yule log goes a long way too.
        
       | iddan wrote:
       | FYI Israelis are not on holiday - our holidays are on wholly
       | different dates. Hire Israelis and experience no downtime while
       | working with Silicon Valley level talent.
        
         | liorsbg wrote:
         | True story
        
         | loloquwowndueo wrote:
         | Just hope stuff doesn't break on sabbath, they ain't touching
         | no computer that day :)
        
           | INTPenis wrote:
           | They'd need a goy on-call to flick the switch.
        
       | cco wrote:
       | I'm not sure if there is something in the water this year, but
       | this week, Dec 18th to Dec 21st (only a partial week), has been
       | our busiest week of all time already.
       | 
       | Sweating over here trying to make it through the week and praying
       | that it slows at least for the first half of next week.
        
       | yardstick wrote:
       | Just remember this time of year is often peak vulnerability time,
       | when attackers exploit the fact that teams are at reduced strength
       | and off guard: slower response times to investigate and fix
       | issues, etc.
        
       | timwaagh wrote:
       | I wouldn't mind honestly. Seems like a good excuse to skip the
       | social obligations.
        
       | smoyer wrote:
       | I'm on call 12 hours a day and hoping things are very quiet next
       | week. Best wishes to everyone else too!
        
       | acedTrex wrote:
       | We've been on freeze for weeks now in preparation for the holiday
       | season.
        
       | er0k wrote:
       | Thanks for the salute, but we also accept cash :)
        
       | mateusfreira wrote:
       | Thanks for all the great work; I hope no one has an outage this
       | holiday and everyone has time to enjoy with family and alone.
       | 
       | Keep up the good work, folks
        
       | comprev wrote:
       | I salute those in the startup world - the ones on a team of 5 who
       | are the only Ops person and always get paged.
       | 
       | Been there, done that.
        
       | itqwertz wrote:
       | Holidays are excellent times for hackers to take advantage. It's
       | not just Christmas or other Western holidays, either. Extend this
       | principle to any holiday/world conflict/anniversary of conflict
       | made into holiday/calendar new year and then adjust your time of
       | attack.
       | 
       | protip: US companies with offshore groups are usually
       | underfunded, understaffed, and underskilled. Time to see if that
       | disaster recovery environment works!
       | 
       | Happy holidays to those who encounter system stress tests. Can't
       | spell salary without some elements of slavery...
        
       | dallas wrote:
       | Managers, if you're reading this, and you have
       | engineers/developers "on call" but not contractually (off book),
       | make it so, because it slightly sucks when you're having
       | Christmas drinks but can't enjoy yourself because you might need
       | to drive somewhere and climb up a ladder to tend to a product.
        
       ___________________________________________________________________
       (page generated 2023-12-21 23:00 UTC)