hngopher.com

       [HN Gopher] Moved a server from one building to another with zer...
       ___________________________________________________________________
        
       Moved a server from one building to another with zero downtime
        
       Author : huhtenberg
       Score  : 797 points
       Date   : 2020-08-05 10:44 UTC (12 hours ago)
        
 (HTM) web link (www.reddit.com)
 (TXT) w3m dump (www.reddit.com)
        
       | noobermin wrote:
       | Sorry but this is ridiculous. It's a great story of a feat of
       | sysadminery, but the client should have just accepted some down
       | time, even a few hours. The level of entitlement people get from
       | some clients is just infuriating. Even down to calling him
       | back for not agreeing to help, what an infuriating person.
       | 
       | That was my main take away from this. Endeavor to be the sort of
       | person who can refuse clients, the entire idea that "the customer
       | is always right" enables so much ridiculous behavior.
        
       | arethuza wrote:
       | This reminds me of a small company I joined many years ago that
       | did deployments by RAID - find a working server (possibly at a
       | customer site) swap in a blank HD, wait for it to rebuild then
       | take it and put in a new server and repeat the process.
       | 
       | Like finding people who argue against revision control systems,
       | it's really quite a challenge convincing people why things like
       | this are a bad idea - after all "it works!".
        
         | yjftsjthsd-h wrote:
         | That's... actually fascinating, if in a slightly insane way.
         | There's pets, there's cattle... and apparently there's a herd
         | of cloned pets, which I'd somehow never considered before:)
        
       | exabrial wrote:
       | Reminds me of the "hot slide" technique used for old telephone
       | switches
        
         | ansible wrote:
         | They did some crazy stuff in the old days. Like when they moved
         | a telephone exchange live... the whole building.
        
       | walrus01 wrote:
       | Sort of on the subject, I've seen a brochure for a specialty
       | product marketed to law enforcement. It's meant for use with the
       | seizure of live, powered on desktop PCs and similar that have a
       | high likelihood of full disk encryption.
       | 
       | Essentially it's a medium sized double conversion ups, with a
       | really high quality sine wave inverter, and some electronics that
       | can match phase with a live 120vac 60Hz circuit. And a tool kit
       | which consists of the insulated electrical hand tools needed to
       | do a midspan removal of the cable jacket and splice into the
       | wires in an ordinary PC power cable. The person using it is of
       | course supposed to be trained in advance, and competent at the
       | process of attaching the UPS to the live circuit.
        
         | throwaway744678 wrote:
         | Wouldn't it be safer to open the case and connect some kind of
         | battery + adapter _after_ the power supply?
        
           | walrus01 wrote:
           | Splicing into the many wires that make up an ATX + 12V power
           | connector, between the output of the power supply and the
           | motherboard, is way more fiddly than just dealing with the hot
           | and neutral on an ordinary $5 PC power cord. You could also
           | never be certain what weird ziptie and cable management
           | system (or lack thereof) might exist in a home built x86 PC
           | case, or if there's any room for hands to work at all...
           | 
           | I think the thing I saw is also meant to deal equally well
           | with a commodity x86 PC built from parts, or an Intel NUC
           | size thing, or a corporate desktop machine with proprietary
           | internal wiring like a slimline Dell, Lenovo, HP, etc.
        
           | ajross wrote:
           | Safer for the operator? Sure. But certainly not for the
           | device, if you're trying to keep it operating. An ATX power
           | supply has 24 pins at 5 different voltage levels (plus any
           | auxiliary power connectors for the GPU and drives, etc...),
           | and motherboards are a lot less tolerant of spikes and
           | transients than the PS on the other side.
           | 
           | Dealing with AC power isn't really that dangerous if you're
           | careful.
        
             | TheSpiceIsLife wrote:
             | Even high voltage and high amperage AC isn't dangerous.
             | 
             | So long as you're not earthed
             | https://imgur.com/gallery/B2c5FfD
        
               | dhosek wrote:
               | We had an electrician of questionable licensing do some
               | minor work for us (replacing some switches and outlets).
               | I asked him to tell me when I should go down to the
               | circuit breaker to turn off the electricity and he told
               | me not to bother. He did all the work with hot current
               | running through the wires. I stayed close enough to be
               | able to tell if I needed to call 911 but no closer while
               | he worked.
        
               | 0xffff2 wrote:
               | I've done a ton of electrical work for my own benefit
               | over the years and I'm perfectly comfortable doing things
               | like swapping switches with live wires. I've never once
               | had a problem. The one and only time I've fucked up was
               | when I cut a run of romex cable that I thought _had_ been
               | turned off.
               | 
               | Lesson learned: electrical wiring is like a gun. Always
               | treat it like it's on, and if you have to do something that
               | would be unsafe if the wiring is energized, make damn
               | sure it's de-energized before proceeding. When you're
               | working in that mindset already anyway, flipping the
               | breaker for something as simple as swapping a
               | switch/outlet hardly has any benefit.
        
               | allannienhuis wrote:
               | I apprenticed with my Dad. The first two rules he taught
               | me have stuck with me my whole life:
               | 
               | 1) Treat every wire as if it was hot. Even if you know it's
               |    not.
               | 2) A good electrical connection must first have a good
               |    physical connection.
               | 
               | Not sure why that second rule sticks with me :) but there
               | has been more than one occasion when I'm fairly sure the
               | first rule has saved me from a bad shock. And you're
               | right - treating the wires as if hot means you can
               | actually work with hot wires for a lot of simple things.
               | 
               | I still turn off the breaker though :)
        
               | nucleardog wrote:
               | The second rule is a great one that so many people doing
               | their own work miss.
               | 
               | The wire nut is only there to stop the wires loosening
               | over time and provide some basic insulation. It is not
               | there to actually attach the wires. When you twist your
               | wires together, they should be attached well enough on
               | their own that you'd be comfortable throwing a piece of
               | electrical tape over them to stop them shorting to the
               | box and leaving it as-is (but don't do that). If the only
               | thing keeping them together is the wire nut and you being
               | very gentle when you manipulate them back into the box,
               | they're not actually connected.
               | 
               | The poor physical connection creates a poor electrical
               | connection. A poor electrical connection has resistance
               | which creates heat. Heat creates fires. Even better after
               | a few years when enough traffic has driven past your
               | house and enough people have moved around inside of it
               | and the wires have wiggled to just barely in contact so
               | occasionally when someone walks down the hallway the
               | lights will all flicker as the wires create some pretty
               | electrical arc light shows, adding carbon buildup to the
               | wires and further increasing the resistance and heat
               | concentrated in the one tiny point of the copper where
               | they're still sometimes connected.
               | 
               | No reason at all for this rant. Definitely not a real
               | example at all. Definitely didn't waste an afternoon with
               | a toner, a drill with a pilot bit, and a borescope to
               | hunt down the six octagon boxes someone had sealed into
               | the basement ceiling hiding away some of the shoddiest
               | wiring I'd ever seen. Nope.
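
The resistance-to-heat point in nucleardog's comment above can be put in
numbers with Joule heating, P = I^2 * R. This is only an illustrative
sketch: the function name and the example currents and resistances are
assumptions, not measurements, and it treats the current as fixed by the
load rather than by the joint.

```python
def contact_heat_watts(current_amps: float, contact_resistance_ohms: float) -> float:
    """Power dissipated as heat in a joint, using P = I^2 * R.

    Assumes the current is set by the load, so a small change in
    joint resistance does not noticeably change the current.
    """
    if current_amps < 0 or contact_resistance_ohms < 0:
        raise ValueError("current and resistance must be non-negative")
    return current_amps ** 2 * contact_resistance_ohms


# Hypothetical values: a 12 A load through a tight joint and a loose one.
GOOD_JOINT_OHMS = 0.01
LOOSE_JOINT_OHMS = 0.5

good = contact_heat_watts(12.0, GOOD_JOINT_OHMS)
loose = contact_heat_watts(12.0, LOOSE_JOINT_OHMS)
print(f"tight joint: {good:.2f} W, loose joint: {loose:.2f} W")
```

With these made-up figures the loose joint dissipates 50 times the heat
of the tight one, concentrated in a very small piece of copper.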
        
               | Scoundreller wrote:
               | This makes me feel bad. As a kid, I remember holding
               | light switches at _just_ the right point to hear the
               | buzzing (arcing?) inside. At least if the contacts were
               | carbonizing, there wasn't a lot flowing through them
               | closed.
        
               | beatrobot wrote:
               | I had an electrician add a breaker to the main panel
               | while it was still live, no protection or gloves,
               | nothing. I was also terrified.
        
               | nucleardog wrote:
               | Sometimes you do what you've gotta do.
               | 
               | I'm not a nut that does everything with the power on--I
               | kill any branch I'm working on and double and triple
               | check with a non-contact voltage detector before I stick
               | my fingers into anything (which saved my bacon the one
               | time when the hot from a different branch of the same
               | phase ended up connected to a neutral wire for a plug
               | with no connected ground leaving it showing 0V on a
               | multimeter in any configuration and still being live with
               | the breaker off; that house was a mess). However our
               | current dwelling has no main cut-off for the power. If we
               | wanted to turn off power to the panel we'd need to get
               | the power company out to pull the meter from the socket.
               | 
               | In a mostly full panel the bus bars are pretty much
               | completely covered by the breakers anyway. You'd have to
               | work pretty hard to come in contact with them. And the
               | wires you're working with (besides the ground) are
               | insulated anyway so no issue if they brush up against
               | something.
               | 
               | The only thing that's _slightly_ butthole puckering is
               | chasing the uninsulated ground wire through the panel
               | down to the neutral bus.
               | 
               | And yeah, done without gloves because weighing "safety
               | when I make a mistake" versus "greater dexterity so I'm
               | much less likely to make a mistake" I prefer the latter.
               | The protection is rubber soled shoes and keeping one hand
               | tied behind my back so the electricity has no path
               | through me.
        
               | saltcured wrote:
               | Ha, that's nothing. I once watched a stubborn guy replace
               | the bus bars in the input panel of a house. He did wear
               | rubber gloves and boots and stand on a plastic stool.
               | But, this is a kind of job where you are operating a
               | socket wrench on the clamps holding down the bare ends of
               | the thick direct-burial power cables, then wrestling the
               | ends of the cable out of the way to unscrew and remove
               | the bus-work from the panel chassis.
               | 
               | He did this without notifying the power company, so those
               | supply lines were hot with 240V residential service. The
               | weather shifted and a light mist started falling before
               | he was done. Like another poster above, I was thinking I
               | need to be ready to call 911, but wanting to be far
               | enough away not to be hit by splattering metal or any
               | surprise voltage gradients in the soil.
        
               | myself248 wrote:
               | I accidentally replaced an outlet and added a switch to a
               | circuit that was still energized. I had turned off the
               | wrong breaker, and failed to confirm it before I started
               | work.
               | 
               | But, careful work habits and some tools that happened to
               | be insulated anyway, meant that I was never bridging two
               | different potentials. The job went flawlessly and I only
               | noticed when I plugged the outlet tester into it at the
               | end, expecting to go turn the breaker on and come back
               | and look at the lights... but the lights were already lit
               | up.
        
               | mschuster91 wrote:
               | In Germany this is called "Arbeiten unter Spannung" and
               | perfectly legal if qualified
               | (https://de.wikipedia.org/wiki/Arbeiten_unter_Spannung).
        
               | dhosek wrote:
               | The electrician was Croatian and, I presume, learned his
               | trade there. It still terrified me.
        
               | bluGill wrote:
               | Working on hot wires is no problem. Ground wires scare me
               | and I'll turn off the main breaker before I touch them.
               | You can never be sure what ground is really at.
        
               | oilman wrote:
               | I was once working for a small company building
               | electrical equipment. We mostly worked on "medium
               | voltage" equipment, you know 2400 to 69000 VAC.
               | 
               | For one project we had large banks of ultracapacitors in a
               | cabinet. Fully charged it was around 1200 VDC. This thing
               | was in the prototyping stage, and we were testing a
               | control system on a Saturday morning.
               | 
               | So we charge it using a large AC/DC converter, fully
               | charged, everything worked beautifully. We start a
               | discharge cycle converting the DC back to AC. Uh oh, it
               | starts pulling way too much current. Flames start to
               | shoot out of the AC/DC converter. Fuck. BANG. Fuse blown.
               | 
               | We assess the damage... the AC/DC unit is totally shot.
               | And someone (me) is going to have to analyze what caused
               | the failure. Otherwise everything with the capacitor
               | cabinet seems okay, but the thing is still charged to
               | 1090 VDC and the fuse is blown. Check with the mechanical
               | engineer that designed the cabinet. Turns out the fuse
               | can't be changed (can't be accessed) while the cabinet is
               | charged and the cabinet can't be discharged because the
               | fuse is blown. Well that isn't good.
               | 
               | The only thing we could do was discharge it into a load
               | bank (think large toaster) by connecting something
               | directly to the copper busbar live at 1090 VDC. So one of
               | the commissioning guys volunteered. He put on some high
               | voltage gloves, stood on a plastic mat, and connected
               | some jumper cables someone had in their car to the bus
               | bar. He stepped back and someone else threw the switch on
               | the load bank and it discharged without incident.
               | 
               | There were some design revisions after that.
        
           | regularfry wrote:
           | Cases can have "case open" switches that tell the machine to
           | switch off. You can't necessarily tell beforehand.
        
           | jrootabega wrote:
           | Case intrusion alarms (built in or Homebrew)
        
         | bob1029 wrote:
         | How do they deal with the loss of network connectivity?
         | 
         | I could pretty easily write a script that forces my machine to
         | reboot and do all manner of other things if some sort of
         | network change is detected.
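
A minimal sketch of the kind of script bob1029 describes, assuming the
network state can be summarized as a set of strings (for example a
gateway MAC or interface addresses). The `probe` and `on_change`
callables are hypothetical hooks, not part of any real tool; what to
probe and what to do on a change (lock, reboot, unmount) is left to the
caller.

```python
import time
from typing import Callable, FrozenSet, Iterable, Optional


def snapshot(entries: Iterable[str]) -> FrozenSet[str]:
    """Freeze one reading of the network state so readings can be compared."""
    return frozenset(entries)


def watch(
    probe: Callable[[], Iterable[str]],
    on_change: Callable[[], None],
    interval_s: float = 5.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` and call `on_change` once if the state ever differs.

    The first reading is taken as the trusted baseline. Returns True if
    a change was detected, False if `max_checks` polls passed without one.
    """
    baseline = snapshot(probe())
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval_s)
        checks += 1
        if snapshot(probe()) != baseline:
            on_change()
            return True
    return False
```

A caller would pass a `probe` that reads whatever is cheap to observe
locally and an `on_change` that triggers the chosen reaction. As the
replies note, a seizure kit that keeps the machine powered does nothing
about this kind of check by itself.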
        
           | numpad0 wrote:
           | Or motion, inactivity, vibrations in the room, etc. But
           | that's for another product/specialist I guess?
        
             | reaperducer wrote:
             | There used to be an OS X program that would lock the
             | computer if it detected motion. As long as a trusted
             | Bluetooth device was paired, the computer was fine. But if
             | the device left range and someone touched the computer, it
             | locked.
             | 
             | There was also one that would use the motion detector to
             | try to detect if the device was falling, and park the hard
             | drive heads before impact.
        
           | walrus01 wrote:
           | I don't believe that specific product addresses it at all.
           | Undoubtedly the persons operating the kits have put some
           | thought into it, but given the myriad of possible LAN
           | configurations and types of software dead man's switches, it
           | must be a difficult problem to solve.
        
           | CydeWeys wrote:
           | You _could_ , but what % of running servers actually have
           | such safeguards in place? I'd say almost none of them.
        
             | discordance wrote:
             | The next Dread Pirate Roberts would be interested in this
             | safeguard
        
         | miles wrote:
         | HotPlug Field Kit
         | https://www.cru-inc.com/products/wiebetech/hotplug_field_kit...
         | 
         | "With the CRU WiebeTech HotPlug you can transport a computer
         | without shutting it down.
         | 
         | "The HotPlug allows hot seizure and removal of computers from
         | the field to anywhere else. The HotPlug's patented technology
         | keeps power flowing to the computer while transferring the
         | computer's power input from one A/C source (such as a wall
         | outlet or power strip) to another (a portable UPS) and back
         | again.
         | 
         | "We created this product for our Government/Forensic customers,
         | but it has IT uses as well. Need to move a server without
         | powering it down? The HotPlug can do it.
         | 
         | "It's great for digital forensic investigators and techs who
         | can't risk losing access to data on a running computer. With
         | many computers now employing full-disk encryption, shutting
         | them down poses the risk of having to crack a password after
         | moving the computer to a lab for analysis, which can greatly
         | increase the time and expense of an investigation. When
         | combined with a WiebeTech Mouse Jiggler, you also won't have to
         | worry about the computer entering password-protected
         | screensaver or sleep modes."
        
           | Scoundreller wrote:
           | Time to geo-fence the servers with an external GPS antenna
           | (often useful for time-sync). Or maybe FM signal strength
           | locks?
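
A rough sketch of the geo-fence idea, assuming the server can read a
latitude/longitude fix from its GPS receiver. The function names, the
home coordinates, and the radius are hypothetical; the distance is the
standard haversine great-circle formula on a spherical Earth.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def outside_fence(home: LatLon, fix: LatLon, radius_km: float) -> bool:
    """True if the current fix is farther from home than the allowed radius."""
    return haversine_km(home, fix) > radius_km
```

The radius would need to be larger than normal GPS jitter, and a lost or
spoofed signal needs its own policy; neither is handled here.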
        
         | jlgaddis wrote:
         | Search for "HotPlug" on YouTube.
        
           | dghughes wrote:
           | I thought about HotPlug too. And the obligatory Seinfeld
           | Frogger scene (becoming much less familiar to younger folks).
           | 
           | HotPlug must only work in countries with terribly designed
           | plug outlets like the US and Canada. Our NEMA 5-15 plugs are
           | live when the plug's hot (electrons be here) and neutral
           | (return to sender) blades are still visible. I don't think
           | this device could work in the UK; I'm not from there, but I
           | think their plugs can't be live with exposed plug blades.
           | 
           | https://www.cru-inc.com/products/wiebetech/hotplug_field_kit...
        
             | Scoundreller wrote:
             | Just need to carefully expose the wiring in the cable
             | itself then. Or yank out the socket and connect to the
             | wires there before snipping and shipping.
             | 
             | Unplugging just enough to expose the prongs is risky
             | because the point where contact is lost will vary from
             | receptacle to receptacle.
             | 
             | Chances are things are plugged into a multi-plug hub
             | anyway. European homes are especially lacking in sockets in
             | my experience.
        
         | mercora wrote:
         | linked below is an old advertisement/demo video of a similar
         | device or maybe even the one you mentioned :)
         | 
         | https://www.youtube.com/watch?v=-G8sEYCOv-o
        
           | walrus01 wrote:
           | Very similar, yes
        
         | jimmaswell wrote:
         | I'd have thought plugging something into the outlet and
         | unscrewing the outlet to take with you would be more convenient
         | than carefully splicing wires just enough not to disconnect
         | them. All the easier if it's on a power strip.
        
           | Scoundreller wrote:
           | Sometimes they are on different circuits.
        
         | huhtenberg wrote:
         | In a similar vein, there are USB gadgets that emulate a mouse
         | that keeps on jiggling, to prevent the machine from locking out
         | on user inactivity.
         | 
         | However, there are anti-jigglers too that lock the machine when
         | any new human input device is plugged in.
         | 
         | http://codefromthe70s.org/antijiggler.aspx
        
           | vangelis wrote:
           | That's where the analog mouse jiggler comes in. Apparently
           | watch faces work quite well for optical mice.
        
             | reaperducer wrote:
             | I've read that on HN before, and tried it a few months ago.
             | It didn't work. At least not with an Apple Magic Mouse and
             | my wife's desk clock.
        
           | reaperducer wrote:
           | If you ever come across a jiggle-and-click gadget, let me
           | know. Some of the computer activity trackers I've seen lately
           | require the user to click every so often, so plain jigglers
           | are no longer effective.
        
             | mike_d wrote:
             | Get a USB Rubber Ducky and script it to send something like
             | Mouse Button 7. The click event registers but it isn't
             | associated with an action except in super advanced CAD
             | software.
        
           | warrenm wrote:
           | They should call it the Jiggle-No
        
           | ace32229 wrote:
           | You will be seen as active (including on comms software (at
           | least the ones I've tried)) if you have any sort of video
           | playing e.g. Youtube in an active tab. Quite handy.
        
           | kawsper wrote:
           | That's interesting.
           | 
           | You could have a list of known USB device IDs you trust, and
           | if a newly plugged in USB device wasn't on that list you
           | could lock or power down.
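
A minimal sketch of the allowlist check kawsper suggests, assuming each
attached device is reported as a (vendor ID, product ID) pair. The IDs
below are placeholders and the function names are hypothetical;
enumerating the devices and actually locking the machine are
platform-specific and left out.

```python
from typing import Iterable, List, Set, Tuple

DeviceId = Tuple[int, int]  # (vendor_id, product_id)

# Placeholder IDs for illustration only, not real hardware.
TRUSTED: Set[DeviceId] = {(0x1234, 0x0001), (0x1234, 0x0002)}


def untrusted_devices(attached: Iterable[DeviceId], trusted: Set[DeviceId]) -> List[DeviceId]:
    """Return the attached devices that are not on the trusted list, sorted."""
    return sorted(set(attached) - trusted)


def should_lock(attached: Iterable[DeviceId], trusted: Set[DeviceId]) -> bool:
    """True if any attached device is missing from the trusted list."""
    return bool(untrusted_devices(attached, trusted))
```

This only raises the bar: a device that reports a trusted vendor and
product ID passes the check, so it helps against casual plug-ins rather
than a prepared attacker.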
        
             | blibble wrote:
             | easy enough to fake the device/vendor ID, then abuse bugs
             | in the driver/implementation
        
               | kawsper wrote:
               | Yes, if your attacker knows which device/vendor IDs you
               | have on your list it won't work.
        
             | warrenm wrote:
             | this is a pretty common practice on many (if not all)
             | government networked devices
             | 
             | that...or the USB port is permanently blocked (saw that
             | when I was at a finserv years back: all USB ports (except
             | the one the mouse plugged into) were epoxied)
        
               | spatley wrote:
               | I have seen security-minded IT go so far as requiring
               | laptops with PS/2 mice and epoxying all the USB ports.
        
               | icedchai wrote:
               | So are they keeping a stock of laptops from the 90's?
               | Basically no modern laptops have PS2 ports.
        
               | reaperducer wrote:
               | Order enough of them, and manufacturers will give you
               | whatever you want.
        
               | icedchai wrote:
               | If you're big enough, sure. Why not order laptops without
               | USB ports, instead of epoxying them then?
        
             | ThePadawan wrote:
             | That is a policy I've heard is used even in environments that
             | aren't extremely secure, like software development at a bank
             | (completely isolated from the production environment).
             | 
             | They didn't go so far as to cause alarms on unknown device
             | ids, but devices would just not be mounted if they were not
             | whitelisted.
        
               | walrus01 wrote:
               | About 13-14 years ago some parts of the US DoD resorted
               | to hot glue gun filling all the usb ports on desktop PCs,
               | except for the two ports required for the keyboard and
               | mouse.
               | 
               | This was during the windows XP era when it seemed there
               | were an endless number of security problems related to
               | usb devices, no matter how good the group policy and
               | registry settings pushed via active directory membership
               | were.
        
               | sysadmindotfail wrote:
               | >About 13-14 years ago some parts of the US DoD resorted
               | to hot glue gun filling all the usb ports on desktop PCs,
               | except for the two ports required for the keyboard and
               | mouse.
               | 
               | Here's a current story:
               | 
               | Someone ordered the wrong desk phones at your large
               | company?
               | 
               | 1.) Assemble your crew. Go to various departments and
               | recruit non-technical people.
               | 
               | 2.) Task them with disassembling 1000 desk phones.
               | 
               | 3.) Hot glue USB port on phone shut.
               | 
               | 4.) Reassemble 1000 desk phones.
        
               | sneak wrote:
               | Is the disassembly and reassembly just for more billable
               | hours? Seems to me you could fill user-accessible USB
               | ports with hot glue without it, same as a user could fill
               | it with an unauthorized USB device.
        
               | kevin_thibedeau wrote:
               | My company stayed on NT4 until 2008 because it didn't
               | have USB support. Network was fully locked down and any
               | unknown MAC would cause an immediate search by IT.
        
               | shifto wrote:
               | Also, performance must have been amazing using Office '97
               | on current day desktops.
        
               | noir_lord wrote:
               | Did they also remove the MAC address info off the back of
               | everything because spoofing a MAC is fairly trivial.
        
               | icedchai wrote:
               | They probably did. The sort of IT folks that would run a
               | decade old OS are the same kind that would resort to this
               | sort of security theater to "lock down" their network.
               | Capturing MAC addresses off a device is pretty simple if
               | you don't mind a little bit of connectivity loss during
               | the process.
        
               | swalsh wrote:
               | What does that solve though? I don't NEED a mouse to copy
               | data.
        
               | jrott wrote:
               | It doesn't solve for an outsider or malicious employee
               | getting access to a machine. What it does solve for is an
               | employee plugging in a compromised usb device on accident
               | since they probably won't unplug their keyboard or mouse
               | for it.
        
               | dhosek wrote:
               | It solves the "I found this USB stick in the parking lot
               | --let me plug it in to see what's on it" problem.
        
               | strbean wrote:
               | They'll just unplug the mouse and plug in the drive to
               | see what happens!
        
               | icedchai wrote:
               | Sure, if they don't have a USB hub sitting around.
        
               | muttled wrote:
               | If my experience with users holds true, they'll abandon
               | the quest at the first obstacle and the USB will
               | harmlessly sit in a desk drawer for the rest of time.
        
               | dhosek wrote:
               | The closest thing to a USB hub I've got is one of the
               | external drives for my Mac Mini, which has a built-in USB
               | hub, so I can plug stuff into that as well as directly into
               | the computer. The last time I worried about such things was
               | back when desktop computers only had one or two USB
               | ports. Plus, in a DoD situation, I'd imagine that having
               | your own USB hub plugged into a DoD computer would be the
               | kind of thing that could put your job at risk. A friend
               | who teaches at the Naval War College often laments the
               | unusability of DoD IT because of the level of locking
               | down, but any "Why don't you do X?" suggestions have a
               | response of "I'd get fired."
               | 
               | The safeguard doesn't need to be perfect, it just has to
               | be good enough.
        
               | acdha wrote:
               | It solves two problems: one is someone covertly or
               | foolishly plugging in an untrusted USB device (which
               | might be easily missed on, say, the back of a desktop)
               | and it means that checking to make sure that only a
               | keyboard or mouse are attached is as simple as putting
               | tamper-evident seals on those cables.
               | 
               | Attempting to authenticate USB devices is a very hard
               | problem -- a sufficiently advanced attacker can spoof
               | manufacturer and device IDs, even if you lock things down
               | to prevent anything other than a keyboard or mouse it's
               | possible to send keystrokes to open the wrong website,
               | there's always a chance of an exploitable flaw in your
               | USB stack, etc. -- but anyone diligent can be paid to
               | walk around every week checking to make sure that a seal
               | is solid and the tamper-evident stickers have the same
               | serial number as listed on the inventory. There is a real
               | value in having things where the failure modes are
               | obvious and intuitive.
        
               | sramam wrote:
               | I'd think guardrails like this also serve at a
               | psychological level - as in "this is a secure machine,
               | don't try to break rules".
               | 
               | While these second order effects are immeasurable, they
               | are quite tangible in my personal experience.
        
               | huhtenberg wrote:
               | They could've glued ALL USB ports and simply plugged mice
               | and keyboards into PS/2 sockets.
        
               | 3pt14159 wrote:
               | That's what my alma mater, the University of Waterloo,
               | did for some of our labs when I attended. Then at some
               | point something must have happened and they moved _all_
               | the electronics into the PC case and only the wires of
               | the mouse, keyboard, and monitor came out of these little
               | openings.
        
               | withinboredom wrote:
               | There was a virus directed at DoD machines going around
               | via USB devices. PITA to get rid of too...
        
               | bnastic wrote:
               | I have not yet seen this implemented anywhere in banks.
               | HID devices are fine, but anything else USB (esp.
               | storage) is locked out completely. One of those banks
               | wouldn't even let temp staff send emails out of the bank
               | from their work account.
               | 
               | (Due to various disability acts they can't really do it
               | either, as the employer must provide their staff with
               | hardware they require, e.g. ergonomic keyboards and mice)
        
               | ThePadawan wrote:
               | That sounds really the wrong way around - the worst
               | offenders in USB malware surely are flash drives that
               | declare themselves as keyboards and input preprogrammed
               | keyboard events (like the USB Rubber Ducky [0])!
               | 
               | (For your parenthetical I should clarify - it wasn't the
               | case that it was _impossible_ to whitelist other devices,
               | it just had to be done on a case-by-case basis. I.e. you
               | would call IT and say "Jen from accounting at machine
               | foo123 needs her new ergonomic mouse to be recognized"
               | and they would remote in, tell Jen to unplug and replug
               | the device and whitelist that exact USB device id on that
               | exact machine.)
               | 
               | [0] https://shop.hak5.org/products/usb-rubber-ducky-deluxe
        
               | bnastic wrote:
               | It may be so, but I'm talking from experience - as a
               | keyboard geek I have, over the past ten years, taken all
               | sorts of weird keyboards (and mice) into various big
               | banks with not a hint of trouble. USB storage, on the
               | other hand, qualifies for an instant termination.
        
       | jcrawfordor wrote:
       | I once basically spent a summer doing this, not over a parking
       | lot but to consolidate the remaining equipment in a large number
       | of racks into a few new ones - this was a former sales office of
       | a megacorporation that had been built to have its 1970s-era
       | computer room proudly displayed through windows into the main
       | conference room, a very weird setup without the context that in
       | said '70s that conference room was used to pitch prospective
       | customers on business automation.
       | 
       | Anyway, by the time I was there it was still a '70s-vintage large
       | computer room but now massively overprovisioned on space,
       | cooling, etc, particularly with most IT functions having moved to
       | corporate. A decision was made to repurpose part of it as a test
       | lab and move all the actual remaining equipment to three racks in
       | the corner.
       | 
       | I'd do about two servers a day in between other things, taking
       | advantage of redundant power supplies to transfer the PSUs one at
       | a time to extension cords, swap to a long network cable fast
       | enough that TCP sessions probably didn't time out, and then
       | unrack onto a hydraulic lift cart and do the same procedure the
       | other way.
       | 
       | I presented this at the start as far from a guaranteed strategy -
       | that it would minimize downtime but there would inevitably be
       | some due to mistakes. None of this was really that critical.
       | There were a few devices that were pretty old and poorly
       | maintained; we agreed up front that if these lost power for some
       | reason and then failed to boot, we would just say they'd lived
       | long lives and purchase replacements.
       | 
       | I guess the point is that this whole situation was kind of
       | unusual and I would generally _not_ recommend doing this; we were
       | lucky that all the equipment left had stakeholders that
       | acknowledged it was legacy stuff and they could tolerate losing
       | it.
       | 
       | The irony is, of course, that it went perfectly. So far as I know
       | there was not a single problem experienced through the whole
       | thing. I even managed to swap the phone lines to the
       | (surprisingly busy!) legacy fax server when each was out of use.
        
       | nobrains wrote:
       | I don't know. If the "boss" was charged "4.5 hours of work, 2
       | hours of consultancy, and 4.5 hours of consultant", and assuming
       | he would have been charged half of that with downtime, maybe the
       | boss did get a good deal. We don't know the cost of downtime for
       | him.
       | 
       | I mean if he had access to technical resources who were willing
       | and capable to do this for him, he chose to do it.
        
         | notwhereyouare wrote:
         | I personally find it hard to believe that a rough estimate of
         | $450 for the job (spitballing $45/hr for 10 hours) is less than
         | 5 minutes of downtime and they only have 1 server.
         | 
         | Then again, could _easily_ be wrong
        
           | nobrains wrote:
           | You cannot compare it to zero. You have to compare it to the
           | cost of doing it with the downtime. There would be cost to
           | that as well. It will not be free.
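           | 
           | A rough sketch of that comparison, using only the guesses from
           | this subthread ($45/hr, 10 hours, half the bill for a move with
           | downtime, a 5 minute outage). The function name and constants
           | are made up for illustration; none of it is real billing data.

```python
# All figures are spitballed guesses from the thread, not real billing data.
HOURLY_RATE = 45.0           # guessed rate in dollars per hour
HOURS_ZERO_DOWNTIME = 10     # guessed hours billed for the zero-downtime move
DOWNTIME_JOB_FRACTION = 0.5  # assumption: a move with downtime costs half as much
DOWNTIME_MINUTES = 5         # assumed outage length for a plain shutdown-and-move


def break_even_cost_per_minute(rate, hours, fraction, minutes):
    """Return the downtime cost per minute at which both options cost the same."""
    zero_downtime_bill = rate * hours
    downtime_bill = zero_downtime_bill * fraction
    premium = zero_downtime_bill - downtime_bill  # extra paid to avoid the outage
    return premium / minutes


if __name__ == "__main__":
    per_minute = break_even_cost_per_minute(
        HOURLY_RATE, HOURS_ZERO_DOWNTIME, DOWNTIME_JOB_FRACTION, DOWNTIME_MINUTES
    )
    print(f"Break-even downtime cost: ${per_minute:.2f} per minute")
```

           | With these guesses, the zero-downtime premium only pays off if
           | a minute of downtime costs the client more than the break-even
           | figure.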
        
         | moduspol wrote:
         | It's also possible that "downtime" has different meanings to
         | different people. The client may be seeing "downtime" as the
         | net result of what happened the last few times the server was
         | "down," which could have been for any number of reasons
         | (potentially even unrelated to the server itself).
         | 
         | When you get clients describing things like this, it's possible
         | they've been promised things about this server before by other
         | consultants that didn't pan out. They don't want to give you
         | the full details because then you'll recommend a different
         | route that they don't want to take (justifiably or not).
         | 
         | It's easier for them to frame the problem to a consultant in a
         | way that allows for only one potential solution, even if
         | perhaps better ones exist, because the guy in charge of making
         | the decision isn't technically skilled enough to assess whether
         | others proposed by consultants are as viable.
         | 
         | And, of course, one might read a little into why there exists a
         | "boss" with such a highly-critical IT need that is hiring a
         | consultant to do work like this, and thinks that threatening to
         | not pay at all if there is any downtime is the best way to do
         | it.
         | 
         | I mean, what if they opened the door to this closet and it
         | grazed a power cable on the floor and the machine just shut
         | off? Why even bother staying around to bring things back up? It
         | wasn't your fault and there's already downtime: you're not
         | getting paid.
        
           | closetohome wrote:
           | Someone upthread was talking about how, as a salesman, you
           | have to read the room and know how to talk to clients. I did
           | that for a while, and always got a lot of mileage out of
           | asking the customer what they ultimately wanted to
           | _accomplish_, which usually revealed that what they were
           | asking for was a solution to a self-made problem, and there
           | was a better alternative altogether.
        
       | BrianB wrote:
       | It's been done.
       | https://i.cdn.turner.com/v5cache/TBS/Images/Dynamic/i439/sei...
        
       | [deleted]
        
       | growt wrote:
       | Setting up a new server at the new location and moving the VMs
       | one by one to the new server as they become idle should be
       | possible without downtime. But maybe there were other
       | requirements (like no new/additional hardware) that weren't
       | mentioned in the article.
        
       | D895n9o33436N42 wrote:
       | This reminds me of a famously obtuse and obdurate boss who asked
       | for things that were utterly impossible. He had delusions of
       | grandeur which left him convinced that he and only he was
       | qualified to challenge the "cheap, fast, good - pick any two"
       | triangle.
       | 
       | Naturally, I did my best to explain the laws of physics to him,
       | but he wouldn't hear it. In a spectacular display of Stockholm
       | syndrome I did my best to appease him for four years, but, as
       | many of you can surely predict by this point in the story, I
       | failed in every possible way and eventually gave up. Just wish I
       | could have my four years back.
       | 
       | I was glad to read that OP at least got paid well for his
       | efforts.
        
         | Tade0 wrote:
         | I applaud you for being able to stand four years of this.
         | 
         | I usually get fired from such positions in less than two.
        
           | sleepybrett wrote:
           | I usually walk out about six months in if not sooner. Maybe
           | it's just because I spent so much time freelancing that I had
           | enough experience to recognize a no-win situation.
        
       | jtbayly wrote:
       | Seems very risky. Not something I'd want to do if minimum
       | downtime was the goal. One wrong piece of gravel ends up with
       | catastrophic failure instead of 5 minutes of downtime.
        
         | dmurray wrote:
         | But the goal was zero downtime, not minimum downtime. The
         | client made it clear that 5 minutes of downtime was equivalent
         | to catastrophic failure. So they correctly found a solution
         | that reduced the chance of "5 minutes of downtime", at the
         | expense of an increased risk of catastrophic failure.
        
           | jtbayly wrote:
           | I understand that. I just doubt that the risk was worth it,
           | if downtime is such a big deal.
        
       | pengaru wrote:
       | Decades ago working in a sysadmin role at a hosting company I had
       | a similar situation.
       | 
       | The solution I came up with was to fashion a custom male<->male
       | power cord, like a gender changer, from some broken ATX PSU
       | scraps we had laying around. By rearranging the power sockets
       | from multiple donors, two male power cords could be connected on
       | a single enclosure. Internally the sockets were simply bridged,
       | otherwise the PSU was basically gutted.
       | 
       | With this goofy metal box having two male power cords dangling
       | from it in hand, I just used a very long extension cord plugged
       | into an outlet on the same AC phase as the existing server's
       | power source. The extension cord powered one of the bridge cords.
       | The other bridge cord plugged into the server's existing - and
       | hot - power strip, forming a redundant power source. Now the
       | power strip could be unplugged from the primary power source
       | without losing power, and we just moved the server to the new
       | location with the bridge box and power strip in tow.
       | 
       | If memory serves the only tricky part was determining which
       | outlet at the new home was on a compatible circuit. We didn't
       | have much in the way of electronics tools, no oscilloscopes or
       | anything. Even the soldering involved to make the bridge box was
       | done using my personal soldering iron, which just happened to be
       | in the office because some of us raced RC cars there after hours.
       | 
       | I think I just used an incandescent desk lamp to verify a normal
       | brightness on the bridged circuit before proceeding with the
       | server, but it's been a while.
       | 
       | I wonder how many people have fashioned AC power cord gender
       | changers throughout history... :)
        
       | rafaelturk wrote:
       | Fun reading. But my advice is to never accept a job like this.
       | This could easily become two weeks of downtime.
        
       | xyst wrote:
       | Why wouldn't cloning the VMs to a second server, then splitting
       | the traffic between the primary and secondary server, work? Once
       | traffic to the second server is confirmed, you could shut off the
       | primary server and haul it off to the new location.
       | 
       | I would probably still charge a much higher rate since the owner
       | was an arse, but at least you would get back your 7-8 hours.
        
         | AnIdiotOnTheNet wrote:
         | Not all services can be load balanced in this way.
         | 
         | Live migration of VMs would have been a better option, which
         | was brought up in the reddit comments and dismissed because
         | Hyper-V live migration is spotty. While I'd have to agree with
         | that assessment, it isn't so spotty that what they actually did
         | was less risky.
        
         | blcArmadillo wrote:
         | It sounds like there was no second server.
        
         | morphogenesis wrote:
         | Database inconsistency for one thing. This works for frontend
         | web services but how do you reconcile the writes between the
         | two servers?
        
         | viraptor wrote:
         | You're making assumptions about what's running on the servers.
         | Let's say it's a VoIP conference server with a shared dedicated
         | room - effectively you have an ongoing session shared between
         | multiple connections and you cannot stop it. Or you have
         | stateful local processing so you can't "split the traffic". Or
         | a number of other limitations...
        
       | Zenst wrote:
       | Interesting story, and one that has played out a few times; I'm
       | aware of a couple nearly identical to it. Another used power
       | extension leads to cover power. The key is systems with dual
       | power units (most servers have them) and networking that lets you
       | switch from one run to another.
       | 
       | But I have known some large companies that have, in their
       | history, done things like this and found other creative solutions
       | to impossible problems.
        
       | Reedx wrote:
       | That reminds me of the Pixar incident where Toy Story 2 was
       | accidentally deleted while in production and there were no
       | working backups.
       | 
       | Luckily one employee was working from home (rare at the time!)
       | and had a copy of the entire movie on her desktop computer. Which
       | they _very carefully_ moved back to the office and were able to
       | restore from that.
       | 
       | https://www.youtube.com/watch?v=7MAedEXri7c
        
       | fooblat wrote:
       | > Stupidest thing I've ever had to do.
       | 
       | I don't really understand the "ranty" tone. The client had very
       | specific requirements and the author came up with an effective
       | solution and was fully paid to deliver it. Sounds like a win for
       | everyone.
        
         | some_random wrote:
         | The client was an asshole who demanded 100% uptime and stated
         | that they wouldn't pay if there was any downtime at all. The
         | rant is entirely justified.
        
         | swarnie_ wrote:
         | I see reddit so I assume this is the sysadmin subreddit?
         | 
         | They're famous for not being a cheery bunch. Because reddit's
         | demographic does swing younger the sub used to be filled with
         | endless posts about being socially incompetent or possessing 0
         | business craft.
         | 
         | Does anyone know if it improved?
        
         | ianhawes wrote:
         | I believe the expectation of having 0 downtime was not
         | expressed until the day of the transfer.
        
           | dna_polymerase wrote:
           | In addition to that the customer runs a single server but
           | expects the guys to maintain a property not even feasible at
           | Google scale: Zero downtime. Overall the whole thing was just
           | ridiculous, but luckily the customer got a nice bill in the
           | end.
        
             | pc86 wrote:
             | To be fair, Google maintains zero downtime for small time
             | scales like this _a lot_. Most of the time, actually.
        
               | kevincox wrote:
               | Are you talking about Google Compute Engine? In that case
               | yes, because by default VMs are live migrated between
               | physical hosts. This can be done for scheduled maintenance
               | or upon signs that the machine is likely to fail.
               | Furthermore there are no physical disks for a GCE VM
               | which is one of the more common failure points. The
               | result of this is that GCE VMs often survive for months
               | or years without downtime. Note that the SLA allows more
               | than 3 hours of downtime per month.
               | https://cloud.google.com/compute/sla
               | 
               | For physical servers the uptime is typically quite small.
               | Of course Google isn't optimizing for server uptime so it
               | isn't fair to say "well even Google can't do it".
        
               | sokoloff wrote:
               | I'm having a hard time following what "zero downtime most
               | of the time, actually" really means.
               | 
               | https://m.youtube.com/watch?v=IKiSPUc2Jck&t=81s
        
               | im3w1l wrote:
               | It means that downtime is chunky.
        
               | buran77 wrote:
               | > zero downtime for small time scales... Most of the time
               | 
               | I read it as "if you take small enough discrete time
               | intervals they won't overlap with any downtime". Or in
               | other words "no downtime between downtimes". Yes, it's
               | very in line with your video.
        
         | intpx wrote:
         | Probably because proper architecture (clustering, HA, etc.) and
         | planning would have never made this an issue. This is still an
         | extremely risky operation, hot swapping power and switching
         | interfaces on the fly all while sitting on a cart in a
         | corridor. In any disruptive work there is never a guarantee of
         | no downtime for affected assets. I know the OP came in as a
         | consultant, but if I was the MSP tech, I would have demanded a
         | paper trail a mile long to cover my ass if this went sideways,
         | and if I was the account manager for the client, I would have
         | refused the work. It's not good business to agree to do work
         | where you know there is a better than good chance there will be
         | an outage and your client is saying they won't pay if there is
         | an outage. Even agreeing to it puts you in a bad spot for
         | future work. I guess as an outside consultant, bewilderment is
         | a better reaction than ranting, but this is the kind of shit
         | that drives ops folks crazy.
        
           | dspillett wrote:
           | _> planning would have never made this an issue_
           | 
           | Hard work is wonderful stuff. Days and weeks of it can save
           | you whole hours of planning.
        
           | jabroni_salad wrote:
           | My SOWs leave zero room for 'and it will go flawlessly or you
           | won't get paid at all'. If you are occupying my time in a way
           | that makes me unable to serve other clients you will
           | definitely pay for it.
        
           | addHocker wrote:
           | The whole setup is risky. And a customer who demands zero
           | downtime while driving the price down on the setup sounds
           | stupid from the start.
        
         | jedimastert wrote:
         | > Me: You didn't notify them of scheduled maintenance like we
         | discussed on Friday?
         | 
         | It appears that the client _didn't_ have the specific
         | requirements on initial consult.
        
         | throwaway0a5e wrote:
         | Reddit (for reasons related to user demographics and feedback
         | loops) rewards certain types of writing and implied viewpoints.
         | Following best practices and rules is one of those things. This
         | server migration clearly runs counter to established wisdom so
         | OP using a writing style of "look how terrible and asinine this
         | was" will be rewarded and gain traction much more than a "look
         | how interesting this was" writing style.
        
           | user5994461 wrote:
           | It's reddit /sysadmin, the channel is dedicated to rants and
           | horrible experiences from sysadmin and helpdesk folks.
           | 
           | It's quite sad IMO; I don't recommend going there unless you
           | want to have a bad day reading about the most horrific work
           | environments and bad practices in the world.
        
             | DoreenMichele wrote:
             | Perhaps somewhat similarly, r/TalesFromRetail is devoted to
             | kvetching about your job in the retail sector, but it's
             | really not a depressing place. There are a lot of rules and
             | expectations about how you tell your story. You aren't
             | supposed to outright dox anyone or veer into genuine trash
             | talk.
             | 
             | It's not supposed to be negative per se. It's supposed to
             | be entertaining.
             | 
             | It's an art form. It's not everyone's cup of tea, just like
             | horror isn't everyone's cup of tea. But people often watch
             | horror movies for catharsis, not because they want to be
             | depressed and wallowing in self pity.
             | 
             | Storytelling is often about educating people about things
             | you can't speak about more directly. It's often a way of
             | sharing wisdom in an inoffensive manner and one that will
             | stick because people will actually pay attention, unlike
             | when you are giving them some dry lecture about some
             | problem they haven't yet had and don't yet care about.
             | 
             | But if you entertain them, they will read it anyway and
             | that story may stick with them. And then six months or a
             | year later when they have the same problem, they will
             | actually remember how someone else handled the same issue
             | and it will turn a potentially nightmarish scenario into
             | "Meh, I just did the same thing that guy on Reddit did to
             | his shitty boss/customer/coworker. Worked like a charm.
             | Moving on."
        
           | me_me_me wrote:
           | <any group of 2 or more people> (for reasons related to user
           | demographics and feedback loops) rewards certain types of
           | writing and implied viewpoints.
           | 
           | This is literally the basis of human interactions, that's how
           | we humans work at every scale to form
           | friendships/families/societies/nations.
        
             | dspillett wrote:
             | Correct, though I don't think the comment you replied to
             | intended to imply that such pressures & rewards didn't
             | exist elsewhere or that this particular outcome was either
             | general or not.
             | 
             | It just stated that the specific pressures and rewards
             | present in most reddit communities tend to encourage this
             | specific style of writing.
        
               | me_me_me wrote:
               | Maybe you are right and it was just a plain statement.
               | But it sounded quite snarky to me. As if it was
               | condescending toward Reddit for the biases a given
               | subreddit might have, as though HN has none.
        
           | DoreenMichele wrote:
           | ^
           | 
           | The flair for the piece is "Rant." That's an official
           | category for the sub. There are going to be expectations
           | surrounding how you write when using a tag like that.
        
           | aphrax wrote:
           | It's funny you mention the rewards on certain types of writing
           | within Reddit. I was thinking about it the other day &
           | couldn't quite put my finger on why I dislike a lot of the
           | stuff on there - even across subreddits. I think this is
           | probably the cause...
        
           | fortran77 wrote:
           | > Reddit (for reasons related to user demographics and
           | feedback loops) rewards certain types of writing and implied
           | viewpoints.
           | 
           | Just like Hacker News! Here's a clue--it's in a subreddit
           | where these types of stories are welcome.
        
       | social_quotient wrote:
       | Slight topic drift - any thoughts on how the pandemic might
       | materially change assumptions about onsite/on-prem being better
       | than cloud or a managed data center, now that the people writing
       | the code are actually remote from the "local" infrastructure?
       | Something specific to the reality of the pandemic strikes me as
       | something that would make the die-hard local-only folks have to
       | start rethinking the position.
       | 
       | (Not to suggest it's bad, just different now that a primary
       | assumption about people working in the office is less true.)
        
         | AnIdiotOnTheNet wrote:
         | As someone who works in a very anti-cloud company culture
         | (which I happen to agree with), this incident has had no effect
         | whatsoever on that mindset. We don't dislike cloud because it
         | is accessed remotely, we dislike cloud because of the lack of
         | control we have over everything running there. If something
         | happens and our local systems have a problem, there are people
         | here, like myself, whose highest priority will be fixing it and
         | second highest priority will be communicating the status of
         | that. Your problems are _never_ a priority to a cloud vendor
         | and communicating with you is even less of a priority. That's
         | before we even get into the absurd expenses and reliance on big
         | fat pipes.
        
         | ocdtrekkie wrote:
         | I feel a lot safer knowing I'm controlling all the variables
         | during a global crisis, actually.
         | 
         | This article provides an example of how when you operate on
         | prem, literally any crazy option remains on the table for you.
         | If you asked your cloud provider to do this, it'd be a no.
        
       | macintux wrote:
       | At my first job we were starting up the company and didn't really
       | know what we were doing; one early server was sitting on a
       | folding table and its power cord was wrapped around a leg, so
       | just to replace the table with something more robust involved
       | downtime.
        
         | em-bee wrote:
         | the careful application of a saw or an angle grinder would have
         | made it possible to remove the folding table without unplugging
         | the power cord. :-)
        
         | crumpled wrote:
         | I heard an anecdote about a company splicing some fiber cable
         | in the middle of a utility van and having to cut the van apart
         | at the end.
        
         | elliotpage wrote:
         | I've been there, solidarity for cheap-furniture-based
         | maintenance windows.
        
       | Humphrey wrote:
       | I haven't read the article, but I'm reminded of that episode of
       | Seinfeld and the Frogger arcade game.
        
         | dfsegoat wrote:
         | I have pondered this exact scenario (server move w/0 downtime)
         | - because of watching that episode - wouldn't have thought
         | about it otherwise.
         | 
         | ..It's interesting how pop-culture and your chosen profession
         | intersect, at times.
        
       | chisleu wrote:
       | I didn't want to lose my many months of uptime for a LAN party
       | back in 1999/2000 and we used the UPS to migrate my Linux box
       | across town for some Quake 3 Arena action.
       | 
       | Things were so much simpler back then.
        
       | devchix wrote:
       | I recall sometime in the mid 2000s there was a fever for
       | achieving five-9s (99.999% uptime, I think -- it became fodder
       | for a few episodes of Mr. Robot). Not that the metric ever went
       | away, but back then a lot of BigIron(TM) vendors advertised
       | achieving five-9s by replacing hardware while the OS remained
       | running and continued service. Sun 15K and 25K series (Gilfoyle
       | had a used one in the garage running his network) were behemoths
       | whose mem/cpu boards you could swap out wholesale while the
       | entire frame and backplane was powered on, and while the OS the
       | board came out of remained functioning. There were many caveats
       | around the procedure but it worked. Execs and sales guys loved
       | those demos. These monsters were expensive and banks and energy
       | conglomerates were buying them by the dozens. There was also a
       | big to-do about hot swappable drives. The idea that you could be
       | doing hardware maintenance while the machine was still running
       | was a novelty, something like brain surgery while the patient was
       | not only awake, but awake and eating, driving his car, talking on
       | the phone, etc.
       | 
       | A decade later I look back with deep surprise that we didn't
       | think to abstract out the service instead of the hardware. I
       | don't know how many of those behemoths are still being bought,
       | now I work almost exclusively with small server instances that
       | can come and go on the fly. Micro services and AWS have taken
       | five-9s in a different direction. I frequently think of Sun as a
       | failed Hephaestus, in a Christopher Nolan film he would be
       | brilliant but could only turn out clumsy tools because of his
       | deformity, he hates the things he makes so he throws them away
       | before completion. Men find these cast-offs and temper and refine
       | them.
        
         | chasd00 wrote:
          | I remember those. A friend of mine was a network engineer at a
         | local datacenter ( UUNET then MCI pre-scandal ) and said
         | companies were buying Suns for everything no matter how
         | trivial.
         | 
          | He worked a night shift and I used to go hang out with him in
          | the NOC and download movies (residential bandwidth was not what
          | it is today). Odd that neither he nor I ever got in any trouble
          | for that, heh.
        
         | tlack wrote:
         | Well as I recall there were a few reasons that people focused
         | on reliability in hardware in the late 90s:
         | 
         | 1. Shared state storage systems that supported replication were
         | rare (I think Oracle and Informix maybe?)
         | 
         | 2. Virtualization software was in its infancy (did SunOS have
         | something before Solaris?)
         | 
         | 3. RAM and hardware were waaaaay more expensive, meaning you
         | often had to buy more pure metal just to answer questions fast
         | enough
         | 
         | At least that's my take on it based on my dim faded memories
        
         | bcrosby95 wrote:
         | AWS single region SLA isn't 5 9s though. If you want 5 9s in
         | the cloud, not even multiple AZ is enough - you need to go
         | multi region or even multi cloud.
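
A rough sketch of the arithmetic behind the comment above: how much downtime a given number of nines allows per year, and how redundancy across independent sites changes the figure. The 0.9999 single-region value is an illustrative assumption, not a quoted SLA, and treating regions as fully independent is optimistic.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability: float) -> float:
    """Yearly downtime budget implied by an availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when one is enough.

    Assumes failures are uncorrelated, which real regions only approximate.
    """
    return 1.0 - (1.0 - single) ** copies


# Five nines leaves roughly 5.26 minutes of downtime per year.
five_nines_minutes = allowed_downtime_seconds(0.99999) / 60

# Hypothetical: two regions that are each "four nines" on their own.
two_regions = combined_availability(0.9999, 2)

print(f"five nines: {five_nines_minutes:.2f} min/year")
print(f"two independent four-nines regions: {two_regions:.8f}")
```

Under these assumptions a single four-nines region misses the five-nines target, while two independent ones clear it on paper.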
        
           | closetohome wrote:
           | > Multicloud
           | 
           | This sounds like something they'd make up on NCIS.
        
         | theevilsharpie wrote:
         | > [In the mid-2000's], a lot of BigIron(TM) vendors advertised
         | achieving five-9s by replacing hardware while the OS remained
         | running and continuing service... A decade later I look back
         | with deep surprise that we didn't think to abstract out the
         | service instead of the hardware.... Micro services and AWS have
         | taken five-9s in a different direction.
         | 
         | In the mid-2000s, enterprises were (and in many cases, still
         | are) running proprietary software with proprietary RPC
         | protocols that had no available source code or other means of
         | modification, and most had no support for application-level
         | high availability, access control, or any other operational
         | quality-of-life feature that people take for granted today.
         | Rather, that functionality was handled at the infrastructure
         | level, through things like the aforementioned Big Iron.
         | 
         | The world looks different today, but those machines made sense
         | for the environment at the time.
        
           | ClumsyPilot wrote:
            | I think it kind of makes sense in general, and the only
            | question is whether it could be achieved at lower cost.
            | Complexity of today's commodity machines is comparable to big
            | iron of yesteryear.
        
         | larrik wrote:
         | > a lot of BigIron(TM) vendors advertised achieving five-9s by
         | replacing hardware while the OS remained running and continuing
         | service.
         | 
         | AS/400's were capable of that in the 90's (possibly the 80's as
         | well). Heck, they'd call IBM for replacement parts on their
         | own. You'd show up for work and there'd be an IBM guy waiting
         | to be let in. He'd swap out a part with no downtime, and be
         | gone. I've seen machines with uptimes of over a decade with
         | zero on-site IT.
        
           | blhack wrote:
           | We had one of these at an old office of mine. I actually
           | think it's really cool.
        
           | pmiller2 wrote:
           | I recall working on some Sun machines with hot-swappable CPUs
           | (and, I assume, disks and other peripherals). If they somehow
           | made memory hot swappable (I'm sure it's possible, just
           | uncommon and/or verrry expensive), with hot swap CPUs and
           | disks, and redundant power supplies, you could tear the
           | machine half apart and it would still keep running. Of
           | course, at that point, once everything is hot swappable,
           | there are generally multiples of everything, so your one
           | machine is really more like multiple machines inside one box
           | than a single discrete machine.
        
       | g051051 wrote:
       | I guess servers have gotten a lot more robust in the last
       | decade...there's no way any server I ever managed would survive
       | something like that.
        
         | maeln wrote:
          | A lot of servers are SSD-only these days, which makes them less
         | fragile. Still, I really wouldn't see myself pushing a running
         | server in a cart.
        
           | tyingq wrote:
           | Yeah, there's certainly still things like riser cards and
           | connectors that could come unseated due to vibration.
        
             | marcosdumay wrote:
             | That's probably a problem for the next guy that takes an
             | ops job there. Loose pieces often don't disconnect right at
             | the same instant, and even when they do, memory caches
             | usually postpone the failures.
        
           | throwaway744678 wrote:
           | On a parking lot, no less! Let's hope it will not rain on the
           | way!
        
             | em-bee wrote:
             | umbrellas over the switches and the cart...
             | 
             | (means extra billable hours for the extra manhours needed
             | to hold the umbrellas)
        
       | rafaelturk wrote:
       | Pictures please!
        
       | fredley wrote:
       | This is the kind of content I've only ever seen previously in
       | TDWTF (which is entirely this sort of content...)
       | 
       | https://thedailywtf.com/
        
       | anfractuosity wrote:
       | Reminds me of this - https://www.youtube.com/watch?v=vQ5MA685ApE
       | 
       | 'Moving online webserver using public transport'
        
         | salzig wrote:
         | had the same thought :)
        
         | zomglings wrote:
         | In the rain!
        
         | tyingq wrote:
         | The Indiana Bell building move is pretty impressive.
         | http://www.paul-f.com/ibmove.html
        
           | joncrane wrote:
           | Wow this is literally the exact same thing as the OP but for
           | an entire building. Insane.
        
           | sschueller wrote:
           | https://www.youtube.com/watch?v=CNqul9TfJwI
        
       | cpuguy83 wrote:
       | I'm picturing that Seinfeld episode where George tries to move
       | the Frogger arcade from a restaurant that is shutting down but
       | doesn't want to lose his high score.
        
       | codingdave wrote:
       | I'm surprised that part of the story wasn't to drill down into
       | the requirements. No downtime ever? Not even at 3 AM on a
       | Saturday?
       | 
       | I've found that when people are being unreasonable it is because
       | they haven't split out their true needs from their first idea of
       | how to meet those needs. In this case the true need is zero
       | impact to users. The owner translated that to "zero downtime",
       | and then didn't accept alternative solutions that still would
       | have met his true business need.
        
       | neycoda wrote:
       | I wonder if there was any legit reason to require no downtime.
       | Otherwise the owner doesn't understand what downtime means for
       | his business.
        
       | ericyan wrote:
       | The consultants really should have told the client that if all you
       | have is a single server then there is no such thing as "zero
       | downtime".
        
       | robin_reala wrote:
       | I always remember this post by the Amsterdam Police who managed
       | to maintain their uptime on a VMS cluster despite moving data
       | centres in the middle:
       | http://web.archive.org/web/20120229042903/http://www.openvms...
        
       | JeroenKnoops1 wrote:
       | Reminds me of the OpenVMS clusters.. Police in Amsterdam
       | celebrated in 2007 an uptime of 10 years of their cluster. In
       | this period, all hardware was replaced, and half of it was moved
       | to another location 7 km away. All data moved from DAS disks to
       | SAN without one application needing to be stopped. Also VMS was
       | upgraded from 6.2 to 7.3-2. The VMS cluster did not go down
       | during all of these changes. I <3 OpenVMS
        
         | tandr wrote:
         | Would be interesting to know if it is still up and running?
        
         | JeroenKnoops1 wrote:
          | During Y2K I also had to shut down various OpenVMS servers
         | with uptime over 10 years... Only because of company policies,
         | not because OpenVMS required the reboot.
        
       | umarniz wrote:
       | Interesting read, makes me wonder as a thought experiment if it
       | counts as downtime if the latency of commands on the machines
       | rises to 5 minutes?
       | 
       | You could clone the VM to another instance and record commands
       | going to VM1 and replay them to VM2 after 5 minutes.
       | 
       | This whole brain fart of mine doesn't make much sense but if you
       | play along with it, does it still count as a downtime or just
       | very high latency?
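
A minimal sketch of the record-and-replay idea from this thought experiment, assuming commands are opaque values and that the caller supplies timestamps. The class name `DelayedReplayer` and the `apply` callback are hypothetical; this is not how any particular hypervisor or database replicates state.

```python
from collections import deque


class DelayedReplayer:
    """Buffer commands seen by VM1 and hand them to VM2 after a fixed delay."""

    def __init__(self, delay_seconds, apply):
        self.delay = delay_seconds
        self.apply = apply          # callable that applies a command to VM2
        self.pending = deque()      # (due_time, command), oldest first

    def record(self, command, now):
        """Note a command VM1 received at time `now` (in seconds)."""
        self.pending.append((now + self.delay, command))

    def flush(self, now):
        """Replay every command whose delay has elapsed; return the count."""
        replayed = 0
        while self.pending and self.pending[0][0] <= now:
            _, command = self.pending.popleft()
            self.apply(command)
            replayed += 1
        return replayed


vm2_log = []
replayer = DelayedReplayer(delay_seconds=300, apply=vm2_log.append)
replayer.record("write A", now=0)
replayer.record("write B", now=10)
replayer.flush(now=299)   # nothing is due yet
replayer.flush(now=305)   # "write A" reaches VM2 five minutes late
```

Order is preserved because commands are replayed strictly first-in, first-out; whether the resulting five-minute lag counts as downtime is the contractual question raised in the replies.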
        
         | NateEag wrote:
         | It depends on how downtime is defined in the contract.
         | 
         | That sounds like I'm being snarky but I mean it - whether an
         | actual legal contract or just the documentation given to users,
         | any system where downtime matters should have some discussion
         | of what impacts downtime can have and how it's measured and
         | managed.
         | 
         | That documentation is what defines "downtime".
         | 
         | I'll add that what you've described is a sort of low-fi manual
         | version of DB replication
         | (https://en.m.wikipedia.org/wiki/Replication_(computing)).
        
         | pc86 wrote:
         | Wouldn't requests time out on the client side long before five
         | minutes?
        
           | heavenlyblue wrote:
           | I don't know whether it's the software in general, but ever
           | since I've started using Three 4G broadband in the UK; all of
           | the software started behaving really weirdly (lots of
           | lockups, hangs, etc). Apps often need to be restarted.
           | 
           | If you do a ping during "bad weather", you can see that they
           | buffer up to 5 minutes of packets (i.e. there will be no
           | communication for some time, then you'll receive a bunch of
           | them with a huge latency with sequences intact).
           | 
           | So I would assume a lot of software could even work that way.
            | I think a lot of software doesn't set any (TCP) timeouts at
            | all.
        
         | tyingq wrote:
         | That works where you have control over all of the timeouts and
         | failure detection at every level and layer. TCP keepalives, for
         | example, could thwart you. Or client side timeouts, or firewall
         | connection state tables, etc.
         | 
         | 5 minutes of unplanned downtime in a pub/sub setup could easily
         | go unnoticed, since that setup is typically tuned for long
         | timeouts and/or repeated retries.
        
       | tobyhinloopen wrote:
       | 10 hours investment for no downtime seems like a good deal for
       | the owner
        
         | topkai22 wrote:
         | Depends on if he really has customers accessing the system "all
         | the time."
         | 
         | Besides, as pretty much everyone has noted, running a zero-
         | downtime system on a single physical machine in what sounds
         | like is just a normal cable room is kind of nuts. Those 10h
         | would have been much better spent to move that puppy to someone
         | else's data center and get some redundancy.
         | 
         | Although reading between the lines, maybe the lease was up and
         | they were waiting to the last minute to move it.
        
       | mercora wrote:
       | when i was younger i was super proud that i could replace my disk
       | while i kept working on the device. i would put the new disk into
       | my LVM volume group, move all extents to the new disk, and drop
       | the old disk out of the VG afterwards. when done i could just
       | unplug it, without halting work except for kicking off the
       | process.
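
The disk swap described above roughly corresponds to the following sequence of standard LVM commands. This is a dry-run sketch that only builds and prints the command lines; the volume group name and device paths are placeholders, and the helper `lvm_disk_swap_plan` is hypothetical.

```python
def lvm_disk_swap_plan(volume_group, old_pv, new_pv):
    """Return the LVM commands for moving a volume group onto a new disk.

    Nothing is executed here; run the steps by hand as root only after
    checking them against your own layout.
    """
    return [
        ["pvcreate", new_pv],                 # initialise the new disk as a PV
        ["vgextend", volume_group, new_pv],   # add it to the volume group
        ["pvmove", old_pv, new_pv],           # migrate all extents online
        ["vgreduce", volume_group, old_pv],   # drop the old disk from the VG
        ["pvremove", old_pv],                 # wipe the LVM label before unplugging
    ]


# Placeholder names; substitute the real VG and partitions.
for step in lvm_disk_swap_plan("vg0", "/dev/sdb1", "/dev/sdc1"):
    print(" ".join(step))
```

The `pvmove` step is the one that runs while the filesystem stays mounted and in use, which is what makes the swap possible without halting work.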
        
       | akssri wrote:
       | Was it George Costanza ?
       | 
       | https://m.youtube.com/watch?v=a-FbktgqCqY
        
       | user5994461 wrote:
       | I am so scared to imagine what would happen if there was any
       | issue during the move (very likely when dragging live cables and
       | powers over hundreds of meters).
       | 
       | The client would immediately refuse to pay anything because he
       | was very clear he wouldn't pay a thing if there is downtime.
       | 
       | Then, the next contractor would be super quick to judge you and
       | the situation, reinforcing that you were an incompetent idiot and
       | the client was right to kick you away on the spot and not pay a
       | dime.
       | 
       | Glad it went well in the end. There is so much to lose for the
       | person trying to help.
        
         | willcipriano wrote:
         | This is a junior sysadmin I suspect. With a bit more experience
         | you'd learn to say something along the lines of "no downtime,
         | sure, that will be 30 grand" and the ability for downtime will
         | suddenly materialize. Him and his friend did this big song and
         | dance, took a huge risk and only got paid for ten hours worth
         | of work in the end.
        
           | zrail wrote:
           | $30k, 80% up front, strict liability waiver that says I'm not
           | responsible for loss of business or anything else if there is
           | downtime.
        
             | imtringued wrote:
             | You can't get paid upfront and at the same time get a
             | liability waiver. For a 100% guarantee with full liability
             | $30k doesn't actually sound ridiculous because it would
             | require obtaining 100% identical hardware and doing at
             | least one test run on that hardware before actually doing
             | it on the production hardware. What the contractor did is
             | basically "wing it", explain a way to get zero downtime to
             | the client and then not actually offer a guarantee by doing
             | the operation straight on the production hardware. Really
             | this was more about convincing (ie bullshitting your way
             | through) the client to let you do the work than actually
             | doing it properly and for a huge sum of money. It wouldn't
             | surprise me if there was actual downtime for a few seconds
             | and the client simply didn't notice it.
        
             | pedrocr wrote:
             | Now you're over-charging massively. If you have no
             | liability and are guaranteed pay, charging for just double
             | hourly rate is more than enough as a "stupid and non-
             | standard requirements" kind of thing.
        
           | cellularmitosis wrote:
           | > sure, that will be 30 grand
           | 
           | I am having trouble finding a reference to it now, but I've
           | heard patio11 refer to this as "the Japanese no". Don't ever
           | say "no" directly, just quote an astronomical price.
        
             | x86_64Ubuntu wrote:
             | People in the trades world do it too. If a job won't
             | provide the margin they are seeking, or the job is more
             | difficult than it's worth they will up the price. If the
             | consumer chooses them to do the job, it's at a pricepoint
             | that's worth the trouble but they are really hoping to be
             | passed over.
        
               | briankelly wrote:
               | My dad's in construction and frequently gives out "fuck
               | off" quotes. It isn't so rare that the client accepts
               | them.
        
               | coldcode wrote:
               | The Rolling Stones tried that with Microsoft and "Start
               | Me Up", they quoted what they thought was ridiculous
               | $10M. Microsoft said sure, no problem.
        
               | vntok wrote:
               | $10M is a debunked urban legend; the actual figure is
               | only $3M, which is pretty standard. Microsoft's whole ad
               | campaign for Win95 cost about $200M after all.
        
               | Strom wrote:
               | That's a fun story. Looking more into it, it seems that
               | $10M is based on rumors and it was more likely $3M. [1]
               | Doesn't change the point of the story though.
               | 
               | --
               | 
               | [1] https://www.networkworld.com/article/2220097/what-
               | microsoft-...
        
             | toss1 wrote:
             | That works.
             | 
             | A friend of mine with a consulting biz was requested by IBM
             | to handle a job in Turkey. He didn't want the gig & told
             | them so repeatedly. He finally decided to tell them the
             | most ridiculous price he could think of (like appending two
             | zeros to the number). He said they didn't even flinch and
             | he was on the plane to Turkey the next week for six months.
             | But he did say that it was pretty much worth it in the end
             | (but only because of the pricing).
        
               | pbronez wrote:
               | Welcome to the market economy!
               | 
               | Seriously, this sort of dynamic is why the world works as
               | well as it does.
        
               | toss1 wrote:
               | Yup, market economies are fantastic for rapid resource
               | allocation!
               | 
               | Yet they are not a panacea.
               | 
               | They suck at preventing problems related to:
               | 
               | * tragedy of the commons - tend to create & magnify it
               | 
               | * long-term disaster planning / tail risk - e.g.,
               | stockpiling resources for natural disasters, pandemics,
               | etc.,
               | 
               | * preventing foolish development, e.g., on cheap land
               | subject to flooding
               | 
               | * self-creating safety systems for workers, consumers,
               | environments, etc. -left to their own devices, markets
               | always do too little-too late
               | 
               | Market systems literally often need to be saved from
               | themselves, e.g., when overfishing will literally kill an
               | industry by driving extinct the very thing it depends
               | upon
        
               | namibj wrote:
               | Actually, stockpiling does happen if there are no laws
                | against price gouging. Because that's how the capital
                | bound in the stockpile gets its ROI.
        
               | toss1 wrote:
               | I hope you are not seriously suggesting making price
               | gouging in disasters legal as a method of preparation.
               | 
               | Price gouging is nowhere near a reliable method of
               | disaster preparation as actual expert planning.
               | 
               | The stockpiles you speak of are usually just ordinary
               | current inventory marked up by an order(s) of magnitude.
               | 
               | Also, stockpiling goods is not the only thing needed for
               | disaster preparation. One must also stockpile services,
               | i.e., have the right people recruited, trained, equipped,
               | and ready to respond. Prime examples are military and
                | firefighters, who spend much time & resources training,
               | and little time actually fighting the wars or fires.
        
             | vorpalhex wrote:
             | I had a client who wanted me to write some code in Adobe
             | Coldfusion of all things. Not wanting to say no to an
             | otherwise good client, I quoted some insane hourly.
             | 
             | And now I know that Coldfusion is absolutely miserable to
             | code in (and the client tried to dodge their bills!).
        
             | ericlewis wrote:
             | my grandfather called this the "asshole quote"
        
             | patio11 wrote:
             | I don't believe I've said that.
             | 
             | For what it is worth, if a customer of my previous
             | (salaryman-heavy) employer asked for this, we'd tell them
             | an _actual_ no, which is extremely rare in client
             | relationships in Japan. A contextually appropriate  "no"
             | for something which is less absurdly wasteful of
             | engineering time to no purpose would be "That sounds
             | difficult. We could explore options to do it, but perhaps
             | you could accept an hour of downtime in the dead of night"
             | then bargain down to 15 minutes.
        
             | pmiller2 wrote:
             | I've heard people say that the right way to say 'no' in
             | Japanese is more along the lines of "it is very difficult."
             | I have no idea how much linguistic truth there is to that,
             | but it definitely rings true culturally.
        
             | yourapostasy wrote:
             | There is an art to this. These situations come up because
             | you want to continue an ongoing relationship into the
             | future.
             | 
             | So you quote a price that is high, but not so high as to
             | destroy the relationship. I call it "plausibly-deniably-
             | high".
             | 
             | You also have to gauge the context of the other party in
             | the negotiation. This technique works best when you
             | accompany the quote with some kind of description matching
             | the personalities. Some people are swayed by a description
             | of the additional time it takes (the billable hours
             | mentality). Others are swayed by a description of the
             | additional risks you are bearing on their behalf to deliver
             | the outcome. Still others are swayed by a description of
             | the _de novo_ technical challenges that no one else has
             | ever attempted before. The list goes on, and is a
             | fascinating study into people.
             | 
             | This is where a real salesman (as opposed to an order-
             | taker) earns their keep, where they know how to read a room
             | and craft a response, messaging and after-meeting
             | socializing that takes into account all those perspectives
             | simultaneously from the point of view of the other party.
        
         | gnopgnip wrote:
          | 4.5 hours of a consultant's billing rate can be much more than
         | 10 hours of your regular hourly rate working a similar job. A
         | good consultant will have a contract. The client saying I won't
         | pay if XX happens doesn't mean anything unless it was in the
         | contract.
         | 
         | Networking/spanning tree loops, arp table mismatch/corruption,
         | the switches at the destination being misconfigured are all
         | realistic problems that would result in downtime here. The
          | normal way you do this is with live migration in Hyper-V or
          | vMotion in ESXi. If the initial migration is not successful,
         | you just leave the server powered on while you address the
         | issues. Once the VM has been migrated you can do whatever you
         | want with the original server without worrying about downtime.
        
           | maire wrote:
           | This reminds me so much of when I joined vmware in 2006.
           | vmotion had already been around for a few years - but I
           | believe this was the first release of vCenter with DRS.
           | 
            | A couple of months after I joined, a room full of customers chewed
           | us out for not publishing our vmotion compatibility tables.
           | After 4 hours of chewing out - they then told us they reverse
           | engineered the compatibility tables and reorganized their
           | entire data center to conform to vmware vmotion. Then (of
           | course) we worked with intel to make sure the compatibility
           | matrix worked in the future.
           | 
           | I realized at that point that I joined the right company.
        
         | ThePowerOfFuet wrote:
         | Also, moving a server with spinning disks? What could possibly
         | go wrong.
        
           | nwallin wrote:
           | Disks aren't that sensitive to motion.
           | 
           | At my last job, we had 2 airplanes with 5 computers each with
           | 6 disks each mounted in an aircraft. These were regular
           | servers from Dell, not special hardened or resilient hardware
           | or anything. So 60 or so hard disks flying around. Takeoffs,
           | landings, turbulence. Two flights per day, 3 hours each
           | flight, 6 days per week. So 626 landings per year.
           | 
           | Disk failures were not particularly common.
        
             | zimpenfish wrote:
             | As a counterpoint, I worked for a place that used Mac Minis
             | inside spinning displays and the hard disks absolutely did
             | not like it one bit.
             | 
             | (They also tried spinning disk machines on buses which also
             | failed quickly but that was more the grime and electrical
             | noise than the motion, IIRC. Then they tried mini-servers
             | running from CF and the motion would slowly work the CF
             | cards out of their sockets. The company did not last long.)
        
           | nilssonanders wrote:
           | Reminds me of the video where they yell at hard drives and
           | measure disk latency. https://youtu.be/tDacjrSCeq4
        
           | daemin wrote:
            | Wasn't there a story about Sun (or HP or someone like that)
            | where they moved a bunch of disk servers across a parking lot
            | to another building and found that many of them had died from
            | the vibrations on the trolley cart used to transport them?
        
             | znpy wrote:
             | it was Yahoo, IIRC.
        
           | driverdan wrote:
           | Spinning disks can take a surprising amount of shock and
           | vibration before they fail.
        
           | sschueller wrote:
           | I had a spinning disk in my car back before we had all these
           | cool embedded PCs. The disk was never an issue, these things
           | can take a lot of abuse (Even New England roads). I had it
           | mounted sideways so a large pothole wouldn't push the heads
           | into the platter.
        
         | sleepybrett wrote:
         | If there was downtime during the move and the client was there
         | and declaring that they would not pay, you just walk away.
         | You'd be surprised at how fast they can cut a check in that
         | situation.
        
       | linsomniac wrote:
       | Lower stakes, but ~15 years ago a friend had a Linux box in the
       | corner that had huge uptime. I want to say the uptime started
       | shortly after the kernel patch that fixed the 400-ish day
       | overflow of the uptime counter. He moved to a new home and very
       | carefully moved the running server using its UPS. He didn't have
       | to worry about keeping networking up though.
       | 
       | I used to be all about long uptimes. I eventually started seeing
       | long uptimes as a negative though. A long uptime probably means
       | patches have not been applied.
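
One plausible reading of the "400-ish day" overflow mentioned above, assuming a 32-bit tick counter incremented 100 times per second. The counter width and tick rate are assumptions here and are not stated in the comment.

```python
SECONDS_PER_DAY = 86400


def days_until_wrap(counter_bits, ticks_per_second):
    """Days before an unsigned tick counter of the given width wraps to zero."""
    return 2 ** counter_bits / ticks_per_second / SECONDS_PER_DAY


# Assumed values: 32-bit counter, 100 ticks per second.
wrap_days = days_until_wrap(32, 100)
print(f"counter wraps after about {wrap_days:.1f} days")
```

With those assumed values the counter wraps a little short of 500 days, and a 64-bit counter at the same tick rate would not wrap on any practical timescale.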
        
         | jccooper wrote:
         | I also did that once, about the same timeframe, specifically to
         | preserve an uptime.
         | 
          | I think the cult of uptime came about simply because it was
         | impressive that a personal computer could stay running for more
         | than a few days when most of the world ran Win95. And because
         | development cycles were longer and there weren't a lot of
         | network threats.
        
       | panpanna wrote:
       | Really disappointed they didn't use a wireless network of some
       | kind.
        
         | lordnacho wrote:
         | My first thought as well. Set up WiFi along the path, basically
         | turn the machine into a laptop. But I think there might be a
         | disconnect when you change base stations? At least when I move
         | my laptop between rooms in my house there's often a momentary
         | problem while on video call.
         | 
         | The other way I'd do it is more similar to described. Create
         | redundant network paths to the server, then cut one.
        
         | dbalatero wrote:
         | I wouldn't, the risk of disconnects is high.
        
           | panpanna wrote:
            | But the risk of someone tripping over your looong cat6 and
           | breaking the network is not negligible either...
        
       | neilv wrote:
       | Good thing the server had two power supplies. There was a YouTube
       | video (which I can't immediately find) of people moving a server
       | across town, on the train, without powering it off, and, IIRC,
       | they had to splice the UPS into the power cable.
       | 
       | When it's done for pay rather than for fun, and payment is
       | conditioned on zero downtime, I hope they charged a premium to
       | make up for the risk of no pay. Offhand, I don't know what's a
       | good way to do that -- I've never had a consulting client demand
       | terms like that for billed-by-the-hour work.
        
         | kijin wrote:
         | Effective hourly rate = base hourly rate * risk.
         | 
         | Risk = client risk * task risk.
         | 
         | Client risk is based on your past experience with the same
         | client. If they're prone to demand last-minute changes or
         | stupid stuff, they get charged a higher rate on every project
         | afterward. Jacking up the client risk factor is also a nice way
         | to fire a client you don't want.
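The rate formula in this comment can be sketched in a few lines of Python. The function name and the example multipliers below are assumptions chosen for illustration; the comment gives only the structure of the formula, not any numbers.

```python
def effective_hourly_rate(base_rate: float, client_risk: float, task_risk: float) -> float:
    """Effective hourly rate = base hourly rate * risk,
    where risk = client risk * task risk (both plain multipliers)."""
    if min(base_rate, client_risk, task_risk) <= 0:
        raise ValueError("rate and risk multipliers must be positive")
    risk = client_risk * task_risk
    return base_rate * risk


# Hypothetical inputs: a 100/hour base rate, a client with a history of
# last-minute changes (1.5x) and a task where payment hinges on zero
# downtime (2.0x).
quote = effective_hourly_rate(100.0, client_risk=1.5, task_risk=2.0)
print(quote)
```

Because the two factors multiply, raising the client risk factor alone is enough to price out a client you would rather not keep, which is the "firing" mechanism the comment describes.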
        
         | discordance wrote:
         | https://www.youtube.com/watch?v=vQ5MA685ApE
        
       | tzury wrote:
       | I once was called in to export data from a DOS program that had
       | no export option. The single author had died of heart issues and
       | the company needed the data for the migration.
       | 
       | After several attempts to understand the binary format I gave up
       | and ended up printing tabular reports to LPT1 which I connected
       | my laptop to, extracting it and rebuilding CSV files.
       | 
       | Luckily, printing in those days was the most important feature
       | of a business app.
        
       | shireboy wrote:
       | You realize the client is condescendingly mocking the guy for
       | saying it can't be done now, and will expect this next time they
       | run updates on the server, which is to say never
        
       | mercora wrote:
       | i wonder if it's really possible to do the initial setup of the
       | ethernet failover without interruption. i have never done this,
       | but i would expect the interfaces themselves will become
       | unavailable for direct use and you get a completely fresh virtual
       | ethernet interface which represents whatever physical interface
       | is currently active... at least this is what happens when you add
       | an ethernet interface to a bridge in linux...
        
       | johnklos wrote:
       | I've done something like this - server running off of UPS moved
       | from one building in Manhattan to another about 1/4 mile away, in
       | snow... Not for someone with weak arms.
        
       | digitalsushi wrote:
       | I had to search the reddit comments for 'vmotion'. They have it
       | covered.
       | 
       | This anecdote is an amazingly good story for telling at the pub
       | over a few beers. It's a terrible story for a strategy.
       | 
       | If this is a mountain, my molehill is that one night in the late
       | 90s, I got paged cause the SMTP outbound server was overheating.
       | At midnight I drove across sleepy NH backroads, and stopped at a
       | Wendys to get a chicken sandwich and iced tea, for the caffeine.
       | 
       | When I got to the server room, I pulled the 2U Dell server out of
       | the rack and discovered the CPU cooling fan had seized up. Mind
       | you, this is a New Hampshire data center in 1999, and it has a
       | filing cabinet with manila folders, and carpeted floors. This
       | thing was never prepared for any disasters.
       | 
       | A half hour later, the SMTP server was up and running cool again.
       | 
       | I greased the fan with the mayonnaise from my sandwich.
        
         | Spooky23 wrote:
         | The real lesson is what the teller of the tale sort of did
         | initially -- fire the customer.
         | 
         | If the story is true, the client is a stereotypical know-it-all
         | small business owner who gets by on bullying. You see them
         | frequently in businesses that pay low-skill workers a small
         | premium that is hard to replace. (ex: cleaning services, pool
         | guys, mechanical contractors that do low-end maintenance work,
         | etc)
         | 
         | As a contracted SME, taking a job like this is dumb. The
         | chances of failure, where "failure" == the server going down is
         | high, and the customer will just stiff you.
        
           | donmcronald wrote:
           | I deal with a fair number of customers like this (not via my
           | business), enough that firing them all isn't an option
           | because it's a large portion of the market where I am. It's
           | something. They'll have low end servers with no redundancy,
           | terrible or no backups, and no contingency plans for
           | anything. They won't spend a nickel and are the most likely
           | to lose their minds if anything goes wrong.
           | 
           | It's so frustrating and stressful.
        
             | eldavido wrote:
             | This is why services like Google Apps or managed Exchange
             | hosting exist. Most people are _terrible_ at IT management.
             | So bad they're far, far from realizing how bad they even
             | are.
             | 
             | When you consider you're getting like 1000 of the smartest
             | tech people in the world to manage your infrastructure for
             | you, for $5/month per user, it's really such a no-brainer.
             | If people are too stubborn to see that, or want to waste
             | time trying to do it better themselves "because it's
             | cheaper", with redundant power, OS patching, zero-downtime
             | changes/deploys, proper capacity planning, proper redundant
             | connectivity, provisioning the right network around it,
             | physically securing the server room, ensuring things are
             | properly cooled, not wet...I could go on forever, and this
             | isn't even the main focus of the business...I'm sorry, but
             | they deserve to go out of business.
             | 
             | I had a very stubborn client once who ran a hotel chain.
             | Won't say what or where, but I wasn't surprised when their
             | random "security through obscurity" VNC server got
             | compromised. I wasn't finishing migrating to the new PCI-
             | DSS compliant system we built, either, so there go 5000
             | credit cards "encrypted" with some sweet rot13-level
             | bullshit in Turbo Pascal I cracked in about 30 minutes with
             | no code access.
        
             | ed25519FUUU wrote:
             | Seems like some dedicated hardware here would make this go
             | faster if there's a business for it. For example, if the
             | bandwidth isn't high, you could set up a wireless mesh from
             | point A to point B and connect via some appliance to the
             | NIC.
             | 
             | Walk the length with the appliance and verify there's no
             | dead spots, then just hook up to a power supply and get
             | things done.
        
               | Bnshsysjab wrote:
               | In the '90s wifi was even more of a trash fire than it is
               | today.
        
           | paulsutter wrote:
           | Key is to set the price. If the job is a hassle, you're not
           | charging enough. That will also filter out bozos. Sometimes
           | people really do have extreme requirements. And when they do,
           | they're willing to pay 10x for it.
           | 
           | Agreed though, this particular customer disqualified himself
           | as soon as he said he won't pay if the server goes down. He
           | should have offered a big bonus if the move succeeds without
           | downtime.
        
             | Dylan16807 wrote:
             | > Agreed though, this particular customer disqualified
             | himself as soon as he said he won't pay if the server goes
             | down. He should have offered a big bonus if the move
             | succeeds without downtime.
             | 
             | I don't know, that sounds too close to encouraging the
             | attitude of "it's not worth it, I'll just take my normal
             | pay". 10x vs. 0x is a significantly stronger incentive than
             | 10x vs. 1x.
        
             | noobermin wrote:
             | This is probably the best compromise. Sure if you're moving
             | a server with, idk, top secret government spy files or
             | something then you charge 10x or some multiple. But that
             | would obviously be the exception. The vast majority of
             | these people are just self-entitled dipshits who have an
             | inflated notion of self-worth.
             | 
             | See my comment elsewhere on "the customer is always right."
        
             | derefr wrote:
             | > Key is to set the price. If the job is a hassle, you're
             | not charging enough. That will also filter out bozos.
             | 
             | Sure. Just, there are some verticals where charging a
             | "positive-ROI" amount gets you no business at all, because
             | _all_ the potential clients in that vertical are businesses
             | that operate on such razor-thin margins that they don't
             | actually _have_ the cash-flow to pay for the extreme
             | requirements they also have. They've been getting along
             | until now purely by begging/tricking/manipulating people
             | into doing negative-ROI one-off tasks for them. If forced
             | to contract all the services they need out on the free
             | market, their business would cease to exist.
             | 
             | (Therefore, you say, they _should_ cease to exist. I'm not
             | arguing!)
        
               | rdslw wrote:
               | > there are some verticals where charging a "positive-
               | ROI" amount gets you no business at all,
               | 
               | if you do, you're just selling a dollar bill for 80c. You
               | may drink growth-kool-aid, or someday-monopoly-hope or
               | VC-subsidizes-business.
               | 
               | In the end, somebody pays for it either from stupidity or
               | hope.
        
             | kenhwang wrote:
             | When I did consulting, we always got unreasonable but
             | technically not impossible asks like this. We never "fired"
             | the customer, because that's just bad business and customer
             | service. What we did instead was tell them their options,
             | our recommendation, and "appropriate" billing estimates.
             | Your job is to consult them to the best of your ability,
             | not stop them from bad decisions despite your advice.
             | 
             | So, 1 hour billed for 5 minutes of downtime, or 40 if you
             | want absolutely none. Happy to do either, but highly
             | recommend the former. 99% of people pick the cheaper
             | recommended option.
             | 
             | In this case, I would've tried to put the server on WiFi
             | which would seem like less of a hassle for me. Equipment
             | acquisition cost billed to the customer.
        
           | dialamac wrote:
           | There's a market for everything. The "shit customer" would
           | have hired someone. Sure things could have gone to shit and
           | they wouldn't be paid, but they didn't, and there is a living
           | to be had serving this market segment. Every contracting
           | business has difficulty with AR, stereotyping often isn't
           | even particularly accurate. Suffice it to say there is a
           | business to be had in catering primarily to difficult
           | customers.
        
             | mlyle wrote:
             | > stereotyping often isn't even particularly accurate.
             | 
             | While I agree that it's difficult to predict exact credit
             | risk based upon customer personality, explicit threats not
             | to pay you like those in the story -are- a bit of a signal
             | that there may be a risk of nonpayment.
        
               | kube-system wrote:
               | Sure, I think the point was that there _is_ a market for
               | customers with subprime credit. Sometimes it is fairly
               | profitable too.
        
             | nautilus12 wrote:
             | Depends on what you mean by business. The nature of that
             | type of business is that it's not very predictable or
             | repeatable so you get one small chunk of business but
             | ultimately it's not the kind of business (even a
             | consultancy whose job it is to throw hours into the fire)
             | you really want. Scaling that out would be death by 1000
             | papercuts.
        
           | renewiltord wrote:
           | Presumably at least part of the value of the job from his POV
           | was the excitement of it. I've definitely done suboptimal
           | jobs where I just enjoyed them a little and they were a break
           | from the replicable tasks.
           | 
           | Since the customer is never going to know the awesomeness of
           | it, it's really just for yourself.
        
         | jjice wrote:
         | I'm a young developer, so I've never had the chance to work
         | with on prem servers (and the chances that I will are looking
         | slim), but I've always loved these "war" stories.
        
           | jacobsenscott wrote:
           | I miss the days when the servers came with seating (Cray).
        
           | mike_d wrote:
           | You can buy yourself a pair of old Dell servers from
           | craigslist or eBay for a few hundred bucks. With a $200
           | membership to VMUG Advantage you'll get all the licences you
           | need to build an enterprise grade cluster.
           | 
           | Build yourself a home lab and learn how systems work. Figure
           | out what is really running your code. Learn how to resource
           | optimize.
           | 
           | Don't end up only being able to work on webapps and small
           | datasets that fit comfortably in the cloud.
        
             | jjice wrote:
             | That's been a plan of mine. Right now, my home server is
             | just a Pi running SMB and Jellyfin, but the plan is to
             | expand into some used hardware. Seems like used server
             | hardware is one hell of a deal.
        
           | laurent92 wrote:
           | At my former service company, there's a story about the
           | server room, which had become much, much more important and
           | reliable thanks to massive investment like a diesel
           | generator, while the teams hadn't grown enough in maturity.
           | One day they have a problem with a server. A system admin is
           | granted permission to go to the bay, since remote desktop
           | didn't work. They discuss the problem in front of the bay.
           | One leans on another bay that was just between two
           | locations. Wheels weren't locked. It just flew across the
           | room.
           | 
           | It was ok, just the power cord and a few RJ45 torn. No
           | serious damage besides downtime.
        
         | m3kw9 wrote:
         | Mayo is around 1 part _water_ to 1 part oil ratio..
        
         | y_tho wrote:
         | It's 2008. A manager that just doesn't care anymore tells the
         | new IT person to replenish the fan mayo.
         | 
         | -"Why? I don't know why. It just works."
         | 
         | -"Is Hellmann's ok?"
         | 
         | IT person documents that Hellmann's is preferred.
        
           | bentcorner wrote:
           | And then later on Hellmann's is discontinued, so the company
           | solicits quotes for a mayo supplier.
        
         | Nextgrid wrote:
         | I've once used cooking oil as a thermal paste substitute.
         | Worked well enough and nothing went wrong.
        
           | unlaxedneurotic wrote:
           | That sounds too close to a fire hazard
        
             | CydeWeys wrote:
             | I doubt there's any component in a PC getting remotely
             | close to the ignition point of oil (which for canola is
             | 424degC). Plus, it's going to be a minimal amount of oil.
             | 
             | I'm more worried that the oil won't get to a high enough
             | temperature and thus won't polymerize, so it'll flow out
             | and ruin some other component, or go rancid, or something.
             | Thermal paste won't move on you. Oil will.
        
               | zhengyi13 wrote:
               | Oil will certainly move on you, but it might not destroy
               | your components, depending perhaps on the specific oil
               | chosen: you can actually buy or build fully oil-cooled
               | PCs.
               | 
               | https://www.pugetsystems.com/submerged.php as an example.
        
               | ponker wrote:
               | That's mineral oil, not cooking oil. Mineral oil doesn't
               | go rancid.
        
               | [deleted]
        
               | Scoundreller wrote:
               | Presumably there's enough surface tension from both sides
               | to hold it between the gaps and resist gravity (if it's a
               | vertical CPU).
               | 
               | Pretty much any Solid-liquid-solid or solid-solid-solid
               | interface will be better than solid-roomTemp&Pressure
               | gas-solid.
               | 
               | The whole point is to conduct heat better than air, and
               | most things will.
        
         | dsr_ wrote:
         | The (probably soybean) oil is a fine lubricant, but the
         | constant motion should cause the egg proteins to coagulate. How
         | long did it operate before you replaced the fan properly?
        
           | emeraldd wrote:
           | I can't help smiling at this analysis. It feels like
           | something you'd hear from a sci-fi story engineer working on
           | a rundown ship that just keeps going no matter what ...
        
             | staticvoidmaine wrote:
             | That book is Expeditionary Force
        
               | novaleaf wrote:
               | books 1 to 3 are awesome. after that the author forgets
               | how to advance the plot while still churning out a new
               | book more than once/year.
               | 
               | I gave up on book 7, so yeah, I tried sticking with it.
        
               | gknoy wrote:
               | That feels like the Honor Harrington series to me. :-( I
               | thoroughly enjoyed the first N books I read (5? 6?), but
               | the next one or two seemed like watching a series TV show
               | that never resolves tension points, because if they did
               | there'd be no reason for Season N+1.
        
               | ethbro wrote:
               | I read the first 100 pages of the first book and then
               | literally threw it into a fire.
               | 
               | I believe the "Nope" sentence was something akin to
               | "{character} thought that {thing} because {thing}."
               | 
               | Jesus Christ. Would it kill you a little to show instead
               | of tell?
               | 
               | (And lest people believe I'm not a fan of some good *
               | opera, I'm not ashamed to admit I've read my fair share
               | of _BattleTech_ , _Barsoom_ , and even _Lost Fleet_ ,
               | among less highbrow works)
        
               | unclesaamm wrote:
               | You literally threw it into a fire?
        
               | ethbro wrote:
               | I stand by my decision. The world is a better place.
        
               | munificent wrote:
               | So you're saying the plot seized up and the author wasn't
               | able to engineer a solution?
        
               | novaleaf wrote:
               | In the foreword of one of the books the author mentions he
               | quit his day job after the first book was a hit, so I can
               | understand his financial need to churn out more books.
               | 
               | Unfortunately there is little plot advancement, perilous
               | situations more contrived, and needless exposition/filler
               | the norm.
        
               | treeman79 wrote:
               | Expeditionary is my cleaning audiobook.
               | 
               | As in, if I'm cleaning and have nothing else interesting
               | to listen to.
               | 
               | Some occasional good laughs. Dinosaur holding a plunger
               | badge. :)
        
               | wojciii wrote:
               | I'm reading the series to the end no matter what.
        
               | garettmd wrote:
               | Well thanks for that recommendation. Added to my reading
               | list
        
           | Corrado wrote:
           | It probably lasted long enough to get a replacement fan
           | installed the next morning.
        
             | stronglikedan wrote:
             | Why would it need to be replaced now that it's working
             | again? ;-)
        
               | eigenvector wrote:
               | That reminds me of the time I found an appropriately
               | shaped bolt installed in a fuse-holder - presumably
               | someone did not have a replacement fuse and improvised.
               | 
               | Except that it had been 5 years since the last
               | maintenance in this place and it was a protection panel
               | for a large synchronous generator in a power plant.
               | 
               | After you make a heroic temporary fix, please, ensure the
               | permanent fix is applied later!
        
               | yjftsjthsd-h wrote:
               | > After you make a heroic temporary fix, please, ensure
               | the permanent fix is applied later!
               | 
               | I've known people who would, depending on exactly how bad
               | the failure was, outright refuse to apply temporary fixes
               | precisely because they didn't believe that the business
               | would fix things properly if the issue wasn't forced. And
               | having watched how that particular company handled
               | things, I can't say that they were wrong.
        
               | shuntress wrote:
               | I've seen it said on this forum before and it also aligns
               | with my experience: _Most fixes that are 'just for now'
               | are actually 'forever'_
        
               | Ccecil wrote:
               | This chart seems relevant in this case.
               | 
               | https://images.app.goo.gl/db84Dmv3sqyVEwfz5
        
               | dylan604 wrote:
               | There's nothing more permanent than a temporary solution.
        
               | vangelis wrote:
               | It's a slow blow
        
           | coding123 wrote:
           | Maybe it was vegan mayo? Oh wait, 1999.
        
           | tempestn wrote:
           | Reminds me of the foosball table in our old engineering
           | students' society room. You can buy special lube for foosball
           | bearings. OR you can just rub popcorn butter on the bars.
           | Guess which one we had in ample supply.
        
           | kijin wrote:
           | The egg proteins are already quite coagulated. I'd be more
           | worried about the vinegar component. You need to neutralize
           | that acid with something.
        
             | tzs wrote:
             | It's also got corn syrup. Would that cause any problems for
             | this application?
             | 
             | Here's the ingredient list for Wendy's mayo: Soybean Oil,
             | Water, Egg Yolks, Corn Syrup, Distilled Vinegar, Salt,
             | Mustard Seed, Calcium Disodium EDTA (To Protect Flavor)
        
               | cnasc wrote:
               | > It's also got corn syrup. Would that cause any problems
               | for this application?
               | 
               | Over time, the server would expand to be 4U rather than
               | 2U
        
               | noir_lord wrote:
               | That's the problem with the FAT filesystem, it grows over
               | time.
        
               | Scoundreller wrote:
               | > corn syrup
               | 
               | That's how you get ants.
        
             | dsr_ wrote:
             | The egg proteins are coagulated but dispersed in the
             | colloidal solution. The motion brings them out of solution.
             | You can try it at home by warming up some mayo in the
             | microwave and then rubbing it between your hands: you'll
             | get a stringy oily mess.
        
               | asguy wrote:
               | To play this discussion out further: it depends on the
               | heat and it depends on the motion. I've made plenty of
               | Hollandaise (which is a sibling of Mayonnaise) in the
               | blender with "boiling" butter poured in, and it stays
               | quite hot... especially when it continues to warm on the
               | stove top.
               | 
               | If I microwaved it from cold, it would break almost
               | instantly.
        
               | ggrrhh_ta wrote:
               | Haha :-), I want this dialogue performed in Space
               | Janitors or something of the sort
        
       | Milank wrote:
       | No downtime is acceptable, but they have only one server?
       | 
       | What if a technical failure happens? What if there's a fire in the
       | server room? What if there is an earthquake and the building
       | collapses? What if... many things can happen that can result in a
       | long, long downtime with these tactics.
       | 
       | If uptime is so crucial, the system should be set up in such a way
       | that moving one server should be a piece of cake, not a spec-ops
       | mission.
        
         | walrus01 wrote:
         | From an ISP perspective this seems like the sort of company
         | that orders one $250 a month business DIA circuit (at a price
         | point where there is no ISP ROI for building a true ring
         | topology to feed a stub customer) and has no backup circuit.
         | Then the inevitable happens like a dump truck 2km away with a
         | raised dump driving through aerial fiber and causing an 18 hour
         | outage.
         | 
         | Some circuits might average 5 to 7 nines of uptime over a year,
         | but the next year is dump truck time... You can never truly be
         | certain.
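To put the "nines" in this comment into numbers, here is a small Python sketch. The helper names are made up for illustration, and a 365-day year is assumed. It converts a count of nines into a yearly downtime budget, and an outage length into the nines that year actually achieved.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def allowed_downtime_seconds(nines: float) -> float:
    """Yearly downtime budget for an availability of `nines` nines
    (5 nines means 99.999% available)."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


def nines_from_outage(outage_hours: float) -> float:
    """Nines achieved in a year whose only downtime was one outage."""
    unavailability = outage_hours * 3600 / SECONDS_PER_YEAR
    return -math.log10(unavailability)


print(f"5 nines budget: {allowed_downtime_seconds(5):.0f} s per year")
print(f"7 nines budget: {allowed_downtime_seconds(7):.1f} s per year")
print(f"18 hour outage: {nines_from_outage(18):.2f} nines")
```

Five nines leaves a bit over five minutes of downtime per year, so a single 18 hour dump-truck outage drops that year to under three nines regardless of how good the previous year was.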
        
         | icedchai wrote:
         | Never mind these less common scenarios... What do they do about
         | Windows updates?
        
         | momokoko wrote:
         | You'd be shocked how rare downtime is with modern hardware. A
         | redundant power supply and SSDs in the right RAID configuration
         | typically will not have any issues for years until it can be
         | replaced by a newer model. Also, hardware monitoring is
         | significantly improved to the point where you'll typically know
         | if something will fail and can schedule the maintenance.
         | 
         | In the past power supplies and spinning disc hard drives would
         | fail much more often.
         | 
         | It's basically a solved problem, outside of extremely mission
         | critical, 5 nines kind of stuff, that we all forgot because of
         | AWS.
         | 
         | HN ran, and may still run, on a single bare metal server.
        
           | user5994461 wrote:
           | AWS and older hardware is no different. Set it once and it
           | keeps running for many years.
           | 
           | I've come across an old AWS account (the startup has been
           | using AWS for the longest time). All the network traffic or
           | VPN goes through a single instance with 3 years of uptime.
        
             | bathtub365 wrote:
             | AWS EC2 instances or their host machines can fail at any
             | time and it's out of your hands.
        
               | ficklepickle wrote:
               | True fact! I recently had EC2 migrate my VM when the
               | physical server it was on reached EOL. If they had fired
               | my VM up again, I wouldn't have even noticed. They
               | didn't. Fortunately it had an EBS volume and I was able
               | to manually restart it without data loss.
        
           | marcosdumay wrote:
           | > HN ran, and may still run, on a single bare metal server.
           | 
           | I bet HN wouldn't do a 10 hours high-risk operation for
           | moving their servers because they can't afford an outage.
           | (But well, running stuff on a single bare-metal server is
           | expensive enough that even if they could, I expect they
           | don't.)
           | 
           | What would that company do if a pipe broke inside the
           | datacenter? Besides, if you never restart your servers, you
           | are guaranteeing that the one time when the power goes off on
           | the entire city, they won't come back online.
        
             | znpy wrote:
             | > I bet HN wouldn't do a 10 hours high-risk operation for
             | moving their servers because they can't afford an outage.
             | 
             | HN is probably not business-critical and could probably
             | afford a 10 hour downtime without much hassle.
        
               | TallGuyShort wrote:
               | The point is that they probably also wouldn't then insist
               | on a consultant doing an unreasonable migration and
               | threatening to not pay them if there was downtime. And
               | they probably wouldn't call around to other consultants
               | with the same requirements, apparently telling them that
               | the first consultant refused to do the job.
        
               | Scoundreller wrote:
               | > apparently telling them that the first consultant
               | refused to do the job.
               | 
               | While I don't think they informed them of this in good-
               | faith, it is a nice heads-up. In this case, it meant
               | Consultant2 consulting RefusingConsultant that probably
               | knew the IT better.
        
               | com2kid wrote:
               | It would be legitimately interesting if a 10 hour
               | downtime of HN was at all correlated to an increase in
               | github commits.
               | 
               | I _hope_ there wouldn't be a correlation, but I wouldn't
               | be all that surprised if a somewhat loose one was found.
        
           | jayd16 wrote:
           | >HN ran, and may still run, on a single bare metal server.
           | 
           | HN also has downtime fairly often.
        
           | Johnny555 wrote:
           | Even in modern hardware there are plenty of single points of
           | failure.
           | 
           | Single server and "can't tolerate any downtime" are mutually
           | exclusive.
        
           | paulie_a wrote:
           | Quality hardware has existed for years. At a Ford motor plant
           | they were doing an inventory and couldn't locate a 10 ton
           | mainframe. It was working so well for 15 or so years the
           | tribal knowledge of where it was physically located was lost.
        
             | ansible wrote:
             | Wow, that's impressive losing that big a piece of hardware.
             | 
             | Though it was likely easier to find than that Novell
             | Netware server that was sealed behind some drywall, with
             | only a stray network cable leaving any clue as to where it
             | was.
        
               | owenmarshall wrote:
               | Depends on how big the building is that houses it -
               | manufacturing IT can deal with impressive floor spaces.
               | 
               | I once only half jokingly suggested finding a missing
               | data closet in a two million square foot distribution
               | center by pinging a known IP from three or four
               | aggregator switches across the building and triangulating
               | the location on a floor plan. Sadly the people crawling
               | around the ceiling found it before I could put my idea
               | into practice.
        
               | pbhjpbhj wrote:
               | 2Msqft is c.430m x 430m for a square floorplan. Ping
               | resolution is 1us (microsecond). Speed of electrical
               | signal in copper is about 0.8c. Gives a max resolution of
               | ~240m by my reckoning. If there are variances in the
               | switch+network delay it seems like you're going to
               | struggle to even say which side of the building it is.
               | 
               | Good job they found it!
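The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the comment's reasoning: the function names are made up for illustration, and the 0.8 velocity factor and 1 microsecond timer resolution are the commenter's figures, not measurements.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre
SQFT_TO_SQM = 0.09290304            # exact: (0.3048 m) squared


def square_side_m(area_sqft: float) -> float:
    """Side length in metres of a square floor plan with the given area."""
    return math.sqrt(area_sqft * SQFT_TO_SQM)


def distance_per_tick_m(timer_resolution_s: float, velocity_factor: float) -> float:
    """Distance a signal covers during one tick of the timer."""
    return timer_resolution_s * velocity_factor * SPEED_OF_LIGHT_M_S


side = square_side_m(2_000_000)       # the 2M sqft distribution center
tick = distance_per_tick_m(1e-6, 0.8)  # 1 us resolution, 0.8c in copper

print(f"side of a square 2M sqft floor: {side:.0f} m")
print(f"distance covered per 1 us tick: {tick:.0f} m")
```

Since ping reports a round-trip time, one tick would correspond to roughly half that distance one way. Either figure is likely swamped by switch and queuing jitter, which is the comment's point.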
        
               | owenmarshall wrote:
               | Hah! Good math. Based on the switch placement and the
               | building being more of a rectangle I figured "north side
               | or south side" would be as close as I could get. And when
               | we really dug in it was a classic last mile problem: the
               | first several core switches were well known, we just
               | needed to figure out where the last aggregate switch
               | went.
               | 
               | Turns out a door was closed and a new one built to a
               | hallway to another hallway and not properly labeled on
               | the updated drawings. Had one of the boxes running a
               | conveyor belt not died, we'd never have looked.
        
             | Milank wrote:
             | This is all true, but you still can't rely on increased
             | hardware quality if you can't afford any downtime due to
             | moving (a one-time event) a server.
             | 
             | Also, that doesn't cover other problems mentioned here,
             | like natural disasters, ISP problems, etc.
        
               | nuker wrote:
               | > can't rely on increased hardware quality if you can't
               | afford any downtime due to moving (a one-time event) a
               | server.
               | 
               | Mainframe is not just a server. You can hot plug RAM on
               | these things.
        
               | hnlmorg wrote:
               | Often these kinds of SLAs are decided upon based on blame
               | rather than what is reasonably required by the customers
               | of that system. In this case, moving offices means the
               | downtime is due to internal reasons. But if an ISP goes
               | down or there is a natural disaster, then that isn't in
               | their control.
               | 
               | Cost also comes into play. Multiple physical
               | links in would be very expensive for what sounds like
               | internal services. Likewise a natural disaster might
               | cause bigger issues to the company than those internal
               | services going down. They might still have offsite back
               | ups (I'd hope they would!) so at least they can recover
               | the services but the cost of having a live redundancy
               | system off site might not justify those risk factors.
               | 
               | The customer's requirements are definitely unreasonable
               | though. I'd hope those systems are regularly patched, in
               | which case when is downtime for that scheduled and why is
               | that acceptable but not when you're physically moving the
               | server? It doesn't really make much sense; but then "not
               | making much sense" is also quite a common problem when
               | providing IT services for others.
        
               | Milank wrote:
               | You are right, their SLA can be a bit different from what
               | we're talking about here (and expect).
               | 
               | In general, we don't know much about this case. It's a
               | post on Reddit, might not even be true. As is, it doesn't
               | make much sense, but we don't know all the details, so
               | maybe we jumped to conclusions.
        
           | Thaxll wrote:
           | Yeah that's how you end up with 3 years uptime on some
           | forgotten servers... :)
        
             | closeparen wrote:
             | Which is why AWS instances should be no more than minions
             | in a load balancer pool, and any permanent state on an EBS
             | volume or a managed storage service.
        
           | closeparen wrote:
           | >hardware monitoring is significantly improved to the point
           | where you'll typically know if something will fail and can
           | schedule the maintenance.
           | 
           | There's SMART for disks... what else?
        
             | duskwuff wrote:
             | ECC for RAM is the other big one. A single-bit error will
             | trigger warnings, so that you can replace the faulty DIMM
             | before it progresses into uncorrectable errors.
        
               | Scoundreller wrote:
               | Is there a tool that can randomly take 128mb chunks of
               | memory out of the pool and test them around the clock?
        
           | walrus01 wrote:
           | Unfortunately complacency about how reliable modern hardware
           | is can lead to neglecting things like off site backups. And
           | other issues. Yeah your one big critical on premises server
           | may be super reliable. But what happens when the building is
           | flooded with 6 ft of water, catches on fire, is leveled in an
           | earthquake, or anything else?
           | 
           | If a function is super critical to business, it also deserves
           | to have some thought put into the blast radius of its
           | failure.
           | 
           | The sort of places that would insist on rolling a live server
           | 700 ft across a parking lot probably don't have any real
           | disaster recovery plan.
        
           | mwcampbell wrote:
           | Still, sooner or later, the data center will be hit by a
           | natural disaster, a DoS attack, a network problem, or the
           | like, and you'll have to be ready to move to a different one
           | to get your service back online. Or you'll have to reboot
           | your server to apply a critical kernel security update, in
           | which case you need to be ready to fail over to a hot
           | standby. So, since relying on a single server with
           | high-uptime hardware is penny-smart and pound-foolish, might
           | as well go with a cloud-style architecture with commodity
           | hardware.
        
             | chasd00 wrote:
             | I used to be fascinated with datacenters and would
             | masquerade as a customer prospect to get a tour and see all
             | the cool gear. I was asking one engineer about what their
             | plan was for a tornado (this was at ThePlanet in Dallas TX
             | way back when) and they basically scoffed at the question.
             | A week or so later one briefly touched down about 1/4 mile
             | from them, I wonder if they thought about me when the
             | sirens were going off hah.
        
           | packet_nerd wrote:
           | Human error is a bigger cause of downtime than technical
           | failure or natural disasters. And in practice, a single
           | server like this tends to be a hand managed one-off which
           | only exasperates the human error component.
        
             | goodcanadian wrote:
             | s/exasperates/exacerbates/
        
               | pmiller2 wrote:
               | It's probably a bit of both, TBH. ;)
        
         | YetAnotherNick wrote:
         | You wrote one server but describe the failure modes of having
         | one data center. I think it is very very uncommon and hard to
         | allow for a data center level issue. After all, Instagram and
         | 100 other sites failed when one AWS data center went down. I
         | would be interested to know how/whether anyone's backend will
         | work if any data center and its databases completely fail due
         | to fire/earthquake/networking etc.
         | 
         | Second thing is having multiple machines for server. In theory
         | it might help in increasing the availability but in practice I
         | haven't seen any random issue due to machine which occurs just
         | based on probability. I think almost all failure modes that
         | exist, they are correlated between machines. eg suppose you
         | have data loss on one machine, you could more likely than not,
         | blame it on code and it would be similar across machines.
        
           | toast0 wrote:
           | Re: single datacenter. At the basic level, you need a second
           | datacenter with enough machines to provide your service (or an
           | emergency version at least), replication of data, and a way
           | to switch traffic. It's doable, but expensive in capital and
           | development. If you're dependent on outsourced services, they
           | also need to be available from both datacenters and not
           | served from only one. In an ideal world, your two datacenters
           | would be managed by different companies, so you would avoid
           | any one company's global routing failure (IBM had one
           | recently).
           | 
           | Re: multiple servers. Power supplies fail, memory modules
           | fail, cpus fail, fans fail, storage drives fail. Sometimes
           | those are correlated --- the HP SSDs that failed when the
           | power on hours hit a limit (two separate models) are going to
           | be pretty correlated if they were purchased new and stuck
           | into servers at a similar time and then on 24/7. Most of
           | those failures aren't that correlated though. Software
           | failures would be more likely to be correlated though, of
           | course.
           | 
           | The key thing is to really think about what the cost for
           | being down is, how long is acceptable/desirable to be down,
           | and how much you're willing to spend to hit those goals.
        
             | YetAnotherNick wrote:
             | > In an ideal world, your two datacenters would be managed
             | by different companies, so you would avoid any one
             | company's global routing failure
             | 
             | I can't understand this. I think transferring servers would
             | be the least of problems. It's the transferring of
             | database and maintaining consistent version of databases in
             | both the locations. Moving the snapshots after every X
             | minutes doesn't maintain consistency. I would like to read
             | about any company that is able to do this, as honestly it
             | sounds really hard to me. Is there any writeup of the IBM
             | thing you mentioned?
        
               | toast0 wrote:
               | Re: IBM outage
               | 
               | https://news.ycombinator.com/item?id=23471698
               | 
               | TLDR is connectivity to and from the IBM cloud
               | datacenters (which includes softlayer) was generally
               | unavailable, globally, for a couple hours. If you were in
               | multiple IBM datacenters, you were as down as if you were
               | in only one (mostly, I was poking around when it was
               | wrapping up, and some datacenters came back earlier than
               | others).
               | 
               | > It's the transferring of database and maintaining
               | consistent version of databases in both the locations.
               | Moving the snapshots after every X minutes doesn't
               | maintain consistency. I would like to read about any
               | company that is able to do this, as honestly it sounds
               | really hard to me
               | 
               | The gold standard here is two-phase commit. Of course,
               | that subjects every transaction to delay, so people tend
               | not to do that. The close enough version is MySQL (or
               | other DB) replication, monitor that the replication
               | stream is pretty current and hope not a lot is lost when
               | a datacenter dies. There's room to fiddle with failover
               | and reconciliation; I recommend against automatic
               | failover for writes, because it gets really messy if you
               | get a split brain situation --- some of your hosts see
               | one write server available and others see another, and
               | you may accept conflicting writes. A few minutes running
               | like that can mean days or weeks of reconciliation, if
               | you didn't build for reconciliation.
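A minimal sketch of the "monitor the replication stream, fail over by hand" policy described above. Everything here is hypothetical: `classify_replication`, `may_promote_replica`, and the 30-second threshold are for illustration only. The lag value is assumed to come from whatever the database reports (for MySQL, something like the `Seconds_Behind_Master` field of `SHOW SLAVE STATUS`, which is NULL when replication is not running).

```python
from typing import Optional


def classify_replication(lag_seconds: Optional[float],
                         max_lag_seconds: float = 30.0) -> str:
    """Classify a replica's state from its reported lag.

    None stands for "replication is not running" (no lag is reported).
    The 30 s default is an arbitrary illustrative threshold.
    """
    if lag_seconds is None:
        return "stopped"
    if lag_seconds > max_lag_seconds:
        return "lagging"
    return "current"


def may_promote_replica(primary_reachable: bool,
                        operator_confirmed: bool) -> bool:
    """Decide whether this host may start accepting writes on the replica.

    One host's view of reachability is not enough: during a network
    partition, different hosts can disagree about which write server is
    up. So promotion additionally requires a human to confirm that the
    old primary is really gone.
    """
    return (not primary_reachable) and operator_confirmed


if __name__ == "__main__":
    for lag in (2.0, 45.0, None):
        print(lag, "->", classify_replication(lag))
    # The primary looks down from here, but nobody has confirmed it yet:
    print("promote?", may_promote_replica(primary_reachable=False,
                                          operator_confirmed=False))
```

The explicit `operator_confirmed` flag is the design choice that matters: it trades a slower failover for not having to reconcile conflicting writes afterwards.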
        
         | wastedhours wrote:
         | We used to have one server for a website I was a content guy on
         | - it was in a standard PC case, plugged into a switch in the IT
         | team's office (this was not a tech-centered org).
         | 
         | The main IT guy went on holiday and one of the cover guys from
         | another office decided to tidy up. He unplugged the server and
         | thought (and told me after his thought process) "if anyone was
         | using it, they'll let us know".
         | 
         | This was the one, single box for the whole website - no one
         | else was monitoring (even though the central office had a
         | proper, dedicated web team) and the assumption was I was
         | sysadmin.
         | 
         | An hour later I'm sprinting down the corridor to find out what
         | the hell happened and why I can't even SSH into the box.
         | 
         | We put a sticker on the case saying not to unplug it after
         | that...
        
         | coldcode wrote:
         | I worked at my last job for a place with a single rack mounted
         | set of Windows servers at a data center - with no backup power
         | supply, no backups of any kind for that matter, no UPS and no
         | redundancy of any system, plus they didn't even have an admin
         | for 6 months. The CEO refused to spend money on a 2nd anything.
         | The company has 2000 employees. One server held all of the
         | company's photos (which is basically the core of the business)
         | and of course was not backed up.
        
           | elliekelly wrote:
           | This is the kind of company that could benefit immensely from
           | a ransomware attack.
        
           | Milank wrote:
           | Of course it can work, you can get far with one server and no
           | spending on anything like backups, UPS, etc.
           | 
           | Whether it's smart and good for your business/reputation is a
           | different question.
        
           | closetohome wrote:
           | My boss refused to use UPSs for years because he bought one
           | once and couldn't get it to stop beeping.
        
         | misiti3780 wrote:
         | or even better, how do they apply OS patches?
        
         | galoisgirl wrote:
         | > Should have been a 5 minute job if done correctly. Owner
         | ended up paying for over 10 hours of work. Stupidest thing I've
         | ever had to do.
         | 
         | You can see the common sense ship has sailed.
        
         | redwood wrote:
         | Reminds me of how IBM positions mainframes: they are so highly
         | available that you simply never let them shut down.
        
           | lasereyes136 wrote:
           | IBM Mainframes are designed to be serviced while running so
           | if you have multiple CPUs you can offline one at a time to
           | upgrade it without the whole mainframe going down. Big Sun
           | Solaris boxes were built like that as well.
           | 
           | If your mainframe had only one CPU, you did have to turn it
           | off in order to service it. But you could upgrade the OS
           | without turning it off. While they aren't cool tech now,
           | mainframes are a marvel of hardware engineering.
        
           | chasd00 wrote:
           | plus, i would imagine turning them on and bringing them
           | online isn't just a press of a button.
        
             | MrMorden wrote:
             | It's not. https://web.archive.org/web/20190324191654/https:
             | //www.ibm.c...
             | 
             | (archive.org link because ibm.com apparently isn't hosted
             | on a mainframe.)
        
         | gear54rus wrote:
         | He should have taken it offline without notifying this brain-
         | dead manager. Probably wouldn't have noticed lol.
         | 
         | And then charge for those 5 hours for good measure.
         | 
         | In general, this stupid trend of wanting 0 downtime makes no
         | sense to me. If you're not NASA, police or other emergency
         | service you 100% can afford a few hours of downtime by
         | scheduling it beforehand.
        
       | webscalist wrote:
       | Could've been cheaper to buy/rent another server, put it on the
       | new location, set up redundancy/replication, power off the old
       | server, move it to the new location, return the new server. Or
       | just keep it for sanity.
        
       | tzs wrote:
       | I needed to restart a server where I worked. My boss was
       | complaining about the revenue loss during the down time. I knew
       | the revenue loss (if there even was any, as opposed to a couple
       | of minutes of revenue simply shifting to a few minutes later...)
       | would be well under a dollar.
       | 
       | So I listened to him whine for a couple minutes, then tossed a
       | dollar on his desk, told him that would cover it so he could shut
       | up now, and rebooted the server.
       | 
       | Warning: you should probably only try this if you are good
       | friends with your boss. That boss had been my best friend for
       | years before I came to work for his company.
        
       | PinguTS wrote:
       | It's been done 7 years ago even using public transport.
       | 
       | https://www.reddit.com/r/uptimeporn/comments/1kf26r/moving_a...
        
         | jagermo wrote:
         | The most dangerous part is them expecting the 3G to be
         | available during the subway ride.
        
           | deftnerd wrote:
           | I'm surprised they weren't stopped by police to investigate a
           | very suspicious heavily loaded cart on the Subway. It easily
           | could have been 300lbs of explosives on that cart.
        
           | mercora wrote:
           | in Germany mobile networks work just fine in the subway as
           | ISPs have deployed hardware there. I actually have more
           | issues with the network when using classical railroad
           | transport...
        
         | csours wrote:
         | I really thought this post on HN was going to be that story.
         | Thanks for digging it up.
        
       | kuon wrote:
       | When I was younger (read 20 years ago), I did crazy things like
       | that, not over that long distance, but moving live servers in
       | different racks.
       | 
       | Now that I am older, I don't think I would do it anymore, too
       | much stress for a small reward. Also today, most of the time, I
       | am able to talk customers out of crazy requirements, while I
       | would just have said "OK let's do it" in my younger years.
        
       | jfcorbett wrote:
       | Reminds me of the time where IT at a previous employer told us
       | that due to a "new IT strategy", our production cluster that had
       | been sitting comfortably in the basement for years had to be
       | moved to an "approved IT hub facility"... in another office 500
       | km away and across the North Sea.
       | 
       | There was downtime.
       | 
       | Promptly after our cluster settled into this wonderful new
       | facility, a cooling pipe in the ceiling leaked on it, frying 1/3
       | of our nodes.
        
         | yjftsjthsd-h wrote:
         | On a personal selfish level I was quite happy to see our
         | workloads moving to datacenters that we couldn't (reasonably)
         | physically access, because it replaced "can you go drive to the
         | DC and replace a failing disk" with "we put in the request for
         | smart hands to replace the failing disk". Of course, there's
         | some notable tradeoffs, but it makes me feel better when the
         | business decides to do such things...
        
       | merb wrote:
       | meanwhile in germany, german telekom have had their connect ip
       | lines (leased lines..., company internet..) shut down since
       | tuesday morning. so a downtime of over 48 hours, despite an sla
       | that no downtime will be longer than 8 hours and an availability
       | of 99.9%.
       | 
       | what a crazy world.
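For scale, a rough calculation of what a 99.9% availability target allows. The comment does not say what period the SLA is measured over, so the one-year window here is an assumption, as are the function names.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return period_hours * (1 - availability_percent / 100)


def achieved_availability_percent(downtime_hours: float,
                                  period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability actually achieved given total downtime in a period."""
    return 100 * (1 - downtime_hours / period_hours)


budget = allowed_downtime_hours(99.9)
achieved = achieved_availability_percent(48)

print(f"yearly downtime budget at 99.9%: {budget:.2f} h")
print(f"availability after a 48 h outage: {achieved:.3f} %")
```

Under that assumption, a single 48-hour outage uses up more than five years' worth of the downtime budget.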
        
       | imglorp wrote:
       | The moving server on cart part made me nervous. If there was any
       | rotating rust in there, bouncing across the parking lot would
       | make things difficult for flying heads. I'd have hand carried it
       | from stage to stage, setting it on a padded cart each stop,
       | treating it like sweating TNT.
        
       | neya wrote:
       | Dude has ONE server and talks about having 0 downtime for his
       | clients? What the hell?!
       | 
       | In a way, this is Darwinism for the IT industry and I'm happy the
       | people involved got paid well. Dude probably paid as much as a
       | new server woulda cost him. I bet he'll never forget this
       | lesson.
        
       | geocrasher wrote:
       | I once shut down a PC, moved it to another desk, and it wouldn't
       | power back on. Another time I moved a server to another rack. It
       | had 2 years uptime. Had to power it down, and it wouldn't power
       | back on. Both required PSU replacements. Had I moved them _while
       | powered on_ I can only imagine the fun times.
       | 
       | Perhaps they should have just told the customer they couldn't
       | find it:
       | https://www.theregister.com/2001/04/12/missing_novell_server...
        
       | Johnny555 wrote:
       | Decades ago an ISP I was colocated at did the same thing. I don't
       | remember the exact details, but it was a DNS server and they
       | either couldn't log in or were relying on the zone files cached
       | in memory or something but for some reason they couldn't power it
       | off.
       | 
       | It was already plugged into a UPS, but they had to cut one of the
       | posts off the rack to get the server out without unplugging it,
       | then they plugged that UPS into a bigger UPS on a cart and
       | wheeled it to the new data center they built out in the building
       | next door.
       | 
       | The world was much different at the time -- this coloc provider
       | had a good reputation, yet.... they had a keg of beer in the
       | corner of the server room and a stack of adult magazines in the
       | men's room.
        
       | kyuudou wrote:
       | It's called vMotion
        
       ___________________________________________________________________
       (page generated 2020-08-05 23:00 UTC)