[HN Gopher] Moved a server from one building to another with zer... ___________________________________________________________________ Moved a server from one building to another with zero downtime Author : huhtenberg Score : 797 points Date : 2020-08-05 10:44 UTC (12 hours ago) (HTM) web link (www.reddit.com) (TXT) w3m dump (www.reddit.com) | noobermin wrote: | Sorry but this is ridiculous. It's a great story of a feat of | sysadminery, but the client should have just accepted some down | time, even a few hours. The level of entitlement from some | clients people get is just infuriating. Even down to calling him | back for not agreeing to help, what an infuriating person. | | That was my main take away from this. Endeavor to be the sort of | person who can refuse clients, the entire idea that "the customer | is always right" enables so much ridiculous behavior. | arethuza wrote: | This reminds me of a small company I joined many years ago that | did deployments by RAID - find a working server (possibly at a | customer site) swap in a blank HD, wait for it to rebuild then | take it and put in a new server and repeat the process. | | Like finding people who argue against revision control systems, | it's really quite a challenge convincing people why things like | this are a bad idea - after all "it works!". | yjftsjthsd-h wrote: | That's... actually fascinating, if in a slightly insane way. | There's pets, there's cattle... and apparently there's a herd | of cloned pets, which I'd somehow never considered before:) | exabrial wrote: | Reminds me of the "hot slide" technique used for old telephone | switches | ansible wrote: | They did some crazy stuff in the old days. Like when they moved | a telephone exchange live... the whole building. | walrus01 wrote: | Sort of on the subject, i've seen a brochure for a specialty | product marketed to law enforcement. 
It's meant for use with the | seizure of live, powered on desktop PCs and similar that have a | high likelihood of full disk encryption. | | Essentially it's a medium sized double conversion ups, with a | really high quality sine wave inverter, and some electronics that | can match phase with a live 120vac 60Hz circuit. And a tool kit | which consists of the insulated electrical hand tools needed to | do a midspan removal of the cable jacket and splice into the | wires in an ordinary PC power cable. The person using it is of | course supposed to be trained in advance, and competent at the | process of attaching the UPS to the live circuit. | throwaway744678 wrote: | Wouldn't it be safer to open the case and connect some kind of | battery + adapter _after_ the power supply? | walrus01 wrote: | Splicing into the many wires that is an atx+12v power | connector, between the output of the power supply and the | motherboard is way more fiddly than just dealing with the hot | and neutral on an ordinary $5 PC power cord. You could also | never be certain what weird ziptie and cable management | system (or lack thereof) might exist in a home built x86 PC | case, or if there's any room for hands to work at all... | | I think the thing I saw is also meant to deal equally well | with a commodity x86 PC built from parts, or an Intel NUC | size thing, or a corporate desktop machine with proprietary | internal wiring like a slimline Dell, Lenovo, HP, etc. | ajross wrote: | Safer for the operator? Sure. But certainly not for the | device, if you're trying to keep it operating. An ATX power | supply has 24 pins at 5 different voltage levels (plus any | auxiliary power connectors for the GPU and drives, etc...), | and motherboards are a lot less tolerant of spikes and | transients than the PS on the other side. | | Dealing with AC power isn't really that dangerous if you're | careful. | TheSpiceIsLife wrote: | Even high voltage and high amperage AC isn't dangerous.
| | So long as you're not earthed | https://imgur.com/gallery/B2c5FfD | dhosek wrote: | We had an electrician of questionable licensing do some | minor work for us (replacing some switches and outlets). | I asked him to tell me when I should go down to the | circuit breaker to turn off the electricity and he told | me not to bother. He did all the work with hot current | running through the wires. I stayed close enough to be | able to tell if I needed to call 911 but no closer while | he worked. | 0xffff2 wrote: | I've done a ton of electrical work for my own benefit | over the years and I'm perfectly comfortable doing things | like swapping switches with live wires. I've never once | had a problem. The one and only time I've fucked up was | when I cut a run of romex cable that I thought _had_ been | turned off. | | Lesson learned: electrical wiring is like a gun. Always | treat it like it's on, and if you have to do something | that would be unsafe if the wiring is energized, make damn | sure it's de-energized before proceeding. When you're | working in that mindset already anyway, flipping the | breaker for something as simple as swapping a | switch/outlet hardly has any benefit. | allannienhuis wrote: | I apprenticed with my Dad. The first two rules he taught | me have stuck with me my whole life: | | 1) Treat every wire as if it was hot. Even if you know | it's not. 2) A good electrical connection must first have | a good physical connection. | | Not sure why that second rule sticks with me :) but there | has been more than one occasion when I'm fairly sure the | first rule has saved me from a bad shock. And you're | right - treating the wires as if hot means you can | actually work with hot wires for a lot of simple things. | | I still turn off the breaker though :) | nucleardog wrote: | The second rule is a great one that so many people doing | their own work miss.
| | The wire nut is only there to stop the wires loosening | over time and provide some basic insulation. It is not | there to actually attach the wires. When you twist your | wires together, they should be attached well enough on | their own that you'd be comfortable throwing a piece of | electrical tape over them to stop them shorting to the | box and leaving it as-is (but don't do that). If the only | thing keeping them together is the wire nut and you being | very gentle when you manipulate them back into the box, | they're not actually connected. | | The poor physical connection creates a poor electrical | connection. A poor electrical connection has resistance | which creates heat. Heat creates fires. Even better after | a few years when enough traffic has driven past your | house and enough people have moved around inside of it | and the wires have wiggled to just barely in contact so | occasionally when someone walks down the hallway the | lights will all flicker as the wires create some pretty | electrical arc light shows, adding carbon buildup to the | wires and further increasing the resistance and heat | concentrated in the one tiny point of the copper where | they're still sometimes connected. | | No reason at all for this rant. Definitely not a real | example at all. Definitely didn't waste an afternoon with | a toner, a drill with a pilot bit, and a borescope to | hunt down the six octagon boxes someone had sealed into | the basement ceiling hiding away some of the shoddiest | wiring I'd ever seen. Nope. | Scoundreller wrote: | This makes me feel bad. As a kid, I remember holding | light switches at _just_ the right point to hear the | buzzing (arcing)? inside. At least if the contacts were | carbonizing, there wasn't a lot flowing through them | closed. | beatrobot wrote: | I had an electrician add a breaker to the main panel | while it was still live, no protection or gloves, | nothing. I was also terrified. 
| nucleardog wrote: | Sometimes you do what you've gotta do. | | I'm not a nut that does everything with the power on--I | kill any branch I'm working on and double and triple | check with a non-contact voltage detector before I stick | my fingers into anything (which saved my bacon the one | time when the hot from a different branch of the same | phase ended up connected to a neutral wire for a plug | with no connected ground leaving it showing 0V on a | multimeter in any configuration and still being live with | the breaker off; that house was a mess). However our | current dwelling has no main cut-off for the power. If we | wanted to turn off power to the panel we'd need to get | the power company out to pull the meter from the socket. | | In a mostly full panel the bus bars are pretty much | completely covered by the breakers anyway. You'd have to | work pretty hard to come in contact with them. And the | wires you're working with (besides the ground) are | insulated anyway so no issue if they brush up against | something. | | The only thing that's _slightly_ butthole puckering is | chasing the uninsulated ground wire through the panel | down to the neutral bus. | | And yeah, done without gloves because weighing "safety | when I make a mistake" versus "greater dexterity so I'm | much less likely to make a mistake" I prefer the latter. | The protection is rubber soled shoes and keeping one hand | tied behind my back so the electricity has no path | through me. | saltcured wrote: | Ha, that's nothing. I once watched a stubborn guy replace | the bus bars in the input panel of a house. He did wear | rubber gloves and boots and stand on a plastic stool. | But, this is a kind of job where you are operating a | socket wrench on the clamps holding down the bare ends of | the thick direct-burial power cables, then wrestling the | ends of the cable out of the way to unscrew and remove | the bus-work from the panel chassis. 
| | He did this without notifying the power company, so those | supply lines were hot with 240V residential service. The | weather shifted and a light mist started falling before | he was done. Like another poster above, I was thinking I | need to be ready to call 911, but wanting to be far | enough away not to be hit by splattering metal or any | surprise voltage gradients in the soil. | myself248 wrote: | I accidentally replaced an outlet and added a switch to a | circuit that was still energized. I had turned off the | wrong breaker, and failed to confirm it before I started | work. | | But, careful work habits and some tools that happened to | be insulated anyway, meant that I was never bridging two | different potentials. The job went flawlessly and I only | noticed when I plugged the outlet tester into it at the | end, expecting to go turn the breaker on and come back | and look at the lights... but the lights were already lit | up. | mschuster91 wrote: | In Germany this is called "Arbeiten unter Spannung" and | perfectly legal if qualified | (https://de.wikipedia.org/wiki/Arbeiten_unter_Spannung). | dhosek wrote: | The electrician was Croatian and, I presume, learned his | trade there. It still terrified me. | bluGill wrote: | Working on hot wires is no problem. Ground wires scare me | and I'll turn off the main breaker before I touch them. | You can never be sure what ground is really at. | oilman wrote: | I was once working for a small company building | electrical equipment. We mostly worked on "medium | voltage" equipment, you know 2400 to 69000 VAC. | | For one project we had large banks of ultracapacitor in a | cabinet. Fully charged it was around 1200 VDC. This thing | was in the prototyping stage, and we were testing a | control system on a Saturday morning. | | So we charge it using a large AC/DC converter, fully | charged, everything worked beautifully. We start a | discharge cycle converting the DC back to AC. 
Uh oh, it | starts pulling way too much current. Flames start to | shoot out of the AC/DC converter. Fuck. BANG. Fuse blown. | | We assess the damage... the AC/DC unit is totally shot. | And someone (me) is going to have to analyze what caused | the failure. Otherwise everything with the capacitor | cabinet seems okay, but the thing is still charged to | 1090 VDC and the fuse is blown. Check with the mechanical | engineer that designed the cabinet. Turns out the fuse | can't be changed (can't be accessed) while the cabinet is | charged and the cabinet can't be discharged because the | fuse is blown. Well that isn't good. | | The only thing we could do was discharge it into a load | bank (think large toaster) by connecting something | directly to the copper busbar live at 1090 VDC. So one of | the commissioning guys volunteered. He put on some high | voltage gloves, stood on a plastic mat, and connected | some jumper cables someone had in their car to the bus | bar. He stepped back and someone else threw the switch on | the load bank and it discharged without incident. | | There were some design revisions after that. | regularfry wrote: | Cases can have "case open" switches that tell the machine to | switch off. You can't necessarily tell beforehand. | jrootabega wrote: | Case intrusion alarms (built in or Homebrew) | bob1029 wrote: | How do they deal with the loss of network connectivity? | | I could pretty easily write a script that forces my machine to | reboot and do all manner of other things if some sort of | network change is detected. | numpad0 wrote: | Or motion, inactivity, vibrations in the room, etc. But | that's for another product/specialist I guess? | reaperducer wrote: | There used to be an OS X program that would lock the | computer if it detected motion. As long as a trusted | Bluetooth device was paired, the computer was fine. But if | the device left range and someone touched the computer, it | locked. 
| | There was also one that would use the motion detector to | try to detect if the device was falling, and park the hard | drive heads before impact. | walrus01 wrote: | I don't believe that specific product addresses it at all. | Undoubtedly the persons operating the kits have put some | thought into it, but given the myriad of possible LAN | configurations and types of software deadmans switches, it | must be a difficult problem to solve. | CydeWeys wrote: | You _could_ , but what % of running servers actually have | such safeguards in place? I'd say almost none of them. | discordance wrote: | The next Dread Pirates Roberts would be interested in this | safeguard | miles wrote: | HotPlug Field Kit https://www.cru- | inc.com/products/wiebetech/hotplug_field_kit... | | "With the CRU WiebeTech HotPlug you can transport a computer | without shutting it down. | | "The HotPlug allows hot seizure and removal of computers from | the field to anywhere else. The HotPlug's patented technology | keeps power flowing to the computer while transferring the | computer's power input from one A/C source (such as a wall | outlet or power strip) to another (a portable UPS) and back | again. | | "We created this product for our Government/Forensic customers, | but it has IT uses as well. Need to move a server without | powering it down? The HotPlug can do it. | | "It's great for digital forensic investigators and techs who | can't risk losing access to data on a running computer. With | many computers now employing full-disk encryption, shutting | them down poses the risk of having to crack a password after | moving the computer to a lab for analysis, which can greatly | increase the time and expense of an investigation. When | combined with a WiebeTech Mouse Jiggler, you also won't have to | worry about the computer entering password-protected | screensaver or sleep modes." 
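A rough sketch of the network-change dead man's switch that bob1029 describes earlier in this subthread. Everything here is an assumption for illustration: the field names, the idea of comparing two snapshots, and the choice of which fields count as "the machine was moved". How the snapshots are collected, and what happens when the switch trips (lock, reboot, power off), is deliberately left out.

```python
# Fields whose change should count as "the machine was moved or unplugged".
# These names are illustrative assumptions, not taken from any real tool.
WATCHED_FIELDS = ("link_up", "gateway_mac", "ssid")


def changed_fields(baseline, current):
    """Return the sorted keys whose values differ between two snapshots."""
    keys = sorted(set(baseline) | set(current))
    return [key for key in keys if baseline.get(key) != current.get(key)]


def should_trip(baseline, current, watched=WATCHED_FIELDS):
    """True if any watched field differs from the trusted baseline."""
    return any(key in watched for key in changed_fields(baseline, current))


if __name__ == "__main__":
    # Hypothetical snapshot taken while the machine is in its normal location.
    baseline = {
        "link_up": True,
        "gateway_mac": "aa:bb:cc:dd:ee:ff",
        "ip": "192.168.1.20",
    }
    # A new DHCP lease on the same network is not a watched change.
    new_lease = dict(baseline, ip="192.168.1.37")
    # Pulling the network cable is.
    unplugged = dict(baseline, link_up=False)

    print(should_trip(baseline, new_lease))
    print(should_trip(baseline, unplugged))
```

As walrus01 points out, whoever seizes the machine may well leave the LAN attached or emulate it, so a check like this only raises the bar.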
| Scoundreller wrote: | Time to geo-fence the servers with an external GPS antenna | (often useful for time-sync). Or maybe FM signal strength | locks? | jlgaddis wrote: | Search for "HotPlug" on YouTube. | dghughes wrote: | I thought about HotPlug too. And the obligatory Seinfeld | Frogger scene (become much less familiar to younger folks). | | HotPlug must only work in countries with terribly designed | plug outlets like the US and Canada. Our NEMA 5-15 plugs are | live when the plug's hot (electrons be here) and neutral | (return to sender) blades are still visible. I don't think | this device could work in the UK I'm not from there but I | think their plugs can't be live with exposed plug blades. | | https://www.cru- | inc.com/products/wiebetech/hotplug_field_kit... | Scoundreller wrote: | Just need to carefully expose the wiring in the cable | itself then. Or yank out the socket and connect to the | wires there before snipping and shipping. | | Unplugging just enough to expose the prongs is risky | because the point where contact is lost will vary from | receptacle to receptacle. | | Chances are things are plugged into a multi-plug hub | anyway. European homes are especially lacking in sockets in | my experience. | mercora wrote: | linked below is an old advertisement/demo video of a similar | device or maybe even the one you mentioned :) | | https://www.youtube.com/watch?v=-G8sEYCOv-o | walrus01 wrote: | Very similar, yes | jimmaswell wrote: | I'd have thought plugging something into the outlet and | unscrewing the outlet to take with you would be more convenient | than carefully splicing wires just enough not to disconnect | them. All the easier if it's on a power strip. | Scoundreller wrote: | Sometimes they are on different circuits. | huhtenberg wrote: | In a similar vein, there are USB gadgets that emulate a mouse | that keeps on jiggling, to prevent the machine from locking out | on user inactivity. 
| | However, there are anti-jigglers too that lock the machine when | any new human input device is plugged in. | | http://codefromthe70s.org/antijiggler.aspx | vangelis wrote: | That's where the analog mouse jiggler comes in. Apparently | watch faces work quite well for optical mice. | reaperducer wrote: | I've read that on HN before, and tried it a few months ago. | It didn't work. At least not with an Apple Magic Mouse and | my wife's desk clock. | reaperducer wrote: | If you ever come across a jiggle-and-click gadget, let me | know. Some of the computer activity trackers I've seen lately | require the user to click every so often, so plain jigglers | are no longer effective. | mike_d wrote: | Get a USB Rubber Ducky and script it to send something like | Mouse Button 7. The click event registers but it isn't | associated with an action except in super advanced CAD | software. | warrenm wrote: | They should call it the Jiggle-No | ace32229 wrote: | You will be seen as active (including on comms software (at | least the ones I've tried)) if you have any sort of video | playing e.g. Youtube in an active tab. Quite handy. | kawsper wrote: | That's interesting. | | You could have a list of known USB device IDs you trust, and | if a newly plugged in USB device wasn't on that list you | could lock or power down. | blibble wrote: | easy enough to fake the device/vendor ID, then abuse bugs | in the driver/implementation | kawsper wrote: | Yes, if your attacker knows which device/vendor IDs you | have on your list it won't work. | warrenm wrote: | this is a pretty common practice on many (if not all) | government networked devices | | that...or the USB port is permanently blocked (saw that | when I was at a finserv years back: all USB ports (except | the one the mouse plugged into) were epoxied) | spatley wrote: | I have seen security minded IT go so far as requiring | laptops with PS2 mice and epoxying all the USB ports.
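A minimal sketch of kawsper's trusted-device list, assuming a Linux box where `lsusb` prints an "ID vvvv:pppp" field on each line. The function names, the sample listing, and the allowlist are made up for illustration, and the actual lock or power-down step is left out. As blibble notes above, vendor and product IDs can be spoofed, so this only catches the careless case.

```python
import re

# Matches the "ID vvvv:pppp" field in a line of lsusb-style output
# (assumed format; adjust if your tool prints something different).
_ID_RE = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def parse_usb_ids(lsusb_text):
    """Return the set of 'vendor:product' IDs found in lsusb-style text."""
    return {f"{v.lower()}:{p.lower()}" for v, p in _ID_RE.findall(lsusb_text)}


def untrusted_devices(lsusb_text, trusted_ids):
    """Return the sorted IDs that are plugged in but not on the allowlist."""
    trusted = {t.lower() for t in trusted_ids}
    return sorted(parse_usb_ids(lsusb_text) - trusted)


# Illustrative listing only; not captured from a real machine.
SAMPLE = """\
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 004: ID 046d:c077 Logitech, Inc. Mouse
Bus 001 Device 007: ID 0781:5567 SanDisk Corp. Cruzer Blade
"""

# Hypothetical allowlist: the root hub and the mouse.
TRUSTED = {"1d6b:0002", "046d:c077"}

if __name__ == "__main__":
    # Anything printed here would be the trigger for lock / power down.
    print(untrusted_devices(SAMPLE, TRUSTED))
```

In practice this would run on a hotplug event rather than by polling, but how to hook that up depends on the OS and is not shown here.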
| icedchai wrote: | So are they keeping a stock of laptops from the 90's? | Basically no modern laptops have PS2 ports. | reaperducer wrote: | Order enough of them, and manufacturers will give you | whatever you want. | icedchai wrote: | If you're big enough, sure. Why not order laptops without | USB ports, instead of epoxying them then? | ThePadawan wrote: | That is a policy I heard to be used in already not- | extremely-secure environments like software development at | a bank (completely isolated from production environment). | | They didn't go so far as to cause alarms on unknown device | ids, but devices would just not be mounted if they were not | whitelisted. | walrus01 wrote: | About 13-14 years ago some parts of the US DoD resorted | to hot glue gun filling all the usb ports on desktop PCs, | except for the two ports required for the keyboard and | mouse. | | This was during the windows XP era when it seemed there | were an endless number of security problems related to | usb devices, no matter how good the group policy and | registry settings pushed via active directory membership | were. | sysadmindotfail wrote: | >About 13-14 years ago some parts of the US DoD resorted | to hot glue gun filling all the usb ports on desktop PCs, | except for the two ports required for the keyboard and | mouse. | | Here's a current story: | | Someone ordered the wrong desk phones at your large | company? | | 1.) Assemble your crew. Go to various departments and | recruit non-technical people. | | 2.) Task them with disassembling 1000 desk phones. | | 3.) Hot glue USB port on phone shut. | | 4.) Reassemble 1000 desk phones. | sneak wrote: | Is the disassembly and reassembly just for more billable | hours? Seems to me you could fill user-accessible USB | ports with hot glue without it, same as a user could fill | it with an unauthorized USB device. | kevin_thibedeau wrote: | My company stayed on NT4 until 2008 because it didn't | have USB support. 
Network was fully locked down and any | unknown MAC would cause an immediate search by IT. | shifto wrote: | Also, performance must have been amazing using Office '97 | on current day desktops. | noir_lord wrote: | Did they also remove the MAC address info off the back of | everything? Because spoofing a MAC is fairly trivial. | icedchai wrote: | They probably did. The sort of IT folks that would run a | decade old OS are the same kind that would resort to this | sort of security theater to "lock down" their network. | Capturing MAC addresses off a device is pretty simple if | you don't mind a little bit of connectivity loss during | the process. | swalsh wrote: | What does that solve though? I don't NEED a mouse to copy | data. | jrott wrote: | It doesn't solve for an outsider or malicious employee | getting access to a machine. What it does solve for is an | employee plugging in a compromised usb device on accident | since they probably won't unplug their keyboard or mouse | for it. | dhosek wrote: | It solves the "I found this USB stick in the parking lot | --let me plug it in to see what's on it" problem. | strbean wrote: | They'll just unplug the mouse and plug in the drive to | see what happens! | icedchai wrote: | Sure, if they don't have a USB hub sitting around. | muttled wrote: | If my experience with users holds true, they'll abandon | the quest at the first obstacle and the USB will | harmlessly sit in a desk drawer for the rest of time. | dhosek wrote: | The closest thing to a USB hub I've got is one of the | external drives for my Mac Mini, which has a built in USB hub so | I can plug stuff into that as well as directly into the | computer. The last time I worried about such things was | back when desktop computers only had one or two USB | ports. Plus, in a DoD situation, I'd imagine that having | your own USB hub plugged into a DoD computer would be the | kind of thing that could put your job at risk.
A friend | who teaches at the Naval War College often laments the | unusability of DoD IT because of the level of locking | down, but any "Why don't you do X?" suggestions have a | response of "I'd get fired." | | The safeguard doesn't need to be perfect, it just has to | be good enough. | acdha wrote: | It solves two problems: one is someone covertly or | foolishly plugging in an untrusted USB device (which | might be easily missed on, say, the back of a desktop) | and it means that checking to make sure that only a | keyboard or mouse are attached is as simple as putting | tamper-evident seals on those cables. | | Attempting to authenticate USB devices is a very hard | problem -- a sufficiently advanced attacker can spoof | manufacturer and device IDs, even if you lock things down | to prevent anything other than a keyboard or mouse it's | possible to send keystrokes to open the wrong website, | there's always a chance of an exploitable flaw in your | USB stack, etc. -- but anyone diligent can be paid to | walk around every week checking to make sure that a seal | is solid and the tamper-evident stickers have the same | serial number as listed on the inventory. There is a real | value in having things where the failure modes are | obvious and intuitive. | sramam wrote: | I'd think guardrails like this also serve at a | psychological level - as in "this is a secure machine, | don't try to break rules". | | While these second order effects are immeasurable, they | are quite tangible in my personal experience. | huhtenberg wrote: | They could've glued ALL usb ports and simply plugged mice | and keyboards into PS/2 sockets. | 3pt14159 wrote: | That's what my alma mater, the University of Waterloo, | did for some of our labs when I attended. Then at some | point something must have happened and they moved _all_ | the electronics into the PC case and only the wires of | the mouse, keyboard, and monitor came out of these little | openings. 
| withinboredom wrote: | There was a virus directed at DoD machines going around | via USB devices. PITA to get rid of too... | bnastic wrote: | I have not yet seen this implemented anywhere in banks. | HID devices are fine, but anything else USB (esp. | storage) is locked out completely. One of those banks | wouldn't even let temp staff send emails out of the bank | from their work account. | | (Due to various disability acts they can't really do it | either, as the employer must provide their staff with | hardware they require, e.g. ergonomic keyboards and mice) | ThePadawan wrote: | That sounds really the wrong way around - the worst | offenders in USB malware surely are flash drives that | declare themselves as keyboards and input preprogrammed | keyboard events (like the USB Rubber Ducky [0])! | | (For your parenthetical I should clarify - it wasn't the | case that it was _impossible_ to whitelist other devices, | it just had to be done on a case-by-case basis. I.e. you | would call IT and say "Jen from accounting at machine | foo123 needs her new ergonomic mouse to be recognized" | and they would remote in, tell Jen to unplug and replug | the device and whitelist that exact USB device id on that | exact machine.) | | [0] https://shop.hak5.org/products/usb-rubber-ducky-deluxe | bnastic wrote: | It may be so, but I'm talking from experience - as a | keyboard geek I have, over the past ten years, taken all | sorts of weird keyboards (and mice) into various big | banks with not a hint of trouble. USB storage, on the | other hand, qualifies for an instant termination.
| jcrawfordor wrote: | I once basically spent a summer doing this, not over a parking | lot but to consolidate the remaining equipment in a large number | of racks into a few new ones - this was a former sales office of | a megacorporation that had been built to have its 1970s-era | computer room proudly displayed through windows into the main | conference room, a very weird setup without the context that in | said '70s that conference room was used to pitch prospective | customers on business automation. | | Anyway, by the time I was there it was still a '70s-vintage large | computer room but now massively overprovisioned on space, | cooling, etc, particularly with most IT functions having moved to | corporate. A decision was made to repurpose part of it as a test | lab and move all the actual remaining equipment to three racks in | the corner. | | I'd do about two servers a day in between other things, taking | advantage of redundant power supplies to transfer the PSUs one at | a time to extension cords, swap to a long network cable fast | enough that TCP sessions probably didn't time out, and then | unrack onto a hydraulic lift cart and do the same procedure the | other way. | | I presented this at the start as far from a guaranteed strategy - | that it would minimize downtime but there would inevitably be | some due to mistakes. None of this was really that critical. | There were a few devices that were pretty old and poorly | maintained, we agreed up front that if these lost power for some | reason and then failed to boot, we would just say they'd lived | long lives and purchase replacements. | | I guess the point is that this whole situation was kind of | unusual and I would generally _not_ recommend doing this, we were | lucky that all the equipment left had stakeholders that | acknowledged it was legacy stuff and they could tolerate losing | it. | | The irony is, of course, that it went perfectly.
So far as I know | there was not a single problem experienced through the whole | thing. I even managed to swap the phone lines to the | (surprisingly busy!) legacy fax server when each was out of use. | nobrains wrote: | I don't know. If the "boss" was charged "4.5 hours of work, 2 | hours of consultancy, and 4.5 hours of consultant", and assuming | he would have been charged half of that with downtime, maybe the | boss did get a good deal. We don't know the cost of downtime for | him. | | I mean if he had access to technical resources who were willing | and capable to do this for him, he chose to do it. | notwhereyouare wrote: | I personally find it hard to believe that a rough estimate of | $450 for the job (spitballing $45/hr for 10 hours) is less than | 5 minutes of downtime and they only have 1 server. | | Then again, could _easily_ be wrong | nobrains wrote: | You cannot compare it to zero. You have to compare it to the | cost of doing it with the downtime. There would be cost to | that as well. It will not be free. | moduspol wrote: | It's also possible that "downtime" has different meanings to | different people. The client may be seeing "downtime" as the | net result of what happened the last few times the server was | "down," which could have been for any number of reasons | (potentially even unrelated to the server itself). | | When you get clients describing things like this, it's possible | they've been promised things about this server before by other | consultants that didn't pan out. They don't want to give you | the full details because then you'll recommend a different | route that they don't want to take (justifiably or not). | | It's easier for them to frame the problem to a consultant in a | way that allows for only one potential solution, even if | perhaps better ones exist, because the guy in charge of making | the decision isn't technically skilled enough to assess whether | others proposed by consultants are as viable. 
| | And, of course, one might read a little into why there exists a | "boss" with such a highly-critical IT need that is hiring a | consultant to do work like this, and thinks that threatening to | not pay at all if there is any downtime is the best way to do | it. | | I mean, what if they opened the door to this closet and it | grazed a power cable on the floor and the machine just shut | off? Why even bother staying around to bring things back up? It | wasn't your fault and there's already downtime: you're not | getting paid. | closetohome wrote: | Someone upthread was talking about how, as a Salesman, you | have to read the room and know how to talk to clients. I did | that for awhile, and always got a lot of mileage out of | asking the customer what they ultimately wanted to | _accomplish_ , which usually revealed that what they were | asking for was a solution to a self-made problem, and there | was a better alternative altogether. | BrianB wrote: | It's been done. | https://i.cdn.turner.com/v5cache/TBS/Images/Dynamic/i439/sei... | [deleted] | growt wrote: | Setting up a new server at the new location and moving the VMs | one by one to the new server as they become idle should be | possible without downtime. But maybe there were other | requirements (like no new/additional hardware) that weren't | mentioned in the article. | D895n9o33436N42 wrote: | This reminds me of a famously obtuse and obdurate boss who asked | for things that were utterly impossible. He had delusions of | grandeur which left him convinced that he and only he was | qualified to challenge the "cheap, fast, good - pick any two" | triangle. | | Naturally, I did my best to explain the laws of physics to him, | but he wouldn't hear it. In a spectacular display of Stockholm | syndrome I did my best to appease him for four years, but, as | many of you can surely predict by this point in the story, I | failed in every possible way and eventually gave up. 
Just wish I | could have my four years back. | | I was glad to read that OP at least got paid well for his | efforts. | Tade0 wrote: | I applaud you for being able to stand four years of this. | | I usually get fired from such positions in less than two. | sleepybrett wrote: | I usually walk out about six months in if not sooner. Maybe | it's just because I spent so much time freelancing that I had | enough experience to recognize a no-win situation. | jtbayly wrote: | Seems very risky. Not something I'd want to do if minimum | downtime was the goal. One wrong piece of gravel ends up with | catastrophic failure instead of 5 minutes of downtime. | dmurray wrote: | But the goal was zero downtime, not minimum downtime. The | client made it clear that 5 minutes of downtime was equivalent | to catastrophic failure. So they correctly found a solution | that reduced the chance of "5 minutes of downtime", at the | expense of an increased risk of catastrophic failure. | jtbayly wrote: | I understand that. I just doubt that the risk was worth it, | if downtime is such a big deal. | pengaru wrote: | Decades ago working in a sysadmin role at a hosting company I had | a similar situation. | | The solution I came up with was to fashion a custom male<->male | power cord, like a gender changer, from some broken ATX PSU | scraps we had laying around. By rearranging the power sockets | from multiple donors, two male power cords could be connected on | a single enclosure. Internally the sockets were simply bridged, | otherwise the PSU was basically gutted. | | With this goofy metal box having two male power cords dangling | from it in hand, I just used a very long extension cord plugged | into an outlet on the same AC phase as the existing server's | power source. The extension cord powered one of the bridge cords. | The other bridge cord plugged into the server's existing - and | hot - power strip, forming a redundant power source. 
Now the | power strip could be unplugged from the primary power source | without losing power, and we just moved the server to the new | location with the bridge box and power strip in tow. | | If memory serves the only tricky part was determining which | outlet at the new home was on a compatible circuit. We didn't | have much in the way of electronics tools, no oscilloscopes or | anything. Even the soldering involved to make the bridge box was | done using my personal soldering iron, which just happened to be | in the office because some of us raced RC cars there after hours. | | I think I just used an incandescent desk lamp to verify a normal | brightness on the bridged circuit before proceeding with the | server, but it's been a while. | | I wonder how many people have fashioned AC power cord gender | changers throughout history... :) | rafaelturk wrote: | Fun reading. But my advice is never to accept a job like this. This | could easily become 2 weeks of downtime. | xyst wrote: | Why wouldn't cloning the VMs to a second server, then splitting the | traffic between the primary and secondary server work? Once | traffic to the second server is confirmed, you could shut off the | primary server and haul it off to the new location. | | I would probably still charge a much higher rate since the owner | was an arse, but at least you would get back your 7-8 hours. | AnIdiotOnTheNet wrote: | Not all services can be load balanced in this way. | | Live migration of VMs would have been a better option, which | was brought up in the reddit comments and dismissed because | Hyper-V live migration is spotty. While I'd have to agree with | that assessment, it isn't so spotty that what they actually did | was less risky. | blcArmadillo wrote: | It sounds like there was no second server. | morphogenesis wrote: | Database inconsistency for one thing. This works for frontend | web services but how do you reconcile the writes between the | two servers?
| viraptor wrote: | You're making assumptions about what's running on the servers. | Let's say it's a VoIP conference server with a shared dedicated | room - effectively you have an ongoing session shared between | multiple connections and you cannot stop it. Or you have | stateful local processing so you can't "split the traffic". Or | a number of other limitations... | Zenst wrote: | Interesting story, and one that has played out a few times; I'm | aware of a couple that match it almost verbatim. Another used power | extension leads to cover power. The key is having systems with dual power | units (most servers have them) and networking, so you can switch from one | run to another. | | But I have known some large companies that have, in their history, | done things like this and other creative solutions to impossible | problems. | Reedx wrote: | That reminds me of the Pixar incident where Toy Story 2 was | accidentally deleted while in production and there were no working | backups. | | Luckily one employee was working from home (rare at the time!) | and had a copy of the entire movie on her desktop computer. Which | they _very carefully_ moved back to the office and were able to | restore from that. | | https://www.youtube.com/watch?v=7MAedEXri7c | fooblat wrote: | > Stupidest thing I've ever had to do. | | I don't really understand the "ranty" tone. The client had very | specific requirements and the author came up with an effective | solution and was fully paid to deliver it. Sounds like a win for | everyone. | some_random wrote: | The client was an asshole who demanded 100% uptime and stated | that they wouldn't pay if there was any downtime at all. The | rant is entirely justified. | swarnie_ wrote: | I see reddit so I assume this is the sysadmin subreddit? | | They're famous for not being a cheery bunch. Because reddit's | demographic does swing younger the sub used to be filled with | endless posts about being socially incompetent or possessing 0 | business craft.
| | Does anyone know if it improved? | ianhawes wrote: | I believe the expectation of having 0 downtime was not | expressed until the day of the transfer. | dna_polymerase wrote: | In addition to that the customer runs a single server but | expects the guys to maintain a property not even feasible at | Google scale: Zero downtime. Overall the whole thing was just | ridiculous, but luckily the customer got a nice bill in the | end. | pc86 wrote: | To be fair, Google maintains zero downtime for small time | scales like this _a lot_. Most of the time, actually. | kevincox wrote: | Are you talking about Google Compute Engine? In that case | yes, because by default VMs are live migrated between | physical hosts. This can be done for scheduled maintenance | or upon signs that the machine is likely to fail. | Furthermore there are no physical disks for a GCE VM, | which is one of the more common failure points. The | result of this is that GCE VMs often survive for months | or years without downtime. Note that the SLA allows more | than 3 hours of downtime per month. | https://cloud.google.com/compute/sla | | For physical servers the uptime is typically quite small. | Of course Google isn't optimizing for server uptime so it | isn't fair to say "well even Google can't do it". | sokoloff wrote: | I'm having a hard time following what "zero downtime most | of the time, actually" really means. | | https://m.youtube.com/watch?v=IKiSPUc2Jck&t=81s | im3w1l wrote: | It means that downtime is chunky. | buran77 wrote: | > zero downtime for small time scales... Most of the time | | I read it as "if you take small enough discrete time | intervals they won't overlap with any downtime". Or in | other words "no downtime between downtimes". Yes, it's | very in line with your video. | intpx wrote: | Probably because proper architecture (clustering, HA etc) and | planning would have never made this an issue.
This is still an | extremely risky operation, hot swapping power and switching | interfaces on the fly all while sitting on a cart in a | corridor. In any disruptive work there is never a guarantee of | no downtime for affected assets. I know the OP came in as a | consultant, but if I was the MSP tech, I would have demanded a | paper trail a mile long to cover my ass if this went sideways | and if I was the account manager for the client, I would have | refused the work. It's not good business to agree to do work | where you know there is a better than good chance there will be | an outage and your client is saying they won't pay if there is | an outage. Even agreeing to it puts you in a bad spot for | future work. I guess as an outside consultant, bewilderment is | a better reaction than ranting, but this is the kind of shit | that drives ops folks crazy. | dspillett wrote: | _> planning would have never made this an issue_ | | Hard work is wonderful stuff. Days and weeks of it can save | you whole hours of planning. | jabroni_salad wrote: | My SOWs leave zero room for 'and it will go flawlessly or you | won't get paid at all'. If you are occupying my time in a way | that makes me unable to serve other clients you will | definitely pay for it. | addHocker wrote: | The whole setup is risky. And a customer who demands zero | downtime while driving the price down on the setup sounds stupid | from the start. | jedimastert wrote: | > Me: You didn't notify them of scheduled maintenance like we | discussed on Friday? | | It appears that the client _didn't_ have the specific | requirements on initial consult. | throwaway0a5e wrote: | Reddit (for reasons related to user demographics and feedback | loops) rewards certain types of writing and implied viewpoints. | Following best practices and rules is one of those things.
This | server migration clearly runs counter to established wisdom so | OP using a writing style of "look how terrible and asinine this | was" will be rewarded and gain traction much more than a "look | how interesting this was" writing style. | user5994461 wrote: | It's reddit /sysadmin, the channel is dedicated to rants and | horrible experiences from sysadmin and helpdesk folks. | | It's quite sad IMO, don't recommend to go there unless you | want to have a bad day reading about the most horrific work | environments and bad practices in the world. | DoreenMichele wrote: | Perhaps somewhat similarly, r/TalesFromRetail is devoted to | kvetching about your job in the retail sector, but it's | really not a depressing place. There are a lot of rules and | expectations about how you tell your story. You aren't | supposed to outright dox anyone or veer into genuine trash | talk. | | It's not supposed to be negative per se. It's supposed to | be entertaining. | | It's an art form. It's not everyone's cup of tea, just like | horror isn't everyone's cup of tea. But people often watch | horror movies for catharsis, not because they want to be | depressed and wallowing in self pity. | | Storytelling is often about educating people about things | you can't speak about more directly. It's often a way of | sharing wisdom in an inoffensive manner and one that will | stick because people will actually pay attention, unlike | when you are giving them some dry lecture about some | problem they haven't yet had and don't yet care about. | | But if you entertain them, they will read it anyway and | that story may stick with them. And then six months or a | year later when they have the same problem, they will | actually remember how someone else handled the same issue | and it will turn a potentially nightmarish scenario into | "Meh, I just did the same thing that guy on Reddit did to | his shitty boss/customer/coworker. Worked like a charm. | Moving on." 
| me_me_me wrote: | <any group of 2 or more people> (for reasons related to user | demographics and feedback loops) rewards certain types of | writing and implied viewpoints. | | This is literally the basis of human interactions, that's how | we humans work at every scale to form | friendships/families/societies/nations. | dspillett wrote: | Correct, though I don't think the comment you replied to | intended to imply that such pressures & rewards didn't | exist elsewhere or that this particular outcome was either | general or not. | | It just stated that the specific pressures and rewards | present in most reddit communities tend to encourage this | specific style of writing. | me_me_me wrote: | Maybe you are right and it was just a plain statement. | But it sounded quite snarky to me. As if it was | condescending to reddit for biases a given subreddit might | have, as if HN has none. | DoreenMichele wrote: | ^ | | The flair for the piece is "Rant." That's an official | category for the sub. There are going to be expectations | surrounding how you write when using a tag like that. | aphrax wrote: | It's funny you mention the rewards on certain types of writing | within Reddit. I was thinking about it the other day & | couldn't quite put my finger on why I dislike a lot of the | stuff on there - even across Reddits. I think this is | probably the cause... | fortran77 wrote: | > Reddit (for reasons related to user demographics and | feedback loops) rewards certain types of writing and implied | viewpoints. | | Just like Hacker News! Here's a clue--it's in a subreddit | where these types of stories are welcome. | social_quotient wrote: | Slight topic drift - Any thoughts on how the pandemic might | materially change assumptions about onsite/onprem being better | than cloud or a managed data center when the code people are now | actually remote to the "Local" infrastructure?
Something specific | to the reality of the pandemic strikes me as something that would | make the die-hard local-only folks start rethinking their | position. | | (Not to suggest it's bad, just different now that a primary | assumption about people working in the office is less true) | AnIdiotOnTheNet wrote: | As someone who works in a very anti-cloud company culture | (which I happen to agree with), this incident has had no effect | whatsoever on that mindset. We don't dislike cloud because it | is accessed remotely, we dislike cloud because of the lack of | control we have over everything running there. If something | happens and our local systems have a problem, there are people | here, like myself, whose highest priority will be fixing it and | second highest priority will be communicating the status of | that. Your problems are _never_ a priority to a cloud vendor | and communicating with you is even less of a priority. That's | before we even get into the absurd expenses and reliance on big | fat pipes. | ocdtrekkie wrote: | I feel a lot safer knowing I'm controlling all the variables | during a global crisis, actually. | | This article provides an example of how when you operate on | prem, literally any crazy option remains on the table for you. | If you asked your cloud provider to do this, it'd be a no. | macintux wrote: | At my first job we were starting up the company and didn't really | know what we were doing; one early server was sitting on a | folding table and its power cord was wrapped around a leg, so | just replacing the table with something more robust involved | downtime. | em-bee wrote: | The careful application of a saw or an angle grinder would have | made it possible to remove the folding table without unplugging | the power cord. :-) | crumpled wrote: | I heard an anecdote about a company splicing some fiber cable | in the middle of a utility van and having to cut the van apart | at the end.
| elliotpage wrote: | I've been there, solidarity for cheap-furniture-based | maintenance windows. | Humphrey wrote: | I haven't read the article, but I'm reminded of that episode of | Seinfeld and the Frogger arcade game. | dfsegoat wrote: | I have pondered this exact scenario (server move w/0 downtime) | - because of watching that episode - wouldn't have thought | about it otherwise. | | ..It's interesting how pop-culture and your chosen profession | intersect, at times. | chisleu wrote: | I didn't want to lose my many months of uptime for a LAN party | back in 1999/2000 and we used the UPS to migrate my Linux box | across town for some Quake 3 Arena action. | | Things were so much simpler back then. | devchix wrote: | I recall sometime in the mid 2000s there was a fever for | achieving five-9s (99.999% uptime, I think -- it became fodder | for a few episodes of Mr. Robot). Not that the metric ever went | away, but back then a lot of BigIron(TM) vendors advertised | achieving five-9s by replacing hardware while the OS remained | running and continuing service. Sun 15K and 25K series (Gilfoyle | had a used one in the garage running his network) were behemoths | whose mem/cpu boards you could swap out wholesale while the | entire frame and backplane were powered on, and while the OS the | board came out of remained functioning. There were many caveats | around the procedure but it worked. Execs and sales guys loved | those demos. These monsters were expensive and banks and energy | conglomerates were buying them by the dozens. There was also a | big to-do about hot-swappable drives. The idea that you could be | doing hardware maintenance while the machine was still running | was a novelty, something like brain surgery while the patient was | not only awake, but awake and eating, driving his car, talking on | the phone, etc. | | A decade later I look back with deep surprise that we didn't | think to abstract out the service instead of the hardware.
I | don't know how many of those behemoths are still being bought; | now I work almost exclusively with small server instances that | can come and go on the fly. Micro services and AWS have taken | five-9s in a different direction. I frequently think of Sun as a | failed Hephaestus: in a Christopher Nolan film he would be | brilliant but could only turn out clumsy tools because of his | deformity; he hates the things he makes so he throws them away | before completion. Men find these cast-offs and temper and refine | them. | chasd00 wrote: | I remember those. A friend of mine was a network engineer at a | local datacenter (UUNET, then MCI pre-scandal) and said | companies were buying Suns for everything no matter how | trivial. | | He worked a night shift and I used to go hang out with him in | the NOC and download movies (residential bandwidth was not what | it is today). Odd that neither he nor I ever got in any trouble for that, | heh. | tlack wrote: | Well as I recall there were a few reasons that people focused | on reliability in hardware in the late 90s: | | 1. Shared state storage systems that supported replication were | rare (I think Oracle and Informix maybe?) | | 2. Virtualization software was in its infancy (did SunOS have | something before Solaris?) | | 3. RAM and hardware were waaaaay more expensive, meaning you | often had to buy more pure metal just to answer questions fast | enough | | At least that's my take on it based on my dim faded memories | bcrosby95 wrote: | AWS single region SLA isn't 5 9s though. If you want 5 9s in | the cloud, not even multiple AZ is enough - you need to go | multi region or even multi cloud. | closetohome wrote: | > Multicloud | | This sounds like something they'd make up on NCIS. | theevilsharpie wrote: | > [In the mid-2000's], a lot of BigIron(TM) vendors advertised | achieving five-9s by replacing hardware while the OS remained | running and continuing service...
A decade later I look back | with deep surprise that we didn't think to abstract out the | service instead of the hardware.... Micro services and AWS have | taken five-9s in a different direction. | | In the mid-2000s, enterprises were (and in many cases, still | are) running proprietary software with proprietary RPC | protocols that had no available source code or other means of | modification, and most had no support for application-level | high availability, access control, or any other operational | quality-of-life feature that people take for granted today. | Rather, that functionality was handled at the infrastructure | level, through things like the aforementioned Big Iron. | | The world looks different today, but those machines made sense | for the environment at the time. | ClumsyPilot wrote: | I think it kind of makes sense in general, and the only | question is whether it could be achieved at lower cost. | The complexity of today's commodity machines is comparable to the big | iron of yesteryear. | larrik wrote: | > a lot of BigIron(TM) vendors advertised achieving five-9s by | replacing hardware while the OS remained running and continuing | service. | | AS/400s were capable of that in the 90s (possibly the 80s as | well). Heck, they'd call IBM for replacement parts on their | own. You'd show up for work and there'd be an IBM guy waiting | to be let in. He'd swap out a part with no downtime, and be | gone. I've seen machines with uptimes of over a decade with | zero on-site IT. | blhack wrote: | We had one of these at an old office of mine. I actually | think it's really cool. | pmiller2 wrote: | I recall working on some Sun machines with hot-swappable CPUs | (and, I assume, disks and other peripherals). If they somehow | made memory hot swappable (I'm sure it's possible, just | uncommon and/or verrry expensive), with hot swap CPUs and | disks, and redundant power supplies, you could tear the | machine half apart and it would still keep running.
Of | course, at that point, once everything is hot swappable, | there are generally multiples of everything, so your one | machine is really more like multiple machines inside one box | than a single discrete machine. | g051051 wrote: | I guess servers have gotten a lot more robust in the last | decade...there's no way any server I ever managed would survive | something like that. | maeln wrote: | A lot of servers are SSD-only these days, which makes them less | fragile. Still, I really wouldn't see myself pushing a running | server in a cart. | tyingq wrote: | Yeah, there are certainly still things like riser cards and | connectors that could come unseated due to vibration. | marcosdumay wrote: | That's probably a problem for the next guy that takes an | ops job there. Loose pieces often don't disconnect right at | the same instant, and even when they do, memory caches | usually postpone the failures. | throwaway744678 wrote: | On a parking lot, no less! Let's hope it will not rain on the | way! | em-bee wrote: | umbrellas over the switches and the cart... | | (means extra billable hours for the extra man-hours needed | to hold the umbrellas) | rafaelturk wrote: | Pictures please! | fredley wrote: | This is the kind of content I've only ever seen previously in | TDWTF (which is entirely this sort of content...) | | https://thedailywtf.com/ | anfractuosity wrote: | Reminds me of this - https://www.youtube.com/watch?v=vQ5MA685ApE | | 'Moving online webserver using public transport' | salzig wrote: | had the same thought :) | zomglings wrote: | In the rain! | tyingq wrote: | The Indiana Bell building move is pretty impressive. | http://www.paul-f.com/ibmove.html | joncrane wrote: | Wow this is literally the exact same thing as the OP but for | an entire building. Insane.
| sschueller wrote: | https://www.youtube.com/watch?v=CNqul9TfJwI | cpuguy83 wrote: | I'm picturing that Seinfeld episode where George tries to move | the Frogger arcade from a restaurant that is shutting down but | doesn't want to lose his high score. | codingdave wrote: | I'm surprised that part of the story wasn't to drill down into | the requirements. No downtime ever? Not even at 3 AM on a | Saturday? | | I've found that when people are being unreasonable it is because | they haven't split out their true needs from their first idea of | how to meet those needs. In this case the true need is zero | impact to users. The owner translated that to "zero downtime", | and then didn't accept alternative solutions that still would | have met his true business need. | neycoda wrote: | I wonder if there was any legit reason to require no downtime. | Otherwise the owner doesn't understand what downtime means for | his business. | ericyan wrote: | The consultants really should have told the client that if all you | have is a single server then there is no such thing as "zero | downtime". | robin_reala wrote: | I always remember this post by the Amsterdam Police who managed | to maintain their uptime on a VMS cluster despite moving data | centres in the middle: | http://web.archive.org/web/20120229042903/http://www.openvms... | JeroenKnoops1 wrote: | Reminds me of the OpenVMS clusters.. Police in Amsterdam | celebrated in 2007 an uptime of 10 years of their cluster. In | this period, all hardware was replaced, and half of it was moved | to another location 7 km away. All data moved from DAS disks to | SAN without one application needing to be stopped. Also VMS was | upgraded from 6.2 to 7.3-2. The VMS cluster did not go down | during all of these changes. I <3 OpenVMS | tandr wrote: | Would be interesting to know if it is still up and running? | JeroenKnoops1 wrote: | During Y2K I also had to shut down various OpenVMS servers | with uptime over 10 years...
Only because of company policies, | not because OpenVMS required the reboot. | umarniz wrote: | Interesting read, makes me wonder as a thought experiment if it | counts as downtime if the latency of commands on the machines | rises to 5 minutes? | | You could clone the VM to another instance and record commands | going to VM1 and replay them to VM2 after 5 minutes. | | This whole brain fart of mine doesn't make much sense but if you | play along with it, does it still count as downtime or just | very high latency? | NateEag wrote: | It depends on how downtime is defined in the contract. | | That sounds like I'm being snarky but I mean it - whether an | actual legal contract or just the documentation given to users, | any system where downtime matters should have some discussion | of what impacts downtime can have and how it's measured and | managed. | | That documentation is what defines "downtime". | | I'll add that what you've described is a sort of low-fi manual | version of DB replication | (https://en.m.wikipedia.org/wiki/Replication_(computing)). | pc86 wrote: | Wouldn't requests time out on the client side long before five | minutes? | heavenlyblue wrote: | I don't know whether it's the software in general, but ever | since I started using Three 4G broadband in the UK, all of | the software has started behaving really weirdly (lots of | lockups, hangs, etc). Apps often need to be restarted. | | If you do a ping during "bad weather", you can see that they | buffer up to 5 minutes of packets (i.e. there will be no | communication for some time, then you'll receive a bunch of | them with a huge latency with sequences intact). | | So I would assume a lot of software could even work that way. | I think a lot of software doesn't set any (TCP) timeouts at | all. | tyingq wrote: | That works where you have control over all of the timeouts and | failure detection at every level and layer. TCP keepalives, for | example, could thwart you.
Or client side timeouts, or firewall | connection state tables, etc. | | 5 minutes of unplanned downtime in a pub/sub setup could easily | go unnoticed, since that setup is typically tuned for long | timeouts and/or repeated retries. | tobyhinloopen wrote: | 10 hours investment for no downtime seems like a good deal for | the owner | topkai22 wrote: | Depends on if he really has customers accessing the system "all | the time." | | Besides, as pretty much everyone has noted, running a zero- | downtime system on a single physical machine in what sounds | like just a normal cable room is kind of nuts. Those 10h | would have been much better spent moving that puppy to someone | else's data center and getting some redundancy. | | Although reading between the lines, maybe the lease was up and | they were waiting until the last minute to move it. | mercora wrote: | When I was younger I was super proud that I could replace my disk | while I kept working on the device. I would put the new disk into | my LVM volume group, move all extents to the new disk, and drop | the old disk out of the VG afterwards. When done, I could just | unplug it without halting work, except for kicking off | the process. | akssri wrote: | Was it George Costanza? | | https://m.youtube.com/watch?v=a-FbktgqCqY | user5994461 wrote: | I am so scared to imagine what would happen if there was any | issue during the move (very likely when dragging live cables and | power over hundreds of meters). | | The client would immediately refuse to pay anything because he | was very clear he wouldn't pay a thing if there was downtime. | | Then, the next contractor would be super quick to judge you and | the situation, reinforcing that you were an incompetent idiot and | the client was right to kick you away on the spot and not pay a | dime. | | Glad it went well in the end. There is so much to lose for the | person trying to help. | willcipriano wrote: | This is a junior sysadmin I suspect.
With a bit more experience | you'd learn to say something along the lines of "no downtime, | sure, that will be 30 grand" and the ability for downtime will | suddenly materialize. Him and his friend did this big song and | dance, took a huge risk and only got paid for ten hours worth | of work in the end. | zrail wrote: | $30k, 80% up front, strict liability waiver that says I'm not | responsible for loss of business or anything else if there is | downtime. | imtringued wrote: | You can't get paid upfront and at the same time get a | liability waiver. For a 100% guarantee with full liability | $30k doesn't actually sound ridiculous because it would | require obtaining 100% identical hardware and doing at | least one test run on that hardware before actually doing | it on the production hardware. What the contractor did is | basically "wing it", explain a way to get zero downtime to | the client and then not actually offer a guarantee by doing | the operation straight on the production hardware. Really | this was more about convincing (ie bullshitting your way | through) the client to let you do the work than actually | doing it properly and for a huge sum of money. It wouldn't | surprise me if there was actual downtime for a few seconds | and the client simply didn't notice it. | pedrocr wrote: | Now you're over-charging massively. If you have no | liability and are guaranteed pay, charging for just double | hourly rate is more than enough as a "stupid and non- | standard requirements" kind of thing. | cellularmitosis wrote: | > sure, that will be 30 grand | | I am having trouble finding a reference to it now, but I've | heard patio11 refer to this as "the Japanese no". Don't ever | say "no" directly, just quote an astronomical price. | x86_64Ubuntu wrote: | People in the trades world do it too. If a job won't | provide the margin they are seeking, or the job is more | difficult than it's worth they will up the price. 
If the | consumer chooses them to do the job, it's at a pricepoint | that's worth the trouble but they are really hoping to be | passed over. | briankelly wrote: | My dad's in construction and frequently gives out "fuck | off" quotes. It isn't so rare that the client accepts | them. | coldcode wrote: | The Rolling Stones tried that with Microsoft and "Start | Me Up", they quoted what they thought was ridiculous | $10M. Microsoft said sure, no problem. | vntok wrote: | $10M is a debunked urban legend; the actual figure is | only $3M, which is pretty standard. Microsoft's whole ad | campaign for Win95 cost about $200M after all. | Strom wrote: | That's a fun story. Looking more into it, it seems that | $10M is based on rumors and it was more likely $3M. [1] | Doesn't change the point of the story though. | | -- | | [1] https://www.networkworld.com/article/2220097/what- | microsoft-... | toss1 wrote: | That works. | | A friend of mine with a consulting biz was requested by IBM | to handle a job in Turkey. He didn't want the gig & told | them so repeatedly. He finally decided to tell them the | most ridiculous price he could think of (like appending two | zeros to the number). He said they didn't even flinch and | he was on the plane to Turkey the next week for six months. | But he did say that it was pretty much worth it in the end | (but only because of the pricing). | pbronez wrote: | Welcome to the market economy! | | Seriously, this sort of dynamic is why the world works as | well as it does. | toss1 wrote: | Yup, market economies are fantastic for rapid resource | allocation! | | Yet they are not a panacea. 
| | They suck at preventing problems related to: | | * tragedy of the commons - tend to create & magnify it | | * long-term disaster planning / tail risk - e.g., | stockpiling resources for natural disasters, pandemics, | etc., | | * preventing foolish development, e.g., on cheap land | subject to flooding | | * self-creating safety systems for workers, consumers, | environments, etc. - left to their own devices, markets | always do too little, too late | | Market systems literally often need to be saved from | themselves, e.g., when overfishing will literally kill an | industry by driving extinct the very thing it depends | upon | namibj wrote: | Actually, stockpiling does happen if there are no laws | against price gouging. Because that's how the capital | bound in the stockpile gets its ROI. | toss1 wrote: | I hope you are not seriously suggesting making price | gouging in disasters legal as a method of preparation. | | Price gouging is nowhere near as reliable a method of | disaster preparation as actual expert planning. | | The stockpiles you speak of are usually just ordinary | current inventory marked up by an order(s) of magnitude. | | Also, stockpiling goods is not the only thing needed for | disaster preparation. One must also stockpile services, | i.e., have the right people recruited, trained, equipped, | and ready to respond. Prime examples are military and | firefighters, who spend much time & resources training, | and little time actually fighting the wars or fires. | vorpalhex wrote: | I had a client who wanted me to write some code in Adobe | Coldfusion of all things. Not wanting to say no to an | otherwise good client, I quoted some insane hourly. | | And now I know that Coldfusion is absolutely miserable to | code in (and the client tried to dodge their bills!). | ericlewis wrote: | my grandfather called this the "asshole quote" | patio11 wrote: | I don't believe I've said that.
| | For what it is worth, if a customer of my previous | (salaryman-heavy) employer asked for this, we'd tell them | an _actual_ no, which is extremely rare in client | relationships in Japan. A contextually appropriate "no" | for something which is less absurdly wasteful of | engineering time to no purpose would be "That sounds | difficult. We could explore options to do it, but perhaps | you could accept an hour of downtime in the dead of night" | then bargain down to 15 minutes. | pmiller2 wrote: | I've heard people say that the right way to say 'no' in | Japanese is more along the lines of "it is very difficult." | I have no idea how much linguistic truth there is to that, | but it definitely rings true culturally. | yourapostasy wrote: | There is an art to this. These situations come up because | you want to continue an ongoing relationship into the | future. | | So you quote a price that is high, but not so high as to | destroy the relationship. I call it "plausibly-deniably- | high". | | You also have to gauge the context of the other party in | the negotiation. This technique works best when you | accompany the quote with some kind of description matching | the personalities. Some people are swayed by a description | of the additional time it takes (the billable hours | mentality). Others are swayed by a description of the | additional risks you are bearing on their behalf to deliver | the outcome. Still others are swayed by a description of | the _de novo_ technical challenges that no one else has | ever attempted before. The list goes on, and is a | fascinating study into people. | | This is where a real salesman (as opposed to an order- | taker) earns their keep, where they know how to read a room | and craft a response, messaging and after-meeting | socializing that takes into account all those perspectives | simultaneously from the point of view of the other party. 
| gnopgnip wrote: | 4.5 hours of a consultants billing rate can be much more than | 10 hours of your regular hourly rate working a similar job. A | good consultant will have a contract. The client saying I won't | pay if XX happens doesn't mean anything unless it was in the | contract. | | Networking/spanning tree loops, arp table mismatch/corruption, | the switches at the destination being misconfigured are all | realistic problems that would result in downtime here. The | normal way you do this is with live migration from hyper-v or | vmotion from ESXI. If the initial migration is not successful, | you just leave the server powered on while you address the | issues. Once the VM has been migrated you can do whatever you | want with the original server without worrying about downtime. | maire wrote: | This reminds me so much of when I joined vmware in 2006. | vmotion had already been around for a few years - but I | believe this was the first release of vCenter with DRS. | | A couple of months I joined, a room full of customers chewed | us out for not publishing our vmotion compatibility tables. | After 4 hours of chewing out - they then told us they reverse | engineered the compatibility tables and reorganized their | entire data center to conform to vmware vmotion. Then (of | course) we worked with intel to make sure the compatibility | matrix worked in the future. | | I realized at that point that I joined the right company. | ThePowerOfFuet wrote: | Also, moving a server with spinning disks? What could possibly | go wrong. | nwallin wrote: | Disks aren't that sensitive to motion. | | At my last job, we had 2 airplanes with 5 computers each with | 6 disks each mounted in an aircraft. These were regular | servers from Dell, not special hardened or resilient hardware | or anything. So 60 or so hard disks flying around. Takeoffs, | landings, turbulence. Two flights per day, 3 hours each | flight, 6 days per week. So 626 landings per year. 
| | Disk failures were not particularly common. | zimpenfish wrote: | As a counterpoint, I worked for a place that used Mac Minis | inside spinning displays and the hard disks absolutely did | not like it one bit. | | (They also tried spinning disk machines on buses which also | failed quickly but that was more the grime and electrical | noise than the motion, IIRC. Then they tried mini-servers | running from CF and the motion would slowly work the CF | cards out of their sockets. The company did not last long.) | nilssonanders wrote: | Reminds me of the video where they yell at hard drives and | measure disk latency. https://youtu.be/tDacjrSCeq4 | daemin wrote: | Wasn't there a story about Sun (or HP or someone like that) | where they moved a bunch of disk servers across a parking lot | to another building and found that many of them had died from | the vibrations on the trolley cart used to transport them? | znpy wrote: | it was Yahoo, IIRC. | driverdan wrote: | Spinning disks can take a surprising amount of shock and | vibration before they fail. | sschueller wrote: | I had a spinning disk in my car back before we had all these | cool embedded PCs. The disk was never an issue, these things | can take a lot of abuse (Even New England roads). I had it | mounted sideways so a large pothole wouldn't push the heads | into the platter. | sleepybrett wrote: | If there was downtime during the move and the client was there | and declaring that they would not pay, you just walk away. | You'd be surprised at how fast they can cut a check in that | situation. | linsomniac wrote: | Lower stakes, but ~15 years ago a friend had a Linux box in the | corner that had huge uptime. I want to say the uptime started | shortly after the kernel patch that fixed the 400-ish day | overflow of the uptime counter. He moved to a new home and very | carefully moved the running server using its UPS. He didn't have | to worry about keeping networking up though.
| | I used to be all about long uptimes. I eventually started seeing | long uptimes as a negative though. A long uptime probably means | patches have not been applied. | jccooper wrote: | I also did that once, about the same timeframe, specifically to | preserve an uptime. | | I think the cult of runtime came about simply because it was | impressive that a personal computer could stay running for more | than a few days when most of the world ran Win95. And because | development cycles were longer and there weren't a lot of | network threats. | panpanna wrote: | Really disappointed they didn't use a wireless network of some | kind. | lordnacho wrote: | My first thought as well. Set up WiFi along the path, basically | turn the machine into a laptop. But I think there might be a | disconnect when you change base stations? At least when I move | my laptop between rooms in my house there's often a momentary | problem while on video call. | | The other way I'd do it is more similar to described. Create | redundant network paths to the server, then cut one. | dbalatero wrote: | I wouldn't, the risk of disconnects is high. | panpanna wrote: | But the risk of sometime tripping over your looong cat6 and | breaking the network is not negligible either... | neilv wrote: | Good thing the server had two power supplies. There was a YouTube | video (which I can't immediately find) of people moving a server | across town, on the train, without powering it off, and, IIRC, | they had to splice the UPS into the power cable. | | When it's done for pay rather than for fun, and payment is | conditioned on zero downtime, I hope they charged a premium to | make up for the risk of no pay. Offhand, I don't know what's a | good way to do that -- I've never had a consulting client demand | terms like that for billed-by-the-hour work. | kijin wrote: | Effective hourly rate = base hourly rate * risk. | | Risk = client risk * task risk. 
| | Client risk is based on your past experience with the same | client. If they're prone to demand last-minute changes or | stupid stuff, they get charged a higher rate on every project | afterward. Jacking up the client risk factor is also a nice way | to fire a client you don't want. | discordance wrote: | https://www.youtube.com/watch?v=vQ5MA685ApE | tzury wrote: | I once was called in to export data from a DOS program that had | no export option. Single Author died of heart issues and the | company needed the data for the migration. | | After several attempts to understand the binary format I gave up | and ended up printing tabular reports to LPT1 which I connected | my laptop to, extracting it and rebuilding CSV files. | | Lucky enough, printing those days were the most important feature | of a business app. | shireboy wrote: | You realize the client is condescendingly mocking the guy for | saying it can't be done now, and will expect this next time they | run updates on the server, which is to say never | mercora wrote: | i wonder if its really possible to do the initial setup of the | ethernet failover without interruption. i have never done this, | but i would expect the interfaces themselves will become | unavailable for direct use and you get a completely fresh virtual | ethernet interface which represents whatever physical interface | is currently active... at least this is what happens when you add | an ethernet interface to a bridge in linux... | johnklos wrote: | I've done something like this - server running off of UPS moved | from one building in Manhattan to another about 1/4 mile away, in | snow... Not for someone with weak arms. | digitalsushi wrote: | I had to search the reddit commits for 'vmotion'. They have it | covered. | | This anecdote is an amazingly good story for telling at the pub | over a few beers. It's a terrible story for a strategy. 
| | If this is a mountain, my molehill is that one night in the late | 90s, I got paged cause the SMTP outbound server was overheating. | At midnight I drive across sleepy NH backroads, and stopped at a | Wendys to get a chicken sandwich and iced tea, for the caffeine. | | When I got to the server room, I pulled the 2U Dell server out of | the rack and discovered the CPU cooling fan had seized up. Mind | you, this is a New Hampshire data center in 1999, and it has a | filing cabinet with manilla folders, and carpeted floors. This | thing was never prepared for any disasters. | | A half hour later, the SMTP server was up and running cool again. | | I greased the fan with the mayonnaise from my sandwich. | Spooky23 wrote: | The real lesson is that the teller of the tale sort of did | initally -- fire the customer. | | If the story is true, the client is a stereotypical know-it-all | small business owner who gets by on bullying. You see them | frequently in businesses that pay low-skill workers a small | premium that is hard to replace. (ex: cleaning services, pool | guys, mechanical contractors that do low-end maintenance work, | etc) | | As a contracted SME, taking a job like this is dumb. The | chances of failure, where "failure" == the server going down is | high, and the customer will just stiff you. | donmcronald wrote: | I deal with a fair bit of customers like this (not via my | business), enough that firing them all isn't an option | because it's a large portion of the market where I am. It's | something. They'll have low end servers with no redundancy, | terrible or no backups, and no contingency plans for | anything. They won't spend a nickel and are the most likely | to lose their minds if anything goes wrong. | | It's so frustrating and stressful. | eldavido wrote: | This is why services like Google Apps or managed exchange | hosting exist. Most people are _terrible_ at IT management. | So bad they 're far, far from realizing how bad they even | are. 
| | When you consider you're getting like 1000 of the smartest | tech people in the world to manage your infrastructure for | you, for $5/month per user, it's really such a no-brainer. | If people are too stubborn to see that, or want to waste | time trying to do it better themselves "because it's | cheaper", with redundant power, OS patching, zero-downtime | changes/deploys, proper capacity planning, proper redundant | connectivity, provisioning the right network around it, | physically securing the server room, ensuring things are | properly cooled, not wet...I could go on forever, and this | isn't even the main focus of the business...I'm sorry, but | they deserve to go out of business. | | I had a very stubborn client once who ran a hotel chain. | Won't say what or where, but I wasn't surprised when their | random "security through obscurity" VNC server got | compromised. I wasn't finishing migrating to the new PCI- | DSS compliant system we built, either, so there go 5000 | credit cards "encrypted" with some sweet rot13-level | bullshit in Turbo Pascal I cracked in about 30 minutes with | no code access. | ed25519FUUU wrote: | Seems like some dedicated hardware here would make this go | faster if there's a business for it. For example, if the | bandwidth isn't high, you could set up a wireless mesh from | point A to point B and connect via some appliance to the | NIC. | | Walk the length with the appliance and verify there's no | dead spots, then just hook up to a power supply and get | things done. | Bnshsysjab wrote: | In the 90's wifi was even more of a trash fire than it is | today. | paulsutter wrote: | Key is to set the price. If the job is a hassle, you're not | charging enough. That will also filter out bozos. Sometimes | people really do have extreme requirements. And when they do, | they're willing to pay 10x for it. | | Agreed though, this particular customer disqualified himself | as soon as he said he won't pay if the server goes down.
He | should have offered a big bonus if the move succeeds without | downtime. | Dylan16807 wrote: | > Agreed though, this particular customer disqualified | himself as soon as he said he wont pay if the server goes | down. He should have offered a big bonus if the move | succeeds without downtime. | | I don't know, that sounds too close to encouraging the | attitude of "it's not worth it, I'll just take my normal | pay". 10x vs. 0x is a significantly stronger incentive than | 10x vs. 1x. | noobermin wrote: | This is probably the best compromise. Sure if you're moving | a server with, idk, top secret government spy files or | something then you charge 10x or some multiple. But that | would obviously be the exception. The vast majority of | these people are just self-entitled dipshits who have an | inflated notion of self-worth. | | See my comment elsewhere on "the customer is always right." | derefr wrote: | > Key is to set the price. If the job is a hassle, you're | not charging enough. That will also filter out bozos. | | Sure. Just, there are some verticals where charging a | "positive-ROI" amount gets you no business at all, because | _all_ the potential clients in that vertical are businesses | that operate on such razor-thin margins that they don 't | actually _have_ the cash-flow to pay for the extreme | requirements they also have. They 've been getting along | until now purely by begging/tricking/manipulating people | into doing negative-ROI one-off tasks for them. If forced | to get contract all the services they need out on the free | market, their business would cease to exist. | | (Therefore, you say, they _should_ cease to exist. I 'm not | arguing!) | rdslw wrote: | > there are some verticals where charging a "positive- | ROI" amount gets you no business at all, | | if you do, you're just selling dollar bill for 80c. You | may drink growth-kool-aid, or someday-monopoly-hope or | VC-subsidizes-business. 
| | In the end, somebody pays for it either from stupidity or | hope. | kenhwang wrote: | When I did consulting, we always got unreasonable but | technically not impossible asks like this. We never "fired" | the customer, because that's just bad business and customer | service. What we did instead was tell them their options, | our recommendation, and "appropriate" billing estimates. | Your job is to consult them to the best of your ability, | not stop them from bad decisions despite your advice. | | So, 1 hour billed for 5 minutes of downtime, or 40 if you | want absolutely none. Happy to do either, but highly | recommend the former. 99% of people pick the cheaper | recommended option. | | In this case, I would've tried to put the server on WiFi | which would seem like less of a hassle for me. Equipment | acquisition cost billed to the customer. | dialamac wrote: | There's a market for everything. The "shit customer" would | have hired someone. Sure things could have gone to shit and | they wouldn't be paid, but they didn't, and there is a living | to be had serving this market segment. Every contracting | business has difficulty with AR, stereotyping often isn't | even particularly accurate. Suffice it to say there is a | business to be had in catering primarily to difficult | customers. | mlyle wrote: | > stereotyping often isn't even particularly accurate. | | While I agree that it's difficult to predict exact credit | risk based upon customer personality, explicit threats not | to pay you like exist in the story -are- a bit of a signal | that there may be a risk of nonpayment. | kube-system wrote: | Sure, I think the point was that there _is_ a market for | customers with subprime credit. Sometimes it is fairly | profitable too. | nautilus12 wrote: | Depends on what you mean by business.
The nature of that | type of business is that it's not very predictable or | repeatable so you get one small chunk of business but | ultimately it's not the kind of business (even a | consultancy who's job it is to throw hours into the fire) | you really want. Scaling that out would be death by 1000 | papercuts. | renewiltord wrote: | Presumably at least part of the value of the job from his POV | was the excitement of it. I've definitely done suboptimal | jobs where I just enjoyed them a little and they were a break | from the replicable tasks. | | Since the customer is never going to know the awesomeness of | it, it's really just for yourself. | jjice wrote: | I'm a young developer, so I've never had the chance to work | with on prem servers (and the chances that I will are looking | slim), but I've always loved these "war" stories. | jacobsenscott wrote: | I miss the days when the servers came with seating (Cray). | mike_d wrote: | You can buy yourself a pair of old Dell servers from | craigslist or eBay for a few hundred bucks. With a $200 | membership to VMUG Advantage you'll get all the licences you | need to build an enterprise grade cluster. | | Build yourself a home lab and learn how systems work. Figure | out what is really running your code. Learn how to resource | optimize. | | Don't end up only being able to work on webapps and small | datasets that fit comfortably in the cloud. | jjice wrote: | That's been a plan of mine. Right now, my home server is | just a Pi running SMB and Jellyfin, but the plan is to | expand into some used hardware. Seems like used server | hardware is one hell of a deal. | laurent92 wrote: | In my former service company, the story of the server room | which has become much much more important and reliable with | massive investment like a diesel generator, but the teams | haven't grown enough in maturity. One day they have a problem | with a server. 
A system admin is granted permission to go to | the bay, since remote desktop didn't work. They discuss the | problem in front of the bay. One leans on another bay that | was just between two locations. Wheels weren't locked. It | just flew across the room. | | It was ok, just the power cord and a few RJ45 torn. No | serious damage besides downtime. | m3kw9 wrote: | Mayo is around 1 part _water_ to 1 part oil ratio.. | y_tho wrote: | It's 2008. A manager that just doesn't care anymore tells the | new IT person to replenish the fan mayo. | | -"Why? I don't know why. It just works." | | -"Is Hellmann's ok?" | | IT person documents that Hellmann's is preferred. | bentcorner wrote: | And then later on Hellmann's is discontinued, so the company | solicits quotes for a mayo supplier. | Nextgrid wrote: | I've once used cooking oil as a thermal paste substitute. | Worked well enough and nothing went wrong. | unlaxedneurotic wrote: | That sounds too close to a fire hazard | CydeWeys wrote: | I doubt there's any component in a PC getting remotely | close to the ignition point of oil (which for canola is | 424 °C). Plus, it's going to be a minimal amount of oil. | | I'm more worried that the oil won't get to a high enough | temperature and thus won't polymerize, so it'll flow out | and ruin some other component, or go rancid, or something. | Thermal paste won't move on you. Oil will. | zhengyi13 wrote: | Oil will certainly move on you, but it might not destroy | your components, depending perhaps on the specific oil | chosen: you can actually buy or build fully oil-cooled | PCs. | | https://www.pugetsystems.com/submerged.php as an example. | ponker wrote: | That's mineral oil, not cooking oil. Mineral oil doesn't | go rancid. | [deleted] | Scoundreller wrote: | Presumably there's enough surface tension from both sides | to hold it between the gaps and resist gravity (if it's a | vertical CPU).
| | Pretty much any Solid-liquid-solid or solid-solid-solid | interface will be better than solid-roomTemp&Pressure | gas-solid. | | The whole point is to conduct heat better than air, and | most things will. | dsr_ wrote: | The (probably soybean) oil is a fine lubricant, but the | constant motion should cause the egg proteins to coagulate. How | long did it operate before you replaced the fan properly? | emeraldd wrote: | I can't help smiling at this analysis. It feels like | something you'd hear from a sci-fi story engineer working on | an rundown ship that just keeps going no matter what ... | staticvoidmaine wrote: | That book is Expeditionary Force | novaleaf wrote: | books 1 to 3 are awesome. after that the author forgets | how to advance the plot while still churning out a new | book more than once/year. | | I gave up on book 7, so yeah, I tried sticking with it. | gknoy wrote: | That feels like the Honnor Harrington series to me. :-( I | thoroughly enjoyed the first N books I read (5? 6?), but | the next one or two seemed like watching a series TV show | that never resolves tension points, because if they did | there'd be no reason for Season N+1. | ethbro wrote: | I read the first 100 pages of the first book and then | literally threw it into a fire. | | I believe the "Nope" sentence was something akin to | "{character} thought that {thing} because {thing}." | | Jesus Christ. Would it kill you a little to show instead | of tell? | | (And lest people believe I'm not a fan of some good * | opera, I'm not ashamed to admit I've read my fair share | of _BattleTech_ , _Barsoom_ , and even _Lost Fleet_ , | among less highbrow works) | unclesaamm wrote: | You literally threw it into a fire? | ethbro wrote: | I stand by my decision. The world is a better place. | munificent wrote: | So you're saying the plot seized up and the author wasn't | able to engineer a solution? 
| novaleaf wrote: | In the forward of one of the books the author mentions he | quit his day job after the first book was a hit, so I can | understand his financial need to churn out more books. | | Unfortunately there is little plot advancement, perilous | situations more contrived, and needless exposition/filler | the norm. | treeman79 wrote: | Expeditionary is my cleaning audiobook. | | As in, if I'm cleaning and have nothing else interesting | to listen too. | | Some occasional good laughs. Dinosaur holding a plunger | badge. :) | wojciii wrote: | I'm reading the series to the end no matter what. | garettmd wrote: | Well thanks for that recommendation. Added to my reading | list | Corrado wrote: | It probably lasted long enough to get a replacement fan | installed the next morning. | stronglikedan wrote: | Why would it need to be replaced now that its working | again? ;-) | eigenvector wrote: | That reminds me of the time I found an appropriately | shaped bolt installed in a fuse-holder - presumably | someone did not have a replacement fuse and improvised. | | Except that it had been 5 years since the last | maintenance in this place and it was a protection panel | for a large synchronous generator in a power plant. | | After you make a heroic temporary fix, please, ensure the | permanent fix is applied later! | yjftsjthsd-h wrote: | > After you make a heroic temporary fix, please, ensure | the permanent fix is applied later! | | I've known people who would, depending on exactly how bad | the failure was, outright refuse to apply temporary fixes | precisely because they didn't believe that the business | would fix things properly if the issue wasn't forced. And | having watched how that particular company handled | things, I can't say that they were wrong. 
| shuntress wrote: | I've seen it said on this forum before and it also aligns | with my experience: _Most fixes that are 'just for now' | are actually 'forever'_ | Ccecil wrote: | This chart seems relevant in this case. | | https://images.app.goo.gl/db84Dmv3sqyVEwfz5 | dylan604 wrote: | There's nothing more permanent than a temporary solution. | vangelis wrote: | It's a slow blow | coding123 wrote: | Maybe it was vegan mayo? Oh wait, 1999. | tempestn wrote: | Reminds me of the foosball table in our old engineering | students' society room. You can buy special lube for foosball | bearings. OR you can just rub popcorn butter on the bars. | Guess which one we had in ample supply. | kijin wrote: | The egg proteins are already quite coagulated. I'd be more | worried about the vinegar component. You need to neutralize | that acid with something. | tzs wrote: | It's also got corn syrup. Would that cause any problems for | this application? | | Here's the ingredient list for Wendy's mayo: Soybean Oil, | Water, Egg Yolks, Corn Syrup, Distilled Vinegar, Salt, | Mustard Seed, Calcium Disodium EDTA (To Protect Flavor) | cnasc wrote: | > It's also got corn syrup. Would that cause any problems | for this application? | | Over time, the server would expand to be 4U rather than | 2U | noir_lord wrote: | That's the problem with the FAT filesystem, it grows over | time. | Scoundreller wrote: | > corn syrup | | That's how you get ants. | dsr_ wrote: | The egg proteins are coagulated but dispersed in the | colloidal solution. The motion brings them out of solution. | You can try it at home by warming up some mayo in the | microwave and then rubbing it between your hands: you'll | get a stringy oily mess. | asguy wrote: | To play this discussion out further: it depends on the | heat and it depends on the motion. I've made plenty of | Hollandaise (which is a sibling of Mayonnaise) in the | blender with "boiling" butter poured in, and it stays | quite hot... 
especially when it continues to warm on the | stove top. | | If I microwaved it from cold, it would break almost | instantly. | ggrrhh_ta wrote: | Haha :-), I want this dialogue performed in Space | Janitors or something of the sort | Milank wrote: | No downtime is acceptable, but they have only one server? | | What if a technical failure happens? What if there's a fire in the | server room? What if there is an earthquake and the building | collapses? What if... many things can happen that can result in a | long, long downtime with these tactics. | | If uptime is so crucial, the system should be set up in such a way | that moving one server should be a piece of cake, not a spec-ops | mission. | walrus01 wrote: | From an ISP perspective this seems like the sort of company | that orders one $250 a month business DIA circuit (at a price | point where there is no ISP ROI for building a true ring | topology to feed a stub customer) and has no backup circuit. | Then the inevitable happens like a dump truck 2km away with a | raised dump driving through aerial fiber and causing an 18 hour | outage. | | Some circuits might average 5 to 7 nines of uptime over a year, | but the next year is dump truck time... You can never truly be | certain. | icedchai wrote: | Never mind these less common scenarios... What do they do about | Windows updates? | momokoko wrote: | You'd be shocked how rare downtime is with modern hardware. A | redundant power supply and SSDs in the right RAID configuration | typically will not have any issues for years until it can be | replaced by a newer model. Also, hardware monitoring is | significantly improved to the point where you'll typically know | if something will fail and can schedule the maintenance. | | In the past power supplies and spinning disc hard drives would | fail much more often. | | It's basically a solved problem, outside of extremely mission | critical, 5 nines kind of stuff, that we all forgot because of | AWS.
| | HN ran, and may still run, on a single bare metal server. | user5994461 wrote: | AWS and older hardware is no different. Set it once and it | keeps running for many years. | | I've come across an old AWS account (startup have been using AWS | for the longest). All the network traffic or VPN goes through | a single instance with 3 years of uptime. | bathtub365 wrote: | AWS EC2 instances or their host machines can fail at any | time and it's out of your hands. | ficklepickle wrote: | True fact! I recently had EC2 migrate my VM when the | physical server it was on reached EOL. If they had fired | my VM up again, I wouldn't have even noticed. They | didn't. Fortunately it had an EBS volume and I was able | to manually restart it without data loss. | marcosdumay wrote: | > HN ran, and may still run, on a single bare metal server. | | I bet HN wouldn't do a 10 hours high-risk operation for | moving their servers because they can't afford an outage. | (But well, running stuff on a single bare-metal server is | expensive enough that even if they could, I expect they | don't.) | | What would that company do if a pipe broke inside the | datacenter? Besides, if you never restart your servers, you | are guaranteeing that the one time when the power goes off on | the entire city, they won't come back online. | znpy wrote: | > I bet HN wouldn't do a 10 hours high-risk operation for | moving their servers because they can't afford an outage. | | HN is probably not business-critical and could probably | afford a 10 hour downtime without much hassle. | TallGuyShort wrote: | The point is that they probably also wouldn't then insist | on a consultant doing an unreasonable migration and | threatening to not pay them if there was downtime. And | they probably wouldn't call around to other consultants | with the same requirements, apparently telling them that | the first consultant refused to do the job.
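For readers who want numbers behind the "nines" mentioned upthread (walrus01's "5 to 7 nines" and 18 hour outage, momokoko's "5 nines kind of stuff"), here is a minimal sketch of the usual conversion. It assumes a 365-day year and the common convention that N nines means an availability of 1 - 10^-N; the function names are made up for illustration and do not come from the thread.

```python
import math

# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 3600


def allowed_downtime_s(nines):
    """Seconds of downtime per year permitted by `nines` nines of availability."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


def nines_from_outage(outage_s):
    """Number of nines actually achieved in a year with `outage_s` seconds down."""
    return -math.log10(outage_s / SECONDS_PER_YEAR)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} nines allows about {allowed_downtime_s(n):.2f} s of downtime per year")
    # The 18 hour fiber cut from the dump truck anecdote:
    print(f"18 h outage gives about {nines_from_outage(18 * 3600):.2f} nines for that year")
```

Under these assumptions, five nines leaves roughly five minutes of downtime per year, while a single 18 hour outage drops that year to fewer than three nines, which is the "dump truck time" point.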
| Scoundreller wrote: | > apparently telling them that the first consultant | refused to do the job. | | While I don't think they informed them of this in good | faith, it is a nice heads-up. In this case, it meant | Consultant2 consulting RefusingConsultant, who probably | knew the IT better. | com2kid wrote: | It would be legitimately interesting if a 10 hour | downtime of HN was at all correlated to an increase in | GitHub commits. | | I _hope_ there wouldn't be a correlation, but I wouldn't | be all that surprised if a somewhat loose one was found. | jayd16 wrote: | >HN ran, and may still run, on a single bare metal server. | | HN also has downtime fairly often. | Johnny555 wrote: | Even in modern hardware there are plenty of single points of | failure. | | Single server and "can't tolerate any downtime" are mutually | exclusive. | paulie_a wrote: | Quality hardware has existed for years. At a Ford motor plant | they were doing an inventory and couldn't locate a 10 ton | mainframe. It was working so well for 15 or so years the | tribal knowledge of where it was physically located was lost. | ansible wrote: | Wow, that's impressive losing that big a piece of hardware. | | Though it was likely easier to find than that Novell | NetWare server that was sealed behind some drywall, with | only a stray network cable leaving any clue as to where it | was. | owenmarshall wrote: | Depends on how big the building is that houses it - | manufacturing IT can deal with impressive floor spaces. | | I once only half jokingly suggested finding a missing | data closet in a two million square foot distribution | center by pinging a known IP from three or four | aggregator switches across the building and triangulating | the location on a floor plan. Sadly the people crawling | around the ceiling found it before I could put my idea | into practice. | pbhjpbhj wrote: | 2Msqft is c.430m x 430m for a square floorplan. Ping | resolution is 1us (microsecond).
Speed of electrical | signal in copper is about 0.8c. Gives a max resolution of | ~240m by my reckoning. If there are variances in the | switch+network delay it seems like you're going to | struggle to even say which side of the building it is. | | Good job they found it! | owenmarshall wrote: | Hah! Good math. Based on the switch placement and the | building being more of a rectangle I figured "north side | or south side" would be as close as I could get. And when | we really dug in it was a classic last mile problem: the | first several core switches were well known, we just | needed to figure out where the last aggregate switch | went. | | Turns out a door was closed and a new one built to a | hallway to another hallway and not properly labeled on | the updated drawings. Had one of the boxes running a | conveyor belt not died, we'd never have looked. | Milank wrote: | This is all true, but you still can't rely on increased | hardware quality if you can't afford any downtime due to | moving (a one-time event) a server. | | Also, that doesn't cover other problems mentioned here, | like natural disasters, ISP problems, etc. | nuker wrote: | > can't rely on increased hardware quality if you can't | afford any downtime due to moving (a one-time event) a | server. | | Mainframe is not just a server. You can hot plug RAM on | these things. | hnlmorg wrote: | Often these kinds of SLAs are decided upon based on blame | rather than what is reasonably required by the customers | of that system. In this case, moving offices means the | downtime is due to internal reasons. But if an ISP goes | down or there is a natural disaster, then that isn't in | their control. | | Also cost does come into play as well. Multiple physical | links in would be very expensive for what sounds like | internal services. Likewise a natural disaster might | cause bigger issues to the company than those internal | services going down.
They might still have offsite | backups (I'd hope they would!) so at least they can recover | the services but the cost of having a live redundancy | system off site might not justify those risk factors. | | The customer's requirements are definitely unreasonable | though. I'd hope those systems are regularly patched, in | which case when is downtime for that scheduled and why is | that acceptable but not when you're physically moving the | server? It doesn't really make much sense; but then "not | making much sense" is also quite a common problem when | providing IT services for others. | Milank wrote: | You are right, their SLA can be a bit different from what | we're talking about here (and expect). | | In general, we don't know much about this case. It's a | post on Reddit, might not even be true. As is, it doesn't | make much sense, but we don't know all the details, so | maybe we jumped to conclusions. | Thaxll wrote: | Yeah that's how you end up with 3 years uptime on some | forgotten servers... :) | closeparen wrote: | Which is why AWS instances should be no more than minions | in a load balancer pool, and any permanent state on an EBS | volume or a managed storage service. | closeparen wrote: | >hardware monitoring is significantly improved to the point | where you'll typically know if something will fail and can | schedule the maintenance. | | There's SMART for disks... what else? | duskwuff wrote: | ECC for RAM is the other big one. A single-bit error will | trigger warnings, so that you can replace the faulty DIMM | before it progresses into uncorrectable errors. | Scoundreller wrote: | Is there a tool that can randomly take 128 MB chunks of | memory out of the pool and test them around the clock? | walrus01 wrote: | Unfortunately complacency about how reliable modern hardware | is can lead to neglecting things like off site backups. And | other issues. Yeah your one big critical on premises server | may be super reliable.
But what happens when the building is | flooded with 6 ft of water, catches on fire, is leveled in an | earthquake, or anything else? | | If a function is super critical to business, it also deserves | to have some thought put into the blast radius of its | failure. | | The sort of places that would insist on rolling a live server | 700 ft across a parking lot probably don't have any real | disaster recovery plan. | mwcampbell wrote: | Still, sooner or later, the data center will be hit by a | natural disaster, a DoS attack, a network problem, or the | like, and you'll have to be ready to move to a different one | to get your service back online. Or you'll have to reboot | your server to apply a critical kernel security update, in | which case you need to be ready to fail over to a hot | standby. So, since relying on a single server with high- | uptime hardware is penny-smart and pound-foolish, might as well go | with a cloud-style architecture with commodity hardware. | chasd00 wrote: | I used to be fascinated with datacenters and would | masquerade as a customer prospect to get a tour and see all | the cool gear. I was asking one engineer about what their | plan was for a tornado (this was at ThePlanet in Dallas TX | way back when) and they basically scoffed at the question. | A week or so later one briefly touched down about 1/4 mile | from them, I wonder if they thought about me when the | sirens were going off hah. | packet_nerd wrote: | Human error is a bigger cause of downtime than technical | failure or natural disasters. And in practice, a single | server like this tends to be a hand managed one-off which | only exasperates the human error component. | goodcanadian wrote: | s/exasperates/exacerbates/ | pmiller2 wrote: | It's probably a bit of both, TBH. ;) | YetAnotherNick wrote: | You wrote one server but describe the failure modes of having | one data center. I think it is very very uncommon and hard to | allow for data center level issue.
After all Instagram and 100 | other sites failed when one AWS data center went down. I would | be interested to know how/whether anyone's backend will work if | any data center and its databases completely fails due to | fire/earthquake/networking etc. | | Second thing is having multiple machines for server. In theory | it might help in increasing the availability but in practice I | haven't seen any random issue due to machine which occurs just | based on probability. I think almost all failure modes that | exist, they are correlated between machines. eg suppose you | have data loss on one machine, you could more likely than not, | blame it on code and it would be similar across machines. | toast0 wrote: | Re: single datacenter. At the basic level, you need a second | datacenter with enough machines to provide your service (or an | emergency version at least), replication of data, and a way | to switch traffic. It's doable, but expensive in capital and | development. If you're dependent on outsourced services, they | also need to be available from both datacenters and not | served from only one. In an ideal world, your two datacenters | would be managed by different companies, so you would avoid | any one company's global routing failure (IBM had one | recently). | | Re: multiple servers. Power supplies fail, memory modules | fail, cpus fail, fans fail, storage drives fail. Sometimes | those are correlated --- the HP SSDs that failed when the | power on hours hit a limit (two separate models) are going to | be pretty correlated if they were purchased new and stuck | into servers at a similar time and then on 24/7. Most of | those failures aren't that correlated though. Software | failures would be more likely to be correlated though, of | course. | | The key thing is to really think about what the cost for | being down is, how long is acceptable/desirable to be down, | and how much you're willing to spend to hit those goals.
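To put rough numbers on "how long is acceptable to be down": the "nines" figures thrown around in this thread translate into a yearly downtime budget. A minimal sketch, assuming a 365-day year and counting all downtime against the budget; the function name `downtime_budget_seconds` is made up for illustration.

```python
# Illustrative only: converts an availability target ("N nines") into the
# downtime it permits per year. Assumes a non-leap, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year, in seconds, for an availability of N nines.

    Three nines means 99.9% availability, five nines means 99.999%, etc.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    return SECONDS_PER_YEAR / (10 ** nines)


if __name__ == "__main__":
    for n in (3, 5, 7):
        seconds = downtime_budget_seconds(n)
        print(f"{n} nines: {seconds:.2f} s/year ({seconds / 3600:.4f} h)")
```

Under these assumptions three nines leaves about 8.76 hours a year and five nines about 5.26 minutes, so a single multi-hour outage uses up a three-nines budget for the whole year.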
| YetAnotherNick wrote: | > In an ideal world, your two datacenters would be managed | by different companies, so you would avoid any one | company's global routing failure | | I can't understand this. I think transferring servers would | be the least of problems. It's the transferring of | database and maintaining consistent version of databases in | both the locations. Moving the snapshots after every X | minutes doesn't maintain consistency. I would like to read | about any company that is able to do this, as honestly it | sounds really hard to me. Is there any writeup of the IBM thing | you mentioned? | toast0 wrote: | Re: IBM outage | | https://news.ycombinator.com/item?id=23471698 | | TLDR is connectivity to and from the IBM cloud | datacenters (which includes softlayer) was generally | unavailable, globally, for a couple hours. If you were in | multiple IBM datacenters, you were as down as if you were | in only one (mostly, I was poking around when it was | wrapping up, and some datacenters came back earlier than | others). | | > It's the transferring of database and maintaining | consistent version of databases in both the locations. | Moving the snapshots after every X minutes doesn't | maintain consistency. I would like to read about any | company that is able to do this, as honestly it sounds | really hard to me | | The gold standard here is two-phase commit. Of course, | that subjects every transaction to delay, so people tend | not to do that. The close enough version is MySQL (or | other DB) replication, monitor that the replication | stream is pretty current and hope not a lot is lost when | a datacenter dies. There's room to fiddle with failover | and reconciliation; I recommend against automatic | failover for writes, because it gets really messy if you | get a split brain situation --- some of your hosts see | one write server available and others see another, and | you may accept conflicting writes.
A few minutes running | like that can mean days or weeks of reconciliation, if | you didn't build for reconciliation. | wastedhours wrote: | We used to have one server for a website I was a content guy on | - it was in a standard PC case, plugged into a switch in the IT | team's office (this was not a tech-centered org). | | The main IT guy went on holiday and one of the cover guys from | another office decided to tidy up. He unplugged the server and | thought (and told me after his thought process) "if anyone was | using it, they'll let us know". | | This was the one, single box for the whole website - no one | else was monitoring (even though the central office had a | proper, dedicated web team) and the assumption was I was | sysadmin. | | An hour later I'm sprinting down the corridor to find out what | the hell happened and why I can't even SSH into the box. | | We put a sticker on the case saying not to unplug it after | that... | coldcode wrote: | I worked at my last job for a place with a single rack mounted | set of Windows servers at a data center - with no backup power | supply, no backups of any kind for that matter, no UPS and no | redundancy of any system, plus they didn't even have an admin | for 6 months. The CEO refused to spend money on a 2nd anything. | The company has 2000 employees. One server held all of the | company's photos (which is basically the core of the business) | and of course was not backed up. | elliekelly wrote: | This is the kind of company that could benefit immensely from | a ransomware attack. | Milank wrote: | Of course it can work, you can get far with one server and no | spending on anything like backups, UPS, etc. | | Whether it's smart and good for your business/reputation is a | different question. | closetohome wrote: | My boss refused to use UPSs for years because he bought one | once and couldn't get it to stop beeping. | misiti3780 wrote: | or even better, how do they apply OS patches?
| galoisgirl wrote: | > Should have been a 5 minute job if done correctly. Owner | ended up paying for over 10 hours of work. Stupidest thing I've | ever had to do. | | You can see the common sense ship has sailed. | redwood wrote: | Reminds me of how IBM positions mainframes: they are so highly | available that you simply never let them shut down. | lasereyes136 wrote: | IBM Mainframes are designed to be serviced while running so | if you have multiple CPUs you can offline one at a time to | upgrade it without the whole mainframe going down. Big Sun | Solaris boxes were built like that as well. | | If your mainframe had only one CPU, you did have to turn it | off in order to service it. But you could upgrade the OS | without turning it off. While they aren't cool tech now, | mainframes are a marvel of hardware engineering. | chasd00 wrote: | plus, i would imagine turning them on and bringing them | online isn't just a press of a button. | MrMorden wrote: | It's not. https://web.archive.org/web/20190324191654/https: | //www.ibm.c... | | (archive.org link because ibm.com apparently isn't hosted | on a mainframe.) | gear54rus wrote: | He should have taken it offline without notifying this brain- | dead manager. Probably wouldn't have noticed lol. | | And then charge for those 5 hours for good measure. | | In general, this stupid trend of wanting 0 downtime makes no | sense to me. If you're not NASA, police or other emergency | service you 100% can afford a few hours of downtime with | scheduling it beforehand. | webscalist wrote: | Could've been cheaper to buy/rent another server, put it on the | new location, set up redundancy/replication, power off the old | server, move it to the new location, return the new server. Or | just keep it for sanity. | tzs wrote: | I needed to restart a server where I worked. My boss was | complaining about the revenue loss during the down time.
I knew | the revenue loss (if there even was any, as opposed to a couple | of minutes of revenue simply shifting to a few minutes later...) | would be well under a dollar. | | So I listened to him whine for a couple minutes, then tossed a | dollar on his desk, told him that would cover it so he could shut | up now, and rebooted the server. | | Warning: you should probably only try this if you are good | friends with your boss. That boss had been my best friend for | years before I came to work for his company. | PinguTS wrote: | It's been done 7 years ago even using public transport. | | https://www.reddit.com/r/uptimeporn/comments/1kf26r/moving_a... | jagermo wrote: | The most dangerous part is them expecting the 3G to be | available during the subway ride. | deftnerd wrote: | I'm surprised they weren't stopped by police to investigate a | very suspicious heavily loaded cart on the Subway. It easily | could have been 300lbs of explosives on that cart. | mercora wrote: | in Germany mobile networks work just fine in the subway as | ISPs have deployed hardware there. I actually have more | issues with the network when using classical railroad | transport... | csours wrote: | I really thought this post on HN was going to be that story. | Thanks for digging it up. | kuon wrote: | When I was younger (read 20 years ago), I did crazy things like | that, not over that long distance, but moving live servers in | different racks. | | Now that I am older, I don't think I would do it anymore, too | much stress for a small reward. Also today, most of the time, I | am able to "talk out" customers of crazy requirements, while I | would just have said "OK let's do it" in my younger years. | jfcorbett wrote: | Reminds me of the time where IT at a previous employer told us | that due to a "new IT strategy", our production cluster that had | been sitting comfortably in the basement for years had to be | moved to an "approved IT hub facility"... 
in another office 500 | km away and across the North Sea. | | There was downtime. | | Promptly after our cluster settled into this wonderful new | facility, a cooling pipe in the ceiling leaked on it, frying 1/3 | of our nodes. | yjftsjthsd-h wrote: | On a personal selfish level I was quite happy to see our | workloads moving to datacenters that we couldn't (reasonably) | physically access, because it replaced "can you go drive to the | DC and replace a failing disk" with "we put in the request for | smart hands to replace the failing disk". Of course, there's | some notable tradeoffs, but it makes me feel better when the | business decides to do such things... | merb wrote: | meanwhile in germany, german telekom have had their connect ip lines | (leased lines..., company internet..) shut down since tuesday | morning. so a downtime of over 48 hours, despite an sla that no | downtime will be longer than 8 hours and an availability of 99.9%. | | what a crazy world. | imglorp wrote: | The moving server on cart part made me nervous. If there was any | rotating rust in there, bouncing across the parking lot would | make things difficult for flying heads. I'd have hand carried it | from stage to stage, setting it on a padded cart each stop, | treating it like sweating TNT. | neya wrote: | Dude has ONE server and talks about having 0 downtime for his | clients? What the hell?! | | In a way, this is Darwinism for the IT industry and I'm happy the | people involved got paid well. Dude probably paid as much as a | new server woulda cost him. I bet he'll never forget this | lesson. | geocrasher wrote: | I once shut down a PC, moved it to another desk, and it wouldn't | power back on. Another time I moved a server to another rack. It | had 2 years uptime. Had to power it down, and it wouldn't power | back on. Both required PSU replacements. Had I moved them _while | powered on_ I can only imagine the fun times.
| | Perhaps they should have just told the customer they couldn't | find it: | https://www.theregister.com/2001/04/12/missing_novell_server... | Johnny555 wrote: | Decades ago an ISP I was colocated at did the same thing. I don't | remember the exact details, but it was a DNS server and they | either couldn't log in or were relying on the zone files cached | in memory or something but for some reason they couldn't power it | off. | | It was already plugged into a UPS, but they had to cut one of the | posts off the rack to get the server out without unplugging it, | then they plugged that UPS into a bigger UPS on a cart and | wheeled it to the new data center they built out in the building | next door. | | The world was much different at the time -- this coloc provider | had a good reputation, yet.... they had a keg of beer in the | corner of the server room and a stack of adult magazines in the | men's room. | kyuudou wrote: | It's called vMotion ___________________________________________________________________ (page generated 2020-08-05 23:00 UTC)