[HN Gopher] I Accidentally Deleted 7TB of Videos Before Going to...
       ___________________________________________________________________
        
       I Accidentally Deleted 7TB of Videos Before Going to Production
        
       Author : thevinter
       Score  : 445 points
       Date   : 2022-05-05 10:00 UTC (13 hours ago)
        
 (HTM) web link (blog.thevinter.com)
 (TXT) w3m dump (blog.thevinter.com)
        
       | lnxg33k1 wrote:
       | But are you a junior dev with less than one year of experience
       | working by yourself alone at a company? No tech lead/help?
        
       | birdyrooster wrote:
       | Is 7TB a lot? Peers have personal arrays orders of magnitude
       | larger.
        
       | iamben wrote:
       | I like these stories. I think they resonate well for 'the rest of
       | us'. I've made plenty of mistakes like this - you learn and grow,
       | right?
       | 
       | One of the best things about HN is that so many incredible,
       | talented people post. It's incredibly inspiring to raise your own
       | game, to see what the best are doing. But sometimes it's equally
       | important to realise we all fuck up, and for every unicorn dev
       | there's another thousand of us grinding away.
       | 
       | OP - well done for sorting the problem and telling us all about
       | it!
        
         | rossdavidh wrote:
         | Amen
        
       | JacobiX wrote:
       | > It involves bad practices and errors from multiple parties in a
       | > world that might seem foreign to the "Silicon Valley" world but
       | > paints an accurate picture of what development is for small IT
       | > companies around the world
       | 
       | Everybody makes mistakes even in the "Silicon Valley" world, but
       | such problems could easily be caught by testing (which he did, but
       | it was restricted to the first page) and performing a simple
       | dry-run.
        
         | crispyambulance wrote:
         | Exactly, _everyone_ makes mistakes. Sometimes huge ones. In
         | hindsight or on the sidelines it's always easy to point out a
         | few technical things that WOULD HAVE avoided catastrophe, but
         | does that help? I think not (aside from a cautionary parable
         | for interns).
         | 
         | Things are complicated, people are human and forget things,
         | there are pressures to "get it done" and override the
         | guardrails. Everybody has horror stories. Some worse than
         | others. Welcome to the OP's day of horror. I would think
         | "Silicon Valley" dev-ops horror stories make this one seem like
         | a triviality.
        
       | batch12 wrote:
       | It's like the first time you run
       | 
       |     rm -rf /path/to/delete/ *
       | 
       | And realize it is taking too long...
        
         | SnowHill9902 wrote:
         | Can you explain? I feel like it removes / but not sure why.
        
           | switch007 wrote:
           | The error is the space before the asterisk. The original
           | intention was to delete the contents of the folder
           | /path/to/delete/. Instead, the asterisk enumerates files in
           | the current directory and they get deleted
        
           | KarlKode wrote:
           | Besides recursively deleting /path/to/delete/ the command
           | also deletes all (non hidden) content of the current
           | directory (note the * at the end of the line). I assume the
           | correct command would be /path/to/delete/*.
        
           | Tesl wrote:
           | It removes everything in the current directory
        
           | pwg wrote:
           | rm -rf /path/to/delete/ *
           | 
           | Note the space between the last / and the *
           | 
           | This will recursively remove the directory /path/to/delete
           | and remove every file/directory that matches * in the current
           | directory where 'rm' is being run.
           | 
           | When what was most likely meant was:
           | 
           |     rm -rf /path/to/delete/*
           | 
           | Note the lack of a space between the last / and the *. This will
           | remove all files that match * that reside in the
           | /path/to/delete/ directory.
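
A minimal sketch of why the stray space matters, using Python's shlex module to mimic how a POSIX shell splits a command line into words. Note that shlex only tokenizes; it does not expand globs. In a real shell the bare * would then be expanded against the current directory.

```python
import shlex

# The shell splits the line on whitespace first, so the stray space
# turns one argument into two separate ones.
with_space = shlex.split("rm -rf /path/to/delete/ *")
without_space = shlex.split("rm -rf /path/to/delete/*")

print(with_space)     # ['rm', '-rf', '/path/to/delete/', '*']
print(without_space)  # ['rm', '-rf', '/path/to/delete/*']
```

In the first form rm receives two targets: the directory itself and whatever * matches where the command is run.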
        
       | progx wrote:
       | Now you learned what a backup is.
        
         | lpointal wrote:
         | How can any enterprise rely only on such online services and
         | not keep copies of their work on their own storage?
         | 
         | At least store it on large multi-TB hard disks, connected with
         | a SATA adapter when needed, and keep them in a case in a safe
         | place (better: two copies, stored in two places). What is the
         | cost of the disks plus copy time relative to the production work?
        
       | stareatgoats wrote:
       | A great success story as far as I'm concerned, even if it doesn't
       | reflect well on Vimeo support. But a good reminder to have
       | someone doublecheck your logic if you aim to delete massive
       | amounts of data from production. And to check if the backups are
       | working (producing restorable data) on a regular basis. Sometimes
       | they just seem to be working, as I have learned the hard way...
        
       | legalcorrection wrote:
       | [deleted]
        
         | JauntyHatAngle wrote:
         | I'm baffled by this too. Unnecessary bridge burning I'd call
         | it.
         | 
         | It's not even necessary to the story.
        
           | dkersten wrote:
           | It's explained on the first line: "I'm a Junior Developer
           | with less than one year of actual experience. Some of the
           | things that might seem obvious to some might not be so for
           | me". I guess it applies to this, too, not just the technical
           | aspects.
        
             | dsego wrote:
             | I might've missed it, but I don't think that line existed
             | when this was first posted.
        
         | thevinter wrote:
         | You're right and I edited the company's name (might be too late
         | but better this way). That said I'm not very happy with the
         | experience of working for TheCompanyTM anyways so I'm in the
         | process of switching jobs.
         | 
         | Thanks for the comment :)
        
           | philliphaydon wrote:
           | I would take down the post entirely.
           | 
           | Your current job is linked in your CV.
        
             | legalcorrection wrote:
             | And try emailing the hackernews mods asking them to take
             | this post down.
        
           | yowlingcat wrote:
           | As sibling comments indicate, I would advise emailing HN mods
           | to take this post down and remove it from your blog and post
           | it on an anonymous one. Here are the problems you will face:
           | 
           | 1) Your current blog has your current employer + client
           | linked to it. 2) Your github has your real name. 3) All of
           | these have been crawled/archived.
           | 
           | None of this bodes well for your career in the future. While
           | I think your blog post is a great war story, it's really not
           | a good idea to post it on your main account which can be
           | traced back to your real name and CV because it will come up
           | the next time you apply for a job.
           | 
           | Unfortunately, even if it illustrates a great deal of
           | ingenuity and creativity on your part in fixing a mess you
           | made, many folks will take one look at it and be judgmental.
           | You have to manage your reputation online and be careful.
        
           | legalcorrection wrote:
           | You're welcome and good luck!
        
           | KingOfCoders wrote:
           | Talking bad about your employer is great for finding a new
           | job. Companies are eager to hire people who bad-talk them.
        
             | breakfastduck wrote:
             | He doesn't talk bad about his employer. He talks bad about
             | his employer's client.
        
               | mkr-hn wrote:
               | Tech is like any other human endeavor. People talk.
               | People change jobs and still like the people in the place
               | they left.
        
       | urbandw311er wrote:
       | Would you have had the courage to post this here if you hadn't
       | been able to fix it?
        
       | tomkwong wrote:
       | First, I want to say that this is a great post. You always grow
       | stronger when you make mistakes. Writing it up solidifies
       | understanding in the learning process.
       | 
       | This story resonates with many people here because many
       | experienced engineers have done something similar before. For me,
       | destructive batch operations like this would be two distinct
       | steps:
       | 
       | 1. Identify files that need to be deleted;
       | 2. Loop through the list and delete them one by one.
       | 
       | These steps are decoupled so that the list can be validated. Each
       | step can be tested independently. And the scripts are idempotent
       | and can be reused.
       | 
       | Production operations are always risky. A good practice is to
       | always prepare an execution plan with detailed steps, a
       | validation plan, and a rollback plan. And, review the plan with
       | peers before the operation.
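
The two decoupled steps described above can be sketched as follows. This is an illustration under assumptions, not the commenter's actual script: the function names find_candidates and delete_listed are hypothetical, and selecting by file suffix merely stands in for whatever real selection rule applies.

```python
from pathlib import Path


def find_candidates(root: Path, suffix: str) -> list[Path]:
    """Step 1: only decide. Returns a sorted list and deletes nothing."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def delete_listed(paths: list[Path]) -> int:
    """Step 2: only act, on exactly the list that was reviewed."""
    deleted = 0
    for path in paths:
        if path.exists():  # a rerun over the same list is harmless
            path.unlink()
            deleted += 1
    return deleted
```

Because step 1 has no side effects, its output can be printed, diffed, or peer-reviewed before step 2 ever runs, and running step 2 twice over the same list changes nothing the second time.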
        
         | notyourday wrote:
         | > 1. Identify files that need to be deleted; 2. Loop through
         | the list and delete them one by one.
         | 
         | > These steps are decoupled so that the list can be validated.
         | Each step can be tested independently. And the scripts are
         | idempotent and can be reused.
         | 
         | This is the most underrated comment.
         | 
         | I'm saying it as someone who had the ultimate oversight of
         | deleting hundreds of TBs per day spread over billions of files on
         | different clouds and local storage.
        
       | dsego wrote:
       | > but at the time the code seemed completely correct to me
       | 
       | It always does.
       | 
       | > Well, it teaches me to do more diverse tests when doing
       | destructive operations.
       | 
       | Or add some logging and do a dry run and check the results,
       | literally simple print statements:
       | 
       |     print("-----")
       |     print(f"Downloading video ids from url: {url}")
       |     print(list_of_ids)
       |     ...
       |     ...
       |     ...
       |     # delete()  dangerous action commented out until I'm sure it's right
       |     print(f"I'm about to delete video {id}")
       |     print(f"Deleted {count} videos")  # maybe even assert
       |     ...
       | 
       | Then dump out to a file and spot check it five times before
       | running for real.
        
         | aqme28 wrote:
         | Rather than commenting it out, I suggest adding a --live-run
         | flag to scripts and checking the output of --live-run=false (or
         | omitted) before you run it "live."
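
A minimal sketch of the --live-run idea using argparse, where the dry run is the default and deletion requires an explicit opt-in. The tool description, the purge function, and the delete callback are hypothetical names for illustration only.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Purge videos (hypothetical tool).")
    # Dry run is the default; deleting requires an explicit opt-in.
    parser.add_argument("--live-run", action="store_true",
                        help="actually delete instead of only printing")
    return parser.parse_args(argv)


def purge(video_ids, delete, live_run=False):
    """Print every intended deletion; call delete() only when live_run is set."""
    for video_id in video_ids:
        if live_run:
            delete(video_id)
            print(f"deleted video {video_id}")
        else:
            print(f"[dry run] would delete video {video_id}")
```

Since the flag is ordinary code, its behavior can be covered by an automated test, for example by passing a list's append method as the delete callback and asserting that it stays empty without --live-run.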
        
           | sdevonoes wrote:
           | But then you have double the chances of introducing a bug for
           | the specific scenario we are talking about:
           | 
           | Before: there is chance there is a bug in my "delete" use
           | case
           | 
           | Now: what we have before plus the chance that there is a bug
           | in my "--live-run" flag
        
             | aqme28 wrote:
             | You can make automated tests for your flag. You can't make
             | automated tests for your code comments.
        
         | mbiondi wrote:
         | Agreed, I've also been burned doing stupid things like this and
         | always print out the commands and check them before actually
         | doing the commit.
         | 
         | As they say, measure twice, cut once.
         | 
         | Don't feel bad, I think every professional in IT goes through
         | something similar at one time or another.
        
         | lifthrasiir wrote:
         | Human-in-the-loop is such an important concept in ops, and yet
         | everyone (that's including me) seems to learn it the hard way.
        
         | GordonS wrote:
         | It's amazing the number of times I look at some simple code and
         | think "nah, this is so simple it doesn't need a test!", add
         | tests anyway (because I know I should)... and immediately find
         | the test fails because of an issue that would have been
         | difficult to diagnose in production.
         | 
         | Automated tests are awesome :)
        
         | dncornholio wrote:
         | Dry run really is key here. Most automated tests wouldn't find
         | this bug.
        
         | pc86 wrote:
         | I just want to say as someone currently working on a script to
         | delete approximately 3.2TB of a ~4TB production database, this
         | subthread is pure gold.
        
         | hayd wrote:
         | I'd make sure those include WARN or ERROR (I'd use logging to
         | do that), that way you can grep for those. Spot checking might
         | be difficult if the logs get long.
        
         | V__ wrote:
         | This was my first thought too. Another thing I like to do is
         | to limit the loop to say one page or 10 entries and check after
         | each run that it was correctly executed. It makes it a half-
         | automated task, but saves time in the long run.
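
One way to sketch the "limit the loop and check after each run" approach: process a small batch, verify it, and stop at the first failed check. The names batches and run_in_batches are hypothetical, and check stands in for whatever verification fits the job (a manual look, an API query, a count).

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def run_in_batches(items, action, check, size=10):
    """Apply `action` one batch at a time; stop as soon as `check` fails."""
    done = 0
    for batch in batches(items, size):
        for item in batch:
            action(item)
        done += len(batch)
        if not check(batch):
            break  # damage is limited to at most one batch
    return done
```

With size=10, a bug in the selection logic costs ten items rather than the whole data set.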
        
         | hinkley wrote:
         | Condensed to aphorism form:
         | 
         |     Decide, then act.
         | 
         | There's a whole menagerie of failure modes that come from
         | trying to make decisions and actions at the same time. This is
         | but one of them.
         | 
         | Another of my favorites is egregious use of caching, because
         | traversing a DAG can result in the same decision being made
         | four or five times, and the 'obvious' solution is to just add
         | caches and/or promises to fix the problem.
         | 
         | As near as I can tell, this dates back to a time when
         | accumulating two copies of data into memory was considered a
         | faux pas, and so we try to stream the data and work with it at
         | the same time. We don't live there anymore, and because we
         | don't live there anymore we are expected to handle bigger
         | problems, like DAGs instead of lists or trees. These
         | incremental solutions only work with streams and sometimes
         | trees. They don't work with graphs.
         | 
         | Critically, if the reason you're creating duplicate work is
         | because you're subconsciously trying to conserve memory by
         | acting while traversing, then adding caches completely
         | sabotages that goal (and a number of others). If you build the
         | plan first, then executing it is effectively dynamic
         | programming. Or as you've pointed out, you can just not execute
         | it at all.
         | 
         | Plus the testing burden is so drastically reduced that I get
         | super-frustrated having to have this conversation with people
         | over and over again.
        
         | password4321 wrote:
         |     SELECT COUNT(1) FROM table
         |     -- UPDATE table SET col='val'
         |     WHERE 1=1
        
           | worble wrote:
           |     BEGIN TRANSACTION
           |     UPDATE table SET col='val' WHERE 1=1
           |     ROLLBACK
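
The transaction-then-rollback check can be tried end to end with Python's built-in sqlite3 module. This is a sketch on an in-memory database; the videos table and its status column are made up for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN and ROLLBACK below are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO videos (status) VALUES (?)",
                 [("old",), ("old",), ("new",)])

conn.execute("BEGIN")
cursor = conn.execute("UPDATE videos SET status = 'purged' WHERE status = 'old'")
affected = cursor.rowcount   # how many rows the statement touched
conn.execute("ROLLBACK")     # undo; swap for COMMIT once the count looks right

remaining = conn.execute(
    "SELECT COUNT(*) FROM videos WHERE status = 'purged'").fetchone()[0]
print(affected, remaining)   # 2 0
```

The statement that runs in the rehearsal is byte for byte the one that will run for real, which avoids the drift that a separate SELECT COUNT query can introduce.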
        
             | password4321 wrote:
             | Definitely better, when you can afford the overhead!
        
             | tomrod wrote:
             | Exactly!
        
         | mipmap04 wrote:
         | I do this, too, but I also take a count of the expected number
         | of items to be deleted as well. If my collection I'm iterating
         | over doesn't have exactly that number of objects I expect, I
         | don't proceed.
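
A small sketch of the expected-count guard: refuse to delete anything unless the selection has exactly the size that was worked out beforehand. The name delete_exactly and the delete callback are hypothetical.

```python
def delete_exactly(targets, expected_count, delete):
    """Delete `targets` only if there are exactly `expected_count` of them."""
    targets = list(targets)
    if len(targets) != expected_count:
        # Abort before the first deletion, not halfway through.
        raise RuntimeError(
            f"expected {expected_count} items, found {len(targets)}; aborting")
    for target in targets:
        delete(target)
    return len(targets)
```

The expected count should come from an independent source (a report, a manual query), not from the same code that builds the selection.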
        
         | kortex wrote:
         | This is why I like to always write any sort of user-script
         | batch-job tools (backfills, purges, scrapers) with a "porcelain
         | and plumbing" approach: The first step generates a fully
         | declarative manifest of files/uris/commands (usually just json)
         | and the second step actually executes them. I've used a
         | --dry-run flag to just output the manifest, but I just read some
         | folks use a --live-run flag to _enable_, with dry-run being
         | the default, and I like that much better so I'll be using that
         | going forward.
         | 
         | This pattern has the added benefit that it makes it really easy
         | to write unit tests, which is something often sorely lacking in
         | these sorts of batch scripts. It also makes full automation
         | down the line a breeze, since you have nice shearing layers
         | between your components.
         | 
         | http://www.laputan.org/mud/mud.html#ShearingLayers
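
A hedged sketch of the manifest approach: one function produces a declarative JSON-serializable plan, another executes it. The manifest layout (op and id keys) and the function names are assumptions made for this example, not a format the commenter specified.

```python
import json


def plan_deletions(video_ids, keep_ids):
    """Step 1: build a declarative manifest. Pure function, no side effects."""
    keep = set(keep_ids)
    return [{"op": "delete", "id": vid} for vid in video_ids if vid not in keep]


def apply_manifest(manifest, handlers):
    """Step 2: execute each manifest entry with the handler for its op."""
    for step in manifest:
        handlers[step["op"]](step["id"])


manifest = plan_deletions(["a", "b", "c"], keep_ids=["b"])
print(json.dumps(manifest, indent=2))  # review this output before applying
```

Because plan_deletions is pure, a unit test only needs to compare its return value with an expected list, and apply_manifest can be tested with fake handlers that record calls instead of deleting anything.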
        
           | InfoSecErik wrote:
           | I tend towards a --dry-run flag for creative actions and
           | --confirm for destructive actions. Probably slightly annoying
           | that the commands end up seemingly different, but it sure
           | beats accidentally nuking something important.
        
         | gilleain wrote:
         | Yes, I find command line tools that have a "--dry-run" flag to
         | be very helpful. If the tool (or script or whatever) is
         | performing some destructive or expensive change, then having
         | the ability to ask "what do you think I want to do?" is great.
         | 
         | It's like the difference between "do what I say" and "do what I
         | mean"...
        
           | bzxcvbn wrote:
           | That's what I like about powershell. Every script can include
           | a "SupportsShouldProcess" [1] attribute. What this means is
           | that you can pass two new arguments to your script, which have
           | standardized names across the whole platform:
           | 
           | - -WhatIf to see what would happen if you run the script;
           | 
           | - -Confirm, which asks for confirmation before any
           | potentially destructive action.
           | 
           | Moreover these arguments get passed down to any command you
           | write in your script that support them. So you can write
           | something like:
           | 
           |     [CmdletBinding(SupportsShouldProcess)]
           |     param ([Parameter()] [string] $FolderToBeDeleted)
           |     # I'm using bash-like aliases but these are really powershell cmdlets!
           |     echo "Deleting files in $FolderToBeDeleted"
           |     $files = @(ls $FolderToBeDeleted -rec -file)
           |     echo "Found $($files.Length) files"
           |     rm $files
           | 
           | If I call this script with -WhatIf, it will only display the
           | list of files to be deleted without doing anything. If I call
           | it with -Confirm, it will ask for confirmation before each
           | file, with an option to abort, debug the script, or process
           | the rest without confirming again.
           | 
           | I can also declare that my script is "High" impact with the
           | "ConfirmImpact = High" switch. This will make it so that the
           | user gets asked for confirmation without explicitly passing
           | -Confirm. A user can set their $ConfirmPreference to High,
           | Medium, Low, or None, to make sure they get asked for
           | confirmation for any script that declares an impact at least
           | as high as their preference.
           | 
           | [1]: https://docs.microsoft.com/en-us/powershell/scripting/learn/...
        
             | spookthesunset wrote:
             | I'm a bit confused (because I didn't read the docs)... does
             | calling it with "--whatif" exercise the same code path as
             | calling without, only the "do destructive stuff"
             | automagically doesn't do anything? Or is it a separate
             | routine that you have to write?
             | 
             | Cause if it is an entirely separate code path, doesn't that
             | introduce a case where what you say you'll do isn't exactly
             | what actually happens?
        
               | bzxcvbn wrote:
               | It's the first option. And yes, sometimes you have to be
               | careful if you want to implement SupportsShouldProcess
               | correctly, it's not something you can add willy-nilly.
               | For example, if you create a folder, you can't `cd` there
               | in -WhatIf mode.
        
           | FriedrichN wrote:
           | All my tools that have a possible destructive outcome use
           | either an interactive stdin prompt or a --live option. I like
           | the idea of dry running by default.
        
           | rjh29 wrote:
           | Going further, make it dry run by default and have an
           | --execute flag to actually run the commands: this encourages
           | the user to check the dryrun output first.
        
           | mmcclimon wrote:
           | The rule we have is that anything that is not idempotent and
           | not run as a matter of daily routine must dry-run by default,
           | and not take action unless you pass --really. This has saved
           | my bacon many times!
        
             | maweki wrote:
             | Deleting actually is idempotent. Doing it twice won't be
             | different from doing it once.
        
               | maccard wrote:
               | Deleting * may not be though. Your selection needs to be
               | idempotent.
        
               | maweki wrote:
               | idempotency means that f(X) = f(f(X)). Modifying the X
               | in between is not allowed. Is there really an initial
               | environment where rm * ; rm * ; does something different
               | than rm * once?
        
               | einsty wrote:
               | In the case of any live system, I would say yes.
               | Additional, and different, files could have appeared on
               | the file system in between the times of each rm *.
        
               | mikeryan wrote:
               | * is just short hand for a list of files. Calling rm with
               | the same list of files will have the same results if you
               | call it multiple times. That's idempotent.
               | 
               | Your example is changing the list of files, or arguments
               | to rm between runs. Same as pc86's example where the
               | timestamp argument changes.
        
               | pc86 wrote:
               | In addition to what einsty said (which is 100% accurate),
               | if you're deleting aged records, on any system of
               | sufficient size objects will become aged beyond your
               | threshold between executions.
        
               | jameshart wrote:
               | Right. You can kind of consider the state of a filesystem
               | on which you occasionally run rm * purges to be a system
               | whose state is made up of 'stuff in the filesystem' and
               | 'timestamp the last purge was run'.
               | 
               | If you run rm * multiple times, the state of the system
               | changes each time because that 'timestamp' ends up being
               | different each time.
               | 
               | But if instead you run an rm on files older than a fixed
               | timestamp, multiple times, the resulting filesystem is
               | idempotent with respect to that operation, because the
               | timestamp ends up set to the same value, and the
               | filesystem in every case contains all the files added
               | later than that timestamp.
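
The fixed-cutoff argument can be illustrated with a toy model in which the "filesystem" is just a dict of name to timestamp. This is a simplification for illustration; purge_older_than is a hypothetical name, and real purges act on real files.

```python
def purge_older_than(files, cutoff):
    """Return the files that survive a purge of everything older than `cutoff`."""
    return {name: ts for name, ts in files.items() if ts >= cutoff}


files = {"a.log": 100, "b.log": 200, "c.log": 300}

once = purge_older_than(files, cutoff=200)
twice = purge_older_than(once, cutoff=200)

print(once)           # {'b.log': 200, 'c.log': 300}
print(once == twice)  # True: f(f(x)) == f(x) for a fixed cutoff
```

If the cutoff were recomputed from the current time on each run, the second call would receive a different argument, which is the situation the replies above describe.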
        
               | hansel_der wrote:
               | > Is there really an initial environment where rm * ; rm
               | * ; does something different than rm * once?
               | 
               | if * expands to the rm binary itself, maybe.
        
               | maweki wrote:
               | How is the system different after the first and after the
               | second call?
        
               | jgoldshlag wrote:
               | If there is an rm executable in the current directory,
               | and also one later in your PATH, the second run might use
               | a different rm that could do whatever it wants to
        
             | zrail wrote:
             | Early in my career I used --yes-i-really-mean-it and then a
             | coworker removed it with the commit message "remove
             | whimsy".
             | 
             | 'Twas a sad day.
        
         | inglor_cz wrote:
         | Yeah, that is what I recommend too.
         | 
         | Instead of performing the dangerous action outright, just log a
         | message to screen (or elsewhere) and watch what is happening.
         | 
         | Alternatively, or subsequently, chroot and try that stuff on
         | some dummy data to see if it actually works.
        
         | thunderbong wrote:
         | That is called experience.
         | 
         | Good decisions come from experience. Experience comes from
         | making bad decisions.
        
         | dkersten wrote:
         | I was involved with archiving of data that was legally required
         | to be retained for PSD2 compliance. So it was pretty important
         | that the data was correctly archived, but it was just as
         | important that it was properly removed from other places due to
         | data protection.
         | 
         | This is basically the approach that was taken: log before and
         | after every action exactly what data or files are being acted on
         | and how. Don't actually do it. Then have multiple people
         | inspect the logs. Once ok'd, run again, with manual prompts
         | after each log item asking to continue, for the first few
         | files/bits of data. Only after that was ok'd too did it run the
         | remainder.
         | 
         | In other things I've worked on, I've taken the terraform-style
         | plan first, then apply the plan approach, with manual
         | inspection of the plan in between.
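
A sketch of the "log everything, prompt for the first few items, then run the remainder" procedure. The name run_with_prompts is hypothetical, and the ask parameter defaults to the built-in input so that tests can substitute a canned answer.

```python
def run_with_prompts(items, action, prompt_first=3, ask=input):
    """Log before and after each action; ask the operator for the first few."""
    done = []
    for index, item in enumerate(items):
        print(f"about to process: {item}")
        if index < prompt_first:
            answer = ask(f"process {item}? [y/N] ")
            if answer.strip().lower() != "y":
                print("aborted by operator")
                break
        action(item)
        print(f"processed: {item}")
        done.append(item)
    return done
```

Anything other than an explicit "y" aborts the run, so a stray Enter key stops the job instead of continuing it.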
        
           | dredmorbius wrote:
           | mv then rm is another idiom. So long as you have the space.
           | 
           | For database entries, flag for deletion, then delete.
           | 
           | In the files case, the move or rename also accomplishes the
           | result of breaking any functionality which still relies on
           | those file ... whilst you can still recover.
           | 
           | Way back in the day I was doing filesystem surgery on a Linux
           | system, shuffling partitions around. I meant to issue 'rm -rf .'
           | in a specific directory, but I happened to be in root.
           | 
           | However ...
           | 
           | - I'd booted a live-Linux version. (This was back when those
           | still ran from floppy).
           | 
           | - I'd mounted all partitions _other_ than the one I was
           | performing surgery on '-ro' (read-only).
           | 
           | So what I bought was a reboot, and an opportunity to see what
           | a Linux system with an active shell, but no executables,
           | looks like.
           | 
           | Plan ahead. Make big changes in stages. Measure twice (or 3,
           | or 10, or 20 times), cut once. Sit on your hands for a minute
           | before running as root. Paste into an editor session (C-x C-e
           | Readline command, as noted elsewhere in this thread).
           | 
           | Have backups.
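
The "mv then rm" idiom can be sketched with shutil: move the file into a holding directory first and remove that directory only after nothing has broken. The names quarantine and empty_trash are hypothetical, and the sketch assumes the holding directory has enough space for the data.

```python
import shutil
from pathlib import Path


def quarantine(path: Path, trash_dir: Path) -> Path:
    """Stage 1: move the file aside. Anything still relying on it breaks now,
    while the data can still be moved back."""
    trash_dir.mkdir(parents=True, exist_ok=True)
    destination = trash_dir / path.name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists in the trash")
    shutil.move(str(path), str(destination))
    return destination


def empty_trash(trash_dir: Path) -> None:
    """Stage 2: the irreversible part, run only after a waiting period."""
    shutil.rmtree(trash_dir)
```

Restoring a file is a plain move back from the holding directory, which is the whole point of the intermediate stage.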
        
             | marcosdumay wrote:
             | You mean cp then rm?
             | 
             | And yes, copy, verify, delete. And make sure by the code
             | structure that you either do the three on the same files,
             | or they fail.
             | 
             | Also, do it slowly, with just a bit of data on each
             | iteration. That will make the verification step more
             | reliable.
             | 
             | Anyway, for a huge majority of cases, only having backups
             | is enough already. Just make sure to test them.
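
Copy, verify, delete can be sketched so that all three steps act on the same file or the whole operation fails. Comparing SHA-256 digests is one possible verification, chosen here as an assumption; archive_then_delete and sha256_of are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_then_delete(source: Path, dest_dir: Path) -> Path:
    """Copy, verify the copy, and only then delete the original."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copy = dest_dir / source.name
    shutil.copy2(source, copy)
    if sha256_of(copy) != sha256_of(source):
        raise IOError(f"verification failed for {source}; original kept")
    source.unlink()
    return copy
```

The delete is unreachable unless the verification of that exact file succeeded, so a failed copy can never be followed by a successful delete.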
        
               | andi999 wrote:
               | I think mv then rm is probably meant as 'windows trash
               | bin' style.
        
           | csours wrote:
           | Make a plan, check the plan, [fix the plan, check the plan
           | (loop)], do the plan
           | 
           | See PDCA for a more time-critical decision loop.
           | https://en.wikipedia.org/wiki/PDCA
        
           | zeristor wrote:
           | Yes, I love the idea of the Plan Apply.
        
           | crispyambulance wrote:
           | > ... Then have multiple people inspect the logs. Once ok'd,
           | run again, with manual prompts after each log item asking to
           | continue...
           | 
           | This sort-of reminds me of some "critical" work I had to do a
           | couple of decades ago. I was in a shop that used this
           | horrifically tedious tool for designing masks for special
           | kinds of photonic devices-- basically it was tracing out
           | optical waveguides that would be placed on a crystal that was
           | processed much like a silicon IC.
           | 
           | The process was for TWO of us to sit in front of a computer and
           | review the curves in this crazy old EDA layout tool called
           | "L-edit" before it got sent to have the actual masks made
           | (which were very expensive). It took HOURS to check
           | everything.
           | 
           | The first hour was tolerable but then boredom started to
           | creep in and we got sloppy. The whole reason TWO people got
           | tasked with this was because it was thought that we would
           | keep each other focused-- 2 pairs of eyes are better than
           | one, right? Instead, it just underscored the tedium of it
           | all. One day someone walked in and found us BOTH in DEEP
           | SLEEP in front of the monitor. Having two people didn't
           | decrease the waste caused by mistakes, it just bored the hell
           | out of more people.
        
             | foota wrote:
             | How many mistakes did you catch?
        
               | Freestyler_3 wrote:
               | From his story I can tell he found one big mistake. The
               | tedious work itself.
        
           | mmmm2 wrote:
           | Another good approach is to do deletions slowly. Put sleeps
           | between each operation, and log everything. That way if you
           | realize something is broken, you have a chance of catching it
           | before it's too late.
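
A sketch of slow, logged deletion with a way to stop midway. The should_stop callback is an assumption added for the example (it could check for a marker file or a flag); slow_delete is a hypothetical name.

```python
import logging
import time

log = logging.getLogger("purge")


def slow_delete(items, delete, pause_seconds=1.0, should_stop=lambda: False):
    """Delete one item at a time, logging each and pausing in between."""
    done = 0
    for item in items:
        if should_stop():
            log.warning("stop requested after %d deletions", done)
            break
        log.info("deleting %s", item)
        delete(item)
        done += 1
        time.sleep(pause_seconds)  # leaves time to notice a problem and abort
    return done
```

The pause trades throughput for a window in which a human or a monitor can notice that the wrong things are disappearing.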
        
           | water8 wrote:
           | It never hurts to ask for another set of eyes to review. At
           | the least if something goes awry, the blame isn't solely on
           | you.
        
           | tauwauwau wrote:
           | Once we get used to doing same thing multiple times a day, it
           | doesn't matter if the log shows that we're about to take a
           | destructive action, we'll still do it. Only thing that is
           | foolproof is to not take the destructive action because
           | people make mistakes, it's human nature. I don't know how this
           | can be implemented, maybe encrypt the files, take a backup
           | in some other location (which may not be allowed).
           | 
           | Multiple reviewers here didn't catch the mistake
           | 
           | https://www.bloombergquint.com/markets/citi-s-900-million-
           | mi...
        
             | dkersten wrote:
             | > Multiple reviewers here didn't catch the mistake
             | 
             | Sure, but we can only do so much. I find it's good bang for
             | buck and alternatives that might prevent that are not
             | always available, so we do the best we can. You gotta make
             | a call on whether it's enough or not.
        
             | slaymaker1907 wrote:
             | I'm a fan of doing things temporally so data is very rarely
             | actually deleted from the database. Most of the time, you
             | just update the "valid_to" field to the current time.
             | Sometimes real deletes are required such as with privacy
             | requests, but I think that sort of thing is pretty rare.
             | 
             | If your application has space concerns, you can modify this
             | approach to be like a recycle bin where you delete records
             | which are no longer valid and have been invalid for over a
             | month (or whatever time frame is appropriate for your
             | application). However, I think this is unnecessary in most
             | cases except for blob/file storage.
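             | 
             | A rough sketch of the pattern using Python's sqlite3
             | module. The videos table, its columns and the helper names
             | are all hypothetical; only the "valid_to" idea and the
             | one-month recycle bin come from the description above.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def utc_now_iso(offset_days=0):
    """Current UTC time (optionally shifted) as a sortable ISO-8601 string."""
    moment = datetime.now(timezone.utc) + timedelta(days=offset_days)
    return moment.isoformat(timespec="seconds")


def soft_delete(conn, video_id):
    """Mark a row as no longer valid instead of removing it."""
    conn.execute(
        "UPDATE videos SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (utc_now_iso(), video_id),
    )


def live_rows(conn):
    """Rows the application should still treat as existing."""
    return conn.execute(
        "SELECT id, name FROM videos WHERE valid_to IS NULL ORDER BY id"
    ).fetchall()


def purge_expired(conn, older_than_days=30):
    """Recycle-bin step: really delete rows that have been invalid for a while."""
    cutoff = utc_now_iso(offset_days=-older_than_days)
    cursor = conn.execute(
        "DELETE FROM videos WHERE valid_to IS NOT NULL AND valid_to < ?",
        (cutoff,),
    )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, name TEXT, valid_to TEXT)")
conn.executemany(
    "INSERT INTO videos (id, name) VALUES (?, ?)", [(1, "intro"), (2, "outro")]
)
soft_delete(conn, 1)
print(live_rows(conn))      # row 1 is hidden from the application
print(purge_expired(conn))  # but it is not old enough to be purged yet
```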
        
             | Danieru wrote:
             | That form had a couple weird checkboxes with odd wording.
             | It is a famous mistake, but also rather understandable just
             | because the form was cryptic.
        
             | irrational wrote:
             | Because everyone assumes that everyone else is looking at
             | it more closely than they are. "I'll just do a cursory look
             | since I'm sure everyone else is doing an in-depth look."
             | Narrator: nobody did an in-depth search.
        
             | HowardStark wrote:
             | While this is a huge issue, a solution (well, a partial
             | mitigation) I've seen and used is the "Pointing and
             | Calling" technique. The basic idea is that you incorporate
             | more actions beyond reading and typing or pressing a button
             | --generally by having people point at something and say
             | aloud what it is they're doing and what they expect to
             | happen.
             | 
             | It's used rather extensively in safety-critical public
             | transportation in Japan [1] and to a lesser extent in New
             | York (along with many other countries) [2]. This can easily
             | extend to software without overcomplicating by just setting
             | the expectation that engineers, QA, etc. do this even when
             | alone.
             | 
             | [1] https://www.atlasobscura.com/articles/pointing-and-
             | calling-j...
             | 
             | [2] https://en.wikipedia.org/wiki/Pointing_and_calling
        
               | emerged wrote:
               | "I'm removing that semicolon!" (Pointing)
        
               | bbarnett wrote:
               | Parent meant this sort of pointing.
               | 
               | https://t.co/TjfX5K54H7
        
               | akavel wrote:
               | I heard of this technique, but unfortunately I don't see
               | how it can be easily applied in software
               | engineering/devops.
               | 
               | Also, I now realized that aviation checklists seem to
               | tend to be done similarly with gestures - at least from
               | what I saw on YouTube, not sure if that's representative
               | or only used during education (?)
        
               | samus wrote:
               | Spelling out loudly the command you are about to execute
               | and explaining the reasoning behind it can help a lot
               | too.
        
               | samhw wrote:
               | Hell, GitHub does that to an extent, with the "type the
               | name of this repository to delete it" prompts. Typing the
               | name of the repository isn't exactly perfect, but it's an
               | interesting direction.
        
               | Blackcatmaxy wrote:
               | There was a thread recently about a repo that
               | accidentally went private and lost all of its stars
               | because of confusion with GH teams vs GH profile readme
               | repo naming. I think this type of prompt is very useful
               | for explicitly preventing the rare worst case scenarios
               | but the problem is making any type of prompt "routine" so
               | that our brains fail to process it.
        
               | lostlogin wrote:
               | This is it I think.
               | https://news.ycombinator.com/item?id=31033758
        
               | swid wrote:
               | The suggestion in that post about how to fix it is good,
               | and mirrors one I read in the Rachel by the Bay blog -
               | type the number of machines to continue:
               | 
               | https://rachelbythebay.com/w/2020/10/26/num/
               | 
               | The take away by both is there is actually something to
               | do which can wake people up when the stakes are high, and
               | they might not be doing what they expect.
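               | 
               | In Python, a prompt like that could look roughly like
               | this. confirm_count is a made-up helper, not taken from
               | either post; the point is only that "y" is never
               | accepted, the operator has to type the real number.

```python
def confirm_count(count, noun="machines", ask=input):
    """Return True only if the operator types the exact number of affected items."""
    answer = ask(f"This will affect {count} {noun}. Type the number to continue: ")
    return answer.strip() == str(count)


if __name__ == "__main__":
    # Simulated operator who expected 7 items but is about to touch 7000.
    proceed = confirm_count(7000, noun="videos", ask=lambda prompt: "7")
    print("proceeding" if proceed else "aborted: the count did not match")
```

               | Someone who expected 7 items and sees 7000 in the prompt
               | has to stop and retype it, which is the wake-up moment.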
        
               | oauea wrote:
               | And most importantly, don't let yourself get into the
               | habit of copy pasting the value
        
               | underwater wrote:
               | I wonder if you could print some non visible characters
               | in there to taint the copied value in some detectable
               | way.
        
               | skrtskrt wrote:
               | I always copy-paste into that box as well, they should
               | probably make at least an attempt at disabling pasting
               | into it
        
           | JadeNB wrote:
           | > Then have multiple people inspect the logs.
           | 
           | I think that this is the most important part of any check.
           | Your parent refers to checking the log five times, but, at
           | least in my experience, I won't catch any more errors on the
           | fifth time than the first--if I once saw what I expected
           | rather than what was there, I'll keep doing so. Of course
           | everyone has their blind spots, but, as in the famous Swiss-
           | cheese approach, we just hope that they don't line up!
        
         | veltas wrote:
         | Yep, even writing a simple wildcard at command-line I will
         | 'echo' before I 'rm'.
        
           | pjerem wrote:
           | On computers I own, I always install "trash-cli" and i even
           | created an alias for rm to trash. It's like rm, but it goes
           | to the good old trash. It will not save your prod but it's
           | pretty useful on your own computer at least.
        
         | sam0x17 wrote:
         | Indeed. I would say that framework or even language-level
         | support for putting things in "dry-run" mode is something
         | sorely missed from many modern frameworks and languages, that
         | old C libraries used to do.
        
         | OrwellianTimes wrote:
         | Experience is the best teacher(tm)
        
         | rawgabbit wrote:
         | To ensure that the files are actually downloaded (step1)
         | before deleting the original (step2), I would make step1 an
         | input to step2. That is, step2 cannot work without step1.
         | Something like:
         | 
         |     (step1) Download video from URL. Include the Id in the
         |             filename.
         |     (step2) Grab the list of files that have been downloaded
         |             and parse to get the Id. Using the Id, delete the
         |             original file.
        
         | bambax wrote:
         | Yes. Also, maybe not have a delete action in the middle of a
         | script. It's usually better to build a list of items to be
         | deleted. In that case, two lists: items to be deleted, items to
         | be kept. Then compare the lists:
         | 
         | - make sure the sum of their lengths == number of total current
         | items
         | 
         | - make sure items_to_be_kept.length != 0
         | 
         | - make sure no two items appear in both lists
         | 
         | - check some items chosen at random to see if they were sorted
         | in the correct list
         | 
         | At this point the only possible mistake left is to confuse the
         | lists and send the "to_be_kept" one to the delete script; a dry
         | run of the delete list can be in order.
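         | 
         | The checks above could be sketched in Python like this.
         | check_plan is a hypothetical helper; it also checks that
         | every item lands in one of the two lists, which is a slightly
         | stronger version of the length check.

```python
import random


def check_plan(all_items, to_delete, to_keep, sample_size=5):
    """Raise ValueError if the delete/keep lists fail the sanity checks.

    Returns a random sample of (item, decision) pairs for a manual spot check.
    """
    all_set, delete_set, keep_set = set(all_items), set(to_delete), set(to_keep)
    if len(to_delete) + len(to_keep) != len(all_items):
        raise ValueError("list lengths do not add up to the total item count")
    if not to_keep:
        raise ValueError("nothing would be kept")
    if delete_set & keep_set:
        raise ValueError("some items appear in both lists")
    if delete_set | keep_set != all_set:
        raise ValueError("the two lists do not cover exactly the current items")
    labelled = [(item, "delete") for item in to_delete]
    labelled += [(item, "keep") for item in to_keep]
    return random.sample(labelled, min(sample_size, len(labelled)))


if __name__ == "__main__":
    items = ["a.mp4", "b.mp4", "c.mp4", "d.mp4"]
    for item, decision in check_plan(items, ["a.mp4", "b.mp4"], ["c.mp4", "d.mp4"]):
        print(f"spot check: {item} -> {decision}")
```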
        
           | pc86 wrote:
           | I've had good success with this approach, have two distinct
           | scripts generate the two lists, then in addition to your
           | items here also checking that every item appears in one of
           | the lists.
        
           | ectopod wrote:
           | This. The original approach can fail horribly if there's a
           | problem on the server when you run the script for real. Your
           | code can be perfect but that's no guarantee the server will
           | always return what it ought to.
        
           | ufo wrote:
           | What do you recommend, to not get into trouble if there are
           | spaces or newlines in the file names?
        
             | marcosdumay wrote:
             | Try not to delete stuff with Bash.
             | 
             | This is the most reliable way. Bash has a few niceties for
             | error handling, but if you are using them, you would
             | probably fare better in another language.
             | 
             | If you do insist on Bash, quote everything, and use the
             | "${var}" syntax instead of "$var". Also, make sure you
             | handle every single possible error.
        
               | ricardobeat wrote:
               | `set -e` will abort the script when a command fails (add
               | `set -o pipefail` to also catch failures inside a
               | pipeline). It's a must for any critical script.
        
             | kevinmgranger wrote:
             | Don't use a shell script.
        
               | ufo wrote:
               | Do you mean, always pass the list directly to the next
               | script via function calls, without writing it to an
               | intermediate file / pipeline?
        
               | plonk wrote:
               | Yes, use the list argument to Python's subprocess.run for
               | example. It's much easier to not mess up if your
               | arguments don't get parsed by a shell before getting
               | passed.
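               | 
               | For instance, a small sketch where the "next script" is
               | just a stand-in Python one-liner that prints the
               | arguments it received (run_with_args and the file names
               | are made up):

```python
import subprocess
import sys


def run_with_args(program_args, filenames):
    """Run a program with each filename as exactly one argument, no shell involved."""
    return subprocess.run(
        [*program_args, *filenames],
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )


# Stand-in for the next script: it only reports the arguments it was given.
echo_args = [sys.executable, "-c", "import sys; print(repr(sys.argv[1:]))"]

# Names that would be split or expanded if they passed through a shell.
tricky = ["my video.mp4", "new\nline.mp4", "*.mp4"]

result = run_with_args(echo_args, tricky)
print(result.stdout)
```

               | Each name arrives as one argument, with the space, the
               | newline and the asterisk intact.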
        
         | mkr-hn wrote:
         | This sounds like a "do nothing script."
         | 
         | https://news.ycombinator.com/item?id=29083367
         | 
         | It defaults to not doing anything so you can gradually and
         | selectively have it do something.
         | 
         | Learned about it when I posted my command line checklist tool on
         | HN: https://github.com/givemefoxes/sneklist
         | 
         | (https://news.ycombinator.com/item?id=25811276)
         | 
         | You could use it to summon up a checklist of to-dos like "make
         | sure the collection in the dictionary has the expected number
         | of values" before a "do you want to proceed? Y/n"
        
         | jagged-chisel wrote:
         | This is how I do it in compiled code. In shell, I print the
         | destructive command for dry runs - no conditions around whether
         | to print or not, I go back to remove echo and printf to
         | actually run the commands.
        
         | zrail wrote:
         | Another technique that I've used with good success is to write
         | a script that dumps out bash commands to delete files
         | individually. I can visually inspect the file, analyze it with
         | other tools, etc and then when I'm happy it's correct just
         | "bash file_full_of_rms.sh" and be confident that it did the
         | right thing.
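         | 
         | One way to generate such a file, sketched in Python. The
         | build_rm_script name and the paths are illustrative;
         | shlex.quote is what keeps spaces and quotes in file names
         | from being reinterpreted by bash.

```python
import shlex


def build_rm_script(paths):
    """Return the text of a shell script with one quoted `rm` per path."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    lines += [f"rm -- {shlex.quote(path)}" for path in paths]
    return "\n".join(lines) + "\n"


script = build_rm_script(
    ["videos/a.mp4", "videos/my clip.mp4", "videos/it's.mp4"]
)
print(script)  # review this text, save it to a file, then run it with bash
```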
        
           | cruano wrote:
           | That was our SOP for running DELETE SQL commands on
           | production too, a script that generates a .sql that's run
           | manually. It saved our asses a fair amount of times
        
             | ineedasername wrote:
             | Yeah, wish I'd learned that the easy way. Fresh into one of
             | my first jobs I was working with a vendor's custom
             | interface to merge/purge duplicate records. It didn't have
             | a good method of record matching on inserts from the
             | customer web interface so a large % of records had
             | duplicates.
             | 
             | Anyway, I selected what I thought was a "merge all
             | duplicates" option without previewing results. What I had
             | _actually_ done was  "merge all selected". So, the system
             | proceeded to merge a very large % of the database... Into
             | One. Single. Record.
             | 
             | Luckily the vendor kept very good backups, and so I kept my
             | job. Because I also luckily had a very good boss and I had
             | already demonstrated my value in other ways, he just asked
             | me "Well, are you going to make that mistake again?". I
             | wisely said no, and he just smiled and said "Then I think
             | we're done here."
             | 
             | I have been particularly fortunate throughout my career to
             | have very good managers. As much as managers get a lot of
             | flak here on HN, done well they are empowering, not a
             | hindrance, and I attribute a lot of success in my career to
             | them.
        
               | JadeNB wrote:
               | > Yeah, wish I'd learned that the easy way.
               | 
               | I think that, if you've only learned something like that
               | the easy way, then you haven't learned it yet. As long as
               | everything's only ever gone right, it's easy to think,
               | I'm in a rush this one time, and I've never really needed
               | those safety procedures before, ....
        
             | karlding wrote:
             | At a previous job the DB admin mandated that everyone had
             | to write queries that would create a temporary table
             | containing a copy of all the rows that needed to be
             | deleted. This data would be inspected to make sure that it
             | was truly the correct data. Then the data would be deleted
             | from the actual table by doing a delete that joined against
             | the copied table. If for some reason it needed to be
             | restored, the data could be restored from the copy.
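             | 
             | A small sketch of that workflow with Python's sqlite3
             | module. The users table and its columns are invented for
             | the example, and SQLite has no DELETE ... JOIN, so the
             | "join" is written as an IN subquery against the copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ann", 1), (2, "bob", 0), (3, "cy", 0)],
)

# 1. Copy the rows that are about to be deleted into a temporary table.
conn.execute(
    "CREATE TEMP TABLE users_to_delete AS SELECT * FROM users WHERE active = 0"
)

# 2. Inspect the copy before touching the real table.
doomed = conn.execute("SELECT id, name FROM users_to_delete ORDER BY id").fetchall()
print("about to delete:", doomed)

# 3. Delete only the rows that are present in the copy.
conn.execute("DELETE FROM users WHERE id IN (SELECT id FROM users_to_delete)")
remaining = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()

# 4. If it turns out to be wrong, restore from the copy.
conn.execute("INSERT INTO users SELECT * FROM users_to_delete")
restored = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```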
        
           | XorNot wrote:
           | At the point you're doing this, you should be using a proper
           | programming language with better defined string handling
           | semantics though. In every place it comes up you'll have
           | access to Python and can call the unlink command directly and
           | much more safely - plus a debugging environment which you can
           | actually step through if you're unsure.
        
             | zrail wrote:
             | Eh, I think that misses the point a bit. Use whatever you
             | want to generate the output, but make the intermediary
             | structure trivial to inspect and execute. If you're
             | actually taking the destructive actions within your
             | complicated* logic then there's less room to stop, think,
             | and test.
             | 
             | You could always generate an intermediary set,
             | inspect/test/etc, and then apply it with Python. I've done
             | that too, works just as well. The important thing is to
             | separate the planning step from the apply step.
             | 
             | * where "complicated" means more complicated than, for ex,
             | `rm some_path.txt` or `DELETE FROM table WHERE id = 123`.
        
           | KMnO4 wrote:
           | Ah, I'm glad I'm not the only one who did this. It also means
           | that you can fix things when they break halfway. Say you get
           | an error when the script is processing entry 101 (perhaps
           | it's running files through ffmpeg). Just fix the error and
           | delete the first 100 lines.
        
           | hinkley wrote:
           | I tend to write one script that emits a list of files, and
           | another that takes a list of files as arguments.
           | 
           | It's simple to manually test corner cases, and then when
           | everything is smooth I can just
           | 
           |     script1 | xargs script2
           | 
           | It's also handy if the process gets interrupted in the
           | middle, because running script1 again generates a shorter
           | list the second time, without having to generate the file
           | again.
           | 
           | When I'm trying to get script1 right I can pipe it to a file,
           | and cat the file to work out what the next sed or awk script
           | needs to be.
        
           | francis-io wrote:
           | This was taught to me in my first linux admin job.
           | 
           | I was running commands manually to interact with files and
           | databases, but was quickly shown that even just writing all
           | the commands out, one by one gives room to personally review and
           | get a peer review, and also helps with typos. I could ask a
           | colleague "I'm about to run all these commands on the DB, do
           | you see any problem with this?". It also reduces the blame if
           | things go wrong if it managed to pass approval by two
           | engineers.
           | 
           | While I'm thinking back, another little tip I was told was to
           | always put a "#" in front of any command I paste into a
           | terminal. This stops accidentally copying a carriage return
           | and executing the command.
        
             | koolba wrote:
             | > This stops accidentally copying a carriage return and
             | executing the command.
             | 
             | For a one-liner sure, but a multi line command can still be
             | catastrophic.
             | 
             | Showing the contents of the clipboard in the terminal
             | itself (eg via xclip) or opening an editor and saving the
             | contents to a file are usually better approaches. The
             | latter lets you craft the entire command in the editor and
             | then run it as a script.
        
               | afiori wrote:
               | From [0]:
               | 
               | [For Bash] Ctrl + x + Ctrl + e : launch editor defined by
               | $EDITOR to input your command. Useful for multi-line
               | commands.
               | 
               | I have tested this on windows with a MINGW64 bash, it
               | works similarly to how `git commit` works; by creating a
               | new temporary file and detecting* when you close the
               | editor.
               | 
               | [0] https://github.com/onceupon/Bash-Oneliner
               | 
               | * Actually I have no idea how this works; does bash wait
               | for the child process to stop? does it do some posix
               | filesystem magic to detect when the file is "free"? I
               | can't really see other ways
        
               | mh- wrote:
               | It does create and give a temporary file path to the
               | editor, but then simply waits for the process to exit
               | with a healthy status.
               | 
               | Once that happens, it reads from the temporary file that
               | it created.
        
             | remram wrote:
             | The 'enable-bracketed-paste' setting is an easier and more
             | reliable way to deal with that:
             | https://unix.stackexchange.com/a/600641/81005
             | 
             | It will prevent any number of newlines from running the
             | commands if they're pasted instead of typed.
             | 
             | You can enable it either in .inputrc or .bashrc (with `bind
             | 'set enable-bracketed-paste on'`)
        
         | ineedasername wrote:
         | _> literally simple prints statements_
         | 
         | Yes, that can be a simple but powerful live on screen log. I
         | developed a library to use an API from a SaaS vendor, in much
         | the same way as the author. It was my first such project & I
         | learned the hard way (wasted time, luckily no data loss or
         | corruption) that print() was an excellent way to keep tabs on
         | progress. On more than one occasion it saved me when the
         | results started scrolling by and I did an _oh sh*t!_ as I
         | rushed to kill the job.
        
         | krono wrote:
         | The No. 2 philosophy!
         | 
         | Make sure you got everything out and off before you pull up
         | your pants, or else you better be prepared to deal with all the
         | shit that might follow!
        
       | ElCapitanMarkla wrote:
       | Nice work :D I tend to always add a `--dryrun` flag to any
       | scripts like this these days so that when we move it to
       | production we can run an extra test there just to be sure.
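       | 
       | A bare-bones sketch of such a flag with argparse, assuming the
       | script deletes local files. The main function and its delete
       | parameter are illustrative, not from the original script.

```python
import argparse
import os


def main(argv=None, delete=os.remove):
    """Delete the given paths, or only print them when --dryrun is set."""
    parser = argparse.ArgumentParser(description="Delete the listed files.")
    parser.add_argument("paths", nargs="+", help="files to delete")
    parser.add_argument(
        "--dryrun", action="store_true", help="print actions, change nothing"
    )
    args = parser.parse_args(argv)

    deleted = []
    for path in args.paths:
        if args.dryrun:
            print(f"[dry run] would delete {path}")
        else:
            print(f"deleting {path}")
            delete(path)
            deleted.append(path)
    return deleted


# Example: nothing is removed, the actions are only printed.
main(["--dryrun", "old_video_1.mp4", "old_video_2.mp4"])
```

       | Calling main() with no arguments would read the real command
       | line instead of the example list.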
        
       | mikotodomo wrote:
       | > Some of the things that might seem obvious to some might not be
       | so for me, thanks!
       | 
       | > my mind thought that url would refresh itself as soon as the
       | page variable changed
       | 
       | This is what I thought too when I read the code. I don't think
       | it's obvious at all!
        
         | xmprt wrote:
         | That's actually surprising to me. In most languages that I've
         | worked with, strings are immutable so the fact that url doesn't
         | update is more obvious to me and I'd be surprised if it did
         | update.
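         | 
         | To make that concrete, here is a tiny Python illustration.
         | The URL is a placeholder and this is not the article's
         | actual code, just the same shape of bug.

```python
page = 1
# The string is built once, using the value page has right now.
url = f"https://api.example.com/videos?page={page}"

page = 2
stale = url  # unchanged: the string has no link back to the variable


def build_url(page):
    """Rebuild the URL every time the page number changes."""
    return f"https://api.example.com/videos?page={page}"


fresh = build_url(page)
print(stale)
print(fresh)
```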
        
       | shantnutiwari wrote:
       | What negativity and arrogance in the comments here. Jeez, it's
       | like no one on HN ever made a mistake, a bunch of 10xers ninja
       | programmers here. Please read this:
       | 
       | >I also want to preface this whole post by saying that I'm a
       | Junior Developer with less than one year of actual experience.
       | Some of the things that might seem obvious to some might not be
       | so for me, thanks!
       | 
       | It's just some kid sharing a mistake they made and owning up.
       | Ease up on the "LOL what an idiot" attitude
        
         | nicbou wrote:
         | More importantly, this person is helping us learn from their
         | mistake. This is something that should be encouraged, not
         | mocked.
        
         | JacobiX wrote:
         | Just to be fair also to some commenters, I think that the post
         | had been edited after posting from what I remember ... so maybe
         | the older comments are not very relevant.
        
           | thevinter wrote:
           | To clarify, I only removed the company name and added the top
           | disclaimer
        
             | [deleted]
        
         | [deleted]
        
         | noufalibrahim wrote:
         | I think it was a great post. Reveals a knack for clarity in
         | explanations. The mistake is simple enough and natural for a
         | junior. If it were just one video or something, it would
         | probably not even be noteworthy. I think the developer learned
         | from the incident too. So all good.
         | 
         | I do think Vimeo was irresponsible in the whole affair though.
        
         | snowwrestler wrote:
         | I'm impressed by their commitment to automation. If that was
         | me, once I realized that manually uploading from Gdrive to
         | Vimeo would fix the problem, I probably would have just
         | committed myself to manually doing that all weekend. It would
         | feel safer and serve as a sort of penance for screwing up the
         | automation the first time.
         | 
         | But nope, they went right back to scripting and got it done.
        
         | KrishnaShripad wrote:
         | I have done a lot of such blunders myself. Accidentally deleted
         | my unchecked code and had to re-write everything from memory.
         | 
         | I envy those who claim to make no mistakes at all.
        
           | boygobbo wrote:
           | Don't envy them - they are deluding themselves.
        
           | aeroplanetext wrote:
           | I've been there! At least when you write it the second time
           | it goes more quickly.
        
         | FunnyLookinHat wrote:
         | I was actually really impressed with this individual! For
         | someone who has less than a year of experience, they're showing
         | quite a bit of initiative, drive, and curiosity - which really
         | are what make or break engineers as they develop. Taking the
         | time to do a blog post (effectively a post-mortem) and share it
         | is even better!
         | 
         | And yes - I've literally done this exact same error (with TB of
         | video data!). Spending the following week remediating all of
         | that data loss was a great lesson in patience and attention to
         | detail. :-)
         | 
         | OP: If you're ever looking for a job be sure to send me a
         | message. Contact info in profile.
        
           | Moru wrote:
           | My mistake was on floppy disc with source code, other text
           | files and images. Was hand editing (in hex disc editor) the
           | floppy to get back the data, sector by sector. Fun times. Not
           | going back there though :-)
        
             | nso wrote:
             | Mine was a DELETE FROM Users; WHERE... Fun was had.
        
               | codegeek wrote:
               | Usually the recommendation is to not start writing the
               | DELETE query first. Write the SELECT query first and see
               | the results. If you miss the WHERE clause, you will see
               | that immediately. Then change SELECT * to DELETE. But I
               | assume you have learned that lesson already :)
        
               | Moru wrote:
               | Yes, but it can't be stressed enough, always the first
               | time for someone.
        
           | tasuki wrote:
           | Wrt "less than one year of experience", looking at Nikita's
           | CV and GitHub, despite the title, they aren't really a junior
           | developer :)
        
             | franciscop wrote:
             | True, he's been teaching programming since at least 2018, I
             | was in a similar boat where I'd been programming for almost
             | 5-7 years for fun and profit before my first official
             | fulltime job.
        
         | [deleted]
        
         | 692 wrote:
         | there's an argument that the best people around are the people
         | who have already (or almost) made some big mistakes.
         | 
         | I have made a couple of huge ones - luckily I kept my job
        
           | comprev wrote:
           | When interviewing candidates I always enquire about their
           | professional mistakes. Their reply often is the decider
           | between hiring/rejecting.
           | 
           | I want to have colleagues who admit fault, be truthful about
           | actions which lead to the issue, and learn from it. The
           | learning includes organisations perhaps putting additional
           | measures in place to prevent future issues.
           | 
           | One candidate told a story of how he was On-Call early in his
           | career and was told situations happened so rarely, just to
           | continue living life as normal.
           | 
           | Unfortunately for him, his pager went off at 02:00am while he
           | was high as a kite on drugs - but felt he had to take action
           | (mostly due to arrogance!).
           | 
           | He promptly deleted production data and things only got worse
           | when he tried to rectify the situation.
           | 
           | Of course he was fired for his actions but ever since he's
           | been stone cold sober when on-call.... just in case.
           | 
           | He learned a valuable lesson about professional
           | responsibilities.
        
             | vsareto wrote:
             | >When interviewing candidates I always enquire about their
             | professional mistakes.
             | 
             | "You see, my biggest mistake was programming in the first
             | place! Since then, it's just been an apology tour"
        
             | avgcorrection wrote:
             | It's funny how so many managers on this board are like,
             | yeah I focus disproportionately much on this one factor.
             | Why? Because my intuition and experience says so.
        
             | DoubleDerper wrote:
             | Don't fire for the mistake. Fire for someone's inability to
             | own it, or for covering it up or pointing fingers at others.
        
               | comprev wrote:
               | His honesty of admitting to being off his nut while on-
               | call led to his firing, not the action of deleting
               | things.
        
               | BolexNOLA wrote:
               | >His honesty of admitting to being off his nut
               | 
               | This is now my favorite euphemism for being high
        
         | YorickPeterse wrote:
         | I currently have about 12 years of experience, and a few years
         | back I accidentally cleaned up GitLab's database a bit too
         | well. I wouldn't be surprised if the people being dismissive
         | simply never worked on a moderately complex and large system,
         | and thus don't understand how easy it is to make these kinds of
         | mistakes.
        
         | nspattak wrote:
         | LOL!
         | 
         | I have multiple years more experience than this man and still I
         | could *very* *too* *easily* make a 7TB mistake (or likely more
         | :P )
        
         | grumple wrote:
         | This sort of mistake happens all the time when you write in
         | multiple languages. A key solution is code review, a standard
         | practice which doesn't seem to have happened here (and
         | certainly isn't the fault of a junior).
        
       | [deleted]
        
       | aristus wrote:
       | Hey, everyone, ease up. I have: 1) dropped a production database
       | because I thought it was the test database. 2) screwed up a print
       | job costing $100,000 in today's money and had to do it again 3)
       | crashed all of Facebook with a C++ bug. 4) crashed Facebook photo
       | uploads, with a JavaScript bug, in my first month. 5) literally
       | killed a startup's cash flow and caused them to lose their
       | merchant account because I over focused on the wrong bugs.
        
         | paintman252 wrote:
         | You worked at Facebook, we get it
        
         | hbn wrote:
         | At my first development job (paid internship at a moderately-
         | sized, though fast-growing business - maybe 300 people at the
         | time?) I introduced a bug that didn't appear until a certain
         | microservice stopped working (my code defaulted in the wrong
         | direction when the ms failed) and as far as I can tell they may
         | have lost or almost lost a pretty big account from it. In an
         | after-hours meeting regarding the issue, one of the higher ups
         | ended up storming out and never showing up again.
         | 
         | In my defence, we had to get 2 PR approvals before anything was
         | merged! But I definitely learned a thing or two from that
         | experience
        
         | [deleted]
        
       | JasonFruit wrote:
       | I believe if we're honest, we've all done stupid things we should
       | have avoided. I remember a group of about 3000 emails that went
       | out to insurance agents saying that policy #123456789 for Someone
       | Funky was going to be cancelled by underwriting. I also remember
       | very quickly figuring out how to automate Outlook's email recall
       | feature.
       | 
       | We've all made big dumb mistakes. Recover and learn.
        
       | hexsprite wrote:
       | when doing migrations/conversions I always write a script in dry-
       | run mode first. I exhaustively check the results to make sure
       | they are expected. Then try to do a real conversion/transfer of
       | only the 1st file and make sure that worked. Then do a couple
       | more. Etc. Only then do I feel confident to do the whole thing.
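       | 
       | A minimal sketch of that routine, assuming a hypothetical
       | migrate() helper with made-up dry_run and limit switches (none of
       | these names come from the original post):

```python
from typing import Callable, Iterable, List, Optional


def migrate(items: Iterable[str],
            transfer: Callable[[str], None],
            dry_run: bool = True,
            limit: Optional[int] = None) -> List[str]:
    """Return the items that were (or, in dry-run mode, would be) transferred."""
    handled = []
    for index, item in enumerate(items):
        if limit is not None and index >= limit:
            break
        if dry_run:
            # Only report what would happen; touch nothing.
            print(f"[dry-run] would transfer {item}")
        else:
            transfer(item)
        handled.append(item)
    return handled


videos = ["a.mp4", "b.mp4", "c.mp4"]  # hypothetical file names
moved = []                             # stands in for the real destination

# 1. Dry run over everything: inspect the plan, nothing is moved.
plan = migrate(videos, moved.append)

# 2. Real run, but only for the first file.
first = migrate(videos, moved.append, dry_run=False, limit=1)
```

       | Defaulting dry_run to True means the destructive path has to be
       | asked for explicitly; raising limit step by step mirrors the "1st
       | file, then a couple more" approach.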
        
       | uptown wrote:
       | Junior Dev: "I'm under an NDA"
       | 
       | Also Junior Dev: "Here's my source code"
        
       | [deleted]
        
       | bufferoverflow wrote:
       | Always do a dry run when deleting many things with code.
       | 
       | - Captain Obvious
        
       | mastazi wrote:
       | > Vimeo doesn't provide an easy way of doing it. I wrote to the
       | support team around October asking them if it was possible to do
       | a migration, and they told us that they "will look into it"
       | without letting us know anything ever since. [...] At one point,
       | without letting us know anything, Vimeo decided it was a great
       | idea to comply with our request and dumped all the videos present
       | on OTT onto the new platform. No questions were asked [...] they
       | were duplicating videos that were already uploaded.
       | 
       | Oh yes Vimeo, the crappy company that won't let you play videos
       | unless you enable autoplay in your browser[1].
       | 
       | Selecting them as a provider was the actual mistake.
       | 
       | [1] https://askubuntu.com/questions/777489/vimeo-video-not-
       | playi...
        
       | 0xbadcafebee wrote:
       | This is more common than you think. Not just losing data, but not
       | having a good handle on where the important parts of the system
       | are, and how close you are to catastrophe. I find diagrams really
       | help. I can recall a visual map of the system when I work on some
       | component, and think, "OH, I remember seeing this component
       | connected to a really critical thing, I need to check something
       | first."
       | 
       | Start by creating one empty page for every component of your
       | system. You won't remember them all, but over time you can add
       | missing ones. Each page is the authoritative source of info on
       | that component. If you need more pages for one component, put
       | them in a directory of the same name as the page and add ".d" to
       | the directory name, and link to them from the first page.
       | Finally, create a diagram (however you want) that includes every
       | component you have a page for. Add the count of components to the
       | top of the diagram. If the count on the diagram doesn't match the
       | number of documents, time to update the diagram. If you ever add,
       | remove or rename a page, time to update the diagram. If you do
       | this the same way for every different system you have, you can
       | link them all together and get both small and large scale
       | diagrams. (p.s. don't waste time automating this unless you find
       | the system changing constantly or you have a very big system)
        
       | fedeb95 wrote:
       | in my opinion any process that isn't preceded by another
       | identical and automated process that varies only by the data
       | involved is very risky to do in production. your management
       | hopefully had a big reality check? or not because of backups?
        
       | chanandler_bong wrote:
       | Experience is directly proportional to the amount of equipment
       | ruined or data lost.
       | 
       | Even though you were fortunate not to lose any data, you gained a
       | lot of experience!
        
       | rexreed wrote:
       | A big part of the reason for the problem in this post is because
       | Vimeo made it impossible to move videos from one Vimeo product to
       | another Vimeo product: "There were roughly 500 videos on VimeoOTT
       | that had to be transferred to Enterprise and Vimeo doesn't
       | provide an easy way of doing it."
       | 
       | I have found working with Vimeo to be very frustrating,
       | especially recently. They have a great video solution, especially
       | for streaming, but they seem to put these unnecessary and
       | frustrating roadblocks that make me constantly question my
       | decision to use Vimeo. From the inability to move videos from one
       | place to another, requiring complete uploads (resulting in
       | problems like this post) to nonsensical limits and pricing,
       | especially on their new webinar offering, which has a limit of
       | 100 registered attendees. For anyone who has run webinars before,
       | this makes no sense since 100 registered attendees usually means
       | 20-30% of those people actually attend, so you're capped at 20-30
       | live attendees. They should price it like most event sites and
       | charge per live attendance rather than registration.
       | 
       | Regardless, I've been very frustrated with Vimeo since it could
       | be so much better if they didn't have these roadblocks in place.
       | If they could have easily enabled moving videos from one product
       | to another, the post (and 7TB of lost videos) would never have
       | happened. It wasn't always this way with Vimeo, but they went IPO
       | in May 2021 and it's no surprise they're turning the screws on
       | their product offering and pricing now.
        
       | beeforpork wrote:
       | > I Accidentally Deleted 7TB of Videos ...
       | 
       | Spoiler:
       | 
       | But there was a backup that could be reuploaded in time and
       | everything was fine in the end.
        
       | nix23 wrote:
       | ZFS -> Snapshot....always!! Before touching writable-data (my
       | personal mantra) ;)
        
         | hnlmorg wrote:
         | I love ZFS too but that's not really relevant to this
         | discussion because the deleted items were on a video hosting
         | platform and the company did already have local copies.
        
           | nix23 wrote:
           | Yes and? Make a snapshot on live. Again, never touch data
           | before snapshot.
        
             | volume wrote:
             | This reminds me of some IRC threads. You post a question and
             | someone's answer assumes you are going to rip out and
             | replace your existing prod setup just so you can use their
             | pet tool.
        
             | hnlmorg wrote:
             | At risk of sounding snarky, you do understand how video
             | hosting platforms work? Customers, even enterprise ones,
             | don't have shell access let alone control over what file
             | system is used.
             | 
             | There are a hundred ways this problem could have been
             | prevented but ZFS isn't one of them.
        
       | whiplash451 wrote:
       | So, "i am under NDA" but I reveal my client's name and a lot of
       | sensitive details about what we are doing. LOL.
        
         | dewey wrote:
         | Where do you see the client's name? I only see Vimeo being
         | mentioned.
        
           | ceejayoz wrote:
           | It has been edited.
           | 
           | https://news.ycombinator.com/item?id=31271836
        
             | daniel-cussen wrote:
             | Well at least deleting the secret is a step back toward the
             | NDA he left behind.
        
             | Closi wrote:
             | It still breaks the NDA:
             | 
             | * Firstly, you don't have to name the company to break the
             | NDA anyway (you are still disclosing information you aren't
             | supposed to disclose regardless of if it can be linked back
             | to the company).
             | 
             | * Secondly, the client is still named on the front page of
             | the website.
             | 
             | * Thirdly, OP posted this with his real name that trivially
             | links back to the dev shop he is working for. The site also
             | has his CV which lists the client again, with a description
             | of the project to link it to the post.
             | 
             | * Finally, The client can trivially be identified by
             | googling the description in the second paragraph (i.e. just
             | search the named countries in operation plus the word Gym).
        
               | 12ian34 wrote:
               | Not all NDAs have the same terms. I could write up and
               | serve an NDA right now that still counts as an NDA yet
               | permits everything in your list.
        
               | Closi wrote:
               | All contracts vary in terms, but I've never seen an NDA
               | that says "you can talk about the content under NDA as
               | long as you don't mention the business's name, and just
               | identify who they are in a roundabout way instead".
               | 
               | "Well I'm under an NDA, so I can tell you all the
               | specifics of the project, but I can't tell you the
               | company's name. I _can say_ they own the largest search
               | engine though, and have a market cap of 1.5 trillion, and
               | rhyme with "Roogle", but I really can't say who they
               | are. Anyway, here is some code I wrote for them and a
               | description of how we nearly ruined their project along
               | with me calling them incompetent..."
        
             | dewey wrote:
             | Got it. To be honest I'd be hesitant to publish a blog post
             | like that with your name + current company name attached to
             | it.
             | 
             | It's a bit different to share a fun story a few years later
             | about that time you almost wiped production.
        
         | [deleted]
        
       | unfocused wrote:
       | I'm currently working with FOIA software, and a regular user can
       | only delete one document at a time from the information that they
       | verify/redact before sending out. They can't even multi select!
       | Only an admin can delete multiple documents at one time.
       | 
       | I'm guessing users accidentally deleted multiple documents one
       | too many times, and now it's baked in.
        
       | qwertox wrote:
       | Aaaahhh, the feeling you get when you notice that you fucked up.
       | Everything gets quiet, body motion stops, cheeks get hot, heart
       | starts to beat and sinks really low, "fuck, fuck, fuck, fuck,
       | fuck, fuck, fuck, fuck, fuck, fucking shit". Pause. Wait. Think.
       | "Backups, what do I have, how hard will it be to recover? What is
       | lost?". Later you get up and walk in circles, fingers rolling the
       | beard, building the plan in the head. Coffee gets made.
        
         | wonderwonder wrote:
         | lol, it's amazing how fast the blood leaves your face when your
         | mind transitions from "cool that worked well" to "Oh no, what
         | have I done?"
         | 
         | That backups comment sounds very familiar.
         | 
         | I accidentally deleted a client's products table from the
         | production database in my early years as a solo dev. There was
         | only a production database. Luckily I had written a feature to
         | export the products to an excel sheet a while before and
         | happened to have an excel copy from the prior day. I managed to
         | build an export to ingest the excel and repopulate the table in
         | record speed while waiting for my phone to ring and the client
         | to be furious. Luckily they never found out.
        
           | [deleted]
        
         | gwerbret wrote:
         | I had this experience when, years ago on my first day as group
         | lead at $JOB, I was being shown a RAID 5 production server that
         | held years of valuable, irreplaceable data (because there were
         | no backups. Let me repeat that there were no backups). For some
         | bizarre reason, I thought "oh cool, hot-swappable drives" and
         | pulled one out of the rack. This naturally resulted in loud,
         | persistent beeping from the machine, which everyone ignored on
         | the assumption that the fellow who was just hired as the group
         | lead knew what the f he was doing.
         | 
         | While I _didn't_ know what I was doing, I did manage to get
         | the beeping to stop, and had to come in at 5 a.m. the next day
         | to restripe the drive I'd yanked out.
         | 
         | Did I mention there were no backups? When I was a little bit
         | more seasoned on the job, I raised a polite but persistent
         | issue with management of the need for durable backups. Although
         | I kept at it for months, they thought about it, talked about
         | it, and ultimately did nothing. A few months after I left, the
         | entire array failed. Since the group's work relied on the
         | irreplaceable data, all work ground to a halt for the several
         | months it took for an off-site company to recover the data.
        
           | ycmjs wrote:
           | My previous boss stores company data this same way. I begged
           | him to approve the $5 per month cost for Backblaze on the
           | computers I used. He approved it for some, but not all (about
           | half of the ten computers). He completely rejected the idea
           | for the company's data. After all, it was already protected
           | by RAID.
        
           | ricardobeat wrote:
           | Isn't RAID 5 supposed to survive a single disk being taken
           | out?
        
             | windsurfer wrote:
             | If a second drive fails after the first while rebuilding
             | (which happens more often with larger and slower drives),
             | the data is lost.
        
             | arminiusreturns wrote:
             | Theoretically, but there are often other things at play. I
             | know the story is older but since about 2015 raid5 has been
             | dead to me, mostly because at current drive sizes a raid5
             | rebuild takes so long that your chance of a cascade failure
             | and losing a second drive makes it a "send to a recovery
             | lab" risk. Anywhere you would use raid5 just do raid6.
        
         | cntrl wrote:
         | damn, your description is spot on and reading this triggered
         | PTSD in me... Last time I had this feeling was two years ago
         | when I destroyed one of our development servers because of a
         | failed application update. I know exactly how I wished Ctrl + Z
         | to exist in real life... We had backups of the machine, but it
         | was still kind of a humiliating feeling to tell everybody and
         | ask for restore from backup (everybody was cool though in the
         | end)
        
         | Taylor_OD wrote:
         | God the feeling of having your body temp rise based purely on
         | realizing you fucked up is so relatable.
        
         | deltarholamda wrote:
         | Pffft, it's not a real panic until you weigh the pros and cons
         | of leaving the country with nothing but the clothes on your
         | back and becoming an illegal immigrant shepherd in a nation with
         | too many consonants in its name.
         | 
         | (Your description is so, so, spot on.)
        
           | beardedetim wrote:
           | Ah, the goat farmer fantasy that always seems to come _at the
           | cusp_ of the solution.
        
           | CapmCrackaWaka wrote:
           | The worst panic I've felt actually took me over the precipice
           | into peaceful oblivion. I started simply saying to myself "oh
           | well... It's just a job".
        
         | sergiotapia wrote:
         | I lost 1hr and 30 minutes of a Slack-like app (chat messages).
         | Luckily at the time we were pretty small so not much data was
         | lost but holy shit did that make me almost throw up.
         | 
         | Thank God my automatic backups were so close to the mistake I
         | made and I didn't lose 24 hours.
         | 
         | Haven't made a mistake like that since and I don't destroy DB
         | records like that anymore.
        
         | Oarch wrote:
         | Poetic! Love it
        
       | Helitio wrote:
       | Just a note: being able to click yourself a server at Google, AWS
       | etc. might be cheap enough, even paying for 15TB of traffic.
        
       | DonHopkins wrote:
       | >... the "Silicon Valley" world ...
       | 
       | To rebillionizing!
       | 
       | https://www.youtube.com/watch?v=wGy5SGTuAGI&t=369s
       | 
       | ...yeah, the Tres Commas bottle was on the DELETE key. The corner
       | of it was just, it juuuust got on there...
        
       | lesgobrandon wrote:
        
       | [deleted]
        
       | [deleted]
        
       | dclowd9901 wrote:
       | His solution reminds me of how I used Cypress to generate test
       | accounts on our local admin dashboard for Cypress tests, since
       | our api was inadequate (it didn't do the billing signoff required
       | to create accounts that last longer than a month... don't
       | ask...).
        
       | SnowHill9902 wrote:
       | Related: is there any HTTP API model that supports transactions
       | with commit and rollback? Also isolation levels? Usually one
       | wants to set_stock(get_stock() + 10) but there may be competing
       | requests from various clients between both calls, resulting in
       | races. Usual web APIs seem vulnerable to this.
        
         | jffry wrote:
         | Wouldn't the model be to expose an increment_stock(10) type
         | HTTP endpoint instead, and the backend can ensure it's atomic?
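         | 
         | A rough in-memory sketch of that idea, with a hypothetical
         | StockStore class standing in for whatever sits behind the
         | endpoint (a real backend would more likely lean on a database
         | transaction or an atomic UPDATE than on a Python lock):

```python
import threading


class StockStore:
    """In-memory stand-in for the backend behind the hypothetical endpoint."""

    def __init__(self, stock: int = 0) -> None:
        self._stock = stock
        self._lock = threading.Lock()

    def get_stock(self) -> int:
        with self._lock:
            return self._stock

    def increment_stock(self, amount: int) -> int:
        # The read and the write happen under one lock, so no other
        # caller can slip in between them.
        with self._lock:
            self._stock += amount
            return self._stock


store = StockStore(stock=0)

# Fifty concurrent "clients", each adding 10 units.
workers = [threading.Thread(target=store.increment_stock, args=(10,))
           for _ in range(50)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

final_stock = store.get_stock()
```

         | The client never sees an intermediate value, so there is
         | nothing for it to race on; the get-then-set pair is replaced
         | by a single request.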
        
       | LinAGKar wrote:
       | Shouldn't that be `page={page}` rather than `page{page}`? Or
       | better yet, use the requests `params` argument.
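       | 
       | To illustrate, here is a small sketch using only the standard
       | library. build_url() is a made-up helper; requests.get(url,
       | params={...}) builds the query string for you in much the same
       | way:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.ourservice.com/media"  # URL as quoted in the thread


def build_url(page: int, step: int = 100) -> str:
    # urlencode writes every key=value pair itself, so an "=" cannot
    # be forgotten by hand.
    return f"{BASE_URL}?{urlencode({'page': page, 'step': step})}"


page = 2
buggy_url = f"{BASE_URL}?page{page}&step=100"  # missing "=" after "page"
fixed_url = build_url(page)

# Parse both query strings the way a server plausibly would.
buggy_query = parse_qs(urlsplit(buggy_url).query)
fixed_query = parse_qs(urlsplit(fixed_url).query)

print(buggy_query)  # the "page" key is absent
print(fixed_query)
```

       | With the "=" missing, the query carries no "page" key at all, so
       | a server that defaults to the first page would keep returning
       | it (an assumption about the server, not something the thread
       | confirms).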
        
       | hanly_paul wrote:
       | I am also a junior with 1 year's experience, just in Python but
       | none with the requests module or web development. If the 'page'
       | variable is being changed, was the error something specific to
       | this module, not refreshing the page?
        
       | orange_puff wrote:
       | As everyone else has already pointed out, better testing would
       | have been very useful here. For instance, print(len(our_ids))
       | would have been a dead giveaway that something was up.
       | 
       | I am also a junior dev and completely empathize with being given
       | a lot of responsibility and potentially messing up. I think for
       | someone with < 1 year of experience, to solve the problems you
       | created as fast as you did is really impressive. Thankfully your
       | story ends well :)
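       | 
       | The len() check could also be made to fail loudly instead of
       | relying on someone reading the output. A sketch, assuming a
       | hypothetical check_before_delete() helper and an expected count
       | of roughly 500 (the number of videos quoted elsewhere in the
       | thread):

```python
from typing import List


def check_before_delete(ids_to_keep: List[str], expected_count: int,
                        tolerance: int = 0) -> List[str]:
    """Validate the gathered id list before anything destructive runs."""
    unique_ids = set(ids_to_keep)
    if len(unique_ids) != len(ids_to_keep):
        raise ValueError("duplicate ids: the same page may have been read twice")
    if abs(len(unique_ids) - expected_count) > tolerance:
        raise ValueError(
            f"expected about {expected_count} ids, got {len(unique_ids)}"
        )
    return sorted(unique_ids)


our_ids = [f"video-{n}" for n in range(100)]  # only one page's worth

try:
    check_before_delete(our_ids, expected_count=500)
    aborted = False
except ValueError as error:
    aborted = True
    print(f"refusing to continue: {error}")
```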
        
       | AtNightWeCode wrote:
       | The conclusion should include that backup at separate locations
       | is key. Also, that the backups are tested and work. I worked with
       | clients that had everything from lightning strikes destroying
       | servers to ransomware to people making mistakes. No problem with
       | solid backups. There is a difference between a good process and
       | skill.
        
       | thisNeeds2BeSad wrote:
       | The only thing that I can remember helping against such actions
       | is the exponential need for confirmation by intent.
       | 
       | That means: if you delete one small file, you need one
       | confirmation; if you delete thousands, you need an intent stating
       | "I expect a thousand files to be deleted". The same goes for
       | size. So not an okay button, but instead a form allowing you to
       | enter the dimensions of the intended outcome: 100 files max, 1 GB
       | max deleted.
       | 
       | If the request goes over the intent, the system aborts.
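       | 
       | One way this could look in code. DeleteIntent, IntentExceeded and
       | delete_with_intent() are invented names for this sketch, and the
       | dict stands in for real storage:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeleteIntent:
    max_files: int
    max_bytes: int


class IntentExceeded(Exception):
    pass


def delete_with_intent(files: Dict[str, int], targets: List[str],
                       intent: DeleteIntent) -> int:
    """Delete targets from files (name -> size in bytes), or abort untouched."""
    total_bytes = sum(files[name] for name in targets)
    # Check the whole request against the declared intent before
    # deleting anything.
    if len(targets) > intent.max_files or total_bytes > intent.max_bytes:
        raise IntentExceeded(
            f"asked to delete {len(targets)} files / {total_bytes} bytes, "
            f"intent allows {intent.max_files} files / {intent.max_bytes} bytes"
        )
    for name in targets:
        del files[name]
    return len(targets)


GB = 10**9
storage = {f"clip-{n}.mp4": 50_000_000 for n in range(1000)}  # made-up sizes
intent = DeleteIntent(max_files=100, max_bytes=1 * GB)

try:
    delete_with_intent(storage, list(storage), intent)  # 1000 files: too many
except IntentExceeded as error:
    print(f"aborted: {error}")

deleted = delete_with_intent(storage, ["clip-0.mp4", "clip-1.mp4"], intent)
```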
        
       | dncornholio wrote:
       | What is the f doing in
       | 
       | url = f"https://api.ourservice.com/media?page{page}&step=100" ?
        
         | throwaway744678 wrote:
         | It's a Python f-string [0]. A way of formatting a string by
         | directly including a Python expression between curly braces.
         | 
         | [0] https://docs.python.org/3/tutorial/inputoutput.html#tut-f-
         | st...
        
         | qwertox wrote:
         | "f-strings", a (new) way to format strings.
        
         | jraph wrote:
         | f for format ("formatted string").
         | 
         | It does the same thing as
         | `https://api.ourservice.com/media?page${page}&step=100` [sic]
         | in Javascript, or
         | "https://api.ourservice.com/media?page$page&step=100" in Bash,
         | PHP, Perl or Groovy (and other languages). It opts you into
         | variable substitution / interpolation in the string literal.
         | 
         | In Python these string literals are called f-strings if you
         | want to look it up. They are defined in PEP 498 - Literal
         | String Interpolation [1] and available since Python 3.6.
         | 
         | [1] https://peps.python.org/pep-0498/
         | 
         | [sic] there probably would be a missing '=' in this url after
         | "?page"
        
         | fifticon wrote:
         | if it's python, it's the formatting/interpolation string
         | marker.
        
       | vjust wrote:
       | So much wisdom in these comments, people have different styles of
       | being careful, and each makes sense in a nuclear "go" situation
        
       | p0d wrote:
       | For many years I have had a private blog. I like to write but
       | realised 99% of us are not interesting to read. This is a young
       | guy processing his thoughts. Not "teaching" the rest of us as he
       | frames it. This should have stayed in-house and personal. The
       | company can then decide which clients, authorities to contact if
       | necessary. There is a book in all of us as they say. For most of
       | us it should stay there.
        
       | donalhunt wrote:
       | fwiw I would probably have turned to rclone.org for this. It
       | doesn't have support for vimeo out of the box but the Vimeo API
       | seems sane enough that it would be trivial to implement uploads
       | quickly.
       | 
       | Previously used rclone for doing massive transfers between cloud
       | providers using "cheap" on-demand servers which provide unlimited
       | data transfer (the public clouds make this very expensive).
        
       | ghoomketu wrote:
       | The more I read about vimeo the more I wonder what's up with
       | these guys.
       | 
       | Only recently they made some god-awful policy changes for
       | content creators(1), but it looks like they treat their
       | enterprise customers just the same.
       | 
       | Surely, there must be better alternatives for hosting videos than
       | being at the mercy of a company who couldn't care less about big
       | paying customers.
       | 
       | (1) https://www.theverge.com/2022/3/18/22985820/vimeo-
       | bandwidth-...
        
         | pfista wrote:
         | mux.com seems like a great alternative and is super developer
         | focused.
        
       | bbbush wrote:
       | scary. might as well just pay vimeo to restore data.
        
       | IYasha wrote:
       | So, apparently, vimeo has better support than youtube (not
       | informative, but at least they DO something). Duly noted.
        
       | aasasd wrote:
       | After having read about plenty of such cases over the years, I
       | have a persistent dread of pulling something like that myself, to
       | the point of being nervous with '*' in the terminal, and
       | generally checking everything twice. (And also have some kind of
       | mild horror-high from corporate snafu stories, weirdly
       | reminiscent of Ballard's 'Crash').
       | 
       | So: I never feed the data straight from the gathering script into
       | the modifying script, at least not in the first runs. Instead, I
       | dump the whole list of items into a file, count them in there,
       | gawk at them to see that they're right, and compare with the
       | source data by hand until I begin to annoy myself. Then I feed
       | that file to the second script.
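       | 
       | In script form this might look roughly like the following;
       | dump_ids() and load_ids() are hypothetical helpers and the ids
       | are placeholders:

```python
import tempfile
from pathlib import Path
from typing import List


def dump_ids(ids: List[str], path: Path) -> int:
    """Write one id per line and return how many were written."""
    path.write_text("".join(f"{item}\n" for item in ids), encoding="utf-8")
    return len(ids)


def load_ids(path: Path) -> List[str]:
    """Read the reviewed file back, skipping blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


gathered = ["vid-001", "vid-002", "vid-003"]  # stand-in for the gathering script

with tempfile.TemporaryDirectory() as workdir:
    review_file = Path(workdir) / "to_modify.txt"
    written = dump_ids(gathered, review_file)
    print(f"{written} ids written to {review_file}")
    # Pause here: open the file, count the lines, compare with the source.
    reviewed = load_ids(review_file)

# Only the reviewed list is handed to the modifying step.
```

       | The file is the only thing the second script ever sees, so
       | whatever was checked by hand is exactly what gets modified.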
        
       | Peleus wrote:
       | Under NDA but I'll give rough details of what's occurring while
       | also naming my client and disparaging them to the public.
       | 
       | Well that's a brave move...
        
         | searchableguy wrote:
         | They said they are a junior developer with not much experience.
         | I'm afraid they may not know what is and isn't covered under
         | NDA.
        
           | KingOfCoders wrote:
           | My tip would be: read what you sign.
        
             | thevinter wrote:
             | Just to clarify, my company is under an NDA and not
             | personally me. It also encompasses only the actual project
             | details so a post like this is legally compliant. (Not a
             | lawyer, might be wrong)
        
               | KingOfCoders wrote:
               | So you're not under an NDA as you wrote.
               | 
               | I don't know your position but I would assume a NDA is
               | part of your freelancer or employee contract.
        
               | mkr-hn wrote:
               | OP might at least want to consult with a contract lawyer
               | in Italy to make sure.
        
               | Closi wrote:
               | You likely have a confidentiality clause in your
               | contract.
               | 
               | If your company is under an NDA, your company will have
               | an obligation to ensure that _you_ also do not disclose
               | information.
               | 
               | Companies are mostly just collections of people, and an
               | NDA is mostly meant to stop people working on the project
               | from talking about the project.
        
               | bluehatbrit wrote:
               | In every contract I've ever signed, part of the NDA
               | clause with my employer is that I'm also bound by NDAs
               | my employer is bound by, so if the employer signs an NDA
               | with a customer, I would also be bound by that. It might
               | be worth checking your contract, otherwise having a
               | company sign an NDA doesn't hold much weight if their
               | staff are free to go around sharing the information
               | themselves.
        
         | [deleted]
        
       | photon-torpedo wrote:
       | Apart from all the advice on how to do such destructive
       | operations more safely, I think there's also a lesson to be
       | learned about communicating more actively:
       | 
       | 1. Vimeo responds to the original request with "will look into
       | it", then... nothing happens? This may depend on culture, but at
       | least from my experience in the UK, this is a very non-committal
       | response, and if you really want them to do something, you'll
       | need to chase them. Wait a few days and inquire if they have any
       | estimate for when it might get done, or if they need more
       | information. I find that the "looking into it" response is
       | sometimes used to gauge how important the request is to you.
       | 
       | 2. Once you go with your own solution, just drop a quick message
       | to Vimeo: "Hey, just wanted to let you know we've found our own
       | solution for this, and won't require your help any more. Sorry if
       | you've already committed any resources for this task. Have a nice
       | day, yada yada." This not just avoids what happened here, but is
       | also a courtesy to them.
        
       | mbostleman wrote:
       | Related: The change is fine, it's only one line.
        
       | amtamt wrote:
       | A computer lets you make more mistakes faster than any invention
       | in human history, with the possible exceptions of handguns and
       | tequila.
        
         | mindcrime wrote:
         | Imagine coding while drinking tequila...
        
       | johnklos wrote:
       | We can all poke at this person for doing things incorrectly, but
       | one has to wonder what mindset could lead to any programmer ever
       | thinking that:
       | 
       | 1) parsing a web page shouldn't be considered incredibly fraught
       | with problems
       | 2) that reloading web pages should be part of (1)
       | 3) that this should ever possibly be run without validating the
       | list of files that would be deleted
       | 
       | So forget the specifics. Where are people learning these things,
       | and what do we do to teach them better things?
        
         | bsder wrote:
         | "rm -rf" blowing your foot off is a Unix Rite of Passage(tm).
         | 
         | You _will_ do it at least once in your career. If you're old
         | enough you will do it twice. If you're really old, you get the
         | joy of doing it a third time.
         | 
         | The subtlety increases each time because you _do_ learn.
        
         | dboreham wrote:
         | College? Parents? In my experience it runs pretty deep so not
         | sure it can be easily trained out. This mindset is probably
         | quite useful in evolutionary terms: rush at the attacking bear
         | without thinking, for example.
        
           | plonk wrote:
           | > rush at the attacking bear without thinking, for example
           | 
           | Would that work? I don't see a bear backing down and I don't
           | see the human winning either.
        
         | qayxc wrote:
         | > Where are people learning these things, and what do we do to
         | teach them better things?
         | 
         | Learn to learn and learn to work carefully. It starts in school
         | and should be part of a proper college/university education or
         | vocational training.
         | 
         | There are several ways of learning the specifics: by experience
         | on the job, which can be hard if mistakes can get you fired; or
         | by putting in the work in your free time.
         | 
         | If your job is to work with certain web frameworks and you're
         | not very experienced, ask senior devs to assist/review before
         | going live with critical changes. Alternatively,
         | practice at home. Unpopular, but you need to get experience
         | from somewhere. OSS projects are a great way to do that - be
         | that by creating your own or by contributing to an existing
         | one.
        
         | dncornholio wrote:
         | Some mistakes can only be learned by making them. Sometimes you
         | can tell someone something a hundred times and they won't learn
         | until they experience it.
         | 
         | The point is not to prevent these mistakes, but to keep the
         | consequences low.
         | 
         | Have backups, have version control, etc.
        
           | ufmace wrote:
           | True, and worth remembering why. Most of us are constantly
           | getting warned about the dire potential consequences of huge
           | numbers of things, most of which are either massively
           | unlikely to ever happen or not actually that bad, or both.
           | It's very difficult to tell which of the things we get warned
           | about are actually high risk until something bites us.
        
         | Mo3 wrote:
         | Seriously.. also, looking at these code snippets...
         | 
         | If someone delivers code that looks like that, especially if
         | intended for a production system, I'm firing immediately.
         | 
         | It's a miracle nothing has happened sooner.
        
           | ziddoap wrote:
           | From the article:
           | 
           | > _I'm a Junior Developer with less than one year of actual
           | experience._
           | 
           | > _The bad news is that this was on Friday, and we needed to
           | have the videos back up at most for Tuesday morning._
           | 
           | You say:
           | 
           | > _If someone delivers code that looks like that, especially
           | if intended for a production system, I'm firing immediately_
           | 
           | Fire immediately? What a miserable sounding place to work.
        
             | Mo3 wrote:
             | In this case - seeing how they let them have direct access
             | to production - I agree on the miserable sounding place to
             | work and repeat myself -
             | 
             | It's a miracle nothing happened sooner
        
               | ziddoap wrote:
               | I was referring to your workplace.
        
               | Mo3 wrote:
               | At least we don't let junior developers with close to
               | zero experience anywhere near production..
               | 
               | I didn't quite read the part about his experience in the
               | article, I agree firing over that wouldn't be fair, but
               | that just raises other questions.
        
       | DeathArrow wrote:
       | There's a thing called unit tests.
        
       | muglug wrote:
       | The root of this particular issue was Vimeo's failure to do this
       | migration for their customers.
       | 
       | Vimeo OTT has a codebase written in Rails, whereas the main Vimeo
       | application is written in PHP. At the time Vimeo acquired Vimeo
       | OTT's codebase, the Vimeo OTT codebase was small -- around 10,000
       | lines of Ruby. Rewriting that codebase inside the Vimeo PHP
       | application would have been a tough technical challenge for the
       | all-Ruby team, and they'd have likely lost some people along the
       | way and missed out on some content deals, so they decided instead
       | to maintain two separate codebases and two separate login
       | systems.
       | 
       | The video-playback and video-storage infra has since been
       | unified, but all the business logic is still siloed.
        
         | conductr wrote:
         | He wasn't asking them to refactor their internal code bases.
         | But they should be able to whip up the 20 lines of code needed
         | to do this between APIs (or just directly on their servers).
         | Essentially what the author was trying to do when he screwed up.
         | For the author this was disposable code, for Vimeo this would
         | have been a reusable utility.
         | 
         | I know how these things happen. Support ticket queues and all.
         | And while I don't fully know the difference in cost, I would
         | assume a customer upgrading to an Enterprise plan would get a
         | better support experience.
         | 
         | Whoever within the author's company negotiated the upgrade to
         | Enterprise (or didn't) and failed to embed some agreement
         | around OTT to Enterprise transition assistance was the one who
         | made the first mistake.
        
         | macspoofing wrote:
         | >The root of this particular issue was Vimeo's failure to do
         | this migration for their customers.
         | 
         | Yes and No. At the end of the day, you as a business have to
         | insulate yourself from your infrastructure provider.
        
           | notyourday wrote:
           | Vimeo is the only infrastructure provider providing that
           | service. It is impossible to insulate a business from it.
        
         | chernevik wrote:
         | Per the post, Vimeo DID do it -- without telling the customer!
         | And then wouldn't help uncluster the situation.
        
       | macspoofing wrote:
       | > but at the time the code seemed completely correct to me
       | 
       | I venture this kind of (misplaced) over-confidence is not
       | atypical of many junior developers. As someone with a few years
       | under my belt, I don't care how sure I was of the code I wrote
       | that deletes important data, I would have gone through the code
       | over and over again, and at least run a simulation (by maybe
       | logging the generated delete urls for manual verification).
       | 
       | It's a rite of passage and we all went through something like
       | this. It's how you learn and grow.
       | 
       | >It also should probably teach something to Vimeo
       | 
       | No. Even if Vimeo could have made things better, it's still your
       | fault. You have to take responsibility for your business. At the
       | end of the day, if this causes the closure of your company, Vimeo
       | is still fine.
        
       | wumms wrote:
       | Not completely off topic (as one of my scripts recently deleted
       | files whose dates were off by one):
       | 
       | > Fri May 06 2022
       | 
       | > I'm currently working [...] in Italy
        
       | masswerk wrote:
       | Controversial opinion: And this is why block syntax by white
       | space is not for production.
        
         | krit_dms wrote:
         | This is hardly a whitespace issue
        
           | masswerk wrote:
           | Ah, yes, I just noticed the difference in indentation. In
           | actuality, the error is about the mental model of variable
           | states.
        
       | havkom wrote:
       | The company was lucky to have someone like you that could
       | actually sort out real problems efficiently. I would bring up
       | this story when negotiating for a raise.
        
       | davbryn1 wrote:
       | "What does this teach us? Well, it teaches me to do more diverse
       | tests when doing destructive operations. It also should probably
       | teach something to Vimeo and to my contractor but I doubt it will
       | (and yes, the upload for some reason is still manual to this day.
       | Go figure!)"
       | 
       | So you wrote bad code, didn't test it properly, ran it on
       | production on the Friday before a release and are blaming Vimeo
       | and [name redacted]?
       | 
       | And your resolution was yet another cobbled together script that
       | you probably didn't test?
       | 
       | This isn't a great article to have attached your name to
        
         | gala8y wrote:
         | Not to mention that he _deleted_, but not _lost_ videos.
         | Nothing to see here.
        
         | oneepic wrote:
         | Earlier in the article, the author does call out that it's bad
         | code, so he's not entirely blaming these companies. Anyway: You
         | should not be afraid of thinking about what _each_ party could
         | have done better. Not just yourself, but other people too. When
         | I look back on times where I only blamed myself for prod
         | issues, it was less of a learning experience, and more focused
         | on beating myself up for no good reason. That approach shows
         | that I'm afraid of the consequences, and it's an effective way
         | to feel isolated from the team instead of improving.
        
         | nickkell wrote:
         | Better to do it before the release than afterwards. I'm
         | assuming this way nobody noticed the issue.
         | 
         | Also, would you rather everyone only ever posted about all the
         | times they were successful?
        
         | chopin wrote:
         | I'd hire this guy if only for being this frank about his
         | mistake. He owned it and that is what I would look for.
         | 
         | After deletion, what should he have done? Postpone the go-live?
         | That's often not a cost-effective option. As for a risk
         | analysis, the worst that could happen was deletion of the
         | remaining videos. I don't think that makes a big difference
         | in this situation. And to do the right thing when you are in
         | a hurry, you have to have the infrastructure in place. I doubt
         | that's the case for a 10-person shop.
        
           | GordonS wrote:
           | Aye, this is how you learn and make sure it doesn't happen
           | again.
           | 
           | I did a similar thing ~20 years ago when I first started my
           | career, accidentally deleting a production database because I
           | thought I was working on the test database.
           | 
           | I owned it, learned lessons from it, and it's never happened
           | again.
        
           | davbryn1 wrote:
           | Owning the mistake would be fine if he did that - he didn't.
           | He blamed the company he was contracting for. That's a big no
           | from me
        
             | esquivalience wrote:
             | It's as if we read different articles. He literally writes
             | that he made "A series of mistakes that could've probably
             | been easily prevented."
        
             | thevinter wrote:
             | I'm sorry if it came off like that. The mistake in this
             | case was completely mine (bad code and bad testing). The
             | detour on the other two companies was mostly because this
             | way of deleting/recovering stuff should've probably been
             | avoided in the first place, other than that I'm absolutely
             | not blaming anyone else!
        
               | davbryn1 wrote:
               | Don't worry about all that - there isn't a developer
               | worth their salt that hasn't made a mistake. But I'd
               | consider having this blog post and HN post retracted
               | purely for future internet checks. It isn't a reflection
               | on you, and your honesty is fantastic. But there is a lot
               | to be said about using a pseudonym when it comes this
               | close to your employers
        
               | desarun wrote:
               | I'd probably make your github profile private for a while
               | as well. Or at least removing your real name from it.
        
               | [deleted]
        
           | malexbone wrote:
           | Agree 100%. Acknowledged mistake, moved forward to find a
           | solution. Reflected on lessons learned. Shared valuable
           | lesson.
           | 
           | To me this indicates intelligence, competence, integrity,
           | grit and generosity. Technical proficiency is much easier to
           | come by than integrity, grit and generosity. I would trust
           | the author to deliver on commitments.
        
           | honksillet wrote:
           | Agreed. But I'd also fire him from this job.
        
             | Beltiras wrote:
             | For having got into a sticky situation and out of it?
        
             | [deleted]
        
             | SparkyMcUnicorn wrote:
             | "Recently, I was asked if I was going to fire an employee
             | who made a mistake that cost the company $600,000. No, I
             | replied, I just spent $600,000 training him. Why would I
             | want somebody to hire his experience?"
             | 
             | -- Thomas J. Watson
        
             | yohannparis wrote:
             | Doesn't make sense. Their employer literally paid them to
             | learn from their mistake.
             | 
             | Now, you think they should be fired? So that another
             | employer reaps the benefits of that learning experience.
        
         | kwertyoowiyop wrote:
         | Will every developer who has never checked in bad code on
         | Friday, or accidentally deleted the wrong data, please raise
         | their hand?
         | 
         | 'Judgment comes from experience, and experience comes from poor
         | judgment.'
         | 
         | :-)
        
         | dang wrote:
         | (Since the OP redacted the company name from the post, I've
         | done the same in your comment here. I hope that's ok.)
         | 
         | (We do this sort of thing to protect users, usually as the
         | result of an emailed request, and you can tell when we've done
         | it because of the word 'redacted' in square brackets.)
        
         | jasonlotito wrote:
         | > This isn't a great article to have attached your name to
         | 
         | A million times better than your comment.
        
           | davbryn1 wrote:
           | All I did was give advice. If you don't like it it's fine.
        
         | smokey_circles wrote:
         | Oof, we wouldn't work well together. Very rarely is someone
         | good enough to be this obnoxious.
        
           | davbryn1 wrote:
           | I very much doubt you would ever work with or for me.
        
         | [deleted]
        
         | breakfastduck wrote:
         | Vimeo completed a major migration of videos between accounts
         | with no confirmation or communication before committing it, then
         | refused to reverse the change. Hardly the best service.
         | 
         | The article hardly comes across as 'blaming' them for the core
         | issue but they were definitely not helpful.
        
       | wruza wrote:
       | Code without constant logging of "utc [who] does what exactly" is
       | a no-go for me for a long time. Also, if you have to be
       | destructive, replace the <rm/sell/halt> with log() for at least
       | one time (aka --verbose --dry-run) and check your expectations.
       | One-shot scripts like this are screaming disaster.
       | 
       | (The problematic line lacks the closing ", probably a typo? I
       | thought it closed in an unexpected location)
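       | 
       | A minimal Python sketch of that advice, assuming the real
       | deletion is wrapped in some callable: every action is logged as
       | "utc [who] does what", and nothing destructive runs unless
       | dry-run is explicitly switched off. The names `purge`,
       | `delete_fn` and `who` are hypothetical, not from the article.

```python
import logging
from datetime import datetime, timezone
from typing import Callable, Iterable, List

logger = logging.getLogger("purge")


def purge(video_ids: Iterable[str],
          delete_fn: Callable[[str], None],
          who: str,
          dry_run: bool = True) -> List[str]:
    """Log every action with a UTC timestamp and the acting user.

    delete_fn (a hypothetical wrapper around the real API call) is only
    invoked when dry_run is False. The log lines are also returned so
    they can be reviewed before the real run.
    """
    lines = []
    for video_id in video_ids:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        action = "WOULD DELETE" if dry_run else "DELETE"
        line = f"{stamp} [{who}] {action} video {video_id}"
        logger.info(line)
        lines.append(line)
        if not dry_run:
            delete_fn(video_id)
    return lines
```

       | Defaulting `dry_run` to True means a forgotten flag costs a
       | wasted run rather than the data.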
        
       | ge96 wrote:
       | The product I work on, I can watch the events occur afterwards
       | (videos of people using it) and it's so embarrassing watching it
       | fail. The wasted time. Ahh... I've gotten better at checking deps
       | and running a full automated E2E test every time new code is
       | deployed (before/after diff envs).
       | 
       | Still things happen. Hopefully you have a large enough client
       | base where some bad experience doesn't define the whole thing.
        
       | BillyTheKing wrote:
       | For larger 'live' production changes I've now started to rely on
       | generative programming. I've got one script in some 'normal'
       | programming language like javascript, or python, which in turn
       | generates a script that contains a list of curl or other cli
       | commands which do the actual deletion, modification, addition,
       | etc.
       | 
       | This allows me to run a small sub-set of commands and test those
       | under a live-environment before running all commands at once. In
       | addition, this also functions as a complete log of what has been
       | changed manually in production.
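       | 
       | A rough sketch of the idea in Python, under stated assumptions:
       | the endpoint `https://api.example.com/videos` and the function
       | name `build_delete_script` are made up for illustration, and
       | authentication is left out entirely.

```python
import shlex
from typing import Iterable, List


def build_delete_script(video_ids: Iterable[str],
                        base_url: str = "https://api.example.com/videos") -> str:
    """Return a shell script with one curl DELETE command per video ID.

    base_url is a placeholder endpoint, not a real API.
    """
    lines: List[str] = ["#!/bin/sh", "set -eu"]
    for video_id in video_ids:
        url = f"{base_url}/{video_id}"
        # shlex.quote keeps odd characters in an ID away from the shell
        lines.append(f"curl --fail -X DELETE {shlex.quote(url)}")
    return "\n".join(lines) + "\n"
```

       | The generated file can be read line by line, a handful of
       | commands can be run first as a trial, and the file itself stays
       | around as the record of what was changed.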
        
       | RankingMember wrote:
       | I'm impressed you went with an automated solution (Playwright)
       | for 500 videos after all that, considering they could be cross-
       | loaded from Google Drive almost instantaneously. I'm glad it
       | worked, but coding around a screw-up under the gun seems like a
       | high-risk operation compared to spending 4 hours doing the task
       | manually (albeit being super bored the whole time), but with the
       | benefit of knowing it's being done correctly instead of hurriedly
       | writing a script to potentially do something else wrong very
       | efficiently and dig your hole deeper.
        
         | leokennis wrote:
         | Actually I was surprised reading that the person wrote a script
         | to delete 900 videos.
         | 
         | If you need to do it once, it's probably 2-3 hours of work?
         | That is identifying a duplicate video and then clicking the
         | button(s) to delete it once every 20 seconds.
         | 
         | Reminds me of https://xkcd.com/1205/
        
         | bruhbruhbruh wrote:
         | +1 to this. After the few major screw-ups I've caused at work,
         | my self-confidence in my coding ability was rocked, and I tended
         | to react by erring towards manual cleanup, rather than coding
         | some scalable solution for fixing the issues
        
       | alkaloid wrote:
       | Does anyone else get that deep, dark, disturbing feeling in their
       | gut when they know they have done something bad like this?
       | 
       | This is why I use so many print statements and comment out
       | destructive actions! Lots of experience with these feelings!
        
       | arein3 wrote:
       | You can automate using puppeteer or selenium
        
         | dsego wrote:
         | The author used Playwright in the end to automate uploads.
         | Using e2e tools for automating tasks is clever, I'm not sure I
         | would've thought of it.
        
           | chopin wrote:
           | It's clever, but also brittle. And might have disastrous
           | error conditions (like hitting "Delete" instead of "Continue"
           | if the wrong UI part has focus).
        
       | andreagrandi wrote:
       | It should really be something like: "a flaw in our system allowed
       | me to delete 7 TB of videos". Not entirely your fault.
        
         | mrkwse wrote:
         | System and/or development processes
        
       | desarun wrote:
       | Oh dude, we've all been there.
       | 
       | 9 years ago I was working for a major broadcasting company in the
       | arse end of London as a junior dev, building one of their Android
       | apps.
       | 
       | We'd roll features out months before & enable them with feature
       | flags via a json file we'd manually push to a prod server at a
       | later date.
       | 
       | We'd just built a huge new feature letting you request content to
       | be downloaded to your set top box remotely & it had a 250k
       | marketing campaign to go along with the launch.
       | 
       | Senior dev trusted me with prod deployment rights.
       | 
       | I pushed the wrong json config to prod, launching the feature
       | weeks before the marketing campaign.
       | 
       | Thank god I was a junior perm, that was definitely a firing
       | offence.
        
         | hayd wrote:
         | > Senior dev trusted me with prod deployment rights.
         | 
         | That part's crazy! If you think it was a firing offence
         | wouldn't they've been fired? (I don't think it is, but
         | obviously requires system changes/explanation.)
        
       | BurningPenguin wrote:
       | I accidentally deleted a printer from the printserver by using a
       | python script. The docs weren't exactly clear, so I thought it
       | would only remove the local printer connection. After reading
       | this post I feel better now. My fuckup wasn't that bad in
       | comparison. :)
        
       | furyofantares wrote:
       | Great post and great attitude.
       | 
       | I think I would reflect on why this is a script to begin with.
       | It's run once and with only 500 items could be done manually,
       | though 500 is certainly a bit much.
       | 
       | But it's not a massive time saver; the point of the script should
       | be almost entirely to increase accuracy. I think I would write
       | one script to generate the list of videos to delete; that's the
       | part that's actually difficult, and a human can then verify the
       | list. I would probably just delete them by hand after that, but
       | if I really wanted a script for that part too, it would be a
       | separate script that uses a list that has been vetted by a human
       | even if initially created by the first script.
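       | 
       | The two-script split could look roughly like this in Python. It
       | is only a sketch: the CSV columns (`id`, `title`, `approved`)
       | and the `delete_fn` callable are assumptions for illustration.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def write_candidates(videos: Iterable[Dict[str, str]], out: io.TextIOBase) -> None:
    """Script 1: dump deletion candidates to CSV with an empty 'approved' column."""
    writer = csv.writer(out)
    writer.writerow(["id", "title", "approved"])
    for video in videos:
        writer.writerow([video["id"], video["title"], ""])


def delete_approved(reviewed: io.TextIOBase,
                    delete_fn: Callable[[str], None]) -> List[str]:
    """Script 2: delete only the rows a human marked 'yes' in the reviewed file."""
    deleted = []
    for row in csv.DictReader(reviewed):
        if row["approved"].strip().lower() == "yes":
            delete_fn(row["id"])  # hypothetical wrapper around the real API call
            deleted.append(row["id"])
    return deleted
```

       | Nothing is deleted unless a person has opened the file and typed
       | "yes" next to it, so a bug in the first script shows up as a
       | strange-looking list rather than as missing videos.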
        
       | Reason077 wrote:
       | > _" What does this teach us? Well, it teaches me to do more
       | diverse tests when doing destructive operations."_
       | 
       | I think it also teaches us that adversity sometimes leads to
       | better solutions. I love that the OP made a hacky script that did
       | in 4 hours what a guy was paid to do manually over several
       | months!
        
       | KingOfCoders wrote:
       | "I'm under an NDA"
       | 
       | Don't write a blog post.
        
       | franciscop wrote:
       | This is a great technical write up, I'd love to hear the human
       | side of this story as well! When did you tell the higher ups that
       | you deleted production? Was no one more senior on call to try to
       | fix it? Did they want you to learn how to fix it? Or were you the
       | most senior responsible for this whole area? Or did they not
       | know?
        
         | thevinter wrote:
         | The first part of my write up slightly explains it but the
         | point is that HN is the top 1%. In my current company we have
         | 10 developers, most of them without a technical degree. They
         | know how to do what they've been doing for the past 10 years
         | but (as with most small companies here in Italy) people don't
         | know what best practices are used in the industry, what a
         | pipeline is or what a dry-run is (I learned about it today
         | myself!).
         | 
         | What happened is that no one knew how to react and I was
         | probably the best suited for it, we don't really have seniority
         | in office.
         | 
         | That said when I deleted the videos I immediately told my boss.
         | He was kind of scared but his reaction was mostly "Well, now we
         | have to re-upload them immediately, find a way. The people that
         | uploaded them once won't be doing it twice". I was basically
         | left on my own to find a solution (which I luckily did).
         | 
         | Please note that I'm in no way blaming my company or accusing
         | it of something, this is the standard knowledge base and way of
         | dealing with things in many places, contrary to what working in
         | big tech or reading HN might make you believe!
        
           | franciscop wrote:
           | Thanks for the explanation, that makes a lot of sense!
           | 
           | > "HN is the top 1%" + "this is the standard knowledge base
           | and way of dealing with things in many places, contrary to
           | what working in big tech or reading HN might make you
           | believe!"
           | 
           | I'm in fact from Spain and now live in Japan, and I believe
           | the practices in Spain would be as bad as Italy, and in Japan
           | they are def worse (great at hardware, horrible at software),
           | so I do understand a lot of what you are saying. FWIW, in
           | Spain I've seen whole dev teams composed only of interns!
           | 
           | > "we landed a big contract for one of the biggest gym
           | companies in Italy, the UK and South Africa" + "we don't
           | really have seniority in office"
           | 
           | Maybe now that it seems like you have the budget, it's a good
           | time to go to management and suggest hiring some senior devs
           | who can mentor the rest into learning best practices? You can
           | sell it like a reinvestment in the company to management if
           | they want to take it as pure profit. If Italy is like Spain,
           | many devs won't really even want to learn these things, but
           | some will and then those will become seniors at some point.
        
       | Sirikon wrote:
       | Everyone makes mistakes, juniors and seniors alike, but I
       | think you have the right mindset and problem-solving skills that
       | will make you thrive :)
        
       | ricardobayes wrote:
       | Any process that makes a junior directly access prod
       | codebase/database is flawed. No matter how small of a company you
       | are, you can set up a proper CI/CD pipeline.
        
         | thevinter wrote:
         | 90% of IT companies in Italy don't even know what a CI/CD
         | pipeline is. That said I don't think it's something we could've
         | integrated in our pipeline as it's an error that originated
         | from an external service!
        
       | Fritsdehacker wrote:
       | This is why you have backups. Good on you to have them!
       | 
       | When I just started as a junior dev at a small company I made the
       | classic mistake of emptying the prod db instead of my local dev
       | db. This was a small and in hindsight insignificant project. But
       | Google was our customer, so it didn't feel insignificant at the
       | time.
       | 
       | In this case my inexperience was partly my savior. All the data
       | was inputted by people via a web form. Normally you're supposed
       | to use POST to submit a form. But I was quite clueless at the
       | time, so I had used GET. This meant all requests were still in
       | the Apache logs. I could simply replay all requests.
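       | 
       | For illustration only, recovering those submissions might look
       | something like the Python below. It assumes the access log is in
       | Apache's Common or Combined Log Format and that the form lived
       | at a single path; `recover_submissions` and `/submit` are
       | made-up names.

```python
import re
from typing import Dict, Iterable, List
from urllib.parse import parse_qs, urlsplit

# Matches the request and status fields of a log line, e.g.
#   ... "GET /submit?name=Ada HTTP/1.1" 200 ...
REQUEST_RE = re.compile(r'"GET (?P<target>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def recover_submissions(log_lines: Iterable[str], form_path: str) -> List[Dict[str, str]]:
    """Rebuild submitted form fields from successful GET requests to form_path."""
    submissions = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        # Skip non-GET lines and requests that did not succeed (non-2xx)
        if not match or not match.group("status").startswith("2"):
            continue
        url = urlsplit(match.group("target"))
        if url.path != form_path or not url.query:
            continue
        # parse_qs returns a list per field; keep the first value of each
        fields = {key: values[0] for key, values in parse_qs(url.query).items()}
        submissions.append(fields)
    return submissions
```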
       | 
       | I still feel my heart pounding when I think about the moment I
       | realized what had happened. I was really relieved when everything
       | was back!
       | 
       | What I learned from this incident:
       | 
       | - make automated backups
       | 
       | - no access to prod db from anywhere but prod
        
         | cassandratt wrote:
         | Yea, I've wiped out an entire government's form library once.
         | Backups are a career saver.
        
       | NikolaNovak wrote:
       | Honestly, this is positively representative of any junior
       | developer with comparable experience. Depending on their
       | background and how much production work they had, there's an
       | overwhelming sense of eagerness and enthusiasm. Quick to script
       | and perhaps a bit too quick to execute.
       | 
       | A friendly team will harness that enthusiasm and tame the
       | quickness / encourage respect for production. We all made a
       | massive doo doo and it's how you proceed that'll define your
       | career.
        
       | RcouF1uZ4gsC wrote:
       | This is one of those times that even if you don't use a fully
       | functional language, trying to make as much of your program logic
       | pure functions would be helpful.
       | 
       | It also makes it more testable. Instead of putting the delete
       | call right in the loop, split it into four functions.
       | 
       |     function getAllVimeoVideos()
       |     function getAllDbVideos()
       |     function getVideosToDelete(vimeo_videos, db_videos)
       |     function deleteVideos(videos_to_delete)
       | 
       | Your core logic lives in getVideosToDelete which is simply a set
       | difference.
       | 
       | Given that there are only a few hundred videos, it is easy to run
       | the getter functions above and quickly verify they are returning
       | what you expect.
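       | 
       | As a sketch in Python, the pure core could be as small as the
       | function below. It assumes videos are identified by plain string
       | IDs and that "to delete" means present on Vimeo but not
       | referenced in the database; both are assumptions, not details
       | from the article.

```python
from typing import Iterable, Set


def get_videos_to_delete(vimeo_videos: Iterable[str],
                         db_videos: Iterable[str]) -> Set[str]:
    """Pure function: IDs that exist on Vimeo but are not in the database.

    No network calls and no side effects, so it can be tested with
    hand-written lists before anything is actually deleted.
    """
    return set(vimeo_videos) - set(db_videos)
```

       | Because the function only takes and returns plain data, a test
       | with three or four IDs is enough to check which way round the
       | difference goes.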
        
         | acutis_fan wrote:
         | Yes that's fun. A
         | 
         |     List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
         | 
         | function is the first time I thought about time complexity in
         | my job.
         | 
         | Say Foo and Bar have fields in common, such that you can say a
         | Foo object "equals" or "matches to" a Bar object, like if they
         | have name and dateOfBirth fields or something else that are the
         | same (nothing like a common ID between the two). Now say there
         | are some other fields too, like amountSpentThisYearOnDogFood
         | that you know is always accurate for Bars, but might be out of
         | date for Foos. How do you get the list of all the Foos to
         | update?
         | 
         | Initially I did the nested for loop solution that's like
         | 
         |     List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
         |     {
         |         List<Foo> returnList = new List<Foo>();
         |         foreach (var foo in foos)
         |         {
         |             foreach (var bar in bars)
         |             {
         |                 // check if "equal" or "matching" based on some criteria
         |                 // if equal, update foo dog food expenditure with bar
         |                 // dog food expenditure, add to returnList, and break
         |             }
         |         }
         |         return returnList;
         |     }
         | 
         | but that's O(n^2) right.
         | 
         | The solution with a Dictionary is obviously better. All you
         | need to ensure is that you have a method for both the Foo and
         | Bar classes that will produce the equivalent hash for both, if
         | they would be considered equal or matching by whatever criteria
         | you are using.
         | 
         | So you could have something like
         | 
         |     int GetHashOfFoo(Foo foo)
         |     {
         |         string firstName = foo.FirstName;
         |         string lastName = foo.LastName;
         |         DateTime dob = foo.Dob;
         | 
         |         return (firstName, lastName, dob).GetHashCode(); // convenient c# method
         |     }
         | 
         |     int GetHashOfBar(Bar bar)
         |     {
         |         string firstName = bar.FirstName;
         |         string lastName = bar.LastName;
         |         DateTime dob = bar.Dob;
         | 
         |         return (firstName, lastName, dob).GetHashCode();
         |     }
         | 
         | These two functions will return the same value if those fields
         | are the same. So then you can do something like
         | 
         |     List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
         |     {
         |         List<Foo> returnList = new List<Foo>();
         |         Dictionary<int, Bar> barsByHash =
         |             new Dictionary<int, Bar>(bars.Count);
         | 
         |         foreach (var bar in bars)
         |         {
         |             int barHash = GetHashOfBar(bar);
         |             barsByHash[barHash] = bar;
         |         }
         | 
         |         foreach (var foo in foos)
         |         {
         |             int fooHash = GetHashOfFoo(foo);
         |             if (barsByHash.ContainsKey(fooHash))
         |             {
         |                 returnList.Add(foo.CopyWith(
         |                     dogFoodExpenditure: barsByHash[fooHash].DogFoodExpenditure));
         |             }
         |         }
         | 
         |         return returnList;
         |     }
         | 
         | Which is faster because you only go through the bars list once
         | to build the dictionary and then through the foos list once:
         | roughly O(n + m) instead of O(n * m).
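         | 
         | One caveat with keying on GetHashCode(): two different people
         | can produce the same int, and the dictionary would then treat
         | them as a match. A minimal sketch of a variant that keys on the
         | tuple itself, so the dictionary compares the fields as well as
         | the hash. The record types and the FooMatcher name here are
         | hypothetical stand-ins, not the real classes:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record types standing in for the Foo and Bar classes above.
public record Foo(string FirstName, string LastName, DateTime Dob, decimal DogFoodExpenditure);
public record Bar(string FirstName, string LastName, DateTime Dob, decimal DogFoodExpenditure);

public static class FooMatcher
{
    public static List<Foo> GetFoosToUpdate(List<Foo> foos, List<Bar> bars)
    {
        // Key on the tuple itself: the dictionary hashes it and also checks
        // field equality, so a hash collision cannot pair up different people.
        var barsByKey = new Dictionary<(string, string, DateTime), Bar>(bars.Count);
        foreach (var bar in bars)
        {
            // If several bars share a key, the last one wins.
            barsByKey[(bar.FirstName, bar.LastName, bar.Dob)] = bar;
        }

        var result = new List<Foo>();
        foreach (var foo in foos)
        {
            if (barsByKey.TryGetValue((foo.FirstName, foo.LastName, foo.Dob), out var bar))
            {
                // Copy the foo with the expenditure taken from the matching bar.
                result.Add(foo with { DogFoodExpenditure = bar.DogFoodExpenditure });
            }
        }
        return result;
    }
}
```

         | TryGetValue also does the lookup once instead of ContainsKey
         | followed by the indexer.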
         | 
         | I actually messed up in a similar way to OP with this, except
         | with undesired additions instead of undesired deletions.
         | 
         | You can think of it as having two endpoints, both expecting a
         | .csv with rows being the things you were
         | updating/changing/deleting.
         | 
         | The problem was, there was a column to indicate (with a
         | character) whether the row was an edit, an addition, or a
         | deletion, but only for one of these endpoints. The other
         | endpoint only supported additions, but I assumed changes and
         | deletions were also options there (unwisely thinking that the
         | second .csv would have the same options as the first). That's
         | how we accidentally put in over 100 additions that should have
         | been changes, and they had to be manually deleted. Luckily I
         | had a list of all the mistaken additions.
        
         | tomhallett wrote:
         | This was going to be my exact recommendation. By "separating
         | the concerns", you make it easier on pretty much every
         | dimension: testing in unit tests, doing a dry run in
         | production, ability to read the code (you and code reviews),
         | and in some cases your code will be written in a more
         | functional way reducing variable scoping issues.
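         | 
         | A rough sketch of what that separation could look like: one
         | pure function decides what to delete, and a second one performs
         | the deletion with a dry-run switch. The Video record, the
         | IsReferenced flag, and the deleteById callback are all
         | hypothetical, not taken from OP's script:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: a video plus a flag saying whether anything still uses it.
public record Video(string Id, bool IsReferenced);

public static class Cleanup
{
    // Pure selection step: no side effects, so it is easy to unit test.
    public static List<Video> SelectVideosToDelete(IEnumerable<Video> videos) =>
        videos.Where(v => !v.IsReferenced).ToList();

    // Side-effect step: only calls deleteById when dryRun is false.
    // Returns how many videos were (or would have been) deleted.
    public static int DeleteVideos(
        IEnumerable<Video> toDelete, Action<string> deleteById, bool dryRun)
    {
        int count = 0;
        foreach (var video in toDelete)
        {
            if (dryRun)
            {
                Console.WriteLine($"[dry run] would delete {video.Id}");
            }
            else
            {
                deleteById(video.Id);
            }
            count++;
        }
        return count;
    }
}
```

         | Running it once with dryRun set to true and reading the printed
         | list is the cheap check before letting it touch anything.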
        
       | DeathArrow wrote:
       | This wouldn't be an issue if providers like Vimeo would soft
       | delete and only hard delete the items after a period of time,
       | allowing recovery in between.
       | 
       | Everywhere I have to implement a delete operation, I never hard
       | delete data on first call.
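       | 
       | A minimal in-memory sketch of that pattern, assuming a retention
       | window after which a separate purge does the hard delete. The
       | SoftDeleteStore class and its method names are made up for
       | illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SoftDeleteStore
{
    // Maps item id to the time it was soft deleted (null means still live).
    private readonly Dictionary<string, DateTime?> _deletedAt = new();
    private readonly TimeSpan _retention;

    public SoftDeleteStore(TimeSpan retention) => _retention = retention;

    public void Add(string id) => _deletedAt[id] = null;

    public bool IsLive(string id) =>
        _deletedAt.TryGetValue(id, out var deletedAt) && deletedAt == null;

    // First call only marks the item; nothing is destroyed yet.
    public bool Delete(string id, DateTime now)
    {
        if (!IsLive(id)) return false;
        _deletedAt[id] = now;
        return true;
    }

    // Undo a soft delete while the item is still inside the retention window.
    public bool Restore(string id)
    {
        if (!_deletedAt.TryGetValue(id, out var deletedAt) || deletedAt == null) return false;
        _deletedAt[id] = null;
        return true;
    }

    // Hard delete everything that was soft deleted at least _retention ago.
    public int Purge(DateTime now)
    {
        var expired = _deletedAt
            .Where(kv => kv.Value != null && now - kv.Value.Value >= _retention)
            .Select(kv => kv.Key)
            .ToList();
        foreach (var id in expired) _deletedAt.Remove(id);
        return expired.Count;
    }
}
```

       | Passing the current time in as a parameter keeps the retention
       | logic testable without waiting for real days to pass.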
        
       | kirillzubovsky wrote:
       | Mistakes happen. Kudos to the author on taking it as a learning
       | opportunity. I am friends with a lot of smart devs, and many of
       | them have dropped a production db at least once, and if not then,
       | then accidentally emailed 10k people ...etc. It happens. Work to
       | avoid it, but plan for what to do when it inevitably happens.
       | ¯\_(ツ)_/¯
        
       ___________________________________________________________________
       (page generated 2022-05-05 23:00 UTC)