[HN Gopher] What we learned after I deleted the main production ...
       ___________________________________________________________________
        
       What we learned after I deleted the main production database by
       mistake
        
       Author : fernandopess1
       Score  : 36 points
       Date   : 2022-09-19 20:19 UTC (2 hours ago)
        
 (HTM) web link (medium.com)
 (TXT) w3m dump (medium.com)
        
       | sulam wrote:
       | Being one click away from a DELETE vs a GET sounds like a serious
       | foot-gun that I would wrap a check around. "Are you sure? This
       | operation will delete 17M entries."
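        | 
        | Even a thin wrapper would go a long way. A minimal sketch of the
        | kind of check I mean (the "prod" substring test and the prompt
        | are purely illustrative, not anyone's real setup):
        | 
        |     import requests
        | 
        |     DESTRUCTIVE = {"DELETE", "PUT", "PATCH", "POST"}
        | 
        |     def guarded_request(method, url, **kwargs):
        |         # Refuse destructive verbs against anything that looks
        |         # like production unless the operator re-types the verb.
        |         if method.upper() in DESTRUCTIVE and "prod" in url:
        |             prompt = (f"About to {method.upper()} {url} -- "
        |                       "re-type the verb to confirm: ")
        |             if input(prompt).strip().upper() != method.upper():
        |                 raise SystemExit("Aborted.")
        |         return requests.request(method, url, **kwargs)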
        
         | fabian2k wrote:
         | I'd be seriously scared of putting any production credentials
         | with write access into my Postman/Insomnia/whatever. Those
         | tools are meant for quickly experimenting with requests, they
         | don't have any safety barriers.
        
           | partdavid wrote:
           | I mean, it shouldn't really be very easy to even _get_ a
            | read-write token to a production database, unless you're a
           | correctly-launched instance of a publisher service. This
           | screams to me that they're ignorant of, and probably very
           | sloppy with, access control up and down their stack.
        
         | layer8 wrote:
         | This is the Postman HTTP method selection dropdown that you can
         | see on the screenshots on this page ("GET"):
         | https://learning.postman.com/docs/sending-requests/requests/...
         | 
         | Postman doesn't know that sending a single DELETE request to
         | that URL will delete 17 million records.
         | 
         | Arguably, REST interfaces shouldn't allow deleting an entire
         | collection with a single parameterless DELETE request.
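          | 
          | A rough sketch of what I mean, with Flask purely for
          | illustration (the routes are invented, not the article's API):
          | 
          |     from flask import Flask
          | 
          |     app = Flask(__name__)
          | 
          |     # Only GET is registered on the collection, so a stray
          |     # DELETE gets a 405 from the framework instead of wiping
          |     # the whole index.
          |     @app.route("/products", methods=["GET"])
          |     def list_products():
          |         return {"items": []}
          | 
          |     # Deletes are only possible one document at a time.
          |     @app.route("/products/<product_id>", methods=["DELETE"])
          |     def delete_product(product_id):
          |         return "", 204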
        
         | theptip wrote:
         | Honestly I'd make the case for writing a simple python script
         | for this kind of thing.
         | 
         | `requests.get(url)` is a lot harder to mis-type as
         | `requests.delete(url)`.
         | 
         | At $dayjob we would sometimes do this sort of one-off request
         | using Django ORM queries in the production shell, which could
         | in principle do catastrophic things like delete the whole
         | dataset if you typed `qs.delete()`. But if you write a one-off
         | 10-line script, and have someone review the code, then you're
          | much less likely to make this sort of "mis-click" error.
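          | 
          | Something like this is small enough to paste into a review
          | before you run it (the URL and pagination fields here are
          | invented for the example):
          | 
          |     import requests
          | 
          |     # Hypothetical internal endpoint.
          |     BASE = "https://catalog.internal.example/products"
          | 
          |     def fetch_all():
          |         # The verb is spelled out in reviewable code instead
          |         # of sitting one dropdown click away from DELETE.
          |         items, cursor = [], None
          |         while True:
          |             params = {"cursor": cursor} if cursor else {}
          |             resp = requests.get(BASE, params=params,
          |                                 timeout=30)
          |             resp.raise_for_status()
          |             page = resp.json()
          |             items.extend(page["items"])
          |             cursor = page.get("next_cursor")
          |             if not cursor:
          |                 return items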
         | 
         | Obviously you need to find the right balance of safety rails
         | vs. moving fast. It might not be a good return on investment to
         | turn the slightly-risky daily 15-min ask into a safe 5-hour
         | task. But I think with the right level of tooling you can make
         | it into a 30 min task that you run/test in staging, and then
         | execute in production by copy/pasting (rather than deploying a
         | new release).
         | 
         | I would say that the author did well by having a copilot;
         | that's the other practice we used to avoid errors. But a
         | copilot looking at a complex UI like Postman is much less
         | helpful than looking at a small bit of code.
        
       | alexjplant wrote:
       | I once worked on an app where the staging database was used for
       | local testing, all devs used the same shared credentials with
       | write access, and you switched environments by changing hosts
       | file entries (!!!). This resulted in me accidentally nuking the
       | staging database during my first week on the job because I ran a
       | dev script containing some DROPs from my corporate Windows system
       | and failed to flush the DNS cache.
       | 
       | I had already called out how sub-optimal this entire setup was
       | before the incident occurred but it rang hollow from then on
       | since it sounded like me just trying to cover for my mistake. The
       | footguns were only half-fixed by the time I ended up leaving some
       | time later.
        
       | Johnny555 wrote:
       | _An old discussion arose about the need for backups. We had
       | backups for most databases but no process was implemented for
       | ElasticSearch databases. Also, that database was a read model and
       | by definition, it wasn't the source of truth for anything. In
       | theory, read models shouldn't have backups, they should be
       | rebuilt fast enough that won't cause any or minimal impact in
       | case of a major incident. Since read models usually have
       | information inferred from somewhere else, It is debatable if they
       | compensate for the associated monetary cost of maintaining
       | regular backups_
       | 
       | My biggest concern about restoring that Elasticsearch backup
       | would be that the restored backup would be inconsistent with the
       | real source of truth and it might be hard to reconcile to bring
       | it up to date.
        
         | soco wrote:
          | While everything there is true, why not have a backup anyway?
          | I have Elasticsearch backups and even used one once (with
          | success) when I terraformed the index away. The delta was then
          | sourced on the fly.
        
         | antisthenes wrote:
          | The backup only needs to last until the search database has
          | been rebuilt from the source of truth and swapped back in.
          | 
          | In other words, it only has to be good enough for a few days
          | (ideally, hours).
        
       | glintik wrote:
       | <<We had backups for most databases but no process was
       | implemented for ElasticSearch databases.>> - that's all you need
       | to know
        
       | benjaminpv wrote:
        | Funny to think that the issue here is just a relative of the
        | '--preserve-root' default that rm (now) has: it's easy to let
        | the user apply the same actions to the branches of a hierarchy
        | as to the leaves, but _should_ they?
       | 
       | Pretty recently corporate changed something on my work laptop
       | that resulted in a bunch of temporary files generated during the
       | build getting redirected to OneDrive. I went in and nuked the
       | temp files and shortly thereafter got a message from OD saying
       | 'hey noticed you trashed a ton of files, did you mean to do
       | that?'
       | 
       | The developer side of me thought 'of course I did, duh' but I can
       | imagine that's useful information for most users that made an
       | innocent yet potentially costly mistake.
        
       | duxup wrote:
       | Having an endpoint that can just delete... everything seems kinda
       | risky.
        
       | SoftTalker wrote:
       | > In the fifteen minutes I had before the next meeting, I quickly
       | joined with one of my senior members to quickly access the live
       | environment and perform the query.
       | 
       | Don't do stuff in a rush like this. That's when I almost always
       | make my worst mistakes. If there is a "business urgency" then
       | cancel or get excused from the upcoming meeting so you can focus
       | and work without that additional pressure. If the meeting is
       | urgent, then do the other task afterwards.
        
         | racl101 wrote:
          | Now this meeting will beget many more urgent meetings.
        
       | PeterisP wrote:
       | For me, an interesting statement was "However, it took 6 days to
       | fetch all data for all 17 million products." - in my experience
       | of DB systems, 17 million entries is significant but not
        | particularly large; it's something that fits in the RAM of a
       | laptop and can be imported/exported/indexed/processed in minutes
       | (if you do batch processing, not a separate transaction per
       | entry), perhaps hours if the architecture is lousy but certainly
       | not days.
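        | 
        | For scale, a back-of-the-envelope reindex sketch against
        | Elasticsearch's _bulk API (the cluster address, index name,
        | document shape and chunk size are all invented):
        | 
        |     import json
        |     import requests
        | 
        |     # Hypothetical cluster address and index name.
        |     ES = "http://localhost:9200/products/_bulk"
        |     HDRS = {"Content-Type": "application/x-ndjson"}
        | 
        |     def bulk_index(docs, chunk=5000):
        |         # One HTTP request per 5000 documents instead of one
        |         # request (or several) per document.
        |         for i in range(0, len(docs), chunk):
        |             lines = []
        |             for doc in docs[i:i + chunk]:
        |                 meta = {"index": {"_id": doc["id"]}}
        |                 lines += [json.dumps(meta), json.dumps(doc)]
        |             body = "\n".join(lines) + "\n"
        |             resp = requests.post(ES, data=body,
        |                                  headers=HDRS, timeout=60)
        |             resp.raise_for_status()
        | 
        | At a few thousand documents per request, 17 million documents is
        | on the order of a few thousand HTTP calls -- hours at worst, not
        | days.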
        
         | thayne wrote:
         | That kind of depends on how big each record is. And it sounds
         | like these records are denormalized from multiple sources, so
         | you probably have several transactions for each record. It's
         | possible to do batching in that situation, but it definitely
         | isn't always easy.
        
         | fabian2k wrote:
         | I think this is a very clear disadvantage of the microservice
         | architecture they chose in this case, and the post does allude
         | to that. To recreate this data they needed to query several
         | different microservices that would not have been able to
         | sustain a higher load.
         | 
          | If I calculated this right, the time they mention comes down to
          | about 30 items per second. Which is maybe not unreasonable for
         | something that queries a whole bunch of services via HTTP, but
         | is kinda ridiculous if you compare it to directly querying a
         | single RDBMS.
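          | 
          | (Sanity check: 17,000,000 items / (6 days x 86,400 s/day) =
          | 17,000,000 / 518,400, which is about 33 items per second.)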
         | 
         | You could probably fix this by scaling _everything_
          | horizontally, if that is possible. But the real solution would
          | be, as you say, to have bulk processing capabilities.
        
           | PeterisP wrote:
            | Yes, adding a "return X items" mode to the same microservices
            | is often a way to get a significant performance boost with
            | only minor changes: even if your main use case needs only one
            | item, it enables mass processing without incurring the
            | immense overhead of a separate request per item.
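            | 
            | Roughly this shape (Flask here, and all the names are
            | invented):
            | 
            |     from flask import Flask, request
            | 
            |     app = Flask(__name__)
            | 
            |     def load_product(product_id):
            |         # Stand-in for whatever the single-item handler
            |         # already does.
            |         return {"id": product_id}
            | 
            |     # The single-item route stays as-is; this just amortises
            |     # the per-request overhead over many IDs.
            |     @app.route("/products/batch", methods=["POST"])
            |     def get_products_batch():
            |         ids = request.get_json()["ids"]
            |         return {"items": [load_product(i) for i in ids]}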
        
         | gtirloni wrote:
         | _> Any kind of operation was done through an HTTP call, which
         | you would otherwise do with a SQL script, in ElasticSearch, you
         | would do an HTTP request_
         | 
         | There you go.
        
       | motoboi wrote:
        | People, please don't post things on Medium, because it wants
        | people to sign up. Use GitHub Pages or anything else, really.
        
         | ThunderSizzle wrote:
         | I'm torn on this, honestly.
         | 
         | We want an internet with less ads, but good writers deserve to
         | get paid. They can get paid via Medium (though how much, I
          | don't know) through subscriptions. Is that worse than ads
         | or newspapers?
        
           | bachmeier wrote:
           | What you say is true, but that doesn't mean it should be
           | posted to HN. The purpose of this site is to discuss
           | articles. This one's behind a paywall. Even if you sign up
           | for a free account, you may have used up your two free
           | articles per month. That invites people to comment without
           | reading the article. That's not why HN exists. (I actually
           | checked the comments hoping someone posted a copy of the
           | article.)
        
           | Victerius wrote:
           | I enjoy good writing, but the only writing I'm willing to pay
           | for is print books (I just bought a copy of J.R.R. Tolkien's
           | "The Fall of Gondolin", the hardcover, illustrated one by
           | HarperCollins). I don't want to pay for newspapers, for
           | investigative journalism, or for long form article magazines
            | like The Atlantic or The New Yorker. Never mind Medium of all
            | places, because Medium has no barrier to entry. No
           | gatekeeping (and, given how easy it is to merely write a
           | blurb of text, I have rather high standards for what I choose
           | to pay to read). I'd rather consume from the likes of Amazon
           | and have them run these writing platforms (e.g. WaPo) at a
           | loss. Which means I'm paying for writers, in the end, just in
           | a very indirect way. This sits well with me.
           | 
           | But if the choice before me was to pay for writers directly
           | (like Medium), or let non-book writers as a profession
           | disappear, I'd opt for the latter. You may criticize this
           | attitude. I assume the responsibility for that and I'm being
           | honest.
        
         | jacooper wrote:
         | There is also hashnode.com
        
         | rlewkov wrote:
         | Exactly. Won't read because it requires sign up.
        
         | gumby wrote:
         | archive.ph cuts through the medium paywall too.
        
           | contravariant wrote:
           | As does basic cookie hygiene.
           | 
           | At least I assume that's what's happening, I haven't seen a
           | medium paywall yet.
        
         | baal80spam wrote:
         | If you're on Firefox there's an extension to bypass this (only
         | for Medium's free articles) -
         | https://gitlab.com/magnolia1234/bypass-paywalls-firefox-clea...
        
           | demindiro wrote:
           | There is also LibRedirect[1] which automatically redirects to
           | an alternative frontend.
           | 
           | [1] https://github.com/libredirect/LibRedirect
        
           | metadat wrote:
           | I'm not keen on playing the browser plugin escalation game
           | with fundamentally UX hostile sites like Medium. They clearly
           | have no respect for the human being at the end of the line
           | trying to simply read a document.
        
             | thatguy0900 wrote:
             | This is extremely melodramatic. They literally just want
             | money so they don't have to run ads
        
       ___________________________________________________________________
       (page generated 2022-09-19 23:00 UTC)