[HN Gopher] Deleting data distributed throughout a microservice ...
       ___________________________________________________________________
        
       Deleting data distributed throughout a microservice architecture
        
       Author : rrampage
       Score  : 41 points
       Date   : 2020-05-05 11:49 UTC (11 hours ago)
        
 (HTM) web link (blog.twitter.com)
 (TXT) w3m dump (blog.twitter.com)
        
       | hinkley wrote:
       | > First, you'll need to find the data that needs to be deleted.
       | 
       | Microservices do not get you out of having to have an information
       | architecture. They add more friction if you don't have one, but
       | it's entirely possible to have an unspoken/undocumented
       | information architecture that mostly works.
       | 
       | If you don't have a System of Record for data, you for sure
       | aren't going to be able to find it. Similar problem with no
       | Source of Truth. For some business models you will have both and
       | they will be separate (especially with 3rd party data).
       | 
       | You still have the problem of logs, but at least the problem is
       | tractable. Without any of this it's just chaos and who knows where
       | the data went or really even where it came from?
        
       | grumpycoder2 wrote:
       | Jesus Christ, why is something as simple as deleting data so
       | complicated now? Unless you're Google, stick your data in a
       | database. Then DROP when you need to. But I guess that doesn't
       | get you any blog posts or resume line items.
        
       | BrentOzar wrote:
       | This is relevant for enterprises with legacy systems, too, like
       | shops that have multiple interfaces that extract, transform, and
       | load data across point-of-sale systems, warehouse fulfillment
       | systems, and data warehouses.
        
       | effoffhn wrote:
       | Jesus Christ, why is something as simple as deleting data so
       | complicated? Unless you're Google, here's what you do:
       | 
       | 1. Get a database. 2. Put data into the database. 3. Delete the
       | data from the database.
       | 
       | But that doesn't get you blog posts or resume line items.
        
         | d_watt wrote:
         | What if you're Twitter? Which this person is.
         | 
         | I agree that if you can keep it simple, it's easier to do it.
         | But sometimes you need distributed services. Saying only Google
         | has that problem is a little reductive.
        
         | pfranz wrote:
         | Someone already mentioned cache invalidation. To extend that, I
         | don't think it's all that different from an old paper system.
         | If you want to delete your file it's probably kept in a cabinet
         | in some department, but the billing department or marketing
         | department also has a copy of your name and address in their
         | records. Deleting everything is a multi-step process.
         | 
         | Centralized systems didn't scale in the physical or digital
         | world, and distributed systems complicate things that seem
         | trivial.
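
The paper-records analogy above can be sketched in code: a deletion
request fans out to every department (service) that holds a copy, and
it is only complete once each one confirms. This is a minimal sketch,
not anything from the article. The `RecordStore` class, the
`delete_everywhere` helper, and the store names `accounts`, `billing`,
and `marketing` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStore:
    """Hypothetical stand-in for one department's (or service's) copy of user data."""
    name: str
    records: dict = field(default_factory=dict)

    def delete_user(self, user_id: str) -> bool:
        # Idempotent: deleting an absent record still counts as success.
        self.records.pop(user_id, None)
        return user_id not in self.records


def delete_everywhere(user_id: str, stores: list) -> dict:
    """Ask every store to delete the user and report per-store status."""
    return {store.name: store.delete_user(user_id) for store in stores}


# Assumed example data: two stores hold a copy, one does not.
accounts = RecordStore("accounts", {"u1": {"name": "Ada"}})
billing = RecordStore("billing", {"u1": {"address": "1 Main St"}})
marketing = RecordStore("marketing", {})

status = delete_everywhere("u1", [accounts, billing, marketing])
print(status)
```

The per-store status map is the point of the sketch: the caller has to
know the full list of stores up front, which is the "find the data"
problem discussed at the top of the thread.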
        
         | philwelch wrote:
         | Cynical tone aside, there's a good question here. It's just
         | that the question has an actual, valid answer: it's impossible
         | to have a single database that operates at Twitter scale.
         | 
         | If you think it is possible to have a single database that
         | operates at Twitter scale, fine, there's probably an
         | interesting and enlightening conversation to be had about how
         | and why that is or is not the case.
         | 
         | Continue along this vein and you eventually get to the
         | point where you're discussing realistic solutions, and maybe at
         | the end of it you've either gained an understanding of how
         | these things work or else you've actually come up with a better
         | system design than Twitter. Either way you've gained something
         | more valuable than the petty satisfaction of disparaging other
         | people's motivations.
        
         | namanaggarwal wrote:
         | This article is about microservices. If you are not at scale
         | you might not need microservices in the first place.
         | 
         | When data is distributed, one team/service owns user data and
         | another owns tweets. Deletion becomes not so trivial.
        
           | AmericanChopper wrote:
           | This particular article is about microservices, but there are
           | plenty of ordinary business reasons that you may have some
           | sort of asynchronous business process that runs across a
           | distributed set of systems/teams/organisations, that do not
           | relate to scale. I was working on a microservice recently
           | (really it was a service-oriented architecture, but they seem
           | to pretty much mean the same thing now), and it only
           | processed around 10,000 transactions per day. But it almost
           | had to be designed that way, due to the nature of the
           | business processes it was supporting, and the systems it had
           | to interface with.
        
         | devonkim wrote:
         | Deletion is a form of cache invalidation if you think about it
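
One way to read the comment above: if a cached copy outlives the
record it was read from, the "deleted" data is still served. The
following is a minimal sketch under that reading. The `CachedStore`
class and its `invalidate` flag are hypothetical and only illustrate
the idea.

```python
class CachedStore:
    """Hypothetical read-through cache in front of a source-of-truth dict."""

    def __init__(self):
        self.source = {}  # system of record
        self.cache = {}   # derived copy

    def put(self, key, value):
        self.source[key] = value

    def get(self, key):
        # Fill the cache from the source on a miss, then serve from the cache.
        if key not in self.cache and key in self.source:
            self.cache[key] = self.source[key]
        return self.cache.get(key)

    def delete(self, key, invalidate=True):
        self.source.pop(key, None)
        if invalidate:
            self.cache.pop(key, None)


store = CachedStore()
store.put("tweet:1", "hello")
store.get("tweet:1")                       # warms the cache

store.delete("tweet:1", invalidate=False)  # source deleted, cache untouched
stale = store.get("tweet:1")               # the deleted value is still served

store.delete("tweet:1")                    # delete again, this time evicting
fresh = store.get("tweet:1")               # nothing left to serve
```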
        
           | hinkley wrote:
           | If not the King, at least the Crown Prince of cache
           | invalidation.
        
           | flarg wrote:
           | Thank you. And this sort of problem occurs in large
           | organisations with lots of different monoliths all caching
           | each other's data.
        
             | loopz wrote:
             | Records storing _transactional facts_ are NOT "caching
             | each other's data".
        
       ___________________________________________________________________
       (page generated 2020-05-05 23:00 UTC)