[HN Gopher] Deleting data distributed throughout a microservice ...
___________________________________________________________________
Deleting data distributed throughout a microservice architecture

Author : rrampage
Score  : 41 points
Date   : 2020-05-05 11:49 UTC (11 hours ago)

(HTM) web link (blog.twitter.com)
(TXT) w3m dump (blog.twitter.com)

| hinkley wrote:
| > First, you'll need to find the data that needs to be deleted.
|
| Microservices do not get you out of having to have an information
| architecture. They add more friction if you don't have one, but
| it's entirely possible to have an unspoken/undocumented
| information architecture that mostly works.
|
| If you don't have a System of Record for data, you for sure
| aren't going to be able to find it. Similar problem with no
| Source of Truth. For some business models you will have both and
| they will be separate (especially with 3rd party data).
|
| You still have the problem of logs, but at least the problem is
| tenable. Without any of this it's just chaos and who knows where
| the data went or really even where it came from?
| grumpycoder2 wrote:
| Jesus Christ, why is something as simple as deleting data so
| complicated now? Unless you're Google, stick your data in a
| database. Then DROP when you need to. But I guess that doesn't
| get you any blog posts or resume line items.
| BrentOzar wrote:
| This is relevant for enterprises with legacy systems, too, like
| shops that have multiple interfaces that extract, transform, and
| load data across point-of-sale systems, warehouse fulfillment
| systems, and data warehouses.
| effoffhn wrote:
| Jesus Christ, why is something as simple as deleting data so
| complicated? Unless you're Google, here's what you do:
|
| 1. Get a database.
| 2. Put data into the database.
| 3. Delete the data from the database.
|
| But that doesn't get you blog posts or resume line items.
| d_watt wrote:
| What if you're Twitter? Which this person is.
|
| I agree that if you can keep it simple, it's easier to do it.
| But sometimes you need distributed services. Saying only Google
| has that problem is a little reductive.
| pfranz wrote:
| Someone already mentioned cache invalidation. To extend that, I
| don't think it's all that different from an old paper system.
| If you want to delete your file it's probably kept in a cabinet
| in some department, but the billing department or marketing
| department also has a copy of your name and address in their
| records. Deleting everything is a multi-step process.
|
| Centralized systems didn't scale in the physical or digital
| world, and distributed systems complicate things that seem
| trivial.
| philwelch wrote:
| Cynical tone aside, there's a good question here. It's just
| that the question has an actual, valid answer: it's impossible
| to have a single database that operates at Twitter scale.
|
| If you think it is possible to have a single database that
| operates at Twitter scale, fine, there's probably an
| interesting and enlightening conversation to be had about how
| and why that is or is not the case.
|
| Continue along this vein and eventually you get to the
| point where you're discussing realistic solutions, and maybe at
| the end of it you've either gained an understanding of how
| these things work or else you've actually come up with a better
| system design than Twitter. Either way you've gained something
| more valuable than the petty satisfaction of disparaging other
| people's motivations.
| namanaggarwal wrote:
| This article is about microservices. If you are not at scale
| you might not need microservices in the first place.
|
| When data is distributed, one team/service owns user data and
| another owns tweets. Deletion becomes not so trivial.
| AmericanChopper wrote:
| This particular article is about microservices, but there are
| plenty of ordinary business reasons that you may have some
| sort of asynchronous business process that runs across a
| distributed set of systems/teams/organisations, that do not
| relate to scale. I was working on a microservice recently
| (really it was a service-oriented architecture, but they seem
| to pretty much mean the same thing now), and it only
| processed around 10,000 transactions per day. But it almost
| had to be designed that way, due to the nature of the
| business processes it was supporting, and the systems it had
| to interface with.
| devonkim wrote:
| Deletion is a form of cache invalidation if you think about it
| hinkley wrote:
| If not the King, at least the Crown Prince of cache
| invalidation.
| flarg wrote:
| Thank you. And this sort of problem occurs in large
| organisations with lots of different monoliths all caching
| each other's data.
| loopz wrote:
| Records storing _transactional facts_ are NOT "caching
| each other's data".
___________________________________________________________________
(page generated 2020-05-05 23:00 UTC)