[HN Gopher] Soft deletion probably isn't worth it ___________________________________________________________________ Soft deletion probably isn't worth it Author : lfittl Score : 340 points Date : 2022-07-19 18:35 UTC (4 hours ago) (HTM) web link (brandur.org) (TXT) w3m dump (brandur.org) | habibur wrote: | Which is why I don't add that extra deleted field. Rather | duplicate all the tables into a new database called "archive" and | then insert there before deleting from main. | | That works for updates too, by preserving the old data and | showing you a time machine like backlog. But the archive database | gets too large over time and you need to purge it periodically. | You can create some delete triggers for automating this "save | before delete" behavior. | tehbeard wrote: | How do you account for maintaining integrity in the archive? | | E.g. you have 3 users sign up with the same email (a unique | field) one after the other with deletions in-between each sign- | up? | habibur wrote: | No PK, FK or Unique constrains on the archive. Rather use | simple index to speed up queries. | justin_oaks wrote: | I'm not who you asked the question of, but I do sometimes | make use of archive/deleted/history tables. I'll refer to | them as history tables from here on out. | | In short, I leave data integrity to the original table and | drop it for the history table. | | The history table isn't identical to the original table. It | has it's own primary keys that are separate from the original | table. It doesn't include the original table's unique | constraints or foreign key constraints. It also generally has | a timestamp to know when the record was put there. | spfzero wrote: | I like the deleted-items-table suggestion the author makes. It's | useful though, to think about the cases where you'd want to | delete, say a customer with existing invoices. In one situation, | you may have made a mistake and want to start over, say an | operator creates the customer and order, but then the customer | changes their mind. In that situation a hard delete is in order; | you want to "undo" the _creation_ of the customer and invoice, | and nothing further has happened as far as referencing their key | | In other situations though, you may have some reason to treat the | customer as if they were deleted, but better to examine the | reason for that, and use an attribute more relevant to that | reason, such as active/inactive etc. Would be different for | different entities of course. | jasonhansel wrote: | It's pretty easy to solve the foreign key issue (where you need | to write elaborate DELETE queries to avoid breaking foreign keys) | in Postgres using deferrable constraints. Just start a | transaction, run "SET CONSTRAINTS ALL DEFERRED," delete rows from | various tables in any order, then commit the transaction. The | DELETE statements will effectively ignore the foreign key | constraints, but any remaining "broken" foreign keys will be | caught when the transaction commits. | bob1029 wrote: | If you are going to think about this pattern, why not go one step | further and simply event source everything with an append-only, | immutable log? | | You could even sprinkle cryptographic guarantees into the mix. | This would be very challenging to do with mutable DB rows. | phibz wrote: | I've definitely seen soft delete work in practice. A couple | things: for small data sets you can implement the naive | deleted_at you can hide the records from your users by forcing | them to use a view. You can also handle updates on the view to | prevent data conflicting with deleted data if you need to. | | For foreign key constraints you can set the foreign key to null | and orphan the records if the relation is deleted. You could also | hard delete them in this case. It depend on your use case. | | When the data volume grows or the ratio of soft deleted to normal | records is high, you should consider another solution. One | solution you suggested, moving the record to a deleted table is a | fine one. | | The other solution that I've used successfully is to journal your | deletions in another table or system. For smaller volumes having | an audit table Journaling the data and storing the pk, fkeys, and | a serialized version of the record, json works great in postgres, | works well. For large volumes or frequent deletions something | like Kafka or PubSub work better. | | You may very well find others interested in consuming your audit | journal to track changes. Updates and even inserts fit great in | the more general case. | waspight wrote: | I use soft deletes to maintain insights. For instance I would | like to know how many users that has been created in total even | if some has been deleted later on. Is this a bad approach? Most | of the other comments here seems to use it only to be able to | restore deleted entries. | dafelst wrote: | Views are a simple solution to this problem. Pretty much all | moderns RDBMSs support updatable views, so creating views over | your tables with a simple WHERE deleted_at IS NULL solves the | majority of the author's problems, including (IIRC) foreign key | issues, assuming the deletes are done appropriately. | | I feel like a lot of developers underutilize the capabilities of | the massively advanced database engines they code against. Sure, | concerns about splitting logic between the DB and app layers are | valid, but there are fairly well developed techniques for keeping | DB and app states, logic and schemas aligned via migrations and | partitioning and whatnot. | OJFord wrote: | I had the same reaction to the 'code leakage' section, but | 'foreign keys'? You can't reference a view; so you either don't | use them (fks) or they point at the underlying table and you | have the problem described. | | You could have views that say 'thing I have a foreign key to is | not deleted' of course, but that sort of seems like 'code | leakage' again, just in SQL this time. | vlunkr wrote: | That doesn't solve the foreign key problem. You can still | easily have a reference to a record that is "deleted" | yen223 wrote: | Also in Postgres, you cannot have a foreign key constraint | that references a view, not even a materialised view. | | I'm with the author on this one. Any soft delete logic does | have a tendency to bleed into other systems and make your | systems more complicated, for very little gain. | 7crow wrote: | > there are fairly well developed techniques for keeping DB and | app states, logic and schemas aligned via migrations and | partitioning and whatnot. | | Hi, <1 yr experience swe here. Would HN mind unpacking | "whatnot" with specific names of some these techniques? | semiquaver wrote: | The main problem with views for this use case in practice is | that they ossify your schema. Views and matviews are | effectively a dependency tree, and many common types of schema | evolution become substantially more difficult when the system | forces you to wrap your DDL in a series of view drop/recreation | steps. | | This is merely annoying when dealing with regular views because | recreating even a large number of views is fast, but can be | catastrophic if you have any matviews in your table dependency | tree. A matview can easily turn what should be an instantaneous | DDL operation into a partial outage while the matview is being | regenerated. | | (this is all postgres specific, it may be untrue for other | systems) | smallnamespace wrote: | To avoid an outage, have you tried fronting the matview with | an additional view to allow hot-swapping? | enepture wrote: | As an FYI using a tool like DBT solves this problem. As | someone who was not a data engineer I was not familiar, there | were tools like this | teej wrote: | dbt is great, but I'm not sure it's appropriate to manage | objects in a production transactional database. It's | designed for analytical, columnar databases. | cpursley wrote: | I have this problem. What is DBT? | banashark wrote: | I believe this: https://docs.getdbt.com/docs/introduction | PaulMest wrote: | In case you're not familiar with DBT, it helps with the | "transform" in ELT | https://docs.getdbt.com/docs/introduction. | slt2021 wrote: | it is very dangerous to have dependency on materialized view | - it is a poor architectural decision from DBA to do that. | | if you want view depending on mat view - materialize it | yourself in a table, and refresh it yourself controllably. | semiquaver wrote: | Aren't you just saying that materialized views should not | be used? I'm pretty sure that was my original point. | ibejoeb wrote: | > developers underutilize the capabilities of the massively | advanced database engines | | So true. There are so many amazing, powerful features in all of | the major players. | | Also: updatable views are amazing. With query rewriting | (whatever you vendor calls it) you can affect some truly | material changes to the system without any changes to the | client applications. An example would be implementing temporal | relations. | chrisshroba wrote: | How would a view solve the foreign key issue? Are you | suggesting coding specific deletion triggers into the view such | that appropriate foreign keys are "cascade" deleted when a row | in the view is deleted? | jermaustin1 wrote: | because when you are getting a child record through a join | (in the view), the parent will never have deleted_at set. | | Say I have a simple view `select * from foo join bar on | foo.foo_id = bar.foo_id where foo.deleted_at is null` | | I never have to worry about deleting from bar, because I | should never grab a child when the parent is 'deleted'. | thehappypm wrote: | Assuming that the joins are always joined against the | views, not the raw tables. This introduces an opportunity | for a mistake. | rodw wrote: | Seriously. That "Downsides: Code leakage" point is nonsensical. | | ``` CREATE OR REPLACE VIEW active_customer AS SELECT * FROM | customer WHERE deleted_at IS NULL OR deleted_at <= NOW() ; ``` | | There, I fixed it. | | Just use `active_customer` instead of `customer ... deleted_at | IS NULL`. | | In fact, since the deleted_at column is a timestamp, the | original "leakage" query: | | ``` SELECT * FROM customer WHERE id = @id AND deleted_at IS | NULL; ``` | | is actually broken. A non-null `deleted_at` timestamp that's in | the future implies the record hasn't been deleted yet, right? | | I've often had junior devs assert that views are some kind of | code smell, but these sorts of "canned query/filter that you | want to apply very very often" seem like the perfect use case | for a view to me. It's DRY, and the fact that your standard | "query" is in the database" means you can change it more | readily than trying to make sure you hit all the points it | might be embedded in the application code. | | > I feel like a lot of developers underutilize the capabilities | of the massively advanced database engines they code against | | Early-ish in the JDBC days a senior dev I was working with at | the time (as a junior dev myself) made a pretty good case that | "the database is part of the application" that's always stuck | with me. Full database independence via software level | abstractions is a pretty silly goal outside of library code. If | you have a service that makes extensive use of the database, | don't throw away the database features in the interest of some | abstract "we could swap out oracle with mysql without changing | anything" objective. If you want it to be generic, use the SQL | standard, but don't be afraid to have a few db-specific bits in | the app code if that's a subsystem you _might_ replace once a | decade or something. | | I blame the DBA/Dev divide for a lot of this. A lot of the | impedance between these layers is social/procedural. If you can | change the DB as easily as the code, there's a lot less fear of | using the right tool for the specific job. | doctor_eval wrote: | > the database is part of the application | | 100% this. If you accept that the database is part of the | application, you give yourself permission to use the full | feature set of the database, and life becomes a lot simpler. | Using views, stored procedures and other features lets you | implement things like soft delete trivially, without it | infecting all your application code. | | In my entire career I've changed backend databases for an | application exactly twice. It's not easy, and no amount of | abstraction is likely to make it easier. | djur wrote: | The query isn't broken. In the Rails community at least it is | very common to use a nullable frobbed_at column to indicate | both "was it frobbed" and "when was it frobbed". In that | context, the boolean check is always NULL/NOT NULL, rather | than a time comparison. | madisp wrote: | I'd argue that the simple `deleted_at IS NULL` check is not | broken - unless your product / domain specifically allows and | requires scheduled future deletions adding such logic can | easily introduce bugs. For example, you could to get the | comparison flipped by accident, and if it's only in one place | out of many that bug could go unnoticed for a while. | doctor_eval wrote: | You don't put the check in your code, you put it in the | view, and access the data exclusively through that view. In | that way, the check is defined exactly once. | nousermane wrote: | > (...updatable view...) WHERE deleted_at IS NULL | | This is the way. Also, save record creation timestamp, and you | can have very flexible "time-machine" selects/views of your | table essentially for free. | layer8 wrote: | How does this help with foreign keys? Normally you can't have | foreign keys referencing a view. | | I agree that one should make use of RDBMS capabilities. A check | constraint may be practical instead of (or in addition to) the | foreign-key constraint. | zozbot234 wrote: | Views can be used to implement pretty much any kind of | automated inference, reasoning, rules etc. on the "raw" table | data. The example of filtering out deleted records is just one | of the simplest. That one single feature can easily transform a | simple DB platform into a fully-featured knowledge base system, | easily usable to support even complex reasoning tasks. | aeyes wrote: | At least in Postgres, having a huge amount of "dead" data in | large tables is problematic because vacuum always has to read | the full data set. | | Even with conditional indexes where you exclude deleted data | you take a significant performance hit reading dead blocks | because there is no way to quickly vacuum them. You accumulate | hours of bloat until your vacuum finishes. | | You can't beat a separate insert only archive table which you | never have to vacuum. | anarazel wrote: | Vacuum does not have to read the full data set every time. | The visibility map tracks, on a block level, whether all rows | in a page are known to be visible to everyone (starting in | 8.4 or such) and whether the page is "frozen", i.e., does not | contain visibility information that might need vacuuming | (starting in 9.6 IIRC). | | However, indexes _do_ currently have to be scanned as a | whole. But that 's only done by autovacuum if there's enough | row versions for that to be worth it (in recent versions). | layer8 wrote: | Shouldn't partitioning help with that? (I have no experience | with Postgres.) | tomnipotent wrote: | Only if those partitions are on separate storage, otherwise | you have the same number of dead tuples/dirty pages. | SoftTalker wrote: | A problem (unless something has changed, my context is Oracle | from some time ago) is that NULL values are not indexed. So the | "WHERE deleted_at IS NULL" could trigger a full table scan. It | can also cause row migration when the NULL value is eventually | filled in. Unless you explicitly need the deleted date, it's | probably better to use a non-nullable Y/N for this. | remram wrote: | It seems Oracle does it although there is a special syntax to | opt-in. That seems wild. I am not aware of another DBMS | having that limitation though. | firloop wrote: | Views can really bite you performance wise, at least with | Postgres. If you add a WHERE against a query on a view, | Postgres (edit: often) won't merge in your queries' predicates | with the predicates of the view, often leading to large table | scans. | AdrianB1 wrote: | There is no impact with views in MS SQL. You can also have | indexed views and filtered indexes, so you can have even | better performance. | dafelst wrote: | IIRC Postgres has supported predicate push down on trivial | views like this for over a decade now, and possibly even more | complex views these days (I haven't kept up with the latest | greatest changes). | firloop wrote: | Postgres can do it, you're correct, but in my experience it | rarely happens with any view that's even slightly non- | trivial even on recent versions of Postgres. Most views | with a join break predicate pushdown. It greatly reduces | the usecases of views in practice. | sarchertech wrote: | I haven't had any problems with this at all and I've been | using joins in my views for years. | | Are you using CTEs in your views? | tomnipotent wrote: | Postgres 12 fixed the CTE issue. | bavell wrote: | Postgres docs on CTEs: | https://www.postgresql.org/docs/current/queries-with.html | [deleted] | [deleted] | jaydub wrote: | Based on my experience, I like the author's approach since it | makes things pretty clear-cut and optimized the storage in the | core table (in my experience as well, deletes happen frequently | and the soft deletes are rarely touched). In large, row- | oriented tables that that storage can add up and even with | views/materialized views there's a cost to using/maintaining | those as well. | 5e92cb50239222b wrote: | This is one of those situations where a good ORM can simplify | things greatly. For example, with EF Core you can add a global | filter which will filter out soft-deleted rows in all queries | automatically (unless you add .IgnoreQueryFilters()). | | It couples nicely with some hackery which turns removes into | soft-deletes. You can remove objects as usual and they get | soft-deleted in the database. | | I've used this in a few projects and it's fantastic. | | https://docs.microsoft.com/en-us/ef/core/querying/filters | | https://www.thereformedprogrammer.net/ef-core-in-depth-soft-... | airstrike wrote: | Views are such a powerful concept I'm honestly disheartened by | how hard it is to use, replicate or leverage that functionality | outside of dropping straight into the db shell | quickthrower2 wrote: | See http://materialize.com | pbardea wrote: | > assuming the deletes are done appropriately | | This is one gripe I have with soft-deletion. Since I can no | longer rely on ON DELETE CASCADE relationships, I need to re- | defined these relationship between objects at the application | layer. This gets more and more difficult as relationships | between objects increase. | | If the goal is to keep a history of all records for compliance | reasons or "just in case", I tend to prefer a CDC stream into a | separate historical system of record. | dexwiz wrote: | You may end up doing this anyways if you have any application | code that needs access to delete hooks, or access control | varies across objects. At this point, you are probably using | a ORM instead of direct queries, and place logic that could | be in the db instead at the app layer. | danielrhodes wrote: | Being unable to effectively use foreign key relationships is | definitely a downside of using soft deletes. But it's also | worth asking if these types of behaviors, which would also | include a feature like triggers, really belongs in a database | or whether it's better to have at the application level (or | at least at a layer above the data layer). I'd argue that | ultimately you probably don't want these things at the DB | level because you get into a situation where you're sharing | business logic between two (or more places). | Volundr wrote: | I'm less likely to use triggers, but I'll say I pretty much | always want proper foreign key relationships set up in the | database. Unique and other constraints too. In principal I | might agree with you that it's an application level | concern, but being able to setup these kind of invariants | and trust that they will be enforced even in the face of | bugs in my code (and there will be bugs in my code) is just | too powerful to let go of in the name of purity. I'd much | rather let a bit of business logic creep between multiple | layers that discover that I have a whole bunch of orphaned | records and no sensible way to reconcile them. | corrral wrote: | The DB's responsible for maintaining the integrity of data | in it, unless there's some very good reason you can't let | it do that. It's faster and better at it than your | application, 99% of the time, and it can keep doing that | even when/if a _second_ application starts using the same | database, or when being used from a SQL prompt, or | whatever. | mmerlin wrote: | My perspective is DB level triggers are the absolute very | best place to put cascading update/delete logic so it only | ever needs to be written once and is consistent regardless | of any future frontend clients that might be written in a | different language and/or framework than the original | codebase. | | Right now in $dayjob I am converting an old non-DRY | codebase from NoSQL data layer format to proper relational | SQL backend. | | This old front-end was created by verbosely coding up a | relational cascading update/delete system for the NoSQL | backend, in numerous places redundantly with subtle | differences and inconsistencies, making the code brittle. | | My current estimate is some front end functions will be | reduced in LOC size by 95% once we use the power of SQL in | the backend. | | And the backend SQL Triggers+StoredProcedure required to | replace these long NoSQL frontend functions doing cascading | updates/deletes is only around 10% the size of the replaced | front-end code. | | And now future new frontends can reuse this without | exploding the size of their codebase where complex data | operations are required. And no need to reinvent the same | data handling algorithm all over again (and risk subtle | variation creeping in from the different front-end | implementation of data algorithms) | munk-a wrote: | If we're assuming you're using a view based approach which | elides the soft deleted rows automatically then you'll get a | lot of these dependent objects correctly updated for free | assuming you're pulling them out of the DB with JOINs - | SELECT FROM foo JOIN bar (assuming bar is a view into | barwithdeleted) will automatically filter out the invalid | rows from foo... if you're using this information to populate | a CRUD interface it's likely you'll be JOINing bar already to | get some metadata for display (like maybe bar.name instead of | the surrogate bar.id key you use for joining). | nfm wrote: | Yes, but other queries (any aggregate queries that don't | join the soft deleted table, any joins to other tables) | will now return rows that would have been deleted under | hard deletion with cascade. | corrral wrote: | You could use a trigger to cascade soft-delete flag | toggles, provided all the relevant tables have such a | column. Still have to factor that into your other | queries, but at least you wouldn't have to make | potentially-impossible-to-exhaustively-check joins to | figure out if a given row was "deleted". | munk-a wrote: | This is definitely something to watch out for, but in | practice (as someone that migrated a system without soft | deletes to one that had them) I found that it doesn't | tend to come up nearly as much as you might think - | usually the table being transitioned to support soft | deletes is a relatively core table (since ancillary | tables usually aren't worth the complexity to transition) | so a lot of your reporting queries will already be | pulling in that table. You definitely need to check to | make sure you're not missing anything - and sometimes | CRUD interfaces will need to be completely revamped to | include the table in question - but it's usually not | _that_ hard. | dspillett wrote: | _> Since I can no longer rely on ON DELETE CASCADE | relationships_ | | Cascaded deletes scare me anyway. It only takes one idiot to | implement UPSERT as DELETE+INSERT because it seems easier, | and child data is lost. You could always use triggers to | cascade you soft-delete flags as an alternative method, | though that would be less efficient (and more likely to be | buggy) than the built-in solution that cascaded deletes are. | | If you look at how system-versioned (or "temporal") tables | are implemented in some DBMSs, that is a good compromise. The | history table is your audit, containing all old versions of | rows even deleted ones, and the base table can be really | deleted from, so you don't need views or other abstractions | away from the base data to avoid accidentally resurrecting | data. You can also apply different storage options to the | archive data (compression/not, different indexes, ... | depending on expected use cases) without more manaully | setting up partitioning based on the deleted/not flag. It can | make some query times less efficient (you need to union two | tables to get the latest version of things including deleted | ones, etc.) but they make other things easier (especially | with the syntactic sugar like AS AT SYSTEM_TIME <when> and so | forth) and yet more things are rendered possible (if | inefficient) where they were not before. | | _> I tend to prefer a CDC stream into a separate historical | system of record._ | | This is similar, though with system versioned tables you are | pretty much always keeping the history/audit in the same DB. | | --- | | FWIW: we create systems for highly regulated finance | companies where _really_ deleting things is often verboten, | until it isn 't and then you have the other extreme and need | to _absolutely_ purge information, so these things are often | on my mind. | michaelt wrote: | _> It only takes one idiot to implement UPSERT as | DELETE+INSERT because it seems easier, and child data is | lost._ | | Seems unfortunate to miss out on all the referential | integrity benefits of a serious database when hiring | standards, training and code reviews should all be | preventing idiotic changes. | | If I'm making a shopping cart system, I want to know every | order line belongs to an order, every order belongs to a | user and so on. Anyone who can't be trusted to write an | update statement certainly can't be trusted to avoid | creating a bunch of orphan records IMHO. | wvenable wrote: | Often you don't have to rely on ON DELETE CASCADE | relationships. Because you are never deleting anything, you | will never have any orphaned records. If you don't want to | see say Invoices for a deleted Customer then that's just | another filter feature. | | Mostly I use soft-delete because for auditing requirements we | pretty much can't remove anything but also because nothing | ever truly goes away. If we have an Invoice or Order then, | from our perspective, we must have those forever even if the | corresponding client is deleted and can never place another | one. | nkozyra wrote: | > Often you don't have to rely on ON DELETE CASCADE | relationships. Because you are never deleting anything, you | will never have any orphaned records | | Exactly. Unless you're doing something silly like adding | deleted at to bridge tables ... which, you probably don't | need even in 1:many. | kukx wrote: | Is not there any attempt to improve the soft deletion at the | engine/SQL level? I can see it as a possible feature request. | pbardea wrote: | One interesting feature that some DBs implement is | something like SELECT AS OF SYSTEM TIME | (https://www.cockroachlabs.com/docs/stable/as-of-system- | time....) which _kinda_ does this. | | However in practice this usually dramatically slows down | reads if you have to constantly skip over the historic rows | so you probably don't want to keep garbage round longer | than absolutely necessary. The concept of a historic table | mentioned below could be interesting though - especially if | it could be offloaded to cold storage. | simcop2387 wrote: | There's the idea of temporal tables, | https://pgxn.org/dist/temporal_tables/ | | It's not a standard (I think) but it'd let you do a | cascading delete and then be able to go and look at the old | objects as they were at time of deletion too. | | You'd need to do things very differently to show a list of | deleted objects though. | psYchotic wrote: | It appears that there's been an attempt at standardizing | temporal features in SQL in the SQL:2011 standard: | https://en.wikipedia.org/wiki/SQL:2011 | dspillett wrote: | _> temporal tables ... It 's not a standard_ | | They were introduced in ANSI SQL 2011. | | How closely implementations follow the standard I don't | know, but something close exists in several DBMSs: I use | them regualrly in MS SQL Server, there are plug-ins for | postgres, MariaDB has them, and so forth. | smallnamespace wrote: | If you're using PostgreSQL, you can implement cascading | soft-deletes yourself. | | The information schema table holds all foreign key | relationships, so one can write a generic procedure that | cascades through the fkey graph to soft-delete rows in any | related tables. | BeefySwain wrote: | Could someone take a stab at an example of what this | would look like? Sounds really interesting. | xwdv wrote: | Doubt it. It seems like something obvious yet I've waited | so long for it. Seems like you have to rely on third party | plugins. | dragonwriter wrote: | > This is one gripe I have with soft-deletion. Since I can no | longer rely on ON DELETE CASCADE relationships | | If you use soft deletes on all tables, you can also cascade | them as long as you either cascade updates to the real keys | as well, or prevent such updates, by having a deleted flag | column on each table, including it in a unique constraint | with the actual key column(s), and including it in the | foreign key. | coding123 wrote: | I was going to chime in with this. thanks. One issue with views | however is that a lot of these features require more and more | nuanced knowledge of RDBMSes where these days unless you have a | veteran architect, most of the team just knows the various | library/tooling that interacts with "a variety of databases" so | there is often less effort to go deeper. | quickthrower2 wrote: | Where is this anti-fb culture? Is it a startup thing? | | Everywhere I have worked people know a decent amount about | their data store. Not architects, just mid devs and higher. | [deleted] | lowercased wrote: | "Instead, we rolled forward by creating a new app, and helping | them copy environment and data from the deleted app to it. So | even where soft deletion was theoretically most useful, we still | didn't use it." | | But... weren't you using all those env and data info from the | soft-deleted set? | | I've typically been using soft-deletes for most projects for | years. People have accidentally deleted records, and having a | process to undelete them - manually or giving them a screen to | review/restore - has usually been great. | | Yes, if there's a lot of related artefacts not in the database | (files/etc) that were literally deleted, you may not be able to | get them back. But that's an ever greater edge case in projects I | work in as to not be a huge issue. We probably have some files in | a backup somewhere, if it's recent. Trying to 'undelete' a record | from years ago - yeah, likely ain't gonna happen. | | People are used to 'undo' and 'undelete'. Soft-deletes are one | way to provide that functionality for some projects. | tehbeard wrote: | > Instead, we rolled forward by creating a new app, and helping | them copy environment and data from the deleted app to it. So | even where soft deletion was theoretically most useful, we still | didn't use it | | I don't get this statement. You wouldn't have had the env or data | without soft delete? You did use it! | | I would say, soft delete isn't a tick the box and done solution | as many ORMs make it. | | You need to consider the data model, and adjust your queries to | that. | | It may make sense for a product to be deleted, but orderlines | still able to access it to display product name etc. | | With blob data, I tend to move that to a "bin" with a 30-60 day | grace period. Customers know quickly reporting, we can fully | recover, while outside that time they'll have to provide images | etc. It's a decent compromise. | | Reuse of unique fields is the sticking point I run into often, as | mysql interprets null as not clashing with other nulls so | composite uniques using the ID and deletion date don't work. | TylerE wrote: | > mysql interprets null as not clashing with other nulls | | Which is correct per SQL. Null is NaN, not zero (or negative | infinity). | GartzenDeHaes wrote: | Personally I like no delete designs, which give you a full audit | history of changes. This is similar to generally accepted | accounting principles. | https://en.wikipedia.org/wiki/Generally_Accepted_Accounting_... | znpy wrote: | Your taste in database design is probably nor gdpr compliant, i | hope you don't work in the eu. | munk-a wrote: | It's surprising but the EU tends to be one of the most | stringent regions to work in both when it comes to totally | permanently deleting things and when it comes to never ever | deleting things - as with anything like this where there is a | debate (rather than a settled best practice) there are some | times when soft deletion is appropriate and necessary (i.e. | to adhere to log retention requirements common in the EU) and | some times when it's unnecessary... and the occasional fun | time when it's both necessary to support soft deletes and | hard deletes - when logs need to be retained for auditing | purposes but also when some users can force a hard delete | (leading to that data either being purged or moved to an | archive storage if it's still needed for auditing purposes). | | The world is almost never as simple as it seems. | munk-a wrote: | So if an account was active in your system and is active no | longer... do you soft delete it (even if that means UPDATE ... | SET active = 'f') or hard delete it? | GartzenDeHaes wrote: | That depends on the problem domain and how you design the | system, since there are several way to do this. Typically in | financial systems you would either have a start and end date | on a summary record, or add an inactive record to a | transaction table that has a record for each change to the | account, or both. | munk-a wrote: | I just wanted to touch on the fact that eliding soft-deleted rows | from queries is really, really easy - this article makes it out | to be a constant headache but here's my suggested approach. | ALTER TABLE blah ADD COLUMN deleted_at NULL TIMESTAMP; | ALTER TABLE blah RENAME TO blahwithdeleted; CREATE VIEW | blah (SELECT * FROM blahwithdeleted WHERE deleted_at IS NULL); | | And thus your entire application just needs to keep SELECTing | from blah while only a few select pieces of code related to | undeleting things (or generating reports including deleted | things) need to be shifted to read from blahwithdeleted. | yladiz wrote: | But, assuming you don't really need the data, why make your | queries more complex and instead just actually delete the data? | munk-a wrote: | This is a really scenario specific question - sometimes it's | needed, sometimes it isn't. At my shop we have customers that | will suspend their account but our sales team is pretty damn | awesome so usually they end up renewing after going a while | without our product - so being able to easily restore a large | swath of former customers with all their permissions and | preferences intact with a simple click of a button is a huge | win compared to having a dev try to piece the data together | out of backups that are six months out of date (a little | secret... we never did this and just put the obligation on | the CS team to manually recreate the records since that cost | the company less). | yladiz wrote: | If I was a tech lead and also wanted to advocate for hard | deletion, I would ask the question for this scenario: | "What's the cost of keeping all this data unnecessarily, | modifying most queries to filter for deleted data, and | dealing with other various consequences of soft deletion, | and how does this cost compare with the cost of building a | bespoke tool to restore deleted data at a customer's | request within a certain time frame and compare with the | benefit of a customer being able to restore their data at | the 'click of a button'?". | | Having dealt with systems that have hundreds of millions of | records or more, many of which reference deleted data and | are therefore useless, I lean towards hard deletion more | and more and on the off chance that deleted data really | needs to be recovered you build a separate | system/infrastructure to support that, rather than building | your _entire_ system around the small likelihood you really | need to restore it. | munk-a wrote: | As your data architect I'd probably answer your question | "Well, it'll take a little while longer to vet all the | indices - since we'll probably want most indices to | filter on WHERE deleted_at IS NULL but we'll likely want | a few without that constraint for managing undeletion. | We'll use more space on disk which, honestly, is pretty | much a non-issue in the modern world - and if it gets bad | enough we can always partition the table on deletion | status and dump the soft deleted rows on a secondary | server... but I wouldn't worry about that until we hit | facebook scale. In terms of application developer time - | I'll probably need an hour of their time to explain it | once and all existing queries keep working as they're | working now... and as for that undelete tool, well, we | don't actually have to build it - if we don't build it we | can just ask devs to submit manual queries to undelete | data rows as needed via our migration, or oneoff or | console interface. We'll want to do a sweep for any | queries that reference tables dependent on the soft- | delete having table - just to make sure they're throwing | an INNER JOIN against the view but, honestly, we could | just bop those bugs on the head if they ever come up." | | Like, I'm definitely not saying soft-deletion is always | the answer, it takes additional effort and planning, but | it's really, really easy to do safely. | TylerE wrote: | Becauase that's a huge assumption. From my experience, it's | sorta like the giant box of old cables most techies keep | around. | | Do you reach into it often? No. | | But when you do it absolutely saves the day and makes you a | hero. | bjourne wrote: | This is not a solution. It introduces a leaky abstraction which | sooner or later will lead to errors. Sure, all code _you_ write | will access the view and not the table. But how can you ensure | all other code in the organisation uses the view? Perhaps you | add some access control to the table so that only authorized | users can read directly from it, but that 's even more | technical overhead. Then you have foreign keys. If you have a | "deleted" column in the Customer table you need to remake the | Invoice table as a view so that it hides invoices for deleted | customers. The same goes for the InvoiceItem table (foreign key | of a foreign key) and all author auxiliary information related | to the soft-deleted customers. | | Furthermore, the cost of an error is potentially massive. | Someone new at the company makes a revenue report based in the | billed Invoices and does not realize they should query the view | and not the table... Not great if 90% of all invoices belong to | soft-deleted customers! | | The author is right; soft-deletes are probably most definitely | not worth it. There are many better ways to solve the problem. | croes wrote: | If you mention the costs of reporting error you should | consider the costs of wrongful hard delete too. | | And reports are developed and tested before they are used for | crucial purposes. | tshaddox wrote: | > Furthermore, the cost of an error is potentially massive. | Someone new at the company makes a revenue report based in | the billed Invoices and does not realize they should query | the view and not the table... Not great if 90% of all | invoices belong to soft-deleted customers! | | I'm not sure I buy this argument. It's certainly conceivable | for that to happen, but no more so than any other case of | "the engineer queried the wrong table and thus got incorrect | results." There's never going to be any _technical_ way of | preventing this: if you have access to multiple sets of | numbers, and you want to sum up one set of numbers but | mistakenly sum up the other set of numbers, you 're going to | get the wrong answer! | bjourne wrote: | > "the engineer queried the wrong table and thus got | incorrect results." | | The difference is that the path of least resistance, the | most obvious method - just query the damn table - is | incorrect. Bad design can certainly make a system more | error prone. | tshaddox wrote: | In the example from this thread the names of the table | and view are reasonably clear, so even in a hypothetical | project without any external documentation of naming | conventions or engineering processes (code review, etc.) | the most obvious thing would be to query _blah_ instead | of _blahwithdeleted_. | | Of course, this is _extremely_ hypothetical, and in any | real project where you 're generating a financial report | you absolutely must have a detailed understanding of the | relation that you're aggregating over. Even if your | project has very strict naming conventions for tables, | views, etc. you've gotta put in more work than "this | short string sounds like a plausible label for the | relation I want to aggregate over." | bjourne wrote: | "reasonably clear" are famous last words. It ignores tons | of evidence on how commercial software development works. | Everything is "reasonably clear" in isolation, but not | when you throw in thousands of other "reasonably clear" | things developers are supposed to keep track while not | missing tight deadlines. Fact is that if you add | opportunities for people to screw up then you will make | people screw up, regardless of how "reasonably clear" or | "obvious" the system is. | | Maybe you need more work experience because believing | that it is implausible for someone to accidentally query | the customer_with_deleted table over the customer view is | incredibly naive. Likewise, people generating financial | reports _can_ have detailed understanding of data | modelling but scarily often don 't. Give them extra | opportunities to fuck up and they will take them every | time. KISS | munk-a wrote: | I don't really agree with that. Within an organization you | have documentation and instruction as tools - but you're also | making the dumb approach (SELECT * FROM blah) the correct | approach. If a user is writing a query against the DB, has no | idea what the layout of the data is, and decides to prefer | blahwithdeleted over blah then I'd really question whats | going on at your organization - blahwithdeleted is pretty | clearly self-documenting and it's likely a lot of your other | domain specific tables with be much harder to naively | discover your way through. | | I, personally, would in no way restrict access to | blahwithdeleted, but I have made a pattern of it in our DB, | there are about a dozen blahwithdeleted tables - each with a | corresponding blah view... I usually get about one question | per every two new employees about which table to use which I | can answer in less than a minute with a helpful little | explanation. | | I'd also mention I've not made a specific value statement on | soft deletions in a general case since, if there was a clear | general case solution we'd just all do that. This is a | decision that needs to be made on a per table basis - it's a | rather trivial decision in most cases, but it's very specific | to the problem at hand. | scifibestfi wrote: | > When I worked at Heroku, we used soft deletion. When I worked | at Stripe, we used soft deletion. At my job right now, we use | soft deletion. | | > As far as I'm aware, never once, in ten plus years, did anyone | at any of these places ever actually use soft deletion to | undelete something. | | That's wild. So it seems the idea of needing undelete is largely | an unfounded fear. | revskill wrote: | In realworld, there's no concept as deletion from DB ! | | There's only deactivate account, archive a legacy product,... | | Because there's no such thing as delete something from real | world. | dfee wrote: | My experience is that soft-deletes are blunt tools bridging the | gap between hard deletes and event sourcing (capturing all the | changes against the table, in a replay-worthy stream). | | Event sourcing is hard - because the engineers responsible for | setting it up and managing it aren't generally well skilled in | this domain (myself included) and there aren't a wealth of great | tools helping engineers find their way into the pit of success. | | The downsides of soft-deletes (as identified in the article) are | numerous. The biggest problem is that it appears "simple" at | first blush (just add a deleted_at column!), but it rots your | data model from the inside out. | vbezhenar wrote: | You can restore to any point of time from your database backup. | So it can cover some requirements. | Terr_ wrote: | > My experience is that soft-deletes are blunt tools bridging | the gap between hard deletes and event sourcing | | Agreed, sometimes it makes business-sense to implement it, but | in the big picture it's still kludgy and not-ideal. | | While full-on event-sourcing isn't always the answer, once | business-rules prevent you from un-deleting anything there's | not much point of having all those dead-rows interspersed in | your regular tables. | zozbot234 wrote: | An event store is just a special case of a temporal database. | The whole point of temporal databases is to natively support | the notion of historical vs. current data. | redavni wrote: | Just want to add that the downsides as identified in the | article make little sense. Deleting a customers invoices should | be a very rare thing. I can't imagine any accountant or auditor | is going to be happy with an IT guy deciding when to delete | invoices. | | If accidentally writing the wrong query is a problem, then | writing the wrong query is your problem. | btown wrote: | The question for either of these systems IMO is: do you trust | that a change from your upstream represents a true, everlasting | intention, or is it something that may need to be reinterpreted | or rolled back in the future? | | At my startup, soft deletes for our SKUs are critical, because | we work with data sources where notoriously both the technical | systems and the humans driving will all-too-frequently | accidentally represent something to our connection as deleted. | Or there might be an irrecoverable error when asking "what | things are still active upstream" - but that doesn't mean the | SKUs are deleted, we might just not have certain live details | until a bugfix is made. So "error status" and "soft delete" are | somewhat synonymous, and both require investigation into root | causes and root intents. Yes, the concept of "unerrored and | active" is peppered through our codebase and analytics - but | our ability to recover from supplier technical mistakes is much | higher as a result. And we could absolutely do this with an | event sourced system - but the tooling for relational databases | is so much better, it's night and day. | rodelrod wrote: | Or you can see it the other way around: soft-deletes are a | pragmatic alternative to event sourcing that provides a lot of | the value without requiring a team of super-humans and a | radical redesign of the existing systems. | willlll wrote: | For the control plane part of Crunchy Bridge, on day one I | decided to go with the deleted_records table that is mentioned at | the end of this post. It's been great. No need to keep around | dead data that no one ever looks at. | | We don't need to have `where deleted_at is null` on every single | query. But the best part though is our actual working data set of | records we actually care about is tiny compared to the deleted | cruft that would have otherwise been just sticking around | forever. Backups and restores take no time at all. It's really | cool that postgres lets you have conditional indexes on things, | but it's even cooler not to need them. | dcdc123 wrote: | If you are using a state manager with models in something like | rails/django/etc then it is trivial to support soft deletion | without it infecting your entire code base. | duxup wrote: | > All our selects look something like this: SELECT * FROM | customer WHERE id = @id AND deleted_at IS NULL; | | Solution... a whole other table of deleted stuff... in a new | structure. | | Man soft deletes just look better to my eye. | satyrnein wrote: | We switched a lot of tables to soft deletes so we could replicate | those deletes into our data warehouse. You can also use bin log | replication for hard deletes, but every schema change would break | it. | msie wrote: | I've used Soft Deletion so many times so I'll say it's been worth | it. I believe using an audit table would have made recovery more | difficult for me. Anyways take this advice with a grain of salt. | It's only one guy's opinion. As is mine. | lolsal wrote: | In my 20 years of software experience the soft delete is not so | often used to _undelete_ something, but more often used to _know | what_ has been deleted. If you delete a record from a table, did | it ever exist? Can you reference that customer /user/product ever | again? Not to mention the one-in-a-million case where a customer | had their account erroneously or fraudulently deleted - | undeleting saves time/money/bacon when it's needed and is | relatively inexpensive to maintain. | justin_oaks wrote: | The author didn't mention it, but restoring data from a database | backup is a perfectly reasonable way to handle undeletes. By this | I mean the situations that are "Oh crap, we didn't mean to delete | that!" instead of usual business operations. | | I've probably restored data from backup maybe 4 times in my | career. I greatly prefer to do this on the rare one-off scenario | than to deal with the overhead of soft deleting everything. | marcosdumay wrote: | The difference in framing one gets by looking around is | amazing, even funny. | | > I've probably restored data from backup maybe 4 times in my | career. | | Yet, I often use soft-deletes because it allows people to | undelete things from the software interface and not call me all | day long. | | But that's not the most common reason I have for them. Normally | it is because the data just can not be gone, and the full table | is still important somewhere. | hn_throwaway_99 wrote: | The deleted records table he mentions at the end is a good | approach, but: | | 1. This can easily be done with a trigger, so that you just call | a DELETE on the table and deleted tables are copied to the | deletion table automatically. | | 2. I prefer, instead of having a jsonb column, that each table | has a corresponding `deleted_original_table_name` table that | exactly matches the schema of the base table, with the addition | that the first column is a `deleted_at` timestamp. It's easy to | use helper methods in schema migrations to always keep the table | definitions in sync. | pierrebai wrote: | The author claims pruning soft-deleted entries requires a complex | query, but hard-deleting an entry would have required the same | complexity. So it's really not an argument. | tgbugs wrote: | One use case that I think is not sufficiently considered in this | is related to two comments I made about a year ago [0, 1]. | | If you can _actually_ delete something, then that means that a | malicious actor can fabricate data an claim that you deleted it. | GDPR may be well intentioned but systems that have the ability to | remove any record of a thing lay the groundwork for systematic | fabrication of data, because any record of the past has been | erased. | | Operationally, I can totally see why soft delete might be | considered to be problematic in certain cases, but from an | information security point of view I think it is absolutely | critic for protecting users against a whole class of attacks. | | 0. https://news.ycombinator.com/item?id=27249738 1. | https://news.ycombinator.com/item?id=27691442 | jonstaab wrote: | If you implement soft delete, you should surface it to your user. | That's who is accidentally deleting things, and that's who will | want to un-delete them. As for side effects like spinning up/down | servers, build that into your data model (of course, in a case | like Heroku's that can be prohibitively expensive, so don't). | | Source: I write back of house software for resale store owners, | and accidental deletes happen occasionally. Being able to restore | things instills a lot of confidence for our customers. | kleer001 wrote: | Yea, it is. | kache_ wrote: | wait until this guy finds out about financial regulations | jelkand wrote: | Soft deletion is certainly very situationally worth it. I've | found the most value when 1. it is well supported at the ORM | layer and 2. business requirements dictate strong auditability of | data. While I have undeleted items on occasion, I've used soft | deletes more frequently to debug and build a timeline of events | around the data. | | For context, I've worked in fintech where I often needed to | review backoffice approvals, transactions, offers, etc. | krstf13 wrote: | Wouldn't storing the deleted data in an immutable storage, with | time stamp, be much better for auditability ? I mean how could | you audit deleted, restored and deleted again data with that | setup? Also, while I know it's not really accurate, I tend to | understand relations as sets, it makes me uncomfortable to have | soft deleted data that are neither member or not member of the | set. | chomp wrote: | Yep, we have an abstraction layer on top of the ORM to provide | common queries. "Give me all X" will always return stuff not | soft deleted. Data people also like to go diving through old | data, and without getting into data warehousing and stuff like | that, it's not too complex to support a single flag to enable | us to keep old stuff. | delusional wrote: | They might like to, but you should definitely consider if | saving data no longer required for your business violates | privacy regulation/ethical considerations. | chomp wrote: | Definitely. When a user "deletes" their account, we null | out all identifying fields to "DELETED_PII_$user_id". We | have running metrics we compute that would go off the rails | if we dropped the row completely. | hinkley wrote: | In my limited experience, soft deletion also has better | prospects where partial indexes are involved, since it reduces | the size of the index and reduces search and insert time a | little bit. If soft deletes are rare, you aren't going to see | much of a payback for your investment in code complexity. | | And since you can never really be sure what you'll need 2 years | from now, I imagine there are a lot of anecdotes out there of | people who implemented it thinking it would be used a lot, and | turned out to be wrong. | pilgrimfff wrote: | All you need is a layer of abstraction to get past the downsides | of soft deletion. You can use views or your ORM (if you use one) | | In Django, it's really easy to create almost seamless soft | deletion logic in the model manager or in your querysets. | | Over the last decade, I find myself using soft deletion more and | more - usually to accommodate user/client requests. | adrianmsmith wrote: | > Instead of keeping deleted data in the same tables from which | it was deleted from, there can be a new relation specifically for | storing all deleted data | | The disadvantage of this is that if you ever _do_ want to access | this "deleted" data, e.g. in admin or compliance tools, you now | have to do it in two different ways, one way for the main data | and a different way in case the data has been "deleted". | | The article asserts you'll never need to "undelete" the data. So | they're presenting a solution with that assumption, fair enough. | Without that assumption, however, moving the data back from an | archive table becomes a pain, and if there are any unique | constraints e.g. on username or email address, you'll have a | problem if you've moved the data out of the main table and | another user has used that username or email address. | layer8 wrote: | > now have to do it in two different ways | | Use a view. | | > if there are any unique constraints e.g. on username or email | address | | Have those in a dedicated table where they aren't deleted, and | add a synthetic key referenced by the other tables. | Terr_ wrote: | > The article asserts you'll never need to "undelete" the data. | | IMO it's worth distinguishing between (A) some kind of "click | to undelete" feature versus (B) simply having that old-data | conveniently exposed for a developer to manually-edit things or | craft database-change scripts. | | In practice I've only ever seen the latter get used, because it | requires a developer to figure out how the heck to get "the | parts that matter" back while preserving the integrity of other | newer data and obeying certain business-rules. | dunkelheit wrote: | This brings back memories... Some time ago I was an intern in a | team working on a UGC map editor. We were using this soft-delete | pattern and for some task I needed to deploy a database migration | that fiddled with the "deleted" status field. It was quite late | and after the migration finished I almost went home but for some | reason decided to check community forums. There users were having | a time of their life taking screenshots of deleted objects that | suddenly became visible (many of them quite amusing, including | swear words written in 500km letters). Dunno how this escaped | testing, but horror of what I have done brought clarity of mind | and I quickly found an error and devised another migration that | fixed the data. That worked and I was able to finally go home. | | So yeah, be careful with the soft-delete pattern :) | scott_w wrote: | The example the author gives is... frankly awful. | | I can't think of a single case where you'd want to remove the | invoices of a customer you delete. Ever. In fact, the opposite is | more likely to be a big problem, accidentally cascading your | delete to your financial records! | | Using a soft delete, your invoices won't "disappear" because your | app WILL have a view for looking at just the invoices. | | Source: I built a bookkeeping system and soft deletes is a | necessary feature. | Pxtl wrote: | Trivial case I hit: | | 1) Client wants to remove user from the system who have left | their org but | | 2) There are objects that were contributed by that user which are | required to persist beyond the user's deletion. | | Those are ideal cases for soft deletion. We can still query | information about the deleted user to explain who created this | object, with the note that their account has been deleted. | | Probably I should be doing full event-sourcing for this case, but | delete flag works well. MS offers temporal tables for this use | case and I'm still considering the implications there -- AFAIK | ORM support is WIP. | | And unlike the article author, I _have_ used soft deletion to | undelete things. Many times. Maybe he has better users than I do, | I don 't know. | outworlder wrote: | I wish Datomic was made open-source (with maybe some features | available as an 'enterprise' offering) so that we could actually | have a decent alternative for this 'soft-delete' problem. | joshuanapoli wrote: | Atlassian's deletion-related outage demonstrates why soft | deletion should be the default. Use hard deletion after a grace | period for data that truly needs to be expunged. Even if undelete | is not part of the normal workflow, experience shows that swift | recovery from bugs and operator errors is a universal part of | serving users. The less data motion involved in deletion (and | recovery) the better for both the original deletion process and | also any recovery process. | | https://news.ycombinator.com/item?id=31015813 | dragonwriter wrote: | > The concept behind soft deletion is to make deletion safer, and | reversible | | IME, as with "updated_by" and "last_modified_at" columns, it's | usually hazy audit requirements, not making deletion reversible, | that motivates it. | | A proper history store maintained by appropriate triggers solves | this, and leaves the referential integrity constraints on the | base table intact. (It can also be used for reversibility if you | need that.) | | Views conceptually would work, but then you get bitten by all the | ways that all relations are not equal in real-world RDBMSs. | kardianos wrote: | This poster misses the point completly. Soft delete is a must | have for historical data, where you want to keep history, but | keep the current set clean. | | Effectively, you don't check for the soft delete flag if you get | to it from a an un-deleted record, but you do check for it if you | access it the other way around. | armchairhacker wrote: | Dumb solution: make soft deletes explicit in your backup system. | | Your company has a database backup system right? That system | should be configured so that when it runs a backup, it will not | remove deleted entries from the previous backup, instead just | mark them as "deleted_since" the current backup time. | | Idk if any backup system actually support this, if there's some | glaring problem (like you can't just overwrite parts of a | database backup for some reason), or if most companies just don't | have backups because they're too expensive (probably not), but | this is the solution I would go with. It works for other sorts of | data like file systems as well. | deerIRL wrote: | As someone who has done development work with Class A data and | specifically in the realm of justice, soft deletes aren't simply | a good idea, they are required by law. | | Most of these downsides are easily mitigatable issues as well. As | many users have stated, something like views solves the issue of | forgetting the 'deleted' clause. | | Lastly, I'm not sure the issue with foreign keys/stray records | really resonates with me. I'd be hard pressed to be comfortable | allowing a developer or DBA who isn't fully comfortable with the | data model to be hard deleting records, let alone flagging them | as soft deleted. | ThePhysicist wrote: | I don't get what the problem is with cascading deletes. I mean | you typically only use them for foreign keys where deletion of | the parent object makes the referencing object simply invalid, so | there would be no reason to leave the referencing object in the | database. | | The point that is true is that queries get more complicated as | you'll have to add a "WHERE deleted_at IS NULL" to every SELECT | (once for each table you refer to), but that can be automated if | you use an ORM. A paradigm that I often use is that all objects | in the database belong to a role object that determines who can | read/write/delete the given object. So before doing anything with | an object I always check the role object (e.g. the "user" | referencing an "invoice", to stay with the example OP gives), and | as part of this I check whether the user object still exists. | Alternatively, you could automate most of the required update | logic using triggers as well. | | But otherwise I agree, soft deletes often don't seem to be a | worthwhile tradeoff, not sure if I would use them again when | designing a relational schema. They are very useful for auditing | and undo though: In a current project, whenever a set of objects | gets updated I soft-delete the old versions and create new | objects, keeping the UUIDs intact. That allows me to display the | entire version history of each object to the user, which can be | necessary e.g. for compliance reasons. You can achieve this with | an audit log as well but that would require more logic and | different queries, whereas querying soft-deleted objects just | requires a slight modification of existing queries. | giantg2 wrote: | "The concept behind soft deletion is to make deletion safer, and | reversible." | | That's one part. The other part is that in many industries you | have regulatory data retention and audit requirements. This is | arguably the most valuable and common reason to perform Logical | deletes. | brtkdotse wrote: | In banking and bookkeeping, there's no such thing as a | "delete". Once something is in the ledger you can't undo it - | you have to make a new entry that negates the old one. | hammock wrote: | Is the same true in mass surveillance? | m-p-3 wrote: | and the blockchain. | tomrod wrote: | Which is a decentralized, distributed ledger, so... | banking! :) | cweagans wrote: | "banking" | i_hate_pigeons wrote: | not every thing is a transaction though, say a deleted piece | of ref data that still has FKs to other stuff etc | Hallucinaut wrote: | These are still versioned in many serious finance systems | munk-a wrote: | Yes, but banks tend to have websites with accounts and those | accounts need to be deactivated when a customer should no | longer have access (or, even more finicky, specific accounts | for a client need to be deactivated or activated as they | change their usage). | | All this essentially forces the use of some sort of soft | deletion. (Activation flags are sort of just a more | complicated form of soft deletion). | AdrianB1 wrote: | Status (active/inactive/other) and existence (present or | deleted) are very different things, they may be separated | most of the time. Legal requirements can prevent deletion | of inactive data, so in some cases one may have a lot of | inactive but must keep data. Soft delete can help in some | scenarios, for example you want to give people a way to re- | activate accounts, but hide the ones marked as deleted. | jiggywiggy wrote: | Ha, and then there is the opposite regulation that you have to | delete user data. | giantg2 wrote: | In some industries the retention regulations trump the | deletion ones. I believe finance is one area where they will | delete some of your data, but are still required to maintain | 7 years of specifically listed data. | MonkeyMalarky wrote: | Well you see, you need a complete record of when it was | created, every change that occurred, everyone who could view | it and also log every access attempt. But it. You're not | actually supposed to keep it. Just everything surrounding it. | scott_w wrote: | Not quite. GDPR (and equivalents) have clear escape hatches | to allow you to store data if you have good reason (even if | the data subject requests its removal). | | Invoices, from the article, is a great example. That record | must remain unchanged in most financial regulations. I'd | wager a customer sending a deletion request for invoices will | be met with raucous laughter from the legal and finance | teams. | nirvdrum wrote: | My experience at a few start-ups has been that account deletion | just isn't prioritized. It's not a focus when building an MVP. | If the application ever gains traction, everyone is then | terrified they'll accidentally delete customer data that they | never delete anything. It's a shame. As a user, when I delete | my account or data in my account, I want you to permanently | delete it, not keep it around and just make inaccessible to me. | nerdponx wrote: | I've also seen businesses retain "deleted" data in order to | support legitimate data analysis work in the future. And it | actually can help significantly. Maybe scrubbing PII from | deleted accounts is a good idea, but those deleted accounts | are perfectly good data points, especially in smaller/newer | companies with lower-volume data streams. | nirvdrum wrote: | If you want to retain anonymous statistics, fine. I'm not | keen on you retaining my actual data so you can monetize it | after our business relationship has concluded. The biggest | thing a small team can learn is why they failed to retain a | customer or a trial that didn't convert. For that, usage | metrics are considerably more valuable than user data. | jrockway wrote: | If it makes you feel better, the startups that don't have | time to delete your data probably don't have a viable | disaster recovery plan either. | | As we learned from the Atlassian snafu, even giant companies | with billions in revenue often can't recover from disasters. | (I try to test mine every 6 months. I've never had a test go | perfectly.) | tomcam wrote: | Good to know. What do you were tests look like? | dexwiz wrote: | I think billions have been spent bridging the gap between | "ideal" software and what businesses actually need. Access | control is another thing I see developers wanting to simplify | or push to implement later, but is actually a key feature. | necovek wrote: | And yet another part is making deletes (appear) instantaneous: | useful when it involves cleaning up a bunch of "related" data | possibly living on different services (eg. S3, ES...). | | This also helps with the original goal of making them safer by | manually implementing "eventual consistency" for data living | outside the transactional world. | taeric wrote: | Don't make deletes appear instantaneous? If you have heavy | weight systems, then it makes sense for provisioning and | deleting entities is a process, that should be open to | monitoring. | jewayne wrote: | I would argue that in many cases the concept behind soft | deletion is to make deletion permanent. | | Hard deletes retain no memory of what you wanted to be gone, so | any malfunctioning sync process will continuously recreate the | deleted record soon after it's deleted. Soft deletes are often | the only way to make sure deleted records don't reappear. | trinovantes wrote: | Assuming you're using a modern database, replication is done | with paxos/raft and they are formally proven to not allow | this to happen as both edits/deletions are both just entries | in distributed event log. | eeperson wrote: | Couldn't a malfunctioning sync process undo a soft delete as | well? | jayd16 wrote: | Null is more ambiguous than an explicit conflict. If there | is literally no record, even of the delete action then | there's no timestamp for last write wins. | [deleted] | silisili wrote: | Why not just use an audit table, to keep from littering your | indices? | jacobr1 wrote: | You still need a record of the data, which probably isn't | fully captured in the audit table | jelkand wrote: | An audit table is a similar, but not entirely equivalent | tool. There are circumstance where both audit tables / soft | deletes are appropriate, and where only one of the two is. | Other systems that work are append-only tables, event driven | systems, and I'm sure there are more that I'm not aware of. | MonkeyMalarky wrote: | Then there's always the joy of a situation where your client is | being sued by one of their clients and now needs your help | recovering everything possible from your platform. And you're | going to help them because you'd like to keep them as a client | rather than let them be sued into oblivion. | taeric wrote: | This is almost certainly going to bite you if you don't push | all customer identification data out of your main data stores. | | And it will go a long way to making your services harder to use | if you don't allow users to associate friendly names with | things. And to assume that the same friendly name will be used | for a future item. (For example, if you name devices based on | the room you put them in. Is reasonable to think that when you | replace a device, that you are likely to want to reuse the | name.) | [deleted] | Apreche wrote: | I agree with the author that a separate table is the way to go, | but I go one step further than the author and use database | triggers to manage that second table. Alternatively, a | combination of database views and triggers can do the same thing | without having an actual extra table to manage. | | Either way, it allows you to have soft deletion and/or full | activity logging functionality without the application having to | know about it. | encoderer wrote: | Even if you don't "undelete" something, soft deletes make it | possible to instantly hide something while saving the expensive | sql delete for processing later. | rubyist5eva wrote: | One thing that I could find in the article: performance. | | At least for our use case, soft deletes made everything slower | because it's much harder to index. For our database we basically | had to do an audit of all of our WHERE clauses and create partial | indexes on "not yet deleted" records. Of course, this bloats your | indexes/disk and hurts write performance so it's not a silver | bullet. | | We've also taken to inserting into "delete records tables" for | records we may want to recover or for historical reasons. You | still lose foreign keys but indexing and query optimization is a | lot easier, and your old data is just still a simple query away. | jacobsenscott wrote: | Deletion is never worth it full stop. How do you delete from a | backup? You can't delete all your backups. Effective dates and | app level encryption to allow for cryptographic "deletes" is the | way to go. | whack wrote: | We use soft-deletes extensively at our startup. Here's a couple | reasons: | | - Feature creep. "Sometimes our users accidentally hit the delete | button, or change their minds a minute later. We want to give | them a way to undo the deletion." Or "I know we said last quarter | that we users want to delete stuff, but they also want to see a | list of everything they've deleted in the past." Soft-deletes | handle feature-creep a lot better than hard-deletions | | - It simplifies foreign-keys management. If you want to hard- | delete something that some other entity is referencing, you'll | have to hard-delete or modify that other entity first. And | potentially repeat this process recursively for their own | references. This is a pain. One could argue that if you really | want to delete something, you should be deleting all children as | well. Such arguments are highly domain specific, and very bad | universal claims. We've seen some use-cases where such pedantry | is not necessary | | - It makes it easier to recover from mistakes and bugs. Customer | deleted something accidentally and emailed you begging for help? | Your code has a bug causing stuff to get deleted when it | shouldn't be? You'll be thankful you did a soft-delete and not a | hard-delete. Is it going to solve every single problem where the | data has system-wide ripple effects in a unicorn sized | organization? No. But it'll still solve a number of problems | where the data impact is more localized | | - It makes debugging easier. You have a clear record of | everything that used to exist. You don't have to go digging | through your logs to find something that used to exist but has | now been deleted | | - Speed. All of the above problems can be solved in other ways | too. The author suggests putting all deleted data in a "deleted | records table." So now you need to maintain a 2nd table for every | table that you may want to delete stuff from. All schema updates | will need to be mirrored on this 2nd table. And you'll need to | write and maintain code to populate this deleted-records-table | every time you delete stuff from the original table. All doable | and straight-forward but takes time away from other things you | could be doing instead | | The main benefit from hard-deletions is data compliance and | liability. Ie, being able to tell privacy-conscious customers | that you actually deleted their data. If you're handling any | sensitive data, you should definitely do hard-deletions at some | point for this reason. But the other reason the author gave - _" | it's annoying having to check for `deleted_at` when writing SQL | queries"_ - seems pretty minor compared to the benefits. | waspight wrote: | It seems that it is too easy to delete things in your system. | Rather than solving it with reversible soft deletes I would | suggest to improve the UX. I don't agree that it simplifies | foreign key management, it is most often the opposite from my | point of view. | codemac wrote: | Well, there are several problems with this analysis when you go | very large (>10000 machines): | | - For many applications, it's easiest to put the state of the | object in the primary key, and thus point reads will fail when | something gets deleted. This has other problems though with | hotspotting and compaction during deletes. The deleted table | doesn't really solve this either. | | - For storage systems, GC is critical functionality to implement. | Most systems whether they want to believe it or not are glorified | storage systems. Garbage collection is hard to do at scale, and | I've never seen it implemented as SQL statements rather than | code. Especially for GDPR etc. | | - For large scale distributed systems, foreign key constraints | are rare if impossible to implement with reasonable latency, so | they don't exist either way. I haven't worked on a system in >15 | years that had fk constraints. | | - For large scale restores where you need to undelete trillions | of rows, keeping the rows basically pre-assigns the distribution | of writes. When you have to re-create the rows, you tend to get | intense hotspotting and failures along the way as you attempt to | load balance on the keyspace of the writes. | | A deleted records table is good for smaller (<10000 machine) | systems when latency between nodes can be kept within the same | campus. It can really improve performance of your GC if reading | by column isn't fast compared to reading by table. | khaledh wrote: | One reason we encourage keeping soft-deleted records at least for | a while is synchronizing data across systems. We want to | propagate deletions downstream. At some point when all downstream | consumers have caught up, we can purge the soft-deleted records. | JohnBooty wrote: | I've been a software dev since the 90s and at this point, I've | learned to basically do things like audit trails and soft | deletion by default, unless there's some reason _not_ to. | | Somebody _always_ wants to undelete something, or examine it to | see why it was deleted, or see who changed something, or blah | blah blah. It helps the business, it helps _you_ as developer by | giving you debug information as well as helping you to cover your | ass when you are blamed for some data loss bug that was really | user error. | | Soft deletion has obvious drawbacks but is usually far less work | than implementing equivalent functionality out-of-stream, with | verbose logging or some such. | | Retrofitting your app and adding soft deletion and audit trails | after the fact is usually an order of magnitude more work. Can | always add it pre-launch and leave it turned off. | | If performance is a concern, this is usually something that can | be mitigated. You can e.g. have a reaper job that runs daily and | hard-deletes everything that was soft-deleted more than _n_ days | ago, or whatever. | rjzzleep wrote: | In rails you get these things for free. What I don't get is why | everyone rolls their own framework with node.js. It's basically | 90s PHP all over again. | | EDIT: Soft delete is a trivial piece of code when the framework | has a well defined transaction system for its ORM. It's not | really related to Rails per se. Your statement is extremely | disingenuous, while trying to look smart. Audit trails _can_ | be(but don't have to be) more complex, especially when the | framework uses a lot of stored procedures to handle operations. | But other than that these frameworks are specifically designed | to REDUCE complexity of such operations, dependency costs - | which are huge in node.js, specifically because you can mix and | match anything into everything. | | Node.js people tend to stitch together XSS solutions, random | templating solutions based on their frontend work, even basic | salting of auth passwords becomes unpredictable because you | have 30 options on minimal auth libraries. | | But yes nothing is ever free. If you want to use Rails you | still have to learn ruby and the framework and a basic | understand of how ActiveRecord builds queries if you want to be | writing performant code. And the same applies to Laravel, | Django, or whatever of the 50 patchwork node.js solutions you | want to base your code on. | joshmanders wrote: | I don't know why you're ragging on Node.js users or even PHP | for that matter as both ecosystems have this stuff covered | too. | | Also you're comparing language/runtime with an actual | framework and then dogging those users... | | If you want to compare Rails with Node/PHP then I'd suggest | comparing with things like Laravel (PHP), Adonis (Node) and | you'll find everything you can do in Rails is done in | Node/PHP too. | sky_rw wrote: | Nothing is free. In Rails you have _currently well maintained | libraries_ for this. There are still complexity costs, | dependency costs, data costs, performance cots, etc, etc, | etc. | gmiller123456 wrote: | The author uses the "no one ever undeleted anything" as the | primary justification. I think this is the part they miss. I've | never undeleted a user either, but there have been many times | I've gone back to look at something. Either a complaint finally | gets around to me as to why the user wanted their account | deleted (e.g. feature not working) and it helps to figure out | why. Or they're returning and want things set up like they | were. Or someone is taking over their roll and needs to be set | up like the last person who's already gone. | | Though you really shouldn't be relying on a database for an | audit trail. It might help find some issues, but things | actually used for security shouldn't be writable so easily. | alerighi wrote: | It may be convenient, but under the GDPR is illegal. When an | user deletes an account, all the personal data associated | with that user must be deleted (or anonymize it in a way that | it's no longer possible to associate it back to the | particular user). | | You cannot just keep user information forever "just in case" | they are useful again. | changoplatanero wrote: | User deleting an account is just one way that stuff gets | deleted. There are other deletion scenarios where it is | appropriate to keep the information after its deleted. | Akronymus wrote: | Yeah. Basically anything that an employee can mess up, | should be reversible IMO. But actions such as deleting an | account should have the option of "YES, DELETE IT | PERMANENTLY, THIS CAN'T BE UNDONE" | smolder wrote: | I wonder what percentage of companies who even _appear_ to | comply with deletion requests actually do full deletion in | practice. I suspect it 's small, knowing how many things | are coded to fake-delete for convenience. Businesses also | tend to keep cold data backups around. (Maybe backups are | exempted? I don't know.) There might even be cases where | ostensibly deleted data can still be recovered from a disk, | if they haven't overwritten. | dataflow wrote: | It's illegal but companies don't necessarily care to avoid | soft deletes regardless. I think companies wait to get sued | so they can try to argue in court why their soft deletions | are reasonable and why it's too technically difficult for | them to do hard deletes. | | To be honest, in the age of modern overprovisioned storage | drives that remap blocks frequently, I'm not really sure | you _can_ implement genuine "hard" deletes without | choosing significantly unorthodox hardware (or destroying a | drive every time you need a single bit erased), no matter | how much you want to in software. One of those details that | I'm both surprised and unsurprised doesn't seem to have | been addressed legally. I feel like a court ought to at | least buy this aspect of the argument, so maybe they'll buy | that it can be difficult in terms of the software too? Who | knows. My guess is that a reasonable court would | accommodate something that's reasonable for a given | company, but there are lots of variations that could fall | into that category. | afiori wrote: | I would guess that there are quite a few limits to that... | | A user has a long history of participating on your forum | and other users have quoted their messages far and wide. | Collectively all of the messages posted on your forums | (with or without timestamps) reveal a few PII about the | user. Do you have to delete those? | | The user filed a bug report about a functionality not | working, do you have to delete the text of the bug report? | | Arguably if your user table look like [user_id, | creation_date, deletion_date, status, account_type] then | this table does not contain any PII. | | Assuming that user content is not automatically PII, whose | responsibility is it to track where PII can be? | | Sometimes users doxx themselves (like mistakenly sharing | tax return forms instead cat pics), in such a case it is | the user responsibility to signal this to you. | | If the user filed an issue saying "when I insert my name | (Abe Cox) the input validation fails" is it you or the user | that need to read through all the issues to find this case? | | My point is that GDPR + right to be forgotten cannot make | it look like you never had an account at all, especially | without user assistance. | 411111111111111 wrote: | Stop spreading misinformation. Nobody is required to hard | delete the very second the [delete] button is pressed. | You're following the law as long as that mentioned reaper | job exists and runs at least once per quarter (90 days) | Akronymus wrote: | I messed up a mass update query enough times to leave myself | SOME provision to undo it. | | The exceptions are when there is a well tested query that | affects a single account or something. Like GDPR | wnevets wrote: | This is something that I was forced to learn the hard way more | than once. Literally today I needed to undelete a record | because a customer was confused by what the "delete" button did | and wanted their record back. | wst_ wrote: | Isn't it the problem of UI, though. If the user would be | informed about the consequences (possibly with bold red font | and with a confirmation checkbox) would they still click that | button? | thrashh wrote: | As a user, I would still sometimes and then regret my | decision | | Soft deletions are awesome | robertlagrant wrote: | One will, one day. | wnevets wrote: | All of that exist. | doctor_eval wrote: | File this under "falsehoods programmers believe about | users": they act rationally. | pg_1234 wrote: | This 100% | nicoburns wrote: | +1 on audit trails. And one should always store audit trails in | machine readable format. That way you can not only manually | inspect what happened, but you can query it too (and | reconstruct the entire state as it existed in the past if | necessary). | pradn wrote: | Soft deletion is just one way to achieve undeletion. The | author's proposed solution of moving the resource to another | table works just as well. You can move it back to the non- | deleted table to perform the undeletion. You can keep around | these deleted objects as long as you want; they work as a | subset of a proper audit trail. The cost of course is you have | more tables, but that is less of a cost than having to add | "deleted=False" predicates in all of your queries. | | Also note, if you use a soft-deleted column, indexes need to be | keyed by that column as well if you want to access non-deleted | objects quickly. That's extra complexity. | mjevans wrote: | Even more important; the deleted records don't need to live | in your cache / RAM / etc. Potentially faster queries. | smackeyacky wrote: | What seems to be missing here is the DB tech he is using. On | a proper database you can do your "undeleted" with triggers | and it's relatively trivial. Nonsense like a "deleted" column | on your main data table just seems silly. | robertlagrant wrote: | Triggers don't solve what the author's getting at, i.e. who | knows what else outside this database changes upon deletion | that would need to be reversed? It's all in the article. | icedchai wrote: | I worked at a place that kept all the deleted stuff in | their main tables. It turned out over 90% of the rows were | deleted. I'm not sure how often something was undeleted, | but it was not very frequent. Some of these soft deleted | rows were 5+ years old. Archive that crap. | augustiine wrote: | Scarbutt wrote: | With the "cloud" you can do PITR as a fork for most mainstream | databases, which is an enough solution for some. | efsavage wrote: | Also if people know that deletion is reversible, they're more | likely to actually do it, which can keep things generally | tidier. | | I don't actually like using a "deleted" column, my standard | table has a status column, and deleted is one of those states, | along with active/pending/suspended/etc, as the needs dictate. | This way I get soft deletes for basically free both in the | schema, but also in the queries (which would generally default | to active), so it's not really the spaghetti that the author | discusses. | robertlagrant wrote: | That still has the same issues. You have to remember to set | every linked table's records to the same state, or remember | to query every linked table through the table that has the | lifecycle column on it. | augustl wrote: | This is why I don't understand why Datomic isn't more popular. | Pretty much every system I've worked on never needed to scale | past 100s of writes per second due to hard limits on the system | (internal backoffice stuff, fundamenally scoped/shardable to | defined regions, etc etc). And since Datomic is built with that | in mind, you get the trade-off of full history, first class | transactions and being able to query for things like "who | changes this attribute to its current value, when, and why" is | such as super power! | rjbwork wrote: | In the past I've used MSSQL's Temporal Tables (also called | System-Versioned Tables) to implement this kind of | functionality. This also gets you, for free, Type 2 SCD | functionality for OLAP-style queries. | | I can't wait until Postgres has this kind of functionality | baked in. It's such a nice feature. | synthc wrote: | Datomic is great but i think its missing some features that | many enterprises need (access control and a query planner). | Also it seems to be mostly built for DynamoDB/AWS. | nightski wrote: | Personally looking at their pricing since it is so tied to | AWS it is completely non-transparent how much it's going to | cost us now or in the future. | | I really like the concept of datomic though. | Scarbutt wrote: | Too niche of a technology, tied to clojure, not open source | and very slow(doesn't matter for most internal apps though). | For many, it's better to do the tedious thing here and there | with postgres. SQL also has strong grip on databases and if | you look at it the other way around, postgres has lots of | features that datomic lacks, with datomic you almost always | need a secondary database. | tsuujin wrote: | It's hard to get adoption for expensive toys. | | I think Datomic is neat, and I'd like to use it, but it is | prohibitively expensive for a personal or hobby project. | Personal projects are where I get excited about tech, and | when I'm excited I'm more likely to adopt it in my day job. | | They're really shooting themselves in the foot by not having | a one-click install free tier or a self hosted option. | marcofiset wrote: | You may want to look into XTDB then. Not quite the same as | Datomic, but they share many similarities. It's free and | open-source. | tsuujin wrote: | Neat, thanks! | th0ma5 wrote: | For me it is simply that isn't open source (at least last I | checked.) | corrral wrote: | I'd add tagging, for anything that could conceivably use it, | when you're doing DB design. May as well start with support, | even if the functionality's initially dormant. Someone _will_ | ask for it, directly or indirectly, and it won 't take long | before they do. | pgt wrote: | XTDB: xtdb.com | jandrewrogers wrote: | The complexity of soft deletes is that they implicitly introduce | the difficult semantics of bi-temporality into the data model, | typically without the benefit of a formal specification that | minimizes the number of edge cases that have to be dealt with. | | Mechanically, I've typically supported soft deletes with audit | tables that shadow the primary table, with a bunch of automation | in the database to make management mostly automagic. It isn't too | bad in PostgreSQL. | Minor49er wrote: | It's interesting that the author notes that, as far as he's | aware, nobody's ever undeleted something. It could be true. But | I'm wondering if maybe he simply hasn't seen it first-hand since | the action of recovering something is often handled by a | customer-facing team and not by a developer. | wizofaus wrote: | The assumption seems to be that the undelete operation is | performed by the vendor's support staff, rather than the end | user. I've been involved in the implementation/ maintenance of | systems with soft delete that was entirely for that purpose - it | allowed the user to delete/undelete at will. In our case it also | meant certain uniqueness constraints were kept in place | effectively reserving things like email addresses or business | registration numbers that couldn't be reused until a hard delete | was issued. Arguably it's more like an "is active" flag in such a | case, but it's debatable what the distinction is. | vivegi wrote: | If you do want to retain the deleted records for any purpose | (audit, compliance etc.,) it is better to design a DELETED table | to maintain the history (just as suggested in the article towards | the end). | | Once your main tables start getting to the order of tens of | millions of records, the filtering by 'deleted_at is NULL' or | 'deleted_at is NOT NULL' gets in the way of query performance. | | NULL is also not indexed. So, that throws the spanner in the | works sometimes (depending on the query). | danielrhodes wrote: | In a previous place I worked, we were programmatically using Box | to store files. One day we were presented with a case study in | Murphy's Law: a script went awry and deleted everything (10s of | thousands of files). There was no clear way to recover these | files, they were gone from what we could see. It was a disaster. | We got a Box support person on the phone and described what had | happened. There was a pause, some mouse clicking and then: "Ok, | those files will be back in your account in an hour." | | It was 100% our fault. But soft deletes saved us that day. If | you're in a situation where you or your customers could benefit | from the same, it's wise to not only embrace them but also make | sure they work. | latchkey wrote: | That sounds more like a lack of backups and disaster recovery | than it does soft deletes. | pradn wrote: | The author agrees with you in principle. All the author is | arguing against is the use of "deleted" bool column to indicate | deletion. His solution of moving deleted objects to their own | column gives you the ability to un-delete, just as before. Only | now, your queries and indexes are simpler and you get to use | foreign keys and other useful futures. | jtwebman wrote: | The bigger reason to use soft deletes is to keep history. Just | because someone does not access doesn't mean we should report on | the things they did months ago. | Ensorceled wrote: | I use soft deletes in our system and literally used it to restore | an accidentally deleted item about 3 hours ago. Took a second to | toggle the deleted item. | | I don't get how this rocket science. Almost every query in the | system is some kind of where clause on a fk to account or user or | project or some other critical object ... so there are only a few | places in the ORM where I need to support this. | vyrotek wrote: | I've found SQL Server Temporal Tables are a good alternative to | get the benefits of soft-deletes without some of the drawbacks. | | https://docs.microsoft.com/en-us/sql/relational-databases/ta... | tfigment wrote: | Mysql also has this now. I've wanted to rewrite out apps to use | it but haven't gotten around to it. Postgres has it as an addon | but feels like it wouldn't work for us until its first class | supported. | evanelias wrote: | MariaDB has this -- called system-versioned tables -- but | MySQL actually does not. Although they share a common | lineage, MySQL and MariaDB are somewhat distinct databases at | this point, with each one having a number of features that | the other lacks. | nicoburns wrote: | I believe first-class support is in development for Postgres. | The article I read made it sound like it would probably land | in either the next major version, or the one after that. | AdrianB1 wrote: | Soft deletes are really worth in the right scenario. There are | cases when they can be avoided, cases when they are not worth and | cases when they are worth, for the problems presented in the | article there are solutions or workarounds. | ajuc wrote: | If you need this why reimplement it when you can use database | history (dbms_flashback or SELECT AS OF in Oracle)? | 202206241203 wrote: | It's something that a team of a PM, a QA and two developers can | bill for at least a sprint. So, well worth it. | gigatexal wrote: | There's a lot wrong with this write up. Why would anyone want to | delete corresponding invoices when you "delete" the corresponding | user? And GDPR provides a caveat that if you need the data for a | biz usecase like legacy reporting you can keep the data (I think | it has to be masked or something but it's not insane to say you | must delete data on request that could materially affect a | company like removing transactions). | | Just put a filtered index on the column to better query non | deleted data. | | On the whole I don't think in practice the author's take makes | much sense. | justin_oaks wrote: | For those who are expressing favor with soft deletes, do you | default to soft deletes on every table unless you know you won't | need them? Or do you only apply them where you know you'll need | them? | | I think people arguing for and against soft deletes both | understand that there are cases where you want to use them and | when you don't. | baq wrote: | soft delete everywhere by default. true deletes only after | retention policy expires, if FK constraints allow it (best if | you can drop whole partitions). | mrinterweb wrote: | For audit trails in rails, I still like papertrail. | https://github.com/paper-trail-gem/paper_trail. It provides the | ability to restore records as well as auditing abilities. | muhaaa wrote: | Always use a temporal database (datomic, postgres with | temporal_tables extension). You get out of the box the full | history of your data. That is really helpful for business | intelligence and analytics, auditing / audit log (security, | accountability), live sync & real-time features and as a bonus | easy recovery after application fails. | | If disk gets to full, project the latest time slice into a new | database and move the old database onto a cold storage. | sam_lowry_ wrote: | I do not understand the foreign keys issue. Do not use the | _deleted_at_ timestamp that is nullable by default. Instead, | nullify the field when the line is deleted. Foreign keys on NULL | values will be possible. | | In any case, soft deletion is usually a sign of incompetence. | Whenever I saw it on a project, both soft deletion and the | project turned sour. | nwah1 wrote: | If you have a lot of stored procs then the argument makes some | sense. If you do most things in code, then I would argue these | complaints are moot. | | In your code you can isolate all soft-deleting from business | logic in the ORM layer or data layer, so the complaint about | littering your codebase is moot for me. For instance, using | Entity Framework, you can change deletes to soft deletes in a | centralized place for all records matching a particular | interface, then add a query filter that applies in the background | for all queries. | | The complaint that soft deleting is never done is maybe valid | since you can review things with audit logging or backups without | risking unknown effects of an un-delete. But if you need a | recycle bin feature then you get that for free if you just build | that in from the start, and it is one more guarantee. | | The risk of orphaned records is real, although you could probably | handle most cases generically in the data layer or ORM as well. | It seems like there's just tradeoffs to the various approaches. | Do you want to err on the side of deleting data, or on the side | of keeping it? Do you worry more about orphaned records or data | loss? | [deleted] | openthc wrote: | I use a delta-log table, so each INSERT/UPDATE/DELETE on objects | I care about are captured (via trigger) -- but that one has to | get date partitioned. So in my system a DELETE statement (and | DELETE CASCADE) work as expected -- any history has to be | discovered from the logs | timomax2 wrote: | We just have a history table (for each table) where all deleted | and past versions of record are stored. Seems to solve all the | issues. The history table is NOT part of the application, but is | there for audit and diagnostics etc. | [deleted] | ivank wrote: | https://github.com/xocolatl/periods implements SYSTEM VERSIONING | for PostgreSQL and moves deleted rows to a history table. | llimos wrote: | Do any databases let you refer to constant values in foreign | keys? | | Then you could do FOREIGN KEY (foreign_id, NULL) | REFERENCES foreign_table(id, deleted_at) | munk-a wrote: | I don't believe that's possible in postgres at least - but I | don't think it's a huge concern either - you can have | deleted_at cascade via trigger or just use views to hide the | data - both are extremely easy to implement at the DB level | without the application devs ever needing to worry about what's | what. | agentultra wrote: | Soft deletion by storing the row in JSON won't survive months of | schema migrations. If restoring a record is rare you don't want | to have to find out that there's no way to map the old data to | the new table when it matters. | | There are cases where you shouldn't be deleting or updating data; | auditable and non-repudiation systems for some regulatory | compliance come to mind. Best to use patterns that don't require | those operations. | | Soft deletion does come at a cost. Choose carefully! | empiko wrote: | The "Code leakage" problem can easily be solved by using views. | Or am I missing something? | dboreham wrote: | Solved with "...and deleted = false" | krascovict wrote: | If it's the case of deleting files safely, I recommend shared, | it's very good... | | https://wiki.archlinux.org/title/Securely_wipe_disk | Smoosh wrote: | DB2 has implemented temporal tables which can automatically | capture all changes to the primary table. | | https://www.ibm.com/docs/en/db2/10.1.0?topic=tables-history | jacksnipe wrote: | The ONLY reason that you should avoid soft deletion is that | deleting things permanently in a soft-deletion-based system is | hard and error prone. | | GDPR, among other regulations, requires that you be able to do | this sometimes; and it requires that the data REALLY BE GONE. | | But I really think that soft deletion should be the default | unless you think you'll be fielding user data deletion requests. ___________________________________________________________________ (page generated 2022-07-19 23:00 UTC)