[HN Gopher] Soft deletion probably isn't worth it
       ___________________________________________________________________
        
       Soft deletion probably isn't worth it
        
       Author : lfittl
       Score  : 340 points
       Date   : 2022-07-19 18:35 UTC (4 hours ago)
        
 (HTM) web link (brandur.org)
 (TXT) w3m dump (brandur.org)
        
       | habibur wrote:
       | Which is why I don't add that extra deleted field. Rather
       | duplicate all the tables into a new database called "archive" and
       | then insert there before deleting from main.
       | 
       | That works for updates too, by preserving the old data and
       | showing you a time machine like backlog. But the archive database
       | gets too large over time and you need to purge it periodically.
       | You can create some delete triggers for automating this "save
       | before delete" behavior.
        
         | tehbeard wrote:
         | How do you account for maintaining integrity in the archive?
         | 
         | E.g. you have 3 users sign up with the same email (a unique
         | field) one after the other with deletions in-between each sign-
         | up?
        
           | habibur wrote:
           | No PK, FK or Unique constrains on the archive. Rather use
           | simple index to speed up queries.
        
           | justin_oaks wrote:
           | I'm not who you asked the question of, but I do sometimes
           | make use of archive/deleted/history tables. I'll refer to
           | them as history tables from here on out.
           | 
           | In short, I leave data integrity to the original table and
           | drop it for the history table.
           | 
           | The history table isn't identical to the original table. It
           | has it's own primary keys that are separate from the original
           | table. It doesn't include the original table's unique
           | constraints or foreign key constraints. It also generally has
           | a timestamp to know when the record was put there.
        
       | spfzero wrote:
       | I like the deleted-items-table suggestion the author makes. It's
       | useful though, to think about the cases where you'd want to
       | delete, say a customer with existing invoices. In one situation,
       | you may have made a mistake and want to start over, say an
       | operator creates the customer and order, but then the customer
       | changes their mind. In that situation a hard delete is in order;
       | you want to "undo" the _creation_ of the customer and invoice,
       | and nothing further has happened as far as referencing their key
       | 
       | In other situations though, you may have some reason to treat the
       | customer as if they were deleted, but better to examine the
       | reason for that, and use an attribute more relevant to that
       | reason, such as active/inactive etc. Would be different for
       | different entities of course.
        
       | jasonhansel wrote:
       | It's pretty easy to solve the foreign key issue (where you need
       | to write elaborate DELETE queries to avoid breaking foreign keys)
       | in Postgres using deferrable constraints. Just start a
       | transaction, run "SET CONSTRAINTS ALL DEFERRED," delete rows from
       | various tables in any order, then commit the transaction. The
       | DELETE statements will effectively ignore the foreign key
       | constraints, but any remaining "broken" foreign keys will be
       | caught when the transaction commits.
        
       | bob1029 wrote:
       | If you are going to think about this pattern, why not go one step
       | further and simply event source everything with an append-only,
       | immutable log?
       | 
       | You could even sprinkle cryptographic guarantees into the mix.
       | This would be very challenging to do with mutable DB rows.
        
       | phibz wrote:
       | I've definitely seen soft delete work in practice. A couple
       | things: for small data sets you can implement the naive
       | deleted_at you can hide the records from your users by forcing
       | them to use a view. You can also handle updates on the view to
       | prevent data conflicting with deleted data if you need to.
       | 
       | For foreign key constraints you can set the foreign key to null
       | and orphan the records if the relation is deleted. You could also
       | hard delete them in this case. It depend on your use case.
       | 
       | When the data volume grows or the ratio of soft deleted to normal
       | records is high, you should consider another solution. One
       | solution you suggested, moving the record to a deleted table is a
       | fine one.
       | 
       | The other solution that I've used successfully is to journal your
       | deletions in another table or system. For smaller volumes having
       | an audit table Journaling the data and storing the pk, fkeys, and
       | a serialized version of the record, json works great in postgres,
       | works well. For large volumes or frequent deletions something
       | like Kafka or PubSub work better.
       | 
       | You may very well find others interested in consuming your audit
       | journal to track changes. Updates and even inserts fit great in
       | the more general case.
        
       | waspight wrote:
       | I use soft deletes to maintain insights. For instance I would
       | like to know how many users that has been created in total even
       | if some has been deleted later on. Is this a bad approach? Most
       | of the other comments here seems to use it only to be able to
       | restore deleted entries.
        
       | dafelst wrote:
       | Views are a simple solution to this problem. Pretty much all
       | moderns RDBMSs support updatable views, so creating views over
       | your tables with a simple WHERE deleted_at IS NULL solves the
       | majority of the author's problems, including (IIRC) foreign key
       | issues, assuming the deletes are done appropriately.
       | 
       | I feel like a lot of developers underutilize the capabilities of
       | the massively advanced database engines they code against. Sure,
       | concerns about splitting logic between the DB and app layers are
       | valid, but there are fairly well developed techniques for keeping
       | DB and app states, logic and schemas aligned via migrations and
       | partitioning and whatnot.
        
         | OJFord wrote:
         | I had the same reaction to the 'code leakage' section, but
         | 'foreign keys'? You can't reference a view; so you either don't
         | use them (fks) or they point at the underlying table and you
         | have the problem described.
         | 
         | You could have views that say 'thing I have a foreign key to is
         | not deleted' of course, but that sort of seems like 'code
         | leakage' again, just in SQL this time.
        
         | vlunkr wrote:
         | That doesn't solve the foreign key problem. You can still
         | easily have a reference to a record that is "deleted"
        
           | yen223 wrote:
           | Also in Postgres, you cannot have a foreign key constraint
           | that references a view, not even a materialised view.
           | 
           | I'm with the author on this one. Any soft delete logic does
           | have a tendency to bleed into other systems and make your
           | systems more complicated, for very little gain.
        
         | 7crow wrote:
         | > there are fairly well developed techniques for keeping DB and
         | app states, logic and schemas aligned via migrations and
         | partitioning and whatnot.
         | 
         | Hi, <1 yr experience swe here. Would HN mind unpacking
         | "whatnot" with specific names of some these techniques?
        
         | semiquaver wrote:
         | The main problem with views for this use case in practice is
         | that they ossify your schema. Views and matviews are
         | effectively a dependency tree, and many common types of schema
         | evolution become substantially more difficult when the system
         | forces you to wrap your DDL in a series of view drop/recreation
         | steps.
         | 
         | This is merely annoying when dealing with regular views because
         | recreating even a large number of views is fast, but can be
         | catastrophic if you have any matviews in your table dependency
         | tree. A matview can easily turn what should be an instantaneous
         | DDL operation into a partial outage while the matview is being
         | regenerated.
         | 
         | (this is all postgres specific, it may be untrue for other
         | systems)
        
           | smallnamespace wrote:
           | To avoid an outage, have you tried fronting the matview with
           | an additional view to allow hot-swapping?
        
           | enepture wrote:
           | As an FYI using a tool like DBT solves this problem. As
           | someone who was not a data engineer I was not familiar, there
           | were tools like this
        
             | teej wrote:
             | dbt is great, but I'm not sure it's appropriate to manage
             | objects in a production transactional database. It's
             | designed for analytical, columnar databases.
        
             | cpursley wrote:
             | I have this problem. What is DBT?
        
               | banashark wrote:
               | I believe this: https://docs.getdbt.com/docs/introduction
        
             | PaulMest wrote:
             | In case you're not familiar with DBT, it helps with the
             | "transform" in ELT
             | https://docs.getdbt.com/docs/introduction.
        
           | slt2021 wrote:
           | it is very dangerous to have dependency on materialized view
           | - it is a poor architectural decision from DBA to do that.
           | 
           | if you want view depending on mat view - materialize it
           | yourself in a table, and refresh it yourself controllably.
        
             | semiquaver wrote:
             | Aren't you just saying that materialized views should not
             | be used? I'm pretty sure that was my original point.
        
         | ibejoeb wrote:
         | > developers underutilize the capabilities of the massively
         | advanced database engines
         | 
         | So true. There are so many amazing, powerful features in all of
         | the major players.
         | 
         | Also: updatable views are amazing. With query rewriting
         | (whatever you vendor calls it) you can affect some truly
         | material changes to the system without any changes to the
         | client applications. An example would be implementing temporal
         | relations.
        
         | chrisshroba wrote:
         | How would a view solve the foreign key issue? Are you
         | suggesting coding specific deletion triggers into the view such
         | that appropriate foreign keys are "cascade" deleted when a row
         | in the view is deleted?
        
           | jermaustin1 wrote:
           | because when you are getting a child record through a join
           | (in the view), the parent will never have deleted_at set.
           | 
           | Say I have a simple view `select * from foo join bar on
           | foo.foo_id = bar.foo_id where foo.deleted_at is null`
           | 
           | I never have to worry about deleting from bar, because I
           | should never grab a child when the parent is 'deleted'.
        
             | thehappypm wrote:
             | Assuming that the joins are always joined against the
             | views, not the raw tables. This introduces an opportunity
             | for a mistake.
        
         | rodw wrote:
         | Seriously. That "Downsides: Code leakage" point is nonsensical.
         | 
         | ``` CREATE OR REPLACE VIEW active_customer AS SELECT * FROM
         | customer WHERE deleted_at IS NULL OR deleted_at <= NOW() ; ```
         | 
         | There, I fixed it.
         | 
         | Just use `active_customer` instead of `customer ... deleted_at
         | IS NULL`.
         | 
         | In fact, since the deleted_at column is a timestamp, the
         | original "leakage" query:
         | 
         | ``` SELECT * FROM customer WHERE id = @id AND deleted_at IS
         | NULL; ```
         | 
         | is actually broken. A non-null `deleted_at` timestamp that's in
         | the future implies the record hasn't been deleted yet, right?
         | 
         | I've often had junior devs assert that views are some kind of
         | code smell, but these sorts of "canned query/filter that you
         | want to apply very very often" seem like the perfect use case
         | for a view to me. It's DRY, and the fact that your standard
         | "query" is in the database" means you can change it more
         | readily than trying to make sure you hit all the points it
         | might be embedded in the application code.
         | 
         | > I feel like a lot of developers underutilize the capabilities
         | of the massively advanced database engines they code against
         | 
         | Early-ish in the JDBC days a senior dev I was working with at
         | the time (as a junior dev myself) made a pretty good case that
         | "the database is part of the application" that's always stuck
         | with me. Full database independence via software level
         | abstractions is a pretty silly goal outside of library code. If
         | you have a service that makes extensive use of the database,
         | don't throw away the database features in the interest of some
         | abstract "we could swap out oracle with mysql without changing
         | anything" objective. If you want it to be generic, use the SQL
         | standard, but don't be afraid to have a few db-specific bits in
         | the app code if that's a subsystem you _might_ replace once a
         | decade or something.
         | 
         | I blame the DBA/Dev divide for a lot of this. A lot of the
         | impedance between these layers is social/procedural. If you can
         | change the DB as easily as the code, there's a lot less fear of
         | using the right tool for the specific job.
        
           | doctor_eval wrote:
           | > the database is part of the application
           | 
           | 100% this. If you accept that the database is part of the
           | application, you give yourself permission to use the full
           | feature set of the database, and life becomes a lot simpler.
           | Using views, stored procedures and other features lets you
           | implement things like soft delete trivially, without it
           | infecting all your application code.
           | 
           | In my entire career I've changed backend databases for an
           | application exactly twice. It's not easy, and no amount of
           | abstraction is likely to make it easier.
        
           | djur wrote:
           | The query isn't broken. In the Rails community at least it is
           | very common to use a nullable frobbed_at column to indicate
           | both "was it frobbed" and "when was it frobbed". In that
           | context, the boolean check is always NULL/NOT NULL, rather
           | than a time comparison.
        
           | madisp wrote:
           | I'd argue that the simple `deleted_at IS NULL` check is not
           | broken - unless your product / domain specifically allows and
           | requires scheduled future deletions adding such logic can
           | easily introduce bugs. For example, you could to get the
           | comparison flipped by accident, and if it's only in one place
           | out of many that bug could go unnoticed for a while.
        
             | doctor_eval wrote:
             | You don't put the check in your code, you put it in the
             | view, and access the data exclusively through that view. In
             | that way, the check is defined exactly once.
        
         | nousermane wrote:
         | > (...updatable view...) WHERE deleted_at IS NULL
         | 
         | This is the way. Also, save record creation timestamp, and you
         | can have very flexible "time-machine" selects/views of your
         | table essentially for free.
        
         | layer8 wrote:
         | How does this help with foreign keys? Normally you can't have
         | foreign keys referencing a view.
         | 
         | I agree that one should make use of RDBMS capabilities. A check
         | constraint may be practical instead of (or in addition to) the
         | foreign-key constraint.
        
         | zozbot234 wrote:
         | Views can be used to implement pretty much any kind of
         | automated inference, reasoning, rules etc. on the "raw" table
         | data. The example of filtering out deleted records is just one
         | of the simplest. That one single feature can easily transform a
         | simple DB platform into a fully-featured knowledge base system,
         | easily usable to support even complex reasoning tasks.
        
         | aeyes wrote:
         | At least in Postgres, having a huge amount of "dead" data in
         | large tables is problematic because vacuum always has to read
         | the full data set.
         | 
         | Even with conditional indexes where you exclude deleted data
         | you take a significant performance hit reading dead blocks
         | because there is no way to quickly vacuum them. You accumulate
         | hours of bloat until your vacuum finishes.
         | 
         | You can't beat a separate insert only archive table which you
         | never have to vacuum.
        
           | anarazel wrote:
           | Vacuum does not have to read the full data set every time.
           | The visibility map tracks, on a block level, whether all rows
           | in a page are known to be visible to everyone (starting in
           | 8.4 or such) and whether the page is "frozen", i.e., does not
           | contain visibility information that might need vacuuming
           | (starting in 9.6 IIRC).
           | 
           | However, indexes _do_ currently have to be scanned as a
           | whole. But that 's only done by autovacuum if there's enough
           | row versions for that to be worth it (in recent versions).
        
           | layer8 wrote:
           | Shouldn't partitioning help with that? (I have no experience
           | with Postgres.)
        
             | tomnipotent wrote:
             | Only if those partitions are on separate storage, otherwise
             | you have the same number of dead tuples/dirty pages.
        
         | SoftTalker wrote:
         | A problem (unless something has changed, my context is Oracle
         | from some time ago) is that NULL values are not indexed. So the
         | "WHERE deleted_at IS NULL" could trigger a full table scan. It
         | can also cause row migration when the NULL value is eventually
         | filled in. Unless you explicitly need the deleted date, it's
         | probably better to use a non-nullable Y/N for this.
        
           | remram wrote:
           | It seems Oracle does it although there is a special syntax to
           | opt-in. That seems wild. I am not aware of another DBMS
           | having that limitation though.
        
         | firloop wrote:
         | Views can really bite you performance wise, at least with
         | Postgres. If you add a WHERE against a query on a view,
         | Postgres (edit: often) won't merge in your queries' predicates
         | with the predicates of the view, often leading to large table
         | scans.
        
           | AdrianB1 wrote:
           | There is no impact with views in MS SQL. You can also have
           | indexed views and filtered indexes, so you can have even
           | better performance.
        
           | dafelst wrote:
           | IIRC Postgres has supported predicate push down on trivial
           | views like this for over a decade now, and possibly even more
           | complex views these days (I haven't kept up with the latest
           | greatest changes).
        
             | firloop wrote:
             | Postgres can do it, you're correct, but in my experience it
             | rarely happens with any view that's even slightly non-
             | trivial even on recent versions of Postgres. Most views
             | with a join break predicate pushdown. It greatly reduces
             | the usecases of views in practice.
        
               | sarchertech wrote:
               | I haven't had any problems with this at all and I've been
               | using joins in my views for years.
               | 
               | Are you using CTEs in your views?
        
               | tomnipotent wrote:
               | Postgres 12 fixed the CTE issue.
        
               | bavell wrote:
               | Postgres docs on CTEs:
               | https://www.postgresql.org/docs/current/queries-with.html
        
               | [deleted]
        
           | [deleted]
        
         | jaydub wrote:
         | Based on my experience, I like the author's approach since it
         | makes things pretty clear-cut and optimized the storage in the
         | core table (in my experience as well, deletes happen frequently
         | and the soft deletes are rarely touched). In large, row-
         | oriented tables that that storage can add up and even with
         | views/materialized views there's a cost to using/maintaining
         | those as well.
        
         | 5e92cb50239222b wrote:
         | This is one of those situations where a good ORM can simplify
         | things greatly. For example, with EF Core you can add a global
         | filter which will filter out soft-deleted rows in all queries
         | automatically (unless you add .IgnoreQueryFilters()).
         | 
         | It couples nicely with some hackery which turns removes into
         | soft-deletes. You can remove objects as usual and they get
         | soft-deleted in the database.
         | 
         | I've used this in a few projects and it's fantastic.
         | 
         | https://docs.microsoft.com/en-us/ef/core/querying/filters
         | 
         | https://www.thereformedprogrammer.net/ef-core-in-depth-soft-...
        
         | airstrike wrote:
         | Views are such a powerful concept I'm honestly disheartened by
         | how hard it is to use, replicate or leverage that functionality
         | outside of dropping straight into the db shell
        
           | quickthrower2 wrote:
           | See http://materialize.com
        
         | pbardea wrote:
         | > assuming the deletes are done appropriately
         | 
         | This is one gripe I have with soft-deletion. Since I can no
         | longer rely on ON DELETE CASCADE relationships, I need to re-
         | defined these relationship between objects at the application
         | layer. This gets more and more difficult as relationships
         | between objects increase.
         | 
         | If the goal is to keep a history of all records for compliance
         | reasons or "just in case", I tend to prefer a CDC stream into a
         | separate historical system of record.
        
           | dexwiz wrote:
           | You may end up doing this anyways if you have any application
           | code that needs access to delete hooks, or access control
           | varies across objects. At this point, you are probably using
           | a ORM instead of direct queries, and place logic that could
           | be in the db instead at the app layer.
        
           | danielrhodes wrote:
           | Being unable to effectively use foreign key relationships is
           | definitely a downside of using soft deletes. But it's also
           | worth asking if these types of behaviors, which would also
           | include a feature like triggers, really belongs in a database
           | or whether it's better to have at the application level (or
           | at least at a layer above the data layer). I'd argue that
           | ultimately you probably don't want these things at the DB
           | level because you get into a situation where you're sharing
           | business logic between two (or more places).
        
             | Volundr wrote:
             | I'm less likely to use triggers, but I'll say I pretty much
             | always want proper foreign key relationships set up in the
             | database. Unique and other constraints too. In principal I
             | might agree with you that it's an application level
             | concern, but being able to setup these kind of invariants
             | and trust that they will be enforced even in the face of
             | bugs in my code (and there will be bugs in my code) is just
             | too powerful to let go of in the name of purity. I'd much
             | rather let a bit of business logic creep between multiple
             | layers that discover that I have a whole bunch of orphaned
             | records and no sensible way to reconcile them.
        
             | corrral wrote:
             | The DB's responsible for maintaining the integrity of data
             | in it, unless there's some very good reason you can't let
             | it do that. It's faster and better at it than your
             | application, 99% of the time, and it can keep doing that
             | even when/if a _second_ application starts using the same
             | database, or when being used from a SQL prompt, or
             | whatever.
        
             | mmerlin wrote:
             | My perspective is DB level triggers are the absolute very
             | best place to put cascading update/delete logic so it only
             | ever needs to be written once and is consistent regardless
             | of any future frontend clients that might be written in a
             | different language and/or framework than the original
             | codebase.
             | 
             | Right now in $dayjob I am converting an old non-DRY
             | codebase from NoSQL data layer format to proper relational
             | SQL backend.
             | 
             | This old front-end was created by verbosely coding up a
             | relational cascading update/delete system for the NoSQL
             | backend, in numerous places redundantly with subtle
             | differences and inconsistencies, making the code brittle.
             | 
             | My current estimate is some front end functions will be
             | reduced in LOC size by 95% once we use the power of SQL in
             | the backend.
             | 
             | And the backend SQL Triggers+StoredProcedure required to
             | replace these long NoSQL frontend functions doing cascading
             | updates/deletes is only around 10% the size of the replaced
             | front-end code.
             | 
             | And now future new frontends can reuse this without
             | exploding the size of their codebase where complex data
             | operations are required. And no need to reinvent the same
             | data handling algorithm all over again (and risk subtle
             | variation creeping in from the different front-end
             | implementation of data algorithms)
        
           | munk-a wrote:
           | If we're assuming you're using a view based approach which
           | elides the soft deleted rows automatically then you'll get a
           | lot of these dependent objects correctly updated for free
           | assuming you're pulling them out of the DB with JOINs -
           | SELECT FROM foo JOIN bar (assuming bar is a view into
           | barwithdeleted) will automatically filter out the invalid
           | rows from foo... if you're using this information to populate
           | a CRUD interface it's likely you'll be JOINing bar already to
           | get some metadata for display (like maybe bar.name instead of
           | the surrogate bar.id key you use for joining).
        
             | nfm wrote:
             | Yes, but other queries (any aggregate queries that don't
             | join the soft deleted table, any joins to other tables)
             | will now return rows that would have been deleted under
             | hard deletion with cascade.
        
               | corrral wrote:
               | You could use a trigger to cascade soft-delete flag
               | toggles, provided all the relevant tables have such a
               | column. Still have to factor that into your other
               | queries, but at least you wouldn't have to make
               | potentially-impossible-to-exhaustively-check joins to
               | figure out if a given row was "deleted".
        
               | munk-a wrote:
               | This is definitely something to watch out for, but in
               | practice (as someone that migrated a system without soft
               | deletes to one that had them) I found that it doesn't
               | tend to come up nearly as much as you might think -
               | usually the table being transitioned to support soft
               | deletes is a relatively core table (since ancillary
               | tables usually aren't worth the complexity to transition)
               | so a lot of your reporting queries will already be
               | pulling in that table. You definitely need to check to
               | make sure you're not missing anything - and sometimes
               | CRUD interfaces will need to be completely revamped to
               | include the table in question - but it's usually not
               | _that_ hard.
        
           | dspillett wrote:
           | _> Since I can no longer rely on ON DELETE CASCADE
           | relationships_
           | 
           | Cascaded deletes scare me anyway. It only takes one idiot to
           | implement UPSERT as DELETE+INSERT because it seems easier,
           | and child data is lost. You could always use triggers to
           | cascade you soft-delete flags as an alternative method,
           | though that would be less efficient (and more likely to be
           | buggy) than the built-in solution that cascaded deletes are.
           | 
           | If you look at how system-versioned (or "temporal") tables
           | are implemented in some DBMSs, that is a good compromise. The
           | history table is your audit, containing all old versions of
           | rows even deleted ones, and the base table can be really
           | deleted from, so you don't need views or other abstractions
           | away from the base data to avoid accidentally resurrecting
           | data. You can also apply different storage options to the
           | archive data (compression/not, different indexes, ...
           | depending on expected use cases) without more manaully
           | setting up partitioning based on the deleted/not flag. It can
           | make some query times less efficient (you need to union two
           | tables to get the latest version of things including deleted
           | ones, etc.) but they make other things easier (especially
           | with the syntactic sugar like AS AT SYSTEM_TIME <when> and so
           | forth) and yet more things are rendered possible (if
           | inefficient) where they were not before.
           | 
           |  _> I tend to prefer a CDC stream into a separate historical
           | system of record._
           | 
           | This is similar, though with system versioned tables you are
           | pretty much always keeping the history/audit in the same DB.
           | 
           | ---
           | 
           | FWIW: we create systems for highly regulated finance
           | companies where _really_ deleting things is often verboten,
           | until it isn 't and then you have the other extreme and need
           | to _absolutely_ purge information, so these things are often
           | on my mind.
        
             | michaelt wrote:
             | _> It only takes one idiot to implement UPSERT as
             | DELETE+INSERT because it seems easier, and child data is
             | lost._
             | 
             | Seems unfortunate to miss out on all the referential
             | integrity benefits of a serious database when hiring
             | standards, training and code reviews should all be
             | preventing idiotic changes.
             | 
             | If I'm making a shopping cart system, I want to know every
             | order line belongs to an order, every order belongs to a
             | user and so on. Anyone who can't be trusted to write an
             | update statement certainly can't be trusted to avoid
             | creating a bunch of orphan records IMHO.
        
           | wvenable wrote:
           | Often you don't have to rely on ON DELETE CASCADE
           | relationships. Because you are never deleting anything, you
           | will never have any orphaned records. If you don't want to
           | see say Invoices for a deleted Customer then that's just
           | another filter feature.
           | 
           | Mostly I use soft-delete because for auditing requirements we
           | pretty much can't remove anything but also because nothing
           | ever truly goes away. If we have an Invoice or Order then,
           | from our perspective, we must have those forever even if the
           | corresponding client is deleted and can never place another
           | one.
        
             | nkozyra wrote:
             | > Often you don't have to rely on ON DELETE CASCADE
             | relationships. Because you are never deleting anything, you
             | will never have any orphaned records
             | 
             | Exactly. Unless you're doing something silly like adding
             | deleted at to bridge tables ... which, you probably don't
             | need even in 1:many.
        
           | kukx wrote:
           | Is not there any attempt to improve the soft deletion at the
           | engine/SQL level? I can see it as a possible feature request.
        
             | pbardea wrote:
             | One interesting feature that some DBs implement is
             | something like SELECT AS OF SYSTEM TIME
             | (https://www.cockroachlabs.com/docs/stable/as-of-system-
             | time....) which _kinda_ does this.
             | 
             | However in practice this usually dramatically slows down
             | reads if you have to constantly skip over the historic rows
             | so you probably don't want to keep garbage round longer
             | than absolutely necessary. The concept of a historic table
             | mentioned below could be interesting though - especially if
             | it could be offloaded to cold storage.
        
             | simcop2387 wrote:
             | There's the idea of temporal tables,
             | https://pgxn.org/dist/temporal_tables/
             | 
             | It's not a standard (I think) but it'd let you do a
             | cascading delete and then be able to go and look at the old
             | objects as they were at time of deletion too.
             | 
             | You'd need to do things very differently to show a list of
             | deleted objects though.
        
               | psYchotic wrote:
               | It appears that there's been an attempt at standardizing
               | temporal features in SQL in the SQL:2011 standard:
               | https://en.wikipedia.org/wiki/SQL:2011
        
               | dspillett wrote:
               | _> temporal tables ... It 's not a standard_
               | 
               | They were introduced in ANSI SQL 2011.
               | 
               | How closely implementations follow the standard I don't
               | know, but something close exists in several DBMSs: I use
               | them regualrly in MS SQL Server, there are plug-ins for
               | postgres, MariaDB has them, and so forth.
        
             | smallnamespace wrote:
             | If you're using PostgreSQL, you can implement cascading
             | soft-deletes yourself.
             | 
             | The information schema table holds all foreign key
             | relationships, so one can write a generic procedure that
             | cascades through the fkey graph to soft-delete rows in any
             | related tables.
        
               | BeefySwain wrote:
               | Could someone take a stab at an example of what this
               | would look like? Sounds really interesting.
        
             | xwdv wrote:
             | Doubt it. It seems like something obvious yet I've waited
             | so long for it. Seems like you have to rely on third party
             | plugins.
        
           | dragonwriter wrote:
           | > This is one gripe I have with soft-deletion. Since I can no
           | longer rely on ON DELETE CASCADE relationships
           | 
           | If you use soft deletes on all tables, you can also cascade
           | them as long as you either cascade updates to the real keys
           | as well, or prevent such updates, by having a deleted flag
           | column on each table, including it in a unique constraint
           | with the actual key column(s), and including it in the
           | foreign key.
        
         | coding123 wrote:
         | I was going to chime in with this. thanks. One issue with views
         | however is that a lot of these features require more and more
         | nuanced knowledge of RDBMSes where these days unless you have a
         | veteran architect, most of the team just knows the various
         | library/tooling that interacts with "a variety of databases" so
         | there is often less effort to go deeper.
        
           | quickthrower2 wrote:
           | Where is this anti-fb culture? Is it a startup thing?
           | 
           | Everywhere I have worked people know a decent amount about
           | their data store. Not architects, just mid devs and higher.
        
       | [deleted]
        
       | lowercased wrote:
       | "Instead, we rolled forward by creating a new app, and helping
       | them copy environment and data from the deleted app to it. So
       | even where soft deletion was theoretically most useful, we still
       | didn't use it."
       | 
       | But... weren't you using all those env and data info from the
       | soft-deleted set?
       | 
       | I've typically been using soft-deletes for most projects for
       | years. People have accidentally deleted records, and having a
       | process to undelete them - manually or giving them a screen to
       | review/restore - has usually been great.
       | 
       | Yes, if there's a lot of related artefacts not in the database
       | (files/etc) that were literally deleted, you may not be able to
       | get them back. But that's an ever greater edge case in projects I
       | work in as to not be a huge issue. We probably have some files in
       | a backup somewhere, if it's recent. Trying to 'undelete' a record
       | from years ago - yeah, likely ain't gonna happen.
       | 
       | People are used to 'undo' and 'undelete'. Soft-deletes are one
       | way to provide that functionality for some projects.
        
       | tehbeard wrote:
       | > Instead, we rolled forward by creating a new app, and helping
       | them copy environment and data from the deleted app to it. So
       | even where soft deletion was theoretically most useful, we still
       | didn't use it
       | 
       | I don't get this statement. You wouldn't have had the env or data
       | without soft delete? You did use it!
       | 
       | I would say, soft delete isn't a tick the box and done solution
       | as many ORMs make it.
       | 
       | You need to consider the data model, and adjust your queries to
       | that.
       | 
       | It may make sense for a product to be deleted, but orderlines
       | still able to access it to display product name etc.
       | 
       | With blob data, I tend to move that to a "bin" with a 30-60 day
       | grace period. Customers know quickly reporting, we can fully
       | recover, while outside that time they'll have to provide images
       | etc. It's a decent compromise.
       | 
       | Reuse of unique fields is the sticking point I run into often, as
       | mysql interprets null as not clashing with other nulls so
       | composite uniques using the ID and deletion date don't work.
        
         | TylerE wrote:
         | > mysql interprets null as not clashing with other nulls
         | 
         | Which is correct per SQL. Null is NaN, not zero (or negative
         | infinity).
        
       | GartzenDeHaes wrote:
       | Personally I like no delete designs, which give you a full audit
       | history of changes. This is similar to generally accepted
       | accounting principles.
       | https://en.wikipedia.org/wiki/Generally_Accepted_Accounting_...
        
         | znpy wrote:
         | Your taste in database design is probably nor gdpr compliant, i
         | hope you don't work in the eu.
        
           | munk-a wrote:
           | It's surprising but the EU tends to be one of the most
           | stringent regions to work in both when it comes to totally
           | permanently deleting things and when it comes to never ever
           | deleting things - as with anything like this where there is a
           | debate (rather than a settled best practice) there are some
           | times when soft deletion is appropriate and necessary (i.e.
           | to adhere to log retention requirements common in the EU) and
           | some times when it's unnecessary... and the occasional fun
           | time when it's both necessary to support soft deletes and
           | hard deletes - when logs need to be retained for auditing
           | purposes but also when some users can force a hard delete
           | (leading to that data either being purged or moved to an
           | archive storage if it's still needed for auditing purposes).
           | 
           | The world is almost never as simple as it seems.
        
         | munk-a wrote:
         | So if an account was active in your system and is active no
         | longer... do you soft delete it (even if that means UPDATE ...
         | SET active = 'f') or hard delete it?
        
           | GartzenDeHaes wrote:
           | That depends on the problem domain and how you design the
           | system, since there are several way to do this. Typically in
           | financial systems you would either have a start and end date
           | on a summary record, or add an inactive record to a
           | transaction table that has a record for each change to the
           | account, or both.
        
       | munk-a wrote:
       | I just wanted to touch on the fact that eliding soft-deleted rows
       | from queries is really, really easy - this article makes it out
       | to be a constant headache but here's my suggested approach.
       | ALTER TABLE blah ADD COLUMN deleted_at NULL TIMESTAMP;
       | ALTER TABLE blah RENAME TO blahwithdeleted;         CREATE VIEW
       | blah (SELECT * FROM blahwithdeleted WHERE deleted_at IS NULL);
       | 
       | And thus your entire application just needs to keep SELECTing
       | from blah while only a few select pieces of code related to
       | undeleting things (or generating reports including deleted
       | things) need to be shifted to read from blahwithdeleted.
        
         | yladiz wrote:
         | But, assuming you don't really need the data, why make your
         | queries more complex and instead just actually delete the data?
        
           | munk-a wrote:
           | This is a really scenario specific question - sometimes it's
           | needed, sometimes it isn't. At my shop we have customers that
           | will suspend their account but our sales team is pretty damn
           | awesome so usually they end up renewing after going a while
           | without our product - so being able to easily restore a large
           | swath of former customers with all their permissions and
           | preferences intact with a simple click of a button is a huge
           | win compared to having a dev try to piece the data together
           | out of backups that are six months out of date (a little
           | secret... we never did this and just put the obligation on
           | the CS team to manually recreate the records since that cost
           | the company less).
        
             | yladiz wrote:
             | If I was a tech lead and also wanted to advocate for hard
             | deletion, I would ask the question for this scenario:
             | "What's the cost of keeping all this data unnecessarily,
             | modifying most queries to filter for deleted data, and
             | dealing with other various consequences of soft deletion,
             | and how does this cost compare with the cost of building a
             | bespoke tool to restore deleted data at a customer's
             | request within a certain time frame and compare with the
             | benefit of a customer being able to restore their data at
             | the 'click of a button'?".
             | 
             | Having dealt with systems that have hundreds of millions of
             | records or more, many of which reference deleted data and
             | are therefore useless, I lean towards hard deletion more
             | and more and on the off chance that deleted data really
             | needs to be recovered you build a separate
             | system/infrastructure to support that, rather than building
             | your _entire_ system around the small likelihood you really
             | need to restore it.
        
               | munk-a wrote:
               | As your data architect I'd probably answer your question
               | "Well, it'll take a little while longer to vet all the
               | indices - since we'll probably want most indices to
               | filter on WHERE deleted_at IS NULL but we'll likely want
               | a few without that constraint for managing undeletion.
               | We'll use more space on disk which, honestly, is pretty
               | much a non-issue in the modern world - and if it gets bad
               | enough we can always partition the table on deletion
               | status and dump the soft deleted rows on a secondary
               | server... but I wouldn't worry about that until we hit
               | facebook scale. In terms of application developer time -
               | I'll probably need an hour of their time to explain it
               | once and all existing queries keep working as they're
               | working now... and as for that undelete tool, well, we
               | don't actually have to build it - if we don't build it we
               | can just ask devs to submit manual queries to undelete
               | data rows as needed via our migration, or oneoff or
               | console interface. We'll want to do a sweep for any
               | queries that reference tables dependent on the soft-
               | delete having table - just to make sure they're throwing
               | an INNER JOIN against the view but, honestly, we could
               | just bop those bugs on the head if they ever come up."
               | 
               | Like, I'm definitely not saying soft-deletion is always
               | the answer, it takes additional effort and planning, but
               | it's really, really easy to do safely.
        
           | TylerE wrote:
           | Becauase that's a huge assumption. From my experience, it's
           | sorta like the giant box of old cables most techies keep
           | around.
           | 
           | Do you reach into it often? No.
           | 
           | But when you do it absolutely saves the day and makes you a
           | hero.
        
         | bjourne wrote:
         | This is not a solution. It introduces a leaky abstraction which
         | sooner or later will lead to errors. Sure, all code _you_ write
         | will access the view and not the table. But how can you ensure
         | all other code in the organisation uses the view? Perhaps you
         | add some access control to the table so that only authorized
         | users can read directly from it, but that 's even more
         | technical overhead. Then you have foreign keys. If you have a
         | "deleted" column in the Customer table you need to remake the
         | Invoice table as a view so that it hides invoices for deleted
         | customers. The same goes for the InvoiceItem table (foreign key
         | of a foreign key) and all author auxiliary information related
         | to the soft-deleted customers.
         | 
         | Furthermore, the cost of an error is potentially massive.
         | Someone new at the company makes a revenue report based in the
         | billed Invoices and does not realize they should query the view
         | and not the table... Not great if 90% of all invoices belong to
         | soft-deleted customers!
         | 
         | The author is right; soft-deletes are probably most definitely
         | not worth it. There are many better ways to solve the problem.
        
           | croes wrote:
           | If you mention the costs of reporting error you should
           | consider the costs of wrongful hard delete too.
           | 
           | And reports are developed and tested before they are used for
           | crucial purposes.
        
           | tshaddox wrote:
           | > Furthermore, the cost of an error is potentially massive.
           | Someone new at the company makes a revenue report based in
           | the billed Invoices and does not realize they should query
           | the view and not the table... Not great if 90% of all
           | invoices belong to soft-deleted customers!
           | 
           | I'm not sure I buy this argument. It's certainly conceivable
           | for that to happen, but no more so than any other case of
           | "the engineer queried the wrong table and thus got incorrect
           | results." There's never going to be any _technical_ way of
           | preventing this: if you have access to multiple sets of
           | numbers, and you want to sum up one set of numbers but
           | mistakenly sum up the other set of numbers, you 're going to
           | get the wrong answer!
        
             | bjourne wrote:
             | > "the engineer queried the wrong table and thus got
             | incorrect results."
             | 
             | The difference is that the path of least resistance, the
             | most obvious method - just query the damn table - is
             | incorrect. Bad design can certainly make a system more
             | error prone.
        
               | tshaddox wrote:
               | In the example from this thread the names of the table
               | and view are reasonably clear, so even in a hypothetical
               | project without any external documentation of naming
               | conventions or engineering processes (code review, etc.)
               | the most obvious thing would be to query _blah_ instead
               | of _blahwithdeleted_.
               | 
               | Of course, this is _extremely_ hypothetical, and in any
               | real project where you 're generating a financial report
               | you absolutely must have a detailed understanding of the
               | relation that you're aggregating over. Even if your
               | project has very strict naming conventions for tables,
               | views, etc. you've gotta put in more work than "this
               | short string sounds like a plausible label for the
               | relation I want to aggregate over."
        
               | bjourne wrote:
               | "reasonably clear" are famous last words. It ignores tons
               | of evidence on how commercial software development works.
               | Everything is "reasonably clear" in isolation, but not
               | when you throw in thousands of other "reasonably clear"
               | things developers are supposed to keep track while not
               | missing tight deadlines. Fact is that if you add
               | opportunities for people to screw up then you will make
               | people screw up, regardless of how "reasonably clear" or
               | "obvious" the system is.
               | 
               | Maybe you need more work experience because believing
               | that it is implausible for someone to accidentally query
               | the customer_with_deleted table over the customer view is
               | incredibly naive. Likewise, people generating financial
               | reports _can_ have detailed understanding of data
               | modelling but scarily often don 't. Give them extra
               | opportunities to fuck up and they will take them every
               | time. KISS
        
           | munk-a wrote:
           | I don't really agree with that. Within an organization you
           | have documentation and instruction as tools - but you're also
           | making the dumb approach (SELECT * FROM blah) the correct
           | approach. If a user is writing a query against the DB, has no
           | idea what the layout of the data is, and decides to prefer
           | blahwithdeleted over blah then I'd really question whats
           | going on at your organization - blahwithdeleted is pretty
           | clearly self-documenting and it's likely a lot of your other
           | domain specific tables with be much harder to naively
           | discover your way through.
           | 
           | I, personally, would in no way restrict access to
           | blahwithdeleted, but I have made a pattern of it in our DB,
           | there are about a dozen blahwithdeleted tables - each with a
           | corresponding blah view... I usually get about one question
           | per every two new employees about which table to use which I
           | can answer in less than a minute with a helpful little
           | explanation.
           | 
           | I'd also mention I've not made a specific value statement on
           | soft deletions in a general case since, if there was a clear
           | general case solution we'd just all do that. This is a
           | decision that needs to be made on a per table basis - it's a
           | rather trivial decision in most cases, but it's very specific
           | to the problem at hand.
        
       | scifibestfi wrote:
       | > When I worked at Heroku, we used soft deletion. When I worked
       | at Stripe, we used soft deletion. At my job right now, we use
       | soft deletion.
       | 
       | > As far as I'm aware, never once, in ten plus years, did anyone
       | at any of these places ever actually use soft deletion to
       | undelete something.
       | 
       | That's wild. So it seems the idea of needing undelete is largely
       | an unfounded fear.
        
       | revskill wrote:
       | In realworld, there's no concept as deletion from DB !
       | 
       | There's only deactivate account, archive a legacy product,...
       | 
       | Because there's no such thing as delete something from real
       | world.
        
       | dfee wrote:
       | My experience is that soft-deletes are blunt tools bridging the
       | gap between hard deletes and event sourcing (capturing all the
       | changes against the table, in a replay-worthy stream).
       | 
       | Event sourcing is hard - because the engineers responsible for
       | setting it up and managing it aren't generally well skilled in
       | this domain (myself included) and there aren't a wealth of great
       | tools helping engineers find their way into the pit of success.
       | 
       | The downsides of soft-deletes (as identified in the article) are
       | numerous. The biggest problem is that it appears "simple" at
       | first blush (just add a deleted_at column!), but it rots your
       | data model from the inside out.
        
         | vbezhenar wrote:
         | You can restore to any point of time from your database backup.
         | So it can cover some requirements.
        
         | Terr_ wrote:
         | > My experience is that soft-deletes are blunt tools bridging
         | the gap between hard deletes and event sourcing
         | 
         | Agreed, sometimes it makes business-sense to implement it, but
         | in the big picture it's still kludgy and not-ideal.
         | 
         | While full-on event-sourcing isn't always the answer, once
         | business-rules prevent you from un-deleting anything there's
         | not much point of having all those dead-rows interspersed in
         | your regular tables.
        
         | zozbot234 wrote:
         | An event store is just a special case of a temporal database.
         | The whole point of temporal databases is to natively support
         | the notion of historical vs. current data.
        
         | redavni wrote:
         | Just want to add that the downsides as identified in the
         | article make little sense. Deleting a customers invoices should
         | be a very rare thing. I can't imagine any accountant or auditor
         | is going to be happy with an IT guy deciding when to delete
         | invoices.
         | 
         | If accidentally writing the wrong query is a problem, then
         | writing the wrong query is your problem.
        
         | btown wrote:
         | The question for either of these systems IMO is: do you trust
         | that a change from your upstream represents a true, everlasting
         | intention, or is it something that may need to be reinterpreted
         | or rolled back in the future?
         | 
         | At my startup, soft deletes for our SKUs are critical, because
         | we work with data sources where notoriously both the technical
         | systems and the humans driving will all-too-frequently
         | accidentally represent something to our connection as deleted.
         | Or there might be an irrecoverable error when asking "what
         | things are still active upstream" - but that doesn't mean the
         | SKUs are deleted, we might just not have certain live details
         | until a bugfix is made. So "error status" and "soft delete" are
         | somewhat synonymous, and both require investigation into root
         | causes and root intents. Yes, the concept of "unerrored and
         | active" is peppered through our codebase and analytics - but
         | our ability to recover from supplier technical mistakes is much
         | higher as a result. And we could absolutely do this with an
         | event sourced system - but the tooling for relational databases
         | is so much better, it's night and day.
        
         | rodelrod wrote:
         | Or you can see it the other way around: soft-deletes are a
         | pragmatic alternative to event sourcing that provides a lot of
         | the value without requiring a team of super-humans and a
         | radical redesign of the existing systems.
        
       | willlll wrote:
       | For the control plane part of Crunchy Bridge, on day one I
       | decided to go with the deleted_records table that is mentioned at
       | the end of this post. It's been great. No need to keep around
       | dead data that no one ever looks at.
       | 
       | We don't need to have `where deleted_at is null` on every single
       | query. But the best part though is our actual working data set of
       | records we actually care about is tiny compared to the deleted
       | cruft that would have otherwise been just sticking around
       | forever. Backups and restores take no time at all. It's really
       | cool that postgres lets you have conditional indexes on things,
       | but it's even cooler not to need them.
        
       | dcdc123 wrote:
       | If you are using a state manager with models in something like
       | rails/django/etc then it is trivial to support soft deletion
       | without it infecting your entire code base.
        
       | duxup wrote:
       | > All our selects look something like this: SELECT * FROM
       | customer WHERE id = @id AND deleted_at IS NULL;
       | 
       | Solution... a whole other table of deleted stuff... in a new
       | structure.
       | 
       | Man soft deletes just look better to my eye.
        
       | satyrnein wrote:
       | We switched a lot of tables to soft deletes so we could replicate
       | those deletes into our data warehouse. You can also use bin log
       | replication for hard deletes, but every schema change would break
       | it.
        
       | msie wrote:
       | I've used Soft Deletion so many times so I'll say it's been worth
       | it. I believe using an audit table would have made recovery more
       | difficult for me. Anyways take this advice with a grain of salt.
       | It's only one guy's opinion. As is mine.
        
       | lolsal wrote:
       | In my 20 years of software experience the soft delete is not so
       | often used to _undelete_ something, but more often used to _know
       | what_ has been deleted. If you delete a record from a table, did
       | it ever exist? Can you reference that customer /user/product ever
       | again? Not to mention the one-in-a-million case where a customer
       | had their account erroneously or fraudulently deleted -
       | undeleting saves time/money/bacon when it's needed and is
       | relatively inexpensive to maintain.
        
       | justin_oaks wrote:
       | The author didn't mention it, but restoring data from a database
       | backup is a perfectly reasonable way to handle undeletes. By this
       | I mean the situations that are "Oh crap, we didn't mean to delete
       | that!" instead of usual business operations.
       | 
       | I've probably restored data from backup maybe 4 times in my
       | career. I greatly prefer to do this on the rare one-off scenario
       | than to deal with the overhead of soft deleting everything.
        
         | marcosdumay wrote:
         | The difference in framing one gets by looking around is
         | amazing, even funny.
         | 
         | > I've probably restored data from backup maybe 4 times in my
         | career.
         | 
         | Yet, I often use soft-deletes because it allows people to
         | undelete things from the software interface and not call me all
         | day long.
         | 
         | But that's not the most common reason I have for them. Normally
         | it is because the data just can not be gone, and the full table
         | is still important somewhere.
        
       | hn_throwaway_99 wrote:
       | The deleted records table he mentions at the end is a good
       | approach, but:
       | 
       | 1. This can easily be done with a trigger, so that you just call
       | a DELETE on the table and deleted tables are copied to the
       | deletion table automatically.
       | 
       | 2. I prefer, instead of having a jsonb column, that each table
       | has a corresponding `deleted_original_table_name` table that
       | exactly matches the schema of the base table, with the addition
       | that the first column is a `deleted_at` timestamp. It's easy to
       | use helper methods in schema migrations to always keep the table
       | definitions in sync.
        
       | pierrebai wrote:
       | The author claims pruning soft-deleted entries requires a complex
       | query, but hard-deleting an entry would have required the same
       | complexity. So it's really not an argument.
        
       | tgbugs wrote:
       | One use case that I think is not sufficiently considered in this
       | is related to two comments I made about a year ago [0, 1].
       | 
       | If you can _actually_ delete something, then that means that a
       | malicious actor can fabricate data an claim that you deleted it.
       | GDPR may be well intentioned but systems that have the ability to
       | remove any record of a thing lay the groundwork for systematic
       | fabrication of data, because any record of the past has been
       | erased.
       | 
       | Operationally, I can totally see why soft delete might be
       | considered to be problematic in certain cases, but from an
       | information security point of view I think it is absolutely
       | critic for protecting users against a whole class of attacks.
       | 
       | 0. https://news.ycombinator.com/item?id=27249738 1.
       | https://news.ycombinator.com/item?id=27691442
        
       | jonstaab wrote:
       | If you implement soft delete, you should surface it to your user.
       | That's who is accidentally deleting things, and that's who will
       | want to un-delete them. As for side effects like spinning up/down
       | servers, build that into your data model (of course, in a case
       | like Heroku's that can be prohibitively expensive, so don't).
       | 
       | Source: I write back of house software for resale store owners,
       | and accidental deletes happen occasionally. Being able to restore
       | things instills a lot of confidence for our customers.
        
       | kleer001 wrote:
       | Yea, it is.
        
       | kache_ wrote:
       | wait until this guy finds out about financial regulations
        
       | jelkand wrote:
       | Soft deletion is certainly very situationally worth it. I've
       | found the most value when 1. it is well supported at the ORM
       | layer and 2. business requirements dictate strong auditability of
       | data. While I have undeleted items on occasion, I've used soft
       | deletes more frequently to debug and build a timeline of events
       | around the data.
       | 
       | For context, I've worked in fintech where I often needed to
       | review backoffice approvals, transactions, offers, etc.
        
         | krstf13 wrote:
         | Wouldn't storing the deleted data in an immutable storage, with
         | time stamp, be much better for auditability ? I mean how could
         | you audit deleted, restored and deleted again data with that
         | setup? Also, while I know it's not really accurate, I tend to
         | understand relations as sets, it makes me uncomfortable to have
         | soft deleted data that are neither member or not member of the
         | set.
        
         | chomp wrote:
         | Yep, we have an abstraction layer on top of the ORM to provide
         | common queries. "Give me all X" will always return stuff not
         | soft deleted. Data people also like to go diving through old
         | data, and without getting into data warehousing and stuff like
         | that, it's not too complex to support a single flag to enable
         | us to keep old stuff.
        
           | delusional wrote:
           | They might like to, but you should definitely consider if
           | saving data no longer required for your business violates
           | privacy regulation/ethical considerations.
        
             | chomp wrote:
             | Definitely. When a user "deletes" their account, we null
             | out all identifying fields to "DELETED_PII_$user_id". We
             | have running metrics we compute that would go off the rails
             | if we dropped the row completely.
        
         | hinkley wrote:
         | In my limited experience, soft deletion also has better
         | prospects where partial indexes are involved, since it reduces
         | the size of the index and reduces search and insert time a
         | little bit. If soft deletes are rare, you aren't going to see
         | much of a payback for your investment in code complexity.
         | 
         | And since you can never really be sure what you'll need 2 years
         | from now, I imagine there are a lot of anecdotes out there of
         | people who implemented it thinking it would be used a lot, and
         | turned out to be wrong.
        
       | pilgrimfff wrote:
       | All you need is a layer of abstraction to get past the downsides
       | of soft deletion. You can use views or your ORM (if you use one)
       | 
       | In Django, it's really easy to create almost seamless soft
       | deletion logic in the model manager or in your querysets.
       | 
       | Over the last decade, I find myself using soft deletion more and
       | more - usually to accommodate user/client requests.
        
       | adrianmsmith wrote:
       | > Instead of keeping deleted data in the same tables from which
       | it was deleted from, there can be a new relation specifically for
       | storing all deleted data
       | 
       | The disadvantage of this is that if you ever _do_ want to access
       | this  "deleted" data, e.g. in admin or compliance tools, you now
       | have to do it in two different ways, one way for the main data
       | and a different way in case the data has been "deleted".
       | 
       | The article asserts you'll never need to "undelete" the data. So
       | they're presenting a solution with that assumption, fair enough.
       | Without that assumption, however, moving the data back from an
       | archive table becomes a pain, and if there are any unique
       | constraints e.g. on username or email address, you'll have a
       | problem if you've moved the data out of the main table and
       | another user has used that username or email address.
        
         | layer8 wrote:
         | > now have to do it in two different ways
         | 
         | Use a view.
         | 
         | > if there are any unique constraints e.g. on username or email
         | address
         | 
         | Have those in a dedicated table where they aren't deleted, and
         | add a synthetic key referenced by the other tables.
        
         | Terr_ wrote:
         | > The article asserts you'll never need to "undelete" the data.
         | 
         | IMO it's worth distinguishing between (A) some kind of "click
         | to undelete" feature versus (B) simply having that old-data
         | conveniently exposed for a developer to manually-edit things or
         | craft database-change scripts.
         | 
         | In practice I've only ever seen the latter get used, because it
         | requires a developer to figure out how the heck to get "the
         | parts that matter" back while preserving the integrity of other
         | newer data and obeying certain business-rules.
        
       | dunkelheit wrote:
       | This brings back memories... Some time ago I was an intern in a
       | team working on a UGC map editor. We were using this soft-delete
       | pattern and for some task I needed to deploy a database migration
       | that fiddled with the "deleted" status field. It was quite late
       | and after the migration finished I almost went home but for some
       | reason decided to check community forums. There users were having
       | a time of their life taking screenshots of deleted objects that
       | suddenly became visible (many of them quite amusing, including
       | swear words written in 500km letters). Dunno how this escaped
       | testing, but horror of what I have done brought clarity of mind
       | and I quickly found an error and devised another migration that
       | fixed the data. That worked and I was able to finally go home.
       | 
       | So yeah, be careful with the soft-delete pattern :)
        
       | scott_w wrote:
       | The example the author gives is... frankly awful.
       | 
       | I can't think of a single case where you'd want to remove the
       | invoices of a customer you delete. Ever. In fact, the opposite is
       | more likely to be a big problem, accidentally cascading your
       | delete to your financial records!
       | 
       | Using a soft delete, your invoices won't "disappear" because your
       | app WILL have a view for looking at just the invoices.
       | 
       | Source: I built a bookkeeping system and soft deletes is a
       | necessary feature.
        
       | Pxtl wrote:
       | Trivial case I hit:
       | 
       | 1) Client wants to remove user from the system who have left
       | their org but
       | 
       | 2) There are objects that were contributed by that user which are
       | required to persist beyond the user's deletion.
       | 
       | Those are ideal cases for soft deletion. We can still query
       | information about the deleted user to explain who created this
       | object, with the note that their account has been deleted.
       | 
       | Probably I should be doing full event-sourcing for this case, but
       | delete flag works well. MS offers temporal tables for this use
       | case and I'm still considering the implications there -- AFAIK
       | ORM support is WIP.
       | 
       | And unlike the article author, I _have_ used soft deletion to
       | undelete things. Many times. Maybe he has better users than I do,
       | I don 't know.
        
       | outworlder wrote:
       | I wish Datomic was made open-source (with maybe some features
       | available as an 'enterprise' offering) so that we could actually
       | have a decent alternative for this 'soft-delete' problem.
        
       | joshuanapoli wrote:
       | Atlassian's deletion-related outage demonstrates why soft
       | deletion should be the default. Use hard deletion after a grace
       | period for data that truly needs to be expunged. Even if undelete
       | is not part of the normal workflow, experience shows that swift
       | recovery from bugs and operator errors is a universal part of
       | serving users. The less data motion involved in deletion (and
       | recovery) the better for both the original deletion process and
       | also any recovery process.
       | 
       | https://news.ycombinator.com/item?id=31015813
        
       | dragonwriter wrote:
       | > The concept behind soft deletion is to make deletion safer, and
       | reversible
       | 
       | IME, as with "updated_by" and "last_modified_at" columns, it's
       | usually hazy audit requirements, not making deletion reversible,
       | that motivates it.
       | 
       | A proper history store maintained by appropriate triggers solves
       | this, and leaves the referential integrity constraints on the
       | base table intact. (It can also be used for reversibility if you
       | need that.)
       | 
       | Views conceptually would work, but then you get bitten by all the
       | ways that all relations are not equal in real-world RDBMSs.
        
       | kardianos wrote:
       | This poster misses the point completly. Soft delete is a must
       | have for historical data, where you want to keep history, but
       | keep the current set clean.
       | 
       | Effectively, you don't check for the soft delete flag if you get
       | to it from a an un-deleted record, but you do check for it if you
       | access it the other way around.
        
       | armchairhacker wrote:
       | Dumb solution: make soft deletes explicit in your backup system.
       | 
       | Your company has a database backup system right? That system
       | should be configured so that when it runs a backup, it will not
       | remove deleted entries from the previous backup, instead just
       | mark them as "deleted_since" the current backup time.
       | 
       | Idk if any backup system actually support this, if there's some
       | glaring problem (like you can't just overwrite parts of a
       | database backup for some reason), or if most companies just don't
       | have backups because they're too expensive (probably not), but
       | this is the solution I would go with. It works for other sorts of
       | data like file systems as well.
        
       | deerIRL wrote:
       | As someone who has done development work with Class A data and
       | specifically in the realm of justice, soft deletes aren't simply
       | a good idea, they are required by law.
       | 
       | Most of these downsides are easily mitigatable issues as well. As
       | many users have stated, something like views solves the issue of
       | forgetting the 'deleted' clause.
       | 
       | Lastly, I'm not sure the issue with foreign keys/stray records
       | really resonates with me. I'd be hard pressed to be comfortable
       | allowing a developer or DBA who isn't fully comfortable with the
       | data model to be hard deleting records, let alone flagging them
       | as soft deleted.
        
       | ThePhysicist wrote:
       | I don't get what the problem is with cascading deletes. I mean
       | you typically only use them for foreign keys where deletion of
       | the parent object makes the referencing object simply invalid, so
       | there would be no reason to leave the referencing object in the
       | database.
       | 
       | The point that is true is that queries get more complicated as
       | you'll have to add a "WHERE deleted_at IS NULL" to every SELECT
       | (once for each table you refer to), but that can be automated if
       | you use an ORM. A paradigm that I often use is that all objects
       | in the database belong to a role object that determines who can
       | read/write/delete the given object. So before doing anything with
       | an object I always check the role object (e.g. the "user"
       | referencing an "invoice", to stay with the example OP gives), and
       | as part of this I check whether the user object still exists.
       | Alternatively, you could automate most of the required update
       | logic using triggers as well.
       | 
       | But otherwise I agree, soft deletes often don't seem to be a
       | worthwhile tradeoff, not sure if I would use them again when
       | designing a relational schema. They are very useful for auditing
       | and undo though: In a current project, whenever a set of objects
       | gets updated I soft-delete the old versions and create new
       | objects, keeping the UUIDs intact. That allows me to display the
       | entire version history of each object to the user, which can be
       | necessary e.g. for compliance reasons. You can achieve this with
       | an audit log as well but that would require more logic and
       | different queries, whereas querying soft-deleted objects just
       | requires a slight modification of existing queries.
        
       | giantg2 wrote:
       | "The concept behind soft deletion is to make deletion safer, and
       | reversible."
       | 
       | That's one part. The other part is that in many industries you
       | have regulatory data retention and audit requirements. This is
       | arguably the most valuable and common reason to perform Logical
       | deletes.
        
         | brtkdotse wrote:
         | In banking and bookkeeping, there's no such thing as a
         | "delete". Once something is in the ledger you can't undo it -
         | you have to make a new entry that negates the old one.
        
           | hammock wrote:
           | Is the same true in mass surveillance?
        
             | m-p-3 wrote:
             | and the blockchain.
        
               | tomrod wrote:
               | Which is a decentralized, distributed ledger, so...
               | banking! :)
        
               | cweagans wrote:
               | "banking"
        
           | i_hate_pigeons wrote:
           | not every thing is a transaction though, say a deleted piece
           | of ref data that still has FKs to other stuff etc
        
             | Hallucinaut wrote:
             | These are still versioned in many serious finance systems
        
           | munk-a wrote:
           | Yes, but banks tend to have websites with accounts and those
           | accounts need to be deactivated when a customer should no
           | longer have access (or, even more finicky, specific accounts
           | for a client need to be deactivated or activated as they
           | change their usage).
           | 
           | All this essentially forces the use of some sort of soft
           | deletion. (Activation flags are sort of just a more
           | complicated form of soft deletion).
        
             | AdrianB1 wrote:
             | Status (active/inactive/other) and existence (present or
             | deleted) are very different things, they may be separated
             | most of the time. Legal requirements can prevent deletion
             | of inactive data, so in some cases one may have a lot of
             | inactive but must keep data. Soft delete can help in some
             | scenarios, for example you want to give people a way to re-
             | activate accounts, but hide the ones marked as deleted.
        
         | jiggywiggy wrote:
         | Ha, and then there is the opposite regulation that you have to
         | delete user data.
        
           | giantg2 wrote:
           | In some industries the retention regulations trump the
           | deletion ones. I believe finance is one area where they will
           | delete some of your data, but are still required to maintain
           | 7 years of specifically listed data.
        
           | MonkeyMalarky wrote:
           | Well you see, you need a complete record of when it was
           | created, every change that occurred, everyone who could view
           | it and also log every access attempt. But it. You're not
           | actually supposed to keep it. Just everything surrounding it.
        
           | scott_w wrote:
           | Not quite. GDPR (and equivalents) have clear escape hatches
           | to allow you to store data if you have good reason (even if
           | the data subject requests its removal).
           | 
           | Invoices, from the article, is a great example. That record
           | must remain unchanged in most financial regulations. I'd
           | wager a customer sending a deletion request for invoices will
           | be met with raucous laughter from the legal and finance
           | teams.
        
         | nirvdrum wrote:
         | My experience at a few start-ups has been that account deletion
         | just isn't prioritized. It's not a focus when building an MVP.
         | If the application ever gains traction, everyone is then
         | terrified they'll accidentally delete customer data that they
         | never delete anything. It's a shame. As a user, when I delete
         | my account or data in my account, I want you to permanently
         | delete it, not keep it around and just make inaccessible to me.
        
           | nerdponx wrote:
           | I've also seen businesses retain "deleted" data in order to
           | support legitimate data analysis work in the future. And it
           | actually can help significantly. Maybe scrubbing PII from
           | deleted accounts is a good idea, but those deleted accounts
           | are perfectly good data points, especially in smaller/newer
           | companies with lower-volume data streams.
        
             | nirvdrum wrote:
             | If you want to retain anonymous statistics, fine. I'm not
             | keen on you retaining my actual data so you can monetize it
             | after our business relationship has concluded. The biggest
             | thing a small team can learn is why they failed to retain a
             | customer or a trial that didn't convert. For that, usage
             | metrics are considerably more valuable than user data.
        
           | jrockway wrote:
           | If it makes you feel better, the startups that don't have
           | time to delete your data probably don't have a viable
           | disaster recovery plan either.
           | 
           | As we learned from the Atlassian snafu, even giant companies
           | with billions in revenue often can't recover from disasters.
           | (I try to test mine every 6 months. I've never had a test go
           | perfectly.)
        
             | tomcam wrote:
             | Good to know. What do you were tests look like?
        
         | dexwiz wrote:
         | I think billions have been spent bridging the gap between
         | "ideal" software and what businesses actually need. Access
         | control is another thing I see developers wanting to simplify
         | or push to implement later, but is actually a key feature.
        
         | necovek wrote:
         | And yet another part is making deletes (appear) instantaneous:
         | useful when it involves cleaning up a bunch of "related" data
         | possibly living on different services (eg. S3, ES...).
         | 
         | This also helps with the original goal of making them safer by
         | manually implementing "eventual consistency" for data living
         | outside the transactional world.
        
           | taeric wrote:
           | Don't make deletes appear instantaneous? If you have heavy
           | weight systems, then it makes sense for provisioning and
           | deleting entities is a process, that should be open to
           | monitoring.
        
         | jewayne wrote:
         | I would argue that in many cases the concept behind soft
         | deletion is to make deletion permanent.
         | 
         | Hard deletes retain no memory of what you wanted to be gone, so
         | any malfunctioning sync process will continuously recreate the
         | deleted record soon after it's deleted. Soft deletes are often
         | the only way to make sure deleted records don't reappear.
        
           | trinovantes wrote:
           | Assuming you're using a modern database, replication is done
           | with paxos/raft and they are formally proven to not allow
           | this to happen as both edits/deletions are both just entries
           | in distributed event log.
        
           | eeperson wrote:
           | Couldn't a malfunctioning sync process undo a soft delete as
           | well?
        
             | jayd16 wrote:
             | Null is more ambiguous than an explicit conflict. If there
             | is literally no record, even of the delete action then
             | there's no timestamp for last write wins.
        
             | [deleted]
        
         | silisili wrote:
         | Why not just use an audit table, to keep from littering your
         | indices?
        
           | jacobr1 wrote:
           | You still need a record of the data, which probably isn't
           | fully captured in the audit table
        
           | jelkand wrote:
           | An audit table is a similar, but not entirely equivalent
           | tool. There are circumstance where both audit tables / soft
           | deletes are appropriate, and where only one of the two is.
           | Other systems that work are append-only tables, event driven
           | systems, and I'm sure there are more that I'm not aware of.
        
         | MonkeyMalarky wrote:
         | Then there's always the joy of a situation where your client is
         | being sued by one of their clients and now needs your help
         | recovering everything possible from your platform. And you're
         | going to help them because you'd like to keep them as a client
         | rather than let them be sued into oblivion.
        
         | taeric wrote:
         | This is almost certainly going to bite you if you don't push
         | all customer identification data out of your main data stores.
         | 
         | And it will go a long way to making your services harder to use
         | if you don't allow users to associate friendly names with
         | things. And to assume that the same friendly name will be used
         | for a future item. (For example, if you name devices based on
         | the room you put them in. Is reasonable to think that when you
         | replace a device, that you are likely to want to reuse the
         | name.)
        
         | [deleted]
        
       | Apreche wrote:
       | I agree with the author that a separate table is the way to go,
       | but I go one step further than the author and use database
       | triggers to manage that second table. Alternatively, a
       | combination of database views and triggers can do the same thing
       | without having an actual extra table to manage.
       | 
       | Either way, it allows you to have soft deletion and/or full
       | activity logging functionality without the application having to
       | know about it.
        
       | encoderer wrote:
       | Even if you don't "undelete" something, soft deletes make it
       | possible to instantly hide something while saving the expensive
       | sql delete for processing later.
        
       | rubyist5eva wrote:
       | One thing that I could find in the article: performance.
       | 
       | At least for our use case, soft deletes made everything slower
       | because it's much harder to index. For our database we basically
       | had to do an audit of all of our WHERE clauses and create partial
       | indexes on "not yet deleted" records. Of course, this bloats your
       | indexes/disk and hurts write performance so it's not a silver
       | bullet.
       | 
       | We've also taken to inserting into "delete records tables" for
       | records we may want to recover or for historical reasons. You
       | still lose foreign keys but indexing and query optimization is a
       | lot easier, and your old data is just still a simple query away.
        
       | jacobsenscott wrote:
       | Deletion is never worth it full stop. How do you delete from a
       | backup? You can't delete all your backups. Effective dates and
       | app level encryption to allow for cryptographic "deletes" is the
       | way to go.
        
       | whack wrote:
       | We use soft-deletes extensively at our startup. Here's a couple
       | reasons:
       | 
       | - Feature creep. "Sometimes our users accidentally hit the delete
       | button, or change their minds a minute later. We want to give
       | them a way to undo the deletion." Or "I know we said last quarter
       | that we users want to delete stuff, but they also want to see a
       | list of everything they've deleted in the past." Soft-deletes
       | handle feature-creep a lot better than hard-deletions
       | 
       | - It simplifies foreign-keys management. If you want to hard-
       | delete something that some other entity is referencing, you'll
       | have to hard-delete or modify that other entity first. And
       | potentially repeat this process recursively for their own
       | references. This is a pain. One could argue that if you really
       | want to delete something, you should be deleting all children as
       | well. Such arguments are highly domain specific, and very bad
       | universal claims. We've seen some use-cases where such pedantry
       | is not necessary
       | 
       | - It makes it easier to recover from mistakes and bugs. Customer
       | deleted something accidentally and emailed you begging for help?
       | Your code has a bug causing stuff to get deleted when it
       | shouldn't be? You'll be thankful you did a soft-delete and not a
       | hard-delete. Is it going to solve every single problem where the
       | data has system-wide ripple effects in a unicorn sized
       | organization? No. But it'll still solve a number of problems
       | where the data impact is more localized
       | 
       | - It makes debugging easier. You have a clear record of
       | everything that used to exist. You don't have to go digging
       | through your logs to find something that used to exist but has
       | now been deleted
       | 
       | - Speed. All of the above problems can be solved in other ways
       | too. The author suggests putting all deleted data in a "deleted
       | records table." So now you need to maintain a 2nd table for every
       | table that you may want to delete stuff from. All schema updates
       | will need to be mirrored on this 2nd table. And you'll need to
       | write and maintain code to populate this deleted-records-table
       | every time you delete stuff from the original table. All doable
       | and straight-forward but takes time away from other things you
       | could be doing instead
       | 
       | The main benefit from hard-deletions is data compliance and
       | liability. Ie, being able to tell privacy-conscious customers
       | that you actually deleted their data. If you're handling any
       | sensitive data, you should definitely do hard-deletions at some
       | point for this reason. But the other reason the author gave - _"
       | it's annoying having to check for `deleted_at` when writing SQL
       | queries"_ - seems pretty minor compared to the benefits.
        
         | waspight wrote:
         | It seems that it is too easy to delete things in your system.
         | Rather than solving it with reversible soft deletes I would
         | suggest to improve the UX. I don't agree that it simplifies
         | foreign key management, it is most often the opposite from my
         | point of view.
        
       | codemac wrote:
       | Well, there are several problems with this analysis when you go
       | very large (>10000 machines):
       | 
       | - For many applications, it's easiest to put the state of the
       | object in the primary key, and thus point reads will fail when
       | something gets deleted. This has other problems though with
       | hotspotting and compaction during deletes. The deleted table
       | doesn't really solve this either.
       | 
       | - For storage systems, GC is critical functionality to implement.
       | Most systems whether they want to believe it or not are glorified
       | storage systems. Garbage collection is hard to do at scale, and
       | I've never seen it implemented as SQL statements rather than
       | code. Especially for GDPR etc.
       | 
       | - For large scale distributed systems, foreign key constraints
       | are rare if impossible to implement with reasonable latency, so
       | they don't exist either way. I haven't worked on a system in >15
       | years that had fk constraints.
       | 
       | - For large scale restores where you need to undelete trillions
       | of rows, keeping the rows basically pre-assigns the distribution
       | of writes. When you have to re-create the rows, you tend to get
       | intense hotspotting and failures along the way as you attempt to
       | load balance on the keyspace of the writes.
       | 
       | A deleted records table is good for smaller (<10000 machine)
       | systems when latency between nodes can be kept within the same
       | campus. It can really improve performance of your GC if reading
       | by column isn't fast compared to reading by table.
        
       | khaledh wrote:
       | One reason we encourage keeping soft-deleted records at least for
       | a while is synchronizing data across systems. We want to
       | propagate deletions downstream. At some point when all downstream
       | consumers have caught up, we can purge the soft-deleted records.
        
       | JohnBooty wrote:
       | I've been a software dev since the 90s and at this point, I've
       | learned to basically do things like audit trails and soft
       | deletion by default, unless there's some reason _not_ to.
       | 
       | Somebody _always_ wants to undelete something, or examine it to
       | see why it was deleted, or see who changed something, or blah
       | blah blah. It helps the business, it helps _you_ as developer by
       | giving you debug information as well as helping you to cover your
       | ass when you are blamed for some data loss bug that was really
       | user error.
       | 
       | Soft deletion has obvious drawbacks but is usually far less work
       | than implementing equivalent functionality out-of-stream, with
       | verbose logging or some such.
       | 
       | Retrofitting your app and adding soft deletion and audit trails
       | after the fact is usually an order of magnitude more work. Can
       | always add it pre-launch and leave it turned off.
       | 
       | If performance is a concern, this is usually something that can
       | be mitigated. You can e.g. have a reaper job that runs daily and
       | hard-deletes everything that was soft-deleted more than _n_ days
       | ago, or whatever.
        
         | rjzzleep wrote:
         | In rails you get these things for free. What I don't get is why
         | everyone rolls their own framework with node.js. It's basically
         | 90s PHP all over again.
         | 
         | EDIT: Soft delete is a trivial piece of code when the framework
         | has a well defined transaction system for its ORM. It's not
         | really related to Rails per se. Your statement is extremely
         | disingenuous, while trying to look smart. Audit trails _can_
         | be(but don't have to be) more complex, especially when the
         | framework uses a lot of stored procedures to handle operations.
         | But other than that these frameworks are specifically designed
         | to REDUCE complexity of such operations, dependency costs -
         | which are huge in node.js, specifically because you can mix and
         | match anything into everything.
         | 
         | Node.js people tend to stitch together XSS solutions, random
         | templating solutions based on their frontend work, even basic
         | salting of auth passwords becomes unpredictable because you
         | have 30 options on minimal auth libraries.
         | 
         | But yes nothing is ever free. If you want to use Rails you
         | still have to learn ruby and the framework and a basic
         | understand of how ActiveRecord builds queries if you want to be
         | writing performant code. And the same applies to Laravel,
         | Django, or whatever of the 50 patchwork node.js solutions you
         | want to base your code on.
        
           | joshmanders wrote:
           | I don't know why you're ragging on Node.js users or even PHP
           | for that matter as both ecosystems have this stuff covered
           | too.
           | 
           | Also you're comparing language/runtime with an actual
           | framework and then dogging those users...
           | 
           | If you want to compare Rails with Node/PHP then I'd suggest
           | comparing with things like Laravel (PHP), Adonis (Node) and
           | you'll find everything you can do in Rails is done in
           | Node/PHP too.
        
           | sky_rw wrote:
           | Nothing is free. In Rails you have _currently well maintained
           | libraries_ for this. There are still complexity costs,
           | dependency costs, data costs, performance cots, etc, etc,
           | etc.
        
         | gmiller123456 wrote:
         | The author uses the "no one ever undeleted anything" as the
         | primary justification. I think this is the part they miss. I've
         | never undeleted a user either, but there have been many times
         | I've gone back to look at something. Either a complaint finally
         | gets around to me as to why the user wanted their account
         | deleted (e.g. feature not working) and it helps to figure out
         | why. Or they're returning and want things set up like they
         | were. Or someone is taking over their roll and needs to be set
         | up like the last person who's already gone.
         | 
         | Though you really shouldn't be relying on a database for an
         | audit trail. It might help find some issues, but things
         | actually used for security shouldn't be writable so easily.
        
           | alerighi wrote:
           | It may be convenient, but under the GDPR is illegal. When an
           | user deletes an account, all the personal data associated
           | with that user must be deleted (or anonymize it in a way that
           | it's no longer possible to associate it back to the
           | particular user).
           | 
           | You cannot just keep user information forever "just in case"
           | they are useful again.
        
             | changoplatanero wrote:
             | User deleting an account is just one way that stuff gets
             | deleted. There are other deletion scenarios where it is
             | appropriate to keep the information after its deleted.
        
               | Akronymus wrote:
               | Yeah. Basically anything that an employee can mess up,
               | should be reversible IMO. But actions such as deleting an
               | account should have the option of "YES, DELETE IT
               | PERMANENTLY, THIS CAN'T BE UNDONE"
        
             | smolder wrote:
             | I wonder what percentage of companies who even _appear_ to
             | comply with deletion requests actually do full deletion in
             | practice. I suspect it 's small, knowing how many things
             | are coded to fake-delete for convenience. Businesses also
             | tend to keep cold data backups around. (Maybe backups are
             | exempted? I don't know.) There might even be cases where
             | ostensibly deleted data can still be recovered from a disk,
             | if they haven't overwritten.
        
             | dataflow wrote:
             | It's illegal but companies don't necessarily care to avoid
             | soft deletes regardless. I think companies wait to get sued
             | so they can try to argue in court why their soft deletions
             | are reasonable and why it's too technically difficult for
             | them to do hard deletes.
             | 
             | To be honest, in the age of modern overprovisioned storage
             | drives that remap blocks frequently, I'm not really sure
             | you _can_ implement genuine  "hard" deletes without
             | choosing significantly unorthodox hardware (or destroying a
             | drive every time you need a single bit erased), no matter
             | how much you want to in software. One of those details that
             | I'm both surprised and unsurprised doesn't seem to have
             | been addressed legally. I feel like a court ought to at
             | least buy this aspect of the argument, so maybe they'll buy
             | that it can be difficult in terms of the software too? Who
             | knows. My guess is that a reasonable court would
             | accommodate something that's reasonable for a given
             | company, but there are lots of variations that could fall
             | into that category.
        
             | afiori wrote:
             | I would guess that there are quite a few limits to that...
             | 
             | A user has a long history of participating on your forum
             | and other users have quoted their messages far and wide.
             | Collectively all of the messages posted on your forums
             | (with or without timestamps) reveal a few PII about the
             | user. Do you have to delete those?
             | 
             | The user filed a bug report about a functionality not
             | working, do you have to delete the text of the bug report?
             | 
             | Arguably if your user table look like [user_id,
             | creation_date, deletion_date, status, account_type] then
             | this table does not contain any PII.
             | 
             | Assuming that user content is not automatically PII, whose
             | responsibility is it to track where PII can be?
             | 
             | Sometimes users doxx themselves (like mistakenly sharing
             | tax return forms instead cat pics), in such a case it is
             | the user responsibility to signal this to you.
             | 
             | If the user filed an issue saying "when I insert my name
             | (Abe Cox) the input validation fails" is it you or the user
             | that need to read through all the issues to find this case?
             | 
             | My point is that GDPR + right to be forgotten cannot make
             | it look like you never had an account at all, especially
             | without user assistance.
        
             | 411111111111111 wrote:
             | Stop spreading misinformation. Nobody is required to hard
             | delete the very second the [delete] button is pressed.
             | You're following the law as long as that mentioned reaper
             | job exists and runs at least once per quarter (90 days)
        
           | Akronymus wrote:
           | I messed up a mass update query enough times to leave myself
           | SOME provision to undo it.
           | 
           | The exceptions are when there is a well tested query that
           | affects a single account or something. Like GDPR
        
         | wnevets wrote:
         | This is something that I was forced to learn the hard way more
         | than once. Literally today I needed to undelete a record
         | because a customer was confused by what the "delete" button did
         | and wanted their record back.
        
           | wst_ wrote:
           | Isn't it the problem of UI, though. If the user would be
           | informed about the consequences (possibly with bold red font
           | and with a confirmation checkbox) would they still click that
           | button?
        
             | thrashh wrote:
             | As a user, I would still sometimes and then regret my
             | decision
             | 
             | Soft deletions are awesome
        
             | robertlagrant wrote:
             | One will, one day.
        
             | wnevets wrote:
             | All of that exist.
        
             | doctor_eval wrote:
             | File this under "falsehoods programmers believe about
             | users": they act rationally.
        
         | pg_1234 wrote:
         | This 100%
        
         | nicoburns wrote:
         | +1 on audit trails. And one should always store audit trails in
         | machine readable format. That way you can not only manually
         | inspect what happened, but you can query it too (and
         | reconstruct the entire state as it existed in the past if
         | necessary).
        
         | pradn wrote:
         | Soft deletion is just one way to achieve undeletion. The
         | author's proposed solution of moving the resource to another
         | table works just as well. You can move it back to the non-
         | deleted table to perform the undeletion. You can keep around
         | these deleted objects as long as you want; they work as a
         | subset of a proper audit trail. The cost of course is you have
         | more tables, but that is less of a cost than having to add
         | "deleted=False" predicates in all of your queries.
         | 
         | Also note, if you use a soft-deleted column, indexes need to be
         | keyed by that column as well if you want to access non-deleted
         | objects quickly. That's extra complexity.
        
           | mjevans wrote:
           | Even more important; the deleted records don't need to live
           | in your cache / RAM / etc. Potentially faster queries.
        
           | smackeyacky wrote:
           | What seems to be missing here is the DB tech he is using. On
           | a proper database you can do your "undeleted" with triggers
           | and it's relatively trivial. Nonsense like a "deleted" column
           | on your main data table just seems silly.
        
             | robertlagrant wrote:
             | Triggers don't solve what the author's getting at, i.e. who
             | knows what else outside this database changes upon deletion
             | that would need to be reversed? It's all in the article.
        
             | icedchai wrote:
             | I worked at a place that kept all the deleted stuff in
             | their main tables. It turned out over 90% of the rows were
             | deleted. I'm not sure how often something was undeleted,
             | but it was not very frequent. Some of these soft deleted
             | rows were 5+ years old. Archive that crap.
        
         | augustiine wrote:
        
         | Scarbutt wrote:
         | With the "cloud" you can do PITR as a fork for most mainstream
         | databases, which is an enough solution for some.
        
         | efsavage wrote:
         | Also if people know that deletion is reversible, they're more
         | likely to actually do it, which can keep things generally
         | tidier.
         | 
         | I don't actually like using a "deleted" column, my standard
         | table has a status column, and deleted is one of those states,
         | along with active/pending/suspended/etc, as the needs dictate.
         | This way I get soft deletes for basically free both in the
         | schema, but also in the queries (which would generally default
         | to active), so it's not really the spaghetti that the author
         | discusses.
        
           | robertlagrant wrote:
           | That still has the same issues. You have to remember to set
           | every linked table's records to the same state, or remember
           | to query every linked table through the table that has the
           | lifecycle column on it.
        
         | augustl wrote:
         | This is why I don't understand why Datomic isn't more popular.
         | Pretty much every system I've worked on never needed to scale
         | past 100s of writes per second due to hard limits on the system
         | (internal backoffice stuff, fundamenally scoped/shardable to
         | defined regions, etc etc). And since Datomic is built with that
         | in mind, you get the trade-off of full history, first class
         | transactions and being able to query for things like "who
         | changes this attribute to its current value, when, and why" is
         | such as super power!
        
           | rjbwork wrote:
           | In the past I've used MSSQL's Temporal Tables (also called
           | System-Versioned Tables) to implement this kind of
           | functionality. This also gets you, for free, Type 2 SCD
           | functionality for OLAP-style queries.
           | 
           | I can't wait until Postgres has this kind of functionality
           | baked in. It's such a nice feature.
        
           | synthc wrote:
           | Datomic is great but i think its missing some features that
           | many enterprises need (access control and a query planner).
           | Also it seems to be mostly built for DynamoDB/AWS.
        
           | nightski wrote:
           | Personally looking at their pricing since it is so tied to
           | AWS it is completely non-transparent how much it's going to
           | cost us now or in the future.
           | 
           | I really like the concept of datomic though.
        
           | Scarbutt wrote:
           | Too niche of a technology, tied to clojure, not open source
           | and very slow(doesn't matter for most internal apps though).
           | For many, it's better to do the tedious thing here and there
           | with postgres. SQL also has strong grip on databases and if
           | you look at it the other way around, postgres has lots of
           | features that datomic lacks, with datomic you almost always
           | need a secondary database.
        
           | tsuujin wrote:
           | It's hard to get adoption for expensive toys.
           | 
           | I think Datomic is neat, and I'd like to use it, but it is
           | prohibitively expensive for a personal or hobby project.
           | Personal projects are where I get excited about tech, and
           | when I'm excited I'm more likely to adopt it in my day job.
           | 
           | They're really shooting themselves in the foot by not having
           | a one-click install free tier or a self hosted option.
        
             | marcofiset wrote:
             | You may want to look into XTDB then. Not quite the same as
             | Datomic, but they share many similarities. It's free and
             | open-source.
        
               | tsuujin wrote:
               | Neat, thanks!
        
           | th0ma5 wrote:
           | For me it is simply that isn't open source (at least last I
           | checked.)
        
         | corrral wrote:
         | I'd add tagging, for anything that could conceivably use it,
         | when you're doing DB design. May as well start with support,
         | even if the functionality's initially dormant. Someone _will_
         | ask for it, directly or indirectly, and it won 't take long
         | before they do.
        
       | pgt wrote:
       | XTDB: xtdb.com
        
       | jandrewrogers wrote:
       | The complexity of soft deletes is that they implicitly introduce
       | the difficult semantics of bi-temporality into the data model,
       | typically without the benefit of a formal specification that
       | minimizes the number of edge cases that have to be dealt with.
       | 
       | Mechanically, I've typically supported soft deletes with audit
       | tables that shadow the primary table, with a bunch of automation
       | in the database to make management mostly automagic. It isn't too
       | bad in PostgreSQL.
        
       | Minor49er wrote:
       | It's interesting that the author notes that, as far as he's
       | aware, nobody's ever undeleted something. It could be true. But
       | I'm wondering if maybe he simply hasn't seen it first-hand since
       | the action of recovering something is often handled by a
       | customer-facing team and not by a developer.
        
       | wizofaus wrote:
       | The assumption seems to be that the undelete operation is
       | performed by the vendor's support staff, rather than the end
       | user. I've been involved in the implementation/ maintenance of
       | systems with soft delete that was entirely for that purpose - it
       | allowed the user to delete/undelete at will. In our case it also
       | meant certain uniqueness constraints were kept in place
       | effectively reserving things like email addresses or business
       | registration numbers that couldn't be reused until a hard delete
       | was issued. Arguably it's more like an "is active" flag in such a
       | case, but it's debatable what the distinction is.
        
       | vivegi wrote:
       | If you do want to retain the deleted records for any purpose
       | (audit, compliance etc.,) it is better to design a DELETED table
       | to maintain the history (just as suggested in the article towards
       | the end).
       | 
       | Once your main tables start getting to the order of tens of
       | millions of records, the filtering by 'deleted_at is NULL' or
       | 'deleted_at is NOT NULL' gets in the way of query performance.
       | 
       | NULL is also not indexed. So, that throws the spanner in the
       | works sometimes (depending on the query).
        
       | danielrhodes wrote:
       | In a previous place I worked, we were programmatically using Box
       | to store files. One day we were presented with a case study in
       | Murphy's Law: a script went awry and deleted everything (10s of
       | thousands of files). There was no clear way to recover these
       | files, they were gone from what we could see. It was a disaster.
       | We got a Box support person on the phone and described what had
       | happened. There was a pause, some mouse clicking and then: "Ok,
       | those files will be back in your account in an hour."
       | 
       | It was 100% our fault. But soft deletes saved us that day. If
       | you're in a situation where you or your customers could benefit
       | from the same, it's wise to not only embrace them but also make
       | sure they work.
        
         | latchkey wrote:
         | That sounds more like a lack of backups and disaster recovery
         | than it does soft deletes.
        
         | pradn wrote:
         | The author agrees with you in principle. All the author is
         | arguing against is the use of "deleted" bool column to indicate
         | deletion. His solution of moving deleted objects to their own
         | column gives you the ability to un-delete, just as before. Only
         | now, your queries and indexes are simpler and you get to use
         | foreign keys and other useful futures.
        
       | jtwebman wrote:
       | The bigger reason to use soft deletes is to keep history. Just
       | because someone does not access doesn't mean we should report on
       | the things they did months ago.
        
       | Ensorceled wrote:
       | I use soft deletes in our system and literally used it to restore
       | an accidentally deleted item about 3 hours ago. Took a second to
       | toggle the deleted item.
       | 
       | I don't get how this rocket science. Almost every query in the
       | system is some kind of where clause on a fk to account or user or
       | project or some other critical object ... so there are only a few
       | places in the ORM where I need to support this.
        
       | vyrotek wrote:
       | I've found SQL Server Temporal Tables are a good alternative to
       | get the benefits of soft-deletes without some of the drawbacks.
       | 
       | https://docs.microsoft.com/en-us/sql/relational-databases/ta...
        
         | tfigment wrote:
         | Mysql also has this now. I've wanted to rewrite out apps to use
         | it but haven't gotten around to it. Postgres has it as an addon
         | but feels like it wouldn't work for us until its first class
         | supported.
        
           | evanelias wrote:
           | MariaDB has this -- called system-versioned tables -- but
           | MySQL actually does not. Although they share a common
           | lineage, MySQL and MariaDB are somewhat distinct databases at
           | this point, with each one having a number of features that
           | the other lacks.
        
           | nicoburns wrote:
           | I believe first-class support is in development for Postgres.
           | The article I read made it sound like it would probably land
           | in either the next major version, or the one after that.
        
       | AdrianB1 wrote:
       | Soft deletes are really worth in the right scenario. There are
       | cases when they can be avoided, cases when they are not worth and
       | cases when they are worth, for the problems presented in the
       | article there are solutions or workarounds.
        
       | ajuc wrote:
       | If you need this why reimplement it when you can use database
       | history (dbms_flashback or SELECT AS OF in Oracle)?
        
       | 202206241203 wrote:
       | It's something that a team of a PM, a QA and two developers can
       | bill for at least a sprint. So, well worth it.
        
       | gigatexal wrote:
       | There's a lot wrong with this write up. Why would anyone want to
       | delete corresponding invoices when you "delete" the corresponding
       | user? And GDPR provides a caveat that if you need the data for a
       | biz usecase like legacy reporting you can keep the data (I think
       | it has to be masked or something but it's not insane to say you
       | must delete data on request that could materially affect a
       | company like removing transactions).
       | 
       | Just put a filtered index on the column to better query non
       | deleted data.
       | 
       | On the whole I don't think in practice the author's take makes
       | much sense.
        
       | justin_oaks wrote:
       | For those who are expressing favor with soft deletes, do you
       | default to soft deletes on every table unless you know you won't
       | need them? Or do you only apply them where you know you'll need
       | them?
       | 
       | I think people arguing for and against soft deletes both
       | understand that there are cases where you want to use them and
       | when you don't.
        
         | baq wrote:
         | soft delete everywhere by default. true deletes only after
         | retention policy expires, if FK constraints allow it (best if
         | you can drop whole partitions).
        
       | mrinterweb wrote:
       | For audit trails in rails, I still like papertrail.
       | https://github.com/paper-trail-gem/paper_trail. It provides the
       | ability to restore records as well as auditing abilities.
        
       | muhaaa wrote:
       | Always use a temporal database (datomic, postgres with
       | temporal_tables extension). You get out of the box the full
       | history of your data. That is really helpful for business
       | intelligence and analytics, auditing / audit log (security,
       | accountability), live sync & real-time features and as a bonus
       | easy recovery after application fails.
       | 
       | If disk gets to full, project the latest time slice into a new
       | database and move the old database onto a cold storage.
        
       | sam_lowry_ wrote:
       | I do not understand the foreign keys issue. Do not use the
       | _deleted_at_ timestamp that is nullable by default. Instead,
       | nullify the field when the line is deleted. Foreign keys on NULL
       | values will be possible.
       | 
       | In any case, soft deletion is usually a sign of incompetence.
       | Whenever I saw it on a project, both soft deletion and the
       | project turned sour.
        
       | nwah1 wrote:
       | If you have a lot of stored procs then the argument makes some
       | sense. If you do most things in code, then I would argue these
       | complaints are moot.
       | 
       | In your code you can isolate all soft-deleting from business
       | logic in the ORM layer or data layer, so the complaint about
       | littering your codebase is moot for me. For instance, using
       | Entity Framework, you can change deletes to soft deletes in a
       | centralized place for all records matching a particular
       | interface, then add a query filter that applies in the background
       | for all queries.
       | 
       | The complaint that soft deleting is never done is maybe valid
       | since you can review things with audit logging or backups without
       | risking unknown effects of an un-delete. But if you need a
       | recycle bin feature then you get that for free if you just build
       | that in from the start, and it is one more guarantee.
       | 
       | The risk of orphaned records is real, although you could probably
       | handle most cases generically in the data layer or ORM as well.
       | It seems like there's just tradeoffs to the various approaches.
       | Do you want to err on the side of deleting data, or on the side
       | of keeping it? Do you worry more about orphaned records or data
       | loss?
        
       | [deleted]
        
       | openthc wrote:
       | I use a delta-log table, so each INSERT/UPDATE/DELETE on objects
       | I care about are captured (via trigger) -- but that one has to
       | get date partitioned. So in my system a DELETE statement (and
       | DELETE CASCADE) work as expected -- any history has to be
       | discovered from the logs
        
       | timomax2 wrote:
       | We just have a history table (for each table) where all deleted
       | and past versions of record are stored. Seems to solve all the
       | issues. The history table is NOT part of the application, but is
       | there for audit and diagnostics etc.
        
       | [deleted]
        
       | ivank wrote:
       | https://github.com/xocolatl/periods implements SYSTEM VERSIONING
       | for PostgreSQL and moves deleted rows to a history table.
        
       | llimos wrote:
       | Do any databases let you refer to constant values in foreign
       | keys?
       | 
       | Then you could do                 FOREIGN KEY (foreign_id, NULL)
       | REFERENCES foreign_table(id, deleted_at)
        
         | munk-a wrote:
         | I don't believe that's possible in postgres at least - but I
         | don't think it's a huge concern either - you can have
         | deleted_at cascade via trigger or just use views to hide the
         | data - both are extremely easy to implement at the DB level
         | without the application devs ever needing to worry about what's
         | what.
        
       | agentultra wrote:
       | Soft deletion by storing the row in JSON won't survive months of
       | schema migrations. If restoring a record is rare you don't want
       | to have to find out that there's no way to map the old data to
       | the new table when it matters.
       | 
       | There are cases where you shouldn't be deleting or updating data;
       | auditable and non-repudiation systems for some regulatory
       | compliance come to mind. Best to use patterns that don't require
       | those operations.
       | 
       | Soft deletion does come at a cost. Choose carefully!
        
       | empiko wrote:
       | The "Code leakage" problem can easily be solved by using views.
       | Or am I missing something?
        
         | dboreham wrote:
         | Solved with "...and deleted = false"
        
       | krascovict wrote:
       | If it's the case of deleting files safely, I recommend shared,
       | it's very good...
       | 
       | https://wiki.archlinux.org/title/Securely_wipe_disk
        
       | Smoosh wrote:
       | DB2 has implemented temporal tables which can automatically
       | capture all changes to the primary table.
       | 
       | https://www.ibm.com/docs/en/db2/10.1.0?topic=tables-history
        
       | jacksnipe wrote:
       | The ONLY reason that you should avoid soft deletion is that
       | deleting things permanently in a soft-deletion-based system is
       | hard and error prone.
       | 
       | GDPR, among other regulations, requires that you be able to do
       | this sometimes; and it requires that the data REALLY BE GONE.
       | 
       | But I really think that soft deletion should be the default
       | unless you think you'll be fielding user data deletion requests.
        
       ___________________________________________________________________
       (page generated 2022-07-19 23:00 UTC)