[HN Gopher] Idempotence Now Prevents Pain Later
       ___________________________________________________________________
        
       Idempotence Now Prevents Pain Later
        
       Author : zdw
       Score  : 95 points
       Date   : 2021-04-07 15:37 UTC (1 days ago)
        
 (HTM) web link (ericlathrop.com)
 (TXT) w3m dump (ericlathrop.com)
        
       | ibirman wrote:
       | Watch out for edge cases. What happens to accounts that change at
       | 11:35PM on the last day of the month?
        
       | bob1029 wrote:
       | I feel like immutability and a 100% leak-proof layer of domain
       | methods which in turn manage domain mutations would ultimately
       | bring more value than explicitly adding idempotence throughout.
       | 
       | If I have an idempotent method like "CreateCustomerRecord", this
       | can cause a lot of pain for audit features and other aspects of
       | the domain model if it is internally making determinations about
       | whether to actually create or silently skip creation. For me, I
       | would much rather that the method throw an exception if there is
       | a duplicate business key than have it silently complete without
       | taking any actual action. Exceptions indicating attempts at
       | invalid state transitions can be extremely valuable if you have
       | the discipline to create & use them properly.
       | 
       | Generally, seeking idempotence in otherwise mutable methods is a
       | band-aid for when you have broken immutability rules and allowed
       | things to leak out of the sacred garden of unit-tested state
       | machines and other provably-correct code items.
       | 
       | If you should only conditionally execute some method, perhaps the
       | solution is to investigate the caller(s) of the method, rather
       | than attempt to infer the intent of all possible callers within
       | the method itself.
        
         | eweise wrote:
         | You have the caller pass in a unique ID to the
         | CreateCustomerRecord. Don't create a new customer record if you
         | receive a duplicate ID.
        
         | jayd16 wrote:
         | This can create false negatives when a request must be retried
         | due to network failure but actually succeeded because the
         | failure was during the response.
         | 
         | Idempotency is great for "debouncing" requests. If you want to
          | tell the difference between identical requests that are different
         | transactions, add a unique transaction id of some kind.
        
           | adrusi wrote:
           | There's a place for idempotency tokens. They're relatively
           | easy to retrofit onto old systems, and occasionally they are
           | the best way to go about making changes idempotent, but they
           | should be a red flag - an indication that you should step
           | back and see if maybe you can redesign an API to make
           | idempotency a natural guarantee rather than something you
           | artificially strap on with a token. As a rule of thumb, I
           | would always mention the idea of adding an idempotency token,
           | and prompt for alternatives, with all stakeholders present.
        
       | smiley1437 wrote:
        | I had only ever encountered idempotence in the context of system
        | management (Ansible, Puppet, Chef, etc.)
        | 
        | The article made me think it's actually applicable outside system
        | management as well
        
         | void_mint wrote:
          | PUT requests are intended to be idempotent, which is one of the
          | things that distinguishes them from POST requests. In my
          | experience, this is how most software developers without a CS
          | background are first exposed to idempotence, but it actually
          | has tons of value pretty much wherever you can apply it. The
          | ability to (sometimes accidentally) do a thing twice and have it
          | leave no unintended consequence is huge.
         | 
         | UPSERTs can be idempotent as well. "If this doesn't exist,
         | create it, and if it does, update it to match this state",
         | implies that running it twice will leave no unintended side
         | effects.
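A minimal sketch of an idempotent UPSERT, using Python's built-in `sqlite3` module. The `customers` table and the `upsert_customer` helper are hypothetical; `INSERT ... ON CONFLICT ... DO UPDATE` is SQLite's UPSERT syntax (SQLite 3.24 and later), and other databases may spell it differently (MySQL, for example, uses `ON DUPLICATE KEY UPDATE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def upsert_customer(conn, customer_id, email):
    # Create the row, or overwrite it with the desired state if the id already exists.
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO customers (id, email) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET email = excluded.email",
            (customer_id, email),
        )


upsert_customer(conn, 1, "ada@example.com")
upsert_customer(conn, 1, "ada@example.com")  # second run: same end state, no duplicate row
print(conn.execute("SELECT id, email FROM customers").fetchall())  # [(1, 'ada@example.com')]
```

Because the statement describes the desired end state rather than a change to apply, running it once or many times leaves the same table.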
        
         | Animats wrote:
         | It's a basic property of HTTP "GET". Or it's supposed to be.
          | "GET" is not supposed to change server-side state. That's what
         | "POST" is for. This matters if there's a cache in the middle,
         | since caches tend to assume that GET requests are idempotent
         | and can be served from cache. Cloudflare assumes that. POST
         | requests have to go through to the real server.
        
           | elcomet wrote:
           | Get is idempotent because it's the identity function. It does
           | not change the state of the data. So it's a trivial case of
           | idempotency.
           | 
           | A more interesting function is PUT (idempotent) vs POST (not
           | idempotent)
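A toy in-memory illustration of the PUT/POST distinction; `ResourceStore` and its methods are hypothetical, not any web framework's API. The difference comes down to who chooses the identifier: with PUT the client names the resource and its full state, so a repeat overwrites it with the same thing; with POST the server mints a new id on every call.

```python
import itertools


class ResourceStore:
    """Hypothetical in-memory store contrasting PUT-like and POST-like operations."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def put(self, item_id, body):
        # PUT: client-chosen id and full state; repeating the call changes nothing.
        self.items[item_id] = body
        return item_id

    def post(self, body):
        # POST: server-assigned id; a retry creates a second resource.
        item_id = next(self._ids)
        self.items[item_id] = body
        return item_id


store = ResourceStore()
store.put("order-42", {"qty": 1})
store.put("order-42", {"qty": 1})
store.post({"qty": 1})
store.post({"qty": 1})
print(len(store.items))  # 3: one from the two PUTs, two from the two POSTs
```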
        
       | Justsignedup wrote:
       | I've done this exact thing many times before...
       | 
        | I can honestly say that Eric is 100% right with his approach. It
        | always leads to fewer headaches and more flexibility (oh trust me,
        | someone is always gonna have a "but... there's like a special
        | thing that I sometimes have to do" that breaks some
        | assumptions).
        | 
        | In any case... yeah... let's just say any time you find yourself
        | worrying "did we already schedule this?", really think "can this
        | not care whether it was or not? It should always be safe to
        | schedule it again."
        
       | brsg wrote:
       | Idempotency is a pretty critical concept in system design, and I
       | think most developers have run into issues related to it even if
       | they aren't directly familiar with the term.
       | 
        | To give another simple example like the OP's: suppose you have a
        | product that relies on time series data. For demo purposes you
        | might create a curated data set to present to clients, but the
        | presenter doesn't want to show data from 2019 as the "most
        | recent".
       | 
       | Naturally, you decide to write a script. Do you
       | 
        | A) Write a script that moves the data forward by 1 week
       | explicitly, and simply run this once per week or
       | 
       | B) Write a script that compares the current date to the data and
       | moves it forward as much as it needs
       | 
       | At first glance, these two approaches work the same, but what if
       | (A) triggers twice? What if it runs once every 6 days by mistake?
       | (B) is idempotent however - subsequent executions won't change
       | the state. It's usually impossible to predict all of the ways
       | that software breaks, but designing with idempotency in mind
       | eliminates a lot of them.
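A sketch of the two options, assuming (hypothetically) that the demo data is just a list of `datetime.date` values. Option A applies a relative change, so every extra run moves the data again; option B moves the newest point onto a reference date, so a second run computes an offset of zero. Here the reference date is passed in as an argument rather than read from the system clock, which keeps the result a function of its inputs alone.

```python
from datetime import date, timedelta


def shift_by_week(dates):
    # Option A: relative change; each additional run shifts the data another week.
    return [d + timedelta(weeks=1) for d in dates]


def shift_to_reference(dates, today):
    # Option B: absolute target; the newest point always lands on `today`,
    # so re-running with the same `today` is a no-op.
    offset = today - max(dates)
    return [d + offset for d in dates]


data = [date(2019, 3, 1), date(2019, 3, 8)]
today = date(2021, 4, 7)

once = shift_to_reference(data, today)
twice = shift_to_reference(once, today)
print(once == twice)  # True
print(shift_by_week(shift_by_week(data)) == shift_by_week(data))  # False
```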
        
         | jayd16 wrote:
         | I don't think B is technically idempotent either. Change still
         | occurs but with minimal difference. You cannot cache the
         | results and use them again next week.
         | 
         | An idempotent change would be to pass in the current time
         | instead of checking system time. In this case, as long as the
         | input is the same, the result is the same. You could use cached
         | results, but most likely you want to use new inputs.
        
         | pbreit wrote:
         | The idempotency I've seen is usually an unnecessary extra
         | complexity.
        
           | jacobsenscott wrote:
           | If you design for it from the start it makes your system much
           | less complex. Consider all the errors, special cases, and
            | ultimately data cleanup you need to handle if your
           | transactions are not idempotent. Idempotency is table stakes
           | for any production app.
        
       | mrbadideas wrote:
       | Is that really idempotence?
        
       | omarhaneef wrote:
       | I'm assuming a lot of people click on it to see what the word
       | Idempotence means. From the article:
       | 
       | "Idempotence is the property of a software that when run 1 or
       | more times, it only has the effect of being run once."
       | 
          | And the example is, instead of a cron job just running a process
       | once a month or on some other schedule, it runs more frequently
       | but checks if the change has already been made.
       | 
          | (From the Latin idem, which means "same", and potence is of course
       | power/potent, so it has the same power/effect however many times
       | you run it)
        
         | bobbylarrybobby wrote:
         | When writing a Jupyter notebook, always try to make your cells
         | idempotent. You'll save yourself a lot of headache down the
         | line.
        
         | throwawayboise wrote:
         | One of my first jobs was at an investment bank. They had a lot
         | of programs that ran overnight, in a batch fashion. Everything
         | had to be done before the markets opened the next morning. The
         | term they used for idempotency was "free rerun." Being able to
         | rerun any program with no special setup work was a high
         | priority.
         | 
         | The value in programs being a "free rerun" was that every so
         | often the program would barf on a bad bit of data in a record.
         | 
          | The programming environment was interpreted BASIC, so if an
         | error occurred the program would print a message on the console
         | and drop to an interactive prompt.
         | 
         | The operators running the batch schedule would see this and
         | call the programmer on call for that night. You'd log in (over
         | dial up at this time) and attach to the process, look at the
         | error, figure out what went wrong, either correct the data or
         | (more likely) skip the record and deal with it the next day. It
         | was more important to have the programs finish on time;
         | individual issues could be dealt with later.
         | 
         | Often you could just start up the program from where it left
         | off, but if things were more screwed up it was important to be
         | able to re-run it without any negative consequence.
         | 
         | Edit: this was ~30 years ago, so my point is that it's not any
         | kind of new idea or something that wasn't recognized long ago.
        
           | omarhaneef wrote:
           | I hope this example makes it evident that one of the primary
           | innovations of the last 30 years is defaulting to Latin terms
           | so that they are taken more seriously in business and
           | technology circles to acquire ... you know... gravitas.
        
           | 6t6t6t6 wrote:
           | I used to be an operator in night shift in my twenties and
           | the job was exactly how you said. Good memories. Lots of
           | sleeping at work and some days of panic when shit broke.
           | 
           | And a lot of "secret" scripts that automated a big part of
           | our job.
        
         | treve wrote:
          | > And the example is, instead of a cron job just running a
         | process once a month or on some other schedule, it runs more
         | frequently but checks if the change has already been made.
         | 
         | As a property, I think it's even nicer if a script can
         | literally fully run twice and for the outcome to be the same if
         | it only ran once (so skipping the 'did I run before?' check).
         | 
          | Even though this check is useful in general, if you can define
          | your data in such a way that, if it _did_ somehow run again, the
          | result is not destructive and does not create incorrect data, it
          | makes the system more robust.
         | 
         | Of course this is not always possible though. For example, if
         | the process results in an email being sent, you need an
         | explicit check to not do that twice.
        
           | gen220 wrote:
           | In situations like these, it's a legitimate goal to implement
           | an idempotent, or "functional" core.
           | 
           | So the goal of your functional core is to fully construct the
           | email, and return it to the caller, who then has the choice
           | to send the email, print it, write it to disk, etc.
           | 
           | The program you deploy looks like this
           | 
           | EmailSender().send_email(construct_email(args))
           | 
           | You can test by implementing a "safe" EmailSender interface,
           | so that you're executing the same code that's in prod.
           | 
           | In general, if a job/function is mutating state deep in the
            | call tree (e.g. sending emails in the middle of a batch
           | job), I personally see that as a violation of the Single
           | Responsibility Principle.
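A sketch of that functional-core shape. Only `EmailSender`, `send_email`, and `construct_email` come from the comment; the `Email` dataclass, the arguments to `construct_email`, and `RecordingSender` are hypothetical. `construct_email` is pure, so it can be re-run freely, and the single side effect sits behind a small interface that a test can satisfy with a recording fake.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Email:
    to: str
    subject: str
    body: str


def construct_email(name: str, address: str, month: str) -> Email:
    # Pure function: same inputs give an equal Email and nothing else happens.
    return Email(
        to=address,
        subject=f"Statement for {month}",
        body=f"Hi {name}, your {month} statement is ready.",
    )


class EmailSender(Protocol):
    def send_email(self, email: Email) -> None: ...


class RecordingSender:
    """A 'safe' sender for tests: records emails instead of delivering them."""

    def __init__(self) -> None:
        self.sent: List[Email] = []

    def send_email(self, email: Email) -> None:
        self.sent.append(email)


sender: EmailSender = RecordingSender()
sender.send_email(construct_email("Ada", "ada@example.com", "2021-04"))
print(sender.sent[0].subject)  # Statement for 2021-04
```

In production the same call would go through a real sender; only the object behind the `EmailSender` interface changes.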
        
         | sdenton4 wrote:
         | Mathematically, it's x^2 = x, which implies x^n = x for all
         | positive integers n.
         | 
         | Nilpotence (x^2 = 0) is also very helpful some times: it's a
         | process which is self-reversing. Like the discrete Fourier
         | transform (if you set up the constants properly).
        
           | elcomet wrote:
           | self-reversing is not nilpotence.
           | 
           | In mathematics, a self-reversing function is called an
           | involution, and it's f^2 (or f(f) ) = Id, the identity
           | function.
           | 
           | Nilpotence is very different. It says that if you apply your
           | function a certain number of times, you end up with zero no
            | matter what the input is. For example, projecting a vector onto
            | the x axis and then rotating it by 90 degrees is nilpotent.
        
           | carreau wrote:
            | No, you are confusing it with involution. 1/x is an involution.
            | Symmetries are often involutions.
            | 
            | An upper triangular matrix with zeros on the diagonal is
            | nilpotent. Differentiation on polynomials of degree at most N
            | is nilpotent: N+1 applications always give zero.
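A small numeric check of the three properties discussed in this subthread, using plain Python lists as 2x2 matrices (the specific matrices are chosen for illustration): a projection is idempotent, a reflection is an involution, and both a strictly upper triangular matrix and "project onto x, then rotate 90 degrees" are nilpotent.

```python
def matmul(a, b):
    # Multiply two square matrices given as lists of rows.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

P = [[1, 0], [0, 0]]       # projection onto the x axis
R = [[0, 1], [1, 0]]       # reflection across the line y = x
N = [[0, 1], [0, 0]]       # strictly upper triangular
ROT90 = [[0, -1], [1, 0]]  # rotation by 90 degrees
M = matmul(ROT90, P)       # project onto the x axis, then rotate

print(matmul(P, P) == P)         # True: idempotent, P^2 = P
print(matmul(R, R) == IDENTITY)  # True: involution, R^2 = I
print(matmul(N, N) == ZERO)      # True: nilpotent, N^2 = 0
print(matmul(M, M) == ZERO)      # True: nilpotent, M^2 = 0
```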
        
           | contravariant wrote:
           | Careful, for nilpotence the power doesn't have to be 2.
           | 
           | Also you may be confusing it with x^n = 1 (which I'm not sure
           | how to name, 'root of unity' perhaps). This would be the case
           | for the Fourier transform (with n=4).
           | 
           | If x^2 = 0 then applying the Fourier transform twice would
           | null your function, which isn't the case.
        
           | pdpi wrote:
            | x^2 is a weird way to describe it. A function f is idempotent
           | if f(f(x)) = f(x).
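A quick spot check of the definition f(f(x)) = f(x) in Python. `is_idempotent_on` is a hypothetical helper; since it only tests the sample inputs it is given, it can disprove idempotence but cannot prove it.

```python
def is_idempotent_on(f, samples):
    # Check f(f(x)) == f(x) for each sample input.
    return all(f(f(x)) == f(x) for x in samples)


numbers = [-3, 0, 2.5, 7]
print(is_idempotent_on(abs, numbers))                # True
print(is_idempotent_on(lambda x: x + 1, numbers))    # False
print(is_idempotent_on(sorted, [[3, 1, 2], []]))     # True
print(is_idempotent_on(str.strip, ["  hi ", "x"]))   # True
```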
        
             | creata wrote:
             | It's not _that_ weird. People often write iterated
             | composition as f^k, and this is especially true with
             | matrices, where composition and multiplication mean the
             | same thing.
        
             | corty wrote:
             | It is quite common in some fields. Operator application is
             | written without parentheses, and functions are a kind of
             | operator. Therefore:
             | 
             | f(x) = f(f(x)) = f f x = f^2 x = f x
             | 
             | And leaving out the x, because it is just a placeholder
             | anyways:
             | 
             | f f = f^2 = f
             | 
             | And of course this means that
             | 
              | f^n = f, because f^n = f^(n-1) f = f f = f by induction.
        
       | globular-toast wrote:
       | I try to write idempotent software whenever I can. It's usually
       | not much more difficult to make it work and affords so much more
       | flexibility and less worry when it's done.
        
       | staticassertion wrote:
       | https://lostechies.com/jimmybogard/2013/06/06/acid-2-0-in-ac...
       | 
       | If you can build a system with ACID 2.0 life gets really easy.
       | You can reason about your system without worrying about ordering,
       | time, 'exactly once' semantics, etc.
       | 
       | Idempotency is usually one of the simplest pieces to implement,
       | and you definitely get a ton of benefit right off the bat - it's
       | worth designing systems from scratch with it in mind.
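As commonly defined, ACID 2.0 stands for Associative, Commutative, Idempotent, and Distributed. A minimal sketch, assuming (hypothetically, not taken from the linked post) that replica state is a grow-only set merged by union: because union has all three algebraic properties, neither delivery order nor duplicate delivery changes the result.

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    # Set union is associative, commutative, and idempotent,
    # so replicas can exchange state in any order, any number of times.
    return a | b


updates = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "z"})]

# Every delivery order, with the first update redelivered at the end, converges.
results = {
    reduce(merge, order + (order[0],), frozenset())
    for order in permutations(updates)
}
print(len(results))  # 1
```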
        
       | firebaze wrote:
       | We had one "special" team member who insisted on everything being
       | idempotent.
       | 
        | This was his only guiding principle. Result: absolute chaos - the
        | code aspired to be idempotent, but in pursuit of idempotency he
        | avoided thinking problems through and just created a mess of
        | individual functions - each being idempotent, aside from the
        | unavoidable bugs - which didn't form a coherent flow at all.
       | 
       | We did a major refactoring, threw out about all that code,
       | rewrote everything in a logical manner. Now everything is still
       | idempotent, but comprehensible.
       | 
        | TLDR: idempotency is the same snake oil as the majority of guiding
       | principles: alone, it doesn't help at all. There are lots of
       | other factors to consider, which make the developer/architect
       | role demanding (and fun).
       | 
        | Craftsmanship at least, a sense for architecture (better) or
       | understanding the whole picture of the requirements as a team of
       | developers (best) is still required.
        
         | longhairedhippy wrote:
         | I don't see this as any reflection on idempotency as principle
         | (or other principles in general). Building systems poorly,
         | without a plan, and no testing, will result in a bug-riddled
         | mess, regardless of what pattern is being used.
        
         | smitty1e wrote:
         | Sure, one little hobby horse, e.g. "inversion of control" can
         | run amok to negative effect (looking at you, Java projects with
         | object traces 75 layers deep) but that doesn't make idempotency
         | or inversion of control into bad ideas.
         | 
         | A bit of pragmatism goes a long way, like Python's odd
         | 
         | x = (some_tuple,)
         | 
         | . . .syntax amidst its generally clean approach.
         | 
         | Inflexibility itself is the bugaboo.
        
         | spaetzleesser wrote:
         | That is a general problem with any principle used as rigid
          | ideology. Almost every principle becomes a problem if applied
          | too dogmatically. This applies to software dev but also to other
          | fields like politics or economics.
        
       | telekid wrote:
       | In general, you should be thinking about the delivery semantics
       | of the systems calling your code. Many very useful callers offer
       | "at least once" delivery guarantees, implying that your system
       | should behave idempotently to their calls.
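A toy simulation of at-least-once delivery, assuming a hypothetical producer that attaches a monotonically increasing `version` and the resulting absolute `balance` to each message. A consumer that adds deltas double-counts the redelivered message; a consumer that stores absolute state and skips versions it has already applied does not (and, as a side benefit, also ignores stale out-of-order messages).

```python
messages = [
    {"version": 1, "delta": 10, "balance": 10},
    {"version": 2, "delta": 5, "balance": 15},
]
deliveries = [messages[0], messages[1], messages[1]]  # message 2 is delivered twice

# Non-idempotent consumer: applies every delta it receives.
naive_balance = 0
for m in deliveries:
    naive_balance += m["delta"]

# Idempotent consumer: applies absolute state only if the version is newer.
state = {"version": 0, "balance": 0}
for m in deliveries:
    if m["version"] > state["version"]:
        state = {"version": m["version"], "balance": m["balance"]}

print(naive_balance)     # 20
print(state["balance"])  # 15
```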
        
       | dang wrote:
       | Possibly related past threads:
       | 
       |  _What Is Idempotence?_ -
       | https://news.ycombinator.com/item?id=19570815 - April 2019 (51
       | comments)
       | 
       |  _Idempotence: What is it and why should I care?_ -
       | https://news.ycombinator.com/item?id=17804617 - Aug 2018 (73
       | comments)
       | 
       |  _You know how HTTP GET requests are meant to be idempotent?_ -
       | https://news.ycombinator.com/item?id=16964907 - May 2018 (304
       | comments)
       | 
       |  _Implementing Stripe-Like Idempotency Keys in Postgres_ -
       | https://news.ycombinator.com/item?id=15569478 - Oct 2017 (41
       | comments)
       | 
       |  _APIs, robustness, and idempotency_ -
       | https://news.ycombinator.com/item?id=13707681 - Feb 2017 (50
       | comments)
       | 
       |  _A simple distributed algorithm for small idempotent
       | information_ - https://news.ycombinator.com/item?id=7276491 - Feb
       | 2014 (14 comments)
       | 
       |  _Idempotent Web APIs: What benefit do I get?_ -
       | https://news.ycombinator.com/item?id=5662138 - May 2013 (53
       | comments)
       | 
       | A word like that is particularly easy to search for:
       | 
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
        
       | tigger0jk wrote:
       | > 1. Query the database to find all dormant accounts with a
       | balance, which haven't been charged the fee this month.
       | 
       | > 2. Charge each of these accounts a fee
       | 
       | > 3. Setup a cron job to run this every hour
       | 
        | Note that if this job ever runs successfully but takes more than
        | an hour, the next run will start before it finishes and you will
        | double-charge. That can easily happen if the box running these
        | crons is overloaded. One fix is to automatically halt the job
        | after 55 minutes; another is to make the middle step idempotent:
        | for each user you're processing, check (ideally in a thread-safe
        | manner) that they still need the operation.
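One way to sketch the per-account check in SQL, using Python's `sqlite3` module and a hypothetical `accounts` table with `dormant`, `balance`, and `last_fee_month` columns (the fee amount is made up too). The "already charged this month?" test sits in the `UPDATE`'s `WHERE` clause, so checking and charging happen in a single statement instead of a query followed by separate writes.

```python
import sqlite3

FEE_CENTS = 500  # hypothetical fee


def charge_dormant_fees(conn, month):
    # Select dormant accounts with a balance AND skip any already charged this
    # month, so a rerun (or an overlapping run) finds nothing left to charge.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ?, last_fee_month = ? "
            "WHERE dormant = 1 AND balance > 0 "
            "AND (last_fee_month IS NULL OR last_fee_month <> ?)",
            (FEE_CENTS, month, month),
        )
    return cur.rowcount  # number of accounts charged by this run


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, dormant INTEGER, "
    "balance INTEGER, last_fee_month TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?, NULL)",
    [(1, 1, 10_000), (2, 0, 10_000), (3, 1, 0)],
)

print(charge_dormant_fees(conn, "2021-04"))  # 1
print(charge_dormant_fees(conn, "2021-04"))  # 0
```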
        
         | alex_young wrote:
         | Sounds like a good reason to use a pidfile or mutex so you can
         | eliminate the possibility of any concurrent jobs.
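A Unix-only sketch using `fcntl.flock` with a non-blocking exclusive lock; the helper names and lock path are hypothetical. Unlike a bare pidfile, a `flock` lock is released automatically when the holding process exits, so a crashed run doesn't leave a stale lock behind. It only prevents overlapping runs; it doesn't make a later rerun of the same month safe on its own.

```python
import fcntl
import os
import tempfile


def acquire_job_lock(path):
    # Take an exclusive, non-blocking lock; return the open file, or None if held elsewhere.
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    f.seek(0)
    f.truncate()
    f.write(str(os.getpid()))  # record the holder's pid for operators
    f.flush()
    return f


def release_job_lock(f):
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()


lock_path = os.path.join(tempfile.mkdtemp(), "dormant-fee-job.lock")  # hypothetical path
first = acquire_job_lock(lock_path)
second = acquire_job_lock(lock_path)  # what an overlapping run would see
print(first is not None, second is None)  # True True
release_job_lock(first)
```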
        
       | jchw wrote:
       | This is good but not enough. You also need to be sure that you
       | can't charge twice if the job runs twice. When you do that same
       | query twice, you will get the same list of users. This could be
       | done by exploiting database consistency rules, like using
        | strongly isolated transactions. A simpler, more general approach
       | is to use an idempotence token. You could, say, have a table with
       | a uniqueness constraint, and generate IDs that will match for the
       | same user in the same month. Then add that in the same
       | transaction that subtracts the money. The table could be cleaned
       | up periodically.
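A sketch of the token-table approach with Python's `sqlite3` module; the table names, token format, and `charge_fee_once` helper are hypothetical. The token is derived from the account and the month, so any rerun produces the same token, and the primary-key constraint rejects it inside the same transaction that would otherwise change the balance.

```python
import sqlite3


def charge_fee_once(conn, account_id, month, fee):
    token = f"dormant-fee:{account_id}:{month}"  # same user + same month -> same token
    try:
        with conn:  # token insert and balance change commit or roll back together
            conn.execute("INSERT INTO fee_tokens (token) VALUES (?)", (token,))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (fee, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The uniqueness constraint rejected a token already used: already charged.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE fee_tokens (token TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES (1, 10000)")
conn.commit()

print(charge_fee_once(conn, 1, "2021-04", 500))  # True
print(charge_fee_once(conn, 1, "2021-04", 500))  # False
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])  # 9500
```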
       | 
        | If you're making or using an API where repeating a request would
        | be bad, consider using idempotency tokens there too. I believe
        | Stripe supports them. The basic idea is the same: if you pass a
        | token in, they guarantee that, within a certain time frame, a
        | second request with the same token won't be executed again. This
        | is useful when the network flakes during the response and you
        | can't tell whether it's safe to retry.
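A minimal in-memory sketch of the server side of an idempotency-key scheme. `handle_payment`, the in-process dict, and the 24-hour window are all hypothetical; this is not how Stripe implements it. A production version would need to record the key and perform the charge atomically (for example, in one database transaction), which this single-process sketch does not attempt.

```python
import time
import uuid

_responses = {}  # idempotency key -> (timestamp, response)
TTL_SECONDS = 24 * 60 * 60  # hypothetical retention window


def handle_payment(idempotency_key, amount, ledger):
    now = time.time()
    cached = _responses.get(idempotency_key)
    if cached is not None and now - cached[0] < TTL_SECONDS:
        # A retry with a known key: replay the original response, don't charge again.
        return cached[1]
    ledger.append(amount)  # the side effect that must not be repeated
    response = {"charge_id": str(uuid.uuid4()), "amount": amount}
    _responses[idempotency_key] = (now, response)
    return response


ledger = []
key = str(uuid.uuid4())  # generated once by the client and reused on every retry
r1 = handle_payment(key, 500, ledger)
r2 = handle_payment(key, 500, ledger)  # e.g. retried after the first response was lost
print(r1 == r2, len(ledger))  # True 1
```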
       | 
       | Things get trickier when you combine network and database
        | consistency measures; that's when you get into locks, multi-phase
        | commit, etc., and it helps to know your database's
       | consistency model, since it's often not as solid as you think!
       | (In the past, even PostgreSQL had issues with providing
       | serializable isolation.)
        
       ___________________________________________________________________
       (page generated 2021-04-08 23:00 UTC)