[HN Gopher] Idempotence Now Prevents Pain Later
___________________________________________________________________
Author : zdw
Score  : 95 points
Date   : 2021-04-07 15:37 UTC
(HTM) web link (ericlathrop.com)

| ibirman wrote:
| Watch out for edge cases. What happens to accounts that change at
| 11:35PM on the last day of the month?
| bob1029 wrote:
| I feel like immutability and a 100% leak-proof layer of domain
| methods which in turn manage domain mutations would ultimately
| bring more value than explicitly adding idempotence throughout.
|
| If I have an idempotent method like "CreateCustomerRecord", this
| can cause a lot of pain for audit features and other aspects of
| the domain model if it is internally making determinations about
| whether to actually create or silently skip creation. For me, I
| would much rather that the method throw an exception if there is
| a duplicate business key than have it silently complete without
| taking any actual action. Exceptions indicating attempts at
| invalid state transitions can be extremely valuable if you have
| the discipline to create & use them properly.
|
| Generally, seeking idempotence in otherwise mutable methods is a
| band-aid for when you have broken immutability rules and allowed
| things to leak out of the sacred garden of unit-tested state
| machines and other provably-correct code items.
|
| If you should only conditionally execute some method, perhaps the
| solution is to investigate the caller(s) of the method, rather
| than attempt to infer the intent of all possible callers within
| the method itself.
| eweise wrote:
| You have the caller pass in a unique ID to
| CreateCustomerRecord. Don't create a new customer record if you
| receive a duplicate ID.
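A minimal Python sketch of the caller-supplied ID idea above. The `CustomerStore` class, the `request_id` parameter, and the in-memory dict are assumptions made for illustration; they do not come from the article or from any real API, and a production version would enforce uniqueness in the database instead.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerStore:
    """Hypothetical in-memory store keyed by a caller-supplied request ID."""

    # Maps request_id -> customer record; all names here are illustrative.
    by_request_id: dict = field(default_factory=dict)

    def create_customer_record(self, request_id: str, name: str) -> dict:
        # A repeated request_id returns the record created the first time
        # instead of inserting a second one.
        existing = self.by_request_id.get(request_id)
        if existing is not None:
            return existing
        record = {"customer_id": len(self.by_request_id) + 1, "name": name}
        self.by_request_id[request_id] = record
        return record


store = CustomerStore()
first = store.create_customer_record("req-123", "Ada")
retry = store.create_customer_record("req-123", "Ada")  # e.g. a network retry
other = store.create_customer_record("req-456", "Ada")  # a genuinely new request
```

This version quietly returns the existing record on a duplicate ID, which is exactly the silent behavior bob1029 objects to; raising an exception on a duplicate is the other design choice.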
| jayd16 wrote:
| This can create false negatives when a request must be retried
| due to network failure but actually succeeded because the
| failure was during the response.
|
| Idempotency is great for "debouncing" requests. If you want to
| tell the difference between identical requests that are different
| transactions, add a unique transaction id of some kind.
| adrusi wrote:
| There's a place for idempotency tokens. They're relatively
| easy to retrofit onto old systems, and occasionally they are
| the best way to go about making changes idempotent, but they
| should be a red flag - an indication that you should step
| back and see if maybe you can redesign an API to make
| idempotency a natural guarantee rather than something you
| artificially strap on with a token. As a rule of thumb, I
| would always mention the idea of adding an idempotency token,
| and prompt for alternatives, with all stakeholders present.
| smiley1437 wrote:
| I had only ever encountered idempotence in the context of system
| management (Ansible, Puppet, Chef, etc.)
|
| The article made me think it's actually applicable to other
| management as well.
| void_mint wrote:
| PUT requests are intended to be idempotent, which is one of the
| things that distinguishes them from POST requests. This (in my
| experience) is most non-CS-backgrounded software developers'
| exposure to idempotence, but it actually has tons of value
| pretty much wherever you can apply it. The ability to
| (sometimes accidentally) do a thing twice and have it leave no
| unintended consequence is huge.
|
| UPSERTs can be idempotent as well. "If this doesn't exist,
| create it, and if it does, update it to match this state"
| implies that running it twice will leave no unintended side
| effects.
| Animats wrote:
| It's a basic property of HTTP "GET". Or it's supposed to be.
| "GET" is not supposed to change server-side state. That's what
| "POST" is for.
| This matters if there's a cache in the middle,
| since caches tend to assume that GET requests are idempotent
| and can be served from cache. Cloudflare assumes that. POST
| requests have to go through to the real server.
| elcomet wrote:
| GET is idempotent because it's the identity function. It does
| not change the state of the data. So it's a trivial case of
| idempotency.
|
| A more interesting comparison is PUT (idempotent) vs POST (not
| idempotent).
| Justsignedup wrote:
| I've done this exact thing many times before...
|
| I can honestly say that Eric is 100% right with his approach. It
| always leads to fewer headaches, more flexibility (oh trust me,
| someone is always gonna have a "but... there's like a special
| thing that I sometimes have to do" and it breaks some
| assumptions).
|
| In any case... yeah... let's just say any time you have to be
| worried "did we already schedule this", really think "can this
| never care if it was or not? Should be always safe to schedule it
| again"
| brsg wrote:
| Idempotency is a pretty critical concept in system design, and I
| think most developers have run into issues related to it even if
| they aren't directly familiar with the term.
|
| To give another simple example like the OP's - suppose you have a
| product that relies on time series data. For demo purposes you
| might create a curated data set to present to clients, but the
| presenter doesn't want to show data from 2019 as the "most
| recent".
|
| Naturally, you decide to write a script. Do you
|
| A) Write a script that moves the data forward by 1 week
| explicitly, and simply run this once per week or
|
| B) Write a script that compares the current date to the data and
| moves it forward as much as it needs
|
| At first glance, these two approaches work the same, but what if
| (A) triggers twice? What if it runs once every 6 days by mistake?
| (B) is idempotent however - subsequent executions won't change
| the state.
| It's usually impossible to predict all of the ways
| that software breaks, but designing with idempotency in mind
| eliminates a lot of them.
| jayd16 wrote:
| I don't think B is technically idempotent either. Change still
| occurs but with minimal difference. You cannot cache the
| results and use them again next week.
|
| An idempotent change would be to pass in the current time
| instead of checking system time. In this case, as long as the
| input is the same, the result is the same. You could use cached
| results, but most likely you want to use new inputs.
| pbreit wrote:
| The idempotency I've seen is usually an unnecessary extra
| complexity.
| jacobsenscott wrote:
| If you design for it from the start it makes your system much
| less complex. Consider all the errors, special cases, and
| ultimately data cleanup you need to handle if your
| transactions are not idempotent. Idempotency is table stakes
| for any production app.
| mrbadideas wrote:
| Is that really idempotence?
| omarhaneef wrote:
| I'm assuming a lot of people click on it to see what the word
| Idempotence means. From the article:
|
| "Idempotence is the property of a software that when run 1 or
| more times, it only has the effect of being run once."
|
| And the example is, instead of a cron job just running a process
| once a month or on some other schedule, it runs more frequently
| but checks if the change has already been made.
|
| (From the Latin Idem which means "same" and potence is of course
| power/potent, so it has the same power/effect however many times
| you run it)
| bobbylarrybobby wrote:
| When writing a Jupyter notebook, always try to make your cells
| idempotent. You'll save yourself a lot of headache down the
| line.
| throwawayboise wrote:
| One of my first jobs was at an investment bank. They had a lot
| of programs that ran overnight, in a batch fashion. Everything
| had to be done before the markets opened the next morning.
| The term they used for idempotency was "free rerun." Being able to
| rerun any program with no special setup work was a high
| priority.
|
| The value in programs being a "free rerun" was that every so
| often the program would barf on a bad bit of data in a record.
|
| The programming environment was interpreted BASIC, so if an
| error occurred the program would print a message on the console
| and drop to an interactive prompt.
|
| The operators running the batch schedule would see this and
| call the programmer on call for that night. You'd log in (over
| dial-up at this time) and attach to the process, look at the
| error, figure out what went wrong, and either correct the data or
| (more likely) skip the record and deal with it the next day. It
| was more important to have the programs finish on time;
| individual issues could be dealt with later.
|
| Often you could just start up the program from where it left
| off, but if things were more screwed up it was important to be
| able to re-run it without any negative consequence.
|
| Edit: this was ~30 years ago, so my point is that it's not any
| kind of new idea or something that wasn't recognized long ago.
| omarhaneef wrote:
| I hope this example makes it evident that one of the primary
| innovations of the last 30 years is defaulting to Latin terms
| so that they are taken more seriously in business and
| technology circles to acquire ... you know... gravitas.
| 6t6t6t6 wrote:
| I used to be an operator on the night shift in my twenties and
| the job was exactly as you said. Good memories. Lots of
| sleeping at work and some days of panic when shit broke.
|
| And a lot of "secret" scripts that automated a big part of
| our job.
| treve wrote:
| > And the example is, instead of a cron job just running a
| process once a month or on some other schedule, it runs more
| frequently but checks if the change has already been made.
|
| As a property, I think it's even nicer if a script can
| literally fully run twice and for the outcome to be the same as if
| it only ran once (so skipping the 'did I run before?' check).
|
| Even though this check is useful in general, if you can define
| your data in such a way that, if it _did_ somehow run again, this
| is not destructive and doesn't create incorrect data, it makes
| the system more robust.
|
| Of course this is not always possible though. For example, if
| the process results in an email being sent, you need an
| explicit check to not do that twice.
| gen220 wrote:
| In situations like these, it's a legitimate goal to implement
| an idempotent, or "functional" core.
|
| So the goal of your functional core is to fully construct the
| email, and return it to the caller, who then has the choice
| to send the email, print it, write it to disk, etc.
|
| The program you deploy looks like this
|
|     EmailSender().send_email(construct_email(args))
|
| You can test by implementing a "safe" EmailSender interface,
| so that you're executing the same code that's in prod.
|
| In general, if a job/function is mutating state deep in the
| syntax tree (e.g. sending emails in the middle of a batch
| job), I personally see that as a violation of the Single
| Responsibility Principle.
| sdenton4 wrote:
| Mathematically, it's x^2 = x, which implies x^n = x for all
| positive integers n.
|
| Nilpotence (x^2 = 0) is also very helpful sometimes: it's a
| process which is self-reversing. Like the discrete Fourier
| transform (if you set up the constants properly).
| elcomet wrote:
| Self-reversing is not nilpotence.
|
| In mathematics, a self-reversing function is called an
| involution, and it's f^2 (or f(f)) = Id, the identity
| function.
|
| Nilpotence is very different. It says that if you apply your
| function a certain number of times, you end up with zero no
| matter what the input is. For example, projection onto the x
| axis + 90 deg rotation of a vector is nilpotent.
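The three properties contrasted above (idempotent, involution, nilpotent) can be checked on 2x2 matrices. This is an illustrative Python sketch: the matrix names are hypothetical, and the nilpotent case is the projection-plus-rotation example from the comment above, with the projection applied first.

```python
def matmul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))

# Projection onto the x axis: applying it twice equals applying it once.
project_x = ((1, 0), (0, 0))

# Reflection across the x axis: applying it twice gives the identity.
reflect_x = ((1, 0), (0, -1))

# Rotation by 90 degrees counterclockwise.
rotate_90 = ((0, -1), (1, 0))

# "Project onto the x axis, then rotate by 90 degrees" as a single matrix.
project_then_rotate = matmul(rotate_90, project_x)

is_idempotent = matmul(project_x, project_x) == project_x
is_involution = matmul(reflect_x, reflect_x) == IDENTITY
is_nilpotent = matmul(project_then_rotate, project_then_rotate) == ZERO
```

All three flags come out true: f^2 = f for the projection, f^2 = Id for the reflection, and f^2 = 0 for the combined map.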
| carreau wrote:
| No, you are confusing it with involution. 1/x is an involution.
| Symmetries are often involutions.
|
| An upper triangular matrix with 0 on the diagonal is
| nilpotent. Differentiating a polynomial of degree N is nilpotent
| after N+1 iterations.
| contravariant wrote:
| Careful, for nilpotence the power doesn't have to be 2.
|
| Also you may be confusing it with x^n = 1 (which I'm not sure
| how to name, 'root of unity' perhaps). This would be the case
| for the Fourier transform (with n=4).
|
| If x^2 = 0 then applying the Fourier transform twice would
| null your function, which isn't the case.
| pdpi wrote:
| X^2 is a weird way to describe it. A function f is idempotent
| if f(f(x)) = f(x).
| creata wrote:
| It's not _that_ weird. People often write iterated
| composition as f^k, and this is especially true with
| matrices, where composition and multiplication mean the
| same thing.
| corty wrote:
| It is quite common in some fields. Operator application is
| written without parentheses, and functions are a kind of
| operator. Therefore:
|
|     f(x) = f(f(x)) = f f x = f^2 x = f x
|
| And leaving out the x, because it is just a placeholder
| anyways:
|
|     f f = f^2 = f
|
| And of course this means that
|
|     f^n = f because f^(n-1) f = f^(n-1) by induction.
| globular-toast wrote:
| I try to write idempotent software whenever I can. It's usually
| not much more difficult to make it work and affords so much more
| flexibility and less worry when it's done.
| staticassertion wrote:
| https://lostechies.com/jimmybogard/2013/06/06/acid-2-0-in-ac...
|
| If you can build a system with ACID 2.0 life gets really easy.
| You can reason about your system without worrying about ordering,
| time, 'exactly once' semantics, etc.
|
| Idempotency is usually one of the simplest pieces to implement,
| and you definitely get a ton of benefit right off the bat - it's
| worth designing systems from scratch with it in mind.
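As the term is generally used, ACID 2.0 expands to associative, commutative, idempotent, distributed. A rough Python sketch of why the first three properties remove the worry about ordering and duplicates follows; set union is used as the merge operation purely as an assumption for illustration, and the names `merge` and `apply_all` are hypothetical.

```python
import itertools


def merge(state: frozenset, update: frozenset) -> frozenset:
    """Set union is associative, commutative, and idempotent."""
    return state | update


def apply_all(sequence) -> frozenset:
    """Fold a sequence of updates into a state, starting from empty."""
    state = frozenset()
    for update in sequence:
        state = merge(state, update)
    return state


updates = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "c"})]
expected = apply_all(updates)

# Every ordering of the updates converges on the same state.
all_orders_agree = all(
    apply_all(order) == expected for order in itertools.permutations(updates)
)

# Delivering every update twice (for example after retries) changes nothing.
duplicates_are_harmless = apply_all(updates + updates) == expected
```

An operation such as "append to a list" would fail both checks, which is why it needs ordering and exactly-once guarantees that the union-style merge does not.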
| [deleted]
| firebaze wrote:
| We had one "special" team member who insisted on everything being
| idempotent.
|
| This was his only leading principle. Result: absolute chaos - the
| code aspired to be idempotent, but due to idempotency he avoided
| thinking problems through and just created a mess of individual
| functions - each being idempotent, aside from the unavoidable
| bugs - which didn't form a coherent flow at all.
|
| We did a major refactoring, threw out just about all of that code,
| and rewrote everything in a logical manner. Now everything is
| still idempotent, but comprehensible.
|
| TLDR: idempotency is the same snake oil as the majority of guiding
| principles: alone, it doesn't help at all. There are lots of
| other factors to consider, which make the developer/architect
| role demanding (and fun).
|
| Craftsmanship at least, a sense for architecture (better) or
| understanding the whole picture of the requirements as a team of
| developers (best) is still required.
| longhairedhippy wrote:
| I don't see this as any reflection on idempotency as a principle
| (or other principles in general). Building systems poorly,
| without a plan and with no testing, will result in a bug-riddled
| mess, regardless of what pattern is being used.
| smitty1e wrote:
| Sure, one little hobby horse, e.g. "inversion of control", can
| run amok to negative effect (looking at you, Java projects with
| object traces 75 layers deep) but that doesn't make idempotency
| or inversion of control into bad ideas.
|
| A bit of pragmatism goes a long way, like Python's odd
|
|     x = (some_tuple,)
|
| ...syntax amidst its generally clean approach.
|
| Inflexibility itself is the bugaboo.
| spaetzleesser wrote:
| That is a general problem with any principle used as rigid
| ideology. Almost every principle becomes a problem if applied
| too dogmatically. This applies to software dev but also others
| like politics or economics.
| telekid wrote:
| In general, you should be thinking about the delivery semantics
| of the systems calling your code. Many very useful callers offer
| "at least once" delivery guarantees, implying that your system
| should behave idempotently to their calls.
| dang wrote:
| Possibly related past threads:
|
| _What Is Idempotence?_ -
| https://news.ycombinator.com/item?id=19570815 - April 2019 (51
| comments)
|
| _Idempotence: What is it and why should I care?_ -
| https://news.ycombinator.com/item?id=17804617 - Aug 2018 (73
| comments)
|
| _You know how HTTP GET requests are meant to be idempotent?_ -
| https://news.ycombinator.com/item?id=16964907 - May 2018 (304
| comments)
|
| _Implementing Stripe-Like Idempotency Keys in Postgres_ -
| https://news.ycombinator.com/item?id=15569478 - Oct 2017 (41
| comments)
|
| _APIs, robustness, and idempotency_ -
| https://news.ycombinator.com/item?id=13707681 - Feb 2017 (50
| comments)
|
| _A simple distributed algorithm for small idempotent
| information_ - https://news.ycombinator.com/item?id=7276491 - Feb
| 2014 (14 comments)
|
| _Idempotent Web APIs: What benefit do I get?_ -
| https://news.ycombinator.com/item?id=5662138 - May 2013 (53
| comments)
|
| A word like that is particularly easy to search for:
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
| tigger0jk wrote:
| > 1. Query the database to find all dormant accounts with a
| balance, which haven't been charged the fee this month.
|
| > 2. Charge each of these accounts a fee
|
| > 3. Setup a cron job to run this every hour
|
| Note that if this job ever runs successfully, but takes more than
| an hour, you will double-count. This can easily happen if the box
| running these crons is overloaded. One fix is to automatically
| halt the job after 55 minutes; another would be to make the
| middle step idempotent: for each user you're processing, ensure
| (ideally in a thread-safe manner) that they still need the
| operation.
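One way to make that per-account check safe against overlapping runs, sketched in Python with the standard `sqlite3` module. The schema, the `dormancy_fees` table, the `charge_fee` helper, and the fee amount are all hypothetical; the assumption is a relational database where a uniqueness constraint and the balance update can share one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    -- One row per account per month; the primary key rejects a second charge.
    CREATE TABLE dormancy_fees (
        account_id INTEGER NOT NULL,
        month TEXT NOT NULL,
        PRIMARY KEY (account_id, month)
    );
    INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50);
    """
)

FEE = 5  # hypothetical fee amount


def charge_fee(conn: sqlite3.Connection, account_id: int, month: str) -> bool:
    """Charge the fee at most once per account per month.

    Returns True if the fee was charged, False if it had already been charged.
    """
    try:
        # "with conn" commits on success and rolls back on an exception, so
        # the marker row and the balance update succeed or fail together.
        with conn:
            conn.execute(
                "INSERT INTO dormancy_fees (account_id, month) VALUES (?, ?)",
                (account_id, month),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (FEE, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_run = [charge_fee(conn, account_id, "2021-04") for account_id in (1, 2)]
second_run = [charge_fee(conn, account_id, "2021-04") for account_id in (1, 2)]
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Here the second run is a no-op for both accounts, so an hourly job that overlaps with itself or is rerun by hand cannot charge the same account twice in a month.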
| alex_young wrote:
| Sounds like a good reason to use a pidfile or mutex so you can
| eliminate the possibility of any concurrent jobs.
| jchw wrote:
| This is good but not enough. You also need to be sure that you
| can't charge twice if the job runs twice. When you do that same
| query twice, you will get the same list of users. This could be
| done by exploiting database consistency rules, like using
| strongly isolated transactions. One simple, more general approach
| is to use an idempotence token. You could, say, have a table with
| a uniqueness constraint, and generate IDs that will match for the
| same user in the same month. Then insert that ID in the same
| transaction that subtracts the money. The table could be cleaned
| up periodically.
|
| If you're making or using an API where repeating would be bad,
| consider using idempotency tokens for those too. I believe Stripe
| supports them. The basic idea is the same: if you pass a token
| into them, they will guarantee that in a certain time frame, no
| request with that token is executed more than once. This is
| useful when the network flakes during the response. Is it safe to
| retry?
|
| Things get trickier when you combine network and database
| consistency measures; that's when you get into locks, multi-stage
| commit, etc., and it helps to know your database's
| consistency model, since it's often not as solid as you think!
| (In the past, even PostgreSQL had issues with providing
| serializable isolation.)
___________________________________________________________________
(page generated 2021-04-08 23:00 UTC)