[HN Gopher] Applying "make invalid states unrepresentable" | Author : fanf2 Score : 352 points Date : 2020-10-05 08:43 UTC (14 hours ago) | Web link: kevinmahoney.co.uk | yxhuvud wrote: | Not certain I get the needless attack on OOP in this text. The | error could just as well have happened in any alternative to OO. | What is needed is the realization that there is a | schedule that needs to have full control over the times to not | create coordination issues. That is a realization that is utterly | independent of the OOP-ness of the eventual solution. | brbrodude wrote: | It's as if writing OOP code ruled out putting thought into designing a | solution; that's an absolute non sequitur to me. | kpmah wrote: | The problem with a narrow OO mindset is that it encourages | encapsulation and atomic objects with hidden state. Simply | gluing those pieces together can create suboptimal | representations - a more holistic thought process is better. | jschwartzi wrote: | Not necessarily. Some people can't see the forest for the | trees no matter what kind of programming language they're | using. | | What's needed here is to sit down and think REALLY hard about | how the system works and what some of the terms are that are | used to describe things. And also to have an intuition for | when someone is using imprecise language. GP cited use of a | "Schedule" object as a way to enforce the invariant. That | might represent a block of intervals by a sequence of times. | | The less precise you are about what's needed and how the | system should work, the more of a soupy mess you're going to | build. | | People gravitate toward OO because OO makes it feel like | you've got a lot of conceptual clarity. You can get back to | that familiar "subject/object" dual we love in English.
But | the problem is actually that, although people can pick | subjects and objects readily from a sentence, they might not | have much luck picking the important subjects and objects | from a sentence. And that's the problem we really need to | solve. | tomstuart wrote: | My favourite real-world example is something like: you have a | React component which accepts the boolean props `showFoo`, | `showBar` and `showBaz` to control its mode. The intention is | that exactly one of `foo`, `bar` or `baz` will be shown at any | given time, but that invariant is maintained only loosely, e.g. | by a parent component which holds three flags in its state and | mutates them in tandem inside various event handlers. | | The obvious problem is that those three props can easily get out | of sync if you make a mistake in updating them, and the equally | obvious solution is to replace them with a single prop (e.g. | `show` or `mode`) which contains a value from some enumeration -- | maybe just the name of the thing to show, or a numeric ID that is | given meaning elsewhere, or (galaxy brain) a component. That way | the invariant is maintained strongly and automatically by the | representation itself. | | This example sounds vacuous -- who would ever use three booleans | in the first place? -- but in practice it's very easy for UI code | to incrementally get into this mess over time without anyone | noticing. The situation is also often more subtle, e.g. the | invariant is more complex than "exactly one flag is true", which | makes it harder to spot that you can model all the valid states | with a finite enumeration. | leoc wrote: | I appreciate what people are getting at with "make invalid states | unrepresentable", but it can't be the best description of what | the true objective is. After all, the first language to make | invalid states unrepresentable was probably TECO | https://en.wikipedia.org/wiki/TECO_(text_editor)#As_a_progra... | ... 
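A minimal sketch of tomstuart's single-prop fix, written in Python because that is the language of the one code sample already in this thread. The names Mode, mode_from_flags and render are hypothetical stand-ins for the React props. The enum cannot express "two things shown at once", and the conversion function is the single place where the old three-boolean input is checked and rejected:

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical replacement for showFoo/showBar/showBaz: exactly one value at a time."""
    FOO = "foo"
    BAR = "bar"
    BAZ = "baz"


def mode_from_flags(show_foo: bool, show_bar: bool, show_baz: bool) -> Mode:
    """Convert the old three-boolean state, rejecting combinations that break the invariant."""
    candidates = [(Mode.FOO, show_foo), (Mode.BAR, show_bar), (Mode.BAZ, show_baz)]
    chosen = [mode for mode, on in candidates if on]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one flag set, got {len(chosen)}")
    return chosen[0]


def render(mode: Mode) -> str:
    # Every Mode value is a valid state, so there is no "two things shown" branch to write.
    return f"showing {mode.value}"
```

Once every caller passes a Mode directly, the conversion function and its check can be deleted, because the invalid combinations no longer exist to be checked.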
| chrisweekly wrote: | Regarding "make invalid states unrepresentable", I'm curious | what others think of FSM (Finite State Machines), e.g. use of | XState (vs e.g. Redux) in the FE webapp state mgmt space. | legulere wrote: | You could also see this as a form of single point of truth. | teddyh wrote: | This is very much like database normalization, in that it has the | benefit of making invalid data impossible, but the drawback of | often making queries into the data much more cumbersome and | usually also inefficient. | | As with database normalization, it is a good idea to first do it | as much as possible, and then denormalize again until it is fast | enough. | rocqua wrote: | I have always had the idea of a database that does the | denormalizations you want automatically for you. | | Essentially, you keep the DB in a normalized state. You define | views of the DB that you want. Then the DB keeps those views as | tables for you, and the DB does all of the hard work of keeping | those view tables consistent with the normalized data. | Essentially the DB does the atomicity, cache-invalidation, and | cache-updating for you. | | You get performance, and you get the certainty that invalid | states are un-representable. | | I guess the biggest blocker here is in automatically | determining what fields do and do not need to be updated | automatically? | Serow225 wrote: | Check out the Noria project/database from MIT, I think you'd | like it :) | | https://github.com/mit-pdos/noria | | https://corecursive.com/030-rethinking-databases-with-jon-gj... | | https://notamonadtutorial.com/interview-with-norias-creator-... | | https://pdos.csail.mit.edu/papers/noria:osdi18.pdf | tzs wrote: | A large fraction of the comments are about how if you do this and | then someday your requirements change you might have to redo your | underlying data structures or databases, with the implication | being that you should therefore make those as general and | flexible as possible.
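rocqua's "views kept as tables" idea can be approximated by hand in a database without automatically maintained materialized views. A rough sketch using SQLite triggers through Python's sqlite3 module; the orders/customer_totals schema is invented for illustration, and this is not how Noria works. The application only ever writes the normalized orders table, and the triggers keep the denormalized per-customer totals in step inside the same transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source of truth (hypothetical schema).
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    amount INTEGER NOT NULL
);

-- Denormalized "view as a table": written only by the triggers below.
CREATE TABLE customer_totals (
    customer TEXT PRIMARY KEY,
    total INTEGER NOT NULL
);

CREATE TRIGGER orders_after_insert AFTER INSERT ON orders BEGIN
    INSERT OR IGNORE INTO customer_totals (customer, total) VALUES (NEW.customer, 0);
    UPDATE customer_totals SET total = total + NEW.amount WHERE customer = NEW.customer;
END;

CREATE TRIGGER orders_after_delete AFTER DELETE ON orders BEGIN
    UPDATE customer_totals SET total = total - OLD.amount WHERE customer = OLD.customer;
END;

CREATE TRIGGER orders_after_update AFTER UPDATE ON orders BEGIN
    UPDATE customer_totals SET total = total - OLD.amount WHERE customer = OLD.customer;
    INSERT OR IGNORE INTO customer_totals (customer, total) VALUES (NEW.customer, 0);
    UPDATE customer_totals SET total = total + NEW.amount WHERE customer = NEW.customer;
END;
""")


def cached_totals(conn):
    """Read the trigger-maintained (denormalized) totals."""
    return dict(conn.execute("SELECT customer, total FROM customer_totals"))


def recomputed_totals(conn):
    """Recompute the same totals from the normalized table, for comparison."""
    return dict(conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
```

Comparing cached_totals with recomputed_totals is a cheap consistency check. The hard part rocqua points to, working out which derived rows each write affects, is exactly what these triggers hard-code by hand.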
| | That reminds me of an interesting point I saw in a book whose | title and author escape me. He said one of the reasons you | encounter so many bad designs in Java programs is that many new | Java programmers look at the design of Swing to learn good | design. | | He wasn't saying that Swing is badly designed--but Swing is a | framework/library, not an application. What it takes to be a good | framework/library is different from what it takes to be a good | application. | | If you are writing an inventory management application you can | design your tables and data structures and interfaces around | things inventory management applications need. If you are writing | a medical billing application you can design around what medical | billing needs. | | If you are writing a framework or library that might be used by | inventory management applications and medical billing | applications and all the other nearly infinite kinds of | applications people will write, you need to keep it very general | and flexible...but you also have to keep it fast and not too | bloated. It's a much harder design problem, with different best | practices for what is good design and what is not. | hinkley wrote: | There is a trick that I wish all senior staff knew but we find | ourselves having to teach as a matter of routine. | | 1) Don't advertise what you're not selling | | 2) Don't sell everything that you've made | | It's possible to write software that has a public contract that | allows only a subset of states that the internal system allows. | You can use this to support one customer that has a requirement | that is mutually incompatible with another customer's, but you | can also use it for migrations. One should be able to create an | API where the legal values in the system are strictly limited, | but the data structures and storage format may have affordances | to support migration.
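One way to read hinkley's "don't sell everything you've made", sketched in Python with hypothetical names. The stored representation keeps a legacy value around while a migration runs, but the public type has no way to express it, so nothing written through the API can ever produce it:

```python
from enum import Enum


class StoredStatus(Enum):
    """Internal representation: deliberately wider than the public contract."""
    ACTIVE = "active"
    SUSPENDED = "suspended"
    PENDING = "pending"  # hypothetical legacy value still present in old rows during a migration


class PublicStatus(Enum):
    """What the API advertises: the legacy value is not for sale."""
    ACTIVE = "active"
    SUSPENDED = "suspended"


def to_public(status: StoredStatus) -> PublicStatus:
    # Hypothetical migration rule: legacy "pending" rows are reported as suspended.
    if status is StoredStatus.PENDING:
        return PublicStatus.SUSPENDED
    return PublicStatus(status.value)


def from_public(status: PublicStatus) -> StoredStatus:
    # New writes can only ever produce values from the narrow public set.
    return StoredStatus(status.value)
```

When the migration finishes, PENDING can be removed from StoredStatus without any change to the public contract.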
| | That doesn't necessarily solve the problem of communication | between services and migrating these changes into a running | system, but it's a useful tool. If you've ever had a coworker | who insists that your service call diagrams look like a tree or | an acyclic graph, this problem is certainly one of many reasons | they may be insisting on this. With a DAG there is an order in | which I can deploy things that has a prayer (but not a | guarantee) of letting all of the systems understand each other | during each increment of deployment. | | People have come up with alternative solutions for this problem | by employing sophisticated sets of feature toggles, and in some | ways this is superior, but it trades the number of steps (each | of which has a potential for human error, and consumes calendar | time) for increased reliability on average. | secondcoming wrote: | It looks like a variation of Run Length Encoding to me, where the | 'Run' is a duration instead of a count. | wa1987 wrote: | Related video on this subject: | https://www.youtube.com/watch?v=IcgmSRJHu_8 | matsemann wrote: | I thought about that video today when integrating with an old | SOAP API. I need to find the name+some property of some | persons. Instead of having a list with [(name1, prop1), (name2, | prop2),...], I get two distinct lists of [name1, name2,..] and | [prop1, prop2,..]. In practice I think the lists will always | match. But there is nothing stopping them from not being the | same length, or even worse: one having a gap. | vbezhenar wrote: | Put an assertion. It's better to throw early than to | deal with wrong data later. | wruza wrote: | Ah, good old design for systems with no special business cases. | Heard of them, but never seen one. | etripe wrote: | This is a good introduction on a conceptual level.
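Following vbezhenar's suggestion, a sketch (hypothetical Person and pair_up names) that turns matsemann's two parallel lists into the list of pairs the API arguably should have returned. The invariant is checked once at the boundary, so the rest of the code never sees mismatched lengths or gaps:

```python
from __future__ import annotations

from typing import NamedTuple, Optional, Sequence


class Person(NamedTuple):
    name: str
    prop: str


def pair_up(names: Sequence[Optional[str]], props: Sequence[Optional[str]]) -> list[Person]:
    """Hypothetical helper: turn two parallel lists into one list of records.

    Fails fast on a length mismatch or a gap (None) instead of letting
    misaligned data flow further into the program.
    """
    # On Python 3.10+, zip(names, props, strict=True) performs the same length check.
    if len(names) != len(props):
        raise ValueError(f"length mismatch: {len(names)} names vs {len(props)} props")
    people = []
    for i, (name, prop) in enumerate(zip(names, props)):
        if name is None or prop is None:
            raise ValueError(f"gap at index {i}")
        people.append(Person(name, prop))
    return people
```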
| | I think a large contributor to the problem is story-oriented | development, where all that matters in the sprint is "getting it | done" and not looking at the broader context. | | To make this practical, Scott Wlaschin has an | excellent write-up here [0]. His book (plugged in that article) | is also excellent. | | [0] https://fsharpforfunandprofit.com/posts/designing-with-types... | lmm wrote: | > I think a large contributor to the problem is story-oriented | development, where all that matters in the sprint is "getting | it done" and not looking at the broader context. | | I think that's exactly backwards. This kind of overcomplicated | representation usually happens because people put too much | effort into designing their representations up front. If you | follow story-oriented development and only implement the parts | they actually need to get the current task done, you never end | up with these wasteful extra states because you never actually | needed them. But people think that planning before coding is | somehow virtuous, and then they're tied to following those | plans. | acidbaseextract wrote: | The representational debt the parent comment is referring to | is akin to that quip, "if I had more time, I would have | written a shorter letter." | | I find that both types of representational debt are common. | jka wrote: | To be fair, both are possible scenarios: | | - A team spends too much time over-engineering a | representation for something that could be more easily | maintained using a simple model | | - A team spends too little time considering the edge cases | with a representation because they feel pressure to deliver | the feature within a short space of time | matthewmacleod wrote: | And they often happen at the same time, too.
I've seen | situations where developers over-plan a system in advance, | then spend ages hacking smaller features and changes into | it to meet quick turnaround times, instead of taking a proper | step back and being aware of when their original design | needs to be revised. | Aeolun wrote: | Our database gets shit added on an 'as necessary' basis, and | I guarantee you it's not a great thing. | dtech wrote: | the flip side of this is that sometimes your initial designs | need to be expanded to account for something new and that's | hard, leading to tech debt and hacks. | | personally I think that's a better trade-off than | implementing something complicated up-front that you don't | know will work and ending up with flexibility in the wrong | places, leading to tech debt and hacks. | pjmlp wrote: | I assure you that I have wasted enough time fixing story-oriented | development with major refactorings, because some | edge cases couldn't easily be handled by extending the | existing code. | | I also have experience fixing story-oriented development with | dirty workarounds, because major refactorings were also | required, but not desired by whoever was paying for the stories. | | Either way I didn't care, it was money in the bank anyway. | thsealienbstrds wrote: | It doesn't sound to me like the kind of thing that any | particular WoW is going to fix. As if it would suddenly | naturally dawn on someone to do this given enough time. It's | just lack of knowledge and/or discipline. | maest wrote: | Sum types are one of the main things I miss when working in | Python. Is anyone aware of any good ways of adding sum types to | Python? | masklinn wrote: | The entire point of sum types is that they be statically | checked. Without static typing, they don't seem really | useful: Erlang doesn't have sum types, and it doesn't really | have a use for them until it gets a type system which can | leverage them. Instead it models "sum types" as tagged tuples | e.g.
{ok, Ok} | {err, Err} | | _however_ Erlang has very good pattern matching. Python... | doesn't. | | There's a PEP but I stopped following it because the | discussion was a mess. And it's apparently now split into three | different PEPs; I don't know whether that's an improvement or | not, though. | | Furthermore Python's dislike of HOFs means you can't really | do "monadic" processing as you'd do in, say, Smalltalk where | your "variants" would really be subtypes with cool higher-order | messages. So you're mostly just adding indirections. | nybble41 wrote: | I'm not sure you would consider this a _good_ way, but one | can implement (unchecked) sum and product types in any | language with lexical closures via Scott encoding[1]:

    # data Pair a b = Pair a b
    def pair(x, y):
        return lambda f: f(x, y)

    # data Either a b = Left a | Right b
    def left(x):
        return lambda l, r: l(x)

    def right(y):
        return lambda l, r: r(y)

    v0 = pair(2, 3)
    v1 = left(7)
    v2 = right(9)

    # case v0 of { Pair x y -> x + y }
    print(v0(lambda x, y: x + y))  # prints 5

    # case v1 of { Left x -> x + 1; Right y -> y * 2 }
    print(v1(lambda x: x + 1, lambda y: y * 2))  # prints 8

| | [1] https://en.wikipedia.org/wiki/Mogensen%E2%80%93Scott_encodin... | maest wrote: | Wow, this looks clever. (maybe a bit too clever) | | I'll need to read more about it, thanks for the pointer! | choeger wrote: | > I think a large contributor to the problem is story-oriented | development, where all that matters in the sprint is "getting | it done" and not looking at the broader context. | | I think you have a point here. This design offers much better | safety, comparable to "parsing instead of validating". But it | requires up-front design. And that is indeed "verboten" in | modern software development management style. | | Why is it "verboten"? I think it has to do with two fundamental | concepts of Scrum et al.: | | 1. Stories that are "ready" just need to be "implemented".
| The | implication is that the developer does not design and | everything is orthogonal. There are no interdependencies, | maintenance effort or non-functional requirements. | | 2. A story that is "ready" focuses solely on the desired | outcome for some selected examples. There is no generalization | of the examples and consequently little to no abstraction. | | I think these issues stem from the fact that Scrum et al. are | intrinsically tools for managers to isolate them from the | complexities of software engineering. Every Scrum metric, | for instance "progress" or "definition of ready", is | essentially empty of meaning for software engineering. | chrisweekly wrote: | +1, favorited. | aunty_helen wrote: | I like how you've said it here, but one thing that Scrum | doesn't have in it is relief from your professional duty as | an engineer. | | If a system needs to be designed in a particular way, do so. | That's how long it takes and that's why in planning you | discuss how it will be designed. | | The design of how you're going to build the system is taken | care of before the task is split into easily digestible bits | that meet a definition of ready. | | If you need to build it before you know how to build it, | Scrum allows for research spikes into this: a focused, | timeboxed interval, so that you have the ability to estimate | the actual difficulty of work to be completed. | jl6 wrote: | I like the concept but I've seen a fair few examples of where the | developers and users clearly had differing opinions about which | states are invalid! | | Dates are a rich vein of examples. Some users will happily | consider "25th December" to be a date, without any year, because | it might be the name of a folder in which they store their | Christmas stuff.
More seriously, genealogists or historians may | want to record "25th December" (again without a year) as the date | of a photograph because they can clearly see it was taken on _a_ | Christmas, they just don't know which one. The naive developer | would just slap a DateTime type onto the system and feel good | about having avoided malformed input. | IggleSniggle wrote: | This is so obvious now that you've said it. I can see so many | possibilities. I feel as though my eyes have been opened. | kroltan wrote: | But then you're not dealing with dates at all, just categories | which happen to have names that look like dates, no? | | Like you say, a "folder". But the photo file's metadata will | either have a complete DateTime, or none at all, unless there | is some sort of camera that is able to know what day of the | year it is without knowing the year! Which due to things like | leap years is impossible. | IggleSniggle wrote: | Importantly, these "date-like" things have important date- | semantics. That is, it may be reasonable to expect your | software to be able to handle varieties of precision or | completeness of date metadata, in which case your date | representation may need to be able to interact with DateTime | fluidly without actually being one itself, despite it being | tempting to remove these possibilities by making incomplete | date information unrepresentable. | will4274 wrote: | > some sort of camera that is able to know what day of the | year it is without knowing the year! Which due to things like | leap years is impossible. | | Just a small note, as we're talking about assumptions, but | this is clearly untrue. Consider - for decades, humans wore | watches that knew the day of the month, but not the month - | you were just expected to turn the day forward on the 1st day | of months following non-31 day months. 
Similarly, we can | imagine disposable cameras that ask the user for the current | day on first use and simply assume leap years don't exist, | and require the user to correct the date for any leap years. | You might call this a silly design, but systems often have to | interact with external systems that have silly designs. I | believe I actually owned a toy PDA (a device for a child, not | an adult) that did not have a year back in 2003 or so. | rocqua wrote: | > But then you're not dealing with dates at all, just | categories which happen to have names that look like dates, | no? | | If you and your client disagree about what constitutes a | date, that doesn't mean "your client is wrong"; it means "you | need better communication." You can't fix this problem by | requiring everyone to use consistent definitions. Instead the | solution is to check assumptions as often as possible. | lkitching wrote: | If you only have a day and month like 25th December you should | represent it with a type that contains just that information. | Java for example has the MonthDay class (https://docs.oracle.com/javase/8/docs/api/java/time/MonthDay...) which can be | converted into a local date when the additional information is | available. If your users want to refer to that as a date then | that should be handled in the UI but should not lead to | ambiguity in the internal representation. | isbvhodnvemrwvn wrote: | Also in genealogy: | | - estimated dates | | - calculated dates (e.g. someone was 30 in 1870, so he was born | in "calculated 1840") | | - unreadable or unavailable months or days (typically recorded | as 1980-00-13) | | - time ranges with all of the above as boundaries e.g. "after | 1760-03-00 and before calculated 1800" | | - plainly incorrect dates, but that's what the document says | (1865-02-30) | | - no dates (some software tries to enforce putting some data | in for whatever reason) | | - dates with unknown calendar | Terr_ wrote: | All plausible and scary.
It sounds like a recipe for a system | where the different qualities of dates are their own entity, | and anybody doing a date-search needs to provide some | criteria on the degree of specificity or certainty they | require. | | For that matter, someone might want to do a text-based search | on dates: "This damaged photo shows 196_-_1, which could be | at least 20 different months..." | isbvhodnvemrwvn wrote: | There's also another dimension to that uncertainty. Dates | are typically attributes of events involving one or more | people in various roles, and these events are (hopefully) | attached to one or more sources. The type of sources you | use affects the confidence of various attributes of the | event (like dates or participants). | | In my area of research death records are typically a bad | predictor for date and location of birth. When the records | were managed by the churches (until after WWII pretty | much), priests did not want to pester grieving family for | the exact date of birth, so they relied on approximate age. | Naturally the subject of the death record would not | typically point out errors. On the other hand marriage | records required looking at the actual birth records | (sometimes mailed across the country), as these contained | notes on all marriages of the individual (this was done to | ensure monogamy). | | Is it on the genealogist to consider this confidence when | specifying the date of the event, or should the software | intervene? Due to complexity, the former is the industry | standard. But maybe there are some brave (or stupid) people | who will try to take it into account in the future. | jl6 wrote: | I implemented a [private] genealogical data entry system | based on the GenTech data model, which has a "surety | scheme" entity, designed to capture perceived | uncertainty. I also added a "Fuzzy Date", where every | date-like value was decomposed into all its constituent | components (year, month, day, hour, ...), all optional. 
| It could also capture a range of such fuzzy dates, so it | was possible to enter a "date" such as "The first of a | month no earlier than 1950 and before June 1990". There | was a loooong list of validation constraints to attempt | to prevent contradictions being entered, and I _think_ I | caught all the cases, but... | [deleted] | JamesBarney wrote: | If you enjoyed this blog post and want to go into more depth on | how to model your domain using types check out | | https://fsharpforfunandprofit.com/series/designing-with-type... | a-dub wrote: | this is a nice idea, but as mentioned in other comments, i think | the most important goal when designing a schema is to design to | make the queries you'll frequently be making simple and | efficient. extensibility comes second and if you can make some | invalid states impossible to represent, then that's definitely a | bonus. | userbinator wrote: | Another application of this principle is in data representation, | and why I think text-based formats are horrible in general for | communication between software that doesn't involve a human | reading it the majority of the time: there, the "invalid states" | not only cause complexity in the parser, but they also waste | space (think of storing a 4-byte integer as the 4 bytes directly, | vs. a string of variable-length ASCII text.) | joosters wrote: | The time period example seems to miss an obvious weakness in the | described Time Period 'object' - it's implied that the end date | should be >= the start, but if you are representing a time period | with ( Date, Date ) then you are still allowing invalid states to | be represented - yet this is what the writer is trying to avoid. | Likewise, a timeline split into contiguous periods can still | represent out-of-order Dates. | | a Time Period object of ( Date, Duration ) would fix the first | issue, and a TimeLine of ( Date, Duration, Duration, ... ) would | fix the second one (assuming unsigned Durations!) 
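A sketch of joosters' suggestion in Python, under the assumption that dates are plain calendar days (datetime.date). The Timeline name is hypothetical. Each period starts exactly where the previous one ends, so overlaps, gaps and out-of-order periods cannot be written down. Python's timedelta can be negative, so the "unsigned" part has to be checked at construction:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Timeline:
    """Hypothetical type: a start date followed by consecutive period lengths.

    Each period begins exactly where the previous one ends, so periods
    cannot overlap, leave gaps, or appear out of order.
    """
    start: date
    durations: tuple[timedelta, ...]

    def __post_init__(self) -> None:
        # timedelta can be negative, so non-negativity is enforced here.
        if any(d < timedelta(0) for d in self.durations):
            raise ValueError("durations must be non-negative")

    def periods(self) -> list[tuple[date, date]]:
        """Expand into (start, end) pairs; each end equals the next start."""
        result = []
        current = self.start
        for d in self.durations:
            result.append((current, current + d))
            current += d
        return result
```

The trade-off is the reverse of the (Date, Date) form: validity is guaranteed, but finding where the nth period starts means summing the durations before it.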
| lxe wrote: | I think the example's last visualization is confusing. | | The timeline is {date1, date2, date3, date4}. Let's say you | have 2 periods, date1 - date3, and date2 - date4. Period 1 can | be represented as {date1, date2, date3}. Period 2 can be | {date2, date3, date4}. | | Am I understanding this correctly? | jxf wrote: | I think you misunderstood the author's point: time periods | aren't explicitly represented as (Date: start, Date: end). | Instead they're a set of dates. The time periods are then | implied by the set, making "end date before start date" | impossible. | VBprogrammer wrote: | It also deals with the problem of the bounds on the ranges | (open-closed, open-open etc.) implicitly in a way which is | harder to mess up. | | One complication it's potentially missing is exactly whose | days we are talking about, i.e. are they days starting in GMT or | UTC or EST, or in whatever the supplier's or the customer's | time zone is, or are we actually talking about some day concept, | perhaps from the start of business? Representing this as a | datetime start / end certainly makes it possible to represent | these concepts, if perhaps not making anything else | particularly easier. | zimpenfish wrote: | I was going to say that this has a weakness in testing | because you can't `assert(end > start)` without knowing which | is which because `assert(later > sooner)` will always be true | but then I guess you'd have input (and other) validations | before it got to that point anyway. | choeger wrote: | You should re-read the article. The author explicitly mentions | using _sets_ of dates. Hence the ordering is implicit. For an | interval you can do the same and use either a set or an | unordered pair. | joosters wrote: | A set doesn't necessarily imply an ordering (unless this | article is about some specific programming language, but it | seemed fairly generic to me). E.g. Java and C++ have many Set | implementations, some sorted (e.g.
a TreeSet) and some not | (e.g. HashSet) | matsemann wrote: | That's the point. There's no ordering of the items. So the | representation can never have an error like [yesterday, | tomorrow, today]. It's just a set of (yesterday, tomorrow, | today). | joosters wrote: | Doesn't that just push all the responsibility onto every | piece of code that uses the datatype? 'Remember to sort | the contents of this set every time before using it' | sounds like it is asking for trouble. | kevinmgranger wrote: | The fact that it's a set can be hidden, and any queries | that depend upon an ordering can be presented as a sorted | view or whatnot. | secondcoming wrote: | So finding out a contract type from a timestamp is | linear? | rocqua wrote: | The set, together with an ordering over the elements, is sufficient. | It perfectly defines and represents the intervals | mathematically. What implementation of a set you use is a | practical detail. | | When implementing this in practice, you probably want to | use a set implementation that gives fast ordering results. | But that is a performance consideration, not a | data-representation consideration. | browsergrip wrote: | This is a good article. But the second example seems to suffer from | the defect of the first. Removing default contracts and | representing fixed contracts as intervals leaves it possible that | these fixed contracts can overlap....which is | probably...undesirable? | | In that case, applying the remedy of the first example (a set of | dates, and inferring that every 2nd (even zero length, to account | for adjacent fixed) interval will be default,) introduces another | bug where if you lop off any random date in that set or list, you | invert everything. | | I love the concept represented here, it _is_ akin to | normalization as in...simplify the representation so no | redundancy is introduced as this usually leads to better | results...but it seems it's no guarantee of better results.
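On secondcoming's question: with the boundary dates kept sorted (kevinmgranger's "sorted view"), finding the period that contains a given date is a binary search, so O(log n) rather than linear. A sketch in Python; the Boundaries class is hypothetical, and periods are assumed to include their start date and exclude their end date:

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date
from typing import Iterable, Optional


class Boundaries:
    """Hypothetical class: periods implied by a set of boundary dates.

    Period i covers boundaries[i] (inclusive) up to boundaries[i + 1] (exclusive),
    so "end before start" and overlapping periods cannot be expressed.
    """

    def __init__(self, dates: Iterable[date]) -> None:
        # Deduplicate and sort once; callers only ever see the sorted view.
        self._dates = sorted(set(dates))

    def period_index(self, d: date) -> Optional[int]:
        """Index of the period containing d, or None if outside every period.

        Binary search over the sorted boundaries, so O(log n) rather than linear.
        """
        i = bisect_right(self._dates, d) - 1
        if i < 0 or i >= len(self._dates) - 1:
            return None
        return i

    def periods(self) -> list[tuple[date, date]]:
        return list(zip(self._dates, self._dates[1:]))
```

Mapping each period index to a contract type (for example, alternating default and fixed contracts) would be a separate step and is left out. browsergrip's objection still applies to this sketch: removing one boundary silently merges two periods.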
| | But maybe that's just because the "model" we are simplifying from | was not an optimal representation. Perhaps there's a better model | of the second example that doesn't end up with the defect of the | first example. | | I really like this article but am struck by how something that I | wanted to be almost a silver-bullet trick for modeling ends up | being a mass of compromises mired in tradeoffs that doesn't show | any clear way forward in the general case. Still, it's probably a good | rule of thumb, but I guess this rule is not optimal...as it can | have so many unworkable misinterpretations/misapplications. | | It would be cool to see a list of, like, "Programming | Heuristics", ranked by decreasing general applicability, of which | this rule was a member somewhere far down the list. | chrisweekly wrote: | "It would be cool to see a list of, like, "Programming | Heuristics", ranked by decreasing general applicability, of | which this rule was a member somewhere far down the list." | | +1 for this! Anyone got good links to share? | JangoSteve wrote: | > In that case, applying the remedy of the first example (a set | of dates, and inferring that every 2nd (even zero length, to | account for adjacent fixed) interval will be default,) | introduces another bug where if you lop off any random date in | that set or list, you invert everything. | | This is a good point, and something I usually describe as | recoverable versus non-recoverable problems. If I use | start/end dates in the first example instead of just a set of | start dates, then I can always create application-level or | database-level constraints that don't allow either overlapping | or incomplete segments. When business rules change, I can | delete the constraints and update the business logic as | necessary with no change to underlying data structures.
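JangoSteve's "recoverable" point, sketched in Python for brevity (in a database this would be a query or a constraint). With explicit (start, end) pairs, a bad segment is still visible after the fact and can be found; with start dates alone, the same mistake leaves nothing to detect. The find_problems helper is hypothetical, and ends are assumed to be exclusive:

```python
from __future__ import annotations

from datetime import date


def find_problems(segments: list[tuple[date, date]]) -> list[tuple]:
    """Hypothetical helper: report inverted, overlapping and gapped segments.

    Segments are explicit (start, end) pairs with exclusive ends, so
    (a, b) followed by (b, c) counts as contiguous.
    """
    problems: list[tuple] = []
    ordered = sorted(segments)
    for start, end in ordered:
        if end < start:
            problems.append(("inverted", (start, end)))
    # Compare each segment with the next one in start-date order.
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            problems.append(("overlap", (s1, e1), (s2, e2)))
        elif s2 > e1:
            problems.append(("gap", (s1, e1), (s2, e2)))
    return problems
```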
| | However, if I miss implementing a constraint and it erroneously | allows overlapping or incomplete segments, I can easily run a | query to identify all such invalid entries. Then I can | investigate and decide how to fix them. | | However, if I go with the start-date-only set-based approach, | and miss implementing a constraint, and it leads to a deleted | date creating incomplete segments... I'm screwed. There will be | no query you can do to identify incomplete segments to | investigate or fix, because all segments are assumed to extend | to the next start date. You can irreversibly lengthen one | segment by deleting another, because of a forgotten constraint | that should have prevented the change. | | These could both be errors on the developer's part, depending | on the requirements at the time, but one data design may lead | to more non-recoverable issues than the other. Add in the | flexibility of the former approach, and I'd probably be more | likely to implement the former approach than the one proposed. | moron4hire wrote: | This violates one of my core, learned-the-hard-way, database | design constraints: querying of rows should never have to rely on | any linear dependence on any other row in the same table. If you | ever have to do some sort of inner join of a table on itself to | bring in the single "next" row to tell you information about the | current row, then you are stuffed. Query complexity explodes. | Performance takes a nosedive. | | The end date of a contract is a property of _that_ contract, not | any other. | | Forgetting all other contracts for a moment, what do you need to | know about one contract? There should be a straightforward way | to query that contract on its own, with a query that represents a | tree through tables in the database. It should not become a | graph, with the potential for cycles that graphs allow.
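In databases that support SQL window functions (SQLite 3.25 or later, PostgreSQL, and others), the "next row" lookup moron4hire describes does not strictly need a self-join: LEAD() reads the following row within an ordered partition. A small sketch via Python's sqlite3 module, with an invented contract_starts table. Whether this answers the performance concern depends on the database, the data size and the indexes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: only start dates are stored; end dates are derived.
CREATE TABLE contract_starts (
    customer TEXT NOT NULL,
    start_date TEXT NOT NULL,
    kind TEXT NOT NULL,
    PRIMARY KEY (customer, start_date)
);
INSERT INTO contract_starts VALUES
    ('acme', '2020-01-01', 'default'),
    ('acme', '2020-03-01', 'fixed'),
    ('acme', '2020-06-01', 'default');
""")

# LEAD() returns the next row's start_date within each customer's ordered rows,
# so each contract's end date is derived without joining the table to itself.
# The last contract has no next row, so its end_date is NULL (open-ended).
rows = conn.execute("""
    SELECT customer, kind, start_date,
           LEAD(start_date) OVER (PARTITION BY customer ORDER BY start_date) AS end_date
    FROM contract_starts
    ORDER BY customer, start_date
""").fetchall()
```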
| | And I get that the business requirements could rule out overlaps, but | gaps are clearly possible if a customer leaves for a while and | then comes back later. Does that person then need to become a new | "customer", because you don't allow gaps? And then is your | customers' PII only allowed to be registered to a single account? | Comcast has been a huge pain in the ass in years past because of | moving, gaps, and email address reuse. | ajuc wrote: | This is part of an approach to programming that is more popular in | the functional world. | | You take the requirements and make the system fit these | requirements exactly and don't bother with any other concerns. | | When you design this close to the requirements you get better, | faster and more elegant code that's easier to understand - but | when requirements change you have to do much more work to adapt. | Suddenly a state which was previously invalid is valid, or a part | of the system that only needed one kind of input needs 4 different | inputs from separate parts of your code. Have fun basically | rewriting your program. | | That's IMHO the main motivation behind the differences between functional | and OO programming - how close to the requirements you want to | design your code. | pjmlp wrote: | This kind of approach (type-driven development) was already | popular in Algol-derived languages, which is why the cowboy coders | would accuse those of us on the Ada, Modula-2, Object Pascal side of the | fence of programming in a straitjacket. | ajuc wrote: | I don't think it maps 1-1 with strong vs weak typing. | | C (a very loosely typed language) code is usually pretty close | to the requirements and very strongly typed functional | languages (like Haskell) are often "make a DSL and write the | specification in it, then run it".
| | Meanwhile object-oriented languages often have pretty strong | type systems and cultures of using them extensively, but they | also encourage designing with margin for changes (and thus | using lots of layers of abstraction instead of just | implementing the specification as elegantly as possible). | mlthoughts2018 wrote: | "Make invalid states unrepresentable" is a type of fool's gold. | You don't care that invalid states are unrepresentable; you only | ever care that a specific instance of your running program is | very unlikely to enter an invalid state - and the difference | between formally disallowing invalid states vs. test coverage | that proves a reasonable likelihood of avoiding invalid states is | huge. | | The extra code and conceptual complexity spent to make type | designs that disallow invalid cases is a liability: it comes with | its own bugs, maintenance and huge risks of premature abstraction | and brittleness in the face of changing requirements. | | If it takes anything more than a simple enum-style menu of | permitted options, then it's a code smell. Things like Scala case | classes (especially with sealed behavior), or pattern matching | against type constructors, or phantom types - these are all very | bad ideas, where the costs far outweigh the benefits. | | Most of the time you can just ignore enforcement of assumptions, | and add a few assert statements plus lightweight unit tests and | integration tests that generate an abundance of real-world | example cases - and achieve all the safety you need for a | fraction of the code & conceptual complexity and tech debt | incurred by false promises of enforcing correctness with type | system designs. | nefitty wrote: | Software engineer here. | | Does anyone have any questions? | codeulike wrote: | lol, you should post this in every thread | nefitty wrote: | It's not going well | codeulike wrote: | Wait, is kevinmahoney.co.uk your blog?
| nefitty wrote: | No, just a software engineer | [deleted] | codeulike wrote: | Dude, this is HN - we're all either software engineers, | or people pretending to be software engineers | | edit: or related hangers-on, like entrepreneurs or | "thought leaders" | nefitty wrote: | I can answer questions for VCs then I guess | [deleted] | alch- wrote: | It was a joke, son. | codeulike wrote: | But jokes aren't allowed here | 8note wrote: | Sure: what's the value in $, engineer time, number of clients | calling, etc, where this is a good tradeoff or a bad tradeoff? | alch- wrote: | Lol, nice one. | judofyr wrote: | In general I agree that it's nice to make invalid states | unrepresentable, but I'm not sure if I agree that this counts as | a fundamental "invalid state". There is nothing about contracts | which require that you can only have one active at the same time, | or that that current one must be open ended. | | From a practical point of view it might be advantageous if you | maintain only a single contract with a customer at all times, but | that is a _business_ requirement which might be changed in the | future. | | I mention this mostly from experience: Multiple times I've | designed systems where I've reduced the representable states to | the minimum, and when some requirements change I realize I have | to re-design the full system. | | The new presented representation might make sense in _this_ | situation, but I 'd be very wary of taking current business | practices and make all other alternatives _impossible_ to | represent. It 's a balancing act of course as you can go in the | opposite direction and make it way too flexible. | | > This poor choice was not just a theoretical problem - gaps in | contracts were found on more than one occasion, requiring hours | of engineering effort to hunt down and fix. | | I'd like to hear more about what happened here. Was the problem | that the default contract was not re-applied correctly? 
If so, | changing the representation might not actually solve any problems | -- it may actually make it _worse_. A renewal of a contract | typically involves some automated process where other services | are involved (payment, invoicing, emails). The previous | representation (with explicit start/end dates) made it possible | for you to verify that everything was correct and lined up. | goto11 wrote: | > Multiple times I've designed systems where I've reduced the | representable states to the minimum, and when some requirements | change I realize I have to re-design the full system. | | Yes, if requirements change, you change the design and code to | support the new requirements. | | Compromising the consistency and maintainability of the current | design to accommodate a hypothetical future requirement change | is a bad trade-off IMHO, since you can't predict the future. A | requirement change may happen in a completely different | direction than the one you anticipated, and then you have the | worst of both worlds. | | It is better to make code _maintainable_ than to make it | _flexible_. | judofyr wrote: | > Yes, if requirements change, you change the design and code | to support the new requirements. | | Code and representation (i.e. schema) are vastly different. | In my experience it takes an order of magnitude longer to | change a representation than to change code. Once there are | multiple services/tools which work with a representation you | typically have to support both the new and the old | representation at the same time (since you can't roll out | everything simultaneously). | | > Compromising the consistency and maintainability of the | current design to accommodate a hypothetical future | requirement change is a bad trade-off IMHO | | Designing a representation which can handle possible | requirement changes does not necessarily mean "compromising | the consistency and maintainability". We have great tools for | ensuring consistency (e.g.
transactions and constraints in | SQL databases), and I don't exactly see how this new | representation is more "maintainable" than the old one | (although we don't have all the information in this article). | skybrian wrote: | Getting good at data migrations (with tools and processes | to do this) can pay off. It's a more general way of | preparing for the future than attempting to anticipate | specific changes. | | On the other hand, some may say YAGNI. | tonyarkles wrote: | > since you can't predict the future | | That's one of those statements that makes sense, but is often | not true. Very rarely does a client come to me with a | feature request that _does_ require a pretty significant | design change; most of the time they're changes that | were foreseen. | | Using this current article as an example, I love the way that | they're storing the intervals to guarantee that they can't | overlap. That's awesome! What I would likely end up doing, | though, is use that as the underlying representation but | still return individual interval objects through the query | API with a start and end date on each interval. That way, if | the "only one at a time" rule changes, the changes required | are localized. | hondo77 wrote: | > What I would likely end up doing, though, is use that as | the underlying representation but still return individual | interval objects through the query API with a start and end | date on each interval. | | How the model you present to the user is represented in the | database is an implementation detail. | philwelch wrote: | > Using this current article as an example, I love the way | that they're storing the intervals to guarantee that they | can't overlap. That's awesome! What I would likely end up | doing, though, is use that as the underlying representation | but still return individual interval objects through the | query API with a start and end date on each interval.
| | The article addresses that concept: | | > It is sometimes still useful to represent the periods as | a sequence of start and end dates. It is trivial to project | the set of dates in to this form. As long as the canonical | representation is the set, the constraints will still hold. | philwelch wrote: | I like to call this "speculative complexity". I've seen many | cases where speculative complexity was added, and persisted | for a long time, for reasons that fundamentally mispredicted | the way the system would evolve and actually inhibited that | evolution. | cle wrote: | I've seen this too. And I've also seen the converse-- | complexity that was added because engineers refused | anything but the most myopic designs, using thought-terminating | cliches like "YAGNI" or "that's hypothetical". | | "Never think ahead" is obviously not good advice. There's | no silver bullet here--we have to think about how likely | future scenarios are, and plan for them based on the | business context and needs. Many of them are unlikely or | too costly to do anything about...and many of them aren't. | jpindar wrote: | But then you'd have to have domain knowledge, and why | would you learn domain knowledge when the ideal career is | a new job in a new industry every few months. </HN> | goto11 wrote: | Under what circumstances did following YAGNI lead to | added complexity? | refactor_master wrote: | I think it happens if you misinterpret YAGNI as "you don't need to | change this later". Then your code becomes rigidly hard-coded, | instead of having easily configurable variables and | arguments. The over-engineered solution (the real YAGNI) | was an interface, an object, methods and fields, only | serving a very niche purpose with a lot of boilerplate.
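The article's claim that it is trivial to project the set of dates into start/end form, and tonyarkles' suggestion of returning interval objects through the query API, can be sketched in C roughly as follows. Assumptions not taken from the article: dates are integer day numbers, each period runs from one start date up to (but not including) the next, and the latest period is open-ended, marked with a hypothetical `OPEN_ENDED` sentinel; `project_periods` is an illustrative name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sentinel meaning "no end date yet" for the latest period. */
#define OPEN_ENDED INT_MAX

typedef struct {
    int start; /* inclusive day number */
    int end;   /* exclusive day number, or OPEN_ENDED */
} Period;

static int cmp_int(const void *a, const void *b) {
    const int x = *(const int *)a;
    const int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Project a set of start dates into [start, end) periods. Sorts `dates`
   in place and skips duplicates (it is a set). `out` must have room for
   n entries. Returns the number of periods written. */
static size_t project_periods(int *dates, size_t n, Period *out) {
    qsort(dates, n, sizeof *dates, cmp_int);
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (m > 0 && out[m - 1].start == dates[i])
            continue;                  /* duplicate date */
        if (m > 0)
            out[m - 1].end = dates[i]; /* previous period ends here */
        out[m].start = dates[i];
        out[m].end = OPEN_ENDED;
        m++;
    }
    return m;
}
```

Because every end date is derived from the next start date, the projected periods cannot contain a gap or an overlap; the set stays the canonical form and the intervals are only a view over it.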
You add a bool | parameter to a function to modify one small bit of what | it does, then another one, and you add some new return | value, and before long, you've got a class with several | getters and instance variables represented as code in a | single function, with parameters controlling which actual | method is run. | thewebcount wrote: | A real-world case I recently ran into where thankfully we | realized we _would_ need it and added it is versioning. A | struct you pass to or receive from an API that has a | version in it means the difference between being able to | make changes to the internals without changing the | externals and not being able to. We didn't _need_ the | version field until the second version was released. Had | we just said, "Oh, we aren't going to need it," on the | first version we would have been boned. | jax_the_dog wrote: | Not OP, but it would add complexity because that method | you "weren't going to need" turns out to actually be | needed. | | Now you have to work around your simplified design | because you decided that you didn't need anything more. | dnautics wrote: | Correctly judging this tradeoff is what makes the | difference between a good architect and a great architect. | There are definitely cases where I have put in a bit of | effort (a week or so's worth of programming) to make things | flexible because, due to the requirements, I knew they would | be necessary in the 9-12 month timeframe (I've also been | wrong about architectural decisions). Then when the time | came around, it was painless to make the transition. | | I suppose if you were cynical, you could claim that if it's | painless no one sees how important you are. And then you | wind up leaving the company, because they think everything | is easy and don't provide you with the autonomy to achieve | what you need to make their system work. And then they | discover that it's actually hard. | hackerfromthefu wrote: | I agree very much with your comment.
| | I've found the fundamental principle that helps to keep the | system extensible is to make your system model the real world | accurately. | | This involves building system concepts that map closely onto | real-world concepts without taking shortcuts. | | That way, when the requirements change, all the fundamental | pieces of the system stay valid, and only the piece that is | changing tends to need updating. | | This helps to avoid the problem you mentioned of needing to | redesign the full system, keeping the system extensible. | ahoka wrote: | Then the code will be changed. Making up business requirements | is the number one reason for instant legacy code. Code is not | set in concrete; you can add that flexibility later when it is | needed, but making everything overly generic to make it easier | to "implement new requirements" only leads to code that is hard | to change, in my experience. Also don't forget that this is only | an example. | fennecfoxen wrote: | A few basic invariants about how your code is structured do | a lot more than making it excessively "generic". Writing | simple components that actually adhere to the | single-responsibility principle (and composing them into more | complex logic) doesn't just help out the mysterious future. | It makes your existing codebase easier to understand and to | validate up front. | chii wrote: | The YAGNI (you ain't gonna need it) principle overrides the | DRY principle imo. | chrisweekly wrote: | Yes - as does AHA (Avoid Hasty Abstractions). IMHO. | pc86 wrote: | This is partly why I've found it helpful to wait until | there are at least 3 _identical_ (not nearly identical) | implementations of something before trying to make a more | generic/abstracted version of it. | dllthomas wrote: | I think these rules of thumb are... okay.
But I think | it's more helpful to go back to how DRY was initially | defined ("Every piece of knowledge must have a single, | unambiguous, authoritative representation within a | system") and ask whether what I'm dealing with is | actually "a piece of knowledge." There can be 10 | identical copies, and if they just _happen_ to be | identical but they represent different things that might | change independently, they should probably remain 10 | identical, independent copies. Alternatively, if there | are exactly two places something occurs in the code, but | the system breaks when they're out of sync, | you should think about unifying. | | DRY is too often treated as purely (or principally) | syntactic, when that's actually much less useful. | pc86 wrote: | This is a great way to look at it | bluGill wrote: | Computers have been widespread for somewhere between 30 and 70 | years (there is reason to debate, but whatever). Nearly | everything has been done before: what you are doing isn't | fundamentally new. There are lots of opportunities to add | minor new features, reliability, or better user | interfaces. But the fundamentals of what you are doing | aren't new anymore. You can look at your past versions and | what competitors do for guidance on what you will | probably need next. If you have any broad knowledge of | your problem domain, you can make reasonable guesses as to | what you will need and what you won't need. When | replacing a subsystem I know if there will be 100 users | of it in the future or if it is a leaf with 1 user - | because I know what the old crufty subsystem has (in the | first case I spend days thinking about the interface; in the | second I design the interface when I integrate the one | subsystem). | [deleted] | dkarl wrote: | The requirements here aren't clear, but I'm guessing the | requirement is to model the contracts the company actually | has with customers.
| | The business _also_ tells you that there are never two | contracts running at the same time. But are you actually | going to believe that? Is this condition really | "impossible"? | | A vital and necessary factor here is whether the system being | designed has complete control of the creation of contracts. | This is perhaps taken for granted by the author, but it's too | important to leave implicit. You have three choices in a | situation like this: _make_ it impossible for contracts to | overlap, model it, or don't model it and accept the | consequences. Depending on the frequency and the consequences | of the assumption being incorrect, maybe it's acceptable not | to model it. Maybe not. My point is that you can't assume | something is impossible unless you can actually prevent it | from happening, and the author should not have sidestepped | this part of the analysis (though possibly they meant for it | to be understood that contract creation happens through this | data model). | | > Also don't forget that this is only an example. | | The problem is that this is an example meant to illustrate | and justify a rule of thumb, but it's extremely, extremely | simple. How often do you deal with requirements that are this | simple, this mathematical? Is this really the kind of example | you want to build a rule of thumb from? | | Realistically, when I hear requirements like this, I assume | they're wrong (very common at the beginning of a project) and | I get together with the product manager and ideally a domain | expert representing the customer (if the product manager | isn't too territorial about that being their job) and figure | out what the hell the actual requirements are. What if a | customer has a contract to rent 10 units of space at $5 and | in the middle of that contract needs 5 more units but | the price has gone up to $10? Do you tell them they have to | cancel the existing contract at $5 and pay $10 for all their | units if they want to add some?
Or give them the new units at | the old price? Or is it okay to represent the same customer | by distinct customer records? | | I do like the principle of making invalid states | unrepresentable, but I would like to add two supplementary | principles: | | _1. Oftentimes what the business tells you about the data | they produce is purely aspirational._ | | "There will never be overlapping contracts," often means, "We | swear we're going to stop creating overlapping contracts, and | this time we really mean it." You have to follow up with | questions like, "How often have we had overlapping contracts | in the past? When was the most recent occurrence?" You should | even ask, "When do we anticipate signing the next one?" A | logically-oriented software developer might expect someone to | take offense if you respond to "we don't sign contracts like | that" with "Do you have any currently in the pipeline?" but | this is a totally normal kind of question to ask. | | _2. When users give you a rule in their business | requirements, they often take it for granted that the | software will handle exceptions to the rule gracefully._ | | They don't necessarily appreciate how bad things can go in | software when something "impossible" happens. When they say, | "Contracts will never overlap," you have to say, "What should | happen when they do?" If you are talking to a mathematician | or a programmer this might come off as questioning their | competence, but most people will not find it unusual at all | or at least will appreciate that the question is motivated by | experience rather than disrespect. It's not like a math | problem in school; it is legitimate to question the givens. | arethuza wrote: | I once did weeks of work trying to unpack what my employer | meant by the term "customer" - if your customers are large | companies with hundreds of legal entities and hundreds of | locations across the world things can get pretty complex | pretty fast e.g. 
does "never two contracts running at the | same time" mean that you can't have a contract with a | subsidiary and another separate subsidiary of the same | parent (which might make sense for credit-checking | purposes)? What about subsidiaries in different countries? | What about partly owned subsidiaries... | [deleted] | stickfigure wrote: | I think the fundamental problem here is that the table/entity | is incredibly badly named. What kind of contract has only three | fields?! This isn't a case of YAGNI; no real-world "contract" | is this simple. | | It appears to actually be some sort of contract_period or | contract_duration, and probably has a link to a real "contract" | object somewhere that contains the real meat of the concept. | But it's hard to tell what's literal and what's the author | simplifying the example for us. | layoutIfNeeded wrote: | This. | | Basically this is what event sourcing tries to solve: it lets | you change your state representation to reflect new | requirements, because you can always rebuild everything from | the event log. | steve_g wrote: | There are trade-offs, of course, but I'm generally not a fan of | using implicit defaults for business applications (i.e., the | application infers the default when there's no data). | | If things go well, business data outlives business | applications. After years or decades, it can be a major pain to | figure out all the "secret" values that aren't actually in the | data. | Zenst wrote: | > There is nothing about contracts which require that you can | only have one active at the same time, or that that current one | must be open ended. | | Agreed, and yet that flexibility still eludes many systems. | | One example I personally experienced was changing a phone | contract. The contract had run for many years, so it could be | cancelled at any time with one month's notice. The new plan and | contract were much better, and yet a limitation came into play when doing | this.
It turned out that the telco's systems were unable to | activate the new contract until the old contract had ended. | A new contract could be physically signed in a shop with | a start date of the day of signing and logged into the system, | but the provisioning backend was unable to activate it until the old | contract had ceased, as you can't have two contracts for the | same phone number. | | That, I do believe, is a case where, whilst some things can run in | parallel, others are locked to a single dependent resource. | | But every rule has an exception; it is with good design that | you limit those exceptions' impact. | tlarkworthy wrote: | Hard agree. The type system is no place for business logic. If | you want to get fancy, maybe some database constraints you can | change later. Business logic is constraints over the | representation... not the representation itself. | | Also, in general, the principle of "make invalid states | unrepresentable" ends with the realization that only Idris can | properly do it, which is not pragmatically useful. | bcrosby95 wrote: | Absent business requirements, I would love to see what you | think fundamental invalid states for a contract would be. Every | property of a contract I can come up with seems like a business | requirement. | seer wrote: | While this is great if you know exactly what you want to achieve, | it does "lock you in" to those constraints on a more fundamental | level. More times than I can count I've seen business | requirements change to require those "unrepresentable" states, | and since you've now designed your whole data model around it you | need to add awful hacks to make it work. | | The timeline example is actually very telling. A lot of times | you'd actually want to encode overlapping time periods at the | edges. | | You'd be laughed out of a meeting where a business asks about | this and you smugly explain how it is unrepresentable.
| | I guess what I'm saying is that it might be worth over-designing | your system a bit to leave you some wiggle room, unless you have | hard guarantees that something should be "impossible". | ImprobableTruth wrote: | I agree that this is an issue, but I think the answer is simply | using a very flexible basic data representation (which admits | invalid state) and then using predicates to refine it, e.g. | starting with a list of (start,end) intervals and then adding | predicates for valid intervals (start <= end), ordered, | non-overlapping and continuous. | | If any of the requirements change, it's easy to either add more | predicates or relax/even outright remove them. | jerf wrote: | If you play your cards right, you can even get your type | system to alert you to every place in your code that needs to | change when you change the constraints. | | I don't deny there's a certain art to that, and I can't | explain it all myself. But I don't even necessarily mean | amazingly clever type tricks like you might see in Haskell or | something; I mean something as simple as "I've changed | the definition of what a 'Customer' is in some fundamental | manner, so I'm going to rename that class to 'CustomerNew', | use the compiler to point me at every single place that | breaks, audit it, change the local name to CustomerNew, and | then, once everything is fixed, use my IDE's rename feature | to rename CustomerNew back to Customer before my final | commit". Many times you can get by just by renaming a field | or something to similar effect, but in the worst case you may | need to audit everything. | | It's one of the more tedious bits of the job sometimes, but | net-net this can still be a timesaver, if you account for the | full cost of the trickle of bugs this sort of thing can | prevent. | [deleted] | pc86 wrote: | Even "hard guarantees" are worthless. All it takes is one | client with a checkbook to change business requirements.
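ImprobableTruth's alternative, a permissive list of (start, end) intervals refined by predicates, might look roughly like the following C sketch. The predicate names and the `RULES` table are illustrative assumptions rather than anything from the thread; dates are integer day numbers and intervals are half-open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Permissive representation: any array of periods can be stored. */
typedef struct {
    int start;
    int end;
} Period;

typedef bool (*PeriodsPredicate)(const Period *p, size_t n);

static bool each_valid(const Period *p, size_t n) { /* start <= end */
    for (size_t i = 0; i < n; i++)
        if (p[i].start > p[i].end) return false;
    return true;
}

static bool ordered(const Period *p, size_t n) {
    for (size_t i = 0; i + 1 < n; i++)
        if (p[i].start > p[i + 1].start) return false;
    return true;
}

static bool non_overlapping(const Period *p, size_t n) { /* assumes ordered */
    for (size_t i = 0; i + 1 < n; i++)
        if (p[i].end > p[i + 1].start) return false;
    return true;
}

static bool continuous(const Period *p, size_t n) { /* assumes ordered */
    for (size_t i = 0; i + 1 < n; i++)
        if (p[i].end < p[i + 1].start) return false;
    return true;
}

/* The current business rules, checked in order; `ordered` comes before
   the two predicates that rely on it. */
static const PeriodsPredicate RULES[] = {
    each_valid, ordered, non_overlapping, continuous
};

static bool satisfies_rules(const Period *p, size_t n) {
    for (size_t r = 0; r < sizeof RULES / sizeof RULES[0]; r++)
        if (!RULES[r](p, n)) return false;
    return true;
}
```

Relaxing a rule when requirements change (for example, allowing gaps) means deleting `continuous` from `RULES`; the stored data and its shape stay the same.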
| erpellan wrote: | That could be fixed with 1 additional concept. Instead of | Customer: 1 - 1: Set(Dates) | | What if it was Customer 1 - *: Contract: 1 - 1: Set(Dates) | | Overlap achieved. | kpmah wrote: | I think you are vastly overstating the risks of changing | requirements. It is usually easy to go from a less permissive | model to a more permissive one, but the opposite is often | difficult. | the-smug-one wrote: | You could always split your types into frontend and backend | types where the backend ones are more open and the frontend | ones are more restricted. I don't necessarily mean FE/BE as on | the web. A lot of code is only interested in shuffling around | data anyway; the shape is fairly uninteresting. | nemetroid wrote: | I would put it the other way around: make your basic | representation restricted, but present a more permissive API. | This way, the data model helps enforce your constraints, but | you don't need to redesign the API when the requirements | (inevitably) change. | gambler wrote: | Kind of like internal data/algorithms and external interface | in OOP? | TooCreative wrote: | Additionally, turning (startDate, endDate) into a set of dates | will make the code more complex in some places. Before: | SELECT event FROM events WHERE endDate<"2021" | | After: Whatever additional complexity you | add to your codebase to query end dates. | zimpenfish wrote: | It's not _hugely_ complex, although definitely more than the | first example. I guess it depends whether it's offset by the | benefits... | | https://dbfiddle.uk/?rdbms=postgres_11&fiddle=50e6a963cd1db0... | | (YMMV, obvs.) | dghf wrote: | It's not that much more complex: if you do | SELECT event FROM events WHERE startDate < "2021" | | then all but one of the results (the one with the greatest | `startDate`) will also have an implicit end-date prior to | 2021. | mobjack wrote: | Doing all the logic in SQL requires more complexity using | subqueries.
| | It gets uglier if you need to find the contract valid on a | certain date based off of a join. | | These issues can be covered up with code, as it will be | easier to have reusable functions, but it makes the job of | a data analyst much more difficult and error-prone. | dghf wrote: | Uglier than having to find the record after the one | you're inserting (so you can determine your new record's | end date from the subsequent record's start date) _and_ | the record before (so you can modify its end date to | match your new record's start date)? | peteradio wrote: | Until business tells you that endDate is not necessarily | greater than startDate. <- real-world experience | NegativeLatency wrote: | Sounds interesting; can you explain more about it? | jdmichal wrote: | This is fine for an open-ended query like the one given, | because you still receive all the relevant data. But if | you're looking at a range, for the same reason you have one | extra at the end, you also have one _missing_ at the | beginning. And you can't just filter away missing data. | noisy_boy wrote: | I don't see how the contracts example is simplifying things while | staying realistic. What if I need to have separate kinds of | default contracts for different classes of customers? What if I | need to modify the details of a certain type of default contract | for all consumers that are using it? And on and on. If storing | them in the contracts table is not good (which I don't really agree | with), then where should we store this variety of default | contracts? How do I join them with the contracts table to know which | consumers have default contracts? Or group consumers by type of | default contract? | | Keep the model sensible. Contracts belong to contracts. Add basic | sanity to the model. The service that manages the data guards the | data beyond basic data model sanity checks. Also, explicit is | better than implicit.
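With only start dates stored, the lookups mobjack and jdmichal describe reduce to finding the greatest start date not after a given day. The C sketch below is hedged: it assumes a sorted in-memory array of integer day numbers, and the function names are hypothetical. It shows both the point lookup and why a range query must also pull in the one period that began before the range, the row a plain `startDate >= from` filter would miss.

```c
#include <assert.h>
#include <stddef.h>

/* `starts` holds the sorted start dates (integer day numbers); period i
   runs from starts[i] up to starts[i + 1], and the last is open-ended.
   Returns the index of the period active on `day`, i.e. the greatest
   start <= day, or -1 if `day` precedes every start date. */
static long active_period(const int *starts, size_t n, int day) {
    size_t lo = 0, hi = n; /* binary search for the first start > day */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] <= day)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}

/* Inclusive index range [*first, *last] of periods overlapping the
   half-open range [from, to), assuming from < to. The first one is the
   period already running on `from`, which a plain "start >= from"
   filter would drop. The range is empty when *first > *last. */
static void periods_in_range(const int *starts, size_t n, int from, int to,
                             long *first, long *last) {
    const long f = active_period(starts, n, from);
    *first = f < 0 ? 0 : f;
    *last = active_period(starts, n, to - 1);
}
```

In SQL the equivalent usually needs a correlated subquery or a window function such as `LEAD`, which is the extra complexity the comments above are weighing.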
| gambler wrote: | _> I think this happens because of atomistic, object-orientated | thinking._ | | If you think storing a list of date tuples is "OOP thinking", you | have no clue what OOP really is. Educate yourself by listening to | the people who invented it, not Java consultants or FP zealots. | | OOP is about interacting with things via interfaces and messages, | rather than data. An OOP solution to inconsistencies of this sort | would be an interface that either automatically corrects | inconsistencies or throws errors when you try to introduce them. | _The whole point_ of the OOP approach is that you're not locked into | a single data representation, so, for example, you can improve | how you store data without re-engineering everything in your | system that relies on that data. | dudul wrote: | Just use the language of the business/domain to write your | model/API. No need to reinvent names and rules that already | exist. Do whatever you want with the underlying implementation, | your DB, etc. | 1-more wrote: | A great talk on this subject is "Making Impossible States | Impossible" by Richard Feldman from Elm Conf 2016 | | https://www.youtube.com/watch?v=IcgmSRJHu_8 | unwind wrote: | This is sound advice for sure, and I think it applies much more | broadly (or do I mean deeply?) than just for databases. | | For instance, one micro-application of it that makes a lot of | sense to me is the const-ness of variables in languages like C. | Since a normal variable can be overwritten, and that affects the | use and semantics of that variable, marking variables as const | whenever possible really helps in my opinion.
| | For instance, take this micro-snippet of code from Redis [1]:
|
|     int time_independent_strcmp(char *a, char *b) {
|         char bufa[CONFIG_AUTHPASS_MAX_LEN], bufb[CONFIG_AUTHPASS_MAX_LEN];
|         /* The above two strlen perform len(a) + len(b) operations where either
|          * a or b are fixed (our password) length, and the difference is only
|          * relative to the length of the user provided string, so no information
|          * leak is possible in the following two lines of code. */
|         unsigned int alen = strlen(a);
|         unsigned int blen = strlen(b);
|         unsigned int j;
|         int diff = 0;
|
| Here, it seems quite important that the values of 'alen' and | 'blen' do not change during the execution of the function, since | the rest of the function relies on them. The 'diff' variable, on the other hand, | is intended to change as a function of all the characters in both | strings; that's the whole purpose of the function. | | So, I think the middle two lines should be:
|
|         const size_t alen = strlen(a);
|         const size_t blen = strlen(b);
|
| That "locks" the values in, so you know that for the rest of the | function at least these two values stay the same. Since changing | either length mid-function would represent an invalid state, I | think this is close to the OP's point. | | Also please note that I have massive amounts of respect for Redis | and Antirez; I'm not trying to say that the code is bad or | anything. It was simply the first file in the first high-profile | open source project that came to mind. Obviously this code | _works_ and has probably been more tested than most things I've | written. Again, I'm NOT trying to somehow paint that program(mer) | in a bad light. | | Btw, changing the type (to me) to size_t is also an obvious, | free improvement, since it frees the reader from having to worry | about why the type was unsigned int to begin with.
Also 'unsigned int' can | be narrower than 'size_t', which again is probably not a problem | _in practice_ since CONFIG_AUTHPASS_MAX_LEN is probably | always going to be much smaller, but still. It's pointless | complexity that triggers anxiety in people like me. :) | | [1]: https://github.com/redis/redis/blob/unstable/src/acl.c | Kaze404 wrote: | There's a great talk by Richard Feldman about this | in the context of Elm. | https://www.youtube.com/watch?v=IcgmSRJHu_8 | UglyToad wrote: | I was pleased and a bit surprised to see this post talk about a | database level approach to this problem. As important an idea as it | is at the application code level, where most posts discuss it, | especially in the context of type systems like Haskell, I think | it gets neglected when it comes to persistence. | | For those of us developers who are mere CRUD peons, I think it's | the most important factor in system stability, yet it is mostly | neglected in favour of either speed of iteration (NoSQL) or | checks at the application code layer. | | As I'm increasingly coming to appreciate, systems without | enforced integrity at the database level are a breeding ground | for bugs. You can add checks in application code, but all it takes | is 1 bad commit, or 1 check that slipped your notice, and now you | have bad data and all future code in the system needs to support | and work around the bad data. With foundations of sand, even the | most elegant structure in application code is doomed to a short | and catastrophic future. | | As other commenters mention, hindsight is 20:20 and you won't | always know what the constraints should have been until after the | fact, or the constraints might be wrong. But 'trendy' | development practices treat good old-fashioned SQL constraints | and data integrity as decidedly unsexy, to the detriment of a lot | of systems.
| | MySQL didn't even have check constraints (well, it didn't actually apply | them) until version 8, which shows how ignored these things are. I | appreciate the post is more about the fundamental design of the | stored data, but people are also forgetting unique constraints, | foreign keys and all the other tried and tested tools which | protect the most important part of most CRUD systems, the data, | from devolving into an awful mess. | sakoht wrote: | Presuming you are 100% sure what "invalid" is: it is possible | that being _unable_ to represent a logic error might mean that a | logic error is represented as a "valid but incorrect" value, | which is even more dangerous. | | Take the example of storing a continuous series of date ranges. | If I only store the first date of each pair, I can never | accidentally have an overlap or gap. But if my code has a logic | error that incorrectly calculates a range, being able to | represent the bad range means a consistency check could catch it | and throw an error. If that code error translates | to an incorrect break-point instead, I haven't prevented a bug, | I've hidden it. | JoeAltmaier wrote: | Surely the data model is helpful not only in keeping data | integrity intact by disallowing invalid state, but also in helping | you think about your data and discover simplifications and | subtle rules to improve your model. | | I attacked the old 8-queens problem years ago as part of a | contest in Byte Magazine. All the solutions published modelled | the board as an 8x8 array with a 1 or 0 to indicate the | presence of a queen. They all ran slowly and suffered from invalid | game states confusing the algorithms. | | My solution was to observe that only 1 queen could be in each | column (a solution requires that no queen can capture another, | and queens can capture vertically). So I represented | the board as an array of 8 values: the array index is the column, | and the value is the height of the queen in that column.
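The column-array encoding described here, together with the permutation search the comment goes on to describe, can be sketched in C as follows. It uses 0-based rows rather than the comment's values 1-8, and the function names are illustrative. Because `rows` is always a permutation, same-row and same-column attacks cannot be represented at all; only the diagonal test remains, and a branch is abandoned as soon as that test finds a clash.

```c
#include <stdlib.h>

/* rows[c] is the row (0..n-1) of the queen in column c. Returns 1 if
   the queen in column `col` shares a diagonal with an earlier queen. */
static int diagonal_clash(const int *rows, int col) {
    for (int j = 0; j < col; j++)
        if (abs(rows[col] - rows[j]) == col - j)
            return 1;
    return 0;
}

/* Permute rows[col..n-1] in place by swapping, pruning a branch as
   soon as the queen just placed clashes on a diagonal. */
static long search(int *rows, int col, int n) {
    if (col == n)
        return 1;
    long count = 0;
    for (int k = col; k < n; k++) {
        int t = rows[col]; rows[col] = rows[k]; rows[k] = t;   /* choose */
        if (!diagonal_clash(rows, col))
            count += search(rows, col + 1, n);
        t = rows[col]; rows[col] = rows[k]; rows[k] = t;       /* undo */
    }
    return count;
}

/* Number of solutions on an n x n board (buffer allows n <= 32,
   though large n is slow). */
long count_queens(int n) {
    int rows[32];
    if (n < 1 || n > 32)
        return 0;
    for (int i = 0; i < n; i++)
        rows[i] = i;   /* seed with 0, 1, ..., n-1 */
    return search(rows, 0, n);
}
```

`count_queens(8)` returns 92, the well-known number of solutions to the eight-queens puzzle.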
| | Further, since only one queen can be in each row, the values were | the numbers 1-8. | | My solution, then, was to seed the array with the values 1, 2, 3, | 4, 5, 6, 7, 8, then test whether array[i]-array[j] == (i-j) or (j-i), | which would mean a diagonal capture. | | Simply permuting the values searched a subset of board states | that had to contain all possible solutions. And a branch of the permutation | tree could be pruned as soon as the test found a diagonal capture for any (i,j). | | Anyway, the program was tiny and finished in negligible time. A | pity I didn't enter the contest! | ChrisMarshallNY wrote: | I'm not sure that this is what I think about when I think of | "Making Invalid States representable" (a concept that I | practice). | | That said, it's an excellent, commonsense article that describes | a highly usable approach to information architecture. | | I also agree that OOP programmers have always considered their | designs to "represent the 'Real World'(tm)." In my experience, I | use OOP constructs to represent many things that should never be | exposed to the user (like messages, adapters, states, and state | transitions). | | There's the classic usability concept of the "Mental Model." That | is the model that the user builds in their head as they navigate | the UX. These mental models can be drastically different from | what happens internally, and a good UX designer can reinforce a | desired model (which the user may then ignore). | ChrisMarshallNY wrote: | Um..." _un_ representable". :P | H8crilA wrote: | So Google's protocol buffers have this feature called "required" | fields, which enforce schema in the type system. You should never | use it. Never. It's one of those things that sound good until | you're a few years into the project. Similar to how you should | never use meaningful IDs as primary keys for objects; always | use meaningless fingerprint-like integers.
Or how all integers | should be signed unless you're dead sure the number is unsigned | (like a fingerprint). And how many integers should actually be | strings, unless you're dead sure this is a number (externally | provided IDs, such as customer account IDs, are not | numbers). Or how you should be careful to use bytes rather than | unicode strings. | | Make your schema permissive and your code paranoid; it will pay | off later. Build a data linter if necessary, but don't lock down the | schema. | sathorn wrote: | This might make sense for a transport schema because you can | receive messages from the past or the future, but it does not | translate to internal program state or database schemas, where | this is not the case. | | Making invalid states unrepresentable is basically the process | of taking human-checked invariants and turning them into | type-checked invariants. This reduces the likelihood of bugs and | guides humans to use the system correctly. | nemetroid wrote: | I feel that this advice is almost opposite to that given in the | article. By making all fields optional, your data model no | longer helps in making invalid states unrepresentable. | H8crilA wrote: | Exactly! Because today's invalid states are commonplace in | the future. | srtjstjsj wrote: | "required" is an example of making an invalid state | _representable_ and having it ruin your program. | dodobirdlord wrote: | Yeah, a better example of making invalid state unrepresentable | in Google's protocol buffers is to use the "oneof" feature to | mark that a set of fields are mutually exclusive. If A, B, | and C are mutually exclusive, you can put them in a oneof, | which also saves space in the binary representation. If in the | future you discover that A and B together (but not C) need to be a | valid state, you can add a 4th AB option inside of the oneof. | titanomachy wrote: | That feature's been dropped in proto3; all fields are optional.
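The `oneof` idea above maps onto a tagged union in C. The sketch below uses a hypothetical three-way payload: a union holds only one member at a time and the tag records which one is live, so "A and B both set" or "none set" cannot be built through these constructors. If an AB combination later becomes valid, it would be added as a fourth kind, and GCC with `-Wall` (which enables `-Wswitch`) flags every enum `switch` without a `default` that does not yet handle it.

```c
#include <stdio.h>

/* Exactly one of A, B or C is present; the tag says which. */
enum payload_kind { PAYLOAD_A, PAYLOAD_B, PAYLOAD_C };

struct payload {
    enum payload_kind kind;
    union {              /* storage for one member at a time */
        int a;           /* live when kind == PAYLOAD_A */
        double b;        /* live when kind == PAYLOAD_B */
        const char *c;   /* live when kind == PAYLOAD_C */
    } u;
};

/* Constructors keep the tag and the live member in agreement. */
struct payload payload_a(int a) {
    struct payload p = { .kind = PAYLOAD_A, .u.a = a };
    return p;
}
struct payload payload_b(double b) {
    struct payload p = { .kind = PAYLOAD_B, .u.b = b };
    return p;
}
struct payload payload_c(const char *c) {
    struct payload p = { .kind = PAYLOAD_C, .u.c = c };
    return p;
}

/* No default case on purpose: with -Wswitch, a newly added kind that
   is not handled here produces a compiler warning. */
int payload_format(const struct payload *p, char *buf, size_t len) {
    switch (p->kind) {
    case PAYLOAD_A: return snprintf(buf, len, "A=%d", p->u.a);
    case PAYLOAD_B: return snprintf(buf, len, "B=%g", p->u.b);
    case PAYLOAD_C: return snprintf(buf, len, "C=%s", p->u.c);
    }
    return -1;   /* not reached while every kind is handled */
}
```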
| jdmichal wrote: | > And how many integers should actually be strings, unless | you're dead sure this is a number. | | I phrase this as: If it doesn't make sense to do math on it, | it's not a number. What does adding one to a customer account | number mean? Absolutely nothing -- you get a completely | different account number. So it's not a number, but a numeric | string. | lukasLansky wrote: | It's not a string either, though: in the same way integer | addition (almost always) does not make sense, string | concatenation (almost always) does not make sense either. The | proper type would allow for equality checks and explicit | string (de)serialization only. | jdmichal wrote: | I meant string in a more casual sense. In a more | technical sense, it would be a symbol. I'll be more precise | in the future. | trevor-e wrote: | Can you elaborate more on the "required" fields point? We've | been using a similar feature for several years now in APIs at | my work and haven't run into any issues, though we do only use | it very sparingly for fields that logically can never be | missing. At some point a client has to make the call for what | they consider essential, so pushing it into the schema makes this | less ambiguous from what I've seen. Maybe it's fine for our | use-case (mostly static APIs), whereas what you're saying is | good advice in general. | srtjstjsj wrote: | https://capnproto.org/faq.html#how-do-i-make-a-field- | require... | | Required now means required forever because people can't | migrate safely. But technically you can change a protocol | descriptor from required to optional, which is invalid | (usually, in a distributed non-transactional system, the | common kind), but nothing stops you from doing it. So why not | make required forever? Well, do you really want to commit to | anything _forever_?
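The "equality and explicit (de)serialization only" type suggested above can be approximated in C by wrapping the identifier in a struct, so that `id + 1` no longer compiles. In this sketch `account_id` and its functions are hypothetical, and the 1 to 16 ASCII-digit format is an assumption made for illustration. Storing the text also keeps leading zeros, which an integer would silently drop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical account identifier: arithmetic on it does not compile;
   only parsing, equality and conversion back to text are offered. */
typedef struct {
    char digits[17];   /* up to 16 digits plus the terminating NUL */
} account_id;

/* Parse from text; returns 0 on success, -1 unless s is 1..16 ASCII digits. */
int account_id_parse(const char *s, account_id *out) {
    size_t n = strlen(s);
    if (n == 0 || n >= sizeof out->digits)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (s[i] < '0' || s[i] > '9')
            return -1;
    memcpy(out->digits, s, n + 1);   /* keeps leading zeros */
    return 0;
}

int account_id_equal(account_id x, account_id y) {
    return strcmp(x.digits, y.digits) == 0;
}

/* Serialization back to text. */
const char *account_id_str(const account_id *id) {
    return id->digits;
}
```

Under this design "00123" and "123" are different accounts, even though they would compare equal as integers.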
| trevor-e wrote: | After reading that article, my take-away is not that | "required" is bad and should never be used, but rather | that it was bad for how Google wanted to use it. And since this | is Google's project, it makes sense for them to remove the | feature if it's causing data center outages; it's not worth | the risk at that point. | | For example, in the case of the message bus they say "And | even though the message bus doesn't care about message | content", and later on "The right answer is for | applications to do validation as-needed in | application-level code." Strict schema and validation are most helpful | for application developers, not some middleware routing | code. Was it not possible for them to write a parser that | doesn't fully validate the message for use-cases like this? | dodobirdlord wrote: | Protocol buffers already require you to commit to some | things forever, like the type of a field, or whether two | fields belong in a oneof together. I'm not saying that | "required" was a great feature, but it's not exactly | unique. | joshuamorton wrote: | No, they don't. An optional field can be deprecated and | replaced with a different field. This can be done to | change the type (also some types can be changed, although | you probably shouldn't). | | Required usually cannot be deprecated. | yongjik wrote: | I may be in the minority, but after happily using protobuf for | years, I believe that there's nothing inherently wrong with | required fields - instead, what's "wrong" is the protocol | buffer API. | | Namely, when constructing a protobuf, theoretically, there | might be two different ways: (A) first gather all the fields, | and then construct the protobuf from these fields; (B) first | construct an empty protobuf, and fill in the fields as | necessary. The actual protobuf uses (B) - which is convenient | in most cases, because when you start constructing a protobuf | you usually don't have all the data ready yet.
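Construction style (A) above, gathering every required field before anything is built, can be sketched in C with a constructor that takes the required fields as parameters. The `order` type and its fields are hypothetical, and this is plain C, not protobuf-generated code; optional data is layered on afterwards with an explicit presence flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: two required fields and one optional one. */
struct order {
    long customer_id;    /* required */
    long amount_cents;   /* required */
    bool has_note;       /* presence flag for the optional field */
    const char *note;    /* meaningful only when has_note is true */
};

/* Style (A): the required fields are parameters, so going through
   order_new never yields an order with them unset (C itself cannot
   stop code from bypassing the constructor). */
struct order order_new(long customer_id, long amount_cents) {
    struct order o = { customer_id, amount_cents, false, NULL };
    return o;
}

/* Optional data is added afterwards and returns an updated copy. */
struct order order_with_note(struct order o, const char *note) {
    o.has_note = true;
    o.note = note;
    return o;
}
```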
| | However, with required fields, this means when you construct | the protobuf it starts with all required fields missing - i.e., | an invalid state! | | I'm not sure what the best way to fix it is, because it would be | infeasible to rewrite all the code to gather all the fields and | then construct the protobuf - and it would also be hugely | inefficient in many cases. However, I feel the "no required | fields" rule is essentially a null pointer (the "billion dollar | mistake") in disguise - the actual problem is that the API | doesn't enforce type safety. | joshuamorton wrote: | This isn't actually the issue with required fields (some | languages, like Java and (usually) Python, use a | construct-once style). | | Imagine you have an innocent `required` field. You have a | producer and a consumer of that field that communicate over | the wire (or, instead of the wire, imagine a database). | | You send or store an instance of that protobuf. Now let's say | that you want to make the field optional (or remove it). With | an already-optional field, this is easy. You stop setting it, | and maybe eventually you clean it up. | | With a required field, however, you can't do that. If any of | your _clients_ don't have the newest schema version, you | can't unset the field (so imagine that you support mobile | clients who may never update). Or if there's middleware you | don't know about that introspects your proto. Even if you do | the dance right and update your server and client before you | stop setting the field, you could crash outdated middleware | that you didn't know about. Whoops! | | Or with the database, you now need to dual write or something | complex, because if you need to roll back to an older version, | you'd be unable to _read_ the protos that don't include the | required field. | | Required doesn't do well over time. It has nothing to do with | setting the values.
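The schema-evolution problem described above can be sketched with a hypothetical decoded message in C whose `region` field a newer producer has stopped setting. A reader that treats the field as required rejects every new message, while a reader that treats it as optional falls back to a default and keeps working with both old and new producers. The struct, field, and function names are illustrative, not generated protobuf code.

```c
#include <stdbool.h>

/* Decoded form of a message; has_region records whether the field
   was present on the wire. */
struct shipment {
    bool has_region;
    int region;
};

/* Treats region as required: messages from a producer that stopped
   setting it are rejected outright (returns -1). */
int region_strict(const struct shipment *s, int *out) {
    if (!s->has_region)
        return -1;
    *out = s->region;
    return 0;
}

/* Treats region as optional: a missing value falls back to a
   caller-chosen default instead of failing the whole message. */
int region_or_default(const struct shipment *s, int fallback) {
    return s->has_region ? s->region : fallback;
}
```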
| grogers wrote: | If you have old clients that expect that field you are | removing to be there in a meaningful way, you still have to | update all clients before you can stop setting it. Having | the protobuf schema itself use optional or required doesn't | change that; it just makes the dependency explicit there, | instead of only in the code at the endpoints. | | Changing required to optional isn't a magic fix for | protocol compatibility. If it were (for your limited use | case), you could just make that change to the protobuf on the client | side, as it doesn't affect the wire | representation/interpretation. | joshuamorton wrote: | > If you have old clients that expect that field you are | removing to be there in a meaningful way | | Right, there's the rub. `required` means that _anyone who | deserializes your proto_ falls into this category. That's | a much larger group than "anyone who reads a specific | field". | | (Note also that there are lots of ways to make reading an | empty field fall back to some reasonable, | non-catastrophic behavior; required doesn't let you do | those things.) | [deleted] | dahauns wrote: | This line raised a huge red flag for me: | | "If the customer doesn't have a fixed contract, it is assumed | they are on a default contract" | | No. Don't assume, specify. Explicitly. | | If this is part of your specification, have a DefaultContract | entity of some kind somewhere. And don't call this table just | "Contracts"; make it clear that those exist in addition to, or | overlay, a default contract. | | It might sound like overkill, but in my experience in business | application development, one of the single largest and most | painful sources of errors and refactoring headaches is implicit | assumptions in the data model. | Justsignedup wrote: | Yeah, there's a lot to be said for being explicit.
For example, | what if, even on a default contract, you want to start tracking | payment adherence, or maybe sales information, attributing | it to sales' numbers? | | Or maybe you start reporting and want reports on contracts vs | default contracts. | | And then things get messy, because the real world gets messy. | pdonis wrote: | _> If this is part of your specification, have a | DefaultContract entity of some kind somewhere._ | | Yes, but not in the database table for contracts. That's how I | read this part of the post. I would expect the assumption that | if the customer doesn't have a fixed contract, they are on a | default contract to be encoded in business logic somewhere in | an application that uses this database. | dahauns wrote: | > _I would expect the assumption that if the customer doesn't | have a fixed contract, they are on a default contract to | be encoded in business logic somewhere in an application that | uses this database._ | | That's exactly the kind of harmful assumption I'm talking | about. Harmful when people act on that assumption and | actually implement it that way. | | How a default contract might be represented may vary, and of | course it is in no way required or even sensible for it to be stored | as a row in the contracts table. | | But to think that it is such a fundamentally different kind | of data that it should be represented apart from the rest, in | a different system, in a different layer, in a completely | different form of representation - this is where madness | lies. | pdonis wrote: | _> How a default contract might be represented may vary, | and of course it is in no way required or even sensible to | be stored as a row in the contracts table._ | | But you're saying it does need to be represented somewhere | in the database? That putting it anywhere else is harmful? | | Can you elaborate? Why is it harmful?
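A sketch of a middle ground between the positions in this subthread: default terms are not stored as contract rows, but the lookup result always states whether a customer's terms are fixed or default, so no caller has to infer "default" from a missing row. This is roughly what a database view could present. The C types, the rate field, and the default values are hypothetical placeholders.

```c
#include <stddef.h>

enum contract_source { SOURCE_FIXED, SOURCE_DEFAULT };

struct contract {
    int customer_id;
    int rate_cents;
};

/* The presented value carries where the terms came from. */
struct effective_contract {
    enum contract_source source;
    struct contract terms;
};

/* Placeholder default terms; customer_id is filled in per lookup. */
static const struct contract DEFAULT_TERMS = { 0, 1500 };

/* Returns the customer's fixed contract if one exists, otherwise the
   default terms, labelled SOURCE_DEFAULT rather than left implicit. */
struct effective_contract effective_contract_for(const struct contract *fixed,
                                                 size_t n, int customer_id) {
    struct effective_contract e;
    for (size_t i = 0; i < n; i++) {
        if (fixed[i].customer_id == customer_id) {
            e.source = SOURCE_FIXED;
            e.terms = fixed[i];
            return e;
        }
    }
    e.source = SOURCE_DEFAULT;
    e.terms = DEFAULT_TERMS;
    e.terms.customer_id = customer_id;
    return e;
}
```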
| pdonis wrote: | _> to think that it is such a fundamentally different kind | of data that it should be represented apart from the rest, | in a different system_ | | The concept of a "contract" is a business logic concept to | begin with. That concept is represented in the application | already. How (or whether) the data associated with a | particular contract is stored in the database is an | implementation detail. | kpmah wrote: | > have a DefaultContract entity of some kind somewhere | | This is the OO mindset described at the bottom - the odd | compulsion to have a reified entity for every concept. | | A lot of people have missed that the representation you persist | doesn't have to match the representation you present. In this | case, don't store default contracts, but present them, e.g., via | a database view. | dahauns wrote: | But none of this has to do with OO. This is an argument at | the relational level. | | And yes, if there's a concept as fundamental to the | business model as a default contract that's in effect when no | other contract overrules it, then IMO it damn well should be | represented explicitly in the _persisted_ data model. | | I didn't talk about the specific nature of the | representation. The important part is that the intent of the | data should be explicit - data lives longer than code. | | I've seen far too many DB schemas leaning too hard on | implicit assumptions and inferred information, which leads to | hard-to-understand data models, unnecessarily complex (and hard | to optimize) data access (no matter the paradigm), and, well, | lots of errors. ___________________________________________________________________ (page generated 2020-10-05 23:01 UTC)