[HN Gopher] Applying "make invalid states unrepresentable"
       Applying "make invalid states unrepresentable"
       Author : fanf2
       Score  : 352 points
       Date   : 2020-10-05 08:43 UTC (14 hours ago)
 (HTM) web link (kevinmahoney.co.uk)
 (TXT) w3m dump (kevinmahoney.co.uk)
       | yxhuvud wrote:
       | Not certain I get the needless attack on OOP in this text. The
       | error could just as well have happened in any alternative to OO.
       | What is needed is the realization that there is a schedule that
       | needs to have full control over the times to not
       | create coordination issues. That is a realization that is utterly
       | independent of the OOP-ness of the eventual solution.
         | brbrodude wrote:
         | It's as if OOP code negates putting thought into designing a
         | solution; an absolute non sequitur to me.
         | kpmah wrote:
         | The problem with a narrow OO mindset is that it encourages
         | encapsulation and atomic objects with hidden state. Simply
         | gluing those pieces together can create suboptimal
         | representations - a more holistic thought process is better.
           | jschwartzi wrote:
           | Not necessarily. Some people can't see the forest for the
           | trees no matter what kind of programming language they're
           | using.
           | What's needed here is to sit down and think REALLY hard about
           | how the system works and what some of the terms are that are
           | used to describe things. And also to have an intuition for
           | when someone is using imprecise language. GP cited use of a
           | "Schedule" object as a way to enforce the invariant. That
           | might represent a block of intervals by a sequence of times.
           | The less precise you are about what's needed and how the
           | system should work, the more of a soupy mess you're going to
           | build.
           | People gravitate toward OO because OO makes it feel like
           | you've got a lot of conceptual clarity. You can get back to
           | that familiar "subject/object" dual we love in English. But
           | the problem is actually that, although people can pick
           | subjects and objects readily from a sentence, they might not
           | have much luck picking the important subjects and objects
           | from a sentence. And that's the problem we really need to
           | solve.
       | tomstuart wrote:
       | My favourite real-world example is something like: you have a
       | React component which accepts the boolean props `showFoo`,
       | `showBar` and `showBaz` to control its mode. The intention is
       | that exactly one of `foo`, `bar` or `baz` will be shown at any
       | given time, but that invariant is maintained only loosely, e.g.
       | by a parent component which holds three flags in its state and
       | mutates them in tandem inside various event handlers.
       | The obvious problem is that those three props can easily get out
       | of sync if you make a mistake in updating them, and the equally
       | obvious solution is to replace them with a single prop (e.g.
       | `show` or `mode`) which contains a value from some enumeration --
       | maybe just the name of the thing to show, or a numeric ID that is
       | given meaning elsewhere, or (galaxy brain) a component. That way
       | the invariant is maintained strongly and automatically by the
       | representation itself.
       | This example sounds vacuous -- who would ever use three booleans
       | in the first place? -- but in practice it's very easy for UI code
       | to incrementally get into this mess over time without anyone
       | noticing. The situation is also often more subtle, e.g. the
       | invariant is more complex than "exactly one flag is true", which
       | makes it harder to spot that you can model all the valid states
       | with a finite enumeration.
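
A minimal Python sketch of the refactoring described above. The thread's example is a React component; the names `Mode`, `LooseProps`, and `StrictProps` here are hypothetical. Three booleans can hold eight combinations, of which only three are valid, while a single enumeration-valued field can hold only the three valid ones. Python does not enforce annotations at runtime, so this assumes a static type checker is in use.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Mode(Enum):
    """Hypothetical enumeration: exactly one thing is shown at a time."""
    FOO = "foo"
    BAR = "bar"
    BAZ = "baz"


@dataclass(frozen=True)
class LooseProps:
    """Three independent flags: the invariant depends on caller discipline."""
    show_foo: bool
    show_bar: bool
    show_baz: bool

    def is_valid(self) -> bool:
        # The intended invariant: exactly one flag is set.
        return (self.show_foo, self.show_bar, self.show_baz).count(True) == 1


@dataclass(frozen=True)
class StrictProps:
    """A single field: every well-typed value is a valid state."""
    mode: Mode


# Enumerate every state each representation can hold.
loose_states = [LooseProps(*flags) for flags in product([False, True], repeat=3)]
invalid_loose = [s for s in loose_states if not s.is_valid()]
strict_states = [StrictProps(m) for m in Mode]

print(len(loose_states), len(invalid_loose), len(strict_states))  # 8 5 3
```

Five of the eight boolean combinations are invalid and have to be guarded against somewhere; the enumeration leaves nothing to guard.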
       | leoc wrote:
       | I appreciate what people are getting at with "make invalid states
       | unrepresentable", but it can't be the best description of what
       | the true objective is. After all, the first language to make
       | invalid states unrepresentable was probably TECO
       | https://en.wikipedia.org/wiki/TECO_(text_editor)#As_a_progra...
       | ...
         | chrisweekly wrote:
         | Regarding "make invalid states unrepresentable", I'm curious
         | what others think of FSM (Finite State Machines), eg use of
         | XState (vs eg Redux) in the FE webapp state mgmt space.
       | legulere wrote:
       | You could also see this as a form of single point of truth.
       | teddyh wrote:
       | This is very much like database normalization, in that it has the
       | benefit of making invalid data impossible, but the drawback of
       | often making queries into the data much more cumbersome and
       | usually also inefficient.
       | As with database normalization, it is a good idea to first do it
       | as much as possible, and then denormalize again until it is fast
       | enough.
         | rocqua wrote:
         | I have always had the idea of a database that does the
         | denormalizations you want automatically for you.
         | Essentially, you keep the DB in a normalized state. You define
         | views of the DB that you want. Then the DB keeps those views as
         | tables for you, and the DB does all of the hard work of keeping
         | those view tables consistent with the normalized data.
         | Essentially the DB does the atomicity, cache-invalidation, and
         | cache-updating for you.
         | You get performance, and you get the certainty that invalid
         | states are un-representable.
         | I guess the biggest blocker here is in automatically
         | determining what fields do and do not need to be updated
         | automatically?
           | Serow225 wrote:
           | Check out the Noria project/database from MIT, I think you'd
           | like it :)
           | https://github.com/mit-pdos/noria
           | https://corecursive.com/030-rethinking-databases-with-jon-
           | gj...
           | https://notamonadtutorial.com/interview-with-norias-
           | creator-...
           | https://pdos.csail.mit.edu/papers/noria:osdi18.pdf
       | tzs wrote:
       | A large fraction of the comments are about how if you do this and
       | then someday your requirements change you might have to redo your
       | underlying data structures or databases, with the implication
       | being that you should therefore make those as general and
       | flexible as possible.
       | That reminds me of an interesting point I saw in a book whose
       | title and author escape me. He said one of the reasons you
       | encounter so many bad designs in Java programs is that many new
       | Java programmers look at the design of Swing to learn good
       | design.
       | He wasn't saying that Swing is badly designed--but Swing is a
       | framework/library, not an application. What it takes to be a good
       | framework/library is different than what it takes to be a good
       | application.
       | If you are writing an inventory management application you can
       | design your tables and data structures and interfaces around
       | things inventory management applications need. If you are writing
       | a medical billing application you can design around what medical
       | billing needs.
       | If you are writing a framework or library that might be used by
       | inventory management applications and medical billing
       | applications and all the other nearly infinite kinds of
       | applications people will write you need to keep it very general
       | and flexible...but you also have to keep it fast and not too
       | bloated. It's a much harder design problem, with different best
       | practices for what is good design and what is not.
         | hinkley wrote:
         | There is a trick that I wish all senior staff knew but we find
         | ourselves having to teach as a matter of routine.
         | 1) Don't advertise what you're not selling
         | 2) Don't sell everything that you've made
         | It's possible to write software that has a public contract that
         | allows only a subset of states that the internal system allows.
         | You can use this to support one customer that has a requirement
         | that is mutually incompatible with another customer's, but you
         | can also use it for migrations. One should be able to create an
         | API where the legal values in the system are strictly limited,
         | but the data structures and storage format may have affordances
         | to support migration.
         | That doesn't necessarily solve the problem of communication
         | between services and migrating these changes into a running
         | system, but it's a useful tool. If you've ever had a coworker
         | who insists that your service call diagrams look like a tree or
         | an acyclic graph, this problem is certainly one of many reasons
         | they may be insisting on this. With a DAG there is an order in
         | which I can deploy things that has a prayer (but not a
         | guarantee) of letting all of the systems understand each other
         | during each increment of deployment.
         | People have come up with alternative solutions for this problem
         | by employing sophisticated sets of feature toggles, and in some
         | ways this is superior, but it trades the number of steps (each
         | of which has a potential for human error, and consumes calendar
         | time) for increased reliability on average.
       | secondcoming wrote:
       | It looks like a variation of Run Length Encoding to me, where the
       | 'Run' is a duration instead of a count.
       | wa1987 wrote:
       | Related video on this subject:
       | https://www.youtube.com/watch?v=IcgmSRJHu_8
         | matsemann wrote:
         | I thought about that video today when integrating with an old
         | SOAP API. I need to find the name+some property of some
         | persons. Instead of having a list with [(name1, prop1), (name2,
         | prop2),...], I get two distinct lists of [name1, name2,..] and
         | [prop1, prop2,..]. In practice I think the lists will always
         | match. But there is nothing stopping them from not being the
         | same length, or even worse: one having a gap..
           | vbezhenar wrote:
           | Put an assertion. It's better to throw early rather than
           | dealing with wrong data later.
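
A small sketch of the "throw early" advice, assuming the API hands back two parallel lists. The function name `pair_up` and the string element types are hypothetical. It checks the lengths once at the boundary and then converts to a single list of pairs, so the rest of the code never sees the two-list form.

```python
from typing import List, Sequence, Tuple


def pair_up(names: Sequence[str], props: Sequence[str]) -> List[Tuple[str, str]]:
    """Turn two parallel lists into one list of (name, prop) pairs.

    Raises ValueError immediately if the lists cannot be paired one-to-one.
    """
    if len(names) != len(props):
        raise ValueError(
            f"parallel lists differ in length: {len(names)} names, {len(props)} props"
        )
    return list(zip(names, props))


people = pair_up(["alice", "bob"], ["admin", "guest"])
print(people)
```

A length check only catches a missing element. It cannot detect a gap that the API filled with a placeholder, or two lists of equal length in a different order.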
       | wruza wrote:
       | Ah, good old design for systems with no special business cases.
       | Heard of them, but never seen one.
       | etripe wrote:
       | This is a good introduction on a conceptual level.
       | I think a large contributor to the problem is story-oriented
       | development, where all that matters in the sprint is "getting it
       | done" and not looking at the broader context.
       | To make unrepresentable states practical, Scott Wlaschin has an
       | excellent write-up here [0]. His book (plugged in that article)
       | is also excellent.
       | [0] https://fsharpforfunandprofit.com/posts/designing-with-
       | types...
         | lmm wrote:
         | > I think a large contributor to the problem is story-oriented
         | development, where all that matters in the sprint is "getting
         | it done" and not looking at the broader context.
         | I think that's exactly backwards. This kind of overcomplicated
         | representation usually happens because people put too much
         | effort into designing their representations up front. If you
         | follow story-oriented development and only implement the parts
         | they actually need to get the current task done, you never end
         | up with these wasteful extra states because you never actually
         | needed them. But people think that planning before coding is
         | somehow virtuous, and then they're tied to following those
         | plans.
           | acidbaseextract wrote:
           | The representational debt the parent comment is referring to
           | is akin to that quip, "if I had more time, I would have
           | written a shorter letter."
           | I find that both types of representational debt are common.
           | jka wrote:
           | To be fair, both are possible scenarios:
           | - A team spends too much time over-engineering a
           | representation for something that could be more easily
           | maintained using a simple model
           | - A team spends too little time considering the edge cases
           | with a representation because they feel pressure to deliver
           | the feature within a short space of time
             | matthewmacleod wrote:
             | And they often happen at the same time, too. I've seen
             | situations where developers over-plan a system in advance,
             | then spend ages hacking smaller features and changes into
             | it to meet quick turnaround times, instead of taking a proper
             | step back and being aware of when their original design
             | needs to be revised.
           | Aeolun wrote:
           | Our database gets shit added on an 'as necessary' basis, and
           | I guarantee you it's not a great thing.
           | dtech wrote:
           | the flip side of this is that sometimes your initial designs
           | need to be expanded to account for something new and that's
           | hard, leading to tech debt and hacks.
           | personally I think that's a better trade-off than
           | implementing something complicated up-front that you don't
           | know will work and ending up with flexibility in the wrong
           | places, leading to tech debt and hacks.
           | pjmlp wrote:
           | I assure you that I have wasted enough time fixing story-
           | oriented development with major refactorings, because some
           | edge cases weren't possible to be easily extended in the
           | existing code.
           | Also have experience fixing story-oriented development with
           | dirty workarounds, because major refactorings were also
           | required, but not desired by whoever was paying for the stories.
           | Both ways I didn't care, it was money in the bank anyway.
         | thsealienbstrds wrote:
         | It doesn't sound to me like the kind of thing that any
         | particular WoW is going to fix. As if it would suddenly
         | naturally dawn on someone to do this given enough time. It's
         | just lack of knowledge and/or discipline.
         | maest wrote:
         | Sum types are one of the main things I miss when working in
         | Python. Is anyone aware of any good ways of adding sum types to
         | Python?
           | masklinn wrote:
           | The entire point of sum types is that they be statically
           | checked. Without static typing, they don't seem really
           | useful: Erlang doesn't have sum types, and it doesn't really
           | have a use for them until it gets a type system which can
           | leverage them. Instead it models "sum types" as tagged tuples
           | e.g.
           |     {ok, Ok} | {err, Err}
           | _however_ Erlang has very good pattern matching. Python...
           | doesn't.
           | There's a PEP but I stopped following it because the
           | discussion was a mess. And it's apparently now split into 3
           | different PEPs, I don't know whether that's an improvement or
           | not though.
           | Furthermore Python's dislike of HOFs means you can't really
           | do "monadic" processing as you'd do in, say, smalltalk where
           | your "variants" would really be subtypes with cool higher-
           | order messages. So you're mostly just adding indirections.
           | nybble41 wrote:
           | I'm not sure you would consider this a _good_ way, but one
           | can implement (unchecked) sum and product types in any
           | language with lexical closures via Scott encoding[1]:
           |     # data Pair a b = Pair a b
           |     def pair(x, y):
           |         return lambda f: f(x, y)
           |
           |     # data Either a b = Left a | Right b
           |     def left(x):
           |         return lambda l, r: l(x)
           |     def right(y):
           |         return lambda l, r: r(y)
           |
           |     v0 = pair(2, 3)
           |     v1 = left(7)
           |     v2 = right(9)
           |
           |     # case v0 of { Pair x y -> x + y }
           |     print(v0(lambda x, y: x + y))
           |
           |     # case v1 of { Left x -> x + 1; Right y -> y * 2 }
           |     print(v1(lambda x: x + 1, lambda y: y * 2))
           | [1] https://en.wikipedia.org/wiki/Mogensen%E2%80%93Scott_enco
           | din...
             | maest wrote:
             | Wow, this looks clever. (maybe a bit too clever)
             | I'll need to read more about it, thanks for the pointer!
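
One other common way to approximate sum types in Python, sketched here under the assumption that a static checker such as mypy is run over the code: model each variant as a frozen dataclass and the sum as a `Union`. The names `Ok`, `Err`, `Result`, `parse_positive`, and `describe` are hypothetical. As noted earlier in the thread, nothing is enforced at runtime; the benefit comes from the checker narrowing the type in each branch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: int


@dataclass(frozen=True)
class Err:
    message: str


# The "sum": a Result is either an Ok or an Err, never both.
Result = Union[Ok, Err]


def parse_positive(text: str) -> Result:
    """Hypothetical example function that returns one of the two variants."""
    try:
        number = int(text)
    except ValueError:
        return Err(f"not an integer: {text!r}")
    if number <= 0:
        return Err(f"not positive: {number}")
    return Ok(number)


def describe(result: Result) -> str:
    # Dispatch on the variant; a type checker can narrow `result`
    # to Ok in the first branch and to Err in the second.
    if isinstance(result, Ok):
        return f"ok: {result.value}"
    return f"error: {result.message}"


print(describe(parse_positive("42")))
print(describe(parse_positive("abc")))
```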
         | choeger wrote:
         | > I think a large contributor to the problem is story-oriented
         | development, where all that matters in the sprint is "getting
         | it done" and not looking at the broader context.
         | I think you have a point here. This design offers much better
         | safety, comparable to "parsing instead of validating". But it
         | requires up-front design. And that is indeed "verboten" in
         | modern software development management style.
         | Why is it "verboten"? I think it has to do with two fundamental
         | concepts of scrum et al.:
         | 1. Stories that are "ready" just need to be "implemented". The
         | implication is that the developer does not design and
         | everything is orthogonal. There are no interdependencies,
         | maintenance effort or non-functional requirements.
         | 2. A story that is "ready" focuses solely on the desired
         | outcome for some selected examples. There is no generalization
         | of the examples and consequently little to no abstraction.
         | I think these issues stem from the fact that Scrum et al. are
         | intrinsically tools for managers to isolate them from the
         | complexities of software engineering. Every metric of scrum,
         | for instance, like "progress" or "definition of ready" is
         | essentially empty of meaning for software engineering.
           | chrisweekly wrote:
           | +1, favorited.
           | aunty_helen wrote:
           | I like how you've said it here, but one thing that scrum
           | doesn't have in it is relief from your professional duty as
           | an engineer.
           | If a system needs to be designed in a particular way, do so.
           | That's how long it takes and that's why in planning you
           | discuss how it will be designed.
           | The design of how you're going to build the system is taken
           | care of before the task is split into easily digestible bits
           | that meet a definition of ready.
           | If you need to build it before you know how to build it,
           | scrum allows for research spikes into this. A focused,
           | timeboxed interval so that you have the ability to estimate
           | the actual difficulty of work to be completed.
       | jl6 wrote:
       | I like the concept but I've seen a fair few examples of where the
       | developers and users clearly had differing opinions about which
       | states are invalid!
       | Dates are a rich vein of examples. Some users will happily
       | consider "25th December" to be a date, without any year, because
       | it might be the name of a folder in which they store their
       | Christmas stuff. More seriously, genealogists or historians may
       | want to record "25th December" (again without a year) as the date
       | of a photograph because they can clearly see it was taken on _a_
       | Christmas, they just don't know which one. The naive developer
       | would just slap a DateTime type onto the system and feel good
       | about having avoided malformed input.
         | IggleSniggle wrote:
         | This is so obvious now that you've said it. I can see so many
         | possibilities. I feel as though my eyes have been opened.
         | kroltan wrote:
         | But then you're not dealing with dates at all, just categories
         | which happen to have names that look like dates, no?
         | Like you say, a "folder". But the photo file's metadata will
         | either have a complete DateTime, or none at all, unless there
         | is some sort of camera that is able to know what day of the
         | year it is without knowing the year! Which due to things like
         | leap years is impossible.
           | IggleSniggle wrote:
           | Importantly, these "date-like" things have important date-
           | semantics. That is, it may be reasonable to expect your
           | software to be able to handle varieties of precision or
           | completeness of date metadata, in which case your date
           | representation may need to be able to interact with DateTime
           | fluidly without actually being one itself, despite it being
           | tempting to remove these possibilities by making incomplete
           | date information unrepresentable.
           | will4274 wrote:
           | > some sort of camera that is able to know what day of the
           | year it is without knowing the year! Which due to things like
           | leap years is impossible.
           | Just a small note, as we're talking about assumptions, but
           | this is clearly untrue. Consider - for decades, humans wore
           | watches that knew the day of the month, but not the month -
           | you were just expected to turn the day forward on the 1st day
           | of months following non-31 day months. Similarly, we can
           | imagine disposable cameras that ask the user for the current
           | day on first use and simply assume leap years don't exist,
           | and require the user to correct the date for any leap years.
           | You might call this a silly design, but systems often have to
           | interact with external systems that have silly designs. I
           | believe I actually owned a toy PDA (a device for a child, not
           | an adult) that did not have a year back in 2003 or so.
           | rocqua wrote:
           | > But then you're not dealing with dates at all, just
           | categories which happen to have names that look like dates,
           | no?
           | If you and your client disagree about what constitutes a
           | date, that doesn't mean "your client is wrong" it means "you
           | need better communication". You can't fix this problem by
           | requiring everyone use consistent definitions. Instead the
           | solution is to check assumptions as often as possible.
         | lkitching wrote:
         | If you only have a day and month like 25th December you should
         | represent it with a type that contains just that information.
         | Java for example has the MonthDay class (https://docs.oracle.co
         | m/javase/8/docs/api/java/time/MonthDay...) which can be
         | converted into a local date when the additional information is
         | available. If your users want to refer to that as a date then
         | that should be handled in the UI but should not lead to
         | ambiguity in the internal representation.
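
A rough Python analogue of the idea above, as a sketch only. Python's standard library has no equivalent of Java's `MonthDay`, so the class and its `at_year` method below are hypothetical. The type carries exactly a month and a day, and becomes a full `date` only when a year is supplied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MonthDay:
    """A day of a month with no year attached (hypothetical type)."""
    month: int
    day: int

    def __post_init__(self) -> None:
        # Validate against a leap year so that 29 February is accepted.
        # date() raises ValueError for impossible combinations such as 30 Feb.
        date(2000, self.month, self.day)

    def at_year(self, year: int) -> date:
        """Combine with a year once it is known.

        Raises ValueError for 29 February in a non-leap year.
        """
        return date(year, self.month, self.day)


christmas = MonthDay(12, 25)
print(christmas.at_year(1987))
```

Raising for 29 February in a non-leap year is a design choice made for this sketch; clamping to 28 February would be an equally defensible one.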
         | isbvhodnvemrwvn wrote:
         | Also in genealogy:
         | - estimated dates
         | - calculated dates (e.g. someone was 30 in 1870, so he was born
         | in "calculated 1840")
         | - unreadable or unavailable months or days (typically recorded
         | as 1980-00-13)
         | - time ranges with all of the above as boundaries e.g. "after
         | 1760-03-00 and before calculated 1800"
         | - plainly incorrect dates, but that's what the document says
         | (1865-02-30)
         | - no dates (some software tries to enforce putting in some data
         | in for whatever reason)
         | - dates with unknown calendar
           | Terr_ wrote:
           | All plausible and scary. It sounds like a recipe for a system
           | where the different qualities of dates are their own entity,
           | and anybody doing a date-search needs to provide some
           | criteria on the degree of specificity or certainty they
           | require.
           | For that matter, someone might want to do a text-based search
           | on dates: "This damaged photo shows 196_-_1, which could be
           | at least 20 different months..."
             | isbvhodnvemrwvn wrote:
             | There's also another dimension to that uncertainty. Dates
             | are typically attributes of events involving one or more
             | people in various roles, and these events are (hopefully)
             | attached to one or more sources. The type of sources you
             | use affects the confidence of various attributes of the
             | event (like dates or participants).
             | In my area of research death records are typically a bad
             | predictor for date and location of birth. When the records
             | were managed by the churches (until after WWII pretty
             | much), priests did not want to pester grieving family for
             | the exact date of birth, so they relied on approximate age.
             | Naturally the subject of the death record would not
             | typically point out errors. On the other hand marriage
             | records required looking at the actual birth records
             | (sometimes mailed across the country), as these contained
             | notes on all marriages of the individual (this was done to
             | ensure monogamy).
             | Is it on the genealogist to consider this confidence when
             | specifying the date of the event, or should the software
             | intervene? Due to complexity, the former is the industry
             | standard. But maybe there are some brave (or stupid) people
             | who will try to take it into account in the future.
               | jl6 wrote:
               | I implemented a [private] genealogical data entry system
               | based on the GenTech data model, which has a "surety
               | scheme" entity, designed to capture perceived
               | uncertainty. I also added a "Fuzzy Date", where every
               | date-like value was decomposed into all its constituent
               | components (year, month, day, hour, ...), all optional.
               | It could also capture a range of such fuzzy dates, so it
               | was possible to enter a "date" such as "The first of a
               | month no earlier than 1950 and before June 1990". There
               | was a loooong list of validation constraints to attempt
               | to prevent contradictions being entered, and I _think_ I
               | caught all the cases, but...
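
A loose sketch of what a "fuzzy date" with optional components might look like. This is not the system described above; the class name `FuzzyDate` and the method `could_be` are hypothetical, and only year, month, and day are modelled. Validation is deliberately weak (month 1 to 12, day 1 to 31) so that a date a document really does state, such as 1865-02-30, can still be recorded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FuzzyDate:
    """A date in which any component may be unknown (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self) -> None:
        # Only range checks: 30 February is allowed on purpose.
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError(f"month out of range: {self.month}")
        if self.day is not None and not 1 <= self.day <= 31:
            raise ValueError(f"day out of range: {self.day}")

    def could_be(self, candidate: date) -> bool:
        """True if `candidate` agrees with every component that is known."""
        return (
            (self.year is None or self.year == candidate.year)
            and (self.month is None or self.month == candidate.month)
            and (self.day is None or self.day == candidate.day)
        )


some_christmas = FuzzyDate(month=12, day=25)
print(some_christmas.could_be(date(1987, 12, 25)))
```

Ranges, calendars, and confidence levels from the thread are all left out; each would add its own validation rules.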
         | [deleted]
       | JamesBarney wrote:
       | If you enjoyed this blog post and want to go into more depth on
       | how to model your domain using types check out
       | https://fsharpforfunandprofit.com/series/designing-with-type...
       | a-dub wrote:
       | this is a nice idea, but as mentioned in other comments, i think
       | the most important goal when designing a schema is to design to
       | make the queries you'll frequently be making simple and
       | efficient. extensibility comes second and if you can make some
       | invalid states impossible to represent, then that's definitely a
       | bonus.
       | userbinator wrote:
       | Another application of this principle is in data representation,
       | and why I think text-based formats are horrible in general for
       | communication between software that doesn't involve a human
       | reading it the majority of the time: there, the "invalid states"
       | not only cause complexity in the parser, but they also waste
       | space (think of storing a 4-byte integer as the 4 bytes directly,
       | vs. a string of variable-length ASCII text.)
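
A quick illustration of the size and validity point using Python's `struct` module. The sample value and the helper `parse_text` are arbitrary choices for this sketch. Every 4-byte string decodes to some 32-bit integer, whereas most byte strings are not valid decimal text and need an error path.

```python
import struct

value = 1234567890

binary = struct.pack("<i", value)   # 4-byte little-endian signed integer
text = str(value).encode("ascii")   # the same number as decimal ASCII digits

print(len(binary), len(text))  # 4 10

# Decoding the fixed-width form cannot fail for any 4-byte input.
(decoded,) = struct.unpack("<i", binary)
assert decoded == value


def parse_text(raw: bytes) -> int:
    """Parse decimal ASCII; raises ValueError for malformed input."""
    return int(raw.decode("ascii"))
```

The comparison favours binary most for large values; a small number such as 7 is shorter as text.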
       | joosters wrote:
       | The time period example seems to miss an obvious weakness in the
       | described Time Period 'object' - it's implied that the end date
       | should be >= the start, but if you are representing a time period
       | with ( Date, Date ) then you are still allowing invalid states to
       | be represented - yet this is what the writer is trying to avoid.
       | Likewise, a timeline split into contiguous periods can still
       | represent out-of-order Dates.
       | a Time Period object of ( Date, Duration ) would fix the first
       | issue, and a TimeLine of ( Date, Duration, Duration, ... ) would
       | fix the second one (assuming unsigned Durations!)
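
The (Date, Duration, Duration, ...) idea can be sketched in Python as follows. The class name `Timeline` and the choice of whole days as the duration unit are assumptions made for the example. Python has no unsigned integer type, so positivity is checked once in the constructor rather than being unrepresentable in the type itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Timeline:
    """A start date followed by the lengths, in days, of contiguous periods."""
    start: date
    lengths_in_days: Tuple[int, ...]

    def __post_init__(self) -> None:
        # Stand-in for "unsigned Durations": reject zero or negative lengths.
        if any(n <= 0 for n in self.lengths_in_days):
            raise ValueError("every period length must be a positive number of days")

    def periods(self) -> List[Tuple[date, date]]:
        """Contiguous (start, end) pairs; each end is the next period's start."""
        result = []
        cursor = self.start
        for n in self.lengths_in_days:
            end = cursor + timedelta(days=n)
            result.append((cursor, end))
            cursor = end
        return result


timeline = Timeline(date(2020, 1, 1), (10, 5))
print(timeline.periods())
```

Because each period is derived from the previous one, the periods cannot overlap, leave gaps, or run backwards.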
         | [deleted]
         | lxe wrote:
         | I think the example's last visualization is confusing.
         | The timeline is {date1, date2, date3, date4}. Let's say you
         | have 2 periods, date1 - date3, and date2 - date4. Period 1 can
         | be represented as {date1, date2, date3}. Period 2 can be
         | {date2, date3, date4}.
         | Am I understanding this correctly?
         | jxf wrote:
         | I think you misunderstood the author's point: time periods
         | aren't explicitly represented as (Date: start, Date: end).
         | Instead they're a set of dates. The time periods are then
         | implied by the set, making "end date before start date"
         | impossible.
           | VBprogrammer wrote:
           | It also deals with the problem of the bounds on the ranges
           | (open-closed, open-open etc) implicitly in a way which is
           | harder to mess up.
           | One complication it's potentially missing is exactly whose
           | days we are talking about i.e. is it days starting in GMT or
           | UTC or EST or whatever the 'suppliers' or the 'customers'
           | timezone is, are we actually talking about some day concept
           | perhaps from start of business. Representing this as a
           | datetime start / end certainly makes it possible to represent
           | these concepts, if perhaps not making anything else
           | particularly easier.
           | [deleted]
           | zimpenfish wrote:
           | I was going to say that this has a weakness in testing
           | because you can't `assert(end > start)` without knowing which
           | is which because `assert(later > sooner)` will always be true
           | but then I guess you'd have input (and other) validations
           | before it got to that point anyway.
         | choeger wrote:
         | You should re-read the article. The author explicitly mentions
         | using _sets_ of dates. Hence the ordering is implicit. For an
         | interval you can do the same and use either a set or an
         | unordered pair.
           | joosters wrote:
           | A set doesn't necessarily imply an ordering (unless this
           | article is about some specific programming language, but it
           | seemed fairly generic to me). e.g. Java and C++ have many Set
           | implementations, some sorted (e.g. a TreeSet) and some not
           | (e.g. HashSet)
             | matsemann wrote:
             | That's the point. There's no ordering of the items. So the
             | representation can never have an error like [yesterday,
             | tomorrow, today]. It's just a set of (yesterday, tomorrow,
             | today).
               | joosters wrote:
               | Doesn't that just push all the responsibility onto every
               | piece of code that uses the datatype? 'Remember to sort
               | the contents of this set every time before using it'
               | sounds like it is asking for trouble.
               | kevinmgranger wrote:
               | The fact that it's a set can be hidden, and any queries
               | that depend upon an ordering can be presented as a sorted
               | view or whatnot.
               | secondcoming wrote:
               | So finding out a contract type from a timestamp is
               | linear?
             | rocqua wrote:
             | The set, and an ordering over the elements is sufficient.
             | It perfectly defines and represents the intervals
             | mathematically. What implementation of a set you use is a
             | practical detail.
             | When implementing this in practice, you probably want to
             | use a set implementation that gives fast ordering results.
             | But that is a performance consideration. Not a data-
             | representation consideration.
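             | A minimal sketch of this point, not taken from the article:
             | the stored value is an unordered set of boundary dates, and
             | the ordering is derived only when a query needs it. The name
             | `period_containing` and the half-open [start, end)
             | convention are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional, Set, Tuple


def period_containing(boundaries: Set[date], day: date) -> Optional[Tuple[date, date]]:
    """Return the (start, end) period containing `day`, or None.

    `boundaries` is an unordered set; periods are treated as half-open
    [start, end), which is an assumption of this sketch.
    """
    ordered = sorted(boundaries)      # ordering is derived, never stored
    i = bisect_right(ordered, day)    # number of boundaries <= day
    if i == 0 or i == len(ordered):
        return None                   # before the first or at/after the last boundary
    return ordered[i - 1], ordered[i]
```

             | Sorting on every call is only for brevity. Keeping the dates
             | in a sorted structure would leave just the binary search,
             | which is the performance choice rather than the
             | representation choice.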
       | browsergrip wrote:
       | This is a good article. But the second example seems to suffer from
       | the defect of the first. Removing default contracts and
       | representing fixed contracts as intervals leaves it possible that
       | these fixed contracts can overlap....which is
       | probably...undesirable?
       | In that case, applying the remedy of the first example (a set of
       | dates, and inferring that every 2nd (even zero length, to account
       | for adjacent fixed) interval will be default,) introduces another
       | bug where if you lop off any random date in that set or list, you
       | invert everything.
       | I love the concept represented here, it _is_ akin to
       | normalization as in...simplify the representation so no
       | redundancy is introduced as this usually leads to better
       | results...but it seems it's no guarantee of better results.
       | But maybe that's just because the "model" we are simplifying from
       | was not an optimal representation. Perhaps there's a better model
       | of the second example that doesn't end up with the defect of the
       | first example.
       | I really like this article but am struck by how something that I
       | wanted to be almost a silver bullet trick for modeling, ends up
       | being a mass of compromises mired in tradeoffs that doesn't show
       | any clear way forward in the general case. Still, probably a good
       | rule of thumb, but I guess this rule is not optimal...as it can
       | have so many unworkable misinterpretations/misapplications.
       | It would be cool to see a list of, like, "Programming
       | Heuristics", ranked by decreasing general applicability, of which
       | this rule was a member somewhere far down the list.
         | chrisweekly wrote:
         | "It would be cool to see a list of, like, "Programming
         | Heuristics", ranked by decreasing general applicability, of
         | which this rule was a member somewhere far down the list."
         | +1 for this! Anyone got good links to share?
         | JangoSteve wrote:
         | > In that case, applying the remedy of the first example (a set
         | of dates, and inferring that every 2nd (even zero length, to
         | account for adjacent fixed) interval will be default,)
         | introduces another bug where if you lop off any random date in
         | that set or list, you invert everything.
         | This is a good point, and something I usually describe as
         | recoverable problems versus non-recoverable problems. If I make
         | start/end dates in the first example instead of just a set of
         | start dates, then I can always create application-level or
         | database-level constraints that don't allow either overlapping
         | or incomplete segments. When business rules change, I can
         | delete the constraints and update the business logic as
         | necessary with no change to underlying data structures.
         | However, if I miss implementing a constraint and it erroneously
         | allows overlapping or incomplete segments, I can easily run a
         | query to identify all such invalid entries. I can then
         | investigate and decide how to fix them.
         | However, if I go with the start-date-only set-based approach,
         | and miss implementing a constraint, and it leads to a deleted
         | date creating incomplete segments... I'm screwed. There will be
         | no query you can do to identify incomplete segments to
         | investigate or fix, because all segments are assumed to extend
         | to the next start date. You can irreversibly lengthen one
         | segment by deleting another, because the forgotten constraint
         | was the only thing that would have prevented the change.
         | These could both be errors on the developer's part, depending
         | on the requirements at the time, but one data design may lead
         | to more non-recoverable issues than the other. Add in the
         | flexibility of the former approach, and I'd probably be more
         | likely to implement the former approach than the one proposed.
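         | A rough sketch of the "run a query to identify invalid
         | entries" step, written in Python rather than SQL for brevity.
         | It assumes half-open (start, end) segments for one customer;
         | `find_problems` is a hypothetical helper, not something from
         | the article.

```python
from datetime import date


def find_problems(segments):
    """Return (overlaps, gaps) for a list of half-open (start, end) segments."""
    overlaps, gaps = [], []
    ordered = sorted(segments)
    if not ordered:
        return overlaps, gaps
    covered_until = ordered[0][1]     # latest end date seen so far
    for start, end in ordered[1:]:
        if start < covered_until:
            # this segment begins before earlier coverage has ended
            overlaps.append((start, min(end, covered_until)))
        elif start > covered_until:
            # nothing covers the span between the two dates
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return overlaps, gaps
```

         | With only a set of start dates there is nothing comparable to
         | check: a set with one date removed is still a perfectly valid
         | set.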
       | moron4hire wrote:
       | This violates one of my core, learned-the-hard-way, database
       | design constraints: querying of rows should never have to rely on
       | any linear dependence on any other row in the same table. If you
       | ever have to do some sort of inner join of a table on itself to
       | bring in the single "next" row to tell you information about the
       | current row, then you are stuffed. Query complexity explodes.
       | Performance takes a nose dive.
       | The end date of a contract is a property of _that_ contract, not
       | any other.
       | Forgetting all other contracts for a moment, what do you need to
       | know about one contract? There should be a straightforward way
       | to query that contract on its own, with a query that represents a
       | tree through tables in the database. It should not become a
       | graph, with the potential for cycles that graphs allow.
       | And I get the business requirements could need no overlaps, but
       | gaps are clearly possible if a customer leaves for a while and
       | then comes back later. Does that person need to then become a new
       | "customer", because you don't allow gaps? And then are your
       | customers' PII only allowed to be registered to a single account?
       | Comcast has been a huge pain in the ass in years past because of
       | moving, gaps, and email address reuse.
       | ajuc wrote:
       | This is part of an approach to programming that is more popular in
       | the functional world.
       | You take requirements and make system exactly right to fit these
       | requirements perfectly and don't bother with any other concerns.
       | When you design this close to the requirements you get better,
       | faster and more elegant code that's easier to understand - but
       | when requirements change you have to do much more work to adapt.
       | Suddenly a state which was previously invalid is valid, or a part
       | of system that only needed one kind of input needs 4 different
       | inputs from separate parts of your code. Have fun basically
       | rewriting your program.
       | That's IMHO the main motivation behind the differences between
       | functional and OO programming - how close to the requirements you
       | want to design your code.
         | pjmlp wrote:
         | This kind of approach (type driven development) was already
         | popular in Algol derived languages, hence why the cowboy coder
         | would call us on Ada, Modula-2, Object Pascal side of the
         | fence, programming with straitjacket.
           | ajuc wrote:
           | I don't think it maps 1-1 with strong vs weak typing.
           | C (very loosely typed language) code is usually pretty close
           | to the requirements and very strongly typed functional
           | languages (like Haskell) are often "make DSL and write the
           | specification in it, then run it".
           | Meanwhile object oriented languages often have pretty strong
           | typesystems and cultures of using them extensively, but they
           | also encourage designing with margin for changes (and thus
           | using lots of layers of abstractions instead of just
           | implementing the specification as elegantly as possible).
       | mlthoughts2018 wrote:
       | "Make invalid states unrepresentable" is a type of fool's gold.
       | You don't care that invalid states are unrepresentable, you only
       | ever care that a specific instance of your running program is
       | very unlikely to enter an invalid state - and the difference
       | between formally disallowing invalid states vs. test coverage
       | that proves a reasonable likelihood of avoiding invalid states is
       | huge.
       | The extra code and conceptual complexity spent to make type
       | designs that disallow invalid cases is a liability, it comes with
       | its own bugs, maintenance and huge risks of premature abstraction
       | and brittleness in the face of changing requirements.
       | If it takes anything more than a simple enum-style menu of
       | permitted options, then it's a code smell. Things like Scala case
       | classes (especially with sealed behavior), or pattern matching
       | against type constructors, or phantom types - these are all very
       | bad ideas, where the costs far outweigh the benefits.
       | Most of the time you can just ignore enforcement of assumptions,
       | and add a few assert statements plus lightweight unit tests and
       | integration tests that generate an abundance of real world
       | example cases - and achieve all the safety you need for a
       | fraction of the code & conceptual complexity and tech debt
       | incurred by false promises of enforcing correctness with type
       | system designs.
       | nefitty wrote:
       | Software engineer here.
       | Does anyone have any questions?
         | codeulike wrote:
         | lol, you should post this in every thread
           | nefitty wrote:
           | It's not going well
             | codeulike wrote:
             | Wait, is kevinmahoney.co.uk your blog?
               | nefitty wrote:
               | No, just a software engineer
               | [deleted]
               | codeulike wrote:
               | Dude, this is HN - we're all either software engineers,
               | or people pretending to be software engineers
               | edit: or related hangers-on, like entrepreneurs or
               | "thought leaders"
               | nefitty wrote:
               | I can answer questions for VCs then I guess
               | [deleted]
               | alch- wrote:
               | It was a joke, son.
               | codeulike wrote:
               | But jokes aren't allowed here
         | 8note wrote:
         | Sure: what's the value in $, engineer time, number of clients
         | calling, etc, where this is a good tradeoff or a bad tradeoff?
         | alch- wrote:
         | Lol, nice one.
       | judofyr wrote:
       | In general I agree that it's nice to make invalid states
       | unrepresentable, but I'm not sure if I agree that this counts as
       | a fundamental "invalid state". There is nothing about contracts
       | which require that you can only have one active at the same time,
       | or that that current one must be open ended.
       | From a practical point of view it might be advantageous if you
       | maintain only a single contract with a customer at all times, but
       | that is a _business_ requirement which might be changed in the
       | future.
       | I mention this mostly from experience: Multiple times I've
       | designed systems where I've reduced the representable states to
       | the minimum, and when some requirements change I realize I have
       | to re-design the full system.
       | The new presented representation might make sense in _this_
       | situation, but I'd be very wary of taking current business
       | practices and making all other alternatives _impossible_ to
       | represent. It's a balancing act of course as you can go in the
       | opposite direction and make it way too flexible.
       | > This poor choice was not just a theoretical problem - gaps in
       | contracts were found on more than one occasion, requiring hours
       | of engineering effort to hunt down and fix.
       | I'd like to hear more about what happened here. Was the problem
       | that the default contract was not re-applied correctly? If so,
       | changing the representation might not actually solve any problems
       | -- it may actually make it _worse_. A renewal of a contract
       | typically involves some automated process where other services
       | are involved (payment, invoicing, emails). The previous
       | representation (with explicit start/end dates) made it possible
       | for you to verify that everything was correct and lined up.
         | goto11 wrote:
         | > Multiple times I've designed systems where I've reduced the
         | representable states to the minimum, and when some requirements
         | change I realize I have to re-design the full system.
         | Yes, if requirements change, you change the design and code to
         | support the new requirements.
         | Compromising the consistency and maintainability of the current
         | design to accommodate a hypothetical future requirement change
         | is a bad trade-off IMHO, since you can't predict the future. A
         | requirement change may happen in a completely different
         | direction than the one you anticipated, and then you have the
         | worst of both worlds.
         | It is better to make code _maintainable_ than making it
         | _flexible_.
           | judofyr wrote:
           | > Yes, if requirements change, you change the design and code
           | to support the new requirements.
           | Code and representation (i.e. schema) are vastly different.
           | In my experience it takes an order of magnitude longer to
           | change a representation than to change code. Once there are
           | multiple services/tools which work with a representation you
           | typically have to support both the new and the old
           | representation at the same time (since you can't roll out
           | everything simultaneously).
           | > Compromising the consistency and maintainability of the
           | current design to accommodate a hypothetical future
           | requirement change is a bad trade-off IMHO
           | Designing a representation which can handle possible
           | requirement changes does not necessarily mean "compromising
           | the consistency and maintainability". We have great tools for
           | ensuring consistency (e.g. transactions and constraints in
           | SQL databases), and I don't exactly see how this new
           | representation is more "maintainable" than the old one
           | (although we don't have all the information in this article).
             | skybrian wrote:
             | Getting good at data migrations (with tools and processes
             | to do this) can pay off. It's a more general way of
             | preparing for the future than attempting to anticipate
             | specific changes.
             | On the other hand, some may say YAGNI.
           | tonyarkles wrote:
           | > since you can't predict the future
           | That's one of those statements that makes sense, but is often
           | not true. Very rarely does a client come to me with a
           | feature request that _does_ require a pretty significant
           | design change, but most of the time they're changes that
           | were foreseen.
           | Using this current article as an example, I love the way that
           | they're storing the intervals to guarantee that they can't
           | overlap. That's awesome! What I would likely end up doing,
           | though, is use that as the underlying representation but
           | still return individual interval objects through the query
           | API with a start and end date on each interval. That way, if
           | the "only one at a time" rule changes, the changes required
           | are localized.
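           | A small sketch of that layering, under stated assumptions:
           | the canonical data is a set of start dates, and the query
           | API projects it into interval objects. The `Interval` class,
           | the `intervals` function, and the convention that the last
           | period is open-ended (end is None) are illustrative choices,
           | not the article's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass(frozen=True)
class Interval:
    start: date
    end: Optional[date]   # None marks the open-ended final period (assumption)


def intervals(start_dates: Set[date]) -> List[Interval]:
    """Project the canonical set of start dates into explicit intervals."""
    ordered = sorted(start_dates)
    ends = ordered[1:] + [None]       # each period ends where the next begins
    return [Interval(s, e) for s, e in zip(ordered, ends)]
```

           | Callers only ever see `Interval` values, so if the "only one
           | at a time" rule changes, the storage behind `intervals` can
           | change without touching them.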
             | hondo77 wrote:
             | > What I would likely end up doing, though, is use that as
             | the underlying representation but still return individual
             | interval objects through the query API with a start and end
             | date on each interval.
             | How the model you present to the user is represented in the
             | database is an implementation detail.
             | philwelch wrote:
             | > Using this current article as an example, I love the way
             | that they're storing the intervals to guarantee that they
             | can't overlap. That's awesome! What I would likely end up
             | doing, though, is use that as the underlying representation
             | but still return individual interval objects through the
             | query API with a start and end date on each interval.
             | The article addresses that concept:
             | > It is sometimes still useful to represent the periods as
             | a sequence of start and end dates. It is trivial to project
             | the set of dates in to this form. As long as the canonical
             | representation is the set, the constraints will still hold.
           | philwelch wrote:
           | I like to call this, "speculative complexity". I've seen many
           | cases where speculative complexity was added, and persisted
           | for a long time, for reasons that fundamentally mispredicted
           | the way the system would evolve and actually inhibited that
           | evolution.
             | cle wrote:
             | I've seen this too. And I've also seen the converse--
             | complexity that was added because engineers refused
             | anything but the most myopic designs, using thought-
             | terminating cliches like "YAGNI" or "that's hypothetical".
             | "Never think ahead" is obviously not good advice. There's
             | no silver bullet here--we have to think about how likely
             | future scenarios are, and plan for them based on the
             | business context and needs. Many of them are unlikely or
             | too costly to do anything about...and many of them aren't.
               | jpindar wrote:
               | But then you'd have to have domain knowledge and why
               | would you learn domain knowledge when the ideal career is
               | a new job in a new industry every few months. </HN>
               | goto11 wrote:
               | Under what circumstances did following YAGNI lead to
               | added complexity?
               | refactor_master wrote:
               | I think if you misinterpret YAGNI as "you don't need to
               | change this later". So your code becomes rigidly hard coded,
               | instead of having easily configurable variables and
               | arguments. The over-engineered solution (the real YAGNI)
               | was an interface, an object, methods and fields, only
               | serving a very niche purpose with a lot of boilerplate.
               | tsimionescu wrote:
               | A common pattern is writing code as a series of isolated
               | cases, when taking some time to design the general case
               | would greatly reduce the amount of code. You add a bool
               | parameter to a function to modify one small bit of what
               | it does, then another one, and you add some new return
               | value, and before long, you've got a class with several
               | getters and instance variables represented as code in a
               | single function, with parameters controlling which actual
               | method is run.
               | thewebcount wrote:
               | A real world case I recently ran into where thankfully we
               | realized we _would_ need it and added it is versioning. A
               | struct you pass to or receive from an API that has a
               | version in it means the difference between being able to
               | make changes to the internals without changing the
               | externals and not being able to. We didn 't _need_ the
               | version field until the second version was released. Had
               | we just said,  "Oh, we aren't going to need it," on the
               | first version we would have been boned.
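               | To illustrate why the field matters, here is a
               | hypothetical sketch (the message shape, field names, and
               | `decode_settings` are all invented for the example): the
               | version tag lets the reader pick the right layout, so
               | version 2 can rename and rescale a field without
               | breaking version 1 senders.

```python
import json


def decode_settings(raw: str) -> dict:
    """Decode a versioned message into one internal shape (hypothetical format)."""
    msg = json.loads(raw)
    version = msg.get("version")
    if version == 1:
        # v1 carried the timeout in seconds
        return {"timeout_s": msg["timeout"]}
    if version == 2:
        # v2 renamed the field and switched to milliseconds
        return {"timeout_s": msg["timeout_ms"] / 1000}
    raise ValueError(f"unsupported version: {version!r}")
```

               | Without the tag, the reader would have to guess the
               | layout from which fields happen to be present.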
               | jax_the_dog wrote:
               | Not OP, but it would add complexity because that method
               | you "weren't going to need" turns out to actually be
               | needed.
               | Now you have to work around your simplified design
               | because you decided that you didn't need anything more.
             | dnautics wrote:
             | Correctly judging this tradeoff is what makes the
             | difference between a good architect and a great architect.
             | There are definitely cases where I have put in a bit of
             | effort (a week or so's worth of programming) to make things
             | flexible because due to the requirements I knew they would
             | be necessary in the 9-12 month timeframe (I've also been
             | wrong about architectural decisions). Then when the time
             | came around, it was painless to make the transition.
             | I suppose if you were cynical, you could claim that if it's
             | painless no one sees how important you are. And then you
             | wind up leaving the company, because they think everything
             | is easy and don't provide you with the autonomy to achieve
             | what you need to make their system work. And then they
             | discover that it's actually hard.
         | hackerfromthefu wrote:
         | I agree very much with your comment.
         | I've found the fundamental principle that helps to keep the
         | system extensible is to: make your system model the real world
         | accurately.
         | This involves building the system concepts closely mapped onto
         | real world concepts without taking shortcuts.
         | That way when the requirements change, all the fundamental
         | pieces of the system stay valid, and only the piece that is
         | changing tends to need updating.
         | This helps to avoid the problem you mentioned of needing to re-
         | design the full system, keeping the system extensible.
         | ahoka wrote:
         | Then the code will be changed. Making up business requirements
         | is the number one reason for instant legacy code. Code is not
         | set in concrete, you can add that flexibility later when it is
         | needed, but making everything overly generic to make it easier
         | to "implement new requirements" only leads to code that is hard
         | to change in my experience. Also don't forget that this is only
         | an example.
           | fennecfoxen wrote:
           | A few basic invariants about how your code is structured do
           | a lot more than it being excessively "generic". Writing
           | simple components that actually adhere to the single-
           | responsibility principle (and composing them into more
           | complex logic) doesn't just help out the mysterious future.
           | It makes your existing codebase easier to understand and to
           | validate up front.
           | chii wrote:
           | The YAGNI (you ain't gonna need it) principle overrides the
           | DRY principle imo.
             | chrisweekly wrote:
             | Yes - as does AHA (Avoid Hasty Abstractions). IMHO.
             | pc86 wrote:
             | This is partly why I've found it helpful to wait until
             | there are at least 3 _identical_ (not nearly identical)
             | implementations of something before trying to make a more
             | generic/abstracted version of it.
               | dllthomas wrote:
               | I think these rules of thumb are... okay. But I think
               | it's more helpful to go back to how DRY was initially
               | defined ("Every piece of knowledge must have a single,
               | unambiguous, authoritative representation within a
               | system") and ask whether what I'm dealing with is
               | actually "a piece of knowledge." There can be 10
               | identical copies, and if they just _happen_ to be
               | identical but they represent different things that might
               | change independently, they should probably remain 10
               | identical, independent copies. Alternatively, if there
               | are exactly two places something occurs in the code, but
               | if they're out of sync the result is a broken system,
               | you should think about unifying.
               | DRY is too often treated as purely (or principally)
               | syntactic, when that's actually much less useful.
               | pc86 wrote:
               | This is a great way to look at it
               | bluGill wrote:
               | Computers have been widespread for between 70 and 30
               | years (there is reason to debate, but whatever). Nearly
               | everything has been done before: what you are doing isn't
               | fundamentally new. There is lots of opportunity to add
               | minor new features, reliability, or better user
               | interfaces. But the fundamentals of what you are doing
               | aren't new anymore. You can look at your past versions and
               | what competitors do for guidance on what you will
               | probably need next. If you have any broad knowledge of
               | your problem domain you can make reasonable guesses as to
               | what you will need and what you won't need. When
               | replacing a subsystem I know if there will be 100 users
               | of it in the future or if it is a leaf with 1 user -
               | because I know what the old crufty subsystem has (the
               | first case I spend days thinking about the interface, the
               | second I design the interface when I integrate the one
               | subsystem)
             | [deleted]
           | dkarl wrote:
           | The requirements here aren't clear, but I'm guessing the
           | requirement is to model the contracts the company actually
           | has with customers.
           | The business _also_ tells you that there are never two
           | contracts running at the same time. But are you actually
           | going to believe that? Is this condition really
           | "impossible?"
           | A vital and necessary factor here is whether the system being
           | designed has complete control of the creation of contracts.
           | This is perhaps taken for granted by the author, but it's too
           | important to leave implicit. You have three choices in a
           | situation like this: _make_ it impossible for contracts to
           | overlap, model it, or don't model it and accept the
           | consequences. Depending on the frequency and the consequences
           | of the assumption being incorrect, maybe it's acceptable not
           | to model it. Maybe not. My point is that you can't assume
           | something is impossible unless you can actually prevent it
           | from happening, and the author should not have sidestepped
           | this part of the analysis (though possibly they meant for it
           | to be understood that contract creation happens through this
           | data model.)
           | > Also don't forget that this is only an example.
           | The problem is that this is an example meant to illustrate
           | and justify a rule of thumb, but it's extremely, extremely
           | simple. How often do you deal with requirements that are this
           | simple, this mathematical? Is this really the kind of example
           | you want to build a rule of thumb from?
           | Realistically, when I hear requirements like this, I assume
           | they're wrong (very common at the beginning of a project) and
           | I get together with the product manager and ideally a domain
           | expert representing the customer (if the product manager
           | isn't too territorial about that being their job) and figure
           | out what the hell the actual requirements are. What if a
           | customer has a contract to rent 10 units of space at $5 and
           | in the middle of that contract needs 5 more units but
           | the price has gone up to $10? Do you tell them they have to
           | cancel the existing contract at $5 and pay $10 for all their
           | units if they want to add some? Or give them the new units at
           | the old price? Or is it okay to represent the same customer
           | by distinct customer records?
           | I do like the principle of making invalid states
           | unrepresentable, but I would like to add two supplementary
           | principles:
           |  _1. Oftentimes what the business tells you about the data
           | they produce is purely aspirational._
           | "There will never be overlapping contracts," often means, "We
           | swear we're going to stop creating overlapping contracts, and
           | this time we really mean it." You have to follow up with
           | questions like, "How often have we had overlapping contracts
           | in the past? When was the most recent occurrence?" You should
           | even ask, "When do we anticipate signing the next one?" A
           | logically-oriented software developer might expect someone to
           | take offense if you respond to "we don't sign contracts like
           | that" with "Do you have any currently in the pipeline?" but
           | this is a totally normal kind of question to ask.
           |  _2. When users give you a rule in their business
           | requirements, they often take it for granted that the
           | software will handle exceptions to the rule gracefully._
           | They don't necessarily appreciate how bad things can go in
           | software when something "impossible" happens. When they say,
           | "Contracts will never overlap," you have to say, "What should
           | happen when they do?" If you are talking to a mathematician
           | or a programmer this might come off as questioning their
           | competence, but most people will not find it unusual at all
           | or at least will appreciate that the question is motivated by
           | experience rather than disrespect. It's not like a math
           | problem in school; it is legitimate to question the givens.
             | arethuza wrote:
             | I once did weeks of work trying to unpack what my employer
             | meant by the term "customer" - if your customers are large
             | companies with hundreds of legal entities and hundreds of
             | locations across the world things can get pretty complex
             | pretty fast e.g. does "never two contracts running at the
             | same time" mean that you can't have a contract with a
             | subsidiary and another separate subsidiary of the same
             | parent (which might make sense for credit checking
             | purposes)? What about subsidiaries in different countries?
             | What about partly owned subsidiaries...
           | [deleted]
         | stickfigure wrote:
         | I think the fundamental problem here is that the table/entity
         | is incredibly badly named. What kind of contract has only three
         | fields?! This isn't a case of YAGNI; no real-world "contract"
         | is this simple.
         | It appears to actually be some sort of contract_period or
         | contract_duration, and probably has a link to a real "contract"
         | object somewhere that contains the real meat of the concept.
         | But it's hard to tell what's literal and what's the author
         | trying to simplify the example for us.
         | layoutIfNeeded wrote:
         | This.
         | Basically this is what event sourcing tries to solve: it lets
         | you change your state representation to reflect new
         | requirements, because you can always rebuild everything from
         | the event log.
         | steve_g wrote:
         | There are trade-offs, of course, but I'm generally not a fan of
         | using implicit defaults for business applications (i.e., the
         | application infers the default when there's no data).
         | If things go well, business data outlives business
         | applications. After years or decades, it can be a major pain to
         | figure out all the "secret" values that aren't actually in the
         | data.
         | Zenst wrote:
         | > There is nothing about contracts which require that you can
         | only have one active at the same time, or that that current one
         | must be open ended.
         | Agreed, and yet for many systems that flexibility still eludes.
         | One example I personally experienced was changing a phone
         | contract. The contract had run for many years, so could be
         | cancelled any time with one month's notice. The new plan and
         | contract were much better and yet a limitation played out doing
         | this. It ended up that the systems at the telco were unable to
         | activate the new contract until the old contract had ended.
         | A new contract could be physically signed in a shop, with a
         | start date of the day of signing, and logged into the system,
         | but the provisioning backend was unable to activate it until
         | the old contract had ceased, as you can't have two contracts
         | for the same phone number.
         | That, I do believe, is a case where some things can run in
         | parallel while others are locked to a single dependent resource.
         | But every rule has an exception; it is with good design that
         | you limit those exceptions' impact.
         | tlarkworthy wrote:
         | Hard agree. The type system is no place for business logic. If
         | you want to get fancy, maybe some database constraints you can
         | change later. Business logic is constraints over the
         | representation... not the representation itself.
         | Also, in general, the principle of "make invalid states
         | unrepresentable" ends with the realization that only Idris can
         | properly do it, which is not pragmatically useful.
         | bcrosby95 wrote:
         | Absent business requirements, I would love to see what you
         | think fundamental invalid states for a contract would be. Every
         | property of a contract I can come up with seems like a business
         | requirement.
       | seer wrote:
       | While this is great if you know exactly what you want to achieve,
       | it does "lock you in" those constraints on a more fundamental
       | level. More times than I can count I've seen business
       | requirements change to require those "unrepresentable" states,
       | and since you've now designed your whole data model around it you
       | need to add awful hacks to make it work.
       | The timeline example is actually very telling. A lot of times
       | you'd actually want to encode overlapping time periods at the
       | edges.
       | You'd be laughed out of a meeting where a business asks about
       | this and you smugly explain how it is unrepresentable.
       | I guess what I'm saying is that it might be worth over designing
       | your system a bit to leave you some wiggle room, unless you have
       | hard guarantees that something should be "impossible"
         | ImprobableTruth wrote:
         | I agree that this is an issue, but I think the answer is simply
         | using a very flexible basic data representation (which admits
         | invalid state) and then using predicates to refine it. e.g.
         | starting with a list of (start,end) intervals and then adding
         | predicates for valid intervals (start <= end), ordered, non-
         | overlapping and continuous.
         | If any of the requirements change, it's easy to either add more
         | predicates or relax/even outright remove them.
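         | A minimal sketch of this idea in Python, assuming integer day
         | numbers in place of dates and half-open (start, end) intervals.
         | The function and list names are illustrative only:

```python
from typing import Callable, List, Tuple

# Assumption: integer day numbers stand in for real dates.
Interval = Tuple[int, int]  # (start, end), end exclusive
Predicate = Callable[[List[Interval]], bool]


def well_formed(xs: List[Interval]) -> bool:
    """Every interval satisfies start <= end."""
    return all(start <= end for start, end in xs)


def ordered(xs: List[Interval]) -> bool:
    """Intervals are sorted by start."""
    return all(a[0] <= b[0] for a, b in zip(xs, xs[1:]))


def non_overlapping(xs: List[Interval]) -> bool:
    """No interval begins before the previous one ends (assumes ordered input)."""
    return all(a[1] <= b[0] for a, b in zip(xs, xs[1:]))


def continuous(xs: List[Interval]) -> bool:
    """No gap between neighbours (assumes ordered input)."""
    return all(a[1] >= b[0] for a, b in zip(xs, xs[1:]))


def violations(xs: List[Interval], rules: List[Predicate]) -> List[str]:
    """Names of the rules that the data breaks."""
    return [rule.__name__ for rule in rules if not rule(xs)]


STRICT = [well_formed, ordered, non_overlapping, continuous]
# A relaxed requirement is just a shorter list of predicates.
ALLOW_OVERLAP = [well_formed, ordered, continuous]

timeline = [(0, 10), (8, 20), (20, 30)]
print(violations(timeline, STRICT))         # ['non_overlapping']
print(violations(timeline, ALLOW_OVERLAP))  # []
```

         | The data stays in the permissive representation throughout;
         | only the list of predicates changes when the requirement does.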
           | jerf wrote:
           | If you play your cards right, you can even get your type
           | system to alert you to every place in your code that needs to
           | change when you change the constraints.
           | I don't deny there's a certain art to that, and I can't
           | explain it all myself. But I don't even necessarily mean
           | amazing clever type tricks like you might see in Haskell or
           | something, I mean that something as simple as "I've changed
           | the definition of what a 'Customer' is in some fundamental
           | manner, so I'm going to rename that class to 'CustomerNew',
           | use the compiler to point me at every single place that
           | breaks, audit it, change the local name to CustomerNew, and
           | then, once everything is fixed, use my IDE's rename feature
           | to rename CustomerNew back to Customer before my final
           | commit". Many times you can get by just by renaming a field
           | or something to similar effect, but in the worst case you may
           | need to audit everything.
           | It's one of the more tedious bits of the job sometimes, but
           | net-net this can still be a timesaver, if you account for the
           | full cost of the trickle of bugs this sort of thing can
           | prevent.
             | [deleted]
         | pc86 wrote:
         | Even "hard guarantees" are worthless. All it takes is one
         | client with a checkbook to change business requirements.
         | erpellan wrote:
         | That could be fixed with 1 additional concept. Instead of
         |     Customer: 1 - 1: Set(Dates)
         | What if it was
         |     Customer 1 - *: Contract: 1 - 1: Set(Dates)
         | Overlap achieved.
         | kpmah wrote:
         | I think you are vastly overstating the risks of changing
         | requirements. It is usually easy to go from a less permissive
         | model to a more permissive one, but the opposite is often
         | difficult.
         | the-smug-one wrote:
         | You could always split your types into frontend and backend
         | types where the backend ones are more open and the frontend
         | ones are more restricted. I don't necessarily mean FE/BE as on
         | the web. A lot of code is only interested in shuffling around
         | data anyway, the shape is fairly uninteresting.
           | nemetroid wrote:
           | I would put it the other way around: make your basic
           | representation restricted, but present a more permissive API.
           | This way, the data model helps enforce your constraints, but
           | you don't need to redesign the API when the requirements
           | (inevitably) change.
           | gambler wrote:
           | Kind of like internal data/algorithms and external interface
           | in OOP?
         | TooCreative wrote:
         | Additionally, turning (startDate, endDate) into a set of dates
         | will make the code more complex in some places. Before:
         |     SELECT event FROM events WHERE endDate<"2021"
         | After:
         |     Whatever additional complexity you add to your codebase to
         |     query end dates.
           | zimpenfish wrote:
           | It's not _hugely_ complex although definitely more than the
           | first example. I guess it depends whether it's offset by the
           | benefits...
           | https://dbfiddle.uk/?rdbms=postgres_11&fiddle=50e6a963cd1db0.
           | ..
           | (YMMV, obvs.)
           | dghf wrote:
           | It's not that much more complex: if you do
           | SELECT event FROM events WHERE startDate < "2021"
           | then all but one of the results (the one with the greatest
           | `startDate`) will also have an implicit end-date prior to
           | 2021.
             | mobjack wrote:
             | Doing all the logic in SQL requires more complexity using
             | subqueries.
             | It gets uglier if you need to find the contract valid on a
             | certain date based off of a join.
             | These issues can be covered up with code, as it will be
             | easier to have reusable functions, but it makes the job of
             | a data analyst much more difficult and error-prone.
               | dghf wrote:
               | Uglier than having to find the record after the one
               | you're inserting (so you can determine your new record's
               | end date from the subsequent record's start date) _and_
               | the record before (so you can modify its end date to
               | match your new record's start date)?
             | peteradio wrote:
             | Until business tells you that endDate is not necessarily
             | greater than startDate. <- real world experience
               | NegativeLatency wrote:
               | Sounds interesting can you explain more about it?
             | jdmichal wrote:
             | This is fine for an open-ended query like the one given,
             | because you still receive all the relevant data. But if
             | you're looking at a range, for the same reason you have one
             | extra at the end, you also have one _missing_ at the
             | beginning. And you can't just filter away missing data.
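             | A rough Python sketch of the missing-record problem, assuming
             | records are stored as a sorted list of start days only, each
             | one implicitly ending where the next begins. The data and
             | function names are made up for illustration:

```python
from bisect import bisect_left, bisect_right
from typing import List

# Assumption: each record stores only a start day and the list is sorted.
starts: List[int] = [0, 100, 250, 400]


def naive_range(starts: List[int], lo: int, hi: int) -> List[int]:
    """Filter on the start day alone: lo <= start < hi."""
    return [s for s in starts if lo <= s < hi]


def active_in_range(starts: List[int], lo: int, hi: int) -> List[int]:
    """Records whose implicit period overlaps [lo, hi)."""
    # The record in force at `lo` is the last one starting at or before lo,
    # so step back one position from the insertion point.
    first = max(bisect_right(starts, lo) - 1, 0)
    last = bisect_left(starts, hi)
    return starts[first:last]


print(naive_range(starts, 120, 300))      # [250]
print(active_in_range(starts, 120, 300))  # [100, 250]
```

             | The naive filter drops the record that started on day 100
             | even though it is still in force on day 120; the second
             | function has to look one position back to recover it.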
       | noisy_boy wrote:
       | I don't see how the contracts example is simplifying things while
       | staying realistic. What if I need to have separate kinds of
       | default contracts for different classes of customers? What if I
       | need to modify the details of a certain type of default contract
       | for all consumers that are using it? And on and on. If storing
       | them in the contracts table is not good (which I don't really
       | agree with), then where should we store this variety of default
       | contracts? How do I join them with the contracts table to know
       | which consumers have default contracts? Or group consumers by
       | type of default contract?
       | Keep the model sensible. Contracts belong to contracts. Add basic
       | sanity to the model. The service that manages the data guards the
       | data beyond basic data model sanity checks. Also, explicit is
       | better than implicit.
       | gambler wrote:
       | _> I think this happens because of atomistic, object-orientated
       | thinking._
       | If you think storing a list of date tuples is "OOP thinking", you
       | have no clue what OOP really is. Educate yourself by listening to
       | people who invented it, not Java consultants or FP zealots.
       | OOP is about interacting with things via interfaces and messages,
       | rather than data. An OOP solution to inconsistencies of this sort
       | would be an interface that either automatically corrects
       | inconsistencies or throws errors when you try to introduce them.
       | _The whole point_ of the OOP approach is that you're not locked into
       | a single data representation, so, for example, you can improve
       | how you store data without re-engineering everything in your
       | system that relies on that data.
       | dudul wrote:
       | Just use the language of the business/domain to write your
       | model/API. No need to reinvent names and rules that already
       | exist. Do whatever you want with the underlying implementation,
       | your DB, etc.
       | 1-more wrote:
       | A great talk on this subject is "Making Impossible States
       | Impossible" by Richard Feldman from Elm Conf 2016
       | https://www.youtube.com/watch?v=IcgmSRJHu_8
       | unwind wrote:
       | This is sound advice for sure, and I think it applies much more
       | broadly (or do I mean deeply?) than just for databases.
       | For instance, one micro-application of it that makes a lot of
       | sense to me is the const-ness of variables in languages like C.
       | Since a normal variable can be overwritten, and that affects the
       | use and semantics of that variable, marking them as const
       | whenever possible really helps in my opinion.
       | For instance, take this micro-snippet of code from Redis [1]:
       |     int time_independent_strcmp(char *a, char *b) {
       |         char
       |         /* The above two strlen perform len(a) + len(b) operations where either
       |          * a or b are fixed (our password) length, and the difference is only
       |          * relative to the length of the user provided string, so no information
       |          * leak is possible in the following two lines of code. */
       |         unsigned int alen = strlen(a);
       |         unsigned int blen = strlen(b);
       |         unsigned int j;
       |         int diff = 0;
       | Here, it seems quite important that the values of 'alen' and
       | 'blen' do not change during the execution of the function, since
       | it's iterating over them. The 'diff' variable on the other hand
       | is intended to change as a function of all the characters in both
       | strings, that's the whole purpose of the function.
       | So, I think the middle two lines should be:
       |     const size_t alen = strlen(a);
       |     const size_t blen = strlen(b);
       | That "locks" the values in, so you know that for the rest of the
       | function at least these two values stay the same. Since changing
       | either length mid-function would represent an invalid state, I
       | think this is close to the OP's point.
       | Also please note that I have massive amounts of respect for Redis
       | and Antirez, I'm not trying to say that the code is bad or
       | anything, it was simply the first file in the first high-profile
       | open source project that came to mind. Obviously this code
       | _works_ and has probably been more tested than most things I've
       | written, again I'm NOT trying to somehow paint that program(mer)
       | in a bad light.
       | Btw, changing the type (to me) to size_t is also an obvious,
       | free, improvement since it frees the reader from having to worry
       | about why the type was unsigned int to begin with. Also 'int' can
       | be less wide than 'size_t', which again is probably not a problem
       | _in practice_ since the CONFIG_AUTHPASS_MAX_LEN is probably
       | always going to be even less, but still. It's pointless
       | complexity that triggers anxiety in people like me. :)
       | [1]: https://github.com/redis/redis/blob/unstable/src/acl.c
       | Kaze404 wrote:
       | There's a great talk from Richard Feldman that talks about this
       | in the context of Elm.
       | https://www.youtube.com/watch?v=IcgmSRJHu_8
       | UglyToad wrote:
       | I was pleased and a bit surprised to see this post talk about a
       | database level approach to this problem. As important an idea it
       | is at the application code level where most posts discuss it,
       | especially in the context of type systems like Haskell, I think
       | it gets neglected when it comes to persistence.
       | For those of us developers who are mere CRUD peons I think it's
       | the most important factor in system stability that is mostly
       | neglected; either in favour of speed of iteration (NoSQL) or
       | checks at the application code layer.
       | As I'm increasingly coming to appreciate, systems without
       | enforced integrity at the database level are a breeding ground
       | for bugs. You can add checks in application code but all it takes
       | is 1 bad commit, or 1 check that slipped your notice and now you
       | have bad data and all future code in the system needs to support
       | and work around the bad data. With foundations of sand even the
       | most elegant structure in application code is doomed to a short
       | and catastrophic future.
       | As other commenters mention hindsight is 20:20 and you won't
       | always know what the constraints should have been until after the
       | fact, or the constraints might be wrong. But the 'trendy'
       | development practices treat good old fashioned SQL constraints
       | and data integrity as decidedly unsexy, to the detriment of a lot
       | of systems.
       | MySQL didn't even have check constraints (well, actually apply
       | them) until version 8 which shows how ignored these things are. I
       | appreciate the post is more about the fundamental design of the
       | stored data but people are also forgetting unique constraints,
       | foreign keys and all the other tried and tested tools which
       | protect the most important part of most CRUD systems, the data,
       | from devolving into an awful mess.
       | sakoht wrote:
       | Even presuming you are 100% sure what "invalid" is: It is possible
       | that being _unable_ to represent a logic error might mean that a
       | logic error is represented as a "valid but incorrect" value,
       | which is even more dangerous.
       | Take the example of storing a continuous series of date ranges.
       | If I only store the first date of each pair, I can never
       | accidentally have an overlap or gap. But if my code has a logic
       | error that incorrectly calculates a range, being able to
       | represent it could throw an error. If that code error translates
       | to an incorrect break-point instead, I haven't prevented a bug,
       | I've hidden it.
       | JoeAltmaier wrote:
       | Surely the data model is helpful not only in keeping data
       | integrity intact by disallowing invalid state, but also in
       | helping you think about your data and discover simplifications
       | and subtle rules to improve your model.
       | I attacked the old 8-queens problem years ago as part of a
       | contest in Byte Magazine. All the solutions published modelled
       | the board as an 8X8 array with a 1 or zero to indicate the
       | presence of a queen. They all ran slow and suffered from invalid
       | game states confusing the algorithms.
       | My solution was to observe that only 1 queen could be in each
       | column (all solutions require queens to not be able to capture
       | one another, and they can capture vertically). So I represented
       | the board by an array of 8 values: the array index is the column,
       | and the value is the height of the queen in that column.
       | Further, since only one queen can be in each row, the values were
       | the numbers 1-8.
       | My solution then, was to seed the array with the value 1, 2, 3,
       | 4, 5, 6, 7, 8. Then test if array[i]-array[j] == (i-j) or (j-i),
       | which would mean a diagonal capture.
       | Simply permuting the values, searched a subset of board states
       | that had to contain all possible solutions. And the permutation
       | tree could be truncated as soon as for any (i,j) the test failed.
       | Anyway, the program was tiny and finished in negligible time. A
       | pity I didn't enter the contest!
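       | The approach described above might be sketched in Python roughly
       | as follows. This is a reconstruction under assumptions, not the
       | original program: it builds the permutation one column at a time
       | and abandons a branch as soon as the diagonal test fails.

```python
from typing import Iterator, List


def queens(n: int = 8) -> Iterator[List[int]]:
    """Yield boards as lists: index = column, value = row (1..n)."""

    def safe(board: List[int], row: int) -> bool:
        # Rows and columns are distinct by construction, so only the
        # diagonal test is needed: |row_i - row_j| == |i - j| means capture.
        col = len(board)
        return all(abs(row - r) != col - c for c, r in enumerate(board))

    def extend(board: List[int], unused: List[int]) -> Iterator[List[int]]:
        if not unused:
            yield board
            return
        for row in unused:
            # Prune: skip this branch of the permutation tree as soon as
            # the newly placed queen would be attacked.
            if safe(board, row):
                rest = [r for r in unused if r != row]
                yield from extend(board + [row], rest)

    yield from extend([], list(range(1, n + 1)))


solutions = list(queens(8))
print(len(solutions))  # 92
print(solutions[0])    # [1, 5, 8, 6, 3, 7, 2, 4]
```

       | Because each board is a permutation of 1..8, states with two
       | queens in one row or column cannot be written down at all, so
       | the search never has to reject them.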
       | ChrisMarshallNY wrote:
       | I'm not sure that this is what I think about, when I think of
       | "Making Invalid States representable" (A concept that I
       | practice).
       | That said, it's an excellent, commonsense article that describes
       | a highly usable approach to information architecture.
       | I also agree that OOP programmers have always considered their
       | designs to "represent the 'Real World'(tm)." In my experience, I
       | use OOP constructs to represent many things that should never be
       | exposed to the user (like messages, adapters, states, and state
       | transitions).
       | There's the classic usability concept of the "Mental Model." That
       | is the model that the user builds in their head, as they navigate
       | the UX. These mental models can be drastically different from
       | what happens internally, and a good UX designer can reinforce a
       | desired model (which the user may then ignore).
         | ChrisMarshallNY wrote:
         | Um..." _un_ representable". :P
       | H8crilA wrote:
       | So Google's protocol buffers have this feature called "required"
       | fields, which enforce schema in the type system. You should never
       | use it. Never. It's one of those things that sound good until
       | you're a few years into the project. Similar to how you should
       | never be using meaningful IDs as primary keys for objects, always
       | use meaningless fingerprint-like integers. Or how all integers
       | should be signed unless you're dead sure the number is unsigned
       | (like a fingerprint). And how many integers should actually be
       | strings, unless you're dead sure this is a number (externally
       | provided IDs, such as for example customer account IDs, are not
       | numbers). Or how you should be careful to use bytes rather than
       | unicode strings.
       | Make your schema permissive and your code paranoid, it will pay
       | off later. Build a data linter if necessary, but don't tie the
       | schema.
         | sathorn wrote:
         | This might make sense for a transport schema because you can
         | receive messages from the past or the future but it does not
         | translate to internal program state or database schemas where
         | this is not the case.
         | Making invalid states unrepresentable is basically the process
         | of taking human-checked invariants and turning them into type-
         | checked invariants. This reduces the likelihood of bugs and
         | guides humans to use the system correctly.
         | nemetroid wrote:
         | I feel that this advice is almost opposite to that given in the
         | article. By making all fields optional, your data model no
         | longer helps in making invalid states unrepresentable.
           | H8crilA wrote:
           | Exactly! Because today's invalid states are commonplace in
           | the future.
         | srtjstjsj wrote:
         | "required" is an example of making an invalid state
         | _representable_ and having it ruin your program.
           | dodobirdlord wrote:
           | Yea, a better example of making invalid state unrepresentable
           | in Google's protocol buffers is to use the "oneof" feature to
           | mark that a set of fields are mutually exclusive. If A, B,
           | and C are mutually exclusive you can put them in a oneof,
           | which also saves space in the binary representation. If in
           | future you discover that A and B but not C needs to be a
           | valid state, you can add a 4th AB option inside of the oneof.
         | titanomachy wrote:
         | That feature's been dropped in proto3, all fields are optional.
         | jdmichal wrote:
         | > And how many integers should actually be strings, unless
         | you're dead sure this is a number.
         | I phrase this as: If it doesn't make sense to do math on it,
         | it's not a number. What does adding one to a customer account
         | number mean? Absolutely nothing -- you get a completely
         | different account number. So it's not a number, but a numeric
         | string.
           | lukasLansky wrote:
           | It's not a string either though: in the same way integer
           | addition (almost always) does not make sense, string
           | concatenation (almost always) does not make sense either. The
           | proper type would allow for equality check and explicit
           | string (de)serialization only.
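           | As a small illustration in Python (the class name and the
           | digits-only rule are assumptions made for this sketch), such
           | a type could look like:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountId:
    """Opaque identifier: equality, hashing and explicit text conversion only."""
    _raw: str

    @classmethod
    def parse(cls, text: str) -> "AccountId":
        # Hypothetical validation rule for illustration: non-empty, digits only.
        if not text.isdigit():
            raise ValueError(f"not an account id: {text!r}")
        return cls(text)

    def serialize(self) -> str:
        return self._raw


a = AccountId.parse("00123")
b = AccountId.parse("00123")
print(a == b)         # True
print(a.serialize())  # 00123
try:
    a + 1             # neither arithmetic nor concatenation is defined
except TypeError:
    print("no arithmetic")
```

           | Leading zeros survive the round trip, and adding or
           | concatenating raises a TypeError instead of silently
           | producing a different account.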
             | jdmichal wrote:
             | I meant string in a more casual sense. From a more
             | technical sense, it would be a symbol. I'll be more precise
             | in the future.
         | trevor-e wrote:
         | Can you elaborate more on the "required" fields point? We've
         | been using a similar feature for several years now in APIs at
         | my work and haven't run into any issues, though we do only use
         | it very sparingly for fields that logically can never be
         | missing. At some point a client has to make the call for what
         | they consider essential, so pushing it in the schema makes this
         | less ambiguous from what I've seen. Maybe it's fine for our
         | use-case (mostly static APIs), whereas what you're saying is
         | good advice in general.
           | srtjstjsj wrote:
           | https://capnproto.org/faq.html#how-do-i-make-a-field-
           | require...
           | Required now means required forever because people can't
           | migrate safely. But technically you can change a protocol
           | descriptor from required to optional, which is invalid
           | (usually, in a distributed non-transactional system, the
           | common kind) but nothing stops you from doing it. So why not
           | make required forever? Well, do you really want to commit to
           | anything _forever_?
             | trevor-e wrote:
             | After reading that article my take-away is not that
             | "required" is bad and should never be used ever, but rather
             | it was bad with how Google wanted to use it. And since this
             | is Google's project, it makes sense for them to remove the
             | feature if it's causing data center outages, it's not worth
             | the risk at that point.
             | For example, in the case of the message bus they say "And
             | even though the message bus doesn't care about message
             | content", and later on "The right answer is for
             | applications to do validation as-needed in application-
             | level code." Strict schema and validation is most helpful
             | for application developers, not some middleware routing
             | code. Was it not possible for them to write a parser that
             | doesn't fully validate the message for use-cases like this?
             | dodobirdlord wrote:
             | Protocol buffers already require you to commit to some
             | things forever, like the type of a field, or whether two
             | fields belong in a oneof together. I'm not saying that
             | "required" was a great feature, but it's not exactly
             | unique.
               | joshuamorton wrote:
               | No they don't. An optional field can be deprecated and
               | replaced with a different field. This can be done to
               | change the type (also some types can be changed, although
               | you probably shouldn't).
               | Required usually cannot be deprecated.
         | yongjik wrote:
         | I may be in the minority, but after happily using protobuf for
         | years, I believe that there's nothing inherently wrong with
         | required fields - instead, what's "wrong" is the protocol
         | buffer API.
         | Namely, when constructing a protobuf, theoretically, there
         | might be two different ways: (A) first gather all the fields,
         | and then construct the protobuf from these fields; (B) first
         | construct an empty protobuf, and fill in the fields as
         | necessary. The actual protobuf uses (B) - which is convenient
         | in most cases, because when you start constructing a protobuf
         | usually you don't have all the data ready yet.
         | However, with required fields, this means when you construct
         | the protobuf it starts with all required fields missing - i.e.,
         | an invalid state!
         | I'm not sure what's the best way to fix it, because it would be
         | infeasible to rewrite all the code to gather all the fields and
         | then construct the protobuf - also it will be hugely
         | inefficient in many cases. However, I feel the "no required
         | fields" rule is essentially a null pointer (the "billion dollar
         | mistake") in disguise - the actual problem is that the API
         | doesn't enforce type safety.
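         | To make the (A) versus (B) distinction concrete, here is a
         | hedged Python sketch. It does not use the protobuf API; the
         | Greeting type and its fields are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Greeting:
    """Style (A): every field must be supplied at construction time."""
    sender: str
    body: str


class GreetingBuilder:
    """Style (B): start empty and fill fields in as they become known."""

    def __init__(self) -> None:
        self.sender: Optional[str] = None
        self.body: Optional[str] = None

    def build(self) -> Greeting:
        # The check for required fields only happens here, at runtime.
        missing = [name for name, value in vars(self).items() if value is None]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        return Greeting(self.sender, self.body)


builder = GreetingBuilder()
builder.sender = "alice"  # the builder is now half-filled, i.e. invalid
try:
    builder.build()
except ValueError as err:
    print(err)            # missing required fields: ['body']

builder.body = "hello"
print(builder.build())    # Greeting(sender='alice', body='hello')
```

         | With style (A) a half-filled Greeting cannot exist; with style
         | (B) it can, and the error only surfaces when build() is called.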
           | joshuamorton wrote:
           | This isn't actually the issue with required fields (some
           | languages, like java and (usually) python, use a construct-
           | once style).
           | Imagine you have an innocent `required` field. You have a
           | producer and a consumer of that field that communicate over
           | the wire. (or instead of the wire, imagine a database).
           | You send or store an instance of that protobuf. Now let's say
           | that you want to make the field optional (or remove it). With
           | an already-optional field, this is easy. You stop setting it,
           | and maybe eventually you clean it up.
           | With a required field, however, you can't do that. If any of
           | your _clients_ don't have the newest schema version, you
           | can't unset the field (so imagine that you support mobile
           | clients who may never update). Or if there's middleware you
           | don't know about that introspects your proto. Even if you do
           | the dance right and update your server and client before not
           | setting the new field, you could crash outdated middleware
           | that you didn't know about. Whoops!
           | Or with the database, you now need to dual write or something
           | complex because if you need to roll-back to an older version,
           | you'd be unable to _read_ the protos that don't include the
           | required field.
           | Required doesn't do well over time. It has nothing to do with
           | setting the values.
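           | A toy Python model of that failure mode, using plain dicts
           | rather than real protobuf messages. The field names and the
           | routing function are hypothetical:

```python
from typing import Any, Callable, Dict, Optional

Message = Dict[str, Any]  # stand-in for a decoded wire message


class MissingRequired(Exception):
    pass


def old_reader_required(msg: Message) -> Message:
    """An already-deployed reader whose schema marks `nickname` as required."""
    if "nickname" not in msg:
        # The whole message is rejected, even if the caller never reads the field.
        raise MissingRequired("nickname")
    return msg


def old_reader_optional(msg: Message) -> Message:
    """An already-deployed reader whose schema marks `nickname` as optional."""
    return msg


def route(msg: Message, decode: Callable[[Message], Message]) -> Optional[str]:
    """Middleware that only cares about `user_id`."""
    try:
        return decode(msg)["user_id"]
    except MissingRequired:
        return None  # message dropped


new_message = {"user_id": "42"}  # the writer has stopped setting `nickname`
print(route(new_message, old_reader_required))  # None
print(route(new_message, old_reader_optional))  # 42
```

           | The middleware never touches `nickname`, yet the reader with
           | the required field still loses the message.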
             | grogers wrote:
             | If you have old clients that expect that field you are
             | removing to be there in a meaningful way, you still have to
             | update all clients before you can stop setting it. Having
             | the protobuf schema itself use optional or required doesn't
             | change that, it just makes the dependency explicit there,
             | instead of only in the code at the endpoints.
             | Changing required to optional isn't a magic fix for
             | protocol compatibility. If it were (for your limited use
             | case), you could just make that change on the protobuf
             | client side, as it doesn't affect the wire
             | representation/interpretation.
               | joshuamorton wrote:
               | > If you have old clients that expect that field you are
               | removing to be there in a meaningful way
               | Right, there's the rub. `required` means that _anyone who
               | deserializes your proto_ falls into this category. That's
               | a much larger group than "anyone who reads a specific
               | field".
               | (Note also that there are lots of ways to make reading a
               | field that is empty fall back to some reasonable,
               | non-catastrophic behavior; required doesn't let you do
               | those things.)
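
The point made in the subthread above can be illustrated with a toy model.
This is only a sketch under stated assumptions: it does not use the real
protobuf library, and `parse`, `MissingRequiredField`, the schemas, and the
field names are all hypothetical. It shows a reader on an old schema that
only cares about `id` but still fails when a newer producer stops setting a
field the old schema marks as required.

```python
# Toy model of required/optional parsing; NOT the real protobuf API.
# A schema maps field name -> (label, default). All names are hypothetical.


class MissingRequiredField(Exception):
    """Raised when a payload lacks a field the reader's schema requires."""


def parse(schema, payload):
    """Parse a dict payload the way a reader holding `schema` would."""
    message = {}
    for name, (label, default) in schema.items():
        if name in payload:
            message[name] = payload[name]
        elif label == "required":
            # The whole message is rejected, even if the caller
            # never intends to read this particular field.
            raise MissingRequiredField(name)
        else:
            # Optional fields fall back to a harmless default.
            message[name] = default
    return message


# Two versions of the *old* reader's schema, differing only in the label.
OLD_SCHEMA_REQUIRED = {"id": ("required", None), "nickname": ("required", None)}
OLD_SCHEMA_OPTIONAL = {"id": ("required", None), "nickname": ("optional", "")}

# A newer producer has stopped setting `nickname`.
new_payload = {"id": 7}


def middleware_reads_only_id(schema, payload):
    """Stands in for middleware that only ever looks at `id`."""
    try:
        return parse(schema, payload)["id"]
    except MissingRequiredField as exc:
        return f"parse failed: missing {exc}"


result_optional = middleware_reads_only_id(OLD_SCHEMA_OPTIONAL, new_payload)
result_required = middleware_reads_only_id(OLD_SCHEMA_REQUIRED, new_payload)
```

With the optional label the middleware gets its `id` back; with the required
label it fails at parse time, although it never reads `nickname`.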
             | [deleted]
       | dahauns wrote:
       | This line raised a huge red flag for me:
       | "If the customer doesn't have a fixed contract, it is assumed
       | they are on a default contract"
       | No. Don't assume, specify. Explicitly.
       | If this is part of your specification, have a DefaultContract
       | entity of some kind somewhere. And don't call this table just
       | "Contracts", make it clear that those exist in addition to or
       | overlay a default contract.
       | It might sound like overkill, but in my experience in business
       | application development, one of the largest and most painful
       | sources of errors and refactoring headaches is implicit
       | assumptions in the data model.
         | Justsignedup wrote:
         | Yeah, there's a lot to be said for being explicit. For example,
         | what if even on a default contract you want to start tracking
         | payment adherence, or maybe sales information and attributing
         | them to sales' numbers.
         | Or maybe you start reporting and want reports on contracts vs
         | default contracts.
         | And then things get messy. Because the real world gets messy.
         | pdonis wrote:
         | _> If this is part of your specification, have a
         | DefaultContract entity of some kind somewhere._
         | Yes, but not in the database table for contracts. That's how I
         | read this part of the post. I would expect the assumption that
         | if the customer doesn't have a fixed contract, they are on a
         | default contract to be encoded in business logic somewhere in
         | an application that uses this database.
           | dahauns wrote:
           | > _I would expect the assumption that if the customer doesn't
           | have a fixed contract, they are on a default contract to
           | be encoded in business logic somewhere in an application that
           | uses this database._
           | That's exactly the kind of harmful assumption I'm talking
           | about. Harmful in that people act on that assumption and
           | actually implement it that way.
           | How a default contract might be represented may vary, and of
           | course it is in no way required or even sensible to be stored
           | as a row in the contracts table.
           | But to think that it is such a fundamentally different kind
           | of data that it should be represented apart from the rest, in
           | a different system, in a different layer, in a completely
           | different form of representation - this is where madness
           | lies.
             | pdonis wrote:
             | _> How a default contract might be represented may vary,
             | and of course it is in no way required or even sensible to
             | be stored as a row in the contracts table._
             | But you're saying it does need to be represented somewhere
             | in the database? That putting it anywhere else is harmful?
             | Can you elaborate? Why is it harmful?
             | pdonis wrote:
             | _> to think that it is such a fundamentally different kind
             | of data that it should be represented apart from the rest,
             | in a different system_
             | The concept of a "contract" is a business logic concept to
             | begin with. That concept is represented in the application
             | already. How (or whether) the data associated with a
             | particular contract is stored in the database is an
             | implementation detail.
         | kpmah wrote:
         | > have a DefaultContract entity of some kind somewhere
         | This is the OO mindset described at the bottom - the odd
         | compulsion to have a reified entity for every concept.
         | A lot of people have missed that the representation you persist
         | doesn't have to match the representation you present. In this
         | case, don't store default contracts, but present them e.g. via
         | a database view.
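
A minimal sketch of the "persist one representation, present another" idea,
assuming SQLite and a heavily simplified schema. The table names, the view
name `effective_contracts`, and the default rate of 10.0 are all made up for
illustration, and the time periods from the original article are left out.
Only fixed contracts are stored; the view presents every customer with an
explicit contract kind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );

    -- Only fixed contracts are persisted; no row means "default contract".
    CREATE TABLE fixed_contracts (
        customer_id INTEGER PRIMARY KEY REFERENCES customers(id),
        rate        REAL NOT NULL
    );

    -- The view makes the default explicit for readers of the data.
    CREATE VIEW effective_contracts AS
    SELECT c.id AS customer_id,
           CASE WHEN f.customer_id IS NULL THEN 'default' ELSE 'fixed' END AS kind,
           COALESCE(f.rate, 10.0) AS rate  -- 10.0 is a hypothetical default rate
    FROM customers c
    LEFT JOIN fixed_contracts f ON f.customer_id = c.id;

    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO fixed_contracts VALUES (1, 7.5);
""")

# Every customer appears exactly once, with or without a stored contract.
rows = conn.execute(
    "SELECT customer_id, kind, rate FROM effective_contracts ORDER BY customer_id"
).fetchall()
```

Whether the default belongs in a view like this, in a persisted row, or in
application code is exactly what the replies below disagree about; the sketch
only shows that the view option is mechanically simple.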
           | dahauns wrote:
           | But none of this has to do with OO. This is an argument at
           | the relational level.
           | And yes, if there's something as fundamental a concept in the
           | business model as a default contract that's in effect when no
           | other contracts overrule it, then IMO it damn well should be
           | represented explicitly in the _persisted_ data model.
           | I didn't talk about the specific nature of the
           | representation. The important part is that the intent of the
           | data should be explicit - data lives longer than code.
           | I've seen far too many DB schemas leaning too hard on
           | implicit assumptions and inferred information, leading to
           | hard-to-understand data models, unnecessarily complex (and
           | hard-to-optimize) data access (no matter the paradigm), and,
           | well, lots of errors.
       (page generated 2020-10-05 23:01 UTC)