[HN Gopher] Zero one infinity rule
       ___________________________________________________________________
        
       Zero one infinity rule
        
       Author : azhenley
       Score  : 60 points
       Date   : 2023-03-18 18:01 UTC (4 hours ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | readthenotes1 wrote:
        | Zero one infinity (many) is a good rubric for test-driving, and
        | for thinking about how close to infinity your system can/should
        | actually support.
        
       | furyofantares wrote:
       | > It argues that arbitrary limits on the number of instances of a
       | particular type of data or structure should not be allowed.
       | 
       | Either this is a misuse of "arbitrary" or it's not really arguing
       | for no limits beyond 1. In the Carmack example from another
       | comment, if you know your solution isn't going to scale well
       | beyond 1000 items, that's not an arbitrary limit, and it's not 0
       | or 1 either. When we choose not to allow users to back up their
       | hard drive in the username field of their account, that's not an
       | arbitrary limit.
        
       | xlii wrote:
       | I have a feeling that a lot of commenters mix up structure design
       | with structure implementation.
       | 
        | The rule talks about design (and I find the approach
        | interesting), yet the implementation can (and most likely
        | should) be limited.
       | 
        | E.g. we are designing a world. In this design we say children
        | can have toys. It doesn't make sense to design the universe so
        | that children are limited to 5 toys or any other arbitrary
        | number. But no room can fit an infinite number of toys, so the
        | implementation is limited. For some it's 500 toys, but for some
        | it's 500,000 toys.
       | 
        | Names or parents are good real-world examples. Sure, most people
        | have 2 parents or at most 2 names, but this rule argues that if
        | there's more than one it shouldn't be limited at all (which also
        | makes sense in real life, even though an IT system might limit
        | names to 256 characters because of hardware).
        
         | eternalban wrote:
         | Then change the name of the rule :)
         | 
         |  _Zero, One, or Many_. I think that would satisfy all
         | concerned.
        
           | xlii wrote:
           | Absolutely agree. However that's for the rule creator. I
           | think that infinity in this context sounds slightly poetic.
        
       | antiquark wrote:
       | RIP the byte.
        
         | layer8 wrote:
         | It's just one octet.
        
       | travisjungroth wrote:
       | A metarule for using rules like this: instead of thinking of it
       | as a rule, think of it as an option you should usually consider.
        | You get a lot of the benefits and less of the costs if you just ask
       | questions like "Would it be easier to solve for N than 3? Easier
        | to use and maintain?". The answer won't always be yes or no,
       | but you'll be more likely to get the right answer if you ask the
       | question.
        
       | Kim_Bruning wrote:
       | Am I the only person who sometimes makes an exception for "two" ?
        
         | ChainOfFools wrote:
         | I seem to recall that there are certain languages (human
         | languages) that have cases specifically for talking about pairs
         | of objects, situated semantically between singular and plural.
         | I think one of these is an archaic form of Greek, maybe ancient
         | or Hellenistic.
        
         | mmphosis wrote:
         | How about a sign bit?
         | 
         | https://en.wikipedia.org/wiki/Unum_(number_format)
        
         | playingalong wrote:
         | It has always "bothered" me that AWS allowed for up to two API
         | keys per IAM user.
        
           | jdreaver wrote:
           | That's probably so you can rotate your keys without downtime.
        
         | adhesive_wombat wrote:
         | A ping-pong buffer certainly seems like generalising to a ping-
         | pong-pang-peng-pung-p(infinity)ng buffer would be overkill.
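Joking aside, double buffering is one place where exactly two really is the designed-for number: one buffer is written while the other is read, then the roles swap. A minimal sketch (class and method names are illustrative):

```python
class PingPongBuffer:
    """Exactly two buffers: one being written, one being read.

    Generalizing to N buffers adds latency and complexity without
    helping the single-producer / single-consumer case this serves.
    """

    def __init__(self, size: int):
        self._buffers = [bytearray(size), bytearray(size)]
        self._write_index = 0

    @property
    def write_buffer(self) -> bytearray:
        # The buffer the producer is currently filling.
        return self._buffers[self._write_index]

    @property
    def read_buffer(self) -> bytearray:
        # The buffer the consumer is currently draining.
        return self._buffers[1 - self._write_index]

    def swap(self) -> None:
        # Exchange the roles of the two buffers.
        self._write_index = 1 - self._write_index
```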
        
         | hprotagonist wrote:
         | i fairly regularly want pairwise, yep.
        
         | jerf wrote:
         | No.
         | 
         | But I have the decency to feel bad and leave an apologetic
         | comment.
        
         | GauntletWizard wrote:
         | Wherever I need "Two" I usually find that what I actually need
         | is a one and a many; there's not a master and a replica,
         | there's one master and many replicas. There's not two places
         | that the data is when doing a raid restripe, there's many
         | places the data probably is and one place it should be.
        
         | grensley wrote:
         | Two is quite relevant for relationships
        
           | ChickeNES wrote:
           | Well, if you ignore polyamory and polygamy...
        
           | layer8 wrote:
           | You can have n-ary relationships. Relational databases are
           | built on that.
        
         | alex_sf wrote:
         | The argument against two would be: if you can do two, why can
         | you not do three?
        
       | GauntletWizard wrote:
       | I learned this rule through, of all things, the children's series
        | Animorphs. In one book they encounter a colony of ants, who mention
       | that "You are one, we are many" and count that way; any finite
       | number is one, and infinite numbers are many.
        
       | hsjqllzlfkf wrote:
        | John Carmack argued the opposite. He said he would hardcode
        | limits into his data structures. Limits that would never be hit
        | under normal operating circumstances. He argued that when you
        | design software you should have an idea under what circumstances
        | it will run and optimize for those. The fact that people normally
        | don't do this is why software often lags - that algo you
        | implemented worked just fine when its hashmap had 1000
        | keys, but not when it has 1m keys.
       | 
       | I haven't articulated his point very well, but I lost interest in
       | this comment mid way thru.
        
         | cjfd wrote:
         | Well, John Carmack is a game programmer, at least originally.
         | In games one has strong upper limits on how long it can take to
         | calculate the next frame. So, it may very well be that
         | something can be said for this in this context. I am not sure
         | in general, though. If one limits the length of the name of a
         | person in some record one can start waiting until one day a
         | person arrives that has a name that is one character longer and
         | then a programmer needs to get involved to increase the limit.
         | Okay, maybe there exists a ridiculously large maximum name
         | length such that this problem is unlikely to occur. Then one
         | may still have the problem that if all people would have names
         | that approach this limit, the system still might get too slow
         | as well. So maybe we should impose a limit on the sum of all
         | the lengths of all the names in the system. Hmmm... Kind of
         | gets a bit too complicated and arbitrary for my tastes. I think
          | for many systems Carmack's advice really is not very good.
        
           | layer8 wrote:
           | There should be limits. What if someone pastes the text of
           | the bible into the name field? What if you want to send an
           | email containing the name, and you hit the maximum email size
           | accepted by the SMTP server (they all have a limit)? What if
           | you want to send a postal letter and print the name on the
           | envelope?
           | 
           | Not setting explicit limits either means that you still have
           | implicit limits, but you don't know what they are and where
           | and when you'll run into them, or it means you're opening
           | yourself up to DoS attacks, because your database or your
           | backups or your AWS bill will run amok.
        
             | GauntletWizard wrote:
             | I have legitimately used the complete works of William
             | Shakespeare, unabridged, as a password. Even that is in
             | megabytes, and not significant load for bcrypt2.
             | 
             | Not that there should be no limits at all, but the upper
             | bound should be relatively high.
        
               | asynchronous wrote:
               | Doesn't bcrypt2 essentially truncate every source input
               | to no longer than 35 characters?
        
         | mhh__ wrote:
         | > 1m keys
         | 
         | As opposed to not working at all?
        
         | quelsolaar wrote:
          | I've come to agree with this. Say you have a struct that
         | contains a name. If you limit the name size to say 64 bytes,
         | then you can store it in the struct, otherwise you need to have
         | a separate allocation and an indirection. This makes the code
          | slower, more error-prone and more complex to use. So think hard
          | about when "infinite" is justified.
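The inline-storage tradeoff described here can be sketched in Python, with the struct module standing in for a C struct that reserves a fixed 64-byte name field (function names are hypothetical):

```python
import struct

# A flat record: 64-byte name field plus an unsigned int id.
# The whole thing packs into one contiguous buffer - no separate
# allocation or pointer indirection for the name.
RECORD = struct.Struct("64s I")

def pack_record(name: str, record_id: int) -> bytes:
    encoded = name.encode("utf-8")
    if len(encoded) > 64:
        # The design limit is explicit and enforced at the boundary.
        raise ValueError("name exceeds the 64-byte design limit")
    return RECORD.pack(encoded, record_id)

def unpack_record(buf: bytes):
    name, record_id = RECORD.unpack(buf)
    # struct pads short names with NUL bytes; strip them on the way out.
    return name.rstrip(b"\x00").decode("utf-8"), record_id
```

The cost of the limit is a hard error on oversized input; the benefit is that every record has the same fixed layout.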
        
           | meghan_rain wrote:
           | > more error prone
           | 
           | not when using Rust ;-)
        
           | AlotOfReading wrote:
           | You should also be careful about incorrectly imposing limits
           | on human input, as every "falsehoods programmers should know
           | about x" repeatedly hits on. To continue the name example,
           | there are people with legal names and titles that take far
           | more than 64 bytes to store and truncation is not an
              | appropriate solution. You _should_ store the entire thing,
              | even if that's a bit more difficult technically. The
           | other limit also applies. Some people don't have any
           | characters in their legal names at all, like my mother (birth
           | certificate name was blank).
        
             | PKop wrote:
             | Or you can make a tradeoff and simply exclude these
             | outliers from using your product, instead of optimizing for
             | their unique case.
        
             | quelsolaar wrote:
             | Who said anything about the name being the name of a
             | person?
        
             | zeroonetwothree wrote:
             | I think "should" is a bit strong here. There are always
             | tradeoffs. Having support for very long names can cause
             | other problems, like the opportunity for abuse, making it
             | hard to format UIs, making human inspection of data much
             | harder, or even limiting algorithmic efficiency.
             | 
             | For example, US Passports have a relatively small limit for
             | name length (I think it's around 20 characters each for
             | first/middle/last).
        
               | jiggawatts wrote:
               | "Who you are is inconvenient for me."
        
               | GMoromisato wrote:
               | Conversely, "The needs of the many outweigh the needs of
               | the few, or the one."
        
           | zeroonetwothree wrote:
           | This is true, but note that the "Zero one infinity rule" only
            | applies to the _number of items_, not the size of an item.
           | Limits on size are a lot more typical anyway.
           | 
           | (Yes I know someone will point out that to store a list of
           | items requires a linear amount of space, so therefore an
           | unbounded number of items inherently implies an unbounded
           | amount of space)
        
         | MawKKe wrote:
         | Sure if you know the operating domain, just hard code some
         | sensible upper limit and move on. Not that this way is "better"
         | in all cases. YAGNI and all that
        
         | adhesive_wombat wrote:
         | You can still have both: an algorithm that technically has no
         | formal numerical limit, but have warnings or errors when it
         | reaches some condition that is clearly outside the design
         | scope.
         | 
         | For example, you might have a system that warns somehow if you
         | start inserting items at the front of a vector with over a
         | million items (when you originally expected the vector to be
         | used for tens of items). If the structure is suddenly being
          | used for something way outside of its design scope, it's
         | probably in need of a re-think.
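A sketch of this warn-but-don't-forbid approach (the threshold and names are illustrative):

```python
import warnings

# Soft limit: the structure was designed for tens of items, so flag
# usage that is orders of magnitude beyond that without refusing it.
DESIGN_SCOPE = 1_000

def insert_front(items: list, value) -> None:
    # No hard cap: the operation still works at any size, but usage
    # far outside the designed regime triggers a warning so someone
    # notices and re-thinks the data structure.
    if len(items) >= DESIGN_SCOPE:
        warnings.warn(
            f"insert_front on {len(items)} items; designed for tens",
            RuntimeWarning,
        )
    items.insert(0, value)
```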
        
           | stefan_ wrote:
           | Nope. "No formal limit" or "infinity" is just another way of
           | saying "dynamic memory allocation". There is a reason you
           | usually find this advice in game adjacent places; to punt
           | everything to the dynamic memory allocator makes it very
           | difficult to know if you are keeping within the memory-
           | starved limits of e.g. the game console you are developing
           | for, or on mobile to keep a garbage collector from stepping
           | in and ruining your day.
        
             | adhesive_wombat wrote:
             | "Nope." An "algorithm" can be about things other than
              | memory allocation. Say you have a collection of sensors that
             | you poll. ZOI is saying that the system that polls them
             | shouldn't[1] have some kind of hard assumption that there
             | will only ever be, say, specifically 4. You could still
             | statically allocate the storage on a given system, for
             | example in an embedded system where the software is
             | compile-time configured for a given hardware.
             | 
             | However, if you pass the "poll_sensors" function a size of
             | 60 million when the system was designed for "about 4, I
             | guess", it's likely that you're operating the algorithm in
             | a regime it wasn't designed for. You may (or may not, this
             | is just another trade-off) wish to know about it.
             | 
             | [1]: of course you can always construct exceptions. If you
             | follow every rule you subscribe to dogmatically, then
              | you're doing something more akin to religion than
             | engineering.
        
         | shoo wrote:
         | > when you design software you should have an idea under what
         | circumstances it will run and optimize for
         | 
         | This philosophy is sometimes referred to as "data-oriented
         | design"
         | 
         | see also: Mike Acton's CppCon 2014 talk:
         | https://www.youtube.com/watch?v=rX0ItVEVjHc
         | 
         | Data-oriented design seems appropriate for shipping high-
         | performance games, and shipping them on schedule and on budget.
         | The goal is not to design and code the algorithm with the best
         | theoretical scalability characteristics - and publish it in a
         | paper, or invent the most mathematically general version of the
         | algorithm to be factored out into a library for use by 100
         | hypothetical future projects. The goal is not to design a
         | system using hexagonal architecture (or whatever) that can more
         | flexibly accommodate arbitrary changes of requirements in
         | future, at the expense of performance. The source code is not
         | the product that the customers buy, the game is the product.
         | 
         | The goal is to understand the problem you actually have -
         | processing data of a certain finite size or distribution of
         | sizes, on some fixed finite set of target hardware platforms,
         | and figure out how to do this in a way that hits performance
         | targets, with a limited budget of engineering time, to keep
         | moving forward on schedule toward shipping the product.
        
         | roflyear wrote:
          | Only works when failing early is better than running slow,
          | which IME is not often the case.
        
         | TazeTSchnitzel wrote:
         | You can see this kind of design in action in the Source engine,
         | perhaps inherited from Quake (which is partly Carmack's work of
         | course). There are hard limits on things like how many objects
         | can exist at once. Occasionally those get hit... usually due to
         | a memory leak![0] So the artificial limit helps find real
         | problems. That's also been my experience with PHP, which limits
         | the amount of memory a request or script execution can consume
         | by default.
         | 
         | [0] https://www.youtube.com/watch?v=pw2X1yhrDdE -- check out
         | shounic's other videos too, there's lots of interesting content
         | there about how complex game systems fail, including about
         | these limits
        
       | at_a_remove wrote:
       | I've had a different take on this that goes "Zero, One, Many."
       | And that when you end up really using your software, the things
       | that were "one" often become "many," and the unthinkable zeroes
        | frequently become ones.
        
       | zeroonetwothree wrote:
       | While this has some theoretical merit, IME, limits are quite
       | useful to catch bugs (or prevent degenerate cases). For example I
       | was recently working on a permissions system wherein there can be
       | members of groups. I set a reasonable limit on the size of a
       | group based on a maximum of actual usage and what I could foresee
       | being reasonable. A few days later, this limit was triggered, and
       | I got a bug report. But it turned out there was some bad caller
       | that had essentially an infinite loop that was adding more and
       | more members to the group. Without the limit this probably would
        | only have been detected after it cascaded and caused other failures.
       | 
       | As another example, Facebook famously had (maybe still has? I
       | have no idea) a limit of 5,000 friends. This was based on the
       | idea that you were supposed to be friends with people you
       | actually kind of know at least a little bit, which would probably
       | fit within 5,000 for most people. After some more popular users
       | started hitting the limit, it led to the development of public
       | profiles/pages as an alternative product to handle that use case.
       | So in this case the limit allowed the company to identify a new
       | market that they could better serve in an alternative way.
       | 
       | BTW, the George Gamow book that the name comes from (One, Two,
       | Three... Infinity) is a great read for anyone interested in
       | popular descriptions of physics or math. He also lived a very
       | interesting life, I would recommend his autobiography "My World
       | Line".
        
         | josephg wrote:
         | Yeah; expected limits are also fantastically useful in
         | performance engineering. It's very common your code needs to
         | handle an arbitrarily sized input, but 99% of the time the
         | input will be bounded. (Or generally simpler). Special casing
         | the common code path can make a lot of code run much faster.
         | 
         | For example, in some code I'm writing at the moment I have
         | lists of integers all over the place. I call them lists -
         | usually they only have 1 element. Sometimes they have 2
         | elements (10%) and very occasionally more than 2 elements or
         | they're empty (<1%).
         | 
         | If I used a language like Javascript, I'd use Arrays. But
         | arrays are quite expensive performance wise - they need to be
         | allocated and tracked by the GC and the array contents are
         | stored indirectly.
         | 
         | Instead, I'm using an array type which stores up to 2 items
         | inline in the container object (or the stack) without
         | allocating. It only allocates memory on the heap when there are
         | 3 or more items. This decreases allocations by 2 orders of
         | magnitude, which makes a really big difference for performance
         | in my library. And my code is just as readable.
         | 
         | I'm using the smallvec crate. There's plenty of libraries in C
         | and Rust for this sort of thing in arrays and strings. Swift
         | (like obj-c before it) builds small string optimizations into
         | the standard library. I think that's a great idea.
         | 
         | https://crates.io/crates/smallvec
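The small-vector idea can be mimicked in Python, purely as an illustration: Python has no stack allocation, so this only models the spill-to-heap behavior, not the allocation savings smallvec gets in Rust.

```python
class SmallList:
    """Holds up to 2 items in fixed slots; spills to a real list beyond.

    A sketch of the small-vector optimization: the common 1-2 element
    case avoids creating a separate list object entirely.
    """

    __slots__ = ("_a", "_b", "_n", "_spill")

    def __init__(self):
        self._a = self._b = None
        self._n = 0
        self._spill = None  # only allocated once a 3rd item arrives

    def append(self, item):
        if self._n == 0:
            self._a = item
        elif self._n == 1:
            self._b = item
        else:
            if self._spill is None:
                self._spill = []
            self._spill.append(item)
        self._n += 1

    def __len__(self):
        return self._n

    def __iter__(self):
        if self._n >= 1:
            yield self._a
        if self._n >= 2:
            yield self._b
        if self._spill:
            yield from self._spill
```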
        
         | pyuser583 wrote:
         | The Wiki page says the problem isn't with limits, but with
         | _arbitrary_ limits.
         | 
         | When a program limits me to 256 of something, it doesn't seem
         | arbitrary.
         | 
          | I've heard stories of programmers setting limits to powers
          | of two simply so nobody asks why.
        
         | crazygringo wrote:
         | Totally agreed. I think it's absolutely a best practice to set
         | limits.
         | 
         | Code is generally designed to operate within certain
         | "reasonable" performance boundaries, and when it goes outside
         | those you need to think whether code should be rewritten to
          | accommodate it.
         | 
         | Just a tiny example, but I regularly deal with long (800+ page)
         | PDF's on my iPad, reading parts of them in the stock Books app.
         | When I select text to highlight, the context menu unfortunately
         | puts "select all" directly next to "highlight". Of course,
         | every so often I accidentally hit "select all", and then I have
         | to force-close the app because otherwise it just freezes for 10
         | minutes as it extracts and selects the text on every single
         | page.
         | 
         | When really, it needs a limit to detect that, hey, if the PDF
         | is over 20 pages long then just don't allow "select all"
         | anymore, because this isn't the right tool for the job.
        
           | psychphysic wrote:
           | Although that's also ripe for the annoyance of not being able
           | to select all in border cases.
           | 
           | I'm much in favour of don't fuck with the basics (i.e. the
            | core expected UI features like select, copy and paste).
        
         | solveit wrote:
         | I think a better way to formulate the same rule of thumb is
         | that zero, one, and infinity are the only numbers that don't
         | need to be _justified_. You should always have an answer to the
         | question  "why 5000 and not 6000?"
         | 
         | Of course, the justification can be as simple as "it has to be
         | some number and 5000 is as good as any", but that opens the
         | door to the discussion of whether 5000 really is as good as any
         | other number, which is often surprisingly enlightening.
        
         | pixl97 wrote:
         | What should the limits be? And are the limits clear to the
         | users of the system? Are they clear to other components of the
         | system?
         | 
         | I agree we should have limits in software because we don't have
         | unlimited memory and processing time, but I commonly find these
         | limits are encoded by the imagination of the programmer working
         | on the software at the time, and it's often you find limits
         | that were not considered in systems design of the product.
        
         | HervalFreire wrote:
         | >While this has some theoretical merit,
         | 
         | What theoretical merit? It sounds like a completely made up
         | rule of thumb based off of a persons anecdotal experience.
        
       | heyzk wrote:
       | Which, I assume, led to one of my favorite idioms: there are only
       | three numbers in computing: 0, 1, and n.
        
       | deergomoo wrote:
       | I think this is something programmers intuitively learn pretty
       | quickly (in fact I'm only just learning that there is actually a
       | name for this idea), but it's an interesting principle to explain
       | to non-developers.
       | 
       | Back when I was doing agency work, on more than one occasion I
       | had clients question why modifying some entity to permit multiple
       | related entities instead of one would be more work than they were
       | anticipating.
       | 
       | The question was usually along the lines of "would it help if we
       | were to limit related $foobars to three?". But of course, from a
       | structural perspective supporting three related records is
       | usually exactly the same amount of work as supporting arbitrary
        | amounts. Yeah I could probably hack in some fixed number of extra
       | records, but that's just going to create an increasingly large
       | mess when your business grows and you realise
       | three...twelve...one hundred wasn't enough. Might as well do it
       | properly from the start.
        
       | dragontamer wrote:
       | This is probably a rule I believed in as a beginner, and
       | completely don't believe in anymore today.
       | 
       | Lets take a look at an exceptionally well designed system:
       | single-error correction double-error detection codes. Also known
       | as "ECC RAM". The implementation of this only works for *exactly*
       | 64 bits + 8-parity bits.
       | 
       | Now sure, we can talk about how we can generalize SECED ECC to
       | different numbers of bits or whatever. But I'll also point out
       | that the only implementation in widespread use is ECC Memory, the
       | 64-bits + 8-parity bits (72-bits total) implementation.
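The 64+8 split is not arbitrary: a Hamming code needs p parity bits such that 2^p >= d + p + 1, and SECDED adds one extra overall-parity bit for double-error detection. A quick check (a sketch, not code from the thread):

```python
def secded_parity_bits(data_bits: int) -> int:
    """Parity bits needed for single-error-correct, double-error-detect.

    Finds the smallest p with 2**p >= data_bits + p + 1 (the Hamming
    condition), then adds 1 overall parity bit for double-error
    detection.
    """
    p = 0
    while 2 ** p < data_bits + p + 1:
        p += 1
    return p + 1
```

For 64 data bits this yields 8, giving the classic 72-bit ECC DIMM word.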
        
         | Nullabillity wrote:
         | It is generalized, you can chain together infinitely many
         | blocks.
        
         | alex_sf wrote:
         | I would argue your example is a special case of only allowing
         | one.
        
           | dragontamer wrote:
           | But it detects two-bit errors.
           | 
           | Which is nonzero, non-one, and non-infinity.
           | 
           | --------
           | 
           | SEC without DED is the classic implementation of Hamming
           | error correction codes. The double error detection (aka DED)
            | bit is grafted on top of the theory, because it's useful in
           | practice.
        
       | crazygringo wrote:
       | I've read the article but I'm having trouble understanding the
       | context of what this was _in response to_. The article quotes:
       | 
       | > _I formulated it in the early 70s, when I was working on
       | programming language design and annoyed by all the arbitrary
       | numbers that appeared in some of the languages of the day._
       | 
       | Was it just an issue with languages in the 70's then? Because I'm
       | struggling to think of any "arbitrary number" limits in modern
       | languages.
       | 
       | Obviously we have numerical limits depending on if a variable is
       | a signed 32-bit integer or whatever. But those aren't arbitrary.
       | 
       | Like would older languages just arbitrarily say there couldn't be
       | more than 100 elements in an array, or something?
        
         | layer8 wrote:
         | Lots of examples (e.g. maximum number of nesting levels), see
         | for example:
         | 
         | http://www.tendra.org/tdfc2-config/chapter2
         | 
         | https://www.ibm.com/docs/en/epfz/5.3?topic=reference-limits
         | 
         | One aim of finite implementation limits is to define which
         | programs are guaranteed to compile successfully, so that you
         | don't run into the situation that a program compiles on one
         | implementation but not on another implementation, which would
          | raise the question of whether it's the program or the compiler
         | that is nonconforming. If both are conforming, you want the
         | program to compile 100% of the time. This isn't possible if
         | programs can have unlimited complexity.
         | 
         | The mindset is similar to embedded programming, where you have
         | limited memory and absolutely don't want to run into the
         | situation that there's not enough memory left to perform the
         | operation that needs to be performed.
        
         | gdprrrr wrote:
         | Early C compilers had a limit of 6 characters for an identifier
        
       | neilv wrote:
       | I first heard of "no arbitrary limits" in connection with GNU
        | software. It was one of the ways that GNU software not only had
       | different licensing than alternatives, but was also usually
       | technically superior.
       | 
       | (At the time, there were a lot of different workstation computer
       | vendors, each with their own flavor of Unix. But where GNU tools
       | existed, they'd usually be better than all of the Unix
       | workstation ones. One exception was that the premium vendor C and
       | C++ compilers generated better code and/or were more trusted than
       | GCC at the time. GCC didn't take over until Linux, initially on
       | x86.)
        
       ___________________________________________________________________
       (page generated 2023-03-18 23:00 UTC)