[HN Gopher] Rob Pike's Rules of Programming (1989)
       ___________________________________________________________________
        
       Rob Pike's Rules of Programming (1989)
        
       Author : gjvc
       Score  : 315 points
       Date   : 2020-08-12 18:30 UTC (4 hours ago)
        
 (HTM) web link (users.ece.utexas.edu)
 (TXT) w3m dump (users.ece.utexas.edu)
        
       | cactus2093 wrote:
        | Interesting that the first 3 are all about performance, which
        | strikes me as a bit ironic given rule #1, which could be
        | summarized as "don't worry about performance until you have to."
        
       | game_the0ry wrote:
       | > Data dominates. If you've chosen the right data structures and
       | organized things well, the algorithms will almost always be self-
       | evident. Data structures, not algorithms, are central to
       | programming.
       | 
       | :O
       | 
       | Epiphany
        
       | gentleman11 wrote:
       | > Data dominates. If you've chosen the right data structures and
       | organized things well, the algorithms will almost always be self-
       | evident
       | 
        | What does this have to say about the careers and roles of data
        | scientists vs programmers? A data scientist's entire job is to
        | categorize and model data in a useful way. In the future, will
        | they be fundamentally more important than coders, or will the
        | two roles just merge?
        
         | ttamslam wrote:
         | I think you're conflating two things: in my mind, working on
         | the shape of data is different than pulling inferences out of
         | that data.
        
       | mywittyname wrote:
       | > "write stupid code that uses smart objects".
       | 
       | Writing stupid code is actually really difficult.
       | 
       | For me, it takes a little bit of iterating before I know just the
       | right place to insert stupid.
        
         | lmkg wrote:
         | I'm reminded of the old adage "I'm writing you a long letter,
         | because I don't have time to write a short one."
         | 
         | (Often attributed to Mark Twain, but similar sentiments were
         | expressed by many before him.)
        
         | RangerScience wrote:
         | I have two projects that I consider to have nearly-perfect
         | code. Both are their third iterations, and I think they're
         | stable at this point.
         | 
         | Kinda goes: 1) Make a bad solution exploring the problem 2)
         | Explore a good idea for how to solve the now-understood problem
         | 3) Mature the good idea through usage.
        
         | dnautics wrote:
         | Do you use TDD? I'm not religious about it in general, but when
         | I'm lost, confused, and easily distracted, I start with TDD to
         | write the dumbest possible code.
        
           | sukilot wrote:
           | TDD is good for features (like web apps) but not so much for
           | algorithms.
           | 
           | The difference is that you only need to support a tiny
           | fraction of possible features / use cases, but your
           | algorithms need to be correct for a wide range of inputs.
        
             | dnautics wrote:
             | I disagree. You code your algo for one input, come up with
             | a corner case, write a failing test, refactor, repeat.
             | 
              | For an algo that operates on a list, say, I'll start with
              | the test f([]) == 0 and implement f to return the
              | constant 0.
             | 
             | And then go from there.
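              | 
              | For illustration, a minimal sketch of that first step in
              | C++ (a hypothetical f that will eventually sum a list):
              | 
              |     #include <cassert>
              |     #include <vector>
              |     
              |     // Simplest implementation that satisfies the first test.
              |     int f(const std::vector<int>&) { return 0; }
              |     
              |     int main() {
              |       assert(f({}) == 0);  // the first failing test now passes
              |       // a later test, e.g. assert(f({1, 2}) == 3),
              |       // will force the real logic
              |     }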
        
           | mywittyname wrote:
           | No.
           | 
            | It's really an issue of not being sure at first what needs
            | to be flexible & data-driven vs handled in code. If you
            | make everything data-driven, then it becomes this horrible
            | mess where your input is basically a program and your
            | actual code ends up being a terrible interpreter.
           | 
           | I tend to just build things bottom-up, and start with a small
           | bit of functionality, then when I have enough small bits, I
           | bolt them together and decide what I need to abstract at that
           | point, do refactoring on the smaller bits and provide data to
           | them from the caller. Then repeat that continuously until I
           | have all of the functionality I need.
           | 
            | It might be different for other people, but I need to have
            | working code before I can abstract it.
        
       | gridlockd wrote:
       | All this advice against "premature optimization" has created
       | generations of programmers that don't understand how to use
       | hardware efficiently.
       | 
       | Here's the problem: If you profile software that is 100x slower
       | than it needs to be on every level, _there are no obvious
       | bottlenecks_. Your whole program is just slow across the board,
       | because you used tons of allocations, abstractions and
       | indirections at every step of the way.
       | 
        | Rob Pike probably has never written a program where performance
        | _really_ mattered, because if he had, he would've found that you
        | need to think about performance right from the beginning and all
        | the way through development, because making bad decisions early
        | on can force you to rewrite almost everything.
       | 
       | For instance, if you start writing a Go program with the mindset
       | that you can just heap-allocate all the time, the garbage
       | collector will eventually come back to bite you and the
       | "bottleneck" will be your entire codebase.
        
         | byronr wrote:
         | > Rob Pike probably has never written a program where
         | performance really mattered
         | 
         | Rob Pike has written window system software which ran in what
         | now would be called a "thin client" over a 9600 baud modem and
         | rendered graphics using a 2MHz CPU. He probably knows a thing
         | or two about performance tuning.
        
       | Areading314 wrote:
       | These resonate, especially #1, but I'm not so sure about #5.
       | Although it makes sense to choose good data structures, I don't
       | think that guarantees a simpler algorithm. For example you can
       | store your data in a heap (tree), and still need to write a tree
       | traversal algorithm to print out the elements in order.
        
       | jorangreef wrote:
        | Rules 1 and 2 depend on context: whether you're working on an
        | existing program or a new program. They can be true or false.
        | They can really help or they can really hurt. Are you going into
        | an existing system to do performance optimization? Sure, don't
        | guess, measure. Are you designing a new system? Throw out those
        | parroted premature optimization mantras... you are responsible
        | for designing for performance upfront. You will always measure,
        | but depending on context you will design for speed first and
        | then test your prototype with measurements. There's no way
        | around an initial hypothesis when you're designing new systems.
        | You have to start somewhere. That's where Jeff Dean's rule of
        | always doing back-of-the-envelope estimates pays off in orders
        | of magnitude, many times over.
       | 
        | Rules 3 and 4 are gold and always true.
       | 
       | Rule 5 is the key to good design.
        
       | dogbox wrote:
       | > Rule 5. Data dominates. If you've chosen the right data
       | structures and organized things well, the algorithms will almost
       | always be self-evident. Data structures, not algorithms, are
       | central to programming.
       | 
       | Is "data structures" the correct term here? Assuming I'm not
       | misinterpreting, the usage of "data structures" can be misleading
       | - one usually thinks of things like BST's and hash tables, which
       | are inherently tied to algorithms. I feel like "data modeling"
       | better captures the intended meaning here.
        
         | [deleted]
        
       | sethammons wrote:
       | A quote from one of our founders that I've always liked:
       | 
       | If you make an optimization that was not at a bottleneck, you did
       | not make an optimization.
        
         | renewiltord wrote:
          | Read _The Goal_ by Eliyahu Goldratt. While it's possible your
          | founder came upon the idea independently, this is one of many
          | ideas repeated in that book. It's relatively short and
          | entertaining to read, and it has definitely survived the 36
          | years since its first publication quite well.
        
           | mplewis wrote:
           | This was adapted into a novel about IT/devops called The
           | Phoenix Project. It's an excellent read.
        
         | NewEntryHN wrote:
          | You made an optimization for the future, for when enough
          | other bottlenecks have been fixed that this one part becomes
          | the bottleneck.
        
           | chubot wrote:
            | Except that there are infinitely many such non-bottlenecks,
            | and all the effort you spend on them is effort not spent on
            | the real bottlenecks.
           | 
           | In other words, all engineering is time- and cost-
           | constrained. Anybody can build a good chair for $10,000 or a
           | good PC for $100,000. Doesn't mean it's good engineering.
        
             | jorangreef wrote:
             | "Anybody can build a good chair for $10,000 or a good PC
             | for $100,000."
             | 
             | And some people can build a great PC for $1,000 that runs
             | circles around the good PC for $100,000.
             | 
             | There's so much more to engineering than thinking in terms
             | of time and cost constraints. Those are real constraints,
             | but they're not the most important.
             | 
             | Engineering is design. If you have good design, good
             | insight, you can do things that people with infinite time
             | and budget could never dream to achieve. You can start
             | making a product that's a hundred times more powerful for a
             | tenth of the price in a fraction of the time. If you don't
             | have good design, good insight, then no amount of time or
             | budget can help you.
        
             | nix23 wrote:
              | > not spent on the real bottlenecks
              | 
              | BE LOGICAL! Of course you first fix the big bottlenecks.
              | 
              | > good PC for $100,000. Doesn't mean it's good engineering.
              | 
              | Of course it is... or can you gold-plate a PC case?
        
             | Koshkin wrote:
             | Yes they can indeed: https://blogs.systweak.com/someone-
             | has-built-a-gigantic-1000...
        
           | tjalfi wrote:
            | Premature optimization rests on several assumptions that
            | are far from a given.
           | 
           | 1. Adding the optimization didn't make the code more
           | complicated.
           | 
           | 2. Adding the optimization didn't introduce a bug.
           | 
           | 3. This part of the code will be a bottleneck in the future.
           | The time spent optimizing is a write-off if the project is
           | canceled or that portion is replaced.
        
         | hirundo wrote:
         | That's less true when you're paying the cloud for compute by
         | the second.
        
         | nix23 wrote:
          | That's a bit broad, I think. Talking about the pure speed of
          | your task, you are right; talking about energy consumption,
          | it's not always the case. And small but often-repeated tasks
          | should be optimized no matter whether they are bottlenecks:
          | when the system grows they will become bottlenecks. "Optimize
          | like a Vulcan" is what my boss once said... be logical and
          | nothing else (my interpretation).
        
           | jorangreef wrote:
           | "optimize like a Vulcan"... classic!
        
             | nix23 wrote:
              | Best boss ever!! He even had a bottle of his best whisky
              | refilled into a bottle labelled "Saurian brandy", so
              | everyone who said that this is illegal, or "ooh, that's
              | Star Trek", got one... well, not a bottle, but a glass :)
        
         | atombender wrote:
         | Not all optimization candidates are about bottlenecks. Reducing
         | allocation is also optimization, for example.
        
           | erik_seaberg wrote:
           | Peak memory or garbage collection throughput can become a
           | bottleneck. But if you know you have more memory than you
           | need, further reducing allocation is arguably a waste of your
           | time.
           | 
            | This can become a tragedy of the commons in desktop and
            | mobile apps, where you don't know how much memory the end
            | user has or needs, but you do know _you_ aren't paying for
            | it.
        
       | svec wrote:
       | Pike himself says "there's a 6th rule":
       | https://twitter.com/rob_pike/status/998681790037442561?lang=...
       | 
       | (6. There is no Rule 6.)
       | 
        | And he points to the best source he could find for it on the web:
       | http://doc.cat-v.org/bell_labs/pikestyle
        
         | tinco wrote:
          | dang, if this sticks to the frontpage, can you change the
          | title to "Rob Pike's 5 rules of programming _in C_"? As the
          | title is now, it misrepresents Rob Pike's words.
        
       | ed_elliott_asc wrote:
       | When people say worry about the data structures, what do they
       | mean?
        
       | jonfw wrote:
       | "Write stupid code that uses smart objects"
       | 
       | That's a good one. It's amazing how much complexity can be
       | created by using the wrong abstractions.
        
         | chubot wrote:
         | FWIW I find this is especially important for compilers and
         | interpreters.
         | 
          | It's not an exaggeration to say that such programs are
          | basically big data structures, full of compromises to
          | accommodate the algorithms you need to run on them.
          | 
          | For example, LLVM IR is just a big data structure. Lattner has
          | been saying for a while (in the talks on the new MLIR project)
          | that a major design mistake in Clang is not having its own IR.
          | 
          | SSA is a data structure with some invariants that make a bunch
          | of algorithms easier to write (and I think it improves their
          | computational complexity over naive algorithms in several
          | cases).
         | 
         | ----
         | 
         | In Oil I used a DSL to describe an elaborate data structure
         | that describes all of shell:
         | 
         |  _What is Zephyr ASDL?_
         | http://www.oilshell.org/blog/2016/12/11.html
         | 
         | https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...
         | 
          | I added some nice properties that algebraic data types in some
          | languages don't have, e.g. variants are "first class", unlike
          | in Rust.
         | 
         | Related: I noticed recently that Rust IDE support has a related
         | DSL for its data structure representation:
         | https://internals.rust-lang.org/t/announcement-simple-produc...
        
           | mamcx wrote:
           | > FWIW I find this is especially important for compilers and
           | interpreters.
           | 
            | Totally. I'm building a relational language, and it's
            | becoming very obvious why RDBMSs don't fit certain purity
            | ideals of the relational model (like all relations being
            | sets, not bags).
            | 
            | I'm stuck deciding which structures to provide by default.
            | I'm dancing between flat vectors or ndarrays, or a split
            | between flat vectors (columns) and HashMaps/BTrees with
            | n values (this is my intuition right now).
            | 
            | ---
            | 
            | > I added some nice properties that algebraic data types in
            | some languages don't have, e.g. variants are "first class"
            | unlike in Rust.
            | 
            | This sounds cool. Where can I learn more about this?
        
             | chubot wrote:
             | FWIW I found this post thought provoking in thinking about
             | data models of languages.
             | 
             | https://news.ycombinator.com/item?id=13293290
             | 
             | ---
             | 
             | About first class variants:
             | 
             | https://lobste.rs/s/77nu3d/oil_s_parser_is_160x_200x_faster
             | _...
             | 
             | https://github.com/rust-lang/rfcs/pull/2593
             | 
             | Another way I think of this is "types vs. tags": https://oi
             | lshell.zulipchat.com/#narrow/stream/208950-zephyr-...
             | (Zulip, requires login)
             | 
              | Basically, variant types can stand alone and have a unique
              | tag. Tags are discriminated at RUNTIME with "pattern
              | matching".
             | 
             | But a variant can belong to multiple sum types, and that's
             | checked statically. This is modeled with multiple
             | inheritance in OOP, but there's no implementation
             | inheritance. Related:
             | https://pling.jondgoodwin.com/post/when-sum-types-inherit/
             | 
             | So basically in the ASDL and C++ and Python type system I
             | can model:
             | 
              | - a Token type is a leaf in an arithmetic expression
              | 
              | - a Token type is a leaf in a word expression
              | 
              | But it's not a leaf in, say, what goes in a[i], or dozens
              | of other sum types. Shell is a big composition of
              | sublanguages, so this is very useful and natural. Another
              | construct that appears in multiple places is ${x}.
             | 
             | So having these invariants modeled by the type system is
             | very useful, and actually C++ and MyPy are surprisingly
             | more expressive than Rust! (due to multiple inheritance)
             | 
             | Search for %Token here, the syntax I made up for including
             | a first class variant into a sum type:
             | 
             | https://www.oilshell.org/release/0.8.pre9/source-
             | code.wwz/fr...
             | 
             | There is a name for the type, and a name for the tag (and
             | multiple names for the same integer tag). Tags (dynamic)
             | and types (static) are decoupled.
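              | 
              | A rough sketch of how I read the "one variant in multiple
              | sum types" idea, in plain C++ with made-up names (not the
              | actual Oil code):
              | 
              |     // Two sum types, each just an abstract base.
              |     struct arith_expr_t { virtual ~arith_expr_t() = default; };
              |     struct word_part_t  { virtual ~word_part_t()  = default; };
              |     
              |     enum class Tag { Token, Binary, Literal };
              |     
              |     // Token is a first-class variant: it belongs to both
              |     // sum types (checked statically) and has one runtime tag.
              |     struct Token : arith_expr_t, word_part_t {
              |       static constexpr Tag tag = Tag::Token;
              |     };
              |     
              |     void accept_arith(const arith_expr_t&) {}
              |     void accept_word(const word_part_t&) {}
              |     
              |     int main() {
              |       Token t;
              |       accept_arith(t);  // Token is a leaf in arith expressions
              |       accept_word(t);   // ... and in word expressions
              |     }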
        
         | chrisweekly wrote:
         | Yes! We should replace DRY (don't repeat yourself) with AHA
         | (avoid hasty abstractions), as the dominant rule of thumb.
        
         | jaggederest wrote:
         | I frequently find that, when refactoring especially, you find a
         | lot of big ol' god objects that are incomprehensible. But when
         | you break them down into 5-10 small objects, suddenly the
         | operation they were trying to do makes perfect sense.
        
           | aciswhat wrote:
           | mmmm very true with Redux --> context+hook state
        
       | munificent wrote:
        | _> Rule 1. You can't tell where a program is going to spend its
       | time. Bottlenecks occur in surprising places, so don't try to
       | second guess and put in a speed hack until you've proven that's
       | where the bottleneck is._
       | 
       | I agree with the general thrust of this. But it's worth pointing
        | out that often the easiest way to prove where a bottleneck is
        | (or at least isn't) is to try an optimization and see if it
        | helps. I like profiling tools immensely, and this kind of
        | trial-and-error optimization doesn't scale well to widespread
        | performance problems. But there's something to be said for
        | doing a couple of quick optimizations as tracer bullets to see
        | if you get lucky and find the problem before bringing in the
        | big guns.
       | 
        | The last three rules bug me. I wish we had a name for aphorisms
        | that perfectly encapsulate an idea _once you already have the
        | wisdom to understand it_, but that don't actually _teach_
        | anything. They may help you remember a concept--a sort of
        | aphoristic mnemonic--but don't _illuminate_ it. The problem with
        | these is that espousing them is more a way of bragging ("look
        | how smart I am for understanding this!") than really helping
        | others.
       | 
       | For example:
       | 
        |  _> Rule 5. Data dominates. If you've chosen the right data
        | structures and organized things well, the algorithms will almost
        | always be self-evident. Data structures, not algorithms, are
        | central to programming._
       | 
       | OK, well what are the "right" data structures? The answer is "the
       | ones that let you perform the operations you need to perform
       | easily or efficiently". So you still need to know what the code
       | is doing too. And the algorithms are only "self-evident" because
        | you chose data structures expressly to give you the luxury of
       | using simple algorithms.
        
       | dkarl wrote:
       | > Rule 1. You can't tell where a program is going to spend its
       | time. Bottlenecks occur in surprising places, so don't try to
       | second guess and put in a speed hack until you've proven that's
       | where the bottleneck is.
       | 
       | I wish people would follow this rule and just let stuff work. I
       | recently encountered the most extreme version of this I've ever
       | seen in my career: a design review where a guy proposed a Redis
       | caching layer _and_ a complex custom lookup scheme for a  <1GB,
       | moderate read volume, super low write volume MySQL database. And
       | of course he wants to put the bulk of the data in JSON fields and
       | manage any schema evolution in our application code.
       | 
       | Can't we just let stuff work? I'm no fan of MySQL, but can't we
       | admit that a ubiquitous and battle-tested piece of technology,
       | applied to a canonical use case, on tiny data under near-ideal
       | circumstances, is probably going to work just fine? At least give
       | it a chance before you spend days designing and documenting a
       | bunch of fancy tricks to save MySQL from being crushed under a
       | few megabytes of data.
        
         | maxk42 wrote:
         | This is particularly exasperating for me. I can't tell you how
         | many times in my professional career I've ended up speeding up
         | systems by removing two or three layers of improperly-
         | implemented "caching" and using good ol' MySQL and a basic
         | understanding of algorithmic time complexity to simplify
         | things.
        
           | flukus wrote:
            | Me too. I've seen a few systems where a simple request
            | requiring 10,000 queries was "optimized" into requiring
            | 10,000 cache lookups, when they should have just added some
            | joins. The bottleneck is the network latency, not the
            | database. The worst I've seen is an nHibernate cache stored
            | in a session variable: half the database was being
            | serialized/deserialized on every HTTP request. Fortunately
            | that was a small database.
            | 
            | Even with in-memory caches I've seen systems grind to a halt
            | from death by a thousand cuts: dictionary-based entity-
            | attribute systems where each attribute is looked up
            | individually. There seems to be a mentality that constant
            | lookup == free lookup, and devs don't seem to realize that
            | constant * $bignumber == $biggerNumber. Caching shouldn't be
            | that granular.
           | 
           | Obligatory latency numbers every programmer should know:
           | https://gist.github.com/jboner/2841832
        
         | farhaven wrote:
          | Do you work at my company? Because I have a coworker who is
          | always proposing _exactly_ that solution. No matter what issues
          | the code has, for him its usage of MySQL is always "the worst
          | moment".
          | 
          | Asking for benchmarks just gets a repeat of "our worst moment
          | is MySQL and we can solve that with some NoSQL cache".
        
       | rubyn00bie wrote:
       | > Rule 5. Data dominates. If you've chosen the right data
       | structures and organized things well, the algorithms will almost
       | always be self-evident. Data structures, not algorithms, are
       | central to programming.
       | 
        | That one hits me in the feels, because I think a lot of folks
        | (myself included) focus on algorithms and code patterns before
        | their data, and as a result a lot of things end up being harder
        | than they need to be. I've always liked this quote from Torvalds
        | on the subject, speaking about git's design (the first line is
        | for some context):
       | 
       | > ... git actually has a simple design, with stable and
       | reasonably well-documented data structures.
       | 
       | then continues:
       | 
       | > In fact, I'm a huge proponent of designing your code around the
       | data, rather than the other way around, and I think it's one of
       | the reasons git has been fairly successful [...] I will, in fact,
       | claim that the difference between a bad programmer and a good one
       | is whether he considers his code or his data structures more
       | important. Bad programmers worry about the code. Good programmers
       | worry about data structures and their relationships.
       | 
       | When I have good data structures most things just sort of fall
       | into place. I honestly can't think of a time where I've
       | figuratively (or literally) said "my data structure really whips
       | the llamas ass" and then immediately said "it's going to be
       | horrible to use." On the contrary, I _have_ written code that is
       | both so beautiful and esoteric, its pedantry would be lauded for
       | the ages-- had only I glanced over at my data model during my
       | madness. No, instead, I awaken to find I spent my time quite
       | aptly digging a marvelous hole, filling said hole with shit, and
       | then hopping in hoping to not get shitty.
       | 
        | One thing that really has helped me make better data structures
        | and models is taking advanced courses on things like multivariate
        | linear regression analysis, specifically going over identifying
        | things like multicollinearity and heteroskedasticity. Statistical
        | tools are incredibly powerful in this field, even if you aren't
        | doing statistical analysis every day. Making good data models
        | isn't necessarily easy, nor obvious, and I've watched a lot of
        | experienced folks make silly mistakes simply because they didn't
        | want something asinine like _two_ fields instead of one.
        
         | gen220 wrote:
          | It makes sense, when we bring in another aphorism: "code _is_
          | data". It's easier to write good code with good libraries.
          | And it's easier to write good data models that extend good
          | data models. The main distinction is that code is very
          | dynamic, flexible, and malleable, whereas data models need
          | not be.
         | 
         | Data models are the "bones" of an application, as part of the
         | application as code is. Data models fundamentally limit the
         | application's growth, but if they're well-placed, they can
         | allow you to do things that are really powerful.
         | 
         | You always want to have good bones. But the Anna Karenina
         | Principle is a thing [0].
         | 
         | So, applying this, I think baby ideas should not have many
         | constraints on the bones, to allow them to move around in the
         | future. Instead, there should be a ton of crap code
         | implementing the idea's constraints, because they change every
         | week, month, quarter, and the implementer is still learning the
         | domain.
         | 
         | Once the implementer reaches a certain point of maturity in the
         | domain, all of the lessons learned writing that crap code can
         | be compressed into a very clever data model that minimizes the
         | amount of "code" necessary, and simultaneously makes the
         | project more maintainable, interface-stable, and extensible: in
         | other words, making it an excellent platform to _build on_. The
         | crap code can be thrown out, because it was designed to
         | halfway-ensure invariants that the database can now take care
         | of.
         | 
         | I think most software we consider "good" these days followed
         | this development cycle. multics -> unix, <kversion_control> ->
         | git, ed -> vi -> vim.
        
         | allover wrote:
         | The counter argument would be that git is the poster-child of
         | poor UX, which could be blamed on the fact that it exposes too
         | much of its internal data structure and general inner-workings
         | to the user.
         | 
         | I.e. too much focus has been put on data structures and not
         | enough on the rest of the tool.
         | 
         | A less efficient data structure, but more focus on UX could
         | have saved millions of man hours by this point.
        
           | wtetzner wrote:
           | Or perhaps learning Git just requires a different approach:
           | you understand the model first, not the interface. Once you
           | understand the model (which is quite simple), the interface
           | is easy.
        
             | rabidrat wrote:
              | People keep repeating this, but it's not true. The
              | interface has so many "this flag in this case" but "this
              | other flag in that case" and "that command doesn't support
              | this flag like that" cases, etc. There's no composability
              | or orthogonality or suggestiveness. It's nonsensical and
              | capricious and unmemorable, even though I understand the
              | "simple" underlying model and have for years.
        
           | gen220 wrote:
            | It's difficult, because git's exposition of its data
            | structures _enables_ you to use it in ways that would not
            | otherwise be possible.
            | 
            | I think git is more of a power-tool than people sometimes
            | want it to be. It's more like vi than it is like MS Word,
            | but its ubiquity makes people wish it had an MS Word mode.
            | 
            | So, I think that it's hard to fault git's developers for
            | where it is today. It's a faithful implementation of its
            | mission.
           | 
           | FWIW, I have never used a tool with better documentation than
           | git in 2020 (it hasn't always had good --help documentation,
           | but it absolutely does today).
        
         | dexen wrote:
         | It's worth noting the same holds true for UI: data dominates.
         | Design your widgets, layout, and workflow around the data.
        
           | chrisweekly wrote:
            | Amen. My 23 years of experience in webdev say React (the
            | paradigm, not the lib per se) is dominating web UI precisely
            | because it is all about unidirectional data flow.
        
           | rubyn00bie wrote:
           | > It's worth noting the same holds true for UI: data
           | dominates. Design your widgets, layout, and workflow around
           | the data.
           | 
           | I couldn't agree more.
           | 
           | I think the current state of UI programming is like the
           | pathological case to be honest. Too often folks are concerned
           | with representing their database 1-to-1 in their UI instead
           | of representing their view.
           | 
           | If anyone is suffering from brittle UI code, where somehow
           | caching issues and stale data are affecting your application,
           | this is very likely why. You have muddled your persistence
           | and view concerns together and it's not manageable or pretty.
            | What this means for folks using something like React: don't
            | directly use your persistence models in your views; create
            | "view models" which directly represent whatever the hell it
            | is you're trying to display. Bind your data in your view
            | models, and not your views, and then pass the view model in
            | as props.
        
       | commandlinefan wrote:
       | > Tony Hoare's famous maxim "Premature optimization is the root
       | of all evil."
       | 
       | Actually that was Donald Knuth - it's an urban legend that it's
       | an urban legend that it was originally Knuth. Hoare was quoting
       | Knuth, but Knuth forgot he said it, and re-mis-attributed the
       | quote to Hoare.
        
         | RoutinePlayer wrote:
          | This reminds me of that Woody Allen joke about someone
          | translating all of T.S. Eliot's poems into English after some
          | vandals had broken into the school library and translated
          | them into French.
        
         | karmakaze wrote:
         | And it is usually quoted out of its context.
         | 
         | "We should forget about small efficiencies, say about 97% of
         | the time: premature optimization is the root of all evil. Yet
         | we should not pass up our opportunities in that critical 3%."
        
           | eps wrote:
           | It's also often interpreted literally.
           | 
           | Premature _complex_ optimization is a bad idea, but simple
           | (read, cheap to code) optimization for common bottleneck
           | patterns is a perfectly reasonable thing to do.
        
           | hombre_fatal wrote:
           | I don't think that changes the meaning. Once that 3% matters
           | to you and you've invested the work to measure that 3%, it's
           | not premature anymore.
           | 
           | That "premature" and "optimization" are undefined and left up
           | for debate is what makes it trite.
        
         | karmakaze wrote:
         | Ha. So the truth is that Knuth did quote Hoare, not aware that
         | he was quoting Knuth--indirectly Knuth was quoting himself.
        
         | bendbro wrote:
          | I've always been uncomfortable with these kinds of ideas. The
          | odds that the idea will be correctly applied are heavily tied
          | to intelligence, culture, and situation. Instead of reducing
          | the space of options you must consider, all it says is that
          | you should "do it this way when you should, and do it the
          | other way when you shouldn't." I suppose it is useful to
          | highlight that the decision exists, but I would be surprised
          | if anyone working in the space is unaware of the existence of
          | the decision.
         | 
          | The scientific method has a similar problem. A scientist
          | should form their hypothesis before gathering data to
          | evaluate the hypothesis. If a scientist fails to do this, and
          | starts engaging in p-hacking or data dredging, the quality of
          | their research greatly declines. But that a hypothesis was
          | formed before the data was collected is usually not provable
          | from the publication itself. And further, there are ways that
          | data dredging can unintentionally sneak into the scientific
          | process, especially in the observation phase that precedes
          | forming a hypothesis.
         | 
          | This kind of idea has a large technical impact, but doesn't
          | have a solid technical reason. Its proof is closer to
          | aesthetics than to reason. And much like other aesthetic
          | beliefs, a population believes it based on no deeper
          | reasoning. Only exclusion or indoctrination can ensure the
          | population's view, and only illogical rhetoric will change it.
        
       | RcouF1uZ4gsC wrote:
       | One of the big problems with fancy algorithms is that they either
       | access data out of order and/or do pointer chasing. Simple
       | algorithms tend to access the data in order.
       | 
        | CPUs have a lot of logic for making in-order data access very
        | fast.
        
       | [deleted]
        
       | sam_lowry_ wrote:
       | "Bad programmers worry about the code. Good programmers worry
       | about data structures and their relationships."
       | 
       | -- Linus Torvalds
        
       | andrewl wrote:
        | In _The Mythical Man-Month_ Fred Brooks said "Show me your
       | flowchart and conceal your tables, and I shall continue to be
       | mystified. Show me your tables, and I won't usually need your
       | flowchart; it'll be obvious."
       | 
       | I first read that on Guy Steele's site:
       | http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html
        
         | vishnugupta wrote:
         | > Show me your tables, and I won't usually need your flowchart
         | 
          | A couple of years ago I spent quite some time trying to
          | evaluate the tech stack (and general engineering culture) of
          | merger/acquisition targets of my employer. It was quite a fun
          | exercise, all said and done. I encountered all sorts: from a
          | small startup team who had their tech more or less sorted out,
          | to a largish organisation that relied on IBM's ESB, where
          | exactly one person on the team knew how it worked!!
          | 
          | I discovered this exact method during the third tech evaluation
          | exercise. When the team began explaining various modules top-
          | down, user flows, etc., I politely interrupted them and asked
          | for the DB schema. It was just on a whim, because I was bored
          | of the typical one-way session interrupted by me asking minor
          | questions. Once I had the hang of their schema, the rest of the
          | session was literally me telling them what their control and
          | user flows were and them validating it.
          | 
          | Since then it's become my magic wand for understanding a new
          | company or team. Just go directly to the schema and work
          | backwards.
          | 
          | Conversely, I've begun paying more attention to data modelling,
          | because once a data model is fixed it's very hard to change,
          | and once enough data accumulates the inertia just increases;
          | instead of changing the data model (for fear of data migration
          | etc.), the tendency is to beat the use cases to fit the data
          | model. It's not your usual fail-fast-and-iterate thing.
        
         | gumby wrote:
         | That's Dick Gabriel's site; he posted gls's essay there with
         | attribution (so you didn't realize which site it is). He and
         | quux are friends and collaborators.
        
         | screye wrote:
          | As someone in ML, I see myself wanting the opposite.
         | 
         | ML researchers drown their algorithms in huge tables of
         | results, effectively spending time on "how well" rather than
         | the "what".
         | 
          | It often leads to things being added as long as they are
          | better, with the end result being a gargantuan monster of
          | models and hand-engineered changes, all with no one
          | understanding how the whole thing works as a single unit.
         | 
          | Flow charts are incredibly effective as the topmost layer of
          | abstraction. Does the whole process, when viewed in an
          | end-2-end manner, make sense? We dive into the details only
          | if it passes that sniff test of a flow chart.
         | 
         | I might be missing the point being made here, but they can claw
         | flowcharts from my cold dead hands.
        
           | dllthomas wrote:
           | When Brooks says tables, I believe he means the internal data
           | representation, rather than "tables of results".
        
           | everybodyknows wrote:
           | "Flowchart" has historically, in Brook's time, meant "flow-
           | of-control chart", and these usually degenerate into vast
           | webs of minutia -- useless as abstractions.
           | 
           | But perhaps you meant "flow-of-data between structures" -- in
           | which case we have agreement on engineering, but a muddle on
           | semantics.
        
         | rjsw wrote:
         | >I first read that on Guy Steele's site.
         | 
         | It isn't Guy Steele's website. That page was written by him but
         | the website is owned by Richard P Gabriel.
        
         | monocasa wrote:
          | Restated by Linus with a bit more modern nomenclature (and
          | Linus's trademark bluntness):
         | 
         | > Bad programmers worry about the code. Good programmers worry
         | about data structures and their relationships
        
           | mgkimsal wrote:
           | > Bad programmers worry about the code
           | 
           | And yet, I see a whole swath of the industry hyper-focused on
           | various linters/styling/rules.
        
             | rumanator wrote:
             | > And yet, I see a whole swath of the industry hyper-
             | focused on various linters/styling/rules.
             | 
              | It seems to me that what you're actually seeing is an
              | entire industry trying to eliminate all code-related
              | issues, especially bike-shedding ones.
             | 
             | This is patently obvious to anyone who was forced to waste
             | their time in code review iterations discussing, say, where
             | a brace should go and how many spaces someone should have
             | added.
        
               | Quekid5 wrote:
                | This is usually a good time to apply the When In Rome
                | rule. Do not reformat needlessly; follow the code style
                | of the code you're modifying. Done.
               | 
               | (If multiple people are arguing back and forth in code
               | review -- when following the WIR rule -- tell them about
               | the WIR rule and that should settle it. If not, you have
               | bigger problems in your team.)
        
               | Osiris wrote:
                | I completely agree. One reason I like prettier is that
                | it only has about 6 options you can change. It removes
                | the bike-shedding. Just let it do its thing and worry
                | about more important things.
                | 
                | It also removes all debate in PRs about style and
                | formatting.
               | 
               | (note: before prettier, I was fairly particular about how
               | I formatted my code, and I disagreed with prettier in
               | some cases, but now, I love having one less thing to
               | think about)
        
               | s17n wrote:
               | Nobody was ever "forced to waste their time" on this
               | stuff. I have a simple rule - I don't comment on other
               | people's style, and if people comment on my style, I just
               | go with their suggestions. Problem solved.
        
               | ses1984 wrote:
               | You've never been on a team with two people with opposing
               | opinions I guess.
        
               | Quekid5 wrote:
               | Just out of morbid curiosity... have you actually
               | experienced multiple 'seniors' giving conflicting code
               | review comments about code _style_ (of all things)?
               | 
               | That sounds quite dysfunctional.
               | 
               | (EDIT: Sure, nitpicks may differ, but...)
        
               | lallysingh wrote:
                | Ever have more than one reviewer in your CR?
        
               | johnisgood wrote:
               | > I just go with their suggestions.
               | 
               | Why though? I am not going to go with suggestions if they
               | make the code less readable for me!
        
             | burke wrote:
             | Bad programmers inflicting their worry upon the others.
        
             | Koshkin wrote:
             | And that's because it's Bad Programmers who need help!
        
               | mgkimsal wrote:
               | but... they need help on data structures/relations and up
               | front thinking about those issues, not where curly braces
               | should go, or tabs-v-spaces.
        
               | wvenable wrote:
               | That stuff is hard! Better to just shove your data
               | somewhere unstructured and then you don't have to worry
               | about data structures and relations.
        
               | Benjammer wrote:
                | Right, because the industry's hyper-focus on linting is
                | the symptom here; it's not a misguided treatment for the
                | underlying problem of bad programmers.
        
               | rapind wrote:
               | ... and "rules" for programming.
        
         | [deleted]
        
         | mathattack wrote:
         | An early mentor put it as "learn the data, which won't change,
         | before learning the fancy stuff on top, which will"
         | 
         | That carried me very well.
        
           | rumanator wrote:
           | Plenty of professional developers would benefit greatly if
           | they read Domain-Driven Design.
        
       | badrequest wrote:
       | There are actually six rules!
        
       | ChrisMarshallNY wrote:
       | I've always taken a very practical, results-oriented approach to
       | software development.
       | 
       | That makes sense, to me.
       | 
       | One of the first things that we learned, when optimizing our
       | code, was to _use a profiler_.
       | 
        | Bottlenecks could be in very strange places, like breaking L2
        | caches. That would happen when data was just a bit too big, or
        | when a method was called, forcing a stack frame update.
       | 
       | We wouldn't see this kind of thing until we looked at a profiler;
       | sometimes, a rather advanced one, provided by Intel.
        
       | karl11 wrote:
       | Everyone building no code tools is learning or will learn that
       | the problem most businesses have is not a lack of coding skill,
       | or the inability to build the algorithm, but rather how to
       | structure and model data in a sensible way in the first place.
        
         | throwaway894345 wrote:
         | Modeling the data and structuring the program are indeed the
         | harder tasks, but orgs have lots of smart people who have those
         | skills but not the familiarity with various existing syntaxes
         | and standard libraries and so on that a programmer learns over
         | the decades of their career. Further, those same orgs probably
         | have many people with experience in the latter but without any
         | special ability to think abstractly. This significantly limits
         | the ability to create tools. Further, the no code tools often
         | abstract at a more appropriate level than general purpose
         | programming languages' standard libraries because these tools
         | aren't trying to be general purpose (at least not to the same
         | degree as general purpose programming languages). Lastly, I've
         | seen business people use certain no code tools to build
         | internal solutions quickly that would have taken a programmer
         | considerable (but not crazy) time to crank out, especially
         | considering things like CI/CD pipelines, etc. Nocode won't
         | replace Python, but it serves a valuable niche.
        
         | tmaly wrote:
         | If no code tools are anything like ORMs, there will be some
         | interesting surprises when one encounters non-normalized data
         | structures.
        
       | OliverJones wrote:
       | In long-lived systems (systems that run for many years) it's
       | almost impossible to choose the "right data structures" for the
       | ages. The sources and uses of your data will not last nearly as
       | long as the data itself.
       | 
       | What to do about this? Two things:
       | 
       | STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local
       | timezone. If you start out with the wrong timezone you'll never
       | be able to fix it. And generations of programmers will curse your
       | name.
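        | 
        | A minimal sketch of the habit (format with gmtime, i.e. UTC, not
        | localtime, before the value goes anywhere near storage):
        | 
        |     #include <cstdio>
        |     #include <ctime>
        |     
        |     int main() {
        |       // Capture "now" and render it as UTC for storage.
        |       std::time_t now = std::time(nullptr);
        |       char buf[32];
        |       std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ",
        |                     std::gmtime(&now));
        |       std::printf("store this: %s\n", buf);
        |     }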
       | 
       | Keep your data structures simple enough to adapt to the future.
       | Written another way: respect the programmers who have to use your
       | data when you're not around to explain it.
       | 
       | And, a rule that's like the third law of thermodynamics. You can
       | never know when you're designing data how long it will last.
       | Written another way: your kludges will come back to bite you in
       | the xxx.
        
         | aserafini wrote:
         | Sometimes storing in UTC is simply not correct. For example a
         | shop opening time. The shop opens 10am local time, whether DST
         | or not. Their opening time is 10am local time all year but
         | their UTC opening time actually changes depending on the time
         | of year!
        
           | wtetzner wrote:
           | But a shop opening time is not a timestamp, so I think the
           | original advice is still good. A timestamp is the time at
           | which some event happened, which is different than a
           | date/time used for specifying a schedule.
           | 
           | For example, if you wanted to track the history of when the
           | shop actually opened, it would make sense to store a UTC
           | timestamp.
        
           | wvenable wrote:
            | I made that mistake early in my career following this exact
            | advice, and I ended up with a lot of things that were
            | randomly 1 hour off depending on when the record was created
            | and the date entered.
        
           | Benjammer wrote:
           | Totally. "Store everything in UTC" is just another flavor of
           | "pick a timezone to store everything." In a lot of cases, you
           | probably need to go ahead and just store the fully qualified
           | date including timezone/offset for each record.
        
           | karmakaze wrote:
           | The most interesting case of this I encountered was for photo
           | 'timestamps' on a global sharing site. UTC was being used and
           | I was proposing a change to local time. There was great
           | debate as many drank the UTC juice and stopped thinking.
           | 
            | It was settled when I showed them that we also have a 'shot
            | at' location, then proceeded to show Christmas Eve photos
            | with the UTC time converted to the viewer's local timezone
            | (not always evening, not always Dec 24) alongside where the
            | photo was taken. Just as in space-time, a photo needs both a
            | time and a place.
        
       | terandle wrote:
        | Am I wrong to avoid writing O(n^2) code if at all possible, when
        | it is fairly easy to use hash tables for a better time
        | complexity? Sure, when n is small the O(n^2) one will be faster,
        | but when n is small /anything/ you do is fast in absolute terms,
        | so I'm trying not to leave traps in my code just waiting for n
        | to get bigger than initially expected.
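        | 
        | For a concrete sketch of the kind of easy swap I mean
        | (hypothetical duplicate check on a list of ints):
        | 
        |     #include <cassert>
        |     #include <cstddef>
        |     #include <unordered_set>
        |     #include <vector>
        |     
        |     // O(n^2): compare every pair.
        |     bool has_dup_quadratic(const std::vector<int>& xs) {
        |       for (std::size_t i = 0; i < xs.size(); ++i)
        |         for (std::size_t j = i + 1; j < xs.size(); ++j)
        |           if (xs[i] == xs[j]) return true;
        |       return false;
        |     }
        |     
        |     // O(n) expected: remember what has been seen in a hash set.
        |     bool has_dup_hashed(const std::vector<int>& xs) {
        |       std::unordered_set<int> seen;
        |       for (int x : xs)
        |         if (!seen.insert(x).second) return true;
        |       return false;
        |     }
        |     
        |     int main() {
        |       std::vector<int> v{3, 1, 4, 1, 5};
        |       assert(has_dup_quadratic(v) && has_dup_hashed(v));
        |     }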
        
         | bluGill wrote:
         | That depends.
         | 
          | If you are writing the algorithm yourself, then the
          | maintenance cost of everyone after you trying to understand it
          | makes it wrong. However, most programming languages have
          | generic programming features such that you can just use an
          | existing algorithm, so you aren't writing either one yourself.
          | In that case the code for the fast hash is equal to the O(n^2)
          | code, and so of course you select the faster one, as an
          | application of the "don't prematurely pessimise your code"
          | rule. If your programming language doesn't already have built-
          | in generic algorithms for your data, then you are using the
          | wrong language (unless your job is to write the generic
          | algorithms for the language, in which case this doesn't apply,
          | because you can assume your algorithm will be used in a
          | performance-critical part at some point).
        
         | [deleted]
        
         | fabian2k wrote:
          | I would not read these rules as anything like a suggestion to
          | write O(n^2) code. At best I'd read them as favoring O(n)
          | instead of O(1) for small n, but I really think that "fancy"
          | algorithms in this case doesn't mean obvious optimizations
          | like this.
         | 
         | If you can't guarantee n is small, I think it is entirely
         | sensible to use a dictionary/hash table instead of looping over
         | an array or list. As long as n is small the overhead is
         | probably irrelevant, and it'll prevent surprises if n gets
         | large as you said. And if the difference actually matters, you
         | get back to rule 1 and 2 and measure first anyway.
        
         | mav3rick wrote:
          | Using array-based data structures also gets you more cache
          | hits. For smaller data sets, this may well be faster than a
          | fancy algorithm.
        
           | bluGill wrote:
            | This is true only if you are iterating over the entire array
            | often. If you only rarely need to access one data member,
            | the array will not be in cache and you still have to load
            | from memory. Depending on how your data is structured, the
            | array may or may not save time even at small sizes.
        
             | mav3rick wrote:
             | Of course temporal and spatial locality are important.
        
         | coliveira wrote:
         | It depends. If you don't need to worry about performance, then
         | it will work. But in some performance-dependent situations, the
         | brute force algorithm will win over the hash table, and the
         | only way to know is measuring which one is better.
        
         | fasterpython wrote:
         | If it's fairly easy, then I think it still fits the spirit of
         | the rules. KISS and all.
         | 
         | On the other hand, the idea that one might be setting traps is
         | slightly weird... if you _know_ 90% that n will be large then
         | pick an algorithm that's efficient (and since it's easy to
         | implement, it's a win-win). If n is always going to be small,
         | then does the choice really matter?
        
           | naet wrote:
           | If N is known small you should favor simplicity and
           | readability over complex optimization.
        
         | dimitrios1 wrote:
         | Yes. If you don't measure, you will never know when the
         | constant factors and other details outweigh the asymptotic
         | complexity. For example, real quicksort implementations switch
         | to an O(n^2) algorithm (insertion sort) once a partition gets
         | small (typically a few dozen elements or fewer), because this
         | reduces the total sort time.
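         |
         | Roughly this pattern, as a sketch (the cutoff value is
         | illustrative; real libraries pick it by measuring):
         |
         |   #include <utility>
         |   #include <vector>
         |
         |   const int kCutoff = 16;  // illustrative, not a measured value
         |
         |   // Insertion sort on the half-open range [lo, hi): O(n^2),
         |   // but with a tiny constant factor for small ranges.
         |   void insertion_sort(std::vector<int>& v, int lo, int hi) {
         |       for (int i = lo + 1; i < hi; ++i)
         |           for (int j = i; j > lo && v[j] < v[j - 1]; --j)
         |               std::swap(v[j], v[j - 1]);
         |   }
         |
         |   // Quicksort that hands small partitions to insertion sort.
         |   void hybrid_sort(std::vector<int>& v, int lo, int hi) {
         |       if (hi - lo <= kCutoff) {
         |           insertion_sort(v, lo, hi);
         |           return;
         |       }
         |       int pivot = v[lo + (hi - lo) / 2];
         |       int i = lo, j = hi - 1;
         |       while (i <= j) {
         |           while (v[i] < pivot) ++i;
         |           while (v[j] > pivot) --j;
         |           if (i <= j) std::swap(v[i++], v[j--]);
         |       }
         |       hybrid_sort(v, lo, j + 1);  // left part
         |       hybrid_sort(v, i, hi);      // right part
         |   }
         |
         |   // usage: hybrid_sort(v, 0, (int)v.size());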
        
         | dragontamer wrote:
         | > Am I wrong to avoid writing O(n^2) code if at all possible
         | when it is fairly easy to use hash tables for a better time
         | complexity
         | 
         | Are you sure that std::unordered_map is faster than
         | std::vector? Did you measure?
         | 
         | Every time you access an element of a std::vector, you also
         | pull in the nearby ones (thanks to the L1 cache, as well as CPU
         | prefetching of linear accesses).
         |
         | In contrast, your std::unordered_map or hash table gets almost
         | no benefit from the L1 cache. (It should be noted that linear
         | probing, despite being the hash-table variant whose worst case
         | degrades toward O(n^2), is actually one of the better
         | performers in practice due to L1 cache + prefetching.)
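         |
         | A minimal sketch of the kind of measurement I mean (not a
         | rigorous benchmark; the numbers vary by machine):
         |
         |   #include <chrono>
         |   #include <cstdio>
         |   #include <unordered_map>
         |   #include <utility>
         |   #include <vector>
         |
         |   int main() {
         |       const int n = 64;          // the "small n" case
         |       const int reps = 1000000;
         |       std::vector<std::pair<int, long>> vec;
         |       std::unordered_map<int, long> map;
         |       for (int i = 0; i < n; ++i) {
         |           vec.push_back({i, i});
         |           map[i] = i;
         |       }
         |       long sum = 0;
         |
         |       auto t0 = std::chrono::steady_clock::now();
         |       for (int r = 0; r < reps; ++r)       // linear scan
         |           for (const auto& kv : vec)
         |               if (kv.first == r % n) sum += kv.second;
         |       auto t1 = std::chrono::steady_clock::now();
         |       for (int r = 0; r < reps; ++r)       // hash lookup
         |           sum += map.at(r % n);
         |       auto t2 = std::chrono::steady_clock::now();
         |
         |       std::printf("scan %.3fs  hash %.3fs  (sum=%ld)\n",
         |           std::chrono::duration<double>(t1 - t0).count(),
         |           std::chrono::duration<double>(t2 - t1).count(),
         |           sum);
         |   }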
        
           | zabzonk wrote:
           | Also, creating a hash isn't free. And often ordering is
           | required.
        
             | dragontamer wrote:
             | > Also, creating a hash isn't free.
             | 
             | Hmmm... I argue that the hash is nearly free actually.
             | 
             | An unordered_map traversal is probably DDR4 latency bound.
             | That's ~50-nanoseconds (200 clock ticks) per access. What's
             | the CPU going to do in that time?
             | 
             | Well, spending 10 to 20 clock ticks on a typical hash
             | algorithm is fine. Then it will wait the other 180 clock
             | ticks for RAM. If you've got hyperthreading, maybe the CPU
             | will go to another thread and do meaningful work while
             | waiting for RAM... but I think you get the gist.
             |
             | Even if the hash were free, the CPU would be waiting for
             | RAM anyway. So you've got plenty of time to make that hash
             | worthwhile. Even an integer division/modulo (worst case ~80
             | clock ticks) can fit in there while waiting for RAM, with
             | plenty of room to spare.
             | 
             | I guess if everything was in L1 cache, the story is a bit
             | different. A lot of "depends", depends on the data, the
             | access frequency, etc. etc.
        
           | krzat wrote:
           | Worrying about performance of small collections is premature
           | optimization.
           | 
            | Using maps or sets nowadays is mostly for clarity, as they
            | are used to solve certain kinds of problems.
        
             | dragontamer wrote:
             | I agree with you. But what you're talking about is
             | completely different from what I was responding to
             | originally.
             | 
             | If you need a set, use a set. But don't assume that it's
             | faster than a std::vector.
             |
             | Even then, std::vector supports set-like operations through
             | std::binary_search or std::make_heap in C++, so it really
             | isn't that hard to use a sorted (or make_heap'd)
             | std::vector in practice.
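             |
             | E.g., a rough sketch of a sorted std::vector used as a set
             | (illustrative, not tuned):
             |
             |   #include <algorithm>
             |   #include <vector>
             |
             |   // Sort once, then membership tests are O(log n) over
             |   // contiguous, cache-friendly memory.
             |   bool contains(const std::vector<int>& v, int x) {
             |       return std::binary_search(v.begin(), v.end(), x);
             |   }
             |
             |   // usage:
             |   //   std::vector<int> v = {5, 1, 3};
             |   //   std::sort(v.begin(), v.end());
             |   //   contains(v, 3);  // true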
             | 
             | --------
             | 
             | Even if you don't plan on doing optimization work, it's
             | important to have a proper understanding of a modern CPU.
             | The effects of L1 cache and prefetching are non-trivial,
             | and they make simple arrays and std::vectors extremely fast
             | data structures, far faster than they were on 80s or 90s
             | computers anyway. A lot of optimization advice from the
             | past has become outdated because of the evolution of CPUs.
             |
             | So it's important to bring up these changes in discussion
             | from time to time, to remind others to restudy computers.
             | Things change.
        
         | mywittyname wrote:
         | Keep in mind that a lot of this was written during an era where
         | even modestly complex data structures/algorithms had to be
         | rolled by hand.
         | 
         | The philosophy is really to not waste time implementing
         | optimizations that may not be necessary. Naturally you should
         | reach for the best tool you have in your tool box. So if you
         | language of choice has a hashmap that can be used with no
         | additional work, go for it. But don't wait two days rolling
         | your own red-black tree because it might be better.
        
         | yashap wrote:
         | I think, in most cases, if you have n items in memory and you
         | want to find m of them by id, or some such thing, your default
         | should be to represent the n items as a hashmap, not a list,
         | and look them up in the hashmap. There will be cases where this
         | isn't the right choice, but it's a good default. And it's
         | usually virtually no extra complexity to represent them this
         | way, i.e. often something as simple as: myMap =
         | myList.groupBy(_.id)
         |
         | Don't write complex optimizations until you know you need them,
         | but where it's simple to do so, defaulting to code with good
         | time complexity is a sensible habit.
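         |
         | (The snippet above is Scala-ish pseudocode; a rough C++
         | analogue, with a made-up Item type and assuming ids are unique,
         | might look like this.)
         |
         |   #include <string>
         |   #include <unordered_map>
         |   #include <vector>
         |
         |   struct Item { std::string id; /* ... other fields ... */ };
         |
         |   // Build the id -> item index once; lookups are then O(1)
         |   // on average instead of a linear scan per lookup.
         |   std::unordered_map<std::string, Item>
         |   by_id(const std::vector<Item>& items) {
         |       std::unordered_map<std::string, Item> m;
         |       for (const auto& item : items) m.emplace(item.id, item);
         |       return m;
         |   }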
        
         | BurningFrog wrote:
         | Yeah, I do well writing dumb, inefficient code by default and
         | optimizing it when needed, which is almost never.
         | 
         | If I know beforehand we'll handle a lot of data, I can pick
         | something fast and complex to begin with, but that effort is
         | probably mostly a waste.
        
         | dtech wrote:
         | Use whatever makes for clearer code, unless you are certain
         | it's the bottleneck.
        
       | dang wrote:
       | If curious see also
       | 
       | 2017 https://news.ycombinator.com/item?id=15265356,
       | 
       | https://news.ycombinator.com/item?id=15776124
       | 
       | 2014 https://news.ycombinator.com/item?id=7994102
       | 
       | Pete_D gets credit for the date:
       | https://news.ycombinator.com/item?id=15266498. These rules come
       | from "Notes on Programming in C"
       | (http://www.lysator.liu.se/c/pikestyle.html), which has its own
       | sequence of threads:
       | 
       | 2017 https://news.ycombinator.com/item?id=15399028,
       | 
       | https://news.ycombinator.com/item?id=13852734
       | 
       | 2014 https://news.ycombinator.com/item?id=7728084
       | 
       | 2011 https://news.ycombinator.com/item?id=3333044
       | 
       | 2010 https://news.ycombinator.com/item?id=1887442
        
       | bob1029 wrote:
       | Modeling the problem domain is so important that I don't know
       | why the first year of every compsci undergrad program isn't
       | entirely dedicated to teaching the idea.
       | 
       | Instead, day 1 is installing python or java, running hello world
       | and talking about pointers, binary, encoding, logic gates, etc.
       | 
       | We should be teaching students on day 1 that code is a liability
       | and to be avoided whenever it is convenient to do so.
        
       | Ozzie_osman wrote:
       | It turns out rule 5 (Data dominates. If you've chosen the right
       | data structures and organized things well, the algorithms will
       | almost always be self-evident) is both true and hard.
       | 
       | Eric Evans' Domain Driven Design is a good book on the topic.
        
         | Supermancho wrote:
         | > If you've chosen the right data structures and organized
         | things well, the algorithms will almost always be self-evident)
         | is both true but also hard.
         | 
         | The problem is not the self-evident algorithm, but the delicate
         | implementation (or god forbid, at scale).
         | 
         | Take in 1000 web requests per second. The data is all strictly
         | validated and has about 60 fields per record/request, plus you
         | have to deal with errors.
         |
         | How does that go from webserver to ( _rolls dice_ ) Kafka to a
         | ( _rolls dice_ ) Cassandra cluster that can be queried
         | accurately and in a timely way? How much does that cost?
         | 
         | Oh, that's not a programmer problem. Except it is. Creating a
         | fantasy niche of describing problems as data vs algorithm is
         | the canonical ivory tower disconnect.
        
           | veets wrote:
           | It seems you are arguing something different, although I am
           | having a hard time understanding what you have written. I
           | think you are saying algorithms and data structures aren't
           | hard, distributed systems are hard. In my experience choosing
           | the correct data structures and algorithms in your
           | services/programs/whatever can dramatically simplify the
           | design of your systems overall.
        
         | paloaltokid wrote:
         | Not only is it hard, it's the one thing that if you get it
         | right, your technical foundation will be rock solid. But it's
         | the thing most teams and organizations neglect to spend enough
         | time on. I often wonder why this is the case -- my first
         | mentors taught me that logical data modeling was a really
         | important skill. But I never talk about third normal form or
         | any such things with my peers.
        
         | yashap wrote:
         | I'm such a huge fan of DDD. IMO, if you only ever read one book
         | on Software Architecture, that's the one to pick.
         | 
         | A key point, though, is that you learn the right domain
         | models/abstractions over time. Refactoring is critical as you
         | gain more insight into the domain. If you're constantly
         | questioning your modelling of the domain, and refactoring
         | towards a better one, you'll end up with a great model and thus
         | a clean, understandable, easy to extend/modify system. If you
         | stick with whatever abstractions you chose at the start, when
         | you knew way less about the domain/business problems, you'll
         | likely end up with poor abstractions, and a code base that's
         | slow, tedious and error-prone to modify.
         | 
         | Convincing the business that it's worth setting aside time to
         | constantly refactor towards better domain models is often the
         | hardest part, but crucial.
        
       | lliamander wrote:
       | Rule 5 seems to mirror one of my favorite insights from Alexander
       | Stepanov:
       | 
       | > In 1976, still back in the USSR, I got a very serious case of
       | food poisoning from eating raw fish. While in the hospital, in
       | the state of delirium, I suddenly realized that the ability to
       | add numbers in parallel depends on the fact that addition is
       | associative. (So, putting it simply, STL is the result of a
       | bacterial infection.) In other words, I realized that a parallel
       | reduction algorithm is associated with a semigroup structure
       | type. That is the fundamental point: algorithms are defined on
       | algebraic structures.
       | 
       | This is also exemplified in the analytics infrastructure used at
       | stripe: https://www.infoq.com/presentations/abstract-algebra-
       | analyti...
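       |
       | A concrete way to see the point, as a C++17 sketch: the parallel
       | overload of std::reduce is allowed to regroup and reorder the
       | additions, which is only valid because the operation is
       | associative (and commutative). std::accumulate, by contrast, is
       | defined as a strict left fold. (Depending on the toolchain, the
       | parallel policy may need an extra runtime such as TBB to link.)
       |
       |   #include <execution>
       |   #include <numeric>
       |   #include <vector>
       |
       |   int main() {
       |       std::vector<long> xs(1000000, 1);
       |       long a = std::accumulate(xs.begin(), xs.end(), 0L);
       |       long b = std::reduce(std::execution::par,
       |                            xs.begin(), xs.end(), 0L);
       |       // Equal here because integer + is associative; with
       |       // floats the two can legitimately differ.
       |       return a == b ? 0 : 1;
       |   }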
        
         | skybrian wrote:
         | But adding floating point numbers _isn't_ associative, in
         | general. Sometimes you need to do it the right way to avoid
         | catastrophic cancellation.
         | 
         | I guess the key is to know how to deal with things that are
         | only mostly true.
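         |
         | A tiny example of the regrouping problem with IEEE doubles:
         |
         |   #include <cstdio>
         |
         |   int main() {
         |       double big = 1e16, one = 1.0;
         |       // The two small additions are each absorbed when added
         |       // to big one at a time, but survive when added to each
         |       // other first: regrouping changes the result.
         |       std::printf("%.1f\n%.1f\n",
         |                   (big + one) + one,    // 10000000000000000.0
         |                   big + (one + one));   // 10000000000000002.0
         |       return 0;
         |   }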
        
           | Koshkin wrote:
           | That's why in C++ we have traits and overloading.
        
             | rumanator wrote:
             | Could you explain where you see traits and overloading
             | helping with floating point operations?
        
           | quietbritishjim wrote:
           | > But adding floating point numbers isn't associative, in
           | general. Sometimes you need to do it the right way to avoid
           | catastrophic cancellation.
           | 
           | That exactly proves his point. Systems that are associative
           | can be processed by the parallel algorithm he was thinking
           | of. Floating point numbers, if you care about their non-
           | associativity, cannot be processed by that algorithm. So the
           | validity of that algorithm depends on whether the system is
           | associative.
        
           | lliamander wrote:
           | That's true about floating point numbers. I assume that
           | depending upon the context, it may not be a big issue (e.g.
           | GPU compute)?
           | 
           | In any case, the point Stepanov was making is that if you
           | want to be able to use a certain algorithm, then you have to
           | make a choice to represent your data in a way that enables
           | that algorithm, and the way you know whether the structure is
           | appropriate for that algorithm is the algebraic properties of
           | the structure.
        
         | kens wrote:
         | In case you've wondered what a monoid is, that's a monoid.
         | Something with an associative operation (and an identity), so
         | you can do the operation on chunks in parallel, like addition.
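         |
         | A minimal illustration (string concatenation is a handy non-
         | numeric monoid: "" is the identity and (a + b) + c equals
         | a + (b + c), so chunks can be folded independently and the
         | partial results combined afterwards):
         |
         |   #include <numeric>
         |   #include <string>
         |   #include <vector>
         |
         |   // Fold one chunk; each worker could do this on its own
         |   // chunk, and the partial strings concatenate at the end.
         |   std::string concat(const std::vector<std::string>& chunk) {
         |       return std::accumulate(chunk.begin(), chunk.end(),
         |                              std::string{});
         |   }
         |
         |   // concat(chunk1) + concat(chunk2) == concat(whole), as
         |   // long as chunk1 precedes chunk2 and nothing is skipped.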
        
           | lliamander wrote:
           | Yep. And if what you have is an Abelian Group, then you also
           | get _distributed_ computation as well (thanks to
           | commutativity).
        
             | gnulinux wrote:
             | You can distribute the computation over just a monoid as
             | well, but it needs more bookkeeping. In particular, your
             | reduce function needs to know that
             |
             | * lhs comes before rhs
             |
             | * there is no data between lhs and rhs
        
               | dllthomas wrote:
               | One way of looking at it is that equipping our data with
               | that bookkeeping gives us something that commutes.
        
               | gnulinux wrote:
               | Hmm sure, but it is not a requirement that your
               | underlying algebraic structure should commute, so I think
               | the original phrasing was misleading. The bookkeeping
               | allows you to commute _a specific_ list of objects, even
               | though the underlying operation is non-commutative (i.e.
               | there exist a, b with a.b != b.a).
               |
               | At the moment of computation, you can build a new
               | structure that commutes by enumerating the data. I guess
               | it's true that you need a commuting intermediate data
               | structure to be able to distribute.
        
             | dllthomas wrote:
             | While true, that's too strict. An Abelian group (like any
             | group) needs inverses. You get distributed computation if
             | you've got an Abelian semigroup.
        
               | Koshkin wrote:
               | To be fair, Abel did not know (or care) about semigroups.
        
               | lliamander wrote:
               | Thanks for the correction. I think that in Avi Bryant's
               | talk (that I linked to above) Stripe ended up using
               | Abelian groups rather than Abelian semigroups, though I
               | forget the reason why.
        
               | dllthomas wrote:
               | Inverses don't show up as much as I'd (aesthetically?)
               | like in computing. There was an interesting application
               | here: https://www.reddit.com/r/haskell/comments/9x684a/ed
               | ward_kmet...
        
           | dllthomas wrote:
           | Every monoid is a semigroup, but a semigroup is only a monoid
           | if there is also a value that serves as an identity.
        
         | algebra-history wrote:
         | Recently I searched the Web, trying to find out the origin of
         | monoids as an approach to distributed computing, and couldn't
         | find it. This quote is a great find for me! Is this the origin?
        
         | sukilot wrote:
         | And that was 10 years before Haskell went big on that idea.
        
           | lliamander wrote:
           | I'm not super familiar with either the C++ or Haskell
           | communities, but Stepanov's notion of Generic Programming[1]
           | certainly seems to fit with the Haskell ethos.
           | 
           | [1]http://www.generic-programming.org/
        
       | nurettin wrote:
       | These rules look silly when you already know that your tight
       | loop, which waits on IO or redundantly recomputes things, needs
       | caching. No, you don't need to measure that; you know your
       | tight-loop function is going to be the bottleneck. Everyone
       | knows that.
       |
       | The rules do make sense when the alternative is introducing an
       | entire constraint library instead of looping over 3-4 variables
       | with a small search space. But again, you know it is a small
       | search space. You know you don't have to optimize it.
       |
       | I really don't get these rules.
       | 
       | Edit: Go ahead and roast me, but keep in mind I've probably been
       | there and back.
        
       ___________________________________________________________________
       (page generated 2020-08-12 23:00 UTC)