[HN Gopher] Rob Pike's Rules of Programming (1989)
___________________________________________________________________
 
Rob Pike's Rules of Programming (1989)
 
Author : gjvc
Score  : 315 points
Date   : 2020-08-12 18:30 UTC (4 hours ago)
 
(HTM) web link (users.ece.utexas.edu)
(TXT) w3m dump (users.ece.utexas.edu)
 
| cactus2093 wrote:
| Interesting that the first 3 are all about performance, which strikes me as a bit ironic given rule #1, which could be summarized as: don't worry about performance until you have to.
| game_the0ry wrote:
| > Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
|
| :O
|
| Epiphany
| gentleman11 wrote:
| > Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident
|
| What does this have to say about the careers and roles of data scientists vs programmers? A data scientist's entire job is to categorize and model data in a useful way. In the future, will they be fundamentally more important than coders, or will the two roles just merge?
| ttamslam wrote:
| I think you're conflating two things: in my mind, working on the shape of data is different from pulling inferences out of that data.
| mywittyname wrote:
| > "write stupid code that uses smart objects".
|
| Writing stupid code is actually really difficult.
|
| For me, it takes a little bit of iterating before I know just the right place to insert stupid.
| lmkg wrote:
| I'm reminded of the old adage "I'm writing you a long letter, because I don't have time to write a short one."
|
| (Often attributed to Mark Twain, but similar sentiments were expressed by many before him.)
| RangerScience wrote:
| I have two projects that I consider to have nearly-perfect code. Both are on their third iteration, and I think they're stable at this point.
|
| It kinda goes:
| 1) Make a bad solution while exploring the problem
| 2) Explore a good idea for how to solve the now-understood problem
| 3) Mature the good idea through usage.
| dnautics wrote:
| Do you use TDD? I'm not religious about it in general, but when I'm lost, confused, and easily distracted, I start with TDD to write the dumbest possible code.
| sukilot wrote:
| TDD is good for features (like web apps) but not so much for algorithms.
|
| The difference is that you only need to support a tiny fraction of possible features / use cases, but your algorithms need to be correct for a wide range of inputs.
| dnautics wrote:
| I disagree. You code your algo for one input, come up with a corner case, write a failing test, refactor, repeat.
|
| For an algo, let's say it operates on a list, I'll start with the test f([]) == 0, and implement f to output the constant 0.
|
| And then go from there.
| mywittyname wrote:
| No.
|
| It's really an issue of not being sure at first what needs to be flexible & data-driven vs handled in code. If you make everything data-driven, then it becomes this horrible mess where your input is basically a program and your actual code ends up being a terrible interpreter.
|
| I tend to just build things bottom-up, and start with a small bit of functionality; then, when I have enough small bits, I bolt them together and decide what I need to abstract at that point, do refactoring on the smaller bits, and provide data to them from the caller.
| Then repeat that continuously until I have all of the functionality I need.
|
| It might be different for other people, but I need to have working code before I can abstract it.
| gridlockd wrote:
| All this advice against "premature optimization" has created generations of programmers who don't understand how to use hardware efficiently.
|
| Here's the problem: if you profile software that is 100x slower than it needs to be on every level, _there are no obvious bottlenecks_. Your whole program is just slow across the board, because you used tons of allocations, abstractions and indirections at every step of the way.
|
| Rob Pike probably has never written a program where performance _really_ mattered, because if he had, he would've found that you need to think about performance right from the beginning and all the way through development, because making bad decisions early on can force you to rewrite almost everything.
|
| For instance, if you start writing a Go program with the mindset that you can just heap-allocate all the time, the garbage collector will eventually come back to bite you and the "bottleneck" will be your entire codebase.
| byronr wrote:
| > Rob Pike probably has never written a program where performance really mattered
|
| Rob Pike has written window system software which ran in what would now be called a "thin client" over a 9600 baud modem and rendered graphics using a 2MHz CPU. He probably knows a thing or two about performance tuning.
| Areading314 wrote:
| These resonate, especially #1, but I'm not so sure about #5. Although it makes sense to choose good data structures, I don't think that guarantees a simpler algorithm. For example, you can store your data in a heap (tree) and still need to write a tree traversal algorithm to print out the elements in order.
| jorangreef wrote:
| Rules 1 and 2 depend on context, whether you're working on an existing program or a new program. They can be true or false. They can really help or they can really hurt. Are you going into an existing system to do performance optimization? Sure, don't guess, measure. Are you designing a new system? Throw out those parroted premature-optimization mantras... you are responsible for designing for performance upfront. You will always measure, but depending on context you will design for speed first and then test your prototype with measurements. There's no way around an initial hypothesis when you're designing new systems. You have to start somewhere. That's where Jeff Dean's rule of always doing back-of-the-envelope estimates will pay off in orders of magnitude, many times over.
|
| Rules 3 and 4 are gold and always true.
|
| Rule 5 is the key to good design.
| dogbox wrote:
| > Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
|
| Is "data structures" the correct term here? Assuming I'm not misinterpreting, the usage of "data structures" can be misleading - one usually thinks of things like BSTs and hash tables, which are inherently tied to algorithms. I feel like "data modeling" better captures the intended meaning here.
| [deleted]
| sethammons wrote:
| A quote from one of our founders that I've always liked:
|
| If you make an optimization that was not at a bottleneck, you did not make an optimization.
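A back-of-the-envelope illustration of that founder's rule (the numbers are made up for illustration, not from the thread): by Amdahl's law, a 10x speedup of a part that is only 5% of the runtime makes the whole program about 4.7% faster, while the same 10x win on an 80% bottleneck gives roughly 3.6x overall. A minimal sketch in Go:

    package main

    import "fmt"

    // speedup applies Amdahl's law: the overall speedup when a part
    // taking fraction p of total runtime is made s times faster.
    func speedup(p, s float64) float64 {
        return 1 / ((1 - p) + p/s)
    }

    func main() {
        // A 10x win on 5% of the runtime: barely an optimization.
        fmt.Printf("%.3f\n", speedup(0.05, 10)) // 1.047
        // The same 10x win on an 80% bottleneck: a real optimization.
        fmt.Printf("%.3f\n", speedup(0.80, 10)) // 3.571
    }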
| renewiltord wrote:
| Read _The Goal_ by Eliyahu Goldratt. While it's possible your founder came upon the idea independently, this is one of many ideas that are repeated in that book. It's relatively short and entertaining to read, and has definitely survived the 36 years since its first publication quite well.
| mplewis wrote:
| This was adapted into a novel about IT/devops called The Phoenix Project. It's an excellent read.
| NewEntryHN wrote:
| You made an optimization for the future, for when enough bottlenecks have been fixed that this one part becomes the bottleneck.
| chubot wrote:
| Except that there are infinite such non-bottlenecks, and all the effort you spend there is effort not spent on the real bottlenecks.
|
| In other words, all engineering is time- and cost-constrained. Anybody can build a good chair for $10,000 or a good PC for $100,000. Doesn't mean it's good engineering.
| jorangreef wrote:
| "Anybody can build a good chair for $10,000 or a good PC for $100,000."
|
| And some people can build a great PC for $1,000 that runs circles around the good PC for $100,000.
|
| There's so much more to engineering than thinking in terms of time and cost constraints. Those are real constraints, but they're not the most important.
|
| Engineering is design. If you have good design, good insight, you can do things that people with infinite time and budget could never dream of achieving. You can start making a product that's a hundred times more powerful for a tenth of the price in a fraction of the time. If you don't have good design, good insight, then no amount of time or budget can help you.
| nix23 wrote:
| >not spent on the real bottlenecks
|
| BE LOGICAL! Of course you first fix the big bottlenecks.
|
| >good PC for $100,000. Doesn't mean it's good engineering.
|
| Of course it is... or can you gold-plate a PC case?
| Koshkin wrote:
| Yes they can indeed: https://blogs.systweak.com/someone-has-built-a-gigantic-1000...
| tjalfi wrote:
| There are several assumptions that are far from a given with premature optimization.
|
| 1. Adding the optimization didn't make the code more complicated.
|
| 2. Adding the optimization didn't introduce a bug.
|
| 3. This part of the code will be a bottleneck in the future. The time spent optimizing is a write-off if the project is canceled or that portion is replaced.
| hirundo wrote:
| That's less true when you're paying the cloud for compute by the second.
| nix23 wrote:
| That's a bit broad, I think. Talking about the pure speed of your task, you are right; talking about energy consumption, it's not always the case. And small but often-repeated tasks should be optimized no matter if they are bottlenecks; when the system grows they will become bottlenecks. "Optimize like a Vulcan" is what my boss once said... be logical and nothing else (my interpretation).
| jorangreef wrote:
| "optimize like a Vulcan"... classic!
| nix23 wrote:
| Best boss ever!! He even had a bottle of his best whisky refilled into a bottle labeled "Saurian brandy", so everyone who said that this is illegal, or "ohh, that's Star Trek", got one... well, not a bottle, but a glass :)
| atombender wrote:
| Not all optimization candidates are about bottlenecks. Reducing allocation is also optimization, for example.
| erik_seaberg wrote:
| Peak memory or garbage collection throughput can become a bottleneck. But if you know you have more memory than you need, further reducing allocation is arguably a waste of your time.
|
| This can become a tragedy of the commons in desktop and mobile apps, where you don't know how much memory the end user has or needs, but you do know _you_ aren't paying for it.
| svec wrote:
| Pike himself says "there's a 6th rule": https://twitter.com/rob_pike/status/998681790037442561?lang=...
|
| (6. There is no Rule 6.)
|
| And points to the best source he could find for it on the web: http://doc.cat-v.org/bell_labs/pikestyle
| tinco wrote:
| dang, if this sticks to the frontpage, can you change the title to "Rob Pike's 5 rules of programming _in C_"? As the title is now, it misrepresents Rob Pike's words.
| ed_elliott_asc wrote:
| When people say worry about the data structures, what do they mean?
| jonfw wrote:
| "Write stupid code that uses smart objects"
|
| That's a good one. It's amazing how much complexity can be created by using the wrong abstractions.
| chubot wrote:
| FWIW I find this is especially important for compilers and interpreters.
|
| It's not an exaggeration to say that such programs are basically big data structures, full of compromises to accommodate the algorithms you need to run on them.
|
| For example, LLVM IR is just a big data structure. Lattner has been saying for a while that a major design mistake in Clang is not to have its own IR (in the talks on the new MLIR project).
|
| SSA is a data structure with some invariants that make a bunch of algorithms easier to write (and I think it improves their computational complexity over naive algorithms in several cases).
|
| ----
|
| In Oil I used a DSL to describe an elaborate data structure that describes all of shell:
|
| _What is Zephyr ASDL?_ http://www.oilshell.org/blog/2016/12/11.html
|
| https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...
|
| I added some nice properties that algebraic data types in some languages don't have, e.g. variants are "first class", unlike in Rust.
|
| Related: I noticed recently that Rust IDE support has a related DSL for its data structure representation: https://internals.rust-lang.org/t/announcement-simple-produc...
| mamcx wrote:
| > FWIW I find this is especially important for compilers and interpreters.
|
| Totally. I'm building a relational language, and it's becoming very obvious why RDBMSs don't fit certain purity ideals of the relational model (like all relations being sets, not bags).
|
| I'm stuck on deciding which structures to provide by default. I'm dancing between flat vectors or ndarrays, or a split between flat vectors (columns) and HashMaps/BTrees with n values (this is my intuition now).
|
| ---
|
| > I added some nice properties that algebraic data types in some languages don't have, e.g. variants are "first class", unlike in Rust.
|
| This sounds cool, where can I learn about this?
| chubot wrote:
| FWIW I found this post thought-provoking in thinking about data models of languages.
|
| https://news.ycombinator.com/item?id=13293290
|
| ---
|
| About first-class variants:
|
| https://lobste.rs/s/77nu3d/oil_s_parser_is_160x_200x_faster_...
|
| https://github.com/rust-lang/rfcs/pull/2593
|
| Another way I think of this is "types vs. tags": https://oilshell.zulipchat.com/#narrow/stream/208950-zephyr-... (Zulip, requires login)
|
| Basically, variant types can stand alone and have a unique tag. Tags are discriminated at RUNTIME with "pattern matching".
|
| But a variant can belong to multiple sum types, and that's checked statically.
| This is modeled with multiple inheritance in OOP, but there's no implementation inheritance. Related: https://pling.jondgoodwin.com/post/when-sum-types-inherit/
|
| So basically in the ASDL and C++ and Python type systems I can model:
|
| - a Token type is a leaf in an arithmetic expression
|
| - a Token type is a leaf in a word expression
|
| But it's not a leaf in, say, what goes in a[i], or dozens of other sum types. Shell is a big composition of sublanguages, so this is very useful and natural. Another construct that appears in multiple places is ${x}.
|
| So having these invariants modeled by the type system is very useful, and actually C++ and MyPy are surprisingly more expressive than Rust! (due to multiple inheritance)
|
| Search for %Token here, the syntax I made up for including a first-class variant into a sum type:
|
| https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...
|
| There is a name for the type, and a name for the tag (and multiple names for the same integer tag). Tags (dynamic) and types (static) are decoupled.
| chrisweekly wrote:
| Yes! We should replace DRY (don't repeat yourself) with AHA (avoid hasty abstractions) as the dominant rule of thumb.
| jaggederest wrote:
| I frequently find, especially when refactoring, that you encounter a lot of big ol' god objects that are incomprehensible. But when you break them down into 5-10 small objects, suddenly the operation they were trying to do makes perfect sense.
| aciswhat wrote:
| mmmm very true with Redux --> context+hook state
| munificent wrote:
| _> Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is._
|
| I agree with the general thrust of this. But it's worth pointing out that often the easiest way to prove where a bottleneck is (or at least where it isn't) is to try an optimization and see if it helps. I like profiling tools immensely, and this kind of trial-and-error optimization doesn't scale well to widespread performance problems. But there's something to be said for doing a couple of quick optimizations as tracer bullets to see if you get lucky and find the problem before bringing in the big guns.
|
| The last three rules bug me. I wish we had a name for aphorisms that perfectly encapsulate an idea _once you already have the wisdom to understand it_, but that don't actually _teach_ anything. They may help you remember a concept--a sort of aphoristic mnemonic--but don't _illuminate_ it. The problem with these is that espousing them is more a way of bragging ("look how smart I am for understanding this!") than really helping others.
|
| For example:
|
| _> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming._
|
| OK, well what are the "right" data structures? The answer is "the ones that let you perform the operations you need to perform easily or efficiently". So you still need to know what the code is doing too. And the algorithms are only "self-evident" because you chose data structures expressly to give you the luxury of using simple algorithms.
| dkarl wrote:
| > Rule 1. You can't tell where a program is going to spend its time.
| > Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
|
| I wish people would follow this rule and just let stuff work. I recently encountered the most extreme version of this I've ever seen in my career: a design review where a guy proposed a Redis caching layer _and_ a complex custom lookup scheme for a <1GB, moderate-read-volume, super-low-write-volume MySQL database. And of course he wanted to put the bulk of the data in JSON fields and manage any schema evolution in our application code.
|
| Can't we just let stuff work? I'm no fan of MySQL, but can't we admit that a ubiquitous and battle-tested piece of technology, applied to a canonical use case, on tiny data under near-ideal circumstances, is probably going to work just fine? At least give it a chance before you spend days designing and documenting a bunch of fancy tricks to save MySQL from being crushed under a few megabytes of data.
| maxk42 wrote:
| This is particularly exasperating for me. I can't tell you how many times in my professional career I've ended up speeding up systems by removing two or three layers of improperly-implemented "caching" and using good ol' MySQL and a basic understanding of algorithmic time complexity to simplify things.
| flukus wrote:
| Me too. I've seen a few systems where a simple request requiring 10,000 queries was "optimized" into requiring 10,000 cache lookups, when they should have just added some joins. The bottleneck is the network latency, not the database. The worst I've seen was an nHibernate cache stored in a session variable: half the database was being serialized/deserialized on every HTTP request. Fortunately that was a small database.
|
| Even with in-memory caches I've seen systems grind to a halt by death of a thousand cuts: dictionary-based entity-attribute systems where each attribute is looked up individually. There seems to be a mentality that constant lookup == free lookup, and devs don't seem to realize that constant * $bigNumber == $biggerNumber. Caching shouldn't be granular.
|
| Obligatory latency numbers every programmer should know: https://gist.github.com/jboner/2841832
| farhaven wrote:
| Do you work at my company? Because I have a coworker who is always proposing _exactly_ that solution. No matter what issues the code has, for him its usage of MySQL is always "the worst moment".
|
| Asking for a benchmark just gets a repeat of "our worst moment is MySQL and we can solve that with some NoSQL cache".
| rubyn00bie wrote:
| > Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
|
| That one hits me in the feels, because I think a lot of folks (including myself) focus on algorithms and code patterns before their data, and as a result a lot of things end up being harder than they need to be. I've always liked this quote from Torvalds on the subject, speaking on git's design (the first line is for some context):
|
| > ... git actually has a simple design, with stable and reasonably well-documented data structures.
|
| then continues:
|
| > In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful [...]
| > I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
|
| When I have good data structures, most things just sort of fall into place. I honestly can't think of a time where I've figuratively (or literally) said "my data structure really whips the llama's ass" and then immediately said "it's going to be horrible to use." On the contrary, I _have_ written code that is both so beautiful and esoteric, its pedantry would be lauded for the ages -- had only I glanced over at my data model during my madness. No, instead, I awaken to find I spent my time quite aptly digging a marvelous hole, filling said hole with shit, and then hopping in hoping to not get shitty.
|
| One thing that really has helped me make better data structures and models is taking advanced courses on things like multivariate linear regression analysis, specifically going over identifying things like multicollinearity and heteroskedasticity. Statistical tools are incredibly powerful in this field, even if you aren't doing statistical analysis every day. Making good data models isn't necessarily easy, nor obvious, and I've watched a lot of experienced folks make silly mistakes simply because they didn't want something asinine like _two_ fields instead of one.
| gen220 wrote:
| It makes sense, when we bring in another aphorism: "code _is_ data". It's easier to write good code with good libraries. And it's easier to write good data models that extend good data models. The main distinction is that code is very dynamic, flexible, and malleable, whereas data models need not be.
|
| Data models are the "bones" of an application, as much a part of the application as code is. Data models fundamentally limit the application's growth, but if they're well-placed, they can allow you to do things that are really powerful.
|
| You always want to have good bones. But the Anna Karenina Principle is a thing [0].
|
| So, applying this, I think baby ideas should not have many constraints on the bones, to allow them to move around in the future. Instead, there should be a ton of crap code implementing the idea's constraints, because they change every week, month, quarter, and the implementer is still learning the domain.
|
| Once the implementer reaches a certain point of maturity in the domain, all of the lessons learned writing that crap code can be compressed into a very clever data model that minimizes the amount of "code" necessary, and simultaneously makes the project more maintainable, interface-stable, and extensible: in other words, making it an excellent platform to _build on_. The crap code can be thrown out, because it was designed to halfway-ensure invariants that the database can now take care of.
|
| I think most software we consider "good" these days followed this development cycle: multics -> unix, <version_control> -> git, ed -> vi -> vim.
| allover wrote:
| The counter-argument would be that git is the poster child of poor UX, which could be blamed on the fact that it exposes too much of its internal data structure and general inner workings to the user.
|
| I.e. too much focus has been put on data structures and not enough on the rest of the tool.
|
| A less efficient data structure, but more focus on UX, could have saved millions of man-hours by this point.
| wtetzner wrote:
| Or perhaps learning Git just requires a different approach: you understand the model first, not the interface. Once you understand the model (which is quite simple), the interface is easy.
| rabidrat wrote:
| People keep repeating this, but it's not true. The interface has so many "this flag in this case" but "this other flag in that case" and "that command doesn't support this flag like that", etc. There's no composability or orthogonality or suggestiveness. It's nonsensical and capricious and unmemorable, even though I understand the "simple" underlying model and have for years.
| gen220 wrote:
| It's difficult, because git's exposition of its data structures _enables_ you to use it in ways that would not otherwise be possible.
|
| I think git is more of a power tool than people sometimes want it to be. It's more like vi than it is like MS Word, but its ubiquity makes people wish it had an MS Word mode.
|
| So, I think it's hard to fault git's developers for where it is today. It's a faithful implementation of its mission.
|
| FWIW, I have never used a tool with better documentation than git in 2020 (it hasn't always had good --help documentation, but it absolutely does today).
| dexen wrote:
| It's worth noting the same holds true for UI: data dominates. Design your widgets, layout, and workflow around the data.
| chrisweekly wrote:
| Amen. My 23 years' experience in webdev says React (the paradigm, not the lib per se) is dominating web UI precisely because it is all about unidirectional data flow.
| rubyn00bie wrote:
| > It's worth noting the same holds true for UI: data dominates. Design your widgets, layout, and workflow around the data.
|
| I couldn't agree more.
|
| I think the current state of UI programming is like the pathological case, to be honest. Too often folks are concerned with representing their database 1-to-1 in their UI instead of representing their view.
|
| If anyone is suffering from brittle UI code, where somehow caching issues and stale data are affecting your application, this is very likely why. You have muddled your persistence and view concerns together, and it's not manageable or pretty. What this means for folks using something like React: don't directly use your persistence models in your views; create "view models" which directly represent whatever the hell it is you're trying to display. Bind your data in your view models, not your views, and then pass the view model in as props.
| commandlinefan wrote:
| > Tony Hoare's famous maxim "Premature optimization is the root of all evil."
|
| Actually that was Donald Knuth - it's an urban legend that it's an urban legend that it was originally Knuth. Hoare was quoting Knuth, but Knuth forgot he said it, and re-mis-attributed the quote to Hoare.
| RoutinePlayer wrote:
| This reminds me of that Woody Allen joke about someone translating all of T.S. Eliot's poems into English after some vandals had broken into the school library and translated them into French.
| karmakaze wrote:
| And it is usually quoted out of its context.
|
| "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
| eps wrote:
| It's also often interpreted literally.
|
| Premature _complex_ optimization is a bad idea, but simple (read: cheap to code) optimization for common bottleneck patterns is a perfectly reasonable thing to do.
| hombre_fatal wrote:
| I don't think that changes the meaning. Once that 3% matters to you and you've invested the work to measure that 3%, it's not premature anymore.
|
| That "premature" and "optimization" are undefined and left up for debate is what makes it trite.
| karmakaze wrote:
| Ha. So the truth is that Knuth did quote Hoare, not aware that he was quoting Knuth--indirectly, Knuth was quoting himself.
| bendbro wrote:
| I've always been uncomfortable with these kinds of ideas. The odds that the idea will be correctly applied are heavily tied to intelligence, culture, and situation. Instead of reducing the space of options you must consider, all it says is "do it this way when you should, and do it the other way when you shouldn't." I suppose perhaps it is useful to highlight that the decision exists, but I would be surprised if anyone working in the space is unaware of the existence of the decision.
|
| The scientific method has a similar problem. A scientist should form their hypothesis before gathering data to evaluate the hypothesis. If a scientist fails to do this, and starts engaging in p-hacking or data dredging, the quality of their research greatly declines. But that a hypothesis was formed before the data was collected is not usually provable when just looking at the publication itself. And further, there are ways that data dredging can unintentionally sneak into the scientific process, especially around the observation phase that precedes the hypothesis.
|
| This kind of idea has large technical impact, but doesn't have a solid technical reason. Its proof is closer to aesthetics than reason. And much like other aesthetic beliefs, a population believes it based on no deeper reasoning. Only exclusion or indoctrination can ensure the population's view, and only illogical rhetoric will change it.
| RcouF1uZ4gsC wrote:
| One of the big problems with fancy algorithms is that they either access data out of order and/or do pointer chasing. Simple algorithms tend to access the data in order.
|
| CPUs have a lot of logic for making in-order data access very fast.
| [deleted]
| sam_lowry_ wrote:
| "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
|
| -- Linus Torvalds
| andrewl wrote:
| In _The Mythical Man-Month_ Fred Brooks said "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."
|
| I first read that on Guy Steele's site: http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html
| vishnugupta wrote:
| > Show me your tables, and I won't usually need your flowchart
|
| A couple of years ago I spent quite some time evaluating the tech stacks (and general engineering culture) of my employer's merger/acquisition targets. It was quite a fun exercise, all said and done. I encountered all sorts: from a small-team startup who had their tech more or less sorted out, to a largish organisation who relied on IBM's ESB, which exactly one person on their team knew how to work!!
|
| I discovered this exact method during the third tech evaluation exercise.
| When the team began explaining various modules top-down and user flows etc., I politely interrupted them and asked for the DB schema. It was just on a whim, because I was bored of the typical one-way session interrupted by me asking minor questions. Once I had the hang of their schema, the rest of the session was literally me telling them what their control and user flows were and them validating it.
|
| Since then it's become my magic wand for understanding a new company or team. Just go directly to the schema and work backwards.
|
| Conversely, I've begun paying more attention to data modelling. Because once a data model is fixed it's very hard to change, and once enough data accumulates the inertia just increases; instead of changing the data model (for fear of data migration etc.), the tendency is to beat the use cases to fit the data model. It's not your usual fail-fast-and-iterate thing.
| gumby wrote:
| That's Dick Gabriel's site; he posted gls's essay there with attribution (so you didn't realize which site it is). He and quux are friends and collaborators.
| screye wrote:
| As someone in ML, I see myself wanting the opposite.
|
| ML researchers drown their algorithms in huge tables of results, effectively spending time on "how well" rather than the "what".
|
| It often leads to things being added as long as they are better, with the end result being a gargantuan monster of models and hand-engineered changes, all with no one understanding how the whole thing works as a single unit.
|
| Flowcharts are incredibly effective as the topmost layer of abstraction. Does the whole process, when viewed in an end-to-end manner, make sense? We dive into the details only if it passes that sniff test of a flowchart.
|
| I might be missing the point being made here, but they can claw flowcharts from my cold dead hands.
| dllthomas wrote:
| When Brooks says tables, I believe he means the internal data representation, rather than "tables of results".
| everybodyknows wrote:
| "Flowchart" has historically, in Brooks's time, meant "flow-of-control chart", and these usually degenerate into vast webs of minutiae -- useless as abstractions.
|
| But perhaps you meant "flow-of-data between structures" -- in which case we have agreement on engineering, but a muddle on semantics.
| rjsw wrote:
| > I first read that on Guy Steele's site.
|
| It isn't Guy Steele's website. That page was written by him, but the website is owned by Richard P. Gabriel.
| monocasa wrote:
| Restated by Linus with a bit more modern nomenclature (and Linus's trademark bluntness):
|
| > Bad programmers worry about the code. Good programmers worry about data structures and their relationships
| mgkimsal wrote:
| > Bad programmers worry about the code
|
| And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.
| rumanator wrote:
| > And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.
|
| It seems to me that what you're actually seeing is an entire industry trying to eliminate all code-related issues, especially bike-shedding ones.
|
| This is patently obvious to anyone who was forced to waste their time in code review iterations discussing, say, where a brace should go and how many spaces someone should have added.
| Quekid5 wrote:
| This is usually a good time to apply the When In Rome rule: do not reformat needlessly; follow the code style of the code you're modifying. Done.
|
| (If multiple people are arguing back and forth in code review -- when following the WIR rule -- tell them about the WIR rule, and that should settle it. If not, you have bigger problems in your team.)
| Osiris wrote:
| I completely agree. One reason I like prettier is that it only has about 6 options you can change. It removes the bike-shedding. Just let it do its thing and worry about more important things.
|
| It also removes all debate in PRs about style and formatting.
|
| (note: before prettier, I was fairly particular about how I formatted my code, and I disagreed with prettier in some cases, but now I love having one less thing to think about)
| s17n wrote:
| Nobody was ever "forced to waste their time" on this stuff. I have a simple rule - I don't comment on other people's style, and if people comment on my style, I just go with their suggestions. Problem solved.
| ses1984 wrote:
| You've never been on a team with two people with opposing opinions, I guess.
| Quekid5 wrote:
| Just out of morbid curiosity... have you actually experienced multiple "seniors" giving conflicting code review comments about code _style_ (of all things)?
|
| That sounds quite dysfunctional.
|
| (EDIT: Sure, nitpicks may differ, but...)
| lallysingh wrote:
| Ever have more than one reviewer in your CR?
| johnisgood wrote:
| > I just go with their suggestions.
|
| Why though? I am not going to go with suggestions if they make the code less readable for me!
| burke wrote:
| Bad programmers inflicting their worry upon the others.
| Koshkin wrote:
| And that's because it's Bad Programmers who need help!
| mgkimsal wrote:
| but... they need help with data structures/relations and up-front thinking about those issues, not where curly braces should go, or tabs vs. spaces.
| wvenable wrote:
| That stuff is hard! Better to just shove your data somewhere unstructured, and then you don't have to worry about data structures and relations.
| Benjammer wrote:
| Right, because the industry's hyper-focus on linting is the symptom here; it's not a misguided treatment for the underlying problem of bad programmers.
| rapind wrote:
| ... and "rules" for programming.
| [deleted]
| mathattack wrote:
| An early mentor put it as "learn the data, which won't change, before learning the fancy stuff on top, which will".
|
| That carried me very well.
| rumanator wrote:
| Plenty of professional developers would benefit greatly if they read Domain-Driven Design.
| badrequest wrote:
| There are actually six rules!
| ChrisMarshallNY wrote:
| I've always taken a very practical, results-oriented approach to software development.
|
| That makes sense, to me.
|
| One of the first things that we learned, when optimizing our code, was to _use a profiler_.
|
| Bottlenecks could be in very strange places, like blowing the L2 cache. That would happen when data was just a bit too big, or when a method call forced a stack frame update.
|
| We wouldn't see this kind of thing until we looked at a profiler; sometimes a rather advanced one, provided by Intel.
| karl11 wrote:
| Everyone building no-code tools is learning, or will learn, that the problem most businesses have is not a lack of coding skill, or the inability to build the algorithm, but rather how to structure and model data in a sensible way in the first place.
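A small, hypothetical sketch of what "structuring and modeling data sensibly" can mean in practice (the field names are invented for illustration, not from the thread): a loosely modeled bag of strings accepts anything, while an explicit model lets the type system carry the rules, e.g. integer cents for money and UTC timestamps.

    package main

    import (
        "fmt"
        "time"
    )

    // Loosely modeled: a bag of strings. Nothing prevents a missing
    // key, a misspelled field name, or "10,50" stored as an amount.
    type LooseInvoice map[string]string

    // Explicitly modeled: the types carry the rules. Money is integer
    // cents (no float rounding) and the timestamp is a UTC time.Time.
    type Invoice struct {
        ID          string
        AmountCents int64
        IssuedAt    time.Time
    }

    func main() {
        inv := Invoice{
            ID:          "inv-42",
            AmountCents: 1050, // $10.50
            IssuedAt:    time.Now().UTC(),
        }
        fmt.Println(inv.ID, inv.AmountCents, inv.IssuedAt.Format(time.RFC3339))
    }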
| throwaway894345 wrote:
| Modeling the data and structuring the program are indeed the harder tasks, but orgs have lots of smart people who have those skills but not the familiarity with the various existing syntaxes and standard libraries and so on that a programmer learns over the decades of their career. Further, those same orgs probably have many people with experience in the latter but without any special ability to think abstractly. This significantly limits the ability to create tools. Further, the no-code tools often abstract at a more appropriate level than general-purpose programming languages' standard libraries, because these tools aren't trying to be general purpose (at least not to the same degree as general-purpose programming languages). Lastly, I've seen business people use certain no-code tools to quickly build internal solutions that would have taken a programmer considerable (but not crazy) time to crank out, especially considering things like CI/CD pipelines, etc. No-code won't replace Python, but it serves a valuable niche.
| tmaly wrote:
| If no-code tools are anything like ORMs, there will be some interesting surprises when one encounters non-normalized data structures.
| OliverJones wrote:
| In long-lived systems (systems that run for many years) it's almost impossible to choose the "right data structures" for the ages. The sources and uses of your data will not last nearly as long as the data itself.
|
| What to do about this? Two things:
|
| STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local timezone. If you start out with the wrong timezone you'll never be able to fix it. And generations of programmers will curse your name.
|
| Keep your data structures simple enough to adapt to the future. Written another way: respect the programmers who have to use your data when you're not around to explain it.
|
| And, a rule that's like the third law of thermodynamics: you can never know, when you're designing data, how long it will last. Written another way: your kludges will come back to bite you in the xxx.
| aserafini wrote:
| Sometimes storing in UTC is simply not correct. For example, a shop opening time. The shop opens at 10am local time, whether DST or not. Their opening time is 10am local time all year, but their UTC opening time actually changes depending on the time of year!
| wtetzner wrote:
| But a shop opening time is not a timestamp, so I think the original advice is still good. A timestamp is the time at which some event happened, which is different from a date/time used for specifying a schedule.
|
| For example, if you wanted to track the history of when the shop actually opened, it would make sense to store a UTC timestamp.
| wvenable wrote:
| I made that mistake early in my career following this exact advice, and I ended up with a lot of things that were randomly 1 hour off depending on when the record was created and the date entered.
| Benjammer wrote:
| Totally. "Store everything in UTC" is just another flavor of "pick a timezone to store everything in." In a lot of cases, you probably need to go ahead and just store the fully qualified date, including timezone/offset, for each record.
| karmakaze wrote:
| The most interesting case of this I encountered was for photo "timestamps" on a global sharing site. UTC was being used and I was proposing a change to local time. There was great debate, as many drank the UTC juice and stopped thinking.
| It was when I showed them that we also have a "shot at" location, then proceeded to show Christmas Eve photos with the UTC time converted to the viewer's local timezone (not always evening, not always Dec 24) alongside where the photo was taken. Just as in space-time, a photo needs both a time and a place.
| terandle wrote:
| Am I wrong to avoid writing O(n^2) code if at all possible, when it is fairly easy to use hash tables for a better time complexity? Sure, when n is small the O(n^2) one will be faster, but when n is small /anything/ you do is fast in absolute terms, so I'm trying not to leave traps in my code just waiting for n to get bigger than initially expected.
| bluGill wrote:
| That depends.
|
| If you are writing the algorithm yourself, then the maintenance cost of everyone else after you trying to understand it makes it wrong. However, most programming languages have generic programming features such that you can just use a library algorithm, so you aren't writing either version yourself. In that case the code for the fast hash is no more work than the O(n^2) code, and so of course you select the faster one, as an application of the "don't prematurely pessimize your code" rule. If your programming language doesn't already have built-in generic algorithms for your data, then you are using the wrong language (unless your job is to write the generic algorithms for the language, in which case this doesn't apply, because you can assume your algorithm will be used in a performance-critical part at some point).
| [deleted]
| fabian2k wrote:
| I would not read these rules as anything like suggesting to write O(n^2) code. I think at best I'd read them as favoring O(n) instead of O(1) for small n, but I really think that "fancy" algorithms in this case don't mean obvious optimizations like this.
|
| If you can't guarantee n is small, I think it is entirely sensible to use a dictionary/hash table instead of looping over an array or list. As long as n is small the overhead is probably irrelevant, and it'll prevent surprises if n gets large, as you said. And if the difference actually matters, you get back to rules 1 and 2 and measure first anyway.
| mav3rick wrote:
| Using array-based data structures also gets you more cache hits. For smaller data sets, this may well be faster than a fancy algorithm.
| bluGill wrote:
| This is true only if you are iterating over the entire array often. If you only rarely need to access one data member, the array will not be in cache, and so you still have to load it from memory. Depending on how your data is structured, the array may or may not save time even at small sizes.
| mav3rick wrote:
| Of course temporal and spatial locality are important.
| coliveira wrote:
| It depends. If you don't need to worry about performance, then it will work. But in some performance-dependent situations the brute-force algorithm will win over the hash table, and the only way to know is to measure which one is better.
| fasterpython wrote:
| If it's fairly easy, then I think it still fits the spirit of the rules. KISS and all.
|
| On the other hand, the idea that one might be setting traps is slightly weird... if you _know_ with 90% certainty that n will be large, then pick an algorithm that's efficient (and since it's easy to implement, it's a win-win). If n is always going to be small, then does the choice really matter?
| naet wrote:
| If N is known to be small, you should favor simplicity and readability over complex optimization.
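To make the trade-off in this subthread concrete, a minimal sketch in Go (function names made up for illustration): both functions detect a duplicate in a slice. The first is the simple O(n^2) loop that wins for tiny n; the second is the O(n) map version that avoids the trap when n grows past what was initially expected.

    package main

    import "fmt"

    // hasDupQuadratic compares every pair: simple and allocation-free,
    // fine for small n, but O(n^2) comparisons.
    func hasDupQuadratic(xs []int) bool {
        for i := 0; i < len(xs); i++ {
            for j := i + 1; j < len(xs); j++ {
                if xs[i] == xs[j] {
                    return true
                }
            }
        }
        return false
    }

    // hasDupMap trades some constant-factor overhead (hashing,
    // allocation) for O(n) expected time.
    func hasDupMap(xs []int) bool {
        seen := make(map[int]struct{}, len(xs))
        for _, x := range xs {
            if _, ok := seen[x]; ok {
                return true
            }
            seen[x] = struct{}{}
        }
        return false
    }

    func main() {
        xs := []int{3, 1, 4, 1, 5}
        fmt.Println(hasDupQuadratic(xs), hasDupMap(xs)) // true true
    }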
| dimitrios1 wrote:
| Yes. If you don't measure, you will never know when the time complexity moves in your favor due to the constant and other factors. For example, quicksort implementations move to an O(n^2) algorithm (insertion sort) for the last few iterations of a branch of work (typically n=1000), because this reduces the total sort time.
| dragontamer wrote:
| > Am I wrong to avoid writing O(n^2) code if at all possible when it is fairly easy to use hash tables for a better time complexity
|
| Are you sure that std::unordered_map is faster than std::vector? Did you measure?
|
| Every time you access an element in a std::vector, you also access nearby ones (thanks to L1 cache, as well as CPU prefetching of in-line data).
|
| In contrast, your std::unordered_map or hash table gets almost no benefit from L1 cache. (It should be noted that linear probing, despite being an O(N^2)-worst-case version of hash tables, is actually one of the better performers, due to L1 cache + prefetching.)
| zabzonk wrote:
| Also, creating a hash isn't free. And often ordering is required.
| dragontamer wrote:
| > Also, creating a hash isn't free.
|
| Hmmm... I argue that the hash is nearly free, actually.
|
| An unordered_map traversal is probably DDR4 latency bound. That's ~50 nanoseconds (200 clock ticks) per access. What's the CPU going to do in that time?
|
| Well, spending 10 to 20 clock ticks on a typical hash algorithm is fine. Then it will wait the other 180 clock ticks for RAM. If you've got hyperthreading, maybe the CPU will go to another thread and do meaningful work while waiting for RAM... but... I think you get the gist.
|
| Even IF the hash were free, the CPU is waiting for RAM anyway. So you've got plenty of time to make that hash worthwhile. Even an integer division/modulo operation (worst case ~80 clock ticks) can fit in there while waiting for RAM, with plenty of room to spare.
|
| I guess if everything was in L1 cache, the story is a bit different. A lot of "depends": it depends on the data, the access frequency, etc. etc.
| krzat wrote:
| Worrying about the performance of small collections is premature optimization.
|
| Using maps or sets nowadays is mostly for clarity, as they are used to solve certain kinds of problems.
| dragontamer wrote:
| I agree with you. But what you're talking about is completely different from what I was responding to originally.
|
| If you need a set, use a set. But don't assume that it's faster than a std::vector.
|
| Even then, std::vector has set-like operations through std::binary_search or std::make_heap in C++, so it really isn't that hard to use a sorted (or make_heap'd) std::vector in practice.
|
| --------
|
| Even if you don't plan on doing optimization work, it's important to have a proper understanding of a modern CPU. The effects of L1 cache and prefetching are non-trivial, and make simple arrays and std::vectors extremely fast data structures, far faster than on 80s or 90s computers anyway. A lot of optimization advice from the past has become outdated because of the evolution of CPUs.
|
| So it's important to bring up these changes in discussion, from time to time, to remind others to restudy computers. Things change.
| mywittyname wrote:
| Keep in mind that a lot of this was written during an era when even modestly complex data structures/algorithms had to be rolled by hand.
|
| The philosophy is really to not waste time implementing optimizations that may not be necessary.
| Naturally you should reach for the best tool you have in your toolbox. So if your language of choice has a hashmap that can be used with no additional work, go for it. But don't waste two days rolling your own red-black tree because it might be better.
| yashap wrote:
| I think, in most cases, if you have n items in memory and you want to find m of them by id, or some such thing, your default should be to represent the n items as a hashmap, not a list, and look them up from the hashmap. There will be cases where this isn't the right choice, but it's a good default. And it almost always adds virtually no extra complexity to represent them this way, i.e. often something as simple as:
|
|     myMap = myList.groupBy(_.id)
|
| Don't write complex optimizations until you know you need them, but I think defaulting to code with good O(N) complexity, where it's simple to do so, is a good default.
| BurningFrog wrote:
| Yeah, I do well writing dumb inefficient code by default, and optimizing it when needed, which is almost never.
|
| If I know beforehand we'll handle a lot of data, I can pick something fast and complex to begin with, but that effort is probably mostly a waste.
| dtech wrote:
| Use whatever makes for clearer code, unless you are certain it's the bottleneck.
| dang wrote:
| If curious, see also:
|
| 2017 https://news.ycombinator.com/item?id=15265356,
|
| https://news.ycombinator.com/item?id=15776124
|
| 2014 https://news.ycombinator.com/item?id=7994102
|
| Pete_D gets credit for the date: https://news.ycombinator.com/item?id=15266498. These rules come from "Notes on Programming in C" (http://www.lysator.liu.se/c/pikestyle.html), which has its own sequence of threads:
|
| 2017 https://news.ycombinator.com/item?id=15399028,
|
| https://news.ycombinator.com/item?id=13852734
|
| 2014 https://news.ycombinator.com/item?id=7728084
|
| 2011 https://news.ycombinator.com/item?id=3333044
|
| 2010 https://news.ycombinator.com/item?id=1887442
| bob1029 wrote:
| Modeling the problem domain is so important that I don't know why the first year of every compsci undergrad program isn't entirely dedicated to teaching the idea.
|
| Instead, day 1 is installing Python or Java, running hello world, and talking about pointers, binary, encoding, logic gates, etc.
|
| We should be teaching students on day 1 that code is a liability, to be avoided whenever it is convenient to do so.
| Ozzie_osman wrote:
| It turns out rule 5 (Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident) is both true and hard.
|
| Eric Evans's Domain-Driven Design is a good book on the topic.
| Supermancho wrote:
| > If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident) is both true and hard.
|
| The problem is not the self-evident algorithm, but the delicate implementation (or, god forbid, at scale).
|
| Take in 1000 web requests per second. The data is all strictly validated and has about 60 fields per record/request, plus dealing with errors.
|
| How does that go from webserver to (_rolls dice_) Kafka to a (_rolls dice_) Cassandra that can be queried accurately and timely? How much does that cost?
|
| Oh, that's not a programmer problem. Except it is. Creating a fantasy niche of describing problems as data vs. algorithm is the canonical ivory tower disconnect.
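As a tiny illustration of rule 5 (a sketch, not from the thread): once the data is shaped as a map from word to count, the counting "algorithm" is self-evident.

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Rule 5 in miniature: choose a map from word to count, and
        // the algorithm that remains is a two-line loop.
        counts := make(map[string]int)
        for _, w := range strings.Fields("the quick fox and the lazy dog and the cat") {
            counts[w]++
        }
        fmt.Println(counts["the"], counts["and"]) // 3 2
    }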
| veets wrote:
| It seems you are arguing something different, although I am having a hard time understanding what you have written. I think you are saying algorithms and data structures aren't hard, distributed systems are hard. In my experience, choosing the correct data structures and algorithms in your services/programs/whatever can dramatically simplify the design of your systems overall.
| paloaltokid wrote:
| Not only is it hard, it's the one thing that, if you get it right, makes your technical foundation rock solid. But it's the thing most teams and organizations neglect to spend enough time on. I often wonder why this is the case -- my first mentors taught me that logical data modeling was a really important skill. But I never talk about third normal form or any such things with my peers.
| yashap wrote:
| I'm such a huge fan of DDD. IMO, if you only ever read one book on software architecture, that's the one to pick.
|
| A key point, though, is that you learn the right domain models/abstractions over time. Refactoring is critical as you gain more insight into the domain. If you're constantly questioning your modelling of the domain, and refactoring towards a better one, you'll end up with a great model and thus a clean, understandable, easy-to-extend/modify system. If you stick with whatever abstractions you chose at the start, when you knew way less about the domain/business problems, you'll likely end up with poor abstractions, and a code base that's slow, tedious, and error-prone to modify.
|
| Convincing the business that it's worth setting aside time to constantly refactor towards better domain models is often the hardest part, but crucial.
| lliamander wrote:
| Rule 5 seems to mirror one of my favorite insights from Alexander Stepanov:
|
| > In 1976, still back in the USSR, I got a very serious case of food poisoning from eating raw fish. While in the hospital, in the state of delirium, I suddenly realized that the ability to add numbers in parallel depends on the fact that addition is associative. (So, putting it simply, STL is the result of a bacterial infection.) In other words, I realized that a parallel reduction algorithm is associated with a semigroup structure type. That is the fundamental point: algorithms are defined on algebraic structures.
|
| This is also exemplified in the analytics infrastructure used at Stripe: https://www.infoq.com/presentations/abstract-algebra-analyti...
| skybrian wrote:
| But adding floating-point numbers _isn't_ associative, in general. Sometimes you need to do it the right way to avoid catastrophic cancellation.
|
| I guess the key is to know how to deal with things that are only mostly true.
| Koshkin wrote:
| That's why in C++ we have traits and overloading.
| rumanator wrote:
| Could you explain where you see traits and overloading helping you with floating-point operations?
| quietbritishjim wrote:
| > But adding floating point numbers isn't associative, in general. Sometimes you need to do it the right way to avoid catastrophic cancellation.
|
| That exactly proves his point. Systems that are associative can be processed by the parallel algorithm he was thinking of. Floating-point numbers, if you care about their non-associativity, cannot be processed by that algorithm. So the validity of that algorithm depends on whether the system is associative.
| lliamander wrote:
| That's true about floating-point numbers.
| I assume that, depending upon the context, it may not be a big issue (e.g. GPU compute)?
|
| In any case, the point Stepanov was making is that if you want to be able to use a certain algorithm, then you have to make a choice to represent your data in a way that enables that algorithm, and the way you know whether the structure is appropriate for that algorithm is the algebraic properties of the structure.
| kens wrote:
| In case you've wondered what a monoid is, that's a monoid: something with an associative operation (and an identity), so you can do the operation on chunks in parallel, like addition.
| lliamander wrote:
| Yep. And if what you have is an Abelian group, then you also get _distributed_ computation as well (thanks to commutativity).
| gnulinux wrote:
| You can distribute the computation on just a monoid as well, but it needs more bookkeeping. In particular, your reduce function should know:
|
| * lhs is before rhs
|
| * There is no data between lhs and rhs
| dllthomas wrote:
| One way of looking at it is that equipping our data with that bookkeeping gives us something that commutes.
| gnulinux wrote:
| Hmm, sure, but it is not a requirement that your underlying algebraic structure should commute, so I think the original phrasing was misleading. The bookkeeping allows you to commute _a specific_ list of objects, even though the underlying operation is non-commutative (i.e. there exist a, b with a.b != b.a).
|
| At the moment of computation, you can build a new structure that commutes by enumerating the data. I guess it's true that you need a commuting intermediate data structure to be able to distribute.
| dllthomas wrote:
| While true, that's too strict. An Abelian group (like any group) needs inverses. You get distributed computation if you've got an Abelian semigroup.
| Koshkin wrote:
| To be fair, Abel did not know (or care) about semigroups.
| lliamander wrote:
| Thanks for the correction. I think that in Avi Bryant's talk (that I linked to above) Stripe ended up using Abelian groups rather than Abelian semigroups for some reason, though if so I forget the reason why.
| dllthomas wrote:
| Inverses don't show up as much as I'd (aesthetically?) like in computing. There was an interesting application here: https://www.reddit.com/r/haskell/comments/9x684a/edward_kmet...
| dllthomas wrote:
| Every monoid is a semigroup, but it's only a monoid if there is also a value that serves as an identity.
| algebra-history wrote:
| Recently I searched the Web, trying to find the origin of monoids as an approach to distributed computing, and couldn't find it. This quote is a great find for me! Is this the origin?
| sukilot wrote:
| And that was 10 years before Haskell went big on that idea.
| lliamander wrote:
| I'm not super familiar with either the C++ or Haskell communities, but Stepanov's notion of Generic Programming [1] certainly seems to fit with the Haskell ethos.
|
| [1] http://www.generic-programming.org/
| nurettin wrote:
| These rules look silly when you know that your tight loop that waits for IO or redundantly computes things needs caching. No, you don't need to measure that; you know that your tight loop function is going to be the bottleneck. Everyone knows that.
|
| Now, it does make sense when you'd otherwise introduce an entire constraint library instead of looping over 3-4 variables with a small search space. But again, you know it is a small search space. You know you don't have to optimize it.
|
| I really don't get these rules.
|
| Edit: Go ahead and roast me, but keep in mind I've probably been there and back.
___________________________________________________________________
(page generated 2020-08-12 23:00 UTC)