[HN Gopher] Making Code Faster ___________________________________________________________________ Making Code Faster Author : zdw Score : 82 points Date : 2022-06-12 16:16 UTC (1 day ago) (HTM) web link (www.tbray.org) (TXT) w3m dump (www.tbray.org) | mathiasrw wrote: | You can't make code faster. You can only make it do fewer | things. | dr-detroit wrote: | 1vuio0pswjnm7 wrote: | Can anyone state the JSON parsing problem more concisely? For | example, | | 1. Here is the input, e.g., | https://data.sfgov.org/api/views/acdm-wktn/rows.json?accessT... | | 2. Here is the desired output, e.g., a sample showing what the | output is supposed to look like | lumost wrote: | I'm honestly skeptical that | | > Make it work, then make it right, then make it fast. | | applies to core language choice. I used to hear a lot about | rewriting interpreted langs to something faster... but the | reality is that a team that's just spent 1 year making an app in | Python aren't going to pivot to writing Java, Go, or Rust one | day. The new team isn't going to be building something new and | exciting, they are going to be on a tight timeline to deliver a | port of something which already exists. | [deleted] | secondcoming wrote: | Maybe. I mainly use Python as a more sophisticated bash script, | or to prototype something I'll port to a performant language | later. | welder wrote: | Just did this, rewrote a deployed CLI from Python[1] to Go[2]. | It delivered on all the expectations. Less CPU & RAM usage, | less dependence on system stuff like openssl, more diverse | platform support. | | [1] https://github.com/wakatime/legacy-python-cli | | [2] https://github.com/wakatime/wakatime-cli | ChrisMarshallNY wrote: | In my case, the choice is made for me (I write Apple native, so | Swift it is...). | | I have found that his advice on using profilers was very | important. I _thought_ I knew where it would be slow, but the | profiler usually stuck its tongue out at me and laughed. 
| | When I ran a C++ shop, optimization was a black art. Things | like keeping code and pipelines in low-level caches could have a | 100X impact on performance, so we would do things like optimize | to avoid breaking cache. This often resulted in things like | copying and pasting lines of code to run sequentially, instead | of in a loop, or as a subroutine/method. | | It's difficult. There's usually some "low-hanging fruit" that | will give _awesome_ speedups; then it gets difficult. | mrfox321 wrote: | Do your tricks still apply to modern C++ compilers? | moonchild wrote: | I cannot speak for C++, but I recently rewrote an optimised | C routine in assembly. A 2-4x speedup was obtained. I might | have gotten partway with careful tuning of the source, the | way the parent suggests, but I could not have gotten all | the way. | favorited wrote: | [Not GP] Modern optimizers are great, but writing | cache-efficient code is still up to the programmer. | jandrewrogers wrote: | Modern C++ compilers are a mixed bag when it comes to | optimization, brilliant in some areas and inexplicably | obtuse in others, requiring the programmer to be quite | explicit. This has improved significantly with time but I | am still sometimes surprised by the code the compiler has | difficulty optimizing, or by optimizations that are only | recognized if the code is written in very narrow ways. As a | practical matter, compilers have a limited ability to | reason over large areas of code in the way a programmer | can, but even very local optimizations are sometimes | missed. | | This is why it is frequently helpful to look at the code | generated by the C++ compiler. It gives you insight into | the kinds of optimizations the compiler can see and which | ones it can't, so you can focus on the ones it struggles | with. This knowledge becomes out-of-date on the scale of | years, so I periodically re-check what I think I know about | what the compiler can optimize. 
| | For some things, like vectorization, the compiler's | optimizer almost never produces a good result for | non-trivial code and you'll have to do it yourself. | orangepurple wrote: | I recommend using Godbolt (https://godbolt.org/) to view | the assembly output of your compiled language (C++, etc.) | jandrewrogers wrote: | I should add that some types of optimizations (cache | efficiency being a big one) are outside the scope of the | compiler because it is implicitly part of the code | specification that the compiler needs to faithfully | reproduce, e.g., data structure layout is required to be a | certain way for interoperability reasons. | colechristensen wrote: | I don't know, I've been places where select pieces of | infrastructure were being rewritten from language X to Y for | performance reasons, and it seemed to be going just fine. It | wasn't "we're rewriting everything now" but finding performance | bottlenecks and fixing them by rewriting pieces in the new | language. | | It works if you have lots of things communicating over APIs. | anothernewdude wrote: | My team literally does this all the time. Python for everything, | Rust for the places where it doesn't keep up. | edflsafoiewq wrote: | IME fast software doesn't actually use this "waterfall" model. | There needs to be feedback from performance considerations into | semantics. | ayberk wrote: | This definitely doesn't apply to developer tools in general | (and sometimes also to infrastructure). | | Anyone who had the "pleasure" of using Amazon's internal tools | can talk about how the "Make it work ASAP" attitude has worked | out for their internal tooling :) LPT anyone? | djmips wrote: | premature optimization is the root of all evil. | | "I approve too! But... Sometimes it just has to be fast, and | sometimes that means the performance has to be designed in. | Nelson agrees and has smart things to say on the subject." | | - so... you're saying premature optimization isn't the root of | all evil. 
Maybe it's time to retire this tired 'conventional | wisdom' | [deleted] | runevault wrote: | I hate "premature optimization" so much. | | The only type of optimization I understand that label applying | to is very isolated but convoluted fixes that buy performance at | the cost of readability/etc. Intelligent architecture that is | fast, so long as it does not completely obfuscate the intent of | the code, is not premature. | jaywalk wrote: | If you already know that something has to be fast, then it's | not really _premature_ optimization. | trashtester wrote: | I would say, in most cases it pays to make the overall design | of the code in a way that enables fast execution. This | includes selection of languages and libraries. | | Loops or recursion with many iterations (high N) and a high | big-O order may also be identified and avoided during the | design stage, if you KNOW the N and O-order at design time. | (And don't worry about the O-order for low N problems, that is | premature optimization most of the time.) | | On the other hand, tweaking every single statement or test in | ways that save 1-2 cycles is rarely worth it. Often you end | up with obfuscated code that may even make it harder for the | compiler to optimize. This is the 97% of the code where | premature optimization makes things harder. | | Some programmers may turn this on its head. They don't | understand the implications of the design choices, but try to | make up for that by employing all sorts of tricks (that may | or may not provide some small benefit) in the main body of | the code. | bee_rider wrote: | I don't think this follows. If premature optimization is the | root of all evil, essentially, we're saying "for all evil, | there exists a premature optimization which leads to it." If | evil, then sourced in a premature optimization. | | Even if we parse his "But..." 
as saying, "there exists some | premature optimization which is not the root of an evil" | (ignoring probably valid quibbles about whether the | optimizations he's talking about truly are premature), this | doesn't contradict the original statement. | | In fact, "the root of all evil" seems to be an expression which | invites us to commit the fallacy of denying the antecedent -- | if premature optimization, then evil, in this case -- because | it is almost always used to indicate that the first thing is | bad. | dahart wrote: | > you're saying premature optimization isn't the root of all | evil. Maybe it's time to retire this tired 'conventional | wisdom' | | Funny you mention it. I have a bit of a habit now of reminding | people what the remainder of Knuth's quote actually was. | | "We should forget about small efficiencies, say about 97% of | the time: premature optimization is the root of all evil. Yet | we should not pass up our opportunities in that critical 3%." | | The irony of walking away with the impression that Knuth was | saying to not do optimization is that his point was the exact | opposite of that: he was emphasizing the word _premature_, and | then saying we absolutely should optimize, after profiling. | | This all agrees completely with the article's takeaways: build | it right first, then _measure_, and then optimize the stuff | that you can see is the slowest. | astrange wrote: | The 97% does have its own performance opportunities; rather | than making it faster you want to stop it from getting in the | way of the important stuff, by reducing code size or tendency | to stomp on all the caches or things like that. | | Anyone optimizing via microbenchmarks or wall time | exclusively isn't going to see this. | dralley wrote: | Not to mention the paragraph that immediately follows the more | famous one, in Knuth's essay: | | >> Yet we should not pass up our opportunities in that critical | 3%. 
A good programmer will not be lulled into complacency by | such reasoning, he will be wise to look carefully at the | critical code; but only after that code has been identified. | doodpants wrote: | No, they're saying that not all optimization is premature. Or | that upfront performance considerations in the design are not | necessarily a case of premature optimization. | huachimingo wrote: | Which one is faster? (C code) Both return whether abs(num) > x. | | /* logical comparison */ | int greater_abs(int num, int x){ return (num > x) || (num+x < 0); } | | /* squared approach */ | int greater_abs2(int num, int x){ return num*num > x; } | | See for yourself, with (and without) optimizations: | https://godbolt.org/ | | What would happen if x is a compile-time constant? | dahart wrote: | Math & logic are rarely the bottleneck over memory & allocation | bottlenecks, right? Does Godbolt assume x86? Does the answer | change depending on whether you're using an AMD or NVIDIA GPU, | or an Apple, ARM or Intel processor? Does it depend on which | instruction pipelines are full or stalled from the surrounding | code, e.g., logic vs math? Hard to say if one of these will | always be better. There are also other alternatives, e.g. | bitmasking, that might generate fewer instructions... maybe | "abs(num) > x" will beat both of those examples? | masklinn wrote: | > Does Godbolt assume x86? | | Godbolt uses whatever compilers, targets, and optimisations | you ask it to. | | It is, in fact, a very useful tool for comparing different | compilers, architectures, and optimization settings. | astrange wrote: | Most questions like these have no answer because if any of the | parameters is known (which it usually is) it'll get folded away | to nothing. | WalterGR wrote: | No idea. | | How frequently am I calling `abs`? | [deleted] | jjice wrote: | If you write math-heavy code, probably a lot more than if | you're writing web apps. Depends on what kind of software you | write. 
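The two predicates above are easy to sanity-check before arguing about speed. Here is a quick Python port (my translation; function names mirror the C snippet, the test ranges are mine, and Python ints don't overflow, so this checks only the logic, not the C overflow hazards):

```python
def greater_abs(num, x):
    # "logical comparison": num > x covers positive num,
    # num + x < 0 covers negative num (i.e. num < -x)
    return num > x or num + x < 0

def greater_abs2(num, x):
    # "squared approach" exactly as posted: note it compares
    # num*num against x, not against x*x
    return num * num > x

def reference(num, x):
    return abs(num) > x

# The logical version agrees with the reference everywhere in this
# range; the squared version as posted does not.
logical_ok = all(greater_abs(n, x) == reference(n, x)
                 for n in range(-20, 21) for x in range(0, 21))
mismatches = [(n, x) for n in range(-20, 21) for x in range(0, 21)
              if greater_abs2(n, x) != reference(n, x)]

print(logical_ok)                           # True
print(greater_abs2(3, 4), reference(3, 4))  # True False
print(len(mismatches) > 0)                  # True
```

Comparing against x*x instead of x would repair the logic for non-negative x, though in C both num*num and x*x can still overflow for large inputs.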
| WalterGR wrote: | Got it. | | Well, if that were the case, I'd use a profiler to see if | spending time on optimizing 'abs' would realistically be | worth it. | throwaway744678 wrote: | I don't know which one is faster, but I know that one is not | correct (squared approach). | saghm wrote: | Wouldn't the second one also potentially be incorrect due to | overflow? | pjscott wrote: | Yes. Suppose that both numbers are positive, that x>num, | and that x+num is bigger than INT_MAX. In that case we hit | signed integer overflow, which is undefined behavior. If | signed integer overflow happens to wrap around, which it | might, then the result could be negative and the function | would return the wrong result. Or anything else could | happen; undefined behavior is undefined. | | In practice, just writing "abs(num) > x" gives quite good | machine code, and it does so without introducing | hard-to-see bugs. | [deleted] | zasdffaa wrote: | Depends. In the first it will depend on the branch predictor, | which will depend on the relative expected magnitudes of num | and x. | | In the 2nd, which I assume should be { return num*num > x*x; }, | it depends on the micro-arch: it's one basic block, so no | branches, and assuming a deep pipeline on x64 and one | (pipelined) multiplier, this is probably faster for 'random-ish' | num and x. | [deleted] | dhosek wrote: | Your squared approach is wrong: greater_abs2(3, 4) returns true | but should return false. | [deleted] | jbverschoor wrote: | > CPU time is always cheaper than an engineer's time. | | I hate this quote, but less than the "Memory is cheap" mantra. | | For "CPU time", if it's a critical path for something with a lot | of users and/or where performance is key, the engineer's time is | just a fraction. | [deleted] | [deleted] | bagels wrote: | The critical factor is scale. Thousands of servers replaced | with hundreds by more efficient code can be worthwhile. | djmips wrote: | That's not the only factor. 
Sometimes it's weight or power or | a fixed system like embedded, console or a particular product | that's not easily upgradable. | trashtester wrote: | You don't need thousands of servers to make it worthwhile. If | you have code that runs constantly on 11 servers (possibly in | k8s) that each cost $1k/month in AWS, and you spend a few | weeks optimizing it down to needing 1 server, those weeks | just generated the equivalent of a YEARLY $120k revenue | stream. | | If you require an ROI of 5 years, no interest included, you | just created $600k in value over a few weeks. | jcalabro wrote: | Yeah for sure. One thing I think about often is that if | you're writing a compiler and it's slow (say it takes 10s per | build), and you have thousands of engineers running it 100x | per day, that starts to add up quickly. If you could get it | down to 1 second, then you'd save a lot of actual engineer | time, just not your own. | | Scale is key here, but fast software is always a much better | user experience than slow stuff as well. With the compiler | example, if it takes 1s as opposed to 1h, then users can | iterate much more quickly and get a lot of flexibility. | [deleted] | TimPC wrote: | This means working at companies with massive scale can be far | better because you get to make code performant and optimize | things rather than just focus on getting features out the | door. | jbverschoor wrote: | Yup. It should, and in general it is. Look at operating | systems, compared to <insert random app>. | | Somehow programmers care about the big O notation, but not | when it's about other people's time. | jacobolus wrote: | Moreover, when some operation gets sped up by orders of | magnitude, it can be used for new things that you'd never | consider when it was relatively more expensive. | | Something that used to be precomputed offline can be done in | real time. Something that used to work on small samples can | be applied to the whole data set. 
Something that used to be | coarsely approximated can be computed precisely. Something that | used to require large clusters of machines can be handled on | customers' client devices. Something that used to be only an | end in itself can be used as a building block for a | higher-level computation. Etc. | exyi wrote: | Yeah. CPU time might be cheap, but if someone is waiting for | that CPU, you are now wasting someone's time. | secondcoming wrote: | It explains why the modern web experience is typically so | crappy. | saagarjha wrote: | Indeed. At scale quotes like these are generally put on the | backburner and the performance team will deliver gains that | justify staffing the team, and then some. At work (though I'm | not directly involved in this) there's even a little table that | gives you a rough idea of what kind of win you need to save the | company the equivalent of an engineer. If you do it in the | right spot it's not really even that much (though, of course, | finding a win there is probably going to be very difficult). | runevault wrote: | The part where he talks about the quality of benchmarking tools | reminded me, a library I think worth mentioning for the .NET | crowd is BenchmarkDotNet. Not only can it tell you the time for a | given test to run (after doing warmups to try and mitigate | various problems with running tests cold), it also has options to | see how the GC performed, at every generation level. Is this code | getting to later generations, or is it staying ephemeral, which is | cheaper to use and destroy? Or can you avoid heap allocations | entirely? | | Edit: Oh, and I should mention: if you want the method-level | tracking on a run similar to some of his screenshots, a paid | ReSharper license comes with dotTrace, I believe, which gives | you that sort of tracking. | GordonS wrote: | dotTrace is fantastic, really essential for performance profiling. 
| Likewise, dotMemory is really good when trying to reduce or | understand memory usage (tho dotTrace does have some memory | tooling too). I've been happily paying for a JetBrains Ultimate | personal license for a few years now. | | There are very few companies that I'm really rooting for, but | JetBrains is absolutely one. | runevault wrote: | I have a ReSharper Ultimate license through work and a full | JetBrains Ultimate at home (I switched to Rider for | C#/F#/Unity dev in the past 6 months and am really liking it, | along with CLion for the times I'm writing Rust). | | One time at work I dug up something that removed 75% of the | runtime of an application because it turned out taking the | length of an image was actually a method even though it | looked like a simple property, so I cached it at the start of | processing each image instead of calling it over and over in | the loop. It was insane how much faster the code became. I | tracked that down with dotTrace. | | And yeah, dotMemory is also fantastic, I've dug up some GNARLY | memory usage with it. Probably should have mentioned it since | I was bringing up the memory portion of BenchmarkDotNet. | cube2222 wrote: | Go's built-in performance analysis tooling is so excellent. | | The profiler, which can do CPU, memory, goroutines, blocking, | etc. and can display all of that as a graph or a flame graph, as | well as `go tool trace` which gives you a full execution trace, | including lots of details about the work the GC does. All that | with interactive local web-based viewers. | | Performance optimization is always so fun with it. | timbray wrote: | Yep, but you know, I can't stand that web-based viewer, it's | got this hair-trigger zoom and if I breathe in the direction of | the mouse the graph goes hurtling in or out. I used to look at | the PDFs but now I just stay in GoLand, which gives you | everything you need. 
| henning wrote: | > If you're writing Ruby code to turn your blog drafts into HTML, | it doesn't much matter if republishing takes a few extra seconds | | Unless your software is intended to reload blog posts live like | Hugo/Jekyll/other static site generators and it takes ~5,000 ms | on an i9 machine when it could take 100 ms if different languages | and different implementation choices were made. This is the story | of modern software: "I don't care how much time and computing | power I waste. It's not the core app of some FAANG giant, so I'm | not going to bother, ever." | dwrodri wrote: | I agree to the extent that people fail to realize how they | could be missing out on opportunities to innovate or corner a | market when they leave performance on the table. Quite often, | new products become possible when a basic task like rendering | HTML goes from taking 10 seconds to 10ms. I think you can paint | the problem in broad strokes from either side of the "how | important is performance?" argument. | | From a "get the bills paid" point-of-view, any good project | manager also has to know when to tell an engineer to focus on | getting the product shipped instead of chasing that next 5% in | throughput/latency reduction/overhead. I've seen my fair share | of programmers (including myself) delay shipping a project in | pursuit of eliminating some "undesirable" latency and not | finish more important features. | | For tasks like video streaming, automation software (CI | pipelines to robotics), video games, professional tools for | content creation (DAWs, video editing, Blender, etc.), | performance is the feature, but then your product is helping | them get the bills paid faster. Medical apparatus(es?) and | guidance software on autonomous vehicles are examples of where | latency is a life-or-death situation. | | I think everyone would benefit from playing with OpenRTOS, or | writing some code that deals with video/audio where there are | hard deadlines on latency. 
But I'm never gonna hold some | weekend-project static site generator in Ruby to the same | standard as macOS. | makapuf wrote: | Agreed with what you said, but even for Blender, performance | is important, but the features are: Free software, a good | modeler, a correct, feature-packed, good-looking renderer AND | performance. | Jtsummers wrote: | Probably why he had the next paragraph: | | > But for code that's on the critical path of a service | back-end running on a big sharded fleet with lots of zeroes in | the "per second" numbers, it makes sense to think hard about | performance at design time. | | Your scenario falls under that category. | _gabe_ wrote: | > Unless your software is intended to reload blog posts live | like Hugo/Jekyll/other static site generators | | _This_ falls under that? | | > But for code that's on the critical path of a service | back-end running on a big sharded fleet with lots of zeroes in | the "per second" numbers | | A static site generator is a simple utility that should take | 100ms at most on a modern crappy laptop. It's not some | backend service that's in a critical path, but it's also not | a difficult engineering task. Parse input, produce output. | This isn't something that should take seconds, which I think | is what the OP was getting at. But because of the language | choices made, and the millions of lines of bloated code, it | does take seconds. | Jtsummers wrote: | Yes. Maybe not directly, but more like it versus a one-off | "I just made a new blog post and can wait 10 seconds for it | to render and deploy." If you're reloading the blog posts | live then you're on the critical path, not a one-off | anymore. So you need to think about performance. | | In the former case, the performance really doesn't matter. | If you've got a personal blog, does it matter if your | update takes 10 seconds or 1 second? Probably not; if it | does, it's the most important blog in the world. 
| | If you've got customers with many blogs and need to take | their updates and render them, then the performance matters | because it's shifted from 1 to many (hundreds? thousands?). | And now that 10-second delay is a big issue, you're either | using a fleet of servers to handle the load or some | customers don't see updates for days (oops). | paganel wrote: | It was the "bad" untyped languages like PHP, Python and Ruby | (Perl was too complicated for us, mere mortals) that saved the | web from becoming a Microsoft monopoly, or, more likely, an | oligopoly between the same MS, probably Sun, probably IBM or | some such. I'm talking about most of the 2000s decade. True, | the web has become sort of an oligopoly right now, but at least | that was not caused by the programming languages and web | frameworks that power it. | | What I'm trying to say is that those languages that everyone is | quick to judge right now have given us 10, maybe 15 years of | extra "life", a period when most of us have "made" our careers | and, well, most of our money (those who have managed to make | that money, that is). We wouldn't have had (what basically are) | web companies worth tens, if not hundreds of billions of | dollars, if the web had still meant relying on Struts or on | whatever it is Microsoft was putting forward as a web framework | in the mid-2000s. We wouldn't have had engineers taking home TC | worth 500-600k and then complaining that Python or Ruby are not | what the world needs. | rentfree-media wrote: | But you might have a viable visual page builder. It's | honestly a tough choice at times... | j-james wrote: | BlueGriffon is pretty good. | dahfizz wrote: | I wonder how heavily other fields sacrifice in the name of | "ease of implementation". | | Could we have houses that are 100x stronger and longer-lasting | if we allowed a few extra weeks of construction time? Could we | 10x battery capacity with a slightly more sophisticated | manufacturing process? 
| | I don't think many developers nowadays understand how fast | computers are, nor how much needless bloat is in their | software. | ajmurmann wrote: | For the economics of that comparison to make sense, we should | also include material inputs to the construction. It doesn't | matter to the cost of the final product if it was materials | that were saved or labor cost. Software is the special case | where cost almost entirely comes down to labor. So given | that, we are of course constantly compromising quality. We | notice it less, though, because buildings are more | standardized. | adamdusty wrote: | I work in III/V semiconductors. The product development goes | like: identify need, design, attempt a proof of concept, DoEs | to determine manufacturing costs/viability, repeat until you | have a statistically controlled viable process. There is | essentially no room for technical creativity like there is in | software. If we spent an extra year on our worst-performing | product (we've done this), we would get at best 5% | improvement with iffy reproducibility. | | I don't know about construction or batteries. | astrange wrote: | You could have a lot more good-enough and much cheaper houses | in the US, but we banned manufactured/prefab houses due to | lobbying from the "build everything individually out of wood" | lobby, require huge setbacks due to the front lawn lobby, and | various other things like overly wide roads because of | out-of-date fire codes. 
| agumonkey wrote: | AWS recently blogged about the price benefits of using more | performant compiled languages on their servers; it's coming ___________________________________________________________________ (page generated 2022-06-13 23:00 UTC)