[HN Gopher] Tracking developer build times to decide if the M3 M...
       ___________________________________________________________________
        
       Tracking developer build times to decide if the M3 MacBook is worth
       upgrading
        
       Author : paprikati
       Score  : 165 points
        Date   : 2023-12-28 19:41 UTC (1 day ago)
        
 (HTM) web link (incident.io)
 (TXT) w3m dump (incident.io)
        
       | lawrjone wrote:
       | Author here, thanks for posting!
       | 
        | Lots of stuff in this, from profiling Go compilations, building a
       | hot-reloader, using AI to analyse the build dataset, etc.
       | 
       | We concluded that it was worth upgrading the M1s to an M3 Pro
        | (the Max didn't make much of a difference in our tests) but the
       | M2s are pretty close to the M3s, so not (for us) worth upgrading.
       | 
       | Happy to answer any questions if people have them.
        
         | BlueToth wrote:
          | Hi, thanks for the interesting comparison. What I would like to
          | see added is a build on an 8GB memory machine (if you have
         | one available).
        
         | aranke wrote:
         | Hi,
         | 
         | Thanks for the detailed analysis. I'm wondering if you factored
         | in the cost of engineering time invested in this analysis, and
         | how that affects the payback time (if at all).
         | 
         | Thanks!
        
       | LanzVonL wrote:
       | We've found that distributed building has pretty much eliminated
        | the need to upgrade developer workstations. Super easy to set
       | up, too.
        
         | packetlost wrote:
         | Distributed building of _what_? Because for every language the
          | answer of whether it's easy or not is probably different.
        
           | LanzVonL wrote:
           | We don't use new-fangled meme languages so everything is very
           | well supported.
        
         | lawrjone wrote:
         | I'm not sure this would work well for our use case.
         | 
          | The distributed build systems only really benefit you by
          | aggressively caching the modules that are built, right? But the
          | majority of our builds are almost fully cached: just one module
          | has changed and needs recompiling, then the linker sticks
          | everything back together. The machines would then need to
          | download the result from the distributed builder, and at 300MB a
          | binary that's gonna take a while.
         | 
         | I may have this totally wrong though. Would distributed builds
         | actually get us a new binary faster to the local machine?
         | 
         | I suspect we wouldn't want this anyway (lots of our company
         | work on the go, train WiFi wouldn't cut it for this!) but
         | interested nonetheless.
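The 300MB-download concern above is easy to sanity-check. A minimal sketch; the link speeds are illustrative assumptions, not figures from the thread:

```python
# Rough transfer time for pulling a freshly linked binary from a
# remote build cache. The 300MB size is from the comment above; the
# link speeds are assumed examples.
def transfer_seconds(size_mb: float, mbps: float) -> float:
    """Seconds to move size_mb megabytes over an mbps megabit/s link."""
    return size_mb * 8 / mbps

binary_mb = 300
for label, mbps in [("office (1 Gbps)", 1000),
                    ("home (100 Mbps)", 100),
                    ("train WiFi (10 Mbps)", 10)]:
    print(f"{label}: {transfer_seconds(binary_mb, mbps):.0f}s")
```

At an assumed 10 Mbps, the binary alone takes four minutes, which supports the train-WiFi objection.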
        
           | dist-epoch wrote:
           | > The distributed build systems only really benefit from
           | aggressively caching the modules that are built, right
           | 
           | Not really, you have more cores to build on. Significant
           | difference for slow to compile languages like C++.
           | 
           | > I may have this totally wrong though. Would distributed
           | builds actually get us a new binary faster to the local
           | machine?
           | 
           | Yes, again, for C++.
        
         | closeparen wrote:
          | A MacBook-equivalent AWS instance costs at least the price of
          | a MacBook per year.
        
           | lawrjone wrote:
           | Yes I actually did the maths on this.
           | 
           | If you want a GCP instance that is comparable to an M3 Pro
           | 36GB, you're looking at an n2-standard-8 with a 1TB SSD,
           | which comes out at $400/month.
           | 
            | Assuming you have it running just 8 hours a day (if your
            | developers clock in at exact times) you can cut that to a
            | third, making it ~$133/month, or $1,600/year.
           | 
           | We expect these MacBooks to have at least a 2 year life,
            | which means you're comparing the cost of the MacBook to 2
            | years of running the VM for 8 hours a day: $2,800 vs $3,200,
            | so the MacBook still comes in $400 cheaper over its
            | lifetime.
           | 
           | And the kicker is you still need to buy people laptops so
           | they can connect to the build machine, and you can no longer
           | work if you have bad internet connection. So for us the
           | trade-off doesn't work whichever way you cut it.
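The arithmetic above can be written out directly; the $400/month VM figure and the ~$2,800 MacBook price come from the comment, everything else follows:

```python
# GCP n2-standard-8 + 1TB SSD vs an M3 Pro MacBook, per the comment.
vm_monthly_24_7 = 400                 # $/month if left running 24/7
vm_monthly = vm_monthly_24_7 / 3      # ~8h/day of use -> ~$133/month
vm_two_years = vm_monthly * 12 * 2    # over the laptop's 2-year life

macbook = 2800                        # MacBook price from the thread
print(f"VM: ${vm_two_years:,.0f}, MacBook: ${macbook:,}")
print(f"MacBook cheaper by ${vm_two_years - macbook:,.0f}")
```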
        
             | throwaway892238 wrote:
              | 1. With a savings plan or on-demand?
              | 2. Keeping one instance on per developer indefinitely, or
              | only when needed?
              | 3. Shared nodes? Node pools?
              | 4. Compared to what other instance types/sizes?
              | 5. Spot pricing?
             | 
             | Shared nodes brought up on-demand with a savings plan and
             | spot pricing is the same cost if not cheaper than dedicated
             | high-end laptops. And on top of that, they can actually
             | scale their resources much higher than a laptop can, and do
             | distributed compute/test/etc, and match production. And
             | with a remote dev environment, you can easily fix issues
             | with onboarding where different people end up with
             | different setups, miss steps, need their tooling re-
             | installed or to match versions, etc.
        
               | lawrjone wrote:
               | 1. That was assuming 8 hours of regular usage a day that
               | has GCP's sustained use discounts applied, though not the
               | committed usage discounts you can negotiate (but this is
               | hard if you don't want 24/7 usage).
               | 
               | 2. The issue with only-when-needed is the cold-start time
               | starts hurting you in ways we're trying to pay to avoid
               | (we want <30s feedback loops if possible) as would
               | putting several developers on the same machine.
               | 
               | 3. Shared as in cloud multi-tenant? Sure, we wouldn't be
               | buying the exclusive rack for this.
               | 
               | 4. n2-standard-8 felt comparable.
               | 
               | 5. Not considered.
               | 
               | If it's interesting, we run a build machine for when
               | developers push their code into a PR and we build a
               | binary/container as a deployable artifact. We have one
               | machine running a c3-highcpu-22 which is 22 CPUs and 44GB
               | memory.
               | 
                | Even at the lower frequency of pushes to master, the
                | build latency spikes a lot on this machine when developers
               | separate builds simultaneously, so I'd expect we'd need a
               | fair bit more capacity in a distributed build system to
               | make the local builds (probably 5-10x as frequent) behave
               | nicely.
        
           | mgaunard wrote:
           | Anything cloud is 3 to 10 times the price of just buying
           | equivalent hardware.
        
         | boricj wrote:
         | At one of my former jobs, some members of our dev team (myself
         | included) had manager-spec laptops. They were just good enough
         | to develop and run the product on, but fairly anemic overall.
         | 
         | While I had no power over changing the laptops, I was co-
         | administrator of the dev datacenter located 20 meters away and
         | we had our own budget for it. Long story short, that dev
         | datacenter soon had a new, very beefy server dedicated for CI
         | jobs "and extras".
         | 
         | One of said extras was providing Docker containers to the team
         | for running the product during development, which also happened
         | to be perfectly suitable for remote development.
        
       | vessenes wrote:
        | The upshot of running local LLMs on my Macs: the M3 Pro is
        | slightly better than the M2 and significantly better than the M1
        | Pro. Currently the M3 memory bandwidth options are lower than
        | for the M2, and that may be hampering the total performance.
       | 
       | Performance per watt and rendering performance are both better in
       | the M3, but I ultimately decided to wait for an M3 Ultra with
       | more memory bandwidth before upgrading my daily driver M1 Max.
        
         | lawrjone wrote:
         | This is pretty much aligned with our findings (am the author of
         | this post).
         | 
         | I came away feeling that:
         | 
         | - M1 is a solid baseline
         | 
          | - M2 improves performance by about 60%
          | 
          | - M3 Pro is a marginal improvement over the M2, more like 10%
         | 
          | - M3 Max (for our use case) didn't seem much different from
          | the M3 Pro, though we had less data on this than other models
         | 
         | I suspect Apple saw the M3 Pro as "maintain performance and
         | improve efficiency" which is consistent with the reduction in
         | P-cores from the M2.
         | 
          | The bit I'm interested in is that you say the M3 Pro is only
         | a bit better than the M2 at LLM work, as I'd assumed there were
         | improvements in the AI processing hardware between the M2 and
         | M3. Not that we tested that, but I would've guessed it.
        
           | vessenes wrote:
           | Yeah, agreed. I'll say I do use the M3 Max for Baldur's gate
           | :).
           | 
           | On LLMs, the issue is largely that memory bandwidth: M2 Ultra
            | is 800GB/s, M3 Max is 400GB/s. Inference on larger models is
            | simple math on what's in memory, so the performance is
           | roughly double. Probably perf / watt suffers a little, but
           | when you're trying to chew through 128GB of RAM and do math
           | on all of it, you're generally maxing your thermal budget.
           | 
           | Also, note that it's absolutely incredible how cheap it is to
           | run a model on an M2 Ultra vs an H100 -- Apple's integrated
           | system memory makes a lot possible at much lower price
           | points.
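The bandwidth argument above implies a simple ceiling: each decoded token streams roughly all of the weights through memory once, so tokens/s is bounded by bandwidth divided by model size. A sketch, where the model size is an assumed example rather than a figure from the thread:

```python
# tokens/s ceiling for memory-bandwidth-bound LLM inference:
#   tokens/s <= memory_bandwidth / bytes_of_weights
def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    return bandwidth_gb_s / weights_gb

weights_gb = 70  # assumed: ~70B parameters at 8-bit weights
for chip, bw_gb_s in [("M2 Ultra", 800), ("M3 Max", 400)]:
    print(f"{chip}: <= {max_tokens_per_sec(bw_gb_s, weights_gb):.1f} tok/s")
```

Doubling the bandwidth doubles the ceiling, which matches the "roughly double" performance described above.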
        
             | lawrjone wrote:
             | Ahh right, I'd seen a few comments about the memory
             | bandwidth when it was posted on LinkedIn, specifically that
             | the M2 was much more powerful.
             | 
             | This makes a load of sense, thanks for explaining.
        
           | Aurornis wrote:
           | > - M2 improves performance by about 60%
           | 
           | This is the most shocking part of the article for me since
           | the difference between M1 and M2 build times has been more
           | marginal in my experience.
           | 
           | Are you sure the people with M1 and M2 machines were really
           | doing similar work (and builds)? Is there a possibility that
           | the non-random assignment of laptops (employees received M1,
           | M2, or M3 based on when they were hired) is showing up in the
           | results as different cohorts aren't working on identical
           | problems?
        
             | lawrjone wrote:
             | The build events track the files that were changed that
             | triggered the build, along with a load of other stats such
             | as free memory, whether docker was running, etc.
             | 
             | I took a selection of builds that were triggered by the
             | same code module (one that frequently changes to provide
             | enough data) and compared models on just that, finding the
             | same results.
             | 
             | This feels as close as you could get for an apples-to-
             | apples comparison, so I'm quite confident these figures are
             | (within statistical bounds of the dataset) correct!
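The controlled comparison described above boils down to a filter-then-group over the build events. A minimal sketch on made-up data; the column names are assumptions, not the post's real schema:

```python
import pandas as pd

# Toy stand-in for the build-event dataset described above.
builds = pd.DataFrame({
    "chip":           ["M1", "M1", "M2", "M2", "M3 Pro", "M3 Pro"],
    "trigger_module": ["app/server"] * 6,   # same module every time
    "build_secs":     [48, 52, 30, 32, 27, 29],
})

# Keep only builds triggered by one frequently-changed module,
# then compare the median build time per chip model.
hot = builds[builds["trigger_module"] == "app/server"]
print(hot.groupby("chip")["build_secs"].median())
```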
        
               | sokoloff wrote:
               | > apples-to-apples comparison
               | 
               | No pun intended. :)
        
       | Erratic6576 wrote:
       | Importing a couple thousand RAW pictures into a Capture One
       | library would take 2 h on my 2017 iMac.
       | 
          | 5 min on my M3 MBP.
       | 
       | Geekbench score differences were quite remarkable.
       | 
       | I am still wondering if I should return it, though
        
         | lawrjone wrote:
         | Go on, I'll bite: why?
        
           | ac2u wrote:
           | They miss the 2 hours procrastination time. It's a version of
           | "code's compiling" :)
        
             | Erratic6576 wrote:
              | Ha ha ha. You can leave it overnight and importing files is
              | a one-time process, so not much to win
        
             | teaearlgraycold wrote:
             | The foam swords are collecting dust.
        
           | Erratic6576 wrote:
           | 2,356 EUR is way over my budget. The machine is amazing but
           | the specs are stingy. Returning it and getting a cheaper one
           | would give me a lot of disposable money to spend in
           | restaurants
        
             | tomaskafka wrote:
             | Get a 10-core M1 Pro then - I got mine for about 1200 eur
              | used (basically indistinguishable from new), and the
             | difference (except GPU) is very small.
             | https://news.ycombinator.com/item?id=38810228
        
       | kingTug wrote:
        | Does anyone have any anecdotal evidence around the snappiness of
       | VsCode with Apple Silicon? I very begrudgingly switched over from
       | SublimeText this year (after using it as my daily driver for
       | ~10yrs). I have a beefy 2018 MBP but VScode just drags. This is
       | the only thing pushing me to upgrade my machine right now but I'd
       | be bummed if there's still not a significant improvement with an
       | m3 pro.
        
         | lawrjone wrote:
         | If you're using an Intel Mac at this point, you should 100%
         | upgrade. The performance of the MX chips blows away the Intel
          | chips and there's almost no friction with the ARM architecture
         | at this point.
         | 
         | I don't use VSCode but most of my team do and I frequently pair
         | with them. Never noticed it to be anything other than very
         | snappy. They all have M1s or up (I am the author of this post,
         | so the detail about their hardware is in the link).
        
           | hsbauauvhabzb wrote:
           | There can be plenty of friction depending on your use case.
        
         | whalesalad wrote:
          | I have two Intel MacBook Pros that are honestly paperweights.
         | Apple Silicon is infinitely faster.
         | 
         | It's a bummer because one of them is also a 2018 fully loaded
         | and I would have a hard time even selling it to someone because
         | of how much better the M2/M3 is. It's wild when I see people
          | building hackintoshes on like a Thinkpad T480 ... it's like
          | riding a penny-farthing bicycle versus a Ducati.
         | 
          | My M2 Air is my favorite laptop of all time. The keyboard is
          | finally back to being epic (esp compared to the 2018-era one,
          | which I had to replace myself, and that was NOT fun). It has no
         | it never makes noise. I rarely plug it in for AC power. I can
         | hack almost all day on it (using remote SSH vscode to my beefy
         | workstation) without plugging in. The other night I worked for
         | 4 hours straight refactoring a ton of vue components and it
         | went from 100% battery to 91% battery.
        
           | ghaff wrote:
           | That assumes you only use one laptop. I have a couple 2015
           | Macs that are very useful for browser tasks. They're not
           | paperweights and I use them daily.
        
             | whalesalad wrote:
             | I have a rack in my basement with a combined 96 cores and
             | 192gb of ram (proxmox cluster), and a 13900k/64gb desktop
             | workstation for most dev work. I usually will offload
             | workloads to those before leveraging one of these old
              | laptops, which usually has a dead battery. If I need
              | something
             | for "browser tasks" (I am interpreting this as cross-
             | browser testing?) I have dedicated VMs for that. For just
             | browsing the web, my M2 is still king as it has zero fan,
             | makes no noise, and will last for days without charging if
             | you are just browsing the web or writing documentation.
             | 
             | I would rather have a ton of beefy compute that is remotely
             | accessible and one single lightweight super portable
             | laptop, personally.
             | 
             | I should probably donate these mac laptops to someone who
             | is less fortunate. I would love to do that, actually.
        
               | xp84 wrote:
               | > should donate
               | 
               | Indeed. I keep around a 2015 MBP with 16GB (asked my old
               | job's IT if I could just keep it when I left since it had
               | already been replaced and wouldn't ever be redeployed) to
               | supplement my Mac Mini which is my personal main
               | computer. I sometimes use screen sharing, but mostly when
               | I use the 2015 it's just a web browsing task. With
               | adblocking enabled, it's 100% up to the task even with a
               | bunch of tabs.
               | 
                | Given that probably 80% of people use webapps for
                | nearly everything, there's a huge amount of life left in
               | a late-stage Intel Mac for people who will never engage
               | in the types of tasks I used to find sluggish on my 2015
               | (very large Excel sheet calculations and various kinds of
               | frontend code transpilation). Heck, even that stuff ran
               | amazingly better on my 16" 2019 Intel MBP, so I'd assume
               | for web browsing your old Macs will be amazing for
               | someone in need, assuming they don't have bad keyboards.
        
         | fragmede wrote:
         | Your 5 year old computer is, well, 5 years old. It was once
         | beefy but that's technology for you.
        
         | orenlindsey wrote:
         | VSCode works perfectly.
        
         | baq wrote:
          | I've got a 12700k desktop with Windows and an M1 MacBook (not
          | Pro!) and my pandas notebooks run _noticeably_ faster on the
          | Mac unless I'm able to max out all cores on the Intel chip
          | (this is after, ahem, _fixing_ the idiotic scheduler which
          | would put the background python on E-cores.)
         | 
         | I couldn't believe it.
         | 
         | Absolutely get an apple silicon machine, no contest the best
         | hardware on the market right now.
        
         | kimixa wrote:
          | The 2018 MacBook Pros weren't even using the best silicon of
         | the time - they were in the middle of Intel's "14nm skylake
         | again" period, and an AMD GPU from 2016.
         | 
         | I suspect one of the reasons why Apple silicon looks _so_ good
         | is the previous generations were at a dip of performance. Maybe
         | they took the foot off the gas WRT updates as they _knew_ the M
         | series of chips was coming soon?
        
           | doublepg23 wrote:
           | My theory is Apple bought Intel's timeline as much as anyone
           | and Intel just didn't deliver.
        
         | eyelidlessness wrote:
         | On my 2019 MBP, I found VSCode performance poor enough to be
         | annoying on a regular basis, enough so that I would frequently
         | defer restarting it or my machine to avoid the lengthy
         | interruption. Doing basically anything significant would have
         | the fans running full blast pretty much constantly.
         | 
         | On my M2 Max, all of that is ~fully resolved. There is still
         | some slight lag, and I have to figure it's just the Electron
         | tax, but never enough to really bother me, certainly not enough
         | to defer restarting anything. And I can count the times I've
         | even heard the fans on one hand... and even so, never for more
         | than a few seconds (though each time has been a little
         | alarming, just because it's now so rare).
        
         | aragonite wrote:
         | It depends on what specifically you find slow about VSCode. In
         | my experience, some aspects of VSCode feel less responsive than
         | Sublime simply due to intentional design choices. For example,
          | VSCode's goto-file and project-symbol search is definitely not
          | as snappy as Sublime's. But this difference is due to VSCode's
          | choice to use debouncing (search is triggered only after typing
          | has stopped) as opposed to throttling (execution is limited to
          | at most once per time interval).
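A toy simulation of the two strategies contrasted above; the keystroke timings are made up:

```python
# Debouncing: the search runs only after `wait` seconds of silence.
# Throttling: the search runs during typing, at most once per interval.
def debounced_runs(key_times, wait):
    runs = []
    for i, t in enumerate(key_times):
        nxt = key_times[i + 1] if i + 1 < len(key_times) else float("inf")
        if nxt - t >= wait:          # a pause follows this keystroke
            runs.append(t + wait)
    return runs

def throttled_runs(key_times, interval):
    runs, last = [], float("-inf")
    for t in key_times:
        if t - last >= interval:     # enough time since the last run
            runs.append(t)
            last = t
    return runs

keys = [0.0, 0.1, 0.2, 0.3, 1.5]     # a quick burst, then one more key
print(debounced_runs(keys, 0.5))     # fires only once typing pauses
print(throttled_runs(keys, 0.5))     # fires immediately on the first key
```

Debouncing trades latency at the end of a typing burst for fewer searches overall, which is the lag the comment describes.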
        
         | tmpfile wrote:
         | If you find your compiles are slow, I found a bug in vscode
         | where builds would compile significantly faster when the status
         | bar and panel are hidden. Compiles that took 20s would take 4s
         | with those panels hidden.
         | 
         | https://github.com/microsoft/vscode/issues/160118
        
         | mattgreenrocks wrote:
         | VSCode is noticeably laggy on my 2019 MBP 16in to the point
         | that I dislike using it. Discrete GPU helps, but it still feels
         | dog slow.
        
       | throwaway892238 wrote:
        | MacBooks are a waste of money. You can be just as productive on
        | a machine that's just as fast, at half the price, without the
        | Apple Tax.
       | 
       | Moreover, if your whole stack (plus your test suite) doesn't fit
       | in memory, what's the point of buying an extremely expensive
       | laptop? Not to mention constantly replacing them just because a
       | newer, shinier model is released? If you're just going to test
       | one small service, that shouldn't require the fastest MacBook.
       | 
       | To test an entire product suite - especially one that has high
       | demands on CPU and RAM, and a large test suite - it's much more
       | efficient and cost effective to have a small set of remote
       | servers to run everything on. It's also great for keeping dev and
       | prod in parity.
       | 
       | Businesses buy MacBooks not because they're necessary, but
       | because developers just want shiny toys. They're status symbols.
        
         | cedws wrote:
         | It's OK to just not like Apple. You don't have to justify your
         | own feelings with pejoratives towards other peoples' choice of
         | laptop.
        
           | boringuser2 wrote:
            | You really need to learn what a "pejorative" is before using
           | the term publicly.
        
       | swader999 wrote:
        | My main metrics are: 1) does the fan turn on, 2) does it
        | respond faster than I can think and move? Can't be any happier
        | with the M2 at top end specs. It's an amazing silent beast.
        
       | LispSporks22 wrote:
       | I wish I needed a fast computer. It's the CI/CD that's killing
       | me. All this cloud stuff we use - can't test anything locally
       | anymore. Can't use the debugger. I'm back to glorified fmt.Printf
       | statements that hopefully have enough context that the 40 min
        | build/deploy time was worth it. At least it's differential
        | ¯\_(ツ)_/¯ All I can say is "It compiles... I think?" The unit
        | tests are mostly worthless and the setup for sending something to
        | a lambda feels like JCL boilerplate masturbation from that z/OS
        | course I took out of curiosity last year. I'm only typing this
        | out because I just restarted CI/CD to redeploy what I already
        | pushed because even that's janky. Huh, it's an M3 they gave me.
        
         | lawrjone wrote:
         | Yeah everything you just said is exactly why we care so much
         | about a great local environment. I've not seen remote tools
         | approach the speed/ease/flexibility you can get from a fast
         | local machine yet, and it makes a huge difference when
         | developing.
        
           | LispSporks22 wrote:
           | In the back of my mind I'm worried that our competitors have
           | a faster software development cycle.
        
       | orenlindsey wrote:
        | This is pretty cool. I also love how you can use AI to read the
        | data. It would have taken minutes if not hours to do even just a
        | year ago.
        
         | lawrjone wrote:
         | Yeah, I thought it was really cool! (am author)
         | 
         | It's pretty cool how it works, too: the OpenAI Assistant uses
          | the LLM to take your human instructions like "how many builds
          | are in the dataset?" and translate that into Python code which
         | is run in a sandbox on OpenAI compute with access to the
         | dataset you've uploaded.
         | 
         | Under the hood everything is just numpy, pandas and gnuplot,
         | you're just using a human interface to a Python interpreter.
         | 
         | We've been building an AI feature into our product recently
         | that behaves like this and it's crazy how good it can get. I've
         | done a lot of data analysis in my past and using these tools
         | blew me away, it's so much easier to jump into complex analysis
         | without tedious setup.
         | 
         | And a tip I figured out halfway through: if you want to, you
          | can ask the chat for an iPython notebook of its calculations.
         | So you can 'disable autopilot' and jump into manual if you ever
         | want finer control over the analysis it runs. Pretty wild.
        
           | guax wrote:
            | I was also surprised it works for this kind of analysis. I
            | don't have access to Copilot and GPT-4 at work, but my first
            | instinct is to ask: did you double check its numbers?
            | 
            | Knowing how it works now, it makes more sense that it would
            | make fewer mistakes, but I'm still skeptical :P
        
       | tomaskafka wrote:
       | My personal research for iOS development, taking the cost into
       | consideration, concluded:
       | 
       | - M2 Pro is nice, but the improvement over 10 core (8 perf cores)
       | M1 Pro is not that large (136 vs 120 s in Xcode benchmark:
       | https://github.com/devMEremenko/XcodeBenchmark)
       | 
       | - M3 Pro is nerfed (only 6 perf cores) to better distinguish and
       | sell M3 Max, basically on par with M2 Pro
       | 
       | So, in the end, I got a slightly used 10 core M1 Pro and am very
       | happy, having spent less than half of what the base M3 Pro would
       | cost, and got 85% of its power (and also, considering that you
          | generally need to have at least a 33 to 50% faster CPU to even
          | notice the difference :)).
        
         | geniium wrote:
          | Basically the Pareto effect in choosing the right CPU vs cost
        
         | mgrandl wrote:
         | The M3 Pro being nerfed has been parroted on the Internet since
         | the announcement. Practically it's a great choice. It's much
         | more efficient than the M2 Pro at slightly better performance.
         | That's what I am looking for in a laptop. I don't really have a
         | usecase for the memory bandwidth...
        
           | tomaskafka wrote:
            | Everyone has different needs - for me, even M1 Pro has more
           | battery life than I use or need, so further efficiency
           | differences bring little value.
        
           | dgdosen wrote:
           | I picked up an M3Pro/11/14/36GB/1TB to 'test' over the long
           | holiday return period to see if I need an M3 Max. For my
           | workflow (similar to blog post) - I don't! I'm very happy
           | with this machine.
           | 
           | Die shots show the CPU cores take up so little space compared
           | to GPUs on both the Pro and Max... I wonder why.
        
           | wlesieutre wrote:
           | I don't really have a usecase for even more battery life, so
           | I'd rather have it run faster
        
         | lawrjone wrote:
          | It's interesting that you saw less of an improvement from the
          | M2 than we saw in this article.
         | 
         | I guess not that surprising given the different compilation
         | toolchains though, especially as even with the Go toolchain you
         | can see how specific specs lend themselves to different parts
         | of the build process (such as the additional memory helping
         | linker performance).
         | 
         | You're not the only one to comment that the M3 is weirdly
         | capped for performance. Hopefully not something they'll
         | continue into the M4+ models.
        
           | tomaskafka wrote:
           | That's what Xcode benchmarks seem to say.
           | 
           | Yep, there appears to be no reason for getting M3 Pro instead
           | of M2 Pro, but my guess is that after this (unfortunate)
           | adjustment, they got the separation they wanted (a clear
           | hierarchy of Max > Pro > base chip for both CPU and GPU
           | power), and can then improve all three chips by a similar
           | amount in the future generations.
        
             | Reason077 wrote:
             | > _"Yep, there appears to be no reason for getting M3 Pro
             | instead of M2 Pro"_
             | 
             | There is if you care about efficiency / battery life.
        
         | Aurornis wrote:
          | My experience was similar: in real-world compile times, the M1
          | Pro still stays quite close to the current laptop M2 and M3
         | models. Nothing as significant as the differences in this
         | article.
         | 
          | It could depend on the language or project, but in head-to-head
         | benchmarks of identical compile commands I didn't see any
         | differences this big.
        
         | jim180 wrote:
          | I love my M1 MacBook Air for iOS development. One thing I'd
          | like to have from the Pro line is the screen, mostly the PPI
          | part. While 120Hz is a nice thing to have, it won't happen on
          | Air laptops.
        
         | ramijames wrote:
         | I also made this calculation recently and ended up getting an
         | M1 Pro with maxed out memory and disk. It was a solid deal and
         | it is an amazing computer.
        
       | aschla wrote:
       | Side note, I like the casual technical writing style used here,
       | with the main points summarized along the way. Easily digestible
       | and I can go back and get the details in the main text at any
       | point if I want.
        
         | lawrjone wrote:
         | Thank you, really appreciate this!
        
       | isthisreallife9 wrote:
       | Is this what software development is like in late 2023?
       | 
       | Communicating in emojis as much as words? Speaking to an LLM to
       | do basic data aggregation because you don't know how to do it
       | yourself?
       | 
        | If you don't know how to munge data and produce bar charts
        | yourself, then it's just a small step to getting rid of you
        | and letting the LLM do everything!
        
         | lawrjone wrote:
         | Fwiw I've spent my whole career doing data analysis but the
         | ease at which I was able to use OpenAI to help me for this post
         | (am author) blew me away.
         | 
         | The fact that I can do this type of analysis is why I
          | appreciate it so much. It's one of the reasons I'm convinced
          | AI engineering will find its way into the average software
          | engineer's remit (https://blog.lawrencejones.dev/2023/#ai)
          | because it makes this analysis far more accessible than it
          | was before.
         | 
         | I still don't think it'll make devs redundant, though. Things
         | the model can't help you with (yet, I guess):
         | 
         | - Providing it with clean data => I had to figure out what data
         | to collect, write software to collect it, ship it to a data
         | warehouse, clean it, then upload it into the model.
         | 
         | - Knowing what you want to achieve => it can help suggest
         | questions to ask, but people who don't know what they want will
         | still struggle to get results even from a very helpful
         | assistant.
         | 
         | These tools are great though, and one of the main reasons I
         | wrote this article was to convince other developers to start
         | experimenting with them like this.
        
           | gray_-_wolf wrote:
           | > it makes this analysis far more accessible than it was
           | before
           | 
            | How does the average engineer verify that the result is
            | correct? You claim (and I believe you) to be able to do
            | this "by hand", if required. Great, but that likely means
            | you are able to catch when the LLM makes a mistake. Any
            | ideas on how an average engineer, without much experience
            | in this area, should validate the results?
        
             | lawrjone wrote:
             | I mentioned this in a separate comment but it may be worth
             | bearing in mind how the AI pipeline works, in that you're
             | not pushing all this data into an LLM and asking it to
             | produce graphs, which would be prone to some terrible
             | errors.
             | 
             | Instead, you're using the LLM to generate Python code that
             | runs using normal libraries like Pandas and gnuplot. When
             | it makes errors it's usually generating totally the wrong
             | graphs rather than inaccurate data, and you can quickly ask
             | it "how many X Y Z" and use that to spot check the graphs
             | before you proceed.
             | 
             | My initial version of this began in a spreadsheet so it's
             | not like you need sophisticated analysis to check this
             | stuff. Hope that explains it!
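As a concrete illustration of that kind of spot check, here is a minimal pandas sketch. The column names (`chip`, `duration_seconds`) and the numbers are made up for illustration, not the post's actual schema:

```python
import io

import pandas as pd

# Hypothetical sample of a build dataset: one row per build, with the
# machine's chip and the build duration in seconds.
csv = io.StringIO("""chip,duration_seconds
M1,41.0
M1,112.5
M2,28.3
M2,35.1
M3,24.9
M3,22.7
""")
builds = pd.read_csv(csv)

# Spot-check questions like "how many builds per chip?" or "what's the
# typical duration?" before trusting any generated chart.
counts = builds.groupby("chip")["duration_seconds"].agg(["count", "mean", "median"])
print(counts)
```

A quick `groupby` summary like this is exactly the sort of answer you can compare against a chart the model produced, or against a spreadsheet.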
        
         | PaulHoule wrote:
         | The medium is the message here, the macbook is just bait.
         | 
          | The pure LLM is not effective on tabular data (so many
          | transcripts of ChatGPT apologizing that it got a calculation
          | wrong). To be working as well as it seems to work, they must
          | be loading results into something like a pandas data frame
          | and having the agent write and run programs on that data
          | frame, tap into stats and charting libraries, etc.
         | 
         | I'd trust it more if they showed more of the steps.
        
           | lawrjone wrote:
           | Author here!
           | 
           | We're using the new OpenAI assistants with the code
           | interpreter feature, which allows you to ask questions of the
           | model and have OpenAI turn those into python code that they
           | run on their infra and pipe the output back into the model
           | chat.
           | 
            | It's really impressive and removes the need for you to ask
            | it for code and then run that locally. This is what powers
            | many of the data analysis product features that have been
            | appearing recently (we're building one ourselves for our
            | incident data and it works pretty great!)
        
         | gumballindie wrote:
          | You need to be a little bit more gentle and understanding. A
          | lot of folks have no idea there are alternatives to Apple's
          | products that are faster, of higher quality, and upgradeable.
          | Many seem to be blown away by stuff that has been available
          | from other brands for a while - fast RAM speeds being one of
          | them. A few years back, when I broke free from Apple, I was
          | shocked at how fast and reliable other products were. Not to
          | mention that my RAM is larger than an entry-level storage
          | option on Apple's laptops.
        
       | Aurornis wrote:
       | This is a great write-up and I love all the different ways they
       | collected and analyzed data.
       | 
       | That said, it would have been much easier and more accurate to
       | simply put each laptop side by side and run some timed
       | compilations on the exact same scenarios: A full build,
       | incremental build of a recent change set, incremental build
       | impacting a module that must be rebuilt, and a couple more
       | scenarios.
       | 
       | Or write a script that steps through the last 100 git commits,
       | applies them incrementally, and does a timed incremental build to
       | get a representation of incremental build times for actual code.
       | It could be done in a day.
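A rough sketch of that commit-stepping benchmark. The repo path and build command are placeholders you'd supply for your own project (nothing here comes from the article):

```python
import subprocess
import time


def benchmark_commits(repo, build_cmd, n=100):
    """Time a build at each of the last n commits, oldest first, so
    every build after the first is incremental on top of the previous
    checkout."""
    # Most recent n commits, reversed so we replay them oldest-first.
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", f"--max-count={n}", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    timings = []
    for rev in revs:
        subprocess.run(["git", "-C", repo, "checkout", "--quiet", rev], check=True)
        start = time.monotonic()
        subprocess.run(build_cmd, cwd=repo, check=True)
        timings.append((rev, time.monotonic() - start))
    return timings


# Hypothetical usage, e.g. for a Go codebase:
#   benchmark_commits("/path/to/repo", ["go", "build", "./..."], n=100)
```

Run the same loop on each laptop and you get directly comparable per-commit incremental build times. (Note it leaves the repo in a detached-HEAD state at the newest of the replayed commits.)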
       | 
       | Collecting company-wide stats leaves the door open to significant
       | biases. The first that comes to mind is that newer employees will
       | have M3 laptops while the oldest employees will be on M1 laptops.
       | While not a strict ordering, newer employees (with their new M3
       | laptops) are more likely to be working on smaller changes while
       | the more tenured employees might be deeper in the code or working
       | in more complicated areas, doing things that require longer build
       | times.
       | 
       | This is just one example of how the sampling isn't truly as
       | random and representative as it may seem.
       | 
       | So cool analysis and fun to see the way they've used various
       | tools to analyze the data, but due to inherent biases in the
       | sample set (older employees have older laptops, notably) I think
        | anyone looking to answer these questions should start with the
        | simpler method of benchmarking recent commits on each laptop
        | before they spend a lot of time architecting company-wide data
        | collection.
        
         | lawrjone wrote:
         | I totally agree with your suggestion, and we (I am the author
         | of this post) did spot-check the performance for a few common
         | tasks first.
         | 
         | We ended up collecting all this data partly to compare machine-
         | to-machine, but also because we want historical data on
         | developer build times and a continual measure of how the builds
         | are performing so we can catch regressions. We quite frequently
         | tweak the architecture of our codebase to make builds more
         | performant when we see the build times go up.
         | 
         | Glad you enjoyed the post, though!
        
         | pjot wrote:
          | > _"newer employees will have M3 laptops while the oldest
          | employees will be on M1 laptops"_
          | 
          | While I read this from my work Intel...
        
       | dash2 wrote:
       | As a scientist, I'm interested how computer programmers work with
       | data.
       | 
       | * They drew beautiful graphs!
       | 
       | * They used chatgpt to automate their analysis super-fast!
       | 
       | * ChatGPT punched out a reasonably sensible t test!
       | 
       | But:
       | 
       | * They had variation across memory and chip type, but they never
       | thought of using a linear regression.
       | 
       | * They drew histograms, which are hard to compare. They could
       | have supplemented them with simple means and error bars. (Or used
       | cumulative distribution functions, where you can see if they
       | overlap or one is shifted.)
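In the same spirit, a small self-contained sketch of those two supplements, using made-up build times and plain stdlib arithmetic rather than any particular stats package:

```python
import math


def mean_and_stderr(xs):
    """Mean and standard error of the mean for a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)


def ecdf(xs):
    """Return (sorted values, cumulative fractions), ready to plot as a
    step function; two ECDFs overlaid make shifts easy to see."""
    xs = sorted(xs)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Hypothetical per-chip build durations in seconds.
m1 = [41.2, 55.0, 38.9, 60.3, 47.1]
m3 = [24.9, 30.2, 22.7, 28.8, 26.0]

for name, data in [("M1", m1), ("M3", m3)]:
    mean, se = mean_and_stderr(data)
    print(f"{name}: mean {mean:.1f}s, standard error {se:.1f}s")
```

Plotting `mean +/- se` as error bars, or the two `ecdf` curves on one axis, makes "is one distribution shifted?" answerable at a glance in a way side-by-side histograms aren't.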
        
         | mnming wrote:
          | I think it's partly because the audience is often not
          | familiar with those statistical details either.
          | 
          | Most people hate nuance when reading a data report.
        
         | jxcl wrote:
         | Yeah, I was looking at the histograms too, having trouble
         | comparing them and thinking they were a strange choice for
         | showing differences.
        
         | Herring wrote:
         | >They drew histograms, which are hard to compare.
         | 
         | Note that in some places they used boxplots, which offer
         | clearer comparisons. It would have been more effective to
         | present all the data using boxplots.
        
       | vaxman wrote:
       | 1. If, and only if, you are doing ML or multimedia, get a 128GB
       | system and because of the cost of that RAM, it would be foolish
       | not to go M3 Max SoC (notwithstanding the 192GB M2 Ultra SoC).
       | Full Stop. (Note: This is also a good option for people with more
       | money than brains.)
       | 
       | 2. If you are doing traditional heavyweight software development,
       | or are concerned with perception in an interview, promotional
       | context or just impressing others at a coffee shop, get a 32GB
       | 16" MBP system with as large a built-in SSD as you can afford (it
       | gets cheaper per GB as you buy more) and go for an M2 Pro SoC,
       | which is faster in many respects than an M3 Pro due to core count
       | and memory bandwidth. Full Stop. (You could instead go 64GB on an
       | M1 Max if you keep several VMs open, which isn't really a thing
       | anymore (use VPS), or if you are keeping a 7-15B parameter LLM
       | open (locally) for some reason, but again, if you are doing much
       | with local LLMs, as opposed to being always connectable to the
       | 1.3T+ parameter hosted ChatGPT, then you should have stopped at
       | #1.)
       | 
       | 3. If you are nursing mature apps along, maybe even adding ML,
       | adjusting UX, creating forks to test new features, etc.. then
       | your concern is with INCREMENTAL COMPILATION and the much bigger
       | systems like M3 Max will be slower (bc they need time to ramp up
       | multiple cores and that's not happening with bursty incremental
       | builds), so might as well go for a 16GB M1 MBA (add stickers or
       | whatever if you're ashamed of looking like a school kid) and
       | maybe invest the savings in a nice monitor like the 28" LG DualUp
       | (bearing in mind you can only use a single native-speed external
       | monitor on non-Pro/Max SoCs at a time). You can even get by with
       | the 8GB M1 MBA because the MacOS memory compressor is really good
        | and the SSD is really fast. Do you want an M2 MBA? No: it has
        | inferior thermals, is heavier and larger, fingerprints easily,
        | lacks respect, and the price/performance doesn't make sense
        | given the other options. Same goes for the 13" M1/M2 Pro and
        | all M3 Pro.
       | 
       | Also, make sure you keep hourly (or better) backups on all Apple
       | laptops. There is a common failure scenario where the buck
       | converter that drops voltage for the SSD fails, sending 13VDC
       | into the SSD for long enough to permanently destroy the data on
       | it. https://youtu.be/F6d58HIe01A
        
         | whatshisface wrote:
         | Good to know I have commercial options for overcoming my laptop
         | shame at interviews. /s
        
       | fsckboy wrote:
       | I feel like there is a correlation between fast-twitch
       | programming muscles and technical debt. Some coding styles that
       | are rewarded by fast compile times can be more akin to "throw it
       | at the wall, see if it sticks" style development. Have you ever
       | been summoned to help a junior colleague who is having a problem,
       | and you immediately see some grievous errors, errors that give
       | you pause. You point the first couple out, and the young buck is
       | ready to send you away and confidently forge ahead, with no sense
       | of "those errors hint that this thing is really broken".
       | 
       | but we were all young once, I remember thinking the only thing
       | holding me back was 4.77MHz
        
         | wtetzner wrote:
         | There's a lot of value in a short iteration loop when debugging
         | unexpected behavior. Often you end up needing to keep trying
         | different variations until you understand what's going on.
        
           | lawrjone wrote:
           | Yeah there's a large body of research that shows faster
           | feedback cycles help developers be more productive.
           | 
           | There's nothing that says you can't have fast feedback loops
           | _and_ think carefully about your code and next debugging
           | loop, but you frequently need to run and observe code to
           | understand the next step.
           | 
           | In those cases even the best programmer can't overcome a much
           | slower build.
        
       | LASR wrote:
       | Solid analysis.
       | 
       | A word of warning from personal experience:
       | 
       | I am part of a medium-sized software company (2k employees). A
       | few years ago, we wanted to improve dev productivity. Instead of
       | going with new laptops, we decided to explore offloading the dev
       | stack over to AWS boxes.
       | 
       | This turned out to be a multi-year project with a whole team of
       | devs (~4) working on it full-time.
       | 
       | In hindsight, the tradeoff wasn't worth it. It's still way too
       | difficult to remap a fully-local dev experience with one that's
       | running in the cloud.
       | 
       | So yeah, upgrade your laptops instead.
        
         | jiggawatts wrote:
         | https://xkcd.com/1205/
        
           | mdbauman wrote:
           | This xkcd seems relevant also: https://xkcd.com/303/
           | 
           | One thing that jumps out at me is the assumption that compile
           | time implies wasted time. The linked Martin Fowler article
           | provides justification for this, saying that longer feedback
           | loops provide an opportunity to get distracted or leave a
           | flow state while ex. checking email or getting coffee. The
           | thing is, you don't have to go work on a completely unrelated
           | task. The code is still in front of you and you can still be
           | thinking about it, realizing there's yet another corner case
           | you need to write a test for. Maybe you're not getting
           | instant gratification, but surely a 2-minute compile time
           | doesn't imply 2 whole minutes of wasted time.
        
             | chiefalchemist wrote:
             | Spot on. The mind often needs time and space to breathe,
             | especially after it's been focused and bearing down on
             | something. We're humans, not machines. Creativity (i.e.,
             | problem solving) needs to be nurtured. It can't be force
             | fed.
             | 
             | More time working doesn't translate to being more effective
             | and more productive. If that were the case then why are a
             | disproportionate percentage of my "Oh shit! I know what to
             | do to solve that..." in the shower, on my morning run,
             | etc.?
        
         | WaxProlix wrote:
         | I suspect things like GitHub's Codespaces offering will be more
         | and more popular as time goes on for this kind of thing. Did
         | you guys try out some of the AWS Cloud9 or other 'canned' dev
         | env offerings?
        
           | hmottestad wrote:
           | My experience with GitHub Codespaces is mostly limited to
           | when I forgot my laptop and had to work from my iPad. It was
           | a horrible experience, mostly because Codespaces didn't
           | support touch or Safari very well and I also couldn't use
           | IntelliJ which I'm more familiar with.
           | 
           | Can't really say anything for performance, but I don't think
           | it'll beat my laptop unless maven can magically take decent
           | advantage of 32 cores (which I unfortunately know it can't).
        
       | boringuser2 wrote:
       | I get a bit of a toxic vibe from a couple comments in that
       | article.
       | 
       | Chiefly, I think the problem is that the CTO solved the wrong
       | problem: the right problem to solve includes a combination of
       | assessing why company public opinion is generating mass movements
       | of people wanting a new MacBook literally every year, if this is
       | even worth responding to at all (it isn't), and keeping employees
       | happy.
       | 
        | Most employees are reasonable enough not to be bothered if
        | they don't get a new MacBook every year.
       | 
       | Employers should already be addressing outdated equipment
       | concerns.
       | 
        | Wasting developer time on a problem that is easily solvable in
        | one minute isn't worthwhile. You upgrade the people 2-3 real
        | generations behind. That should already have been in the
        | pipeline, resources notwithstanding.
       | 
       | I just dislike this whole exercise because it feels like a
       | perfect storm of technocratic performativity, short sighted
       | "metric" based management, rash consumerism, etc.
        
         | BlueToth wrote:
          | It's really worth the money if it keeps employees happy!
          | Besides, the conclusion was to upgrade from M1 to M3, not to
          | upgrade every year.
        
         | lawrjone wrote:
         | Sorry you read it like this!
         | 
         | If it's useful: Pete wasn't really being combative with me on
         | this. I suggested we should check if the M3 really was faster
         | so we could upgrade if it was, we agreed and then I did the
         | analysis. The game aspect of this was more for a bit of fun in
         | the article than how things actually work.
         | 
          | And in terms of why we didn't have a process for this: the
          | company itself is about two years old, so this was the first
          | hardware refresh we'd ever needed to schedule. We don't have
          | a formal process in place yet and probably won't until the
          | next one either!
        
       | joshspankit wrote:
       | Since RAM was a major metric, there should have been more focus
       | on IO Wait to catch cases where OSX was being hindered by
       | swapping to disk. (Yes, the drives are fast but you don't know
       | until you measure)
        
         | cced wrote:
         | This. I've routinely got a 10-15GB page file on an M2 pro and
         | need to justify bumping the memory up a notch or two. I'm
         | consistently in the yellow memory and in the red while
         | building.
         | 
         | How can I tell how much I would benefit from a memory bump?
        
       | mixmastamyk wrote:
       | A lot of the graphs near the end comparing side-to-side had
       | different scales on the Y axis. Take results with a grain of
       | salt.
       | 
       | https://incident.io/_next/image?url=https%3A%2F%2Fcdn.sanity...
        
         | lawrjone wrote:
         | They're normalised histograms so the y axis is deliberately
         | adjusted so you can compare the shape of the distribution, as
         | the absolute number of builds in each bucket means little when
         | there are a different count of builds for each platform.
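For what it's worth, that normalisation can be sketched with `numpy`: with `density=True` each histogram integrates to 1, so two samples of very different sizes become directly comparable by shape. The gamma-distributed "build times" below are synthetic, not the article's data:

```python
import numpy as np

# Two samples of very different sizes, mimicking the per-platform
# build counts: many M1 builds, far fewer M3 builds.
rng = np.random.default_rng(0)
m1_builds = rng.gamma(shape=2.0, scale=20.0, size=5000)  # synthetic seconds
m3_builds = rng.gamma(shape=2.0, scale=12.0, size=800)

bins = np.linspace(0, 200, 41)
m1_density, _ = np.histogram(m1_builds, bins=bins, density=True)
m3_density, _ = np.histogram(m3_builds, bins=bins, density=True)

# density=True divides each bin count by (total in-range count * bin
# width), so each histogram's area is 1 regardless of sample size.
```

Feeding `m1_density` and `m3_density` to a bar plot over the same `bins` gives the kind of shape-comparable histograms described above.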
        
       | hk1337 wrote:
       | I wonder why they didn't include Linux since the project they're
       | building is Go? Most CI tools, I believe, are going to be Linux.
       | Sure, you can explicitly select macOS in Github CI but Linux
       | seems like it would be the better generic option?
       | 
        | *EDIT* I guess if you needed a macOS-specific build with Go
        | you would use macOS, but I would have thought you'd use Linux
        | too. Can you build a Go project on Linux and have it run on
        | macOS? I suppose architecture would be an issue: a build on
        | Linux x86 would not run on macOS Apple Silicon, and the
        | reverse is true too - a build for Apple Silicon would not work
        | on Linux x86, maybe not even Linux Arm.
        
         | xp84 wrote:
         | I know nothing about Go, but if it's like other platforms,
         | builds intended for production or staging environments are
         | indeed nearly always for x86_64, but those are done somewhere
          | besides laptops, as part of the CI process. The builds done
          | on the laptops are to run each developer's local instance of
          | their server-side application and its front-end components.
          | That instance is always being updated to whatever is in
          | progress at the time. Then they check that code in, and
          | eventually it gets built for prod on an Intel/Linux system
          | elsewhere.
        
       | SSLy wrote:
       | > Application error: a client-side exception has occurred (see
       | the browser console for more information).
       | 
       | When I open the page.
        
       | rendaw wrote:
       | > People with the M1 laptops are frequently waiting almost 2m for
       | their builds to complete.
       | 
       | I don't see this at all... the peak for all 3 is at right under
       | 20s. The long tail (i.e. infrequently) goes up to 2m, but for all
       | 3. M2 looks slightly better than M1, but it's not clear to me
       | there's an improvement from M2 to M3 at all from this data.
        
       ___________________________________________________________________
       (page generated 2023-12-29 23:00 UTC)