[HN Gopher] The simple joys of scaling up ___________________________________________________________________ The simple joys of scaling up Author : eatonphil Score : 78 points Date : 2023-05-18 15:04 UTC (7 hours ago) (HTM) web link (motherduck.com) (TXT) w3m dump (motherduck.com) | ThePhysicist wrote: | FWIW I really like this new "neo-brutalist" website style with | hard shadows, clear, solid lines and simple typography and | layout. | andrewstuart wrote: | Moore's law freezes in the cloud. | winrid wrote: | The i4i instances also have crazy fast disks to go along with | that 1 TB of RAM. I hope to move all our stuff off i3 instances | this year to i4. | waynesonfire wrote: | it's cloud companies raking in the profits of these hardware | improvements. "Widely-available machines now have 128 cores and a | terabyte of RAM." and I'm still paying five bucks for a couple | of cores. | dangoodmanUT wrote: | I've been following DuckDB for a while now, and even tinkered | with a layer on top called "IceDB" (totally needs a rewrite: | https://blog.danthegoodman.com/introducing-icedb--a-serverle...) | | The issue I see now is that there is no good way to know which | files will match well when reading from remote (decoupled) | storage. | | While it does support hive partitioning (thank god), and S3 list | calls, if you are doing inserts frequently you need | some way to merge these parquet files. | | The MergeTree engine is my favorite thing about ClickHouse, and | why it's still my go-to. I think if there were a serverless way to | merge parquet (which was the aim of IceDB) that would make DuckDB | massively more powerful as a primary OLAP db.
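The "serverless way to merge parquet" the comment wishes for breaks into two steps: planning which small files to compact together, then rewriting each batch as one file. Below is a toy sketch of just the planning half, in the spirit of MergeTree-style size-tiered compaction; the file names, the 128 MB target, and the `(name, size)` listing format are all illustrative, and the actual parquet rewrite is omitted.

```python
def plan_merges(files, target_bytes=128 * 2**20):
    """Greedily batch small parquet files into merge groups of roughly
    target_bytes each, smallest first (size-tiered compaction style)."""
    batches, current, current_size = [], [], 0
    for name, size in sorted(files, key=lambda f: f[1]):
        if size >= target_bytes:
            continue  # already big enough, no need to rewrite it
        current.append(name)
        current_size += size
        if current_size >= target_bytes:
            batches.append(current)
            current, current_size = [], 0
    if len(current) > 1:  # a single leftover file has nothing to merge with
        batches.append(current)
    return batches

# Hypothetical listing, e.g. from an S3 list call: (key, size in bytes)
files = [("a.parquet", 40 * 2**20), ("b.parquet", 50 * 2**20),
         ("c.parquet", 60 * 2**20), ("d.parquet", 200 * 2**20)]
print(plan_merges(files))  # → [['a.parquet', 'b.parquet', 'c.parquet']]
```

The missing half, actually rewriting each batch and atomically retiring the inputs against concurrent readers, is the hard part that IceDB and ClickHouse's MergeTree are really about; the planning above is the easy part.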
| LewisJEllis wrote: | Yea, DuckDB is a slam dunk when you have a relatively static | dataset - object storage is your durable primary SSOT, and | ephemeral VMs running duckdb pointed at the object storage | parquet files are your scalable stateless replicas - but the | story gets trickier in the face of frequent ongoing writes / | inserts. ClickHouse handles that scenario well, but I suspect | the MotherDuck folks have answers for that in mind :) | marsupialtail_2 wrote: | You will always be limited by network throughput. Sure that wire | is getting bigger, but so is your data. | brundolf wrote: | Probably biased given that it's on the DuckDB site, but well-reasoned | and referenced, and my gut agrees with the overall | philosophy | | This feels like the kicker: | | > In the cloud, you don't need to pay extra for a "big iron" | machine because you're already running on one. You just need a | bigger slice. Cloud vendors don't charge proportionally more for | a larger slice, so your cost per unit of compute doesn't change | if you're working on a tiny instance or a giant one. | | It's obvious once you think about it: you aren't choosing between | a bunch of small machines and one big machine, you may very well | be choosing between a bunch of small slices of a big machine and | one big slice of a big machine. The only difference would be in | how your software sees it: as a complex distributed system, or as | a single system (that can e.g. share memory with itself instead of | serializing and deserializing data over network sockets) | LeifCarrotson wrote: | The reason this feels non-obvious is that people like to think | that they're choosing a variable number of small slices of a | big _datacenter_, scaling up and down hour-by-hour or minute- | by-minute to get maximum efficiency. | | Really, though, you're generating enormous overhead while | turning on and off small slices of a 128-core monster with a | terabyte of RAM.
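The "bigger slice" claim quoted above is easy to sanity-check with arithmetic, using the on-demand prices paulddraper quotes later in the thread (figures copied from that comment, not independently verified):

```python
# On-demand prices quoted in the thread: (USD/hr, vCPUs, GB of RAM)
instances = {
    "m5.large":        (0.096,   2,    8),
    "m5.24xlarge":     (4.608,  96,  384),
    "x2iedn.32xlarge": (26.676, 128, 4096),
}

base_price, base_vcpu, base_ram = instances["m5.large"]

# Within the m5 family the slice price is flat: 48x the vCPUs and RAM
# for 48x the hourly rate, i.e. cost per unit of compute is unchanged.
price, vcpu, ram = instances["m5.24xlarge"]
assert round(price / base_price) == 48
assert vcpu // base_vcpu == 48 and ram // base_ram == 48

# The memory-heavy x2iedn.32xlarge: 64x the vCPUs and 512x the RAM of
# an m5.large for roughly 277.9x the cost.
price, vcpu, ram = instances["x2iedn.32xlarge"]
cost_ratio = price / base_price  # ~277.9
assert vcpu // base_vcpu == 64 and ram // base_ram == 512
```

So within a family the per-vCPU rate really is flat, and even the largest memory-optimized box charges a premium for RAM, not for being one machine.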
| JohnMakin wrote: | That's not the only difference - there are many more facets of | reliability guarantees than the brief hand-waving the author | does about them in the article. | paulddraper wrote: | This is absolutely correct (and gets more correct every year). | | An m5.large (2 vCPU, 8GB RAM) is $0.096/hr. m5.24xlarge (96 vCPU, | 384GB RAM) is $4.608/hr. | | Exactly 1:48 scale up, in capacity and cost. | | The largest AWS instance is x2iedn.32xlarge (128 vCPU, 4096GB | RAM) for $26.676/hr. Compared to m5.large, a 64x increase in | compute and 512x increase in memory for 277x the cost. | | Long story short.....you can scale up linearly for a long time in | the cloud. | samsquire wrote: | This is an interesting post, thank you. | | In my toy barebones SQL database, I distribute rows across | different replicas based on a consistent hash. I also have a | "create join" statement that keeps join keys colocated. | | Then when a join query is issued, I can always join because | the join keys are available locally, and the join query can be executed on | each replica and returned to the client to be aggregated. | | I want building distributed high-throughput systems to be easier | and less error-prone. I wonder if a mixture of scale up and scale | out could be a useful architecture. | | You want as few network round trips and cross-thread handoffs | (synchronization cost) as you can get. ___________________________________________________________________ (page generated 2023-05-18 23:01 UTC)
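The colocated-join placement samsquire describes can be sketched in a few lines. Plain hash-modulo placement stands in for a real consistent-hash ring here, and the table names, key names, and replica count are made up for illustration:

```python
import hashlib

N_REPLICAS = 4

def shard_for(join_key: str, n: int = N_REPLICAS) -> int:
    """Place a row by hashing only its join key, so every row sharing
    that key (from any table) lands on the same replica."""
    digest = hashlib.sha256(join_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

# Two hypothetical tables declared joined on user_id via "create join"
users  = [("u1", "Alice"), ("u2", "Bob")]
orders = [("u1", "order-a"), ("u1", "order-b"), ("u2", "order-c")]

shards = [{"users": [], "orders": []} for _ in range(N_REPLICAS)]
for key, row in users:
    shards[shard_for(key)]["users"].append((key, row))
for key, row in orders:
    shards[shard_for(key)]["orders"].append((key, row))

# Each replica can now answer its share of the join with no network
# hops; the client just concatenates the per-replica results.
joined = [
    (uk, name, order)
    for shard in shards
    for uk, name in shard["users"]
    for ok, order in shard["orders"]
    if ok == uk
]
assert len(joined) == 3  # every matching pair was joinable locally
```

A genuine consistent-hash ring (replica IDs hashed onto a circle) would additionally keep most keys in place when replicas are added, which plain modulo does not; the colocation property shown here is the same either way.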