[HN Gopher] Production Twitter on one machine? 100Gbps NICs and ... ___________________________________________________________________ Production Twitter on one machine? 100Gbps NICs and NVMe are fast Author : trishume Score : 245 points Date : 2023-01-07 18:46 UTC (4 hours ago) (HTM) web link (thume.ca) (TXT) w3m dump (thume.ca) | agilob wrote: | The title reminded me about this | https://www.phoronix.com/news/Netflix-NUMA-FreeBSD-Optimized | (2019) and this 2 years later: | https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-... | (2021) | syoc wrote: | Latest version: http://nabstreamingsummit.com/wp- | content/uploads/2022/05/202... (2022) | PragmaticPulp wrote: | Very cool exercise. I enjoyed reading it. | | I see a lot of comments here assuming that this proves something | about Twitter being inefficient. Before you jump to conclusions, | take a look at the author's code: | https://github.com/trishume/twitterperf | | Notably absent are things like _serving HTTP_ , not to even | mention HTTPS. This was a fun exercise in algorithms, I/O, and | benchmarking. It wasn't actually imitating anything that | resembles actual Twitter or even a usable website. | trishume wrote: | Which I think I'm perfectly clear about in the blog post. The | post is mostly about napkin math systems analysis, which does | cover HTTP and HTTPS. | | I'm now somewhat confident I could implement this if I tried, | but it would take many years, the prototype and math is to | check whether there's anything that would stop me if I tried | and be a fun blog post about what systems are capable of. | | I've worked on a team building a system to handle millions of | messages per second per machine, and spending weeks doing math | and building performance prototypes like this is exactly what | we did before we built it for real. | PerilousD wrote: | [flagged] | jeffbee wrote: | I like this kind of exercise. One thing I am not seeing is | analytics, logs and so forth that as I understand it are | significant portions of Twitter's production cost story. | tluyben2 wrote: | Anyone have a complete list of functional blocks that form | Twitter? Beyond the obvious and what we see? | Marazan wrote: | You need the blocks for the obvious for what we see because | it is not necessarily obvious to everyone. | | Over the last couple of months I've seen comments that | summarise Twitter as a read-only service that doesn't have | any real time posting requirements and similarly other | comments that treat it as a write-only service with no real | time read / fast search requirements. | | Without _all_ the blocks even the simple surface level | Twitter will have complexity people miss. | lazyasciiart wrote: | If it's this cheap to run you don't need analytics because you | don't need to monetize it, and if it's this simple you don't | need logs because it'll all work correctly the first time! | kevingadd wrote: | "You don't need to monetize it" who's going to fund your | Twitter-as-a-charity? What happens when the free money goes | away? Businesses have to pay the bills eventually one way or | another, you need to plan for that in advance | throwmeup123 wrote: | The title is highly misleading for some theoretical | "exploration". | dang wrote: | Ok, we've put a question mark up there to make it more | explorationy. | trishume wrote: | As the author, this sounds good to me! I'll probably even | change the actual title to match. 
I originally was going to | make it a question mark and the only reason I didn't is | https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline... | when I think the answer is probably "could probably be | somewhat done" rather than "no". | dang wrote: | Well this may be the first time that's ever happened :) | | Betteridge antiexamples are always welcome. I once tried to | joke that Mr. Betteridge had "retired" and promptly got | corrected about his employment status | (https://news.ycombinator.com/item?id=10393754). | jiggawatts wrote: | Something I've found a lot of modern IT architects seem to ignore is | "write amplification" or the equivalent effect for reads. | | If you have a 1 KB piece of data that you need to send to a | customer, ideally that should require _less_ than 1 KB of actual | NIC traffic thanks to HTTP compression. | | If processing that 1 KB takes more than 1 KB of total NIC traffic | within and out of your data centre, then you have some level of | _amplification_. | | Now, for writes, this is often unavoidable because redundancy is | pretty much mandatory for availability. Whenever there's a | transaction, an amplification factor of 2-3x is assumed for | replication, mirroring, or whatever. | | For reads, good indexing and data structures within a few large | boxes (like in the article) can reduce the amplification to just | 2-3x as well. The request will likely need to go through a load | balancer of some sort, which amplifies it, but that's it. | | So if you need to process, say, 10 Gbps of egress traffic, you | need a total of something like 30 Gbps at least, but 50 Gbps for | availability and handling of peaks. | | What happens in places like Twitter is that they go _crazy_ with | the microservices. Every service, every load balancer, every | firewall, proxy, envoy, NAT, firewall, and gateway adds to the | multiplication factor. Typical Kubernetes or similar setups will | have a minimum NIC data amplification of 10x _on top of_ the 2-3x | required for replication. | | Now _multiply_ that by the crazy inefficient JSON-based | protocols, the GraphQL, and the other insanity layered on to | "modern" development practices. | | This is how you end up serving 10 Gbps of egress traffic with | _terabits_ of internal communications. This is how Twitter | apparently "needs" 24 million vCPUs to host _text chat_. | | Oh, sorry... text chat with the occasional postage-stamp-sized, | potato quality static JPG image. | thriftwy wrote: | I remember Stack Overflow running on a single Windows Server box | and mocking fellow LAMP developers for their propensity towards | having dozens of VMs to the same effect. | | That was some time ago, though. | varunkmohan wrote: | Good analysis. Obviously, this doesn't handle cases like | redundancy and doesn't handle some of the other critical workloads | the company has. However, it does show how much real compute | bloat these companies actually have - | https://twitter.com/petrillic/status/1593686223717269504 where | they use 24 million vcpus and spend 300 million a month on cloud. | judge2020 wrote: | On the other hand, Twitter does (or did) handle over 450 | million monthly active users (based on stats websites), with a | target for 315 monetizable daily active users (based on their | earnings calls pre-privatization). Handling that amount of | concurrency and beaming millions of tweets a day to home feeds | and notifications is going to be logistically hard. | WJW wrote: | Is that 315 million monetizable DAUs?
That sounds like a lot | if the total is only 450 MAU. OTOH, 315k DAU seems like it | wouldn't be enough to pay the bills. | judge2020 wrote: | There were some quarters with profit, some without; the | past few years were mostly without IIRC. | | They were targeting 315 mDAUs for Q4 2023, but in the final | earnings it was only 238 mDAUs. Actual MAU stats weren't | public iirc but some random stats sites seemed to say 450m | global MAUs, which likely includes people with no ad | preferences or who only view NSFW content (which can't be | shown next to (most?) ads). | | https://www.forbes.com/sites/johnkoetsier/2022/11/14/twitte | r... | varunkmohan wrote: | Posted this on a comment above but systems like Whatsapp | likely sent an insane amount of data as well but used only 16 | servers over 1.5 billion users at time of acquisition. Modern | NICs can handle millions of requests a second - I still feel | there is a lot of excess here. | veec_cas_tant wrote: | Feels like the comparison is irrelevant. I'm guessing | WhatsApp would have infrastructure challenges if all of | their chats were group messages including the entirety of | their user base, search, moderation, ranking, ads, etc. | Isn't WhatsApp more comparable to only DMs? | PragmaticPulp wrote: | > However, it does show how much real compute bloat these | companies actually have | | No, it doesn't. It's a fun exercise in approaching Twitter as | an academic exercise. It ignores all of the real-world | functionality that makes it a business rather than a toy. | | A lot of complicated businesses are easy to prototype out if | you discard all requirements other than the core feature. In | the real world, more engineering work often goes to ancillary | features that you never see as an end user. | varunkmohan wrote: | Genuinely asking, why do you think Twitter needs 24 million | vcpus to run? | | This is not apples to apples but Whatsapp is a product that | entirely ran on 16 servers at the time of acquisition (1.5 | billion users). It really begs the question why Twitter uses | so much compute if there are companies that have operated | significantly more efficiently. Twitter was unprofitable | during acquisition and spent around half their revenue on | compute, maybe some of these features were not really | necessary (but were just burning money)? | ignoramous wrote: | > _This is not apples to apples but Whatsapp is a product | that entirely ran on 16 servers at the time of acquisition | (1.5 billion users)._ | | - 450m DAUs at the time of facebook acquisition [0] | | - Twitter is not just DMs or Group Chat. | | > _It really begs the question why Twitter uses so much | compute if there are companies that have operated | significantly more efficiently._ | | A fair comparision might have been Instagram: While Systrom | did run a relatively lean eng org, they never had to | monetize and got acquired before they got any bigger than | ~50m? | | [0] https://www.sequoiacap.com/article/four-numbers-that- | explain... | no_way wrote: | Chat apps are mostly one on one interaction, it is much | harder run an open platform where every user can | potentially interact with every other user, not even | talking about search and how complex it gets. If Twitter is | bloated or not is a valid discussion, but comparison it to | WhatsApp is not. | xwdv wrote: | Whatsapp used Erlang. | Snoozus wrote: | Because the actual product is not showing people tweets but | to optimize who to show which ads based on their previous | interactions with the site. 
This is many orders of | magnitude harder. | danpalmer wrote: | Being E2E encrypted, WhatsApp can't do much with the | content, so it is much closer to a naive bitshuffler than | Twitter. | | Twitter, while still not profitable (maybe it was in some | recent quarters?) was much closer to it, having all the | components necessary to form a reasonable ad business. For | ads, analytics is critical, plus all the ad serving, plus | it's a totally different scale of compute being many to | many rather than one to ~one. | peterhunt wrote: | The real answer is twofold: | | 1. Lots of batch jobs. Sometimes it's unclear how much | value they produce / whether they're still used. | | 2. Twitter probably made a mistake early on in taking a | fanout-on-write approach to populate feeds. This is super | expensive and necessitates a lot of additional | infrastructure. There is a good video about it here: | https://www.youtube.com/watch?v=WEgCjwyXvwc | mapme wrote: | They did not spend half their revenue on compute. It's more | like 20-25% for running data centers/staff for DCs. Check | their earnings report. | | WhatsApp is not an applicable comparison because messages | and videos are stored on the client device. Better to look | at Pinterest and Snap, which spend a lot on infra as well. | | The issue is storage, ads, and ML to name a few. For | example, from 2015: | | " Our Hadoop filesystems host over 300PB of data on tens of | thousands of servers. We scale HDFS by federating multiple | namespaces." | | You can also see their hardware usage broken down by | service as put in their blog. | | https://blog.twitter.com/engineering/en_us/topics/infrastruc... | | https://blog.twitter.com/engineering/en_us/a/2015/hadoop-fil.... | d23 wrote: | Presumably there's an entire data engineering / event | processing pipeline that's being used to track user | interactions at a fine grained level. These events are | going to be aggregated and munged by various teams for | things like business analytics, product / experiment | feature analysis, ad analysis, as well as machine learning | model feature development (just to name a few massive ones | off the top of my head). Each of these will vary in their | requirements of things like timeliness, how much compute is | necessary to do their work, what systems / frameworks are | used to do the aggregations or feature processing, and | tolerance to failure. | | > This is not apples to apples but Whatsapp | | And yeah, whatsapp isn't even close to an apt comparison. | It's a completely different business model with vastly | different engineering requirements. | | Is Twitter bloated? Perhaps, but it's probably driven by | business reasons, not (just) because engineers just wanted | to make a bunch of toys and promo projects (though this | obviously always plays some role). | [deleted] | ricardobeat wrote: | And for the most part, this herculean effort is wasted. | Most people just want to see the latest tweets from people | they follow. Everything else is fluff to manipulate | engagement metrics, pad resumes and attempt to turn | twitter into something its users never wanted. | veec_cas_tant wrote: | Just guessing, but a lot of the resources are probably | devoted to making money for the business, not padding | resumes. Others have pointed it out, but showing tweets | doesn't generate revenue without additional | infrastructure.
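A toy sketch of the trade-off peterhunt mentions above: fan-out on write pays O(followers) at post time so that reading a home timeline is a single lookup, while fan-out on read keeps posting cheap and merges followees' tweets at read time. The Rust below is purely illustrative (hypothetical names, integer ids standing in for tweets), not code from the article or from Twitter:

    use std::collections::HashMap;

    // Fan-out on write: pay O(#followers) when a tweet is posted,
    // so reading a home timeline is a single map lookup.
    fn post_fanout_on_write(
        timelines: &mut HashMap<u64, Vec<u64>>, // user id -> precomputed timeline (tweet ids)
        followers: &HashMap<u64, Vec<u64>>,     // author id -> follower ids
        author: u64,
        tweet_id: u64,
    ) {
        for &f in followers.get(&author).map(|v| v.as_slice()).unwrap_or(&[]) {
            timelines.entry(f).or_default().push(tweet_id);
        }
    }

    // Fan-out on read: posting is O(1); a read merges the authored
    // lists of everyone the user follows.
    fn read_fanout_on_read(
        tweets_by_author: &HashMap<u64, Vec<u64>>, // author id -> tweet ids, newest last
        following: &HashMap<u64, Vec<u64>>,        // user id -> ids they follow
        user: u64,
        limit: usize,
    ) -> Vec<u64> {
        let mut merged: Vec<u64> = following
            .get(&user)
            .into_iter()
            .flatten()
            .filter_map(|a| tweets_by_author.get(a))
            .flatten()
            .copied()
            .collect();
        merged.sort_unstable_by(|a, b| b.cmp(a)); // assume a larger id is newer
        merged.truncate(limit);
        merged
    }

    fn main() {
        let mut timelines = HashMap::new();
        let followers = HashMap::from([(1u64, vec![2u64, 3])]);
        let following = HashMap::from([(2u64, vec![1u64])]);
        let tweets_by_author = HashMap::from([(1u64, vec![100u64, 101])]);

        post_fanout_on_write(&mut timelines, &followers, 1, 101);
        println!("push timeline for user 2: {:?}", timelines.get(&2));
        println!("pull timeline for user 2: {:?}",
                 read_fanout_on_read(&tweets_by_author, &following, 2, 10));
    }

Accounts with millions of followers are what make the write-time approach expensive: a single post becomes millions of timeline appends, which is part of why it can demand so much extra infrastructure.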
| sofixa wrote: | Most people probably follow more people than they're | capable of reading all the latest tweets of, so some sort | of ranking/prioritisation makes total sense. And Twitter | is ad funded, so they need to also show relevant ads | where it makes sense/money. | threadweaver34 wrote: | Whatsapp doesn't do ranking. | BeefWellington wrote: | I'm going to preface this criticism by saying that I think | exercises like this are fun in an architectural/prototyping code- | golf kinda way. | | However, I think the author critically under-guesses the sizes of | things (even just for storage) by a reasonably substantial | amount. e.g.: Quote tweets do not go against the size limit of | the tweet field at Twitter. Likely they are embedding a tweet | reference in some manner or other in place of the text of the | quoted tweet itself but regardless a tweet takes up more than 280 | unicode characters. | | Also, nowhere in the article are hashtags mentioned. For a system | like this to work you need some indexing of hashtags so you | aren't doing a full scan of the entire tweet text of every tweet | anytime someone decides to search for #YOLO. The system as | proposed is missing a highly critical feature of the platform it | purports to emulate. I have no insider knowledge but I suspect | that index is maybe the second largest thing on disk on the | entire platform, apart from the tweets themselves. | trishume wrote: | Quote tweets I'd do as a reference and they'd basically have | the cost of loading 2 tweets instead of one, so increasing the | delivery rate by the fraction of tweets that are quote tweets. | | Hashtags are a search feature and basically need the same | posting lists as for search, but if you only support hashtags | the posting lists are smaller. I already have an estimate | saying probably search wouldn't fit. But I think hashtag-only | search might fit, mainly because my impression is people doing | hashtag searches are a small fraction of traffic nowadays so | the main cost is disk, not sure though. | | I did run the post by 5 ex-Twitter engineers and none of them | said any of my estimates were super wrong, mainly just brought | up additional features and things I didn't discuss (which I | edited into the post before publishing). Still possible that | they just didn't divulge or didn't know some number they knew | that I estimated very wrong. | mr90210 wrote: | > I did run the post by 5 ex-Twitter engineers and none of | them said any of my estimates were super wrong | | Absence of evidence is not evidence of absence. That being | said, given that you have access to ex-Twitter engineers you | may want to fact check with them for accuracy purposes, or | just add a remark about this topic under the assumptions | section. | | It's ok to assume as long as we document those. | | Cheers | Aeolun wrote: | I'm not sure why you are asking him to do the thing you've | literally quoted him doing? | literallyroy wrote: | I believed he was saying the author should run his idea | of indexes on disk taking up a lot of space by the | engineers. | 867-5309 wrote: | >Absence of evidence is not evidence of absence. | | >possible that they just didn't divulge | lelandfe wrote: | If an inspector reviews your house and finds no issues, | that is indeed evidence of absence. | drewg123 wrote: | How much bandwidth does Twitter use for images and videos? Less | than 1.4Tb/s globally? If so, we could probably fit that onto a | second machine. 
We can currently serve over 700Gb/s from a dual- | socket Milan based server[1]. I'm still waiting for hardware, but | assuming there are no new bottlenecks, that should directly scale | up to 1.4Tb/s with Genoa and ConnectX-7, given the IO pathways | are all at least twice the bandwidth of the previous generation. | | There are storage size issues (like how big is their long tail; | quite large I'd imagine), but its a fun thing to think about. | | [1] https://people.freebsd.org/~gallatin/talks/euro2022.pdf | cortesoft wrote: | It is way more than 1.4TBs a second globally. | xyzzy123 wrote: | I wonder how much is api traffic and how much is assets & | images. | koolba wrote: | I wonder how much of that is crypto spam bots replying to | each other. | JosephRedfern wrote: | I suppose that in practice you'd need to consider burst | bandwidth and not just 95/99 percentiles. | britneybitch wrote: | > colo cost + total server cost/(3 year) => $18,471/year | | Meanwhile the company I just left was spending more than this for | dozens of kubernetes clusters on AWS before signing a single | customer. Sometimes I wonder what I'm still doing in this | industry. | ummonk wrote: | Was it more than the salary of even a single software engineer? | tluyben2 wrote: | Very often. I work with companies spending 10x as much on bad | (bizarrely complex; indeed kubernetes, lambda, gateway, rds | etc) setup and bad code on aws. Almost no traffic (b2b). Makes | no sense at all. | threeseed wrote: | If they were a startup like you suggest then it's possible they | were running on AWS credits. | | You can get up to $100k and it's a big reason many startups go | in that direction. | | Also $20k is nothing when you factor in developer time etc. | Existenceblinks wrote: | Techies in tech industry are basically eating the rich .. A lot | of buzzword to suck investment money in. | paulryanrogers wrote: | Or rather cloud providers are eating the rich. Techies are | carrying the plates. | [deleted] | traceroute66 wrote: | > spending more than this for dozens of kubernetes clusters on | AWS before signing a single customer | | Yup. | | Cloud is 21st century Nickel & Diming. | | Sure _it sounds_ cheap, everything is priced in small sounding | cents per unit. | | But then it _very quickly_ becomes a compounding vicious circle | ... a dozen different cloud tools, each charged at cents per | unit, those units often being measured in increments of | hours....next thing you know is your cloud bill has as many | zeros on the end of it as the number of cloud services you are | using. ;-) | | And that's before we start talking about the data egress costs. | | With colo you can start off with two 1/4 rack spaces at two | different sites for resilience. You can get quite a lot of bang | for your buck in a 1/4 rack with today's kit. | richwater wrote: | > You can get quite a lot of bang for your buck in a 1/4 rack | with today's kit. | | Until very recently, while money was still very cheap, the | time overhead it would take to manage this just was not worth | the cost savings. | | Even with the market falling out from under VC, I think it | still is a good tradeoff for many shops. | toast0 wrote: | > Until very recently, while money was still very cheap, | the time overhead it would take to manage this just was not | worth the cost savings. | | You can also rent a whole server. There's not much | difference in time in managing a VM in a cloud or a whole | server you rent from someone. 
Depending on the vendor, | maybe some more setup time, since low end hosts don't | usually have great setup workflows, so maybe you need to | fiddle with the ipmi console once or twice to get it | started, but if you go with a higher tier provider, you can | fully automate everything if that floats your boat. It's | just bare metal rather than a VM, and typically much lower | cost for sustained usage (if you're really scaling up | signfigantly and down throughout the day, cloud costs can | work out less, although some vendors offer bare metal by | the hour, too) | Tepix wrote: | The new EPYC servers can be filled with 6TB of RAM and 96 cores | per socket. Fun times. | wonnage wrote: | This doesn't seem to support fetching a specific tweet by id? | jasonhansel wrote: | If you really wanted to run Twitter on one machine at any cost, | wouldn't an IBM mainframe be much more practical? | | You can even run Linux on them now. The specs he cites would | actually be fairly small for a mainframe, which can reach up to | 40TB of memory. | | I'm not saying this is a _good_ idea, but it seems better than | what the OP proposes. | wmf wrote: | No. A Genoa server is probably faster than a z16 and a | Superdome Flex is definitely faster. | bob1029 wrote: | If it's good enough for the payment card industry, I don't know | why it can't work for tweets. The amount of data per | transaction is very similar. | trishume wrote: | My friend mentioned this just before I published and I think | that probably is the fastest largest thing you can get which | would in some sense count as one machine. I haven't looked into | it, but I wouldn't be surprised if they could get around the | trickiest constraint, which is how many hard drives you can | plug in to a non-mainframe machine for historical image | storage. Definitely more expensive than just networking a few | standard machines though. | | I also bet that mainframes have software solutions to a lot of | the multi-tenancy and fault tolerance challenges with running | systems on one machine that I mention. | jasonhansel wrote: | Incidentally, a lot of people have argued that the massive | datacenters used by e.g. AWS are effectively single large | ("warehouse-scale") computers. In a way, it seems that the | mainframe has been reinvented. | dekhn wrote: | I wouldn't really agree with this since those machines | don't share address spaces or directly attached busses. | Better to say it's a warehouse-scale "service" provided by | many machines which are aggregated in various ways. | sterlind wrote: | I wonder though.. _could_ you emulate a 20k-core VM with | 100 terabytes of RAM on a DC? | | Ethernet is fast, you might be able to get in range of | DRAM access with an RDMA setup. cache coherency would | require some kind of crazy locking, but maybe you could | do it with FPGAs attached to the RDMA controllers that | implement something like Raft? | | it'd be kind of pointless and crash the second any | machine in the cluster dies, but kind of a cool idea. | | it'd be fun to see what Task Manager would make of it if | you could get it to last long enough to boot Windows. | sterlind wrote: | to me the line between machine and cluster is mostly about | real-time and fate-sharing. multiple cores on a single | machine can expect memory accesses to succeed, caches to be | coherent, interrupts to trigger within a deadline, clocks | not to skew, cores in a CPU not to drop out, etc. | | in a cluster, communication isn't real-time. 
packets drop, | fetches fail, clocks skew, machines reboot. | | IPC is a gray area. the remote process might die, its | threads might be preempted, etc. RTOSes make IPC work more | like a single machine, while regular OSes make IPC more | like a network call. | | so to me, the datacenter-as-mainframe idea falls apart | because you need massive amounts of software infrastructure | to treat a cluster like a mainframe. you have to use Paxos | or Raft for serializing operations, you have to shard data | and handle failures, etc. etc. | | but it's definitely getting closer, thanks to lots of | distributed systems engineering. | sayrer wrote: | It's a neat thought exercise, but wrong for so many reasons | (there are probably like 100s). Some jump out: spam/abuse | detection, ad relevance, open graph web previews, promoted | tweets that don't appear in author timelines, blocks/mutes, | etc. This program is what people think Twitter is, but | there's a lot more to it. | | I think every big internet service uses user-space networking | where required, so that part isn't new. | trishume wrote: | I think I'm pretty careful to say that this is a simplified | version of Twitter. Of the features you list: | | - spam detection: I agree this is a reasonably core feature | and a good point. I think you could fit something here but | you'd have to architect your entire spam detection approach | around being able to fit, which is a pretty tricky | constraint and probably would make it perform worse than a | less constrained solution. Similar to ML timelines. | | - ad relevance: Not a core feature if your costs are low | enough. But see the ML estimates for how much throughput | A100s have at dot producting ML embeddings. | | - web previews: I'd do this by making it the client's | responsibility. You'd lose trustworthiness though so users | with hacked clients could make troll web previews, they can | already do that for a site they control, but not a general | site. | | - blocks/mutes: Not a concern for the main timeline other | than when using ML, when looking at replies will need to | fetch blocks/mutes and filter. Whether this costs too much | depends on how frequently people look at replies. | | I'm fully aware that real Twitter has bajillions of | features that I don't investigate, and you couldn't fit all | of them on one machine. Many of them make up such a small | fraction of load that you could still fit them. Others do | indeed pose challenges, but ones similar to features I'd | already discussed. | sayrer wrote: | "web previews: I'd do this by making it the client's | responsibility." | | Actually a good example of how difficult the problem is. | A very common attack is to switch a bit.ly link or | something like that to a malicious destination. You would | also DoS the hosts... as the Mastodon folks are | discovering (https://www.jwz.org/blog/2022/11/mastodon- | stampede/) | | For blocks/mutes, you have to account for retweets and | quotes, it's just not a fun problem. | | Shipping the product is much more difficult that what's | in your post. It's not realistic at all, but it is fun to | think about. | sayrer wrote: | Here are some pointers: | | "Our approach to blocking links" | https://help.twitter.com/en/safety-and-security/phishing- | spa... | | "The Infrastructure Behind Twitter: Scale" https://blog.t | witter.com/engineering/en_us/topics/infrastruc... 
| | "Mux" https://twitter.github.io/finagle/guide/Protocols.h | tml#mux | | I do agree that some of this could be done better a | decade later (like, using Rust for some things instead of | Scala), but it was all considered. A single machine is a | fun thing to think about, but not close to realistic. CPU | time was not usually the concern in designing these | systems. | mschuster91 wrote: | > I haven't looked into it, but I wouldn't be surprised if | they could get around the trickiest constraint, which is how | many hard drives you can plug in to a non-mainframe machine | for historical image storage. | | Netapp is at something > 300TB storage per node IIRC, but in | any case it would make more sense to use some cloud service. | AWS EFS and S3 don't have any (practically reachable) limit | in size. | toast0 wrote: | > I wouldn't be surprised if they could get around the | trickiest constraint, which is how many hard drives you can | plug in to a non-mainframe machine for historical image | storage. | | Some commodity machines use external SAS to connect to more | disk boxes. IMHO, there's not a real reason to keep images | and tweets on the same server if you're going to need an | external disk box anyway. Rather than getting a 4u server | with a lot of disks and a 4u additional disk box, you may as | well get 4u servers with a lot of disks each, use one for | tweets and the other for images. Anyway, images are fairly | easy to scale horizontally, there's not much simplicity | gained by having them all in one host, like there is for | tweets. | trishume wrote: | Yah like I say in the post, the exactly one machine thing | is just for fun and as an illustration of how far vertical | scaling can go, practically I'd definitely scale storage | with many sharded smaller storage servers. | jiggawatts wrote: | > which is how many hard drives you can plug in to a non- | mainframe machine for historical image storage. | | You would be _surprised_. First off, SSDs are denser than | hard drives now if you 're willing to spend $$$. | | Second, "plug in" doesn't necessarily mean "in the chassis". | You can expand storage with external disk arrays in all sorts | of ways. Everything from external PCI-e cages to SAS disk | arrays, fibre channel, NVMe-over-Ethernet, etc... | | It's fairly easy to get several petabytes of fast storage | directly managed by one box. The only limit is the total | usable PCIe bandwidth of the CPUs, which for a current-gen | EPYC 9004 series processors in a dual-socket configuration is | something crazy like 512 GB/s. This vastly exceeds typical | NIC speeds. You'd have to balance available bandwidth between | _multiple_ 400 Gbps NICs and disks to be able to saturate the | system. | | People really overestimate the data volume put out by a | service like Twitter while simultaneously underestimating the | bandwidth capability of a single server. | justapassenger wrote: | Saying this is production Twitter is like saying that rsync is a | Dropbox. | pengaru wrote: | This post reminds me of an experience I had in ~2005 while @ | Hostway. | | Unsolicited story time: | | Prior to my joining the company Hostway had transitioned from | handling all email in a dispersed fashion across the shared | hosting Linux boxes with sendmail et al, to a centralized | "cluster" having disparate horizontally-scaled slices of edge- | SMTP servers, delivery servers, POP3 servers, IMAP servers, and | spam scanners. That seemed to be their scaling plan anyways. 
| | In the middle of this cluster sat a refrigerator sized EMC | fileserver for storing the Maildirs. I forget the exact model, | but it was quite expensive and exotic for the time, especially | for an otherwise run of the mill commodity-PC based hosting | company. It was a big shiny expensive black box, and everyone | involved seemed to assume it would Just Work and they could keep | adding more edge-SMTP/POP/IMAP or delivery servers if those | respective services became resource constrained. | | At some point a pile of additional customers were migrated into | this cluster, through an acquisition if memory serves, and things | started getting slow/unstable. So they go add more machines to | the cluster, and the situation just gets worse. | | Eventually it got to where every Monday was known as Monday | Morning Mail Madness, because all weekend nobody would read their | mail. Then come Monday, there's this big accumulation of new | unread messages that now needs to be downloaded and either | archived or deleted. | | The more servers they added the more NFS clients they added, and | this just increased the ops/sec experienced at the EMC. Instead | of improving things they were basically DDoSing their overpriced | NFS server by trying to shove more iops down its throat at once. | | Furthermore, by executing delivery and POP3+IMAP services on | separate machines, they were preventing any sharing of buffer | caches across these embarrassingly cache-friendly when colocated | services. When the delivery servers wrote emails through to the | EMC, the emails were also hanging around locally in RAM, and | these machines had several gigabytes of RAM - only to _never_ be | read from. Then when customers would check their mail, the POP3 | /IMAP servers _always_ needed to hit the EMC to access new | messages, data that was _probably_ sitting uselessly in a | delivery server 's RAM somewhere. | | None of this was under my team's purview at the time, but when | the castle is burning down every Monday, it becomes an all hands | on deck situation. | | When I ran the rough numbers of what was actually being performed | in terms of the amount of real data being delivered and | retrieved, it was a trivial amount for a moderately beefy PC to | handle at the time. | | So it seemed like the obvious thing to do was simply colocate the | primary services accessing the EMC so they could actually profit | from the buffer cache, and shut off most of the cluster. At the | time this was POP3 and delivery (smtpd), luckily IMAP hadn't | taken off yet. | | The main barrier to doing this all with one machine was the | amount of RAM required, because all the services were built upon | classical UNIX style multi-process implementations (courier-pop | and courier-smtp IIRC). So in essence the main reason most of | this cluster existed was just to have enough RAM for running | multiprocess POP and SMTP sessions. | | What followed was a kamikaze-style developed-in-production | conversion of courier-pop and courier-smtp to use pthreads | instead of processes by yours truly. After a week or so of | sleepless nights we had all the cluster's POP3 and delivery | running on a single box with a hot spare. Within a month or so | IIRC we had powered down most of the cluster, leaving just spam | scanning and edge-SMTP stuff for horizontal scaling, since it | didn't touch the EMC. Eventually even the EMC was powered down, | in favor of drbd+nfs on more commodity linux boxes w/coraid. 
| | According to my old notes it was a Dell 2850 w/8GB we ended up | with for the POP3+delivery server and identical hot spare, | replacing _racks_ of comparable machines just with less RAM. | >300,000 email accounts. | ricardobeat wrote: | > super high performance tiering RAM+NVMe buffer managers which | can access the RAM-cached pages almost as fast as a normal memory | access are mostly only detailed and benchmarked in academic | papers | | Isn't this exactly what modern key value stores like RocksDB, | LMDB etc are built for? | kureikain wrote: | Not to the extreme of fitting everything into one machine, but I | have explored the idea of separating the stateless workload onto | its own machine. | | However, the stateless workload can still operate in a read-only | manner if the stateful component fails. | | I run an email forwarding service[1], and one of the challenges is | how I can ensure the email forwarding still works even if my primary | database fails. | | And I came up with a design where the app boots up, loads the entire | routing data from my postgres into an in-memory data structure, and | persists it to local storage. So if the postgres database fails, as | long as I have an instance of the app (which I can run as many of as | I want), the system continues to work for existing customers. | | The app uses listen/notify to load new data from postgres into its | memory. | | Not exactly the same concept as the article, but the idea is that | we try to design the system in a way where it can operate fully | on a single machine. Another cool thing is that it's easier to | test this: instead of loading data from Postgres, it can load | from config files, so essentially the core biz logic is isolated | onto a single machine. | | --- | | https://mailwip.com | z3t4 wrote: | A Twitter clone could probably run in a teenager's closet, but not | after it has been iterated by 10000 monkeys. | morphle wrote: | Why not a single FPGA with 100Gbps ethernet or pcie with NVM | attached? Around $5K for the hardware and $5K for the traffic per | month. The software would be a bit trickier to write, but you now | get 100x performance for the same price. | tluyben2 wrote: | That would be quite a nice project for fun. | mpoteat wrote: | Let's spend multiple millions of dollars a year on a team of highly | specialized FPGA engineers writing assembly and HDL so that we | can save 5k a month. Feature velocity will be 100x slower as | well, but at least our application is efficient. | | I think that this may make sense for some applications, but I | also think that if you can utilize software abstractions to | improve developer efficiency, it reduces risk in the long run. | tylerhou wrote: | A bit trickier is a huge understatement. | sethev wrote: | John Carmack tweeted something that made me noodle on this too: | | >It is amusing to consider how much of the world you could serve | something like Twitter to from a single beefy server if it really | was just shuffling tweet sized buffers to network offload cards. | Smart clients instead of web pages could make a very large | difference. [1] | | Very interesting to see the idea worked out in more detail. | | [1] https://twitter.com/id_aa_carmack/status/1350672098029694998 | varjag wrote: | Isn't that what an OPA sorta kinda does. | threeseed wrote: | > just shuffling tweet sized buffers to network offload cards | | Except that's not what it is doing at all. | | It assembles all the Tweets internally, applies an ML model to | produce a finalised response to the user.
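A rough sketch of the pattern kureikain describes: hold the routing table in memory, mirror it to a local snapshot file, and keep answering lookups from that snapshot if the primary database is unreachable. Everything here is hypothetical — the Postgres LISTEN/NOTIFY trigger is replaced by a stubbed fetch_from_primary — so read it as an illustration of the idea, not the service's actual code:

    use std::collections::HashMap;
    use std::fs;
    use std::io;

    // Routing table: alias -> destination. Held in memory and mirrored to a
    // local snapshot file, so the process can boot and keep serving lookups
    // even when the primary database is unreachable.
    struct RouteCache {
        routes: HashMap<String, String>,
        snapshot_path: String,
    }

    impl RouteCache {
        // Boot path: prefer fresh data from the primary, fall back to the snapshot.
        fn load(snapshot_path: &str) -> io::Result<Self> {
            let routes = match fetch_from_primary() {
                Ok(r) => r,
                Err(_) => parse_snapshot(&fs::read_to_string(snapshot_path)?),
            };
            Ok(Self { routes, snapshot_path: snapshot_path.to_string() })
        }

        fn lookup(&self, alias: &str) -> Option<&String> {
            self.routes.get(alias)
        }

        // Called whenever the primary signals a change (the real service would
        // hook this to LISTEN/NOTIFY); on success, persist the new state locally.
        fn refresh(&mut self) -> io::Result<()> {
            if let Ok(r) = fetch_from_primary() {
                self.routes = r;
                fs::write(&self.snapshot_path, serialize(&self.routes))?;
            }
            Ok(())
        }
    }

    // Stand-in for a query against the primary database (hypothetical).
    fn fetch_from_primary() -> Result<HashMap<String, String>, ()> {
        Err(()) // pretend the primary is down; the snapshot keeps us serving
    }

    fn serialize(routes: &HashMap<String, String>) -> String {
        routes.iter().map(|(k, v)| format!("{k}\t{v}\n")).collect()
    }

    fn parse_snapshot(text: &str) -> HashMap<String, String> {
        text.lines()
            .filter_map(|l| l.split_once('\t'))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect()
    }

    fn main() -> io::Result<()> {
        fs::write("routes.snapshot", "hello@example.com\tinbox@forward.test\n")?;
        let mut cache = RouteCache::load("routes.snapshot")?;
        cache.refresh()?; // no-op here because the stubbed "primary" is down
        println!("{:?}", cache.lookup("hello@example.com"));
        Ok(())
    }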
| seritools wrote: | > if it really was [which it isn't] | andrewstuart wrote: | Most projects I encounter these days instantly reach for | kubernetes, containers and microservices or cloud functions. | | I find it much more appealing to just make the whole thing run on | one fast machine. When you suggest this, people tend to say "but | scaling!", without understanding how much capacity there is in | vertical. | | The thing most appealing about single server configs is the | simplicity. The simpler a system is, the more reliable and easy to | understand it's likely to be. | | The software most people are building these days can easily | run lock, stock and barrel on one machine. | | I wrote a prototype for an in-memory message queue in Rust and | ran it on the fastest EC2 instance I could and it was able to | process nearly 8 million messages a second. | | You could be forgiven for believing the only way to write | software is as a giant kablooie of containers, microservices, | cloud functions and kubernetes, because that's what the cloud | vendors want you to do, and it's also because it seems to be the | primary approach discussed. Every layer of such stuff adds | complexity, development, devops, maintenance, support, | deployment, testing and (un)reliability. Single server systems | can be dramatically more simple because you can trim it as close | as possible down to just the code and the storage. | traceroute66 wrote: | > I find it much more appealing to just make the whole thing | run on one fast machine. | | Indeed. | | Lots of examples out there, one being Let's Encrypt[1] who run | off one MySQL server (with a few read replicas but only one | write). | | [1] https://letsencrypt.org/2021/01/21/next-gen-database- | servers... | anamexis wrote: | That doesn't really seem like an example, since the whole | thing doesn't run on one machine. The database alone has | multiple machines. | traceroute66 wrote: | > That doesn't really seem like an example, since the whole | thing doesn't run on one machine. | | It is an example. It shows you how you can run a service | that issues a few hundred million SSL certs a year off | relatively few pieces of hardware, i.e. no need to go | drinking the cloud Kool aid. | | There will never be a "perfect" example. The overall point | here is demonstrating that the first answer to everything | doesn't have to include the word "cloud". | | > The database alone has multiple machines. | | As I said, and the blog says ... there is only one writer. | The other nodes are smaller read replicas. | | Which again shows you don't need to go with the cloud | buzzword-filled database services. | erik wrote: | Last I read, Hacker News was still running on one big | machine. And still uses text files as its database. | threeseed wrote: | Twitter runs ads and generates billions in revenue. | | It can't just tolerate being down or having under-load | issues like HN often does. | threeseed wrote: | > there is only one writer | | And what happens if that writer goes down? Then the | service just stops. | | > buzzword-filled | | I love how your buzzwords e.g. read replicas are okay but | everyone else's are bad. | tambourine_man wrote: | >... without understanding how much capacity there is in | horizontal. | | I think you mean vertical, right? | andrewstuart wrote: | Ha ha yes I do!
(corrected) | tambourine_man wrote: | Freudian slip :) | vasco wrote: | Kubernetes is useful if you have many teams working on things | in parallel and you want them to deploy in similar ways to not | have to reinvent the same wheel in 5 different ways by 5 | different teams. If you don't have multiple teams, you don't | need it. | jupp0r wrote: | It's also useful if you want your app to update without being | down, which even a single team might want to do. | dinosaurdynasty wrote: | You don't need k8s for that, teams have been doing that for | decades before k8s was ever a thing. | jupp0r wrote: | Sure, but whatever they built themselves to accomplish | this is also complicated. I know because I have built | such systems (and replaced them with k8s). | erulabs wrote: | Hah, exactly. It's not that you can't accomplish all the | same things as k8s with your own bash scripts - it's that | k8s exists _to replace all your custom bash scripts_! | jupp0r wrote: | But you can still land on top of HN with an "I replaced | kubernetes with this 500 line bash script without unit | tests" blog article ;) | vbezhenar wrote: | Projects are optimized to be developed by so-called ordinary | developers. | | We have a python service which consumes gigabytes of RAM for | quite a simple task. I'm sure that I'd rewrite it with Rust to | consume tens of megabytes of RAM at most. Probably even less. | | But I don't have time for that, there are more important things | to consider and gigabytes is not that bad. Especially when you | have some hardware elasticity with cloud resources. | | I think that if you can develop world-scale twitter which could | run on a single computer, that's a great skill. But it's a rare | skill. It's safer to develop world-scale twitter which will run | on Kubernetes and will not require rare developer skills. | yodsanklai wrote: | > The thing most appealing about single server configs is the | simplicity. The simpler a system is, the more reliable and easy | to understand it's likely to be. | | What if your unique machine crashes? | readonlybarbie wrote: | [dead] | andrewstuart wrote: | Well you gotta have a backup strategy. I'm talking about the | primary machine here, I assumed that would be obvious but | maybe not. You build your failover strategy into your | architecture - there's lots of ways to do it - I use Postgres | so I would favor something based around log shipping. | ffssffss wrote: | And uptime is important, so you want to have that secondary | running and ready, with a proxy in front of everything so | you can switch as soon as you detect a failure. That's | three hosts, plus your alerting has to be separate too, so | that's four. Now, to orchestrate all this, we'll first get | out Puppet... | toast0 wrote: | If you're going from one machine to two, and you add an | automatic failover mechanism, chances are your load | switching mechanism is going to cause more downtime than | just running from your single machine, and manually | switching on failure (after being paged). | pclmulqdq wrote: | You spin up a backup as the new "unique machine." | fbdab103 wrote: | I think you should always plan for failures, but modern | enterprise hardware is quite reliable. I would even posit | that if you stood up a brand new physical server today, it | has a good chance of beating AWS uptime (well, not the AWS | dashboard numbers) over a one year period. | jupp0r wrote: | "hardware is quite reliable" is not a valid strategy. | Hardware fails with some non-zero probability.
You need to | have a plan in place for what to do if that happens, taking | into account service disruption, backups etc. | | Having a system in place that handles most of this | gracefully (like kubernetes) is one way of having such a | plan; there are others. Which one works best is dependent | on your app, cost of downtime, your team that's tasked with | bringing everything back up in the middle of the night, | etc. | | People who leave details like this out when they say | "kubernetes is complicated" just haven't seen the | complexities of operating a service well. | fbdab103 wrote: | My first sentence was to always plan for failures. | jupp0r wrote: | Yeah sorry it wasn't meant to dispute your argument, just | as an addition. | pixl97 wrote: | It doesn't matter if hardware is 99.99% reliable if you're | the .01% that day | yazzku wrote: | If I remember correctly, Lichess runs on a single server. | brightball wrote: | I tend to agree on simplicity. Really just depends on whether | you can tolerate downtime for either outages or deployments. | | As soon as you start accounting for redundancy you have to fan | out anyway. | kbumsik wrote: | > The thing most appealing about single server configs is the | simplicity. | | In my experience this ended up more complicated. | | Those systems are typically developed by people who already | left and are undocumented, and it becomes extremely difficult | to figure out the config (packages, etc files... oh, where even | are the service files located?) and almost impossible to | reproduce. | | It might be okay to leave it there, but when we need to modify | or troubleshoot the system, a nightmare begins... | | Maybe I was just unlucky, but at least k8s configs are more | organized and simpler than dealing with a whole custom | configured Linux system. | Thaxll wrote: | Because the OP example is very simplistic and leaves very | important details on the table, would you base 250M users on a | single machine? What about backups, observability, how do you | update that stack without bringing down everything ... Also this | is napkin maths, this could be off by 10 or 100x which would | change everything. | | It's very simple to make a PoC on a very powerful machine; making | it ready for production, serving hundreds of millions of users, | is completely different. | raverbashing wrote: | > What about backups | | Several ways of doing this without relying on k8s | | > observability | | This doesn't require k8s either, and it's more on your app. | Systemd can restart services by itself | | > how do you update that stack without bringing down | everything | | That's probably where redundancy helps the most. I wouldn't | run a big service without it (but again it could be at server | level) | threeseed wrote: | > but again it could be at server level | | Can you educate us on how to have a resilient app with | no-downtime updates at the server level? | | Because if you're doing this via software then it's no | different to Kubernetes. | PragmaticPulp wrote: | It's worth noting that the author's example doesn't do | anything like HTTP. It was purely an algorithmic benchmark. | | Nobody should be looking at this and thinking that it's | realistic to actually serve a _functional website_ at this | scale on a single machine with actual real world | requirements. | mpoteat wrote: | As well, you should be creating regional servers to minimize | latency for folks in other geographic regions. Can't beat c! | sitkack wrote: | It is a BoE system design. How is it off by 100x?
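For mpoteat's "can't beat c" point, a quick back-of-the-envelope check. The figures are approximate: great-circle distances, and light in fibre at roughly two thirds of c (about 200 km per millisecond); real routes are longer and add switching delays on top.

    // Latency floor from geography alone. Distances are approximate
    // great-circle figures; light in fibre covers roughly 200 km per ms
    // (~2/3 of c), and real routes are longer than this.
    fn main() {
        let fiber_km_per_ms = 200.0;
        let routes = [
            ("New York -> London", 5_570.0_f64),
            ("New York -> Sydney", 15_990.0),
            ("Frankfurt -> Singapore", 10_260.0),
        ];
        for (name, km) in routes {
            let one_way_ms = km / fiber_km_per_ms;
            println!("{name}: >= {:.0} ms one-way, >= {:.0} ms round trip",
                     one_way_ms, 2.0 * one_way_ms);
        }
    }

So a single-site deployment can be fine on throughput and still hand far-away users a round-trip latency floor of well over 100 ms that no amount of server tuning can remove.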
| jupp0r wrote: | In addition, you should worry about what happens to your app | if a hardware error, network problem or natural disaster | makes your machine unavailable. | eddsh1994 wrote: | Split the DB from the app and replicate with a load | balancer? | threeseed wrote: | That database is going to need to be clustered as well | for resiliency. | | Sounds like you already have quite a number of different | containers. | jupp0r wrote: | Now you have "Production Twitter on three machines". | Follow the rabbit and you end up with something like what | Twitter looks like today. | pixl97 wrote: | Heh, also praying that everything stays in the fast path. | If a small portion of the workload uses a higher portion of | machine resources then the moment an attacker figures it | out they have a great way of DDOSing your service | resources. | [deleted] | fifilura wrote: | I don't think k8s is what you are shooting at but the IPC that | is required to run a set of microservices. | | Like Kafka. | | My impression is that it is the serialisation that comes with | each service-to-service communication that is really expensive. | judge2020 wrote: | Kubernetes and containers are a means to service architecture; | it enables scalability but does not require it. You should | still be containerizing your applications to ensure a | consistent environment, even if you only throw it in a | docker-compose file on your production server. | ownagefool wrote: | So the goal is basically being able to do builds whilst | running as few setup steps as possible. | | Containers are a good common denominator because you | essentially start with the OS, and then there's a file that | automates installing further dependencies and building the | artifact, which typically includes the important parts of the | runtime environment. | - They're stupidly popular, so it basically nullifies the | setup steps. | - Once set up, by combining both OS layers and app, they | solve more of the problem and are therefore slightly more | reliable. | - They're self-documenting as long as you understand bash, | docker, and don't do weird shit like build an undocumented | intermediary layer. | | Infrastructure as Code does the same thing for the underlying | infra layers and kubernetes is one of the nicer / quicker | implementations of this, but requires you have kubernetes | available. | | Together they largely solve the "works on my PC" problem. | KronisLV wrote: | > You should still be containerizing your applications to | ensure a consistent environment, even if you only throw it in | a docker-compose file on your production server. | | I'll say that this is a good point, especially because if you | don't use containers or a similar solution (even things like | shipping VM images, for all I care), you'll end up with | environment drift, unless your application is a statically | compiled executable with no system dependencies, like a | JDK/.NET/Python/Ruby runtime or worse yet, an application | server like Tomcat, all of which can have different versions. | Worse yet if you need to install packages on the system for | which you haven't pinned specific versions (e.g. needing | something that's installed through apt/yum, rather than | package.json or Gemfile, or requirements.txt and so on).
| | That said, even when you don't use containers, you can still | benefit from some pretty nice suggestions that will help make | the software you develop easier to manage and run: | https://12factor.net/ | | I'd also suggest that you have a single mechanism for | managing everything that you need to run, so if it's not | containers and an orchestrator of some sort, at least write | systemd services or an equivalent for every process or group | of processes that should be running. | | Disclaimer: I still think that containers are a good idea, | just because of how much of a dumpsterfire managing different | OSes, their packages, language runtimes, application | dependencies, application executables, port mappings, | application resource limits, configuration, logging and other | aspects is. Kubernetes, perhaps a bit less so, although when | it works, it gets the job done... passably. Then again, | Docker Swarm to me felt better for smaller deployments (a | better fit for what you want to do vs the resources you | have), whereas Nomad was also pretty nice, even if HCL sadly | doesn't use the Docker Compose specification. | vbezhenar wrote: | When it comes to Java, everything could be used as a | directory installation. Like you need JDK, maven and | tomcat? Download and extract it somewhere. Modify your | current PATH to include java and that's about it. You can | build big tar.gz instead of OCI container which will work | just fine. | | So IMO it's perfectly possible to run Java applications | without containers. You would need to think about network | ports, about resource limits, but those are not hard | things. | | And tomcat even provides zero-downtime upgrades, although | it's not that easy to set up, but when it works, it does | work. | | After I've got some experience with Kubernetes, I'd reach | for it always because it's very simple and easy to use. But | that requires to go through some learning curve, for sure. | | The best and unbeatable thing about containers is that | there're plenty of ready ones. I have no idea how would I | install postgres without apt. I guess I could download | binaries (where?), put them somewhere, read docs, craft | config file with data dir pointing to anotherwere and so | on. That should be doable but that's time. I can docker run | it in seconds and that's saved time. Another example is | ingress-nginx + cert-manager. It would take hours if not | days from me to craft set of scripts and configs to | replicate thing which is available almost out of the box in | k8s, well tested and just works. | KronisLV wrote: | > When it comes to Java, everything could be used as a | directory installation. Like you need JDK, maven and | tomcat? Download and extract it somewhere. Modify your | current PATH to include java and that's about it. You can | build big tar.gz instead of OCI container which will work | just fine. | | I've seen something similar in projects previously, it | never worked all that well. | | While the idea of shipping one archive with _everything_ | is pretty good, people don 't want to include the full | JDK and Tomcat installs with each software delivery, | unlike with containers, where you get _some_ benefit out | of layer re-use when they haven 't changed, while having | the confidence that what you tested is what you'll ship. | Shipping 100 app versions with the same JDK + Tomcat | version will mean reused layers instead of 100 copies in | the archives. 
And if you _don 't_ ship everything | together, but merely suggest that release X should run on | JDK version Y, the possibility of someone not following | those instructions at least once approaches 100% with | every next release. | | Furthermore, Tomcat typically will need custom | configuration for the app server, as well as | configuration for the actual apps. This means that you'd | need to store the configuration in a bunch of separate | files and then apply (copy) it on top of the newly | delivered version. But you can't really do that directly, | so you'd need to use something like Meld to compare | whether the newly shipped default configuration doesn't | include something that your old custom configuration | doesn't (e.g. something new in web.xml or server.xml). | The same applies to something like cacerts within your | JDK install, if you haven't bothered to set up custom | files separately. | | Worse yet, if people aren't really disciplined about all | of this, you'll end up with configuration drift over time | - where your dev environment will have configuration A, | your test environment will have configuration B (which | will _sort of_ be like A), and staging or prod will have | something else. You 'll be able to ignore some of those | differences until everything will go horribly wrong one | day, or maybe you'll get degraded performance but without | a clear reason for it. | | > So IMO it's perfectly possible to run Java applications | without containers. You would need to think about network | ports, about resource limits, but those are not hard | things. | | This is only viable/easy/not brittle when you have self- | contained .jar files, which admittedly are pretty nice! | Though if shipping JDK with each delivery isn't in the | cards (for example, because of the space considerations), | that's not safe either - I've seen performance degrade | 10x because of a JDK patch release was different between | two environments, all because of JDK being managed | through the system packages. | | Resource limits are generally doable, though Xms and Xmx | lie to you, you'd need systemd slices or an equivalent | for hard resource limits, which I haven't seen anyone | seriously bother with, although they're at a risk of the | entire server/VM becoming unresponsive should their | process go rogue for whatever reason (e.g. CPU at 100%, | which is arguably worse than OOM because of bad memory | limits). | | Ports are okay when you are actually in control of the | software and nothing is hardcoded. Then again, another | aspect is being able to run multiple versions of software | at the same time (e.g. different MySQL/MariaDB releases | for different services/projects on the same node), which | most nix distributions are pretty bad at. | | > And tomcat even provides zero-downtime upgrades, | although it's not that easy to set up, but when it works, | it does work. | | I've seen this attempted, but it never worked properly - | the codebases might not have been good, but those | redeployments and integrating with Tomcat always lead to | either memory leaks or odd cases of the app server | breaking. That's why personally I actually enjoy the | approach of killing the entire thing alongside the app | and doing a full restart (especially good with embedded | Tomcat/Jetty/Undertow), using health checks for routing | traffic instead. | | I think doing these things at the app server level is | generally just asking for headaches, though the idea of | being able to do so is nice. 
Then again, I don't see | servers like Payara (like GlassFish) in use anymore, so I | guess Spring Boot with embedded Tomcat largely won, in | combination with other tools. | | > After I've got some experience with Kubernetes, I'd | reach for it always because it's very simple and easy to | use. But that requires to go through some learning curve, | for sure. | | I wouldn't claim that Kubernetes is simple if you need to | run your own clusters, though projects like K3s, K0s and | MicroK8s are admittedly pretty close. | | > The best and unbeatable thing about containers is that | there're plenty of ready ones. I have no idea how would I | install postgres without apt. I guess I could download | binaries (where?), put them somewhere, read docs, craft | config file with data dir pointing somewhere else and so | on. That should be doable but that's time. I can docker | run it in seconds and that's saved time. Another example | is ingress-nginx + cert-manager. It would take hours if | not days from me to craft set of scripts and configs to | replicate thing which is available almost out of the box | in k8s, well tested and just works. | | This is definitely a benefit! | | Though for my personal needs, I build most (funnily | enough, excluding databases, but that's mostly because | I'm lazy) of my own containers from a common Ubuntu base. | Because of layer reuse, I don't even need tricks like | copying files directly, but can use the OS package | manager (though clean up package cache afterwards) and | pretty approachable configuration methods: | https://blog.kronis.dev/articles/using-ubuntu-as-the- | base-fo... | | In addition, my ingress is just a containerized instance | of Apache running on my nodes, with Docker Swarm instead | of Kubernetes: https://blog.kronis.dev/tutorials/how-and- | why-to-use-apache-... In my case, the distinction between | the web server running inside of a container and outside | of a container is minimal, with the exception that Docker | takes care of service discovery for me, which is | delightfully simple. | | I won't say that the ingress abstraction in Kubernetes | isn't nice, though you can occasionally run into | configurations which aren't as easy as they should be: | e.g. configuring Apache/Nginx/Caddy/Traefik certs, which | has numerous tutorials and examples online, vs trying to | feed your wildcard TLS cert into a Traefik ingress, with | all of the configuration so that your K3s cluster would | use it as the default certificate for the apps you want | to expose. Not that other ingresses aren't great (e.g. | Nginx), it's just that you're buying into additional | complexity and I've personally also had cases where | removing and re-adding it hangs because of some resource | cleanup in Kubernetes failing to complete. | | I guess what I'm saying is that it's nice to use | containers for whatever the strong parts are (for | example, the bit about being able to run things easily), | though ideally without ending up with an abstraction that | might eventually become leaky (e.g. using lots of Helm | charts that have lots of complexity hiding under the | hood). Just this week I had CI deploys starting to | randomly fail because some of the cluster's certificates | had expired and kubectl connections wouldn't work. A | restart of the cluster systemd services helped make | everything rotate, but that's another thing to think | about, which otherwise wouldn't be a concern.
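(A minimal sketch of the "common Ubuntu base" approach described above - the package name and paths are placeholders; the point is just using the distro package manager and cleaning the apt cache so the layer stays small.)

      FROM ubuntu:22.04
      # install via the OS package manager, then clean up the package cache
      RUN apt-get update \
       && apt-get install -y --no-install-recommends openjdk-17-jre-headless \
       && rm -rf /var/lib/apt/lists/*
      COPY app.jar /opt/app/app.jar
      CMD ["java", "-jar", "/opt/app/app.jar"]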
| andrewstuart wrote: | I don't even use containers - I aim primarily for simplicity | and so far I have found I am able to build entire | sophisticated systems without a single container. Containers | I find make things much more complex. | John23832 wrote: | That just means you don't know how to architect a machine | with containers (and if you're effective in what you do, | that's ok). | | But it's a pretty objective observation that manually scaled | single machines don't scale as well as automation. | fragmede wrote: | Are you at least using Ansible or Chef or something? | TillE wrote: | Docker can be super simple. Like, if I want to run a Python | service, that's just a few lines in a Dockerfile and a | docker-compose.yml stub. Then I can trivially deploy that | anywhere. | paulryanrogers wrote: | Simple to start. Yet it is more complex at run time, | which can complicate (or simplify) debugging, depending | on the problem. | judge2020 wrote: | Observability in production is where APM solutions like | Datadog, Elastic, and Sentry come in; you can go from | just logging errors all the way up to continuously | profiling your application and beaming log files to them | to correlate with metrics and database query timings. | | If you're just doing a simple application, Sentry really | is the way to go, while Datadog and ELK are agent-based | and more intended for complex setups and big enterprises | (especially in their pricing structure/infra costs). | tluyben2 wrote: | By far most indeed. And if you want failover, just run two | machines. More fun (for me) also than tying together all that | complexity. | John23832 wrote: | Kubernetes just orchestrates containers. You can still run | beefy machines and scale (if necessary) accordingly. | | If anything, Kubernetes allows you to save cost by going with a | scalable number of small, inexpensive, fully utilized machines, | vs one large, expensive, underused one. | sitkack wrote: | I would wager that the majority of users of k8s do so on a | cloud where they could provision VMs of the proper size to | begin with. The utilization argument is specious. | pikdum wrote: | I go with k8s even on a single server nowadays, it just makes | everything so much more convenient. | | https://k3s.io/ makes it really easy to set up, too. | vbezhenar wrote: | I never tried k3s, but what's wrong with kubeadm? I think | that's literally two commands to run single server k8s: | kubeadm init and kubectl taint something. | | The only thing bad about single server kubernetes is that | it'll eat like 1-2 GB of RAM by itself. When your whole server | could be 256 MB, that's a lot of wasted RAM. | pikdum wrote: | Nothing wrong with kubeadm, but k3s should be a bit more | lightweight. | samsquire wrote: | I recommend this table of latency figures for any software | engineer: | | https://gist.github.com/jboner/2841832 | | Essentially IO is expensive except within a datacenter, but even | in a data center you can do a lot of loop iterations in a hot | loop in the time it takes to ask a server for something. | | There is a whitepaper which talks about the raw throughput and | performance of single core systems outperforming scalable | systems. These should be required reading for those developing | distributed systems.
| | http://www.frankmcsherry.org/assets/COST.pdf A summary: | http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/ | eatonphil wrote: | This is a great exercise in napkin math, even with constraints | you've set for yourself that don't fully approximate Twitter | (yet). Thanks! | habibur wrote: | He will be in for a surprise. | | HTTP with connection: keep-alive can serve 100k req/sec. But | that's for one client being served repeatedly over 1 connection. | And this is the inflated number that's published in webserver | benchmark tests. | | For a more practical, down-to-earth test, you need to measure | performance w/o keep-alive. Requests per second will drop to | 12k/sec then. | | And that's for HTTP without encryption or SSL handshake. Use | HTTPS and watch it fall down to only 400 req / sec under load | test [ without connection: keep-alive ]. | | That's what I observed. | trishume wrote: | I agree most HTTP server benchmarks are highly misleading in | that way, and mention in my post how disappointed I am at the | lack of good benchmarks. I also agree that typical HTTP servers | would fall over at much lower new connection loads. | | I'm talking about a hypothetical HTTPS server that used | optimized kernel-bypass networking. Here's a kernel-bypass HTTP | server benchmarked doing 50k new connections per core second | while re-using nginx code: https://github.com/F-Stack/f-stack. | But I don't know of anyone who's done something similar with | HTTPS support. | sayrer wrote: | Userspace networking is pretty common. The chair of the IETF | even wrote one: https://github.com/NTAP/quant | | "Quant uses the warpcore zero-copy userspace UDP/IP stack, | which in addition to running on top of the standard Socket | API has support for the netmap fast packet I/O framework, as | well as the Particle and RIOT IoT stacks. Quant hence | supports traditional POSIX platforms (Linux, MacOS, FreeBSD, | etc.) as well as embedded systems." | lossolo wrote: | TLS handling would dominate your performance; kernel | bypassing would not help here unless you also did TLS | NIC offloading. You still need to process new TLS sessions | from the OP's example and they would dominate your HTTP | processing time (excluding application business logic | processing). | pixl97 wrote: | And I would say real life Twitter involves mostly cell phone | use where we see companies like Google try to push HTTP/3 to | deal with head of line issues on lossy connections. Serving | millions of hits per day on lossy networks is going to | leave you with massive numbers of connections that have been | abandoned but you don't know it yet. Or connections that are | behaving like they are tar pitted and running at bits per | second. | lossolo wrote: | > Use HTTPS and watch it fall down to only 400 req / sec under | load test [ without connection: keep-alive ]. | | I'm running about 2000 requests/s in one of my real-world | production systems. All of the requests are without keep-alive | and use TLS. They use about one core for TLS and HTTP | processing. | keewee7 wrote: | In the coming years we will probably see a lot of complicated | microservice architectures replaced by well-designed and | optimized Rust (and modern C++) monoliths that use simple | replication to scale horizontally. | pixl97 wrote: | Replication and simple never belong in the same sentence. DNS, | which is one of the simplest replication systems I know of, has | its own complex failure modes.
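(For anyone who wants to reproduce the keep-alive vs. per-connection comparison discussed above, a rough sketch using wrk; the host and flags are placeholders, and results will vary widely with hardware, TLS settings and session resumption.)

      # keep-alive (wrk's default): connections are reused across requests
      wrk -t4 -c200 -d30s https://example.test/

      # approximate "no keep-alive": ask the server to close after each response,
      # forcing a new TCP + TLS handshake per request
      wrk -t4 -c200 -d30s -H "Connection: Close" https://example.test/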
| Cyph0n wrote: | More like "barebones, in-memory, English-only Twitter clone on | one machine". | | Edit: Still a nice writeup! | Dylan16807 wrote: | What strikes you as English-only about it? | Cyph0n wrote: | My bad - not English, but ASCII. The assumed max tweet size | is in bytes rather than (UTF) characters. | trishume wrote: | I specifically assumed a max tweet size based on the | maximum number of UTF-8 bytes a tweet can contain (560), | with a link to an analysis of that, and discussion of how | you could optimize for the common case of tweets that | contain way fewer UTF-8 bytes than that. Everything in my | post assumes unicode. | Tepix wrote: | Did you consider URLs? They don't seem to count against | the size and can be very large indeed (like 4k) | modeless wrote: | URLs are shortened and the shortened size counts against | the tweet size. The URL shortener could be a totally | separate service that the core service never interacts | with at all. Though I think in real twitter URLs may be | partially expanded before tweets are sent to clients, so | if you wanted to maintain that then the core service | would need to interact with the URL shortener. | Cyph0n wrote: | Thanks for clarifying. I missed the max vs. average | analysis because I was focused on the text. Still, as | noted in the Rust code comment, the sample implementation | doesn't handle longer tweets. | Dylan16807 wrote: | That size in bytes is based on the max size in UTF-8 and | UTF-16. Codepoints below U+1100 are counted as one | "character" by twitter and will need at most 2 bytes. | Codepoints above it are counted as two "characters" by | twitter and will need at most 4 bytes. Therefore 560 bytes, | and it supports all languages. | | Side note, this is more pessimistic than it needs to be, if | you're willing to transcode. The larger codepoints fit into | 20-21 bits, and the smaller ones fit into 12-13 bits. | Cyph0n wrote: | I was referring to the comment in the Rust code, not the | analysis. | halfmatthalfcat wrote: | As in the instance will only be tuned to serve one language? | SilverBirch wrote: | I think one of the under-estimated interesting points of twitter | as a business is that this is the core. Yes, Twitter is 140 | characters, it's got "300m users" which is probably 5m real heavy | users. So yes, you could do a lot of "140 characters, a few | tweets per person, few million users" on very little hardware. | But that's why Twitter's a shit business! | | How much RAM did your advertising network need? Because _that_ is | what makes twitter a business! How are you building your | advertiser profiles? Where are you accounting for fast roll out | of a Snapchat/Instagram/BeReal/Tiktok equivalent? Oh look, your | 140 characters just turned into a few hundred megs of video that | you're going to transcode 16 different ways for QoS. Ruh Roh! | | How are your 1,000 engineers going to push their code to | production _on one machine_? | | Almost always the answer to "do more work" or "buy more machines" | is "buy more machines". | | All I'm saying is I'd change it to "Toy twitter on one machine" | not Production. | reacharavindh wrote: | The author claimed early on, and very clearly, that this was a | fun exercise of thought and engineering rather than saying | "Look this is how Twitter should be run". After all, this is | Hacker News. Such exercises, and engaging other hackers to pick | something out of there, is how we progress (and get our tickles).
| So, maybe instead think about how one could tackle the | advertising/indexing needs in a similar fashion (could it be | done in just another server? 5 more servers?). | kierank wrote: | This is as realistic as the moon rocket in my back garden. | fleddr wrote: | "Through intense digging I found a researcher who left a notebook | public including tweet counts from many years of Twitter's 10% | sampled "Decahose" API and discovered the surprising fact that | tweet rate today is around the same as or lower than 2013! Tweet | rate peaked in 2014 and then declined before reaching new peaks | in the pandemic. Elon recently tweeted the same 500M/day number | which matches the Decahose notebook and 2013 blog post, so this | seems to be true! Twitter's active users grew the whole time so I | think this reflects a shift from a "posting about your life to | your friends" platform to an algorithmic content-consumption | platform." | | I know it's not the core premise of the article, but this is very | interesting. | | I believe that 90% of tweets per day are retweets, which supports | the author's conclusion that Twitter is largely about reading and | amplifying others. | | That would leave 50 million "original" tweets per day, which you | should probably separate into main tweets and reply tweets. Then | there's bots and hardcore tweeters tweeting many times per day, | and you'll end up with a very sobering number of actual unique | tweeters writing original tweets. | | I'd say that number would be somewhere in the single digit | millions of people. Most of these tweets get zero engagement. | It's easy to verify this yourself. Just open up a bunch of rando | profiles in a thread and you'll notice a pattern. A symmetrical | number of followers and following, typically in the range of | 20-200. Individual tweets get no likes, no retweets, no replies, | nothing. Literally tweeting into the void. | | If you take away the zero-engagement tweets, you'll arrive at | what Twitter really is. A cultural network. Not a social network. | Not a network of participation. A network of cultural influencers | consisting of journalists, politicians, celebrities, companies | and a few witty ones that got lucky. That's all it is: some tens | of thousands of people tweeting and the rest leeching and | responding to it. | | You could argue that is true for every social network, but I just | think it's nowhere near this extreme. Twitter is also the only | "social" network that failed to (exponentially) grow in a period | that you might as well consider the golden age of social | networks. A spectacular failure. | | Musk bought garbage for top dollar. The interesting dynamic is | that many Twitter top dogs have an inflated status that cannot be | replicated elsewhere. They're kind of stuck. They achieved their | status with hot take dunks on others, but that tactic doesn't | really work on any other social network. | yodsanklai wrote: | > Musk bought garbage for top dollar | | Totally off topic here, but it could be that he just wants the | ability to amplify his own ideas. Also, why measure Twitter's | value (arbitrarily?) by number of unique tweets, rather than by | read tweets? ___________________________________________________________________ (page generated 2023-01-07 23:00 UTC)