[HN Gopher] Production Twitter on one machine? 100Gbps NICs and ... ___________________________________________________________________ Production Twitter on one machine? 100Gbps NICs and NVMe are fast Author : trishume Score : 245 points Date : 2023-01-07 18:46 UTC (4 hours ago) (HTM) web link (thume.ca) (TXT) w3m dump (thume.ca) | agilob wrote: | The title reminded me about this | https://www.phoronix.com/news/Netflix-NUMA-FreeBSD-Optimized | (2019) and this 2 years later: | https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-... | (2021) | syoc wrote: | Latest version: http://nabstreamingsummit.com/wp- | content/uploads/2022/05/202... (2022) | PragmaticPulp wrote: | Very cool exercise. I enjoyed reading it. | | I see a lot of comments here assuming that this proves something | about Twitter being inefficient. Before you jump to conclusions, | take a look at the author's code: | https://github.com/trishume/twitterperf | | Notably absent are things like _serving HTTP_ , not to even | mention HTTPS. This was a fun exercise in algorithms, I/O, and | benchmarking. It wasn't actually imitating anything that | resembles actual Twitter or even a usable website. | trishume wrote: | Which I think I'm perfectly clear about in the blog post. The | post is mostly about napkin math systems analysis, which does | cover HTTP and HTTPS. | | I'm now somewhat confident I could implement this if I tried, | but it would take many years, the prototype and math is to | check whether there's anything that would stop me if I tried | and be a fun blog post about what systems are capable of. | | I've worked on a team building a system to handle millions of | messages per second per machine, and spending weeks doing math | and building performance prototypes like this is exactly what | we did before we built it for real. | PerilousD wrote: | [flagged] | jeffbee wrote: | I like this kind of exercise. One thing I am not seeing is | analytics, logs and so forth that as I understand it are | significant portions of Twitter's production cost story. | tluyben2 wrote: | Anyone have a complete list of functional blocks that form | Twitter? Beyond the obvious and what we see? | Marazan wrote: | You need the blocks for the obvious for what we see because | it is not necessarily obvious to everyone. | | Over the last couple of months I've seen comments that | summarise Twitter as a read-only service that doesn't have | any real time posting requirements and similarly other | comments that treat it as a write-only service with no real | time read / fast search requirements. | | Without _all_ the blocks even the simple surface level | Twitter will have complexity people miss. | lazyasciiart wrote: | If it's this cheap to run you don't need analytics because you | don't need to monetize it, and if it's this simple you don't | need logs because it'll all work correctly the first time! | kevingadd wrote: | "You don't need to monetize it" who's going to fund your | Twitter-as-a-charity? What happens when the free money goes | away? Businesses have to pay the bills eventually one way or | another, you need to plan for that in advance | throwmeup123 wrote: | The title is highly misleading for some theoretical | "exploration". | dang wrote: | Ok, we've put a question mark up there to make it more | explorationy. | trishume wrote: | As the author, this sounds good to me! I'll probably even | change the actual title to match. 
I originally was going to | make it a question mark and the only reason I didn't is | https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline... | when I think the answer is probably "could probably be | somewhat done" rather than "no". | dang wrote: | Well this may be the first time that's ever happened :) | | Betteridge antiexamples are always welcome. I once tried to | joke that Mr. Betteridge had "retired" and promptly got | corrected about his employment status | (https://news.ycombinator.com/item?id=10393754). | jiggawatts wrote: | Something I've found a lot of modern IT architects seem to ignore is | "write amplification" or the equivalent effect for reads. | | If you have a 1 KB piece of data that you need to send to a | customer, ideally that should require _less_ than 1 KB of actual | NIC traffic thanks to HTTP compression. | | If processing that 1 KB takes more than 1 KB of total NIC traffic | within and out of your data centre, then you have some level of | _amplification_. | | Now, for writes, this is often unavoidable because redundancy is | pretty much mandatory for availability. Whenever there's a | transaction, an amplification factor of 2-3x is assumed for | replication, mirroring, or whatever. | | For reads, good indexing and data structures within a few large | boxes (like in the article) can reduce the amplification to just | 2-3x as well. The request will likely need to go through a load | balancer of some sort, which amplifies it, but that's it. | | So if you need to process, say, 10 Gbps of egress traffic, you | need a total of something like 30 Gbps at least, but 50 Gbps for | availability and handling of peaks. | | What happens in places like Twitter is that they go _crazy_ with | the microservices. Every service, every load balancer, every | firewall, proxy, envoy, NAT, firewall, and gateway adds to the | multiplication factor. Typical Kubernetes or similar setups will | have a minimum NIC data amplification of 10x _on top of_ the 2-3x | required for replication. | | Now _multiply_ that by the crazy inefficient JSON-based | protocols, the GraphQL, and the other insanity layered on to | "modern" development practices. | | This is how you end up serving 10 Gbps of egress traffic with | _terabits_ of internal communications. This is how Twitter | apparently "needs" 24 million vCPUs to host _text chat_. | | Oh, sorry... text chat with the occasional postage-stamp-sized, | potato quality static JPG image. | thriftwy wrote: | I remember Stack Overflow running on a single Windows Server box | and mocking fellow LAMP developers for their propensity towards | having dozens of VMs to the same effect. | | That was some time ago, though. | varunkmohan wrote: | Good analysis. Obviously, this doesn't handle cases like | redundancy and doesn't handle some of the other critical workloads | the company has. However, it does show how much real compute | bloat these companies actually have - | https://twitter.com/petrillic/status/1593686223717269504 where | they use 24 million vcpus and spend 300 million a month on cloud. | judge2020 wrote: | On the other hand, Twitter does (or did) handle over 450 | million monthly active users (based on stats websites), with a | target for 315 monetizable daily active users (based on their | earnings calls pre-privatization). Handling that amount of | concurrency and beaming millions of tweets a day to home feeds | and notifications is going to be logistically hard. | WJW wrote: | Is that 315 million monetizable DAUs?
That sounds like a lot | if the total is only 450 MAU. OTOH, 315k DAU seems like it | wouldn't be enough to pay the bills. | judge2020 wrote: | There were some quarters with profit, some without; the | past few years were mostly without IIRC. | | They were targeting 315 mDAUs for Q4 2023, but in the final | earnings it was only 238 mDAUs. Actual MAU stats weren't | public iirc but some random stats sites seemed to say 450m | global MAUs, which likely includes people with no ad | preferences or who only view NSFW content (which can't be | shown next to (most?) ads). | | https://www.forbes.com/sites/johnkoetsier/2022/11/14/twitte | r... | varunkmohan wrote: | Posted this on a comment above but systems like Whatsapp | likely sent an insane amount of data as well but used only 16 | servers over 1.5 billion users at time of acquisition. Modern | NICs can handle millions of requests a second - I still feel | there is a lot of excess here. | veec_cas_tant wrote: | Feels like the comparison is irrelevant. I'm guessing | WhatsApp would have infrastructure challenges if all of | their chats were group messages including the entirety of | their user base, search, moderation, ranking, ads, etc. | Isn't WhatsApp more comparable to only DMs? | PragmaticPulp wrote: | > However, it does show how much real compute bloat these | companies actually have | | No, it doesn't. It's a fun exercise in approaching Twitter as | an academic exercise. It ignores all of the real-world | functionality that makes it a business rather than a toy. | | A lot of complicated businesses are easy to prototype out if | you discard all requirements other than the core feature. In | the real world, more engineering work often goes to ancillary | features that you never see as an end user. | varunkmohan wrote: | Genuinely asking, why do you think Twitter needs 24 million | vcpus to run? | | This is not apples to apples but Whatsapp is a product that | entirely ran on 16 servers at the time of acquisition (1.5 | billion users). It really begs the question why Twitter uses | so much compute if there are companies that have operated | significantly more efficiently. Twitter was unprofitable | during acquisition and spent around half their revenue on | compute, maybe some of these features were not really | necessary (but were just burning money)? | ignoramous wrote: | > _This is not apples to apples but Whatsapp is a product | that entirely ran on 16 servers at the time of acquisition | (1.5 billion users)._ | | - 450m DAUs at the time of facebook acquisition [0] | | - Twitter is not just DMs or Group Chat. | | > _It really begs the question why Twitter uses so much | compute if there are companies that have operated | significantly more efficiently._ | | A fair comparision might have been Instagram: While Systrom | did run a relatively lean eng org, they never had to | monetize and got acquired before they got any bigger than | ~50m? | | [0] https://www.sequoiacap.com/article/four-numbers-that- | explain... | no_way wrote: | Chat apps are mostly one on one interaction, it is much | harder run an open platform where every user can | potentially interact with every other user, not even | talking about search and how complex it gets. If Twitter is | bloated or not is a valid discussion, but comparison it to | WhatsApp is not. | xwdv wrote: | Whatsapp used Erlang. | Snoozus wrote: | Because the actual product is not showing people tweets but | to optimize who to show which ads based on their previous | interactions with the site. 
This is many orders of | magnitude harder. | danpalmer wrote: | Being E2E encrypted, WhatsApp can't do much with the | content, so it is much closer to a naive bitshuffler than | Twitter. | | Twitter, while still not profitable (maybe it was in some | recent quarters?) was much closer to it, having all the | components necessary to form a reasonable ad business. For | ads, analytics is critical, plus all the ad serving, plus | it's a totally different scale of compute being many to | many rather than one to ~one. | peterhunt wrote: | The real answer is twofold: | | 1. Lots of batch jobs. Sometimes it's unclear how much | value they produce / whether they're still used. | | 2. Twitter probably made a mistake early on in taking a | fanout-on-write approach to populate feeds. This is super | expensive and necessitates a lot of additional | infrastructure. There is a good video about it here: | https://www.youtube.com/watch?v=WEgCjwyXvwc | mapme wrote: | They did not spend half their revenue on compute. It's more | like 20-25% for running data centers/staff for DCs. Check | their earnings report. | | WhatsApp is not an applicable comparison because messages | and videos are stored on the client device. Better to look | at Pinterest and Snap, which spend a lot on infra as well. | | The issue is storage, ads, and ML to name a few. For | example, from 2015: | | " Our Hadoop filesystems host over 300PB of data on tens of | thousands of servers. We scale HDFS by federating multiple | namespaces." | | You can also see their hardware usage broken down by | service as put in their blog. | | https://blog.twitter.com/engineering/en_us/topics/infrastruc... | | https://blog.twitter.com/engineering/en_us/a/2015/hadoop-fil.... | d23 wrote: | Presumably there's an entire data engineering / event | processing pipeline that's being used to track user | interactions at a fine grained level. These events are | going to be aggregated and munged by various teams for | things like business analytics, product / experiment | feature analysis, ad analysis, as well as machine learning | model feature development (just to name a few massive ones | off the top of my head). Each of these will vary in their | requirements of things like timeliness, how much compute is | necessary to do their work, what systems / frameworks are | used to do the aggregations or feature processing, and | tolerance to failure. | | > This is not apples to apples but Whatsapp | | And yeah, whatsapp isn't even close to an apt comparison. | It's a completely different business model with vastly | different engineering requirements. | | Is Twitter bloated? Perhaps, but it's probably driven by | business reasons, not (just) because engineers just wanted | to make a bunch of toys and promo projects (though this | obviously always plays some role). | [deleted] | ricardobeat wrote: | And for the most part, this herculean effort is wasted. | Most people just want to see the latest tweets from people | they follow. Everything else is fluff to manipulate | engagement metrics, pad resumes and attempt to turn | twitter into something its users never wanted. | veec_cas_tant wrote: | Just guessing, but a lot of the resources are probably | devoted to making money for the business, not padding | resumes. Others have pointed it out, but showing tweets | doesn't generate revenue without additional | infrastructure.
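A toy sketch of the trade-off peterhunt mentions above: fan-out on write pays O(followers) at post time so that reading a home timeline is a single lookup, while fan-out on read keeps posting cheap and merges followees' tweets at read time. The Rust below is purely illustrative (hypothetical names, integer ids standing in for tweets), not code from the article or from Twitter:

    use std::collections::HashMap;

    // Fan-out on write: pay O(#followers) when a tweet is posted,
    // so reading a home timeline is a single map lookup.
    fn post_fanout_on_write(
        timelines: &mut HashMap<u64, Vec<u64>>, // user id -> precomputed timeline (tweet ids)
        followers: &HashMap<u64, Vec<u64>>,     // author id -> follower ids
        author: u64,
        tweet_id: u64,
    ) {
        for &f in followers.get(&author).map(|v| v.as_slice()).unwrap_or(&[]) {
            timelines.entry(f).or_default().push(tweet_id);
        }
    }

    // Fan-out on read: posting is O(1); a read merges the authored
    // lists of everyone the user follows.
    fn read_fanout_on_read(
        tweets_by_author: &HashMap<u64, Vec<u64>>, // author id -> tweet ids, newest last
        following: &HashMap<u64, Vec<u64>>,        // user id -> ids they follow
        user: u64,
        limit: usize,
    ) -> Vec<u64> {
        let mut merged: Vec<u64> = following
            .get(&user)
            .into_iter()
            .flatten()
            .filter_map(|a| tweets_by_author.get(a))
            .flatten()
            .copied()
            .collect();
        merged.sort_unstable_by(|a, b| b.cmp(a)); // assume a larger id is newer
        merged.truncate(limit);
        merged
    }

    fn main() {
        let mut timelines = HashMap::new();
        let followers = HashMap::from([(1u64, vec![2u64, 3])]);
        let following = HashMap::from([(2u64, vec![1u64])]);
        let tweets_by_author = HashMap::from([(1u64, vec![100u64, 101])]);

        post_fanout_on_write(&mut timelines, &followers, 1, 101);
        println!("push timeline for user 2: {:?}", timelines.get(&2));
        println!("pull timeline for user 2: {:?}",
                 read_fanout_on_read(&tweets_by_author, &following, 2, 10));
    }

Accounts with millions of followers are what make the write-time approach expensive: a single post becomes millions of timeline appends, which is part of why it can demand so much extra infrastructure.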
| sofixa wrote: | Most people probably follow more people than they're | capable of reading all the latest tweets of, so some sort | of ranking/prioritisation makes total sense. And Twitter | is ad funded, so they need to also show relevant ads | where it makes sense/money. | threadweaver34 wrote: | Whatsapp doesn't do ranking. | BeefWellington wrote: | I'm going to preface this criticism by saying that I think | exercises like this are fun in an architectural/prototyping code- | golf kinda way. | | However, I think the author critically under-guesses the sizes of | things (even just for storage) by a reasonably substantial | amount. e.g.: Quote tweets do not go against the size limit of | the tweet field at Twitter. Likely they are embedding a tweet | reference in some manner or other in place of the text of the | quoted tweet itself but regardless a tweet takes up more than 280 | unicode characters. | | Also, nowhere in the article are hashtags mentioned. For a system | like this to work you need some indexing of hashtags so you | aren't doing a full scan of the entire tweet text of every tweet | anytime someone decides to search for #YOLO. The system as | proposed is missing a highly critical feature of the platform it | purports to emulate. I have no insider knowledge but I suspect | that index is maybe the second largest thing on disk on the | entire platform, apart from the tweets themselves. | trishume wrote: | Quote tweets I'd do as a reference and they'd basically have | the cost of loading 2 tweets instead of one, so increasing the | delivery rate by the fraction of tweets that are quote tweets. | | Hashtags are a search feature and basically need the same | posting lists as for search, but if you only support hashtags | the posting lists are smaller. I already have an estimate | saying probably search wouldn't fit. But I think hashtag-only | search might fit, mainly because my impression is people doing | hashtag searches are a small fraction of traffic nowadays so | the main cost is disk, not sure though. | | I did run the post by 5 ex-Twitter engineers and none of them | said any of my estimates were super wrong, mainly just brought | up additional features and things I didn't discuss (which I | edited into the post before publishing). Still possible that | they just didn't divulge or didn't know some number they knew | that I estimated very wrong. | mr90210 wrote: | > I did run the post by 5 ex-Twitter engineers and none of | them said any of my estimates were super wrong | | Absence of evidence is not evidence of absence. That being | said, given that you have access to ex-Twitter engineers you | may want to fact check with them for accuracy purposes, or | just add a remark about this topic under the assumptions | section. | | It's ok to assume as long as we document those. | | Cheers | Aeolun wrote: | I'm not sure why you are asking him to do the thing you've | literally quoted him doing? | literallyroy wrote: | I believed he was saying the author should run his idea | of indexes on disk taking up a lot of space by the | engineers. | 867-5309 wrote: | >Absence of evidence is not evidence of absence. | | >possible that they just didn't divulge | lelandfe wrote: | If an inspector reviews your house and finds no issues, | that is indeed evidence of absence. | drewg123 wrote: | How much bandwidth does Twitter use for images and videos? Less | than 1.4Tb/s globally? If so, we could probably fit that onto a | second machine. 
We can currently serve over 700Gb/s from a dual- | socket Milan based server[1]. I'm still waiting for hardware, but | assuming there are no new bottlenecks, that should directly scale | up to 1.4Tb/s with Genoa and ConnectX-7, given the IO pathways | are all at least twice the bandwidth of the previous generation. | | There are storage size issues (like how big is their long tail; | quite large I'd imagine), but its a fun thing to think about. | | [1] https://people.freebsd.org/~gallatin/talks/euro2022.pdf | cortesoft wrote: | It is way more than 1.4TBs a second globally. | xyzzy123 wrote: | I wonder how much is api traffic and how much is assets & | images. | koolba wrote: | I wonder how much of that is crypto spam bots replying to | each other. | JosephRedfern wrote: | I suppose that in practice you'd need to consider burst | bandwidth and not just 95/99 percentiles. | britneybitch wrote: | > colo cost + total server cost/(3 year) => $18,471/year | | Meanwhile the company I just left was spending more than this for | dozens of kubernetes clusters on AWS before signing a single | customer. Sometimes I wonder what I'm still doing in this | industry. | ummonk wrote: | Was it more than the salary of even a single software engineer? | tluyben2 wrote: | Very often. I work with companies spending 10x as much on bad | (bizarrely complex; indeed kubernetes, lambda, gateway, rds | etc) setup and bad code on aws. Almost no traffic (b2b). Makes | no sense at all. | threeseed wrote: | If they were a startup like you suggest then it's possible they | were running on AWS credits. | | You can get up to $100k and it's a big reason many startups go | in that direction. | | Also $20k is nothing when you factor in developer time etc. | Existenceblinks wrote: | Techies in tech industry are basically eating the rich .. A lot | of buzzword to suck investment money in. | paulryanrogers wrote: | Or rather cloud providers are eating the rich. Techies are | carrying the plates. | [deleted] | traceroute66 wrote: | > spending more than this for dozens of kubernetes clusters on | AWS before signing a single customer | | Yup. | | Cloud is 21st century Nickel & Diming. | | Sure _it sounds_ cheap, everything is priced in small sounding | cents per unit. | | But then it _very quickly_ becomes a compounding vicious circle | ... a dozen different cloud tools, each charged at cents per | unit, those units often being measured in increments of | hours....next thing you know is your cloud bill has as many | zeros on the end of it as the number of cloud services you are | using. ;-) | | And that's before we start talking about the data egress costs. | | With colo you can start off with two 1/4 rack spaces at two | different sites for resilience. You can get quite a lot of bang | for your buck in a 1/4 rack with today's kit. | richwater wrote: | > You can get quite a lot of bang for your buck in a 1/4 rack | with today's kit. | | Until very recently, while money was still very cheap, the | time overhead it would take to manage this just was not worth | the cost savings. | | Even with the market falling out from under VC, I think it | still is a good tradeoff for many shops. | toast0 wrote: | > Until very recently, while money was still very cheap, | the time overhead it would take to manage this just was not | worth the cost savings. | | You can also rent a whole server. There's not much | difference in time in managing a VM in a cloud or a whole | server you rent from someone. 
Depending on the vendor, | maybe some more setup time, since low end hosts don't | usually have great setup workflows, so maybe you need to | fiddle with the ipmi console once or twice to get it | started, but if you go with a higher tier provider, you can | fully automate everything if that floats your boat. It's | just bare metal rather than a VM, and typically much lower | cost for sustained usage (if you're really scaling up | signfigantly and down throughout the day, cloud costs can | work out less, although some vendors offer bare metal by | the hour, too) | Tepix wrote: | The new EPYC servers can be filled with 6TB of RAM and 96 cores | per socket. Fun times. | wonnage wrote: | This doesn't seem to support fetching a specific tweet by id? | jasonhansel wrote: | If you really wanted to run Twitter on one machine at any cost, | wouldn't an IBM mainframe be much more practical? | | You can even run Linux on them now. The specs he cites would | actually be fairly small for a mainframe, which can reach up to | 40TB of memory. | | I'm not saying this is a _good_ idea, but it seems better than | what the OP proposes. | wmf wrote: | No. A Genoa server is probably faster than a z16 and a | Superdome Flex is definitely faster. | bob1029 wrote: | If it's good enough for the payment card industry, I don't know | why it can't work for tweets. The amount of data per | transaction is very similar. | trishume wrote: | My friend mentioned this just before I published and I think | that probably is the fastest largest thing you can get which | would in some sense count as one machine. I haven't looked into | it, but I wouldn't be surprised if they could get around the | trickiest constraint, which is how many hard drives you can | plug in to a non-mainframe machine for historical image | storage. Definitely more expensive than just networking a few | standard machines though. | | I also bet that mainframes have software solutions to a lot of | the multi-tenancy and fault tolerance challenges with running | systems on one machine that I mention. | jasonhansel wrote: | Incidentally, a lot of people have argued that the massive | datacenters used by e.g. AWS are effectively single large | ("warehouse-scale") computers. In a way, it seems that the | mainframe has been reinvented. | dekhn wrote: | I wouldn't really agree with this since those machines | don't share address spaces or directly attached busses. | Better to say it's a warehouse-scale "service" provided by | many machines which are aggregated in various ways. | sterlind wrote: | I wonder though.. _could_ you emulate a 20k-core VM with | 100 terabytes of RAM on a DC? | | Ethernet is fast, you might be able to get in range of | DRAM access with an RDMA setup. cache coherency would | require some kind of crazy locking, but maybe you could | do it with FPGAs attached to the RDMA controllers that | implement something like Raft? | | it'd be kind of pointless and crash the second any | machine in the cluster dies, but kind of a cool idea. | | it'd be fun to see what Task Manager would make of it if | you could get it to last long enough to boot Windows. | sterlind wrote: | to me the line between machine and cluster is mostly about | real-time and fate-sharing. multiple cores on a single | machine can expect memory accesses to succeed, caches to be | coherent, interrupts to trigger within a deadline, clocks | not to skew, cores in a CPU not to drop out, etc. | | in a cluster, communication isn't real-time. 
packets drop, | fetches fail, clocks skew, machines reboot. | | IPC is a gray area. the remote process might die, its | threads might be preempted, etc. RTOSes make IPC work more | like a single machine, while regular OSes make IPC more | like a network call. | | so to me, the datacenter-as-mainframe idea falls apart | because you need massive amounts of software infrastructure | to treat a cluster like a mainframe. you have to use Paxos | or Raft for serializing operations, you have to shard data | and handle failures, etc. etc. | | but it's definitely getting closer, thanks to lots of | distributed systems engineering. | sayrer wrote: | It's a neat thought exercise, but wrong for so many reasons | (there are probably like 100s). Some jump out: spam/abuse | detection, ad relevance, open graph web previews, promoted | tweets that don't appear in author timelines, blocks/mutes, | etc. This program is what people think Twitter is, but | there's a lot more to it. | | I think every big internet service uses user-space networking | where required, so that part isn't new. | trishume wrote: | I think I'm pretty careful to say that this is a simplified | version of Twitter. Of the features you list: | | - spam detection: I agree this is a reasonably core feature | and a good point. I think you could fit something here but | you'd have to architect your entire spam detection approach | around being able to fit, which is a pretty tricky | constraint and probably would make it perform worse than a | less constrained solution. Similar to ML timelines. | | - ad relevance: Not a core feature if your costs are low | enough. But see the ML estimates for how much throughput | A100s have at dot producting ML embeddings. | | - web previews: I'd do this by making it the client's | responsibility. You'd lose trustworthiness though so users | with hacked clients could make troll web previews, they can | already do that for a site they control, but not a general | site. | | - blocks/mutes: Not a concern for the main timeline other | than when using ML, when looking at replies will need to | fetch blocks/mutes and filter. Whether this costs too much | depends on how frequently people look at replies. | | I'm fully aware that real Twitter has bajillions of | features that I don't investigate, and you couldn't fit all | of them on one machine. Many of them make up such a small | fraction of load that you could still fit them. Others do | indeed pose challenges, but ones similar to features I'd | already discussed. | sayrer wrote: | "web previews: I'd do this by making it the client's | responsibility." | | Actually a good example of how difficult the problem is. | A very common attack is to switch a bit.ly link or | something like that to a malicious destination. You would | also DoS the hosts... as the Mastodon folks are | discovering (https://www.jwz.org/blog/2022/11/mastodon- | stampede/) | | For blocks/mutes, you have to account for retweets and | quotes, it's just not a fun problem. | | Shipping the product is much more difficult that what's | in your post. It's not realistic at all, but it is fun to | think about. | sayrer wrote: | Here are some pointers: | | "Our approach to blocking links" | https://help.twitter.com/en/safety-and-security/phishing- | spa... | | "The Infrastructure Behind Twitter: Scale" https://blog.t | witter.com/engineering/en_us/topics/infrastruc... 
| | "Mux" https://twitter.github.io/finagle/guide/Protocols.h | tml#mux | | I do agree that some of this could be done better a | decade later (like, using Rust for some things instead of | Scala), but it was all considered. A single machine is a | fun thing to think about, but not close to realistic. CPU | time was not usually the concern in designing these | systems. | mschuster91 wrote: | > I haven't looked into it, but I wouldn't be surprised if | they could get around the trickiest constraint, which is how | many hard drives you can plug in to a non-mainframe machine | for historical image storage. | | Netapp is at something > 300TB storage per node IIRC, but in | any case it would make more sense to use some cloud service. | AWS EFS and S3 don't have any (practically reachable) limit | in size. | toast0 wrote: | > I wouldn't be surprised if they could get around the | trickiest constraint, which is how many hard drives you can | plug in to a non-mainframe machine for historical image | storage. | | Some commodity machines use external SAS to connect to more | disk boxes. IMHO, there's not a real reason to keep images | and tweets on the same server if you're going to need an | external disk box anyway. Rather than getting a 4u server | with a lot of disks and a 4u additional disk box, you may as | well get 4u servers with a lot of disks each, use one for | tweets and the other for images. Anyway, images are fairly | easy to scale horizontally, there's not much simplicity | gained by having them all in one host, like there is for | tweets. | trishume wrote: | Yah like I say in the post, the exactly one machine thing | is just for fun and as an illustration of how far vertical | scaling can go, practically I'd definitely scale storage | with many sharded smaller storage servers. | jiggawatts wrote: | > which is how many hard drives you can plug in to a non- | mainframe machine for historical image storage. | | You would be _surprised_. First off, SSDs are denser than | hard drives now if you 're willing to spend $$$. | | Second, "plug in" doesn't necessarily mean "in the chassis". | You can expand storage with external disk arrays in all sorts | of ways. Everything from external PCI-e cages to SAS disk | arrays, fibre channel, NVMe-over-Ethernet, etc... | | It's fairly easy to get several petabytes of fast storage | directly managed by one box. The only limit is the total | usable PCIe bandwidth of the CPUs, which for a current-gen | EPYC 9004 series processors in a dual-socket configuration is | something crazy like 512 GB/s. This vastly exceeds typical | NIC speeds. You'd have to balance available bandwidth between | _multiple_ 400 Gbps NICs and disks to be able to saturate the | system. | | People really overestimate the data volume put out by a | service like Twitter while simultaneously underestimating the | bandwidth capability of a single server. | justapassenger wrote: | Saying this is production Twitter is like saying that rsync is a | Dropbox. | pengaru wrote: | This post reminds me of an experience I had in ~2005 while @ | Hostway. | | Unsolicited story time: | | Prior to my joining the company Hostway had transitioned from | handling all email in a dispersed fashion across the shared | hosting Linux boxes with sendmail et al, to a centralized | "cluster" having disparate horizontally-scaled slices of edge- | SMTP servers, delivery servers, POP3 servers, IMAP servers, and | spam scanners. That seemed to be their scaling plan anyways. 
| | In the middle of this cluster sat a refrigerator sized EMC | fileserver for storing the Maildirs. I forget the exact model, | but it was quite expensive and exotic for the time, especially | for an otherwise run of the mill commodity-PC based hosting | company. It was a big shiny expensive black box, and everyone | involved seemed to assume it would Just Work and they could keep | adding more edge-SMTP/POP/IMAP or delivery servers if those | respective services became resource constrained. | | At some point a pile of additional customers were migrated into | this cluster, through an acquisition if memory serves, and things | started getting slow/unstable. So they go add more machines to | the cluster, and the situation just gets worse. | | Eventually it got to where every Monday was known as Monday | Morning Mail Madness, because all weekend nobody would read their | mail. Then come Monday, there's this big accumulation of new | unread messages that now needs to be downloaded and either | archived or deleted. | | The more servers they added the more NFS clients they added, and | this just increased the ops/sec experienced at the EMC. Instead | of improving things they were basically DDoSing their overpriced | NFS server by trying to shove more iops down its throat at once. | | Furthermore, by executing delivery and POP3+IMAP services on | separate machines, they were preventing any sharing of buffer | caches across these embarrassingly cache-friendly when colocated | services. When the delivery servers wrote emails through to the | EMC, the emails were also hanging around locally in RAM, and | these machines had several gigabytes of RAM - only to _never_ be | read from. Then when customers would check their mail, the POP3 | /IMAP servers _always_ needed to hit the EMC to access new | messages, data that was _probably_ sitting uselessly in a | delivery server 's RAM somewhere. | | None of this was under my team's purview at the time, but when | the castle is burning down every Monday, it becomes an all hands | on deck situation. | | When I ran the rough numbers of what was actually being performed | in terms of the amount of real data being delivered and | retrieved, it was a trivial amount for a moderately beefy PC to | handle at the time. | | So it seemed like the obvious thing to do was simply colocate the | primary services accessing the EMC so they could actually profit | from the buffer cache, and shut off most of the cluster. At the | time this was POP3 and delivery (smtpd), luckily IMAP hadn't | taken off yet. | | The main barrier to doing this all with one machine was the | amount of RAM required, because all the services were built upon | classical UNIX style multi-process implementations (courier-pop | and courier-smtp IIRC). So in essence the main reason most of | this cluster existed was just to have enough RAM for running | multiprocess POP and SMTP sessions. | | What followed was a kamikaze-style developed-in-production | conversion of courier-pop and courier-smtp to use pthreads | instead of processes by yours truly. After a week or so of | sleepless nights we had all the cluster's POP3 and delivery | running on a single box with a hot spare. Within a month or so | IIRC we had powered down most of the cluster, leaving just spam | scanning and edge-SMTP stuff for horizontal scaling, since it | didn't touch the EMC. Eventually even the EMC was powered down, | in favor of drbd+nfs on more commodity linux boxes w/coraid. 
| | According to my old notes it was a Dell 2850 w/8GB we ended up | with for the POP3+delivery server and identical hot spare, | replacing _racks_ of comparable machines just with less RAM. | >300,000 email accounts. | ricardobeat wrote: | > super high performance tiering RAM+NVMe buffer managers which | can access the RAM-cached pages almost as fast as a normal memory | access are mostly only detailed and benchmarked in academic | papers | | Isn't this exactly what modern key value stores like RocksDB, | LMDB etc are built for? | kureikain wrote: | Not to the extreme of fitting everything into one machine, but I | have explored the idea of separating the stateless workload onto | its own machine. | | However, the stateless workload can still operate in a read-only | manner if the stateful component fails. | | I run an email forwarding service[1], and one of the challenges is | how I can ensure the email forwarding still works even if my primary | database fails. | | And I came up with a design where the app boots up, loads the entire | routing data from my postgres into an in-memory data structure, and | persists it to local storage. So if the postgres database fails, as | long as I have an instance of the app (which I can run as many of as | I want), the system continues to work for existing customers. | | The app uses listen/notify to load new data from postgres into its | memory. | | Not exactly the same concept as the article, but the idea is that | we try to design the system in a way where it can operate fully | on a single machine. Another cool thing is that it's easier to | test this: instead of loading data from Postgres, it can load | from config files, so essentially the core biz logic is isolated | onto a single machine. | | --- | | https://mailwip.com | z3t4 wrote: | A Twitter clone could probably run in a teenager's closet, but not | after it has been iterated by 10000 monkeys. | morphle wrote: | Why not a single FPGA with 100Gbps ethernet or pcie with NVM | attached? Around $5K for the hardware and $5K for the traffic per | month. The software would be a bit trickier to write, but you now | get 100x performance for the same price. | tluyben2 wrote: | That would be quite a nice project for fun. | mpoteat wrote: | Let's spend multiple millions of dollars a year on a team of highly | specialized FPGA engineers writing assembly and HDL so that we | can save 5k a month. Feature velocity will be 100x slower as | well, but at least our application is efficient. | | I think that this may make sense for some applications, but I | also think that if you can utilize software abstractions to | improve developer efficiency, it reduces risk in the long run. | tylerhou wrote: | A bit trickier is a huge understatement. | sethev wrote: | John Carmack tweeted something that made me noodle on this too: | | >It is amusing to consider how much of the world you could serve | something like Twitter to from a single beefy server if it really | was just shuffling tweet sized buffers to network offload cards. | Smart clients instead of web pages could make a very large | difference. [1] | | Very interesting to see the idea worked out in more detail. | | [1] https://twitter.com/id_aa_carmack/status/1350672098029694998 | varjag wrote: | Isn't that what an OPA sorta kinda does. | threeseed wrote: | > just shuffling tweet sized buffers to network offload cards | | Except that's not what it is doing at all. | | It assembles all the Tweets internally, applies an ML model to | produce a finalised response to the user.
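A rough sketch of the pattern kureikain describes: hold the routing table in memory, mirror it to a local snapshot file, and keep answering lookups from that snapshot if the primary database is unreachable. Everything here is hypothetical — the Postgres LISTEN/NOTIFY trigger is replaced by a stubbed fetch_from_primary — so read it as an illustration of the idea, not the service's actual code:

    use std::collections::HashMap;
    use std::fs;
    use std::io;

    // Routing table: alias -> destination. Held in memory and mirrored to a
    // local snapshot file, so the process can boot and keep serving lookups
    // even when the primary database is unreachable.
    struct RouteCache {
        routes: HashMap<String, String>,
        snapshot_path: String,
    }

    impl RouteCache {
        // Boot path: prefer fresh data from the primary, fall back to the snapshot.
        fn load(snapshot_path: &str) -> io::Result<Self> {
            let routes = match fetch_from_primary() {
                Ok(r) => r,
                Err(_) => parse_snapshot(&fs::read_to_string(snapshot_path)?),
            };
            Ok(Self { routes, snapshot_path: snapshot_path.to_string() })
        }

        fn lookup(&self, alias: &str) -> Option<&String> {
            self.routes.get(alias)
        }

        // Called whenever the primary signals a change (the real service would
        // hook this to LISTEN/NOTIFY); on success, persist the new state locally.
        fn refresh(&mut self) -> io::Result<()> {
            if let Ok(r) = fetch_from_primary() {
                self.routes = r;
                fs::write(&self.snapshot_path, serialize(&self.routes))?;
            }
            Ok(())
        }
    }

    // Stand-in for a query against the primary database (hypothetical).
    fn fetch_from_primary() -> Result<HashMap<String, String>, ()> {
        Err(()) // pretend the primary is down; the snapshot keeps us serving
    }

    fn serialize(routes: &HashMap<String, String>) -> String {
        routes.iter().map(|(k, v)| format!("{k}\t{v}\n")).collect()
    }

    fn parse_snapshot(text: &str) -> HashMap<String, String> {
        text.lines()
            .filter_map(|l| l.split_once('\t'))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect()
    }

    fn main() -> io::Result<()> {
        fs::write("routes.snapshot", "hello@example.com\tinbox@forward.test\n")?;
        let mut cache = RouteCache::load("routes.snapshot")?;
        cache.refresh()?; // no-op here because the stubbed "primary" is down
        println!("{:?}", cache.lookup("hello@example.com"));
        Ok(())
    }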
| seritools wrote: | > if it really was [which it isn't] | andrewstuart wrote: | Most projects I encounter these days instantly reach for | kubernetes, containers and microservices or cloud functions. | | I find it much more appealing to just make the whole thing run on | one fast machine. When you suggest this, people tend to say "but | scaling!", without understanding how much capacity there is in | vertical. | | The thing most appealing about single server configs is the | simplicity. The simpler a system is, the more reliable and easy to | understand it's likely to be. | | The software most people are building these days can easily | run lock, stock and barrel on one machine. | | I wrote a prototype for an in-memory message queue in Rust and | ran it on the fastest EC2 instance I could and it was able to | process nearly 8 million messages a second. | | You could be forgiven for believing the only way to write | software is as a giant kablooie of containers, microservices, | cloud functions and kubernetes, because that's what the cloud | vendors want you to do, and it's also because it seems to be the | primary approach discussed. Every layer of such stuff adds | complexity, development, devops, maintenance, support, | deployment, testing and (un)reliability. Single server systems | can be dramatically more simple because you can trim it as close | as possible down to just the code and the storage. | traceroute66 wrote: | > I find it much more appealing to just make the whole thing | run on one fast machine. | | Indeed. | | Lots of examples out there, one being Let's Encrypt[1] who run | off one MySQL server (with a few read replicas but only one | write). | | [1] https://letsencrypt.org/2021/01/21/next-gen-database- | servers... | anamexis wrote: | That doesn't really seem like an example, since the whole | thing doesn't run on one machine. The database alone has | multiple machines. | traceroute66 wrote: | > That doesn't really seem like an example, since the whole | thing doesn't run on one machine. | | It is an example. It shows you how you can run a service | that issues a few hundred million SSL certs a year off | relatively few pieces of hardware, i.e. no need to go | drinking the cloud Kool aid. | | There will never be a "perfect" example. The overall point | here is demonstrating that the first answer to everything | doesn't have to include the word "cloud". | | > The database alone has multiple machines. | | As I said, and the blog says ... there is only one writer. | The other nodes are smaller read replicas. | | Which again shows you don't need to go with the cloud | buzzword-filled database services. | erik wrote: | Last I read, Hacker News was still running on one big | machine. And still uses text files as its database. | threeseed wrote: | Twitter runs ads and generates billions in revenue. | | It can't just tolerate being down or having under-load | issues like HN often does. | threeseed wrote: | > there is only one writer | | And what happens if that writer goes down? Then the | service just stops. | | > buzzword-filled | | I love how your buzzwords e.g. read replicas are okay but | everyone else's are bad. | tambourine_man wrote: | >... without understanding how much capacity there is in | horizontal. | | I think you mean vertical, right? | andrewstuart wrote: | Ha ha yes I do!
(corrected) | tambourine_man wrote: | Freudian slip :) | vasco wrote: | Kubernetes is useful if you have many teams working on things | in parallel and you want them to deploy in similar ways to not | have to reinvent the same wheel in 5 different ways by 5 | different teams. If you don't have multiple teams, you don't | need it. | jupp0r wrote: | It's also useful if you want your app to update without being | down, which even a single team might want to do. | dinosaurdynasty wrote: | You don't need k8s for that, teams have been doing that for | decades before k8s was ever a thing. | jupp0r wrote: | Sure, but whatever they built themselves to accomplish | this is also complicated. I know because I have built | such systems (and replaced them with k8s). | erulabs wrote: | Hah, exactly. It's not that you can't accomplish all the | same things as k8s with your own bash scripts - it's that | k8s exists _to replace all your custom bash scripts_! | jupp0r wrote: | But you can still land on top of HN with an "I replaced | kubernetes with this 500 line bash script without unit | tests" blog article ;) | vbezhenar wrote: | Projects are optimized to be developed by so-called ordinary | developers. | | We have a python service which consumes gigabytes of RAM for | quite a simple task. I'm sure that I'd rewrite it with Rust to | consume tens of megabytes of RAM at most. Probably even less. | | But I don't have time for that, there are more important things | to consider and gigabytes is not that bad. Especially when you | have some hardware elasticity with cloud resources. | | I think that if you can develop world-scale twitter which could | run on a single computer, that's a great skill. But it's a rare | skill. It's safer to develop world-scale twitter which will run | on Kubernetes and will not require rare developer skills. | yodsanklai wrote: | > The thing most appealing about single server configs is the | simplicity. The simpler a system is, the more reliable and easy | to understand it's likely to be. | | What if your unique machine crashes? | readonlybarbie wrote: | [dead] | andrewstuart wrote: | Well you gotta have a backup strategy. I'm talking about the | primary machine here, I assumed that would be obvious but | maybe not. You build your failover strategy into your | architecture - there's lots of ways to do it - I use Postgres | so I would favor something based around log shipping. | ffssffss wrote: | And uptime is important, so you want to have that secondary | running and ready, with a proxy in front of everything so | you can switch as soon as you detect a failure. That's | three hosts, plus your alerting has to be separate too, so | that's four. Now, to orchestrate all this, we'll first get | out Puppet... | toast0 wrote: | If you're going from one machine to two, and you add an | automatic failover mechanism, chances are your load | switching mechanism is going to cause more downtime than | just running from your single machine, and manually | switching on failure (after being paged). | pclmulqdq wrote: | You spin up a backup as the new "unique machine." | fbdab103 wrote: | I think you should always plan for failures, but modern | enterprise hardware is quite reliable. I would even posit | that if you stood up a brand new physical server today, it | has a good chance of beating AWS uptime (well, not the AWS | dashboard numbers) over a one year period. | jupp0r wrote: | "hardware is quite reliable" is not a valid strategy. | Hardware fails with some non-zero probability.
You need to | have a plan in place for what to do if that happens, taking | into account service disruption, backups etc. | | Having a system in place that handles most of this | gracefully (like kubernetes) is one way of having such a | plan; there are others. Which one works best is dependent | on your app, cost of downtime, your team that's tasked with | bringing everything back up in the middle of the night, | etc. | | People who leave details like this out when they say | "kubernetes is complicated" just haven't seen the | complexities of operating a service well. | fbdab103 wrote: | My first sentence was to always plan for failures. | jupp0r wrote: | Yeah sorry it wasn't meant to dispute your argument, just | as an addition. | pixl97 wrote: | It doesn't matter if hardware is 99.99% reliable if you're | the .01% that day | yazzku wrote: | If I remember correctly, Lichess runs on a single server. | brightball wrote: | I tend to agree on simplicity. Really just depends on whether | you can tolerate downtime for either outages or deployments. | | As soon as you start accounting for redundancy you have to fan | out anyway. | kbumsik wrote: | > The thing most appealing about single server configs is the | simplicity. | | In my experience this ended up more complicated. | | Those systems are typically developed by people who already | left and are undocumented, and it becomes extremely difficult | to figure out the config (packages, etc files... oh, where even | are the service files located?) and almost impossible to | reproduce. | | It might be okay to leave it there, but when we need to modify | or troubleshoot the system, a nightmare begins... | | Maybe I was just unlucky, but at least k8s configs are more | organized and simpler than dealing with a whole custom | configured Linux system. | Thaxll wrote: | Because the OP example is very simplistic and leaves very | important details on the table, would you base 250M users on a | single machine? What about backups, observability, how do you | update that stack without bringing down everything ... Also this | is napkin maths, this could be off by 10 or 100x which would | change everything. | | It's very simple to make a PoC on a very powerful machine; making | it ready for production, serving hundreds of millions of users, | is completely different. | raverbashing wrote: | > What about backups | | Several ways of doing this without relying on k8s | | > observability | | This doesn't require k8s either, and it's more on your app. | Systemd can restart services by itself | | > how do you update that stack without bringing down | everything | | That's probably where redundancy helps the most. I wouldn't | run a big service without it (but again it could be at server | level) | threeseed wrote: | > but again it could be at server level | | Can you educate us on how to have a resilient app with | no-downtime updates at the server level? | | Because if you're doing this via software then it's no | different to Kubernetes. | PragmaticPulp wrote: | It's worth noting that the author's example doesn't do | anything like HTTP. It was purely an algorithmic benchmark. | | Nobody should be looking at this and thinking that it's | realistic to actually serve a _functional website_ at this | scale on a single machine with actual real world | requirements. | mpoteat wrote: | As well, you should be creating regional servers to minimize | latency for folks in other geographic regions. Can't beat c! | sitkack wrote: | It is a BoE system design. How is it off by 100x?
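For mpoteat's "can't beat c" point, a quick back-of-the-envelope check. The figures are approximate: great-circle distances, and light in fibre at roughly two thirds of c (about 200 km per millisecond); real routes are longer and add switching delays on top.

    // Latency floor from geography alone. Distances are approximate
    // great-circle figures; light in fibre covers roughly 200 km per ms
    // (~2/3 of c), and real routes are longer than this.
    fn main() {
        let fiber_km_per_ms = 200.0;
        let routes = [
            ("New York -> London", 5_570.0_f64),
            ("New York -> Sydney", 15_990.0),
            ("Frankfurt -> Singapore", 10_260.0),
        ];
        for (name, km) in routes {
            let one_way_ms = km / fiber_km_per_ms;
            println!("{name}: >= {:.0} ms one-way, >= {:.0} ms round trip",
                     one_way_ms, 2.0 * one_way_ms);
        }
    }

So a single-site deployment can be fine on throughput and still hand far-away users a round-trip latency floor of well over 100 ms that no amount of server tuning can remove.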
| jupp0r wrote: | In addition, you should worry about what happens to your app | if a hardware error, network problem or natural disaster | makes your machine unavailable. | eddsh1994 wrote: | Split the DB from the app and replicate with a load | balancer? | threeseed wrote: | That database is going to need to be clustered as well | for resiliency. | | Sounds like you already have quite a number of different | containers. | jupp0r wrote: | Now you have "Production Twitter on three machines". | Follow the rabbit and you end up with something like what | Twitter looks like today. | pixl97 wrote: | Heh, also praying that everything stays in the fast path. | If a small portion of the workload uses a higher portion of | machine resources then the moment an attacker figures it | out they have a great way of DDOSing your service | resources. | [deleted] | fifilura wrote: | I don't think k8s is what you are shooting at but the IPC that | is required to run a set of microservices. | | Like Kafka. | | My impression is that it is the serialisation that comes with | each service-to-service communication that is really expensive. | judge2020 wrote: | Kubernetes and containers are a means to service architecture; | it enables scalability but does not require it. You should | still be containerizing your applications to ensure a | consistent environment, even if you only throw it in a | docker-compose file on your production server. | ownagefool wrote: | So the goal is basically being able to do builds whilst | running as few setup steps as possible. | | Containers are a good common denominator because you | essentially start with the OS, and then there's a file that | automates installing further dependencies and building the | artifact, which typically includes the important parts of the | runtime environment. | - They're stupidly popular, so it basically nullifies the | setup steps. | - Once set up, by combining both OS layers and app, they | solve more of the problem and are therefore slightly more | reliable. | - They're self-documenting as long as you understand bash, | docker, and don't do weird shit like build an undocumented | intermediary layer. | | Infrastructure as Code does the same thing for the underlying | infra layers and kubernetes is one of the nicer / quicker | implementations of this, but requires you have kubernetes | available. | | Together they largely solve the "works on my PC" problem. | KronisLV wrote: | > You should still be containerizing your applications to | ensure a consistent environment, even if you only throw it in | a docker-compose file on your production server. | | I'll say that this is a good point, especially because if you | don't use containers or a similar solution (even things like | shipping VM images, for all I care), you'll end up with | environment drift, unless your application is a statically | compiled executable with no system dependencies, like a | JDK/.NET/Python/Ruby runtime or worse yet, an application | server like Tomcat, all of which can have different versions. | Worse yet if you need to install packages on the system for | which you haven't pinned specific versions (e.g. needing | something that's installed through apt/yum, rather than | package.json or Gemfile, or requirements.txt and so on).
| | That said, even when you don't use containers, you can still | benefit from some pretty nice suggestions that will help make | the software you develop easier to manage and run: | https://12factor.net/ | | I'd also suggest that you have a single mechanism for | managing everything that you need to run, so if it's not | containers and an orchestrator of some sort, at least write | systemd services or an equivalent for every process or group | of processes that should be running. | | Disclaimer: I still think that containers are a good idea, | just because of how much of a dumpsterfire managing different | OSes, their packages, language runtimes, application | dependencies, application executables, port mappings, | application resource limits, configuration, logging and other | aspects is. Kubernetes, perhaps a bit less so, although when | it works, it gets the job done... passably. Then again, | Docker Swarm to me felt better for smaller deployments (a | better fit for what you want to do vs the resources you | have), whereas Nomad was also pretty nice, even if HCL sadly | doesn't use the Docker Compose specification. | vbezhenar wrote: | When it comes to Java, everything could be used as a | directory installation. Like you need JDK, maven and | tomcat? Download and extract it somewhere. Modify your | current PATH to include java and that's about it. You can | build big tar.gz instead of OCI container which will work | just fine. | | So IMO it's perfectly possible to run Java applications | without containers. You would need to think about network | ports, about resource limits, but those are not hard | things. | | And tomcat even provides zero-downtime upgrades, although | it's not that easy to set up, but when it works, it does | work. | | After I've got some experience with Kubernetes, I'd reach | for it always because it's very simple and easy to use. But | that requires to go through some learning curve, for sure. | | The best and unbeatable thing about containers is that | there're plenty of ready ones. I have no idea how would I | install postgres without apt. I guess I could download | binaries (where?), put them somewhere, read docs, craft | config file with data dir pointing to anotherwere and so | on. That should be doable but that's time. I can docker run | it in seconds and that's saved time. Another example is | ingress-nginx + cert-manager. It would take hours if not | days from me to craft set of scripts and configs to | replicate thing which is available almost out of the box in | k8s, well tested and just works. | KronisLV wrote: | > When it comes to Java, everything could be used as a | directory installation. Like you need JDK, maven and | tomcat? Download and extract it somewhere. Modify your | current PATH to include java and that's about it. You can | build big tar.gz instead of OCI container which will work | just fine. | | I've seen something similar in projects previously, it | never worked all that well. | | While the idea of shipping one archive with _everything_ | is pretty good, people don 't want to include the full | JDK and Tomcat installs with each software delivery, | unlike with containers, where you get _some_ benefit out | of layer re-use when they haven 't changed, while having | the confidence that what you tested is what you'll ship. | Shipping 100 app versions with the same JDK + Tomcat | version will mean reused layers instead of 100 copies in | the archives. 
And if you _don 't_ ship everything | together, but merely suggest that release X should run on | JDK version Y, the possibility of someone not following | those instructions at least once approaches 100% with | every next release. | | Furthermore, Tomcat typically will need custom | configuration for the app server, as well as | configuration for the actual apps. This means that you'd | need to store the configuration in a bunch of separate | files and then apply (copy) it on top of the newly | delivered version. But you can't really do that directly, | so you'd need to use something like Meld to compare | whether the newly shipped default configuration doesn't | include something that your old custom configuration | doesn't (e.g. something new in web.xml or server.xml). | The same applies to something like cacerts within your | JDK install, if you haven't bothered to set up custom | files separately. | | Worse yet, if people aren't really disciplined about all | of this, you'll end up with configuration drift over time | - where your dev environment will have configuration A, | your test environment will have configuration B (which | will _sort of_ be like A), and staging or prod will have | something else. You 'll be able to ignore some of those | differences until everything will go horribly wrong one | day, or maybe you'll get degraded performance but without | a clear reason for it. | | > So IMO it's perfectly possible to run Java applications | without containers. You would need to think about network | ports, about resource limits, but those are not hard | things. | | This is only viable/easy/not brittle when you have self- | contained .jar files, which admittedly are pretty nice! | Though if shipping JDK with each delivery isn't in the | cards (for example, because of the space considerations), | that's not safe either - I've seen performance degrade | 10x because of a JDK patch release was different between | two environments, all because of JDK being managed | through the system packages. | | Resource limits are generally doable, though Xms and Xmx | lie to you, you'd need systemd slices or an equivalent | for hard resource limits, which I haven't seen anyone | seriously bother with, although they're at a risk of the | entire server/VM becoming unresponsive should their | process go rogue for whatever reason (e.g. CPU at 100%, | which is arguably worse than OOM because of bad memory | limits). | | Ports are okay when you are actually in control of the | software and nothing is hardcoded. Then again, another | aspect is being able to run multiple versions of software | at the same time (e.g. different MySQL/MariaDB releases | for different services/projects on the same node), which | most nix distributions are pretty bad at. | | > And tomcat even provides zero-downtime upgrades, | although it's not that easy to set up, but when it works, | it does work. | | I've seen this attempted, but it never worked properly - | the codebases might not have been good, but those | redeployments and integrating with Tomcat always lead to | either memory leaks or odd cases of the app server | breaking. That's why personally I actually enjoy the | approach of killing the entire thing alongside the app | and doing a full restart (especially good with embedded | Tomcat/Jetty/Undertow), using health checks for routing | traffic instead. | | I think doing these things at the app server level is | generally just asking for headaches, though the idea of | being able to do so is nice. 
Then again, I don't see | servers like Payara (like GlassFish) in use anymore, so I | guess Spring Boot with embedded Tomcat largely won, in | combination with other tools. | | > After I've got some experience with Kubernetes, I'd | reach for it always because it's very simple and easy to | use. But that requires to go through some learning curve, | for sure. | | I wouldn't claim that Kubernetes is simple if you need to | run your own clusters, though projects like K3s, K0s and | MicroK8s are admittedly pretty close. | | > The best and unbeatable thing about containers is that | there're plenty of ready ones. I have no idea how would I | install postgres without apt. I guess I could download | binaries (where?), put them somewhere, read docs, craft | config file with data dir pointing somewhere else and so | on. That should be doable but that's time. I can docker | run it in seconds and that's saved time. Another example | is ingress-nginx + cert-manager. It would take hours if | not days from me to craft set of scripts and configs to | replicate thing which is available almost out of the box | in k8s, well tested and just works. | | This is definitely a benefit! | | Though for my personal needs, I build most (funnily | enough, excluding databases, but that's mostly because | I'm lazy) of my own containers from a common Ubuntu base. | Because of layer reuse, I don't even need tricks like | copying files directly, but can use the OS package | manager (though clean up package cache afterwards) and | pretty approachable configuration methods: | https://blog.kronis.dev/articles/using-ubuntu-as-the- | base-fo... | | In addition, my ingress is just a containerized instance | of Apache running on my nodes, with Docker Swarm instead | of Kubernetes: https://blog.kronis.dev/tutorials/how-and- | why-to-use-apache-... In my case, the distinction between | the web server running inside of a container and outside | of a container is minimal, with the exception that Docker | takes care of service discovery for me, which is | delightfully simple. | | I won't say that the ingress abstraction in Kubernetes | isn't nice, though you can occasionally run into | configurations which aren't as easy as they should be: | e.g. configuring Apache/Nginx/Caddy/Traefik certs, which | has numerous tutorials and examples online, vs trying to | feed your wildcard TLS cert into a Traefik ingress, with | all of the configuration so that your K3s cluster would | use it as the default certificate for the apps you want | to expose. Not that other ingresses aren't great (e.g. | Nginx), it's just that you're buying into additional | complexity and I've personally also had cases where | removing and re-adding it hangs because of some resource | cleanup in Kubernetes failing to complete. | | I guess what I'm saying is that it's nice to use | containers for whatever the strong parts are (for | example, the bit about being able to run things easily), | though ideally without ending up with an abstraction that | might eventually become leaky (e.g. using lots of Helm | charts that have lots of complexity hiding under the | hood). Just this week I had CI deploys starting to | randomly fail because some of the cluster's certificates | had expired and kubectl connections wouldn't work. A | restart of the cluster systemd services helped make | everything rotate, but that's another thing to think | about, which otherwise wouldn't be a concern.
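(A minimal sketch of the "common Ubuntu base" approach described above - the package name and paths are placeholders; the point is just using the distro package manager and cleaning the apt cache so the layer stays small.)

      FROM ubuntu:22.04
      # install via the OS package manager, then clean up the package cache
      RUN apt-get update \
       && apt-get install -y --no-install-recommends openjdk-17-jre-headless \
       && rm -rf /var/lib/apt/lists/*
      COPY app.jar /opt/app/app.jar
      CMD ["java", "-jar", "/opt/app/app.jar"]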
| andrewstuart wrote: | I don't even use containers - I aim primarily for simplicity | and so far I have found I am able to build entire | sophisticated systems without a single container. Containers | I find make things much more complex. | John23832 wrote: | That just means you don't know how to architect a machine | with containers (and if you're effective in what you do, | that's ok). | | But it's a pretty objective observation that manually scaled | single machines don't scale as well as automation. | fragmede wrote: | Are you at least using Ansible or Chef or something? | TillE wrote: | Docker can be super simple. Like, if I want to run a Python | service, that's just a few lines in a Dockerfile and a | docker-compose.yml stub. Then I can trivially deploy that | anywhere. | paulryanrogers wrote: | Simple to start. Yet it is more complex at run time, | which can complicate (or simplify) debugging, depending | on the problem. | judge2020 wrote: | Observability in production is where APM solutions like | Datadog, Elastic, and Sentry come in; you can go from | just logging errors all the way up to continuously | profiling your application and beaming log files to them | to correlate with metrics and database query timings. | | If you're just doing a simple application, Sentry really | is the way to go, while Datadog and ELK are agent-based | and more intended for complex setups and big enterprises | (especially in their pricing structure/infra costs). | tluyben2 wrote: | By far most indeed. And if you want failover, just run two | machines. More fun (for me) also than tying together all that | complexity. | John23832 wrote: | Kubernetes just orchestrates containers. You can still run | beefy machines and scale (if necessary) accordingly. | | If anything, Kubernetes allows you to save cost by going with a | scalable number of small, inexpensive, fully utilized machines, | vs one large, expensive, underused one. | sitkack wrote: | I would wager that the majority of users of k8s do so on a | cloud where they could provision VMs of the proper size to | begin with. The utilization argument is specious. | pikdum wrote: | I go with k8s even on a single server nowadays, it just makes | everything so much more convenient. | | https://k3s.io/ makes it really easy to set up, too. | vbezhenar wrote: | I never tried k3s, but what's wrong with kubeadm? I think | that's literally two commands to run single server k8s: | kubeadm init and kubectl taint something. | | The only thing bad about single server kubernetes is that | it'll eat like 1-2 GB of RAM by itself. When your whole server | could be 256 MB, that's a lot of wasted RAM. | pikdum wrote: | Nothing wrong with kubeadm, but k3s should be a bit more | lightweight. | samsquire wrote: | I recommend this table of latency figures for any software | engineer: | | https://gist.github.com/jboner/2841832 | | Essentially IO is expensive except within a datacenter, but even | in a data center you can do a lot of loop iterations in a hot | loop in the time it takes to ask a server for something. | | There is a whitepaper which talks about the raw throughput and | performance of single core systems outperforming scalable | systems. These should be required reading for those developing | distributed systems.
| | http://www.frankmcsherry.org/assets/COST.pdf A summary: | http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/ | eatonphil wrote: | This is a great exercise in napkin math, even with constraints | you've set for yourself that don't fully approximate Twitter | (yet). Thanks! | habibur wrote: | He will be in for a surprise. | | HTTP with connection: keep-alive can serve 100k req/sec. But | that's for one client being served repeatedly over 1 connection. | And this is the inflated number that's published in webserver | benchmark tests. | | For a more practical, down-to-earth test, you need to measure | performance w/o keep-alive. Requests per second will drop to | 12k/sec then. | | And that's for HTTP without encryption or SSL handshake. Use | HTTPS and watch it fall down to only 400 req / sec under load | test [ without connection: keep-alive ]. | | That's what I observed. | trishume wrote: | I agree most HTTP server benchmarks are highly misleading in | that way, and mention in my post how disappointed I am at the | lack of good benchmarks. I also agree that typical HTTP servers | would fall over at much lower new connection loads. | | I'm talking about a hypothetical HTTPS server that used | optimized kernel-bypass networking. Here's a kernel-bypass HTTP | server benchmarked doing 50k new connections per core second | while re-using nginx code: https://github.com/F-Stack/f-stack. | But I don't know of anyone who's done something similar with | HTTPS support. | sayrer wrote: | Userspace networking is pretty common. The chair of the IETF | even wrote one: https://github.com/NTAP/quant | | "Quant uses the warpcore zero-copy userspace UDP/IP stack, | which in addition to running on top of the standard Socket | API has support for the netmap fast packet I/O framework, as | well as the Particle and RIOT IoT stacks. Quant hence | supports traditional POSIX platforms (Linux, MacOS, FreeBSD, | etc.) as well as embedded systems." | lossolo wrote: | TLS handling would dominate your performance; kernel | bypassing would not help here unless you also did TLS | NIC offloading. You still need to process new TLS sessions | from the OP's example and they would dominate your HTTP | processing time (excluding application business logic | processing). | pixl97 wrote: | And I would say real life Twitter involves mostly cell phone | use where we see companies like Google try to push HTTP/3 to | deal with head of line issues on lossy connections. Serving | millions of hits per day on lossy networks is going to | leave you with massive numbers of connections that have been | abandoned but you don't know it yet. Or connections that are | behaving like they are tar pitted and running at bits per | second. | lossolo wrote: | > Use HTTPS and watch it fall down to only 400 req / sec under | load test [ without connection: keep-alive ]. | | I'm running about 2000 requests/s in one of my real-world | production systems. All of the requests are without keep-alive | and use TLS. They use about one core for TLS and HTTP | processing. | keewee7 wrote: | In the coming years we will probably see a lot of complicated | microservice architectures replaced by well-designed and | optimized Rust (and modern C++) monoliths that use simple | replication to scale horizontally. | pixl97 wrote: | Replication and simple never belong in the same sentence. DNS, | which is one of the simplest replication systems I know of, has | its own complex failure modes.
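(For anyone who wants to reproduce the keep-alive vs. per-connection comparison discussed above, a rough sketch using wrk; the host and flags are placeholders, and results will vary widely with hardware, TLS settings and session resumption.)

      # keep-alive (wrk's default): connections are reused across requests
      wrk -t4 -c200 -d30s https://example.test/

      # approximate "no keep-alive": ask the server to close after each response,
      # forcing a new TCP + TLS handshake per request
      wrk -t4 -c200 -d30s -H "Connection: Close" https://example.test/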
| Cyph0n wrote: | More like "barebones, in-memory, English-only Twitter clone on | one machine". | | Edit: Still a nice writeup! | Dylan16807 wrote: | What strikes you as English-only about it? | Cyph0n wrote: | My bad - not English, but ASCII. The assumed max tweet size | is in bytes rather than (UTF) characters. | trishume wrote: | I specifically assumed a max tweet size based on the | maximum number of UTF-8 bytes a tweet can contain (560), | with a link to an analysis of that, and discussion of how | you could optimize for the common case of tweets that | contain way fewer UTF-8 bytes than that. Everything in my | post assumes unicode. | Tepix wrote: | Did you consider URLs? They don't seem to count against | the size and can be very large indeed (like 4k) | modeless wrote: | URLs are shortened and the shortened size counts against | the tweet size. The URL shortener could be a totally | separate service that the core service never interacts | with at all. Though I think in real twitter URLs may be | partially expanded before tweets are sent to clients, so | if you wanted to maintain that then the core service | would need to interact with the URL shortener. | Cyph0n wrote: | Thanks for clarifying. I missed the max vs. average | analysis because I was focused on the text. Still, as | noted in the Rust code comment, the sample implementation | doesn't handle longer tweets. | Dylan16807 wrote: | That size in bytes is based on the max size in UTF-8 and | UTF-16. Codepoints below U+1100 are counted as one | "character" by twitter and will need at most 2 bytes. | Codepoints above it are counted as two "characters" by | twitter and will need at most 4 bytes. Therefore 560 bytes, | and it supports all languages. | | Side note, this is more pessimistic than it needs to be, if | you're willing to transcode. The larger codepoints fit into | 20-21 bits, and the smaller ones fit into 12-13 bits. | Cyph0n wrote: | I was referring to the comment in the Rust code, not the | analysis. | halfmatthalfcat wrote: | As in the instance will only be tuned to serve one language? | SilverBirch wrote: | I think one of the under-estimated interesting points of twitter | as a business is that this is the core. Yes, Twitter is 140 | characters, it's got "300m users" which is probably 5m real heavy | users. So yes, you could do a lot of "140 characters, a few | tweets per person, few million users" on very little hardware. | But that's why Twitter's a shit business! | | How much RAM did your advertising network need? Because _that_ is | what makes twitter a business! How are you building your | advertiser profiles? Where are you accounting for fast roll out | of a Snapchat/Instagram/BeReal/Tiktok equivalent? Oh look, your | 140 characters just turned into a few hundred megs of video that | you're going to transcode 16 different ways for QoS. Ruh Roh! | | How are your 1,000 engineers going to push their code to | production _on one machine_? | | Almost always the answer to "do more work" or "buy more machines" | is "buy more machines". | | All I'm saying is I'd change it to "Toy twitter on one machine" | not Production. | reacharavindh wrote: | The author claimed early on, and very clearly, that this was a | fun exercise of thought and engineering rather than saying | "Look this is how Twitter should be run". After all, this is | Hacker News. Such exercises, and engaging other hackers to pick | something out of there, is how we progress (and get our tickles).
| So, maybe instead think about how one could tackle the | advertising/indexing needs in a similar fashion (could it be | done in just another server? 5 more servers?). | kierank wrote: | This is as realistic as the moon rocket in my back garden. | fleddr wrote: | "Through intense digging I found a researcher who left a notebook | public including tweet counts from many years of Twitter's 10% | sampled "Decahose" API and discovered the surprising fact that | tweet rate today is around the same as or lower than 2013! Tweet | rate peaked in 2014 and then declined before reaching new peaks | in the pandemic. Elon recently tweeted the same 500M/day number | which matches the Decahose notebook and 2013 blog post, so this | seems to be true! Twitter's active users grew the whole time so I | think this reflects a shift from a "posting about your life to | your friends" platform to an algorithmic content-consumption | platform." | | I know it's not the core premise of the article, but this is very | interesting. | | I believe that 90% of tweets per day are retweets, which supports | the author's conclusion that Twitter is largely about reading and | amplifying others. | | That would leave 50 million "original" tweets per day, which you | should probably separate into main tweets and reply tweets. Then | there's bots and hardcore tweeters tweeting many times per day, | and you'll end up with a very sobering number of actual unique | tweeters writing original tweets. | | I'd say that number would be somewhere in the single digit | millions of people. Most of these tweets get zero engagement. | It's easy to verify this yourself. Just open up a bunch of rando | profiles in a thread and you'll notice a pattern. A symmetrical | number of followers and following, typically in the range of | 20-200. Individual tweets get no likes, no retweets, no replies, | nothing. Literally tweeting into the void. | | If you take away the zero-engagement tweets, you'll arrive at | what Twitter really is. A cultural network. Not a social network. | Not a network of participation. A network of cultural influencers | consisting of journalists, politicians, celebrities, companies | and a few witty ones that got lucky. That's all it is: some tens | of thousands of people tweeting and the rest leeching and | responding to it. | | You could argue that is true for every social network, but I just | think it's nowhere near this extreme. Twitter is also the only | "social" network that failed to (exponentially) grow in a period | that you might as well consider the golden age of social | networks. A spectacular failure. | | Musk bought garbage for top dollar. The interesting dynamic is | that many Twitter top dogs have an inflated status that cannot be | replicated elsewhere. They're kind of stuck. They achieved their | status with hot take dunks on others, but that tactic doesn't | really work on any other social network. | yodsanklai wrote: | > Musk bought garbage for top dollar | | Totally off topic here, but it could be that he just wants the | ability to amplify his own ideas. Also, why measure Twitter's | value (arbitrarily?) by number of unique tweets, rather than by | read tweets? ___________________________________________________________________ (page generated 2023-01-07 23:00 UTC)