[HN Gopher] Serving Netflix Video Traffic at 800Gb/s and Beyond ... ___________________________________________________________________ Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf] Author : ksec Score : 434 points Date : 2022-08-19 11:53 UTC (11 hours ago) (HTM) web link (nabstreamingsummit.com) (TXT) w3m dump (nabstreamingsummit.com) | walrus01 wrote: | Everything old is new again: Anyone remember seeing a 32-bit/33 | MHz PCI (not pci-x, not pci-e) card for SSL acceleration in the | late 1990s? It was totally a thing at one point in time when your | typical 1U rackmount server was single-core CPUs and quite weak | in overall math processing power. | | OpenBSD had support for them like 22 years ago. | | https://www.google.com/search?client=firefox-b-d&q=SSL+accel... | | Now we have TLS1.2/TLS1.3 offload getting built into the PCI-E | 4.0 100/200/400GbE (whatever speed) NIC. | lprd wrote: | I wonder if they are using something like truenas or just | interfacing directly with OpenZFS (assuming they use ZFS). | BonoboIO wrote: | It amazes me, that Netflix is capable of such top of the line | engineering things, but is for the love of god unable to stream | HD Content to my iPhone. Tried everything gigabit wifi, cellular | ... | | It is better for me to pirate their content, play it with plex | and be happy. I pay for Netflix and it is absurd. | | I think the best years are over for Netflix. The hard awakening | is here to make content that the users want and they are a | movie/tv content company, not primarily a ,,tech company". | Infinitesimus wrote: | Some ISPs throttle Netflix. Not sure your background but it | might be helpful to have more details about the type phone (I'd | expect a difference in 13 pro max vs an old 7) and ISP to see | if others have similar problems. | BonoboIO wrote: | iPhone XS iOS 15.4 Netflix App Updated A1 Telekom both mobile | and wifi. It does not even work at work with a different ISP. | | Netflix has all the bandwidth data and metrics, but this is | not working since ages. Maybe a more basic setup on their end | would bring better results. Focus more and delivery, not 10 | different UI versions, AB Tests, Batch Job Workflows and so | on. They Post on their engineering blog how they test | multiple TVs, multiple Profiles in encoding, great things, | but if the basics don't work ... well what is it good for. | | I think they lost their focus. | ksec wrote: | And this is still on ConnectX-6 Dx, with PCI-Gen 5 and | ConnectX-7, Netflix should be able to push for _1.6Tbps_ per box. | This will hopefully keep drewg123 and his team busy for another | year :P | dragontamer wrote: | At that point, RAM itself would likely be the bottleneck. | | But maybe DDR5 will come out by then and get this team busy | again lol. | wmf wrote: | Genoa does indeed have roughly double the memory bandwidth. | Moral_ wrote: | A lot of the reasons they've had to build most of this stuff | themselfs is because they decided for some reason to use freeBSD. | | The NUMA work they did, I remember being in a meeting with them | as a Linux Developer at Intel at the time. They bought NVMe | drives or were saying they were going to buy NVMe drives from | Intel which got them access to "on the ground" kernel developers | and CPU people from Intel. Instead of talking about NVMe they | spent the entire meeting asking us about howt the Linux kernel | handles NUMA and corner cases around memory and scheudling. 
If I | recall correctly I think they asked if we could help them | upstream BSD code for NVMe and NUMA. I think in that meeting | there was even some L9 or super high up NUMA CPU guy from | Hillsborough they some how convinced to join. | | The conversation and technical discussion was quite fun, but it | was sort of funny to us at the time they were having to do all | this work on the BSD kernel that was solved years ago for linux. | | Technical debt I guess. | ksec wrote: | Is NUMA a solved issue on Linux? Correct me if I am wrong but I | was under the impression it may be better handled under certain | conditions, but NUMA, the problem in itself is hardly solved. | cperciva wrote: | Netflix tried Linux. FreeBSD worked better. | dboreham wrote: | By some definition of better. | trasz wrote: | It worked faster. It's a common misconception among newbies | that "Linux has NUMA" automatically means it will use NUMA | properly in a given workload. What it actually means is you | _should_ be able to use existing functionality. Sometimes | you'll only need to configure it, sometimes you'll need to | reimplement it from the scratch, and doing that in FreeBSD | is easier because there's less bloat. | throw0101c wrote: | *At the time when they created the OCA project. | | If someone was going to do a similar comparison now the | results _could_ be different. | jeffbee wrote: | I still don't get the NUMA obsession here. It seems like they | could have saved a lot of effort and a huge number of | powerpoint slides by building a box with half of these | resources and no NUMA: one CPU socket with all the memory and | one PCIe root complex and all the disks and NICs attached | thereto. It would be half the size, draw half the power, and be | way easier to program. | Bluecobra wrote: | If you are buying servers at scale the costs will certainly | add up vs. buying two processors. If you buy single proc | servers, that is double the amount of chassis, rail kits, | power supplies, power cables, drives, iLO/iDRAC licenses, | etc. | dboreham wrote: | You can build motherboards with two or more completely | isolated sets of CPU and memory, that are physically | compatible with standard racks etc. | Bluecobra wrote: | Good point, I forgot about those. It would be interesting | to see if 1x PowerEdge C6525 with four single processor | nodes is cheaper than 2x Dell R7525 servers. The C6525 | does support dual processor, so it does seem a bit | wasteful to me. | muststopmyths wrote: | Can you buy non NUMA mainstream CPUs though? Honest question | because I'd love to be rid of that BS too | jeffbee wrote: | NUMA is an outcome of system configuration. You can make a | non-NUMA platform using any CPU. You just limit yourself to | 1 CPU socket. | | Here's a Facebook engineering blog post about how they left | NUMA behind. https://engineering.fb.com/2016/03/09/data- | center-engineerin... | Dylan16807 wrote: | > You can make a non-NUMA platform using any CPU. You | just limit yourself to 1 CPU socket. | | Well, not on Epyc generation 1. Those have four NUMA | segments in each socket. | | Also those Xeon Platinum 9200 processors Intel made as an | attention grab. | jeffbee wrote: | EPYC Naples wasn't good for much of anything though, so I | am trying to forget it. | drewg123 wrote: | This is a testbed to see what breaks at higher speed. Our | normal production platforms are indeed single socket and run | at 1/2 this speed. I've identified all kinds of unexpected | bottlenecks on this testbed, so it has been worth it. 
| | We invested in NUMA back when Intel was the only game in | town, and they refused to give enough IO and memory bandwidth | per-socket to scale to 200Gb/s. Then AMD EPYC came along. And | even though Naples was single-socket, you had to treat it as | NUMA to get performance out of it. With Rome and Milan, you | can run them in 1NPS mode and still get good performance, so | NUMA is used mainly for forward looking performance testbeds. | pclmulqdq wrote: | This is amazing work from the Netflix team. I'm looking forward | to 1.6 Tb/s in 4 years. | | It is interesting that this work is happening on FreeBSD, and | potentially with diverging implementations than Linux. Linux | programs seem to be moving towards userspace getting more power, | with things like io_uring and increasing use of frameworks like | DPDK/SPDK. This work is all about getting userspace out of the | way, with things like async sendfile and kernel TLS. That's | pretty neat! | mgerdts wrote: | PCIe Gen 5 drives look poised for wide availability next year | and NVIDIA has been demoing CX7 [1] which is also PCIe Gen 5. | Intel already has some Gen 5 chips and AMD looks like they will | follow soon [2]. Surely there will be other bumps, but I bet | they pull it off in way less than 4 years. | | 1. https://www.servethehome.com/nvidia-connectx-7-shown-at- | isc-... | | 2. https://wccftech.com/amd-epyc-7004-genoa-32-zen-4-core- | cpu-s... | the8472 wrote: | kTLS has been added to linux too including offload. It also has | p2p-dma, so in principle you can shovel the file directly from | NVMe to the NIC and have the NIC encrypt it, so it'll never | touch the CPU or main memory. But that only works on specific | hardware. | [deleted] | robocat wrote: | Memory is the cache for popular content. You couldn't serve | fast enough directly from NVMe. | | "~200GB/sec of memory bandwidth is needed to serve 800Gb/s" | and "16x Intel Gen4 x4 14TB NVME". So each NVMe drive would | need to serve 12.5GB/s which is more than the 8GB/s limit for | PCIe 4.0 x4. Also popular content would need to be on every | drive, drastically lowering the total content stored. | | Also see drewg's comment on this for a different reason: | https://news.ycombinator.com/item?id=32523509 | pclmulqdq wrote: | With HBM2 sapphire rapids chips, I assume you can actually | get there. There is probably an insane price premium for | them, though, so I wouldn't hold my breath. | mschuster91 wrote: | They... serve 800 gigabytes a second on _one single content | server_ , do I get that right? | plucas wrote: | Gigabits, I presume, so 100 GB/s. | Aissen wrote: | Almost, it's 800 gigabits. Still _a lot_. | la64710 wrote: | Great engineering but how does this 800Gb/s throughput achieved | translate downstream all the way to the consumers? I suspect | there may be switches and routers from ISPs and others that | Netflix do not control in between that will reduce the effective | throughout to the end user. | kkielhofner wrote: | ISP routers have been more-or-less indistinguishable from | switches for decades at this point. They're all "line rate" | which is to say that regardless of features, packet size, etc | they'll push traffic between interfaces at whatever the | physical link is capable of without breaking a sweat. | | In the case of Netflix it is in the ISPs best interest to let | them push as much traffic to their customer eyeballs as | possible. 
After all, it's much "easier" and cheaper to build | out your internal fabric and network (which you have to do | anyway for the traffic) than it is to buy and/or acquire | transit to "the internet" for this level of traffic. | vbernat wrote: | Modern routers are unable to do line-rate regardless of | packet size. See for example the Q100 ASIC from Cisco. Rated | for 10.8 Tbps, it is only able to achieve 6 Bpps [1]. So it | needs 200-byte packets to hit line-rate. However, as for | Netflix, this is not problem since they only push big | packets. | | [1]: https://xrdocs.io/8000/tutorials/8201-architecture- | performan... | kkielhofner wrote: | Wow, I've been out of this space for a while! Last I was | paying close attention to any of this 10G ports were new. | Glad I learned something from my old life today! | | I stand corrected on "always line rate all the time in any | circumstance" but by your math and my general point < 1 | Tbps from one of these appliances across multiple 100G | ports isn't problematic in the least from a hardware | standpoint - especially for the Netflix traffic pattern | with relatively full (if not max MTU) packets. | [deleted] | Cyph0n wrote: | On the contrary, even older routers can handle this load with | no sweat. Service provider-grade routers can handle 10 to 200 | Tbps depending on size. | Aeolun wrote: | But then it gets to my home and it's trashed down to | 100Mbit/s | rayiner wrote: | Of course--the fat backbone pipe is progressively split | into smaller pipes as it gets to your house. The internet | isn't a big truck. It's a series of tubes. | jeffbee wrote: | That would be more than enough to watch half a dozen | Netflix streams at the same time. | paxys wrote: | The 800 Gb/s isn't going to a single user. There are switches | and routers in the middle, sure, but they are all doing their | job, which is to split up traffic. The end user only needs ~8 | Mb/s for a 4K stream. | jiripospisil wrote: | Here's the accompanying video: | https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa... | antonio-ramadas wrote: | I found the same video on the website of the summit: | https://nabstreamingsummit.com/videos/2022vegas/ | | I'm on mobile and there does not seem to exist a direct link. | Search for: "Case Study: Serving Netflix Video Traffic at | 400Gb/s and Beyond" | Aissen wrote: | How do you deal with the higher power density of these servers | that needs to be put at the ISP locations ? Don't they have some | constraints for the open connect machines ? | Bluecobra wrote: | Delivering power is not the problem, cooling is. You can load | up a cabinet with four 60A PDU's (~50kW) but the challenge is | to cool all that hardware you packed in the cabinet. | Aissen wrote: | Yeah, I was including that in the budget (server fans), but | technically you're correct, DC cooling is powered separately. | kkielhofner wrote: | The Dell R7525 chassis is available with dual 800w power | supplies. General thinking for power supplies is that each | power supply is connected to completely separate power | distribution - independent cabling, battery backup, generators, | and points of entry to the facility. In many cases it's also | two different power grids. This is so that if one power source | fails anywhere the load can move over to the other power supply | without exceeding the power that can be delivered through a | single power supply or trip a breaker anywhere. Under normal | operating conditions each power supply is doing half the load. 
| | Additionally, the National Electric Code in the US specifies | that continuous load should not exceed 80% of given | circuit/breaker capacity. | | So with dual 800 power supplies at "max" 80% load that's "only" | 640 watts for one of these 2U servers. For 208V power that's | only 3 amps. High density (for sure) compared to the old days | but not as ridiculous as it may seem. | Aissen wrote: | You're right, it's not that much for 2U. But for this config, | I think they'd probably go for the 1400W power supplies: | | - 16 x 25W SSDs | | - 2 x 225W CPUs | | - On top of that, add RAM, cooling, etc. | | Honestly, it's still manageable. I doubt they'd put 10 of | those in a single rack (you'd need an ISP that would want | serve 2.2M subscribers in peak from a single location, not | necessarily desirable on their side); but if the site is | getting full, you'd feel the (power) pressure (slowly) | mounting. | kkielhofner wrote: | I didn't dig into the CPU config, etc but you're right | they'd probably go for the 1400W power supplies which is | 5.4 amps max at 208V. It's for an older config (based on | the other specs) but the current Netflix OpenConnect docs | call for 750w[0] which is more reasonable even for this | hardware configuration because no one really wants to | consistently run their power supplies at 80% (even in | branch loss) for obvious reasons. | | They absolutely wouldn't want to concentrate them. The | entire purpose is to reduce ISP network load and get as | close to the customer eyeballs as possible. I don't have | any experience with these but I imagine ISPs would install | them at their peering "hubs" in major cities - in my | experience the usual suspects like Chicago, NYC, Miami, | etc. | | [0] - https://openconnect.zendesk.com/hc/en- | us/articles/3600345383... | gmm1990 wrote: | Why use/make processors with numa if you have to go to all this | trouble not to use it? | 0x457 wrote: | Well, the point of NUMA is to allow you to do things like in | slides rather than - everyone suffers equally talking to North | Bridge. Fabric between NUMA nodes isn't the selling point - | fast and direct connection between CPU and other components is. | | Plus, not every workload is: read from disk -> encrypt -> send | to nic | 0x500x79 wrote: | Awesome feats of engineering here taking hardware and software | into account when designing the system for a holistic approach to | serving content as quickly as possible! | | The slide deck background though: At least half of the products | in the slide deck template are no longer on Netflix... | Joel_Mckay wrote: | How would this compare with 42 server slots running 100Gbps DRBD | in RAID 0? If I recall, it can pre-shard the data based on a | round-robin balancer. ;) | drewg123 wrote: | We don't consider solutions like DRBD that introduce inter- | dependencies between servers. Any CDN server has to be able to | fail and not take down our service. | ajross wrote: | So... the driver and device level seems happy here, but is anyone | else creeped out by "asynchronous sendfile()"? I mean, how do you | even specify that? You have a giant file you want dumped down the | pipe, so you call it, and... just walk away? How do you report | errors? What happens to all the data buffered if the other side | resets the connection? What happens if the connection just | stalls? | | In synchronous IO paradigms, this is all managed by the | application with whatever logic the app author wants to | implement. You can report the errors, ping monitoring, whatever. 
| | But with this async thing, what's the API for that? Do you have | to write kernel code that lives above the driver to implement | devops logic? How would one even document that? | | +1 for the technical wizardry, but seems like it's going to be a | long road from here to an "OS" feature that can be documented. | 0x457 wrote: | Here is announcement for the feature: | https://www.nginx.com/blog/nginx-and-netflix-contribute-new-... | | And here are the slides explaining it: | https://www.slideshare.net/facepalmtarbz2/new-sendfile-in-en... | | There are video of various talks by Gleb Smirnoff explaining | all this magic on YouTube. | | The feature is fully documented in `man 2 sendfile`, it was | part of the patch that did the work. | zackmorris wrote: | This was my thought too. I've been struggling with the concept | of "if you don't have anything nice to say, don't say anything | at all" lately, because I've been programming too long and just | see poison pills and better alternatives everywhere I look. | | But I believe that async is an anti-pattern. From the article: | * When an nginx worker is blocked, it cannot service other | requests * Solutions to prevent nginx from blocking like | aio or thread pools scale poorly | | Nothing against nginx (I use it all the time, it's great) but I | probably would have used a synchronous blocking approach. The | bottleneck there would be artificial limits on stuff like I/O | and the number of available sockets or processes. | | So.. why isn't anyone addressing these contrived limits of sync | blocking I/O at a fundamental level? We pretend that context | switching overhead is real, but it's not. It's an artifact of | poorly written kernels from 30+ years ago (especially in | Windows) where too many registers and too much thread state | must be saved while swapping threads. We're basically all | working around the fact that the big players have traditionally | dragged their feet on refactoring that latency. | | And that some of the more performant approaches like atomic | operations using compare and swap (CAS) on thread-safe queues | beat locks/mutexes/semaphores. And that content-addressable | memory with multiple busses or even network storage beats | vertical scaling optimizations. | | So I dunno, once again this feels like kind of a drink-the- | kool-aid article. If we had a better sync blocking foundation, | then a simple blocking shell script could serve video and this | whole PDF basically goes away. Rinse, repeat with most web | programming too, where miles-long async code becomes a single | deterministic blocking function that anyone can understand. | | I'm kind of reaching the point where I expect more from big | companies to fix the actual root causes that force these async | workarounds. I kind of gave up on stuff like that over the last | 10 years, so am behind the times on improvements to sync | blocking kernel code. I'd love to hear if anyone knows of an OS | that excels at that. | 0x457 wrote: | Slide 25 shows benchmark between "old" sendfile and "new" | sendfile: | | https://www.slideshare.net/facepalmtarbz2/new-sendfile-in- | en... | | > but I probably would have used a synchronous blocking | approach. | | Well, send a patch, then. | wmf wrote: | _I probably would have used a synchronous blocking approach_ | | Then Varnish is probably more your style. (A discussion | between phk and drewg would be fascinating to watch.) 
| | _We pretend that context switching overhead is real, but it | 's not._ | | This sounds crackpot to be honest. Linux has put a lot of | effort into optimizing context switching (that's why they | have NPTL instead of M:N) and I assume FreeBSD has as well. | | _...this whole PDF basically goes away_ | | Sync vs. async doesn't solve any of the NUMA or TLS issues | that this whole PDF is about. | drewg123 wrote: | This has been a feature upstream in FreeBSD for roughly 6 | years. | | If there is a connection RST, then the data buffered in the | kernel is released (either freed immediately, or put into the | page cache, depending on SF_NOCACHE). | | sendfile_iodone() is called for completion. If there is no | error, it marks the mbufs on the socket buffer holding the | pages that were recently brought in as ready, and pokes the TCP | stack to send them. If there was an error, it calls TCP's | pru_abort() function to tear down the connection and release | what's sitting on the socket buffer. See | https://github.com/freebsd/freebsd-src/blob/main/sys/kern/ke... | mrbonner wrote: | My work requires to deal with political crap day long to get | promoted to a staff role. I miss this kind of work. | ryanianian wrote: | > deal with political crap day long to get promoted to a staff | role | | That's largely what many staff+ engineers have to do, even in | otherwise healthy organizations. "Staff" isn't a glorified, | autonomous, and stress-free version of senior at most | companies. There's nothing wrong with staying at the senior | level indefinitely provided (1) the pay and other factors are | keeping up with your contributions and (2) the staff+ and | management folks are being effective umbrellas for the politics | and messy uninteresting details behind interesting problems | like this. | touisteur wrote: | Will be fun to see what can be done with pcie5 stuff and new 400g | NICs. Really amazed by the recent increase in bandwidth. Sfp56 | recently becoming 'mainstream' in datacenters with 200G | controlers at <1500 each, you can stuff 8 or 10 of those in your | server. And you get immediate x2 with next gen. If you can | offload some of the heavy work to one (or several) GPUs or these | FPGA accelerator boards (Alveo or more niche but also crazy | ReflexCES with eth-800G capability) you're really starting to get | a 'datacenter in a box' system. If compacity is important, the | next years are going to be very interesting. | jwmoz wrote: | I've wondered how they achieve it and it's so far beyond my | knowledge and skills, truly astounding. The level of expertise | and costs must be so high. | Aeolun wrote: | Spend a few years just thinking of how to optimize video | delivery and you'd be a lot closer to understanding :) | throw0101c wrote: | Previous discussions on 400Gb/s: | | * https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-... | | * https://news.ycombinator.com/item?id=28584738 | skynetv2 wrote: | This is amazing work. I cant help but state that we have been | doing these in HPC environments for at least 15 years - User | space networking, offloads, NUMA domains aware scheduling, jitter | reduction ... great to see it being put to good use in more | mainstream workloads. Goes to show - software is eating the | world. | drewg123 wrote: | I worked in HPC as well, and I have to point out emphatically | that this _IS NOT USERSPACE NETWORKING_. The TCP stack resides | in the kernel. The vast majority of our CPU time is system + | interrupt time. 
| phantomathkg wrote: | > Serve only static media files This part is weird to me. My | understanding is DRM lock the file at a per user level, so the | DRM encrypted chunk would be different from yours. Unless for all | bitrate, and for all streaming format, Netflix has already pre- | computed everything. Otherwise, there must be some sort of pre- | computation before it can be served over TLS. | wmf wrote: | That's not how DRM works. The content is encrypted once and | that key is sent to the client. The content key is probably | wrapped in some per-session key (which may be wrapped in a per- | user key wrapped in a per-device key or something). | paxys wrote: | I love technical content like this. Not only is it incredibly | interesting and informative, it also serves as a perfect | counterpoint to the popular " _why does Netflix need X thousand | engineers, I could build it in a weekend_ " sentiment that is | frequently brought up on forums like this one. | | Building software and productionalizing/scaling it are two very | different problems, and the latter is far more difficult. Running | a successful company always requires an unlimited number of very | smart people who are willing to get their hands dirty optimizing | every aspect of the product and business. Too many people today | think that programming starts and ends at pulling a dozen popular | libraries and making some API calls. | Aaronmacaron wrote: | I think the problem is that the "easy" parts of netflix such as | the UI or the recommendation engine seem like they were hacked | together over the weekend. Of course deploying and maintaining | something of the scale of netflix is incredibly hard. But if | they can afford thousands of engineers who optimize the | performance why can't they hire a few UI/UX engineers to fix | the godawful interface which is slightly different on every | device? I think this is where this sentiment stems from. | stackbutterflow wrote: | That's what puzzles me about Uber. I believe that behind the | scenes it does pretty complex things as explained many times | on HN, but it's the worst app I've ever used. UI and UX wise | it's so bad that if you told me it was a bootcamp graduation | project I'd have no problem believing you. | bradstewart wrote: | I honestly find Netflix's the easiest to navigate, by far. | | Hulu did that big redesign, and it's extremely pretty to look | at, but even after a few years of trying to use it, I _still_ | struggle to do anything other than "resume episode". Finding | the previous episode, list episodes, etc is always an | exercise in randomly clicking, swiping, long pressing, | waiting for loading bars, etc. | | One thing Netflix _really_ got right as well: the "Watch It | Again" section. So many times I want to rewatch the episode I | just "finished" (because either my wife finished a show when | I leave the room, the kids fell off the table, I fell asleep | or wasn't paying attention, etc), and every other platform | makes this extremely difficult to find. | | Back to Hulu--the only way I know how is the search feature, | which is a PITA with a remote. | Shaanie wrote: | I'm surprised you think Netflix's UI and UX is that poor. | Which streaming service do you think does a better job? | hugey010 wrote: | None of them, since they basically all copied Netflix! The | grid view limits users to slowly looking over limited | categories of content. Any list based tree structure would | be better in my opinion. 
| zasdffaa wrote: | Why do you say "UI and UX"; how are they different in your | view? | | Jargon BS is invading people's heads and it has to stop. | brabel wrote: | Not OP, but I think the Swedish TV streaming service has a | simpler, while nicer UX (hope you can at least see this | from your country, if not play the content): | https://www.svtplay.se/ | | Admitedly, it follows the same pattern as Netflix, but I | like how it's more responsive and feels way | simpler/lighter. | paxys wrote: | You just linked to an exact copy of Netflix. | NavinF wrote: | > godawful interface which is slightly different on every | device | | Which devices are you referring to? I've only used the PC and | mobile interfaces both of which are quite pleasant. | paxys wrote: | Technically speaking I think Netflix's UX blows every other | streaming app out of the water. It loads instantly, scrolling | is smooth, search is instant. Buttons are where you'd expect | and do what you expect. They have well-performing and up-to- | date apps for every conceivable device and appliance. They | support all the latest audio and video codecs. | | This is all in stark contrast to services like HBO Max and | Disney+ which still stutter and crash multiple times a day. | Amazon for some reason treats every season of a TV show and | HD/SD versions of movies as independent items in their | library. I still haven't been able to download a HBO Max | video for offline viewing on iOS without the app crashing on | me at 99%. | | The problems you mention with Netflix are real, but they have | more to do with the business side of things. Netflix | recommendations seem crap because they don't have a lot of | third party content to recommend in the first place. Their | front page layout is optimized to maximize repetition and | make their library seem larger. They steer viewers to their | own shows because that's what the business team wants. None | of these are problems you can fix by reassigning engineers. | P5fRxh5kUvp2th wrote: | My complaint about netflix UI/UX aren't technical in | nature, I agree with you their player is the best out | there, hands down. | | The issue is the business polices surrounding it. The UI | itself is user-hostile. | rurp wrote: | > Buttons are where you'd expect and do what you expect. | | Wait, what? Netflix is the absolute worst at this. Every | time I log in the interface is different! Netflix could not | care less about users having a consistent seamless | experience. | | But as far as performance goes, I totally agree with you. | The performance is impressively good and noticeably better | than the other streaming apps I use. | | The UX is just so bad in so many ways (UI churn, autoplay, | useless ratings, useless categories, recaps that can be | watched exactly once, and so on...) it mostly ruins the app | for me. The actual video quality is great though. | 0x457 wrote: | Interface is the same, order of rows is different. Yes, | it sucks. However, other streaming apps are much worse: | | - by the time HBO Max finish loading, I've already lost | interest | | - Amazon Prime constantly gives me errors, and it's often | hard to find what you paid for and what you have to pay | for | | - Paramount+ often restart episode from beginning instead | of resuming. | | - Many leave shit in your queue with a few seconds left | for you to "Continue Watching". I still have shows in | Paramount+ that I've finished months ago in the queue, | and there is no way to delete them without watching end | credits. 
- HBO Max only allows you FF in small fixed | intervals | | - Plex...used to be okay, now it's pushing its streaming | services and works very bad offline | | - Apple TV has awful offline experience compared to | netflix in terms of UX | | Nah, I will take netflix constantly changing rows over | shit others do. | xnx wrote: | > Building software and productionalizing/scaling it are two | very different problems, and the latter is far more difficult. | | Is this claim based on some example I should know? Countless | companies never achieve product/market fit, but very few I can | think of fail because they weren't able to handle all their | customers. | ternaryoperator wrote: | This! I am frustrated at how often devs will not accept that | simple things become incredibly complicated at scale. That | favorite coding technique? That container you wrote? Those | tests you added? All good, but until you've tested them at | scale, don't assert that everyone should use them. This dynamic | is true in the other direction too: that techniques often taken | for granted simply are not feasible in highly resource- | constrained environments. With rare exception, the best we can | say with accuracy is that "I find X works well enough in the | typical situations I code for." | geodel wrote: | Seems to be mixing too many things here. Many scaling/ hardware | challenges need a lot of people but it can still be true that | Netflix has choke full of engineers making half-assed turd Java | frameworks day in and day out. I know this because we are | forced to use these crappy tools as they are made by Netflix so | supposed to be best. | | It's just that they succeeded in streaming market with low | competition and great success bring in lot of post facto | justifications on how outrageously great Netflix tech infra is. | | I mean it may be excellent for their purpose but to think their | solution can be industry wide replicated seems not true to me. | paxys wrote: | So Netflix published a framework which seemingly isn't | suitable for your use case, your managers forced you to use | it, and your response is to blame...Netflix? | tankenmate wrote: | Que? You don't seem to have much justification for your | points; it seems more like a rant as you have had a bad | experience using software provided by Netflix. It would be | great if you could provide more details about what was wrong | with it rather than just "we are forced to use these crappy | tools". I'm genuinely interested. | | In my personal experience lots of companies (admittedly all | large companies, but many of which sell their services / | software / hardware to smaller companies) have a use for | serving hundreds of Gbps of static file traffic as cheaply as | possible. And the slides for this talk seem exactly on the | money (again from my experience slinging lots of static data | to lots of users). | AtNightWeCode wrote: | Scaling streaming for a company at the size of Netflix is very | easy. You can use any edge cache solution, even homemade. The | complexity at N seems to stem from other things. | yibg wrote: | This is exactly the type of comment OP is referring to. Have | you build a steaming service at this scale? Do you actually | know what's involved? Or are you just looking at the surface | level, making a bunch of assumptions and reaching a gut feel | conclusion? | n0tth3dro1ds wrote: | >You can use any edge cache solution | | Umm, those solutions exist (from places like AWS and Azure) | _because_ Netflix was able to do it without them. 
The cloud | platforms recognized that others would want to build their | own streaming services, so they built video streaming | offerings. | | You have the cart in front of the horse. The out-of-the-box | solutions of today don't exist without Netflix (and YouTube) | building a planet scale video solution first. | AtNightWeCode wrote: | N had problems in US because they served data from CA. | Today, N uses edge caching and the data for me in Europe is | sent less than 10km to my home. And it should be cheap. We | are talking about serving static content here. It is not | very difficult. | jedberg wrote: | Why do you think Netflix served out of California? They | only did that for the first few months, until they | adopted Akamai, Limelight, and L3 CDNs. That was long | before Netflix launched in Europe. | AtNightWeCode wrote: | Well they use to, they tried to bully various ISPs into | increasing their throughput before they jumped the edge | cache wagon, long time after competitors. Akamai is a | stellar company. Don't think N uses A services today. At | the end of the day. N mostly serves static content to | users and I highly doubt that hardware costs is a very | relevant parameter. | jedberg wrote: | With all due respect, you have no idea what you're | talking about. I worked there during the transition from | 3rd party CDNs to OpenConnect. We got off 3rd party CDNs | in 2013/4 and operated solely out of OpenConnect, in | large part because no 3rd party CDN was capable of | serving our amount of video at any price, including | Akamai. We weren't even streaming out of our own | datacenter anymore by the time I started, and that was | when streaming was still free with your DVD plan. | | And your timeline is all wrong too. Netflix didn't even | engage with the ISPs about bandwidth until long after | moving out of our own datacenter. We started the | OpenConnect program specifically to make it easier for | ISPs, there was no bullying. The spat you're thinking of | is that Comcast didn't want to adopt the OpenConnect but | also didn't want to appropriately peer with other | networks to give their customers the advertised speeds. | | And hardware cost is a hugely relevant parameter. Being | efficient with hardware is the difference between | profitable streaming at that scale and not profitable. | AtNightWeCode wrote: | You mean all the heat maps provided by Comcast and so on | from 2014(?) are incorrect? That they lied about all the | traffic from CA caused by N? | jedberg wrote: | Please link those heat maps. I think you're reading them | wrong. | 0x457 wrote: | > any edge cache solution | | Someone still has to do the R&D for edge cache? These slides | are about Open Connect - their own edge cache solution that | gets installed in partners racks (i.e. ISPs and Exchanges). | Before things that Netflix and Nginx implemented in FreeBSD, | hardware compute power was wasted on various things they | discuss in slides. | | Yes, you can throw money at the problem and buy more | hardware. | AtNightWeCode wrote: | Fair. Point taken. I answered the comment not the article. | seydor wrote: | I dont see the point. A centralized data hose that is replacing | what internet was designed to be : a decentralized, multi | routed network. The problem may be useful to them, but unlikely | to be useful to anyone who doesn't already work there. I dunno, | if it was possible to monetize decentralized or bittorrent | video hosting, i think it would solve the problem in a more | interesting and resilient way. 
With fewer engineers. | | But it's like, every discussion today must end with something | about the pay and head count of engineers. | paxys wrote: | While we are at it let's just put video streaming on the | blockchain! Who needs all these engineers and servers. | jedberg wrote: | But only seven people can stream at once! | RexM wrote: | Once you download the chain you can watch anything you | want! You'll have a local copy of _everything_ | oleganza wrote: | I understand and even share a little bit of your sentiment, | but I'm tired of stretched "X is now not what X was supposed | to be". | | Strictly speaking, the Internet was supposed to help some | servers survive and continue working together despite some | others being destroyed by a nuke. That is more-or-less the | case today: we see how people use VPNs to route around | censorship. Whether you were supposed to stream TikTok videos | directly from the phones of their authors or through a | centralized data hose - i'm not sure that was ever the grand | idea. | | Also "decentralized" and "monetize" don't go well together | because innovation is stimulated by profit margins and rent- | free decentralized solutions by definition have those margins | equal to zero (otherwise the solution is not decentralized | enough). | jedberg wrote: | It's funny you mention this. When I worked at Netflix, we | looked at making streaming peer to peer. There were a lot of | problems with it though. Privacy issues, most people have | terrible upload bandwidth from home, people didn't like the | idea of their hardware serving other customers, home hardware | is flakey so you'd constantly be doing client selection, and | other problems. | | So it turns out decentralized multi routed is not a good | solution for video streaming. | gizajob wrote: | Works great for storing pirated content though | jedberg wrote: | Usually you aren't live streaming your pirated content | right off other people's boxes. You download it first and | then view it. So you don't need every chunk available at | just the right time. | monocasa wrote: | Popcorn Time worked pretty well with just that model; | watching more or less immediately is it's downloaded in | order from the swarm. | jedberg wrote: | Did you actually use Popcorn time? It got stuck all the | time waiting for a chunk. Also, again, people sharing | pirated content don't care about privacy and are happy to | share their home hardware for other people to use. Paying | customers care about that stuff. | monocasa wrote: | I have; it worked flawlessly for content that was | decently seeded. And that's without the sorts of table | stakes you'd expect for a streaming platform like the | same content encoded at different bit rates, but chunked | on the same boundaries so you can dynamically change | bitrate as your buffer depletes. | | And I'm not sure most people actually care if their home | hardware is being used for whatever by the service | they're using, or else there'd be pushback on electron | apps from more than just HN. | | The sense I always got from Netflix's P2P work was that | it was heavily tied into the political battles wrt the BS | arguments that Netflix should pay for peering with tier 2 | ISPs. Did this work there continue much after that | problem went quieter? | PaywallBuster wrote: | Used it dozens of times, usually works fine for the | popular content. | | Good quality, barely any buffering. | | The niche content may be too difficult for a "live" | streaming experience. 
| naet wrote: | Somebody I know (cough) starts torrent downloads in | sequential order after downloading the first and last | chunk, and then opens the file in VLC while it is | downloading. | | Works amazingly well for watching something front to back | if your download speed is fast enough; you'd never know | it wasn't being streamed. The hardest part is finding a | good torrent for what you want to watch. Ironically the | Netflix catalog is one of the most easily available to | pirate since people rip it directly from web. | seydor wrote: | recently i see a lot of people with very high upload | speeds. Nobody is using them though, but nominally they are | there. | jedberg wrote: | Sure, very recently. But all the other issues still | apply. A real time feed from random people's machines is | very difficult at best. | seydor wrote: | I ve watched a lot of HBO (not available here) on popcorn | time | chasd00 wrote: | wouldn't a peer-to-peer setup be a non-starter legally? | ..or at least incredibly high risk. I could see major ISPs | complaining if Netflix is using the upstream side of the | ISP's customers for profit. | jedberg wrote: | Yes. :) | tinus_hn wrote: | No, Microsoft is doing the same thing and nobody cares. | Just mention it the small print in the agreement and | offer a way to turn it off. | onlyrealcuzzo wrote: | To be pedantic, scaling by itself isn't _that_ difficult. | | Scaling cost-effectively is. | sllabres wrote: | Tell that e.g. Tesla | | What I've read they burned a lot of money and hat large | problems scaling nevertheless. Which I don't find too | surprising, not because they are unable, but because it isn't | easy to scale. | | From my experience and from what I read scaling people | roughly a power of ten is a larger change in an organisation | and therefor likely a challenge. For _any_ technical process | the boundaries might not be strictly a power of ten but i | would say that scaling a power of a hundred is a challenge if | this value is not already reached on any process in your | organisation. | onlyrealcuzzo wrote: | True. | | Scaling to - say - Paramount+ size should not be difficult | if you're willing to pay AWS / Azure / GCP 10-100x what it | would cost to serve it yourself (which in many cases | actually makes sense). | | It's possible at Netflix's size, they couldn't just run on | AWS anymore. Though, given enough lead time and a realistic | growth curve - I'm sure it's feasible. | | Obviously scaling manufacturing is not a solved problem | like (realistically) scaling network and compute usage. | yibg wrote: | Serving Netflix streaming traffic from AWS would be... | unwise. One the bandwidth cost would be enormous even if | they can handle it. And two I doubt they can handle that | much traffic. | eru wrote: | Yes, and No. At some point, even scaling at all would be | hard. | | (Just like sending a human to Alpha Centauri is hard, even if | you had unlimited funds.) | Dylan16807 wrote: | Like it how? Accomplishing a grand feat is nearly the | opposite of scaling. | | If Netflix built out more slower servers, that would be | acceptable scaling. I don't see any plausible scenario | where that becomes too difficult. Even if they had billions | of subscribers. | toast0 wrote: | Eh, sending a human to Alpha Centauri wouldn't be that | hard... Although it would be difficult to know for sure if | they arrived, and for ease of transport, you may want to | send a dead human. | kaba0 wrote: | It depends entirely on the problem domain. 
Sure, it is more | of a devops problem when the problem is trivially | parallelizable, but often you have a bottleneck service (e.g. | the database) that has to run on a single machine. No matter | how many instance serves the frontend * if every call will | have to pass through that single machine. | | * after a certain scale | le-mark wrote: | > Too many people today think that programming starts and ends | at pulling a dozen popular libraries and making some API calls. | | The needle keeps moving doesn't it? A tremendous breadth of | difficult problems can be effectively addressed by pulling | together libraries and calling APIs today that weren't possible | before. Today's hard problems are yesterday impossibilities. | The challenge for those seeking to make an impact is to dream | big enough. | ezconnect wrote: | The basic problem is the same, pushing the hardware to its | limits. | whatshisface wrote: | The basic problem is delivering value to someone. | ezconnect wrote: | Programmers are not passionate to deliver value to | someone, that's the businessman problem. | bcrosby95 wrote: | Not every programmer is passionate about the same thing. | I got into this field because I love building things that | make people's lives easier. | echelon wrote: | Sure. | | Anecdotal, but most of the people I've worked with as ICs | couldn't give a damn about that. They want dollarydoos. | | One of the 10X-ers I know (they exist and are real), told | me repeatedly how he'd much rather be doing his own | thing. He hates the business needs. But income is | important and that's why he's dedicated to doing it. I'm | surprised at how focused and good he is given his | disposition, and I want to hire him when I scale my | business more. Drive and passion are sometimes just | spontaneous. | | An old CEO of mine even quipped that we were not family | and that we were there to do a job. All true. Most of the | people doing that job were only there for the money. | | Most jobs that drive sales and revenue simply aren't fun | or rewarding. There's lots of infrastructural glue and | scaling. Tiring, boring, monotonous work. 24/7 oncall | work. The money is good, though. | [deleted] | [deleted] | nwallin wrote: | > the popular "why does Netflix need X thousand engineers, I | could build it in a weekend" sentiment that is frequently | brought up on forums like this one. | | I don't think that's a popular sentiment about Netflix. | Twitter, Reddit, Facebook, yes, but Netflix, YouTube, Zoom, not | so much. | mihaic wrote: | I don't think this actually answers why Netflix needs to many | engineers. This seems like the sort of thing that one or two | experienced engineers would spend a year refining, and it would | turn out like this. | | This is the sort of impressive work that I've never seen scale. | drewg123 wrote: | Author here... Yes, most of this work was done by me, with | help from a handful of us on the OCA kernel team at Netflix | (and external FreeBSD developers), and our vendor partners | (Mellanox/NVIDIA). | | With that said, we are standing on the shoulders of giants. | There are tons of other optimizations not mentioned in this | talk where removing any one of them could tank performance. | I'm giving a talk about that at EuroBSDCon next month. | tetha wrote: | The way I've been putting it to people lately is: Never | underestimate how hard a problem can grow by making it big. 
And | also, at times, it is hard to appreciate how difficult | something becomes if you haven't walked the path at least | partially. | | Like, from work, hosting postgres. At this point, I very much | understand why a consultant once said - "You cannot make | mistakes in a postgres 10GB or 100GB and a dozen transactions | per second in size". And he's right, give it some hardware, | don't touch knobs except for 1 or 2 and that's it. The average | application accessing our postgres clusters is just too small | to cause problems. | | And then we have 2 postgres clusters with a dataset size of 1TB | or 2TB peaking at like 300 - 400 transactions per second. | That's not necessarily big or busy for what postgres can do, | but it becomes noticeable that you have to do some things right | at this point and some patterns just stop working hard. | | And then there are people dealing with postgres instances 100 - | 1000x bigger than this. And that's becoming tangibly awesome | and frightening by now, using awesome in a more oldschool way | there. | mlrtime wrote: | Not only make it big, engineer it in a way that makes it | profitable for the business. | | I'm sure there are many teams that could design such a | network with nearly unlimited resources, but it is entirely | different when you have profit margins. | victor106 wrote: | As someone once said "Big is different" | Sytten wrote: | I think a fair criticism would be how many engineers they have | compared to their competitors. Disney+ is on a similar scale, | can they do the same/similar job with less people? And | considering netflix pays top of market, how much does Disney | spends for their engineering effort to get their result. Would | netflix benefit from just throwing more hardware at the problem | vs paying more engineers 400-500k/y to optimize? | paxys wrote: | Disney (the company) has 20x the number of employees as | Netflix, and just 2x the market cap (in fact they were | briefly worth the same last year), ~2x the revenue and 2/5 | the net income. So Netflix is clearly doing something right. | eru wrote: | Perhaps they are just running different business models? | | Walmart's market cap per employee is probably much, much | lower than Disney or Netflix, too. That doesn't mean | Walmart is doing anything wrong. | ziddoap wrote: | > _Disney (the company) has 20x the number of employees_ | | Is that all of Disney or just Disney+? | | It doesn't seem like that would be a useful statistic if | that includes completely unrelated positions (e.g. does | that 20x statistic include Disney employees working at | Disney Land/World serving up hotdogs? Because they probably | don't contribute much to the streaming service) | briffle wrote: | Netflix also has production studios they now own making | content. | thfuran wrote: | Content like hotdogs at an amusement park? | diab0lic wrote: | https://bridgertonexperience.com/san-francisco/ | | https://strangerthings-experience.com/ | scrlk wrote: | Disney Streaming had 850 employees as of 2017 [0] (can't | find any newer figures); LinkedIn is suggesting 1k-5k. | | [0] https://en.wikipedia.org/wiki/Disney_Streaming | rybosworld wrote: | That seems like a fair point if you just consider the video | streaming. I know that Netflix wants to break into gaming. | I'd imagine the bandwidth required for that is higher than | streaming videos. | jon-wood wrote: | It's really not, especially if you look at their current | model for doing so. 
Netflix at the moment are breaking into | mobile gaming, which means the bandwidth requirements are | placed on Apple/Google's app store infrastructure. I'd be | surprised if Netflix don't have any sort of metrics | gathering infrastructure to judge how much people are | playing those games, but they're also likely reusing the | same infrastructure used by Netflix video streaming for | that, so the incremental increase in load may well be | negligible. | rybosworld wrote: | I was referring to their plans for a game streaming | service. | jwmoz wrote: | I watch Disney content sometimes and it constantly drops or | freezes, you can see the difference in quality compared to | Netflix. | bmurphy1976 wrote: | Yeah, you can totally see the difference. Netflix encoding | looks like shit. | | I've done a lot of video processing professionally (the | server side stuff, exactly what Netflix does) and Netflix | is by far the worst of all the streaming providers. They | absolutely sacrifice the quality of the video to save | bandwidth costs in aggregate and it shows (or more | accurately it doesn't show, all the fidelity is lost). | mkmk wrote: | Do you think it's worth it to pay the extra $5-10/month | for premium quality? | https://help.netflix.com/en/node/24926 | bradstewart wrote: | Even the Premium 4k streams have surprisingly low | bitrates and, occasionally, framerates. I dug out the | blu-ray player the other day and was absolutely shocked | how good things looked and, even more so, _sounded_ --the | audio quality from Netflix (and most streaming services, | really) is simply atrocious. | jedberg wrote: | Are you getting the best Netflix encodings? You might be | getting worse quality because your ISP throttles Netflix. | bagels wrote: | Your isp may be throttling bandwidth for Netflix, leading | to lower quality encodings being served to you. Comcast | does this, for instance. | nicce wrote: | You can't make such conclusions from your own experience. | It is one form of bias. There are many variables. For me it | is the opposite, for example. | iamricks wrote: | Standing on the shoulders of giants, Netflix engineers didn't | have blog posts from other companies on how to handle the | scale they started facing. Facebook didn't have blog posts to | reference when they scaled to 1B users. They pay for talent | that have built systems that had not been built before and | they have seen a return on it so they continue to do it. | wowokay wrote: | Hulu was around before netflix | gavin_gee wrote: | yeah and have you see the awful performance of Hulu? its | basically unusable. poster child for under investing in | the streaming platform. | paxys wrote: | Huh? Netflix predates Hulu by over a decade. | msh wrote: | Hulu was never Netflix scale. YouTube is a better | example. | birdyrooster wrote: | Not even close. YouTube has orders of magnitude more | content and vastly more users. Google Global Cache was | the inspiration for Open Connect. | jedberg wrote: | Youtube is very different than Netflix from a technical | problem perspective. They serve free videos to anyone | around the world that are uploaded by users. | | It's closer to a live streaming problem than pre-encoded | video like Netflix. | | Having worked at Netflix I can say that the YouTube | problem is much more complex. | why_only_15 wrote: | I wonder what portion of Youtube's request traffic can be | served with cache servers at the edge with a few hundred | terabytes of storage. 
There's a very long tail but i | would guess a significant portion of their traffic is the | top ~10,000 videos at any given moment. | spockz wrote: | There was a Google organised hackathon on this topic. | Given a set of resources, locations, and (estimated) | popularity, Optimise for video load time by determining | what should be moved to the cache when and where. | Cerium wrote: | Sure? "After an early beta test in Oct. of that year, | Hulu was made available to the public on March 12, 2008-- | a year after Netflix launched its own streaming service." | | [1] https://www.foxbusiness.com/technology/5-things-to- | know-abou... | esotericimpl wrote: | pclmulqdq wrote: | The engineers are definitely cost-effective at this scale. | They may be the highest-leverage engineers at the company in | terms of $ earned from their efforts compared to $ spent. The | improvements that come from performance engineers at large | companies are frequently worth $10M/year/person or more. | | Most companies maintain internal calculations of these sorts | of things, and make rational decisions. | gregsadetsky wrote: | Sorry for the tangent, but really curious to ask: | | When you say that companies maintain internal calculations | of the benefits, would you say that it's (extremely | roughly) something like: $10M benefit, need 5 core | engineers + benefits + PM + testing lab etc etc -> we can | spend up to $500k per eng give or take. | | Or is the $10M one number (that would be held somewhat | secretly internally at the company) and the salaries mostly | represents where the market is? Does the (salary) market | take into account the down-the-line $10M value? | | Basically, could those engs negotiate to be paid more, or | are they already sort of paid close to exactly what the | group they're part of generates in terms of revenue? | | Thanks! | | -- | | I see that you said $10M per person, not for the "network | optimization group". Hmm. So it would be fair to say that | the engs are definitely not paid according to the value | they generate..? I wouldn't be surprised by that but just | to confirm. | pclmulqdq wrote: | The simple fact is that you are not paid for the value | you create. You are paid based on the salary you can | demand. For performance engineers, $10 | million/year/person opportunities are kind of rare, | meaning that you can't demand close to that. Your | alternatives to big tech are things like wall street, | which pay very well, so you can demand a higher salary | (and/or higher level) than a normal engineer of your | skill would get. However, this is nowhere near the value | of the work. | Hermitian909 wrote: | Not OP, but 1 engineer -> 10M of benefit sounds right for | my company. | | In terms of negotiation, it really depends on how | differentiated your skills are. Short answer is that if | you can convince management that it would be difficult to | find other engineers who could deliver the optimizations | you're delivering, yes, you have leverage. | pclmulqdq wrote: | This is exactly right about negotiation and your | skillset. I have seen performance engineers in the right | place at the right time get 10-20% of their benefit to | the company (I have seen both $1 million/year | compensation for line workers and $10+ million/year for | very senior folks). | | Very highly skilled engineers in specific niches can | basically price themselves like monopolists, because the | company can easily figure out how much money they are | leaving on the table by not hiring them. 
This is not like "feature work" engineers, whose value is very nebulous and unknown.
| donavanm wrote:
| If you are an employee there is little to no relationship between your output and your compensation. Employer-employee relationships are based on the _cost to the employer_ to secure equivalent or better output.
|
| Secondly, yes, $10M per employee of revenue or cash flow is pretty reasonable for similar companies. The prioritization is NOT "how many employees per $MM." The allocation is "what opportunity is the highest $MM return per available employee."
| toast0 wrote:
| > Would netflix benefit from just throwing more hardware at the problem vs paying more engineers 400-500k/y to optimize?
|
| Where the CDN boxes go, you can't always just throw more hardware. There's a limited amount of space, it's not controlled by Netflix, and other people want to throw hardware into that same space. Pushing 800Gb/s in the same amount of space that others do 80Gb/s (or less) is a big deal.
| [deleted]
| slillibri wrote:
| Disney bought a majority ownership in BAMTECH to build Disney+.
| entropie wrote:
| I wasn't able to watch Disney+ in 4K via Chromecast for like a year. Stuttering every 10 seconds or so. I never had problems like this with Netflix.
| criddell wrote:
| I guess you weren't a Comcast customer in 2014 trying to watch Netflix and getting low quality, stuttering video. At the time lots of people tried to frame it as a net neutrality issue, but in the end I think it was a peering dispute that involved a third party.
|
| https://www.wsj.com/articles/SB10001424052702304834704579401...
| loopercal wrote:
| I think this just validates their points. Netflix has more engineers and 8 years of them building and fixing things, so they have fewer issues.
| rakoo wrote:
| Sure, if you place yourself in an arbitrarily hard problem, it takes a lot to solve it. "How we dug a 100m pit without using machines in 2 days" is an incredible feat, but the constraints only serve those who set them.
|
| Serving large content has been a solved problem for decades already. It's much easier and more reliable to serve from multiple sources, each at their maximum speed. Want more speed? Add another source. Any client can be a source.
|
| Netflix artificially restrains itself by only serving from its own machines. It is a very nice engineering feat, but a completely artificial one. As a user it feels weird to think of them highly when they could just have gone the easier road.
| zinclozenge wrote:
| How would you do it if you had much more modest scale requirements? Say a few thousand simultaneous viewers. I'm kicking around an idea for a niche-content video streaming service, but I don't know much about the tech stacks for it.
| vagrantJin wrote:
| A few thousand?
|
| Just use Nginx and a backend lang of your choosing.
| zinclozenge wrote:
| Not even bother using a CDN?
| ev1 wrote:
| For low-traffic niche content that might not be a cache hit in the first place in every region?
|
| I wouldn't bother. Unless you use storage at the CDN, which is probably not cost-effective for you.
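A quick back-of-the-envelope check puts "a few thousand simultaneous viewers" in perspective (assuming a ~5 Mb/s average HD bitrate, an illustrative figure rather than one from this thread):

    \[ 3000 \text{ viewers} \times 5\ \text{Mb/s} \approx 15\ \text{Gb/s} \]

That fits on a single commodity server with a 25GbE NIC, roughly fifty times less than the 800Gb/s per box discussed in the talk, which is why "just use nginx" is reasonable advice at that scale.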
| rakoo wrote:
| Use BitTorrent. Every viewer is also a source. The more people watch, the less your servers are loaded.
|
| BitTorrent is built towards "offline" viewing. Try PeerTube for a stack that is built more for streaming and has BitTorrent sharing built in (actually WebTorrent, because the browser doesn't speak raw TCP or UDP, but the idea is the same).
| jedberg wrote:
| The constraint is profit. Sure, with unlimited money you can just keep getting more and more servers. But that costs money. It would end up swamping any profit to be made.
|
| By creating this optimized system, it makes serving that much video _profitable_.
| rakoo wrote:
| No, the constraint is that _only you serve content_. But once the content is distributed, anyone else can also distribute it.
| jedberg wrote:
| I'm curious as to who you think would pay for the video if anyone could distribute it and watch it.
| yibg wrote:
| And break the profit and probably also the legal constraints. Good job, now you don't have a company anymore.
| dmikalova wrote:
| This just isn't true though. I worked at a relatively minor video streaming company and we overloaded and took down AWS CloudFront for an entire region. They refused to work with us or increase capacity because the datacenter (one of the ones in APAC) was already full. This was on top of already spreading the load across 3 regions. We only had a few million viewers.
|
| We ended up switching to Fastly for CDN. There's something hidden here though that becomes a problem at Netflix size. We were willing to pay the cloud provider tax, and we didn't dig down into kernel-level or storage optimizations because off the shelf was good enough. At Netflix's scale, that adds up to millions of extra server hours you have to pay for if you don't do the 5% optimizations outlined in the article.
| rakoo wrote:
| You still have the same constraint: only you can serve content.
|
| The solution I'm talking about is BitTorrent. The more people watch your content, the less load your servers bear. That is using the internet to its best potential, instead of reverting back to the centralized model of the big shopping mall and its individual users.
| rvnx wrote:
| I think nobody said Netflix's infrastructure can be built in a weekend. However, the scale doesn't matter that much after a certain point, once the scaling "wall" has been pierced. If you are a biscuit factory that produces 100'000'000 biscuits per year or 500'000'000 biscuits per year, the gap between 100M and 500M isn't that impressive anymore, as it's mostly about scaling existing processes. However, if you turn a 1'000-biscuit shop into a 1'000'000-biscuit company, that's very impressive.
| bmurphy1976 wrote:
| Nonsense.
|
| It's still impressive. A 5x increase at that scale can be a phenomenal challenge. Where do you source the ingredients? Where do you build the factories (plural, because at that scale you almost certainly have multiple locations in different geographic locales subject to different regulatory structures)? Where do you hire the people? How do you manage it? What about the storage and shipping and maintenance of all the equipment, and on and on? How much do you do in house, and how much do you outsource to partners? What happens when a partner goes belly up or can't meet your ever-increasing needs?
|
| Your comment is a great example of what the OP pointed out.
| jon-wood wrote:
| My favourite example of this sort of extreme scaling issue is the fact that McDonald's apparently declined to sell products with blueberries in them because modelling showed they'd have to buy the world's entire supply of blueberries in order to do so.
| notamy wrote:
| I thought this was hyperbolic, so I looked into it:
|
| > _The menu team comes up with interesting ideas like including kale in salads. The procurement team and suppliers then try to get the menu team to understand the challenges. How do you bring kale to 14,000 restaurants? As one example, when they introduced Blueberry Smoothies in the U.S., McDonald's ended up consuming one third of the blueberry market overnight._
|
| https://www.forbes.com/sites/stevebanker/2015/10/14/mcdonald...
|
| I couldn't find any other source to back it up, but still, wow! That's an absurd number.
| menzoic wrote:
| McDonald's sells blueberry muffins
| indigodaddy wrote:
| So there is an extreme dearth of blueberries, I guess, compared to other food goods? I mean, McD's isn't taking over the entire supply of potatoes or chickens, for example, correct?
| zaroth wrote:
| I think the point is that the supply chains probably need upwards of years of time to adapt in some cases; you can't just turn on a recipe that needs a full cup of blueberries per serving on Monday and expect a spare million cups of blueberries to be lying around the supply chain on Tuesday.
|
| In the case of animal products, there are almost certainly major operations worldwide that have been built and financed purely to serve McDonald's demand. They probably even have to build these out well before entering some markets.
| UncleEntity wrote:
| They grow a lot of potatoes in the US. Last week I hauled a load of tater tots destined for McDonald's. I've hauled potato products for McDonald's quite often.
|
| They raise a lot of chickens in the US. I've hauled chicken nuggets or chicken breasts for McDonald's in the past quite often.
|
| I can't even tell you where they grow blueberries.
| belinder wrote:
| It's a good point, and I think it's an interesting comparison. Obviously improving by a factor of 1000 is better than improving by a factor of 5. But the absolute improvement is still hundreds of times larger: 400'000'000 extra biscuits are going to bring a lot more revenue than 999'000 extra biscuits.
| paxys wrote:
| It's the exact opposite.
|
| Taking the software example, you can easily scale from 1 to 100 users on your own machine. You can handle thousands by moving to a shared host. Using off-the-shelf web servers and load balancers will help you serve a million+. From there on you'll have to spend a lot more effort optimizing and fixing bottlenecks to get to tens, maybe hundreds of millions. What if you want to handle a billion users? Five billion? Ten billion? It always gets harder, not easier.
|
| Pushing the established limits of a problem takes exponentially more effort than reusing existing solutions, even though the marginal improvement may be a lot smaller. Getting from 99.9% to 99.99% efficiency takes _more_ effort than getting from 90% to 99%, which takes more effort than getting from 50% to 90%.
|
| You never pierce the scaling wall. It only keeps getting higher.
| xuhu wrote:
| If you can serve 1K users with 10 employees, you can probably serve 1M users with 10k employees.
| kaba0 wrote:
| And you can birth one baby in 3 months with 3 women, right?
|
| To add something useful besides the snark: first of all, there are hard physical limits, which are sometimes very relevant in context (you really shouldn't try to outrace the speed of light, for example, which matters in some high-frequency trading infrastructure projects). And you can try to increase headcount to any number, but you won't produce, for example, a better compiler that way. There are simply jobs that are more "serial" - the only way to win at those is to employ the very best of the field in a small team.
| xuhu wrote:
| No, just 3 babies in 9 months.
| sllabres wrote:
| That won't help your customer who's expecting their 'baby' after three months due to the increased mother-workforce ;)
| kaba0 wrote:
| You can deliver DVDs to Netflix subscribers as well to achieve a much bigger throughput, but I doubt they would be as popular as they are right now :D
| beckingz wrote:
| Sneakernet!
|
| "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." -Andrew Tanenbaum
| bmurphy1976 wrote:
| That's too simplistic. What about the doctors and medical facilities and other supporting infrastructure? What about the baby food and medicine and clothing and supplies, and what about the people to take care of the children? You think you can just keep throwing more women at a hospital having babies to infinity and not have any problems?
| Dylan16807 wrote:
| There's a limit on those things, but it might as well be infinity when you're trying to have 1 baby or 3 babies or 100 babies.
|
| 1M users and 10k employees is not in the range where you have crushingly impactful logistics.
| Dylan16807 wrote:
| But the goal wasn't better or faster. It was giving more customers the same service. You're talking about a completely different problem.
| beckingz wrote:
| Remember that global productivity usually does not scale with headcount!
|
| Each employee adds some overhead, which requires more employees... which requires more employees.
| Mavvie wrote:
| Sounds like the rocket equation! Perhaps big companies are rocket science?
| loopercal wrote:
| If you told McDonald's to double the number of McRibs produced next year, that would be an incredible challenge to meet. They already sell enough that it affects the global pork market; it'd be insane for them to double their demand for pork. What about other supplies? Would this result in reduced burger demand? How can they ensure they can respond appropriately either way? They probably run near fridge/storage capacity; does increasing this mean they also need to increase storage at restaurants?
|
| That's a 2x increase. Now do it again, and then some, for a 5x. It's crazy to say there's a "scaling wall" that, once you "pierce" it, makes it easy to scale up. It's the opposite: McDonald's already knows how to supply and sell X McRibs a year, but no company has ever sold 5X those McRibs, so they'd have to figure it out themselves.
| rkagerer wrote:
| There's an old rule of thumb that each order-of-magnitude increase (10x) brings a whole new set of challenges.
|
| Anecdotally I experienced this when scaling my software product from 1 --> 10 --> 100's --> 1000's etc. of users.
|
| That's not to say 2x can't be a substantial challenge, as you pointed out. It gets harder (and IMO more fun) when you're at the bleeding edge of your industry.
| bombcar wrote:
| Part of it depends on whether "build it five more times, again" is a viable strategy.
|
| Building five "Netflixes" with identical content is possible; the amount of content wouldn't change (it would decrease, the cynic says); you just need parallel copies of everything (servers, bandwidth, etc).
|
| The fun would come in syncing usernames, etc. through the system.
|
| It's an entirely different class of problem compared to "acquire resource, convert it, sell it".
| zeroxfe wrote:
| > the gap between 100M and 500M isn't that impressive
|
| This is absolutely not true. The closer you are to peak performance, the harder it is to scale, and the returns diminish heavily. At many major tech companies, there's a huge amount of effort put into just 1-5% optimizations -- these efforts really require creative thinking and complex engineering (not just "scaling existing processes"). At the volumes these companies operate, even a 1% optimization is quite significant.
| carlhjerpe wrote:
| Aren't you contradicting yourself?
|
| If you're at 100M users you're probably scaling vertically, so adding 5x more hardware shouldn't be a problem.
|
| But when you're at 500M, all of a sudden it makes sense to optimize further, since the capital saved will be the same percentage(ish) but now the money is worth people's time.
|
| I know that we don't particularly care about power savings in the DCs I've worked in, because they're relatively small, while big tech will do all kinds of shenanigans to save a couple of watts here and there, because it's worth it across hundreds of thousands of servers.
| beoberha wrote:
| Seeing scale issues as purely hardware-bound is incredibly naive. Even in a case like streaming, if you're pushing more bits through the wire, it's likely that the growth in usage behind the traffic increase also causes the software systems supporting your service to start degrading, and you need to rearchitect them. Very few problems at that scale can be solved by throwing more hardware at them.
| Quarrelsome wrote:
| > why does Netflix need X thousand engineers, I could build it in a weekend
|
| I would like to hope nobody asks that. Video is one of the, if not the, hardest data-plumbing use cases on the internet.
| dragontamer wrote:
| I'd say realtime communications are harder.
|
| A lot of the tricks being discussed here cannot be applied to Skype calls.
| OJFord wrote:
| Surely GP would agree, unless you mean even audio-only calls? Otherwise it's just an extra requirement (or several) on top of 'video'.
| dragontamer wrote:
| The amount of transcoding needed to get a conference call up hurts my brain. If 20 people are talking on Skype, the server needs to receive those 20 streams, decode them, mix the audio together, recode the streams, and then broadcast it back out to all 20 people.
|
| I'm not a telecommunications guy, but I had some professors back in college explain how difficult and fundamental the research of "Ma Bell" was from the 60s through the 80s. I'm talking Erlang, C++, Clos circuits, etc. The innovations from Bell Labs are nearly endless.
|
| Telephone communications is one of the biggest sources of fundamental comp-sci research from the 1950s through the 2000s.
| nordsieck wrote:
| > I'm talking Erlang, C++, Clos circuits, etc. The innovations from Bell Labs are nearly endless.
|
| A lot of innovations did come out of Bell Labs. But I'm pretty sure Erlang wasn't one of them.
| dragontamer wrote:
| Oh, you're right.
There seems to have been a glitch in my memory somehow. Still, it's Ericsson, which is telecommunications nonetheless.
| eru wrote:
| > Otherwise it's just an extra requirement(s) on top of 'video'.
|
| I'm not sure what you mean? Real-time communication, both video and audio-only, has much tighter latency requirements. You can't just buffer ahead when you have some spare bandwidth, like Netflix or YouTube can.
| OJFord wrote:
| Yeah, that's what I'm kind of facetiously calling 'just an extra requirement'.
|
| My point was intended to be that there are the same challenges and more. But it's not something I've thought about in depth (and certainly not had to work with), and it maybe wasn't a very good characterisation, because it's not the same on the other side either: there's no large file to serve, for example, because at the start of the call it doesn't exist yet. So perhaps I take it back.
| eru wrote:
| Yeah, jon-wood really did a great write-up of the challenges involved.
|
| In any case, it's hard to say what the 'greatest' engineering challenge is. You can make almost any kind of engineering really challenging if you (or the market...) set yourself a very low cost ceiling.
| bombcar wrote:
| Video calling is "easier" in that the p2p option is workable for some variant of "works".
|
| It is _much much harder_ because you can't do the "cache everything on the edge" solution. If storage were infinitely cheap and small, Netflix could run their entire business by sending you an encrypted USB stick with _every single movie/TV show they have_ on it, and everything would play locally. This is basically what they do with their edge servers/CDNs.
|
| You can't do that with video calls, because the video/audio didn't exist 1 millisecond ago.
| jon-wood wrote:
| Streaming pre-recorded video and streaming realtime video are almost entirely different use cases.
|
| Pre-recorded video streaming is, under the hood, really just a high-volume variant of serving up static web pages. You have a few gigabytes of file to send from the server it's stored on to the device that wants to play back the video. As this presentation demonstrates, that isn't trivial at scale, but the core functionality of sending files over the internet is what it was designed to do from day one. Because you can generally download video across the internet faster than it can be played back, it's possible to build up a decent-sized buffer, which allows you to paper over temporary variance in network performance without the customer noticing.
|
| Realtime video streaming has two variants. One-to-many, Twitch-style video streaming is relatively simple, since you can encode video into files and upload them to a server for viewers to download. This is how HLS streaming works, and most of the techniques Netflix uses to optimise video delivery can also be applied here, at the cost of adding latency between the event being streamed and people consuming it. That latency will often sit at about 30 seconds, and people generally find that acceptable.
|
| Skype-style realtime video streaming is much harder. You're taking video from one person's camera, and then sending it over the internet to one or more people's devices.
You can't do any sort of pre-processing on that, or stage the video on servers closer to the consuming users, because you have no way of generating that video until the point your users decide to start talking to each other. Because you can't pre-stage that video, you need to be able to establish a network route between the people on a call, potentially in an environment where none of the participants have any open connection from the internet directly to the device they're streaming from. Slight fluctuations in network performance can potentially degrade video delivery to the point of it being unusable. The most common route to deal with that is systems that attempt to establish a direct connection (ideally over a local network) between participants and, if that doesn't work, go via relay servers operated by the software provider. These servers provide a single point on the internet all parties can connect to, and then allow passing packets as if they were all on the same network.
| jdyyc wrote:
| I work on a very technically trivial service at a large company.
|
| It's the kind of thing that people run at home on a Raspberry Pi, Docker container or Linux server, and it consumes almost no resources.
|
| But at our organization this needs to scale up to millions of users in an extremely reliable way. It turns out this is incredibly hard and expensive, and takes a team of people and a bucket of money to pull it off correctly.
|
| When I tell people what I work on they only think about their tiny implementation of it, not the difficulty of doing it at an extreme scale.
| RektBoy wrote:
| csmpltn wrote:
| At this point, they should've just gone for an in-house bare-bones operating system that supports the bare minimum: reading chunks from disk, encrypting them, and forwarding them to the NIC.
|
| Besides that, it seems like all of the heavy lifting here is done by Mellanox hardware...
| wmf wrote:
| FreeBSD is their "in-house" operating system since they modify it to do what they want.
| csmpltn wrote:
| But do they really need an entire operating system for what amounts to simply copying around chunks of data? I think they could've gone for some slim RTOS-ish solution instead: no user mode, no drivers, bare minimum.
| wmf wrote:
| They're using the FreeBSD filesystem and network stack, both of which are significant amounts of code. I guess they could have tried the rump kernel concept, but it sounds like a lot of work.
| drewg123 wrote:
| I worked on an OS like that once. The problem is with "all the other stuff" that you need to support that's outside the core mission of your OS. You wind up bogged down in each additional feature that you need to implement from scratch (or port from another OS with a compatible license). With FreeBSD, all this comes for free.
|
| We chose to use FreeBSD, and have contributed our code back to the FreeBSD upstream to make the world a better place for everyone.
| drewg123 wrote:
| It's only doing the crypto. The VM system and the TCP stack are doing most of the heavy lifting, and are both stock FreeBSD.
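As a concrete illustration of "the NIC only does the crypto": with kernel TLS, the userspace server completes the TLS handshake, hands the session keys to the kernel, and from then on just calls sendfile(2). A minimal sketch, assuming FreeBSD 13+'s kTLS interface (struct tls_enable and the constants below reflect my reading of sys/ktls.h and may differ across versions; handshake, key derivation, and error handling are all elided):

    /* Hand a TLS session's transmit keys to the kernel so that
     * sendfile(2)/write(2) payloads are framed and encrypted as TLS
     * records by the kernel -- or by the NIC when inline TLS offload
     * is enabled. Illustrative only. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ktls.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <crypto/cryptodev.h>   /* CRYPTO_AES_NIST_GCM_16 */
    #include <string.h>

    static int enable_tx_ktls(int sock, const uint8_t *key, int keylen,
                              const uint8_t *iv, int ivlen)
    {
        struct tls_enable en;

        memset(&en, 0, sizeof(en));
        en.cipher_algorithm = CRYPTO_AES_NIST_GCM_16;  /* AES-GCM */
        en.cipher_key = key;
        en.cipher_key_len = keylen;
        en.iv = iv;                /* implicit IV from the handshake */
        en.iv_len = ivlen;
        en.tls_vmajor = TLS_MAJOR_VER_ONE;
        en.tls_vminor = TLS_MINOR_VER_THREE;           /* TLS 1.3 */
        return setsockopt(sock, IPPROTO_TCP, TCP_TXTLS_ENABLE,
                          &en, sizeof(en));
    }

Whether records are then encrypted in software or on the NIC is invisible at this layer, which is what lets the rest of the stack stay stock FreeBSD.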
| yrgulation wrote:
| This is innovation and proper engineering. They chose FreeBSD, which shows they are not afraid of solving actual hard problems that yield impressive results. These are the types of engineers I'd hire in a heartbeat, if I ever were to own a successful company.
|
| Simply following trends and doing what everyone else does leads to mediocre results and the assembly-line type of work that most software development has become.
| faizshah wrote:
| I have a bit of a naive question. If TLS has this much overhead, how do HFT and other finance firms secure their connections?
|
| I know they use a number of techniques like kernel bypass to get the lowest latency possible, but maybe they have explored some solution to this problem as well.
| shiftpgdn wrote:
| Mellanox cards or private links
| wyager wrote:
| TLS doesn't really add latency on top of TCP after you make the initial connection; it mostly adds a bit of extra processing overhead for encryption. HFT firms aren't usually encryption-bandwidth-constrained. I'm not actually sure if most exchange FIX connections actually run over TLS, but that would be reasonable.
| drewg123 wrote:
| TLS has the highest overhead when you're serving data at rest, like static files that are not already in the CPU cache. For serving dynamic data that is in the CPU cache, TLS offload matters a lot less. Our workload is basically the showcase for TLS offload.
| naikrovek wrote:
| I love (love) how everyone else who answered this question alongside you made what appears to be a complete stab-in-the-dark guess, while only you knew the answer.
|
| Never be afraid to admit that you don't know something. Guessing wrong is a much worse look than not answering at all.
| ddmitriev wrote:
| Trading connections that go over private links, such as cross-connects between the firm's and the exchange's equipment within a colocation facility, are not encrypted.
| AtlasBarfed wrote:
| HFT needs to be outlawed.
|
| No exchange should allow trades to complete in any time less than 15 minutes, I would argue, and each trade should have a random 1-15 minute delay pad on top of that.
|
| HFT access only serves the larger financial firms, and is used for front-running and other basically-illegal tricks. It provides an anti-competitive advantage to large firms in markets that are supposed to be open-access and fair-trading. And of course it leads to AI autotrading madness.
|
| I get that it keeps a lot of tech people very well compensated, but it is in the service of unregulated fraud at worst and unfair advantage at best.
| jonahhorowitz wrote:
| Not really germane to the topic, but a financial transactions tax would effectively kill HFT without the complexity that you're suggesting.
| [deleted]
| theideaofcoffee wrote:
| This is a remarkable technical achievement that builds on all of its past work, as are the other updates from Netflix on serving ever more traffic from a single box. That said, I still find it terrifying that so many users would be affected by a single machine going down; that blast radius is so huge!
|
| Do we know if the rates that these hosts serve actually make it into production? Or do they derate the amount they serve from a single host and add others?
| lanstin wrote:
| I think they buffer, and if the stream has issues the client connects to another host. They have been doing Chaos Monkey for a long, long time.
| drewg123 wrote:
| As I said in a parallel comment, this is a testbed platform to see what problems we'll encounter running at these speeds. Production hosts are single socket, and can run at roughly 1/2 this speed.
|
| I regret that I've crashed boxes doing hundreds of Gb/s.
| Thankfully our stack is resilient enough that customers didn't notice.
| qwertox wrote:
| Discussion of the same presentation 11 months ago, when the title was 400Gb/s:
|
| https://news.ycombinator.com/item?id=28584738
|
| This was the video which was posted back then alongside the slides: https://www.youtube.com/watch?v=_o-HcG8QxPc
| ndom91 wrote:
| Video of this presentation available here: https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa...
| forgot_old_user wrote:
| thank you!
| haunter wrote:
| As a total outsider it looks like FreeBSD is the "silent" OS behind a lot of the big-money projects. Not just this one; I recently learned it's the base OS of the PlayStation 4 and 5 systems too. Is there a reason why FreeBSD is so popular? Just general reliability? And why not the other BSD projects? Also, one (like me) would assume Linux is behind all of these, but alas, no.
| keewee7 wrote:
| The sysadmin experience on FreeBSD used to be more opinionated than on Linux. This was before most Linux distros adopted systemd.
|
| The reason companies like Sony and Apple pick FreeBSD is that they get an open-source, POSIX-compliant OS they can drastically modify down to the kernel level without having to open source their modifications.
| Thev00d00 wrote:
| Sony used it because they got an entire OS for free with no obligation to release the source.
| naikrovek wrote:
| Blame the GPL for this. The GPL is directly responsible for the livelihood of many/all BSD developers, and I could not be happier about that. Linux is overrated in a lot of ways.
| robocat wrote:
| However, the GPL is irrelevant in this case because the Netflix Open Connect appliance is not sold to ISPs. The GPL is only relevant if you are distributing GPL software, e.g. the Sony PlayStation.
| trunnell wrote:
| The OpenConnect team at Netflix is truly amazing and lots of fun to work with. My team at Netflix partnered closely with them for many years.
|
| Incidentally, I saw some of their job posts yesterday. If you think this presentation was cool, and you want to work with some competent yet humble colleagues, check these out:
|
| CDN Site Reliability Engineer https://jobs.netflix.com/jobs/223403454
|
| Senior Software Engineer - Low Latency Transport Design https://jobs.netflix.com/jobs/196504134
|
| The client-side team is hiring, too! (This is my old team.) Again, it's full of amazing people, fascinating problems, and huge impact:
|
| Senior Software Engineer, Streaming Algorithms https://jobs.netflix.com/jobs/224538050
|
| That last job post has a link to another very deep-dive tech talk showing the client-side perspective.
| bagels wrote:
| Slide says "we don't transcode on the server"
|
| Surely they transcode on some server? Maybe they just mean they don't do it on the same server that is serving bits to customers?
| naikrovek wrote:
| It seemed clear to me: they don't transcode on the server that is sending data to the viewer. Transcoding is done once per piece of media and target-format combination, instead of on the fly as it is viewed.
| a-dub wrote:
| I haven't looked yet, but I'm going to guess: edge caching running on custom hardware, with smart predictions and congestion-control algorithms for determining what gets cached where and when.
| paxys wrote:
| Does anyone know where these servers are hosted? Certainly not AWS, I imagine?
| kkielhofner wrote:
| As close to the eyeballs as possible.
With OpenConnect[0] they are located in ISP facilities and/or carrier-neutral facilities with access to a peering fabric (in a sense, the Open Connect Appliance is "hosted" by the ISP).
|
| It's a win-win. ISPs don't have to use their peering and/or transit bandwidth to upstream peers, and users get a much better experience, with lower latency, higher reliability, less opportunity for packet loss, etc.
|
| [0] - https://openconnect.netflix.com/en/
| amelius wrote:
| How many customers does that serve?
| Aissen wrote:
| At 15Mb/s for a start-quality 4K stream (5 times higher than the average ISP speed measured by Netflix), that serves 53k simultaneous customers.
|
| In the US, the fastest ISP for Netflix usage seems to be Comcast (https://ispspeedindex.netflix.net/country/us ), with an average speed of 3.6Mbps. That would serve an average of 222k simultaneous customers on a single server.
| samcrawford wrote:
| That 15Mb/s figure for 4K is out of date by a couple of years. They previously targeted a fixed average bitrate of 15.6Mb/s. They now target a quality level, as scored by VMAF. This makes their average bitrate for 4K variable, but they say it has an upper bound of about 8Mb/s. See https://netflixtechblog.com/optimized-shot-based-encodes-for...
| Aissen wrote:
| Yep, that's correct. It looks like Netflix forgot to update their support pages for this: https://help.netflix.com/en/node/306 .
| umanwizard wrote:
| What does start-quality mean?
| Aissen wrote:
| Not much, see sibling comment. It used to be the minimum quality for enjoyable 4K (4K Blu-ray discs have much higher bitrates with HEVC). Since then, Netflix has heavily optimized their encoding, greatly reducing the bandwidth needed.
| danielheath wrote:
| Video formats require more data for the first frame of each scene; subsequent frames can be encoded as transformations of the previous frame.
| [deleted]
| ksec wrote:
| That depends on the content's bitrate. Netflix serves video at bitrates anywhere from 2-18Mbps. If the average were 10Mbps, that is roughly 80K customers per box.
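Making the arithmetic in these replies explicit: sustained streams per box is just line rate divided by average per-stream bitrate,

    \[ \frac{800\ \text{Gb/s}}{15\ \text{Mb/s}} \approx 53{,}000 \qquad \frac{800\ \text{Gb/s}}{3.6\ \text{Mb/s}} \approx 222{,}000 \qquad \frac{800\ \text{Gb/s}}{10\ \text{Mb/s}} = 80{,}000 \]

so the 53k, 222k, and 80K figures above are the same calculation under different assumed bitrates.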
| daper wrote:
| I have some experience serving static content and working with CDNs. Here is what I find interesting / unique here:
|
| - They are not using the OS page cache or any memory caching for this; every request is served directly from disks. This seems possible only when requests are spread between many NVMe disks, since a single high-end NVMe drive like the Micron 9300 PRO has a max read speed of 3.5GB/s (or 28Gbps), far less than 800Gbps. It looks like this works OK for long-tail content, but what about new hot content everybody wants to watch on the day of release? Do they spread the same content over multiple disks for this purpose?
|
| - Async I/O resolves the issue of nginx processes stalling on disk read operations, but only after you've already opened the file. Depending on the FS / number of files / other FS activity / directory structure, opening the file can block for a significant time, and there is no async open() AFAIK. How do they resolve that? Are we assuming the inode cache contains all inodes and open() time is insignificant? Or are they configuring nginx with a large open-file cache?
|
| - TLS for streamed media became necessary because browsers started to complain about non-TLS content. But that makes things sooo complicated, as we see in the presentation (kTLS is 50% of CPU usage before moving to encryption offloaded to the NIC). One has to remember that the content is most probably already encrypted (DRM); we just add another layer of encryption / authentication. TLS for media segments makes so little sense IMO.
|
| - When you rely on encryption or TCP offloading by the NIC, you are stuck with what is possible on your NIC. I guess no HTTP/3 over UDP or fancy congestion-control optimizations in TCP until the vendor somehow implements them in the hardware.
| mgerdts wrote:
| A Micron 9300 Pro is getting rather long in the tooth. They are using PCIe Gen 4 drives that are twice as fast as the Micron 9300.
|
| My own testing on single-socket systems that look rather similar to the ones they are using suggests it is much easier to push many 100Gbit interfaces to their maximum throughput without caching. If your working set fits in cache, that may be different. If you have a legit need for sixteen 14 TiB (15.36 TB) drives, you won't be able to fit that amount of RAM into the system. (Edit: I saw a response saying they do use the cache for the most popular content. They seem to explicitly choose what goes into cache, not allowing a bunch of random stuff to keep knocking the most important content out of cache. That makes perfect sense, and is not inconsistent with my assertion that you can't just hope a half-TiB cache will do the right thing with 224 TiB of content.)
|
| TLS is probably also there to keep the cable company from snooping on the Netflix traffic, which would allow the cable company to more effectively market rival products and services. If there's a vulnerability in the decoders of encrypted media formats, putting the content in TLS prevents a MITM from exploiting that.
|
| From the slides, you will see that they started working with Mellanox on this in 2016 and got the first capable hardware in 2020, with iterations since then. Maybe they see value in the engineering relationship to get the HW acceleration that they value into the hardware components they buy.
|
| Disclaimer: I work for NVIDIA, which bought Mellanox a while back. I have no inside knowledge of the NVIDIA/Netflix relationship.
| ShroudedNight wrote:
| Just from reading the specs (i.e. real-world details might derail all of this):
|
| https://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2
|
| Given one can specify arbitrary offsets for sendfile(), it's not clear to me that there must be any kind of O(k > 1) relationship between open() and sendfile() calls: as long as you can map requested content to a sub-interval of a file, you can co-mingle the catalogue into an arbitrarily small number of files, or potentially even stream directly off raw block devices.
| drewg123 wrote:
| Responding to a few points. We do indeed use the OS page cache. The hottest files remain in cache and are not served from disk. We manage what is cached in the page cache and what is directly released using the SF_NOCACHE flag.
|
| I believe our TLS initiative was started before browsers started to complain, and was done to protect our customers' privacy.
|
| We have lots of fancy congestion optimizations in TCP. We offload TLS to the NIC, *NOT* TCP.
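To make the serving path in this subthread concrete, here is a hedged sketch of FreeBSD's sendfile(2) with the SF_NOCACHE flag drewg123 mentions (the helper, its name, and the long-tail heuristic are illustrative, not Netflix's code):

    /* Serve [off, off+len) of a static file over a connected socket.
     * SF_NOCACHE hints that the kernel should release the pages once
     * sent, so one-off reads of long-tail titles don't evict genuinely
     * hot content from the page cache. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <fcntl.h>
    #include <unistd.h>

    static off_t serve_range(int sock, const char *path, off_t off,
                             size_t len, int long_tail)
    {
        off_t sent = 0;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        /* Zero-copy: no read() into userspace. The offset argument also
         * illustrates ShroudedNight's point: one large file can hold
         * many independently addressable chunks of content. */
        if (sendfile(fd, sock, off, len, NULL, &sent,
                     long_tail ? SF_NOCACHE : 0) == -1)
            sent = -1;
        close(fd);
        return sent;   /* bytes queued, via the sbytes out-parameter */
    }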
| daper wrote:
| Can I ask whether your whole content catalog can be stored on a single server, so that content is simply replicated everywhere, or is there some layer above that directs requests to the specific group of servers storing the requested content? I assume the described machine is not just part of a tiered cache setup, since I don't think nginx is capable of complex caching scenarios.
| drewg123 wrote:
| No, the entire catalog cannot fit on a single server.
|
| There is a Netflix Tech Blog post from a few years ago that talks about this better than I could: https://netflixtechblog.com/content-popularity-for-open-conn...
| eru wrote:
| Does the encryption in DRM protect the metadata?
| daper wrote:
| AFAIK no. The point of DRM is to prevent recording / playing the media on a device without the decryption key (authorization). So the goal is different from TLS, which is used by the client to ensure the content is authentic, unaltered in transit, and not readable by a man-in-the-middle.
|
| But do we really need such protection for a TV show?
|
| "Metadata" in HLS / DASH is a separate HTTP request, which can be served over HTTPS if you wish. It can then refer to media segments served over HTTP (unless your browser / client doesn't like "mixed content").
| throw0101c wrote:
| > _But do we really need such protection for a TV show?_
|
| DRM may be mandated by the content owners. TLS gives Netflix customers privacy against their ISP snooping on what they're watching.
| sam0x17 wrote:
| > But do we really need such protection for a TV show?
|
| What you watch can be a very private thing, especially for famous people.
| nextgens wrote:
| No, and it doesn't protect the privacy of the viewer either!
| saurik wrote:
| FWIW, neither does the TLS layer: because the video is all chunked into fixed-time-length segments, each video causes a unique signature of variable-byte-size segments, making it possible to determine which Netflix movie someone is watching based simply on their (encrypted) traffic pattern. Someone built this for YouTube a while back and managed to get it up to something like 98% accuracy.
|
| https://www.blackhat.com/docs/eu-16/materials/eu-16-Dubin-I-...
|
| https://americansforbgu.org/hackers-can-see-what-youtube-vid...
| nightpool wrote:
| Did TLS 1.3 fix this with content-length hiding? Doesn't it add support for variable-length padding that could prevent the attacker from measuring the plaintext content length? Do any major servers support it?
| drewg123 wrote:
| Author here. AMA
| sam0x17 wrote:
| > Sendfile
|
| Ah, so this is why everything stutters / falls apart when you switch subtitles on or off -- it has to access a whole different file and resume at the same place in that file, I assume? I would think you would want the (verbal) audio separated out in a different file so it can be swapped out on the fly without re-initializing the video stream, and the same thing with subtitle files? I'm just making some assumptions based on the behavior I've seen, but it would be cool to know how this works.
| drewg123 wrote:
| No, video and subtitles are separate files.
|
| I've never seen this bad behavior myself. Do you mind sharing the client you're using?
| ManWith2Plans wrote:
| Do you have a link to video or audio for this presentation? I probably don't speak for just myself when I say I would love to see it.
| quux wrote:
| Someone else linked the video here: https://news.ycombinator.com/item?id=32520750
| w10-1 wrote:
| Thank you very much "drewg123"!
|
| Future technology advances increasingly look like this: complex work integrating hardware, OS fixes, and team collaboration.
| People and teams and companies working together, contributing to shared resources like FreeBSD, tolerating mistakes at scale, giving credit where credit is due, and all the other things that make respect real, which creates a space to get things done.
|
| Most of us will never get close to these opportunities or contexts, but it still helps us advance our own technique/culture to observe and model your story. And perhaps you'll help new collaborators find you. All the best.
| Bluecobra wrote:
| What kind of tuning is done in the BIOS? Is that profile available for everyone to view? Are you using a custom BIOS from Dell?
| drewg123 wrote:
| Not much tuning needed to be done. The little that was is mentioned in the talk, and was basically to set NPS=1 and to disable DLWM, in order to be able to access the full xGMI interconnect bandwidth at all times, even when the CPU is not heavily loaded.
| nh2 wrote:
| You mention AIO in nginx.
|
| In 2021 somebody submitted a patch for io_uring support in nginx:
|
| https://mailman.nginx.org/pipermail/nginx-devel/2021-Februar...
|
| I'm not sure if there has been further progress on it so far. In one comment the feedback was "it doesn't seem to make the typical nginx use case much faster" [at that time].
|
| But I find this interesting, because io_uring can make async almost all of the things that can't be done asynchronously on Linux so far (open(), stat(), etc.), and thus in nginx.
|
| Would io_uring integration in nginx be relevant for you?
| [deleted]
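For context on nh2's question, this is the submission/completion pattern io_uring brings, sketched with liburing on Linux (illustrative only: the path is made up, and a real server would keep many operations in flight per ring instead of waiting synchronously):

    /* Queue one asynchronous read and reap its completion. */
    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        static char buf[4096];

        if (io_uring_queue_init(8, &ring, 0) < 0)
            return 1;
        int fd = open("/var/www/segment.m4s", O_RDONLY);
        if (fd < 0)
            return 1;

        sqe = io_uring_get_sqe(&ring);    /* grab a submission slot */
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);           /* never blocks on the disk */

        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            printf("read %d bytes\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }

The part relevant to nginx is that IORING_OP_OPENAT and IORING_OP_STATX requests can be queued the same way, so the open()/stat() calls that remain blocking under POSIX AIO can also be made asynchronous.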
| throw0101c wrote:
| You're using the ConnectX-6 Dx here. Any technical reason for that particular NIC, or have you just not gotten around to the ConnectX-7 yet?
|
| Have you examined other NIC vendors? (Chelsio?)
| drewg123 wrote:
| This talk was given roughly 4 months ago. The CX7 was not available yet. I'm looking forward to testing on them when we get some.
|
| We looked at Chelsio (as the T6 was available well before the CX6-DX). However, the CX6-DX offers a killer feature not available on the T6. The CX6-DX can remember the crypto state of any in-order stream, while the T6 cannot. That means that the TCP stack can send, say, 4K of a TLS record, wait for acks, come back 40ms later and send the next 4K, _and DMA just the requested 4K from the host_. The T6 cannot remember the state, and would need to DMA the first 4K (which was already sent) in order to re-establish the crypto state, and then DMA the requested 4K. This could run the PCIe bus out of bandwidth. The alternative is to make TCP always chunk sends at the TLS record size, but this was horrible for streaming quality.
| phantomathkg wrote:
| > Serve only static media files
|
| This part I don't get. How about DRM? Unless Netflix pre-DRMs all content for all users?
| Bluecobra wrote:
| I would think that the media files are already encrypted and get decrypted by the Netflix client. Otherwise the DRM could easily be defeated by using something like Wireshark.
| drewg123 wrote:
| Yes, all our content is also DRMed. Else somebody could easily pirate content.
| onedr0p wrote:
| To be fair, it already seems easily pirated. DRM is useless; if content can be viewed on some personal device, it can be ripped and shared. I'd be curious how much effort/money companies dump into DRM measures; it seems like a lost cause. Maybe it just makes the execs sleep better at night.
| bri3d wrote:
| Encrypting assets on the fly using a per-consumer symmetric key would be prohibitively expensive, so I'm sure the media is stored pre-encrypted using a shared symmetric key.
|
| It only really matters that this key is unique per package, not per user, because once even a single user can compromise the trusted execution environment and extract either the key or the plain video stream, that piece of content is pirated anyway. So key reuse against the same content probably isn't a major part of the threat model: this attacker could share the key with others, but they might as well share the decrypted content instead.
| sophacles wrote:
| Every iteration of this prezzo I've seen over the years has made for a fascinating morning read, thanks!
|
| As much as I enjoy the results of the work, I'm always a bit curious how the sausage is made. Is pushing the hardware limits your primary job or something you do periodically? How do you go about selecting the gear you use? How much do you work with the vendors? (etc.) I'd really enjoy a behind-the-scenes blog post or something wrt serving absurd amounts of traffic from a single box.
| drewg123 wrote:
| My role is to make our CDN servers more efficient. One of the easiest and most fun ways to do that is to push servers as hard as I can and see what breaks and what doesn't scale. I also work with our hardware team and their vendors to evaluate new hardware and how it can fit into our system.
|
| But I do plenty of other things as well, including fixing random kernel bugs. You can read the git log of the FreeBSD main branch to see some of the things I've been working on.
| gopaz wrote:
| What about the storage? Is it using RAID? Does block size matter? What filesystem is used?
| drewg123 wrote:
| Every storage device is independent (no RAID), and runs UFS. We use UFS because, unlike ZFS, it integrates directly with the kernel page cache.
| pyrolistical wrote:
| When are you going to cut the CPU/main memory out completely?
|
| The bottleneck is at your NIC anyway, so it seems like there would be a market for a NIC that can read directly from disk into the NIC's working memory.
| drewg123 wrote:
| We've looked at this. The problem is that NICs want to read in TCP-MSS-sized chunks (1448 bytes, for example), while storage devices are highly optimized for block-aligned (4K) chunks. So you need to buffer the storage reads someplace, and for now the only practical answer is host memory. There are NVMe technologies that could help, but they are either too small or come at too large of a price premium. CXL memory looks promising, but it's not ready yet.
| Matthias247 wrote:
| Does it? I thought with segmentation offloads the NIC basically gets TCP stream data in more or less arbitrary sizes, and then segments it into MTU-sized pieces on its own?
| drewg123 wrote:
| We do fairly sophisticated TCP pacing, which requires sending down some small multiple of the MSS to the NIC, so it doesn't always have the freedom to pull 4K at a time.
| [deleted]
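A rough worst-case illustration (my framing, not a figure from the talk) of why the MSS/block mismatch forces buffering in host memory: if the NIC pulled each paced, single-MSS send directly from block-aligned storage, a 1448-byte segment could cost a full 4 KiB block read,

    \[ \frac{4096\ \text{B}}{1448\ \text{B}} \approx 2.8 \]

nearly tripling disk and PCIe traffic relative to the bytes actually sent.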
| bri3d wrote:
| At what point does it make sense to replace the CPU and OS with custom hardware and software? At this point the CPU is basically doing TCP state maintenance and DMA supervision, but not much else, right?
|
| I totally get the cost, convenience, and supply-chain-risk value in commodity stuff that you can just go out and buy, but once you're bound to a single network card this advantage starts to go away, and it seems like you're fighting with the entire system topology when it comes to NUMA, no? Why not a "TCP file send accelerator" instead of a whole computer?
| wmf wrote:
| I suppose you could attach NVMe drives directly to BlueField and cut out x86.
| jalino23 wrote:
| I was specifically looking for what their tech stack is for playback. They pretty much have to use HLS for iOS Safari, right? Where do those manifest servers fit in? What about non-iOS browser playback?
| wly_cdgr wrote:
| alpb wrote:
| What's the benefit of going from 100Gb/s to 800Gb/s through kernel/hardware optimizations, as opposed to adding more machines to meet the same throughput in this case? I'd be curious at what point the returns on the engineering effort start diminishing in this problem.
| seabrookmx wrote:
| IIRC a lot of these boxes are deployed at actual ISPs so they're closer to customers. I'd imagine the rack space is therefore limited, and the more you can push from a single machine, the better.
| quotehelp1829 wrote:
| I think it's quite obvious: instead of 8 machines you then only need 1. This results in reduced costs for machinery, storage (as each machine would have its own storage) and probably power consumption too. Also, the same room of servers can push 8 times more content.
|
| Edit: Whoops, apparently this tab has been open for four hours and of course someone has already responded to you, lol.
| wistlo wrote:
| This could be an answer as to why Netflix comes up reliably when all the other streaming services in my experience (Hulu, Disney, HBO Max, Amazon Prime) can take many times longer to initialize and deliver a stable stream.
| drewg123 wrote:
| To be honest, this has much more to do with Randall Stewart's RACK TCP, and his team's obsession with improving our members' QoE. Ironically, this costs a lot of CPU compared to normal TCP (since it is doing pacing, limiting TSO burst sizes, etc). https://github.com/freebsd/freebsd-src/blob/main/sys/netinet...
| OJFord wrote:
| Of those I only have Prime, and I really agree. It was never as good, I don't think, but lately in particular it's been _so_ slow to start (and then it's an advert! It'll do it again for the actual content once I click 'skip'!) and occasionally pauses to buffer mid-stream.
|
| I don't get that with Netflix. I've occasionally had it crash out with 'sorry, this could not be played right now' (which is a weird bug in itself, because it always loads fast & fine when I immediately press play on it again), but never such slow loading or pausing.
| _gabe_ wrote:
| This is incredible. I really like how you're able to trace the evolution of the systems as well.
|
| It makes me wonder what the next hardware revolution will be. It seems like most resource-intensive applications are bottlenecking on memory transfers. UE5's Nanite tech hinges on the ability to transfer memory directly from disk to GPU, Netflix built a system around specific hardware to avoid copying memory between userspace and the NIC, and I wonder how much other performance we're missing out on because we can't transfer memory fast enough.
|
| How much faster could AI training be if we could get memory directly from disk to the GPU and avoid the CPU orchestrating it all? What about video streaming?
| I have a feeling these processes already use some clever tricks to avoid unnecessary trips through the CPU, but it will be interesting to see which direction hardware goes with this in mind.
| aftbit wrote:
| This is definitely the direction that things are going. In the GPU space, see things like GPUDirect[1]. In networking and storage, especially for hyperscale stuff, see the rise of DPUs[2] replacing CPUs.
|
| 1: https://developer.nvidia.com/gpudirect
|
| 2: https://www.servethehome.com/what-is-a-dpu-a-data-processing...
___________________________________________________________________
(page generated 2022-08-19 23:00 UTC)