[HN Gopher] FreeBSD optimizations used by Netflix to serve video... ___________________________________________________________________ FreeBSD optimizations used by Netflix to serve video at 800Gb/s [pdf] Author : _trackno5 Score : 301 points Date : 2022-11-03 10:58 UTC (12 hours ago) (HTM) web link (people.freebsd.org) (TXT) w3m dump (people.freebsd.org) | krylon wrote: | And here I sit like a chump with my home server connected to a | 100MBit switch. (I paid for that switch, and I'm not replacing it | until it gives up the ghost.) (And before you ask, the server | also runs FreeBSD, and I'm very happy with the result.) | nix23 wrote: | Bring it to the max with multipath ;) since you already have | FreeBSD, no need to throw those beautiful, reliable things | away, maybe just buy a second...third? Dirt-cheap 100MBit card: | | https://en.wikipedia.org/wiki/Multipath_TCP#Implementation | krylon wrote: | The server has a second NIC, but the switch has no more free | ports. I briefly thought of bonding, but stopped when I read | that the switch would need to support it (which it almost | certainly does not). | | But my point was that for my requirements, 100MBit is | actually sufficient and FreeBSD still is a good choice for | me, I was just being snarky about it. (I do find it | aesthetically displeasing, though, that my wifi is now faster | than my wired network, but I can live with that.) | toast0 wrote: | I understand the motivation, but $20 gets you an 8-port gigE | switch, so it seems like the wrong hill to die on. :) | krylon wrote: | I know, but so far 100MBit is sufficient, actually; I rarely | move gigabytes of data around. When it becomes annoying, I'll | get a new switch, but so far the pressure is really low. | nix23 wrote: | I really think Netflix could make some good money being a | multimedia CDN (even for "competitors") | jedberg wrote: | I thought the same thing 10 years ago when I worked there.
At | the time, management was not interested in losing focus by doing | anything other than streaming movies to customers. | | But it should be noted that the FreeBSD Openconnect boxes are | highly optimized to Netflix's use case, which is serving a | predefined set of content that has been pre-rendered. YouTube | and its ilk are a completely different use case. | | The Netflix cache is so optimized for serving Netflix movies | that for many years we still used Akamai for all of our other | CDN needs, but it looks like they may have finally moved that | to Netflix's own CDN now. | virtuallynathan wrote: | It works with such high efficiency because we know how to place | content in advance, and the catalog is relatively small. Trying | to serve 800Gbps of YouTube content would be a nightmare. | ilyt wrote: | I wonder how much of YT traffic is the "big" (say >200k | viewers in a month) vs the small guys. | | But yeah, once your hot data size exceeds cache, bye-bye | efficiency | adgjlsfhk1 wrote: | One hard part is that on the YouTube side, most views occur | within the first 48 hours or so and a good fraction occur | within the first 6. With Netflix, they have a catalogue of | ~5000 videos and get <200 new ones per month. YouTube has | around 30k channels with more than 500k subscribers, so | that's somewhere around 30k videos per week. | nix23 wrote: | I'm not talking directly about YouTube, but also about serving | Disney, Hulu, and ESPECIALLY national/continental portals like | Arte.tv, Play-SRF, ARD-Mediathek etc. | drewg123 wrote: | Indeed. YT has a much different problem, which is to | determine which video is going to go viral, and then | transcode it into popular formats when it does. | | In comparison, we pre-transcode everything to exacting | standards, so all our CDN has to do is serve static files. | coldpie wrote: | Wow, that is actually a really interesting idea in the context | of developing a YouTube competitor.
Delivery & bandwidth are a | really high barrier to entry, and piggy-backing off of | Netflix's existing network could really lower those costs. I | agree the "providing services to your direct competition" part is | probably a stumbling block, and likely Netflix has other irons | in the fire. But anyway it's a cool idea to think about. | ilyt wrote: | It's nice to see someone actually still does proper engineering | instead of farting something about cloud and webscale and just | throwing money at a problem. | kleiba wrote: | Yet, in order to _watch_ Netflix on FreeBSD, you have to jump | through such hoops as "downloading either google chrome, | vivaldi, or brave, and [using] a small shell script which | basically creates a small jail for some ubuntu binaries that | actually install widevine which is essential for viewing some DRM | content such as Netflix" [1] | | [1] https://www.youtube.com/watch?v=mBYor4wL62Q | __MatrixMan__ wrote: | BitTorrent is probably easier. I just wish there was a good way | to send money to the artists without also funding DRM | enhancements. | andsoitis wrote: | So you want to send money to all the people who worked on the | TV Show or the Movie you just downloaded? | | I don't think you realize how impractical that is. Take a | look at the credits at the end of a movie some time. Or look | up the list of people who worked on a particular episode of a | show (yes, it can vary throughout a season). | andrewxdiamond wrote: | Certainly impractical for big budget shows, but Patreon has | proved the model works | __MatrixMan__ wrote: | It wouldn't be impractical if the studio planned ahead for | it. | | There could be the address of a smart contract at the end | of the credits. Every time more than, say, $1000 piles up | at that address, whatever is there gets dispensed to the | contributors at the end of that month.
| | Plex could aggregate those addresses and tell you how to | allocate your payment based on how you allocated your | attention. Yes I know that's what Netflix does, but I | control my Plex server. Nobody is then going to find | additional ways to monetize that data. | | I know it's unconventional, but I really don't think it's | crazy to want to reward the creators of content that you | consume while simultaneously not wanting to contribute | towards the development of ecosystems that prevent people | from being in control of their tech. | reaperducer wrote: | _It wouldn't be impractical if the studio planned ahead | for it._ | | Studios already plan for this. | | For a short time in the 80's, one of my mother's job | responsibilities was making sure every single person | involved in the production of a movie in the 1940's got | their revenue check each quarter, whether it was for | $50.00 or 12¢. Hundreds of people. Hundreds of checks. | __MatrixMan__ wrote: | Ok, so I've torrented a movie and I want to send the | equivalent of your mom a check so that next quarter it's | $0.13 instead of $0.12, where do I look in the credits to | get her address? | | Perhaps in the 80's it would've been impractical to pay | her to multiplex hundreds of $1 input checks into the | appropriate set of $50 or $0.12 output checks, but that's | now a job that's easily done by a computer. | [deleted] | [deleted] | IntelMiner wrote: | Devil's advocate: The people who work on the server engineering | at Netflix don't exactly have much control over copyright | holders being lawyer-brained man-children | jbirer wrote: | That is the problem with the BSD license, it says "use my work | and don't give anything back". Of course, GPL gets violated | too, but that would be very difficult for an American company | like Netflix. | pjmlp wrote: | UNIX's strength was never in the desktop experience, rather the | server room. | akreal wrote: | Not true for macOS.
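The pay-the-contributors idea above (a pot that accumulates and gets dispensed in proportion to attention) is easy to prototype. A minimal sketch, with hypothetical title and contributor names, splitting a monthly budget across titles by watch time and then evenly among each title's credited contributors; integer cents round down, which is how residual checks end up at odd amounts like 12¢:

```python
from collections import defaultdict

def allocate_payment(budget_cents, watch_minutes, credits):
    """Split a monthly budget across titles in proportion to watch time,
    then evenly among each title's credited contributors.

    watch_minutes: {title: minutes watched}
    credits: {title: [contributor, ...]}
    Returns {contributor: cents}; rounds down, leftover stays in the pot.
    """
    total_minutes = sum(watch_minutes.values())
    payouts = defaultdict(int)
    for title, minutes in watch_minutes.items():
        # This title's slice of the budget, proportional to attention.
        title_share = budget_cents * minutes // total_minutes
        people = credits[title]
        for person in people:
            payouts[person] += title_share // len(people)
    return dict(payouts)

# $1000 pot, 120 minutes of viewing split 90/30 between two titles.
result = allocate_payment(
    100_000,
    {"MovieA": 90, "ShowB": 30},
    {"MovieA": ["alice", "bob"], "ShowB": ["carol"]},
)
```

Only a sketch of the accounting; the hard parts the thread debates (verifying the credit list, on-chain dispensing) are out of scope here.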
| pjmlp wrote: | You mean NeXTSTEP; everything that makes it unique isn't part of | POSIX, and Steve Jobs had a quite clear position on UNIX's | value for desktop computing. | SpaceInvader wrote: | FreeBSD is not a "desktop first" system and has strengths | elsewhere. I've used it constantly for 20+ years. Sadly my | experiments with a FreeBSD desktop ended years ago as there | always was something "not working". | asveikau wrote: | Typing this on a FreeBSD laptop. | | Haven't tried using netflix on it though. | Ar-Curunir wrote: | I don't think it needs to be said that while FreeBSD can | serve as a daily driver for some people, it is insufficient | for the vast majority of computer users in the world | alberth wrote: | > which is essential for viewing some DRM content such as | Netflix | | Are you complaining that Netflix doesn't want people to pirate | content, content they might have licensed from 3rd parties | which contractually bind them to prevent piracy? | | Plus: is the development/resource cost of serving so few | people on FreeBSD even worth it? | | Note: I'm a huge FreeBSD fan. But I consider this totally | understandable on Netflix's part. | somehnguy wrote: | But it doesn't prevent it from being pirated at all. You can | get any Netflix release you want within minutes of release on | any torrent site. Sometimes _before_ the official release | even. | | It just makes normal people jump through hoops to watch the | things they are trying to pay for. That's a DRM issue in | general though, I acknowledge this isn't just a Netflix | thing. | seanw444 wrote: | And I will stick to getting it that way for as long as DRM | exists on the given platform. I'll still pay for the | subscription, but I'm handling the data my way. | KronisLV wrote: | > I'll still pay for the subscription, but I'm handling | the data my way. | | Huh, that's an interesting take. I feel like something | similar might end up being what you need to do with | certain video games as well.
| | For example, I bought Grand Theft Auto IV as a boxed copy | back when it came out (though most of my games are | digital now). The problem is that the game expects Games | For Windows Live to be present, which is now deprecated | and some folks out there can't even launch the game | anymore. It's pretty obvious what one of the solutions | here is. | webmobdev wrote: | Me too. Especially because these same DRM will soon be | used to uniquely identify and profile you when these | streamers also become an ad platform. | judge2020 wrote: | What does DRM have to do with this? They'll connect what | you watch on Peacock with what you watch on Netflix on | your computer? Do you have a reference? | Thaxll wrote: | DRM makes 0 sense since you can get any content using | torrents in 2min. It's not protecting anything, as a matter | of fact it's just making people download more since it's a | painful experience. | | For example on Windows with Chrome you only get 720p playback | for Netflix, complete nonsense. | mschuster91 wrote: | Yeah, but tell that to braindead content license owners. | googlryas wrote: | Sure, but if there was no drm, there would probably just be | a chrome extension you could install and rip/share content | more readily than via BitTorrent. | | I don't like it, but there is some logic to it. For | business types, it isn't merely the existence of ripped | copies, but the ease of creating and spreading them. | _trackno5 wrote: | Recording of the presentation can be found here: | https://www.youtube.com/watch?v=36qZYL5RlgY | | Pretty cool stuff | [deleted] | eatonphil wrote: | From what I can see in a quick search (and from this | presentation), Netflix only uses FreeBSD for serving video and | they run these servers themselves in their own datacenters I | guess. In contrast their apps on EC2 use Linux [0]. Sounds like | the time has not yet come when AWS is paying anyone full time to | support FreeBSD on EC2. 
| | [0] https://twitter.com/brendangregg/status/1412201241472471048 | erk__ wrote: | cperciva, whom you link, has worked quite a bit on EC2 support | for FreeBSD, a lot of it documented on his blog [0] and | supported by patrons at [1]. | | But yeah, it would be nice if there was someone who could work | on it full time | | [0]: https://www.daemonology.net/blog/2022-03-29-FreeBSD- | EC2-repo... | | [1]: https://www.patreon.com/cperciva | eatonphil wrote: | Yep! In the thread he describes how he is not enough. | vbezhenar wrote: | What does it mean to support FreeBSD on EC2? Surely it's just | KVM, so you can run whatever you want? | [deleted] | sanxiyn wrote: | It means, for example, writing a FreeBSD kernel driver for the | Elastic Network Adapter (ENA). Both the Linux and FreeBSD | kernel drivers are available at | https://github.com/amzn/amzn-drivers | cotillion wrote: | Netflix works because they move content close to the users. | This is done by either having the ISP establish a peering | connection directly to Netflix hosted servers or by having the | ISPs host "Open Connect Appliances" which cache the most | requested content. These appliances are based on FreeBSD. | | The AWS egress savings from this setup must be immense. | | https://openconnect.netflix.com/ | ilyt wrote: | Yup, cloud bandwidth is insanely expensive considering what | you _actually_ pay to get a link to your datacenter. | | And you pay either by the 95th percentile (basically "peak | usage") or for the whole link, not per megabyte sent | [deleted] | paravz wrote: | How does Gb/s per watt compare between two 400Gb/s servers and a | single 800Gb/s one? | | Following these reports since 2015, when I compared the estimated | cost of your 9Gb/s server to an F5 load balancer :) | pyuser583 wrote: | I think it's weird and cool how Netflix used FreeBSD/Dlang. | | Linux is just the automatic go-to. It's great the big tech | companies are rethinking these basics.
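The 95th-percentile ("burstable") billing mentioned above can be sketched in a few lines. Providers differ in the exact sampling interval and percentile convention, so this is an illustrative model, not any specific carrier's formula:

```python
def burstable_bill(samples_mbps, rate_per_mbps):
    """Compute a 95th-percentile bandwidth bill.

    samples_mbps: usage samples for the month (conventionally one
    every 5 minutes), in Mbps. The top 5% of samples are discarded
    and the highest remaining sample sets the billable rate, so short
    bursts above that level are effectively free.
    """
    ordered = sorted(samples_mbps)
    # Index of the 95th-percentile sample after dropping the top 5%.
    idx = max(0, int(len(ordered) * 0.95) - 1)
    return ordered[idx] * rate_per_mbps

# 100 samples ramping 1..100 Mbps at $2/Mbps: the 95 Mbps sample bills.
bill = burstable_bill(list(range(1, 101)), 2)
```

This is why a CDN cares about smoothing peaks: the bill tracks near-peak usage, not total bytes moved.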
| loeg wrote: | Where are you seeing any mention of Dlang? | throw0101a wrote: | And to think not that long ago I remember being excited when the | V.92 standard was released and I could get 56 kb/s on my dial-up | connection: | | * https://en.wikipedia.org/wiki/V.92 | rwl4 wrote: | How about the marvel that was Walnut Creek's cdrom.com that | served 10,000 simultaneous FTP connections back in 1999? [1] | | I was always blown away by how much more efficient FreeBSD's | network stack was compared to Linux at the time. It convinced | me to go FreeBSD-only for a few years. | | [1] http://www.kegel.com/c10k.html | alberth wrote: | > compared to Linux at the time | | Do you consider that not to still be the case? | adrian_b wrote: | Before 2003, FreeBSD was definitely both faster and more | reliable than Linux, especially for networking or storage | applications. | | After that, Intel and AMD introduced cheap multi- | threaded and multi-core CPUs. Linux was adapted very | quickly to work well on such CPUs, but FreeBSD struggled | for many years before reaching acceptable | performance on multi-threaded or multi-core CPUs, so it | became much slower than Linux. | | Later, the performance gap between Linux and FreeBSD | diminished continuously, so now there is no longer any | large difference between them. | | Depending on the hardware and on the application, either | Linux or FreeBSD can be faster, but in the majority of the | cases the winner is Linux. | | Despite that, for certain applications there may be good | reasons to choose FreeBSD, even where it happens to be | slower than Linux. | lukego wrote: | FreeBSD was held back by limited TCP options around the time | packet-switched mobile internet (GPRS) came along. That was | around 2003 too. | | I remember noticing Yahoo properties being almost | unusable on GPRS because they did packet loss detection | and recovery in such basic ways, e.g. no SACK.
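The SACK point above can be illustrated with a toy model: with only cumulative ACKs, a sender typically discovers roughly one lost segment per round trip, while a SACK-capable receiver reports all the holes in its receive window at once. A deliberately simplified sketch that ignores timeouts, cwnd dynamics, and partial-ACK subtleties:

```python
def recovery_rtts(lost_segments, sack_enabled):
    """Toy estimate of round trips needed to repair losses in one window.

    Cumulative-ACK-only recovery learns about (and retransmits) about
    one lost segment per RTT; with SACK the receiver reports every hole
    immediately, so recovery takes about one RTT regardless of how many
    segments were dropped. On a high-latency link like GPRS, that
    difference per loss event is what made pages feel unusable.
    """
    if lost_segments == 0:
        return 0
    return 1 if sack_enabled else lost_segments
```

E.g. dropping 5 segments from one window costs ~5 RTTs without SACK but ~1 RTT with it; at GPRS round-trip times of 500ms+, that is seconds of stall per loss burst.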
| anthk wrote: | Any setting for today's connection on capped mobile data? | 2.7 KB/S max. | LeonenTheDK wrote: | > Depending on the hardware and on the application, | either Linux or FreeBSD can be faster, but in the | majority of the cases the winner is Linux. | | I'm not denying this, but do you have a source? I've been | trying to find modern "Linux vs FreeBSD" performance | tests but haven't been super successful. Mostly I find | things from the early 2000s when FreeBSD had a clear | lead. | yakubin wrote: | https://www.phoronix.com/review/bsd-linux-eo2021 | mrtweetyhack wrote: | jedberg wrote: | > Depending on the hardware and on the application, | either Linux or FreeBSD can be faster, but in the | majority of the cases the winner is Linux. | | Do you have any data to back that up? Everything I've | seen recently and my own experience tells me this isn't | the case but I also don't have any data to back up my | position. Would love to find some good data on this | either way. | adrian_b wrote: | I have been using continuously both FreeBSD and Linux | since around 1995, since FreeBSD 2.0 and some Slackware | Linux distribution. | | In the early years, I have run many benchmarks between | them, in order to choose the one that was the best suited | for certain applications. | | However, during the last decade, I did not bother to | compare them any more, because now the main reasons why I | choose one or the other do not include the speed. | | Even if I have right now, besides me, several computers | with FreeBSD and several with Linux, it would not be easy | for me to run any benchmark, because they have very | different hardware, which would influence the results | much more than the OS. | | For all the applications where I use FreeBSD (for various | networking and storage services), its performance is | adequate, and I use it instead of Linux for other | reasons, not depending on whether it might be faster or | slower. 
| | In the applications where computational performance is | important, I use Linux, but that is not due to some | benchmark results, but because some commercial software | is available only for Linux, e.g. CUDA libraries or FPGA | design programs. | | Many benchmark results comparing FreeBSD and Linux may be | influenced more by the file systems used than by the OS | kernel. | | I have seen recently some benchmark comparing FreeBSD and | Linux for a database application dominated by SSD I/O, | but I cannot remember a link to it. | | The only file system shared by Linux and FreeBSD is ZFS. | With ZFS, the benchmark results were similar for Linux | and FreeBSD. However, FreeBSD was faster when using UFS | and Linux was much faster when using either XFS or EXT4 | (BTRFS was much slower than ZFS). Such a benchmark was | much more influenced by the file system than by the | operating system. | | In conclusion, it is very hard to make a good comparison | between FreeBSD and Linux, because you need identical | hardware, which must be restricted to the shorter list | that is well supported by FreeBSD, and you need to run | some micro-benchmark testing some kernel system calls. | | Otherwise, the result may depend more on the supported | software, hardware or file systems, than on the OS | kernel. | jedberg wrote: | Right, exactly, which is why it's hard to find data. But | I'd love to see someone who has tried to limit variables | to just the network stack to figure out if one network | stack is better than the other. | | But you're right, in the end you just have to set up both | for your particular use case with the best optimizations | each has to offer and see which performs better. | Thaxll wrote: | The web runs on Linux, as do most FAANG servers, so it | makes sense with the $$$ / people / R&D that this OS is | faster. A conservative number would be that 99.9% of the | web runs on Linux and it's probably much higher.
| | At the scale of Google / MS / Amazon / Apple if servers | would run faster of BSD* they would use it. We're talking | about 10's millions of servers here. | | https://www.phoronix.com/review/bsd-linux-eo2021/7 | | It gives you a pretty clear picture. | jedberg wrote: | Based on that logic, Windows is the superior operating | system and always has been, because it's always been used | by more people on their desktop than anything else. | | There are a lot more factors involved in OS choice that | could drive popularity other than the speed of the | network stack. And BTW, Hotmail runs on BSD. MacOS is a | fork of BSD. And Yahoo ran on BSD (and may still). | drewg123 wrote: | Author here, happy to answer questions | waynesonfire wrote: | How did you generate those flamegraphs and what other tools did | you use to measure performance? | | My motivation for asking comes from these findings in the pdf, | | Did the graph show the bottleneck contention on aio queue? Did | the graph show that "a lot of time was spent accessing memory"? | | What made freebsd a better platform compared to Linux to begin | tackling this problem? | | Thanks! Super interesting. Both a freebsd fan and I have | workloads that I'd love to explore benchmarking to squeeze more | performance. | drewg123 wrote: | > How did you generate those flamegraphs and what other tools | did you use to measure performance? | | We have an internal shell script that takes hwpmc output and | generates flamegraphs from the stacks. It also works with | dtrace. I'm a huge fan of dtrace. I also make heavy use of | lockstat, AMD uProf, and Intel Vtune. | | > Did the graph show the bottleneck contention on aio queue? | Did the graph show that "a lot of time was spent accessing | memory"? | | See the graph on page 32 or so of the presentation. It shows | huge plateaus in lock_delay called out of the aio code. 
It's | also obvious from lockstat stacks (run as lockstat -x | aggsize=4m -s 10 sleep 10 > results.txt) | | See the graph on page 38 or so. The plateaus are mostly | memory copy functions (memcpy, copyin, copyout). | | We already use FreeBSD on our CDN, so it just made sense to | do the work in FreeBSD. | | The talk is on Youtube https://youtu.be/36qZYL5RlgY | smokel wrote: | The flame graphs might be generated using Brendan Gregg's | utility, see https://www.brendangregg.com/flamegraphs.html | drewg123 wrote: | They are generated by a local shell script that uses the | same helpers (stackcollapse*.pl, difffolded.pl). Our | revision control says the script was committed by somebody | else though. It existed before I joined Netflix. | monotux wrote: | How long will you be able to keep up with this near-yearly | doubling of bandwidth used for serving video? :) | drewg123 wrote: | It depends on when we get PCIe Gen5 NICs and servers with | DDR5 :) | alberth wrote: | Any current estimates on timing? | toast0 wrote: | Not the OP, but PCIe5 NICs are already available in the | market; I've seen people requesting help getting them to | work on desktop platforms, which have PCIe5 as of the most | recent chips. AFAIK, currently, both AMD and Intel | release desktop before server; I don't think there's a | public release date for Zen4 server chips, but probably | this quarter or next? Intel's release process is too hard | for me to follow, but they've got desktop chips with | PCIe5, so whenever those get to the server, then that | might be an option too. | Rafuino wrote: | Public release date for Zen4 server has been disclosed | as November 10, FYI. https://www.servethehome.com/amd- | epyc-genoa-launches-10-nove.... | | Looks like Intel's release is coming January 10. | https://www.tomshardware.com/news/intel-sapphire-rapids- | laun... | dist1ll wrote: | How involved was Netflix in the design of the Mellanox NIC?
How | many stakeholders does this type of networking hardware have, | relatively speaking? | | Also, what percentage of CDN traffic that reaches the user is | served directly from your co-located appliances? | _-david-_ wrote: | There are a lot of slides and I am on my phone, so sorry if it | was addressed in the slides. | | How does Linux compare currently? I know in the past FreeBSD | was faster, but are there any current comparisons? | tame3902 wrote: | 1. I got excited when I saw arm64 mentioned. How competitive is | it? Do you think it will be a viable alternative for Netflix in | the future? | | 2. On AMD, did you play around with BIOS settings? Like turbo, | sub-NUMA clustering or cTDP? | drewg123 wrote: | Arm64 is very competitive. As you can see from the slides, | the Ampere Q80-30 is pretty much on par with our production | AMD systems. | | Yes, I've spent lots of time in the AMD BIOS over the years, | and lots of time with our AMD FAE (who is _fantastic_, BTW) | poking at things. | crest wrote: | Which NIC and driver combinations support kTLS offloading to | the NIC? | | How did you deal with the hardware/firmware limitations on the | number of offloadable TLS sessions? | drewg123 wrote: | We use Mellanox ConnectX6-DX NICs, with the Mellanox drivers | built into FreeBSD 14-current (which are also present in | FreeBSD 13). | throw0101a wrote: | > _We use Mellanox ConnectX6-DX NICs_ | | Is there a plan to move to the ConnectX-7 eventually? | | Depending on the bandwidth available, that'd be either 2x | to get the same 800Gb/s as here, or perhaps eventually | 4x to get 1600Gb/s. | drewg123 wrote: | Yes, I'm looking forward to CX7. And to other PCIe Gen5 | NICs! | eddyg wrote: | Wondering if there's a video presentation to go along with the | slides? | notaplumber1 wrote: | This talk was given at this year's EuroBSDcon in Vienna; the | recording is up on YouTube.
| | https://2022.eurobsdcon.org/ | | https://www.youtube.com/watch?v=36qZYL5RlgY | | Some really great talks this year from all the *BSDs; highly | recommend taking a look: https://www.youtube.com/playlist?l | ist=PLskKNopggjc6_N7kpccFZ... | coredog64 wrote: | And is the video presentation on Netflix? | alberth wrote: | A. Just curious, are these servers performing any work besides | purely serving content? E.g. user auth, album art, show | description, etc? | | B. What's the current biggest bottleneck preventing higher | throughput? | | C. Has everything been upstreamed? Meaning, if I were to | theoretically purchase the exact same hardware - would I be | able to achieve similar throughput? | | (Amazing work by the way in these continued accomplishments. | These posts over the years are always my favorite HN stories.) | drewg123 wrote: | a) These are CDN servers, so they serve CDN stuff. Some do | serve cover art and that sort of thing. | | b) Memory bandwidth and PCIe bandwidth. I'm eagerly awaiting | Gen5 PCIe NICs and Gen5 PCIe / DDR5 based servers :) | | c) Yes, everything in the kernel has been upstreamed. I think | there may be some patches to nginx that we have not | upstreamed (SO_REUSEPORT_LB patches, TCP_REUSPORT_LB_NUMA | patches). | fsckin wrote: | What tools do you use for load testing / benchmarking? | drewg123 wrote: | At a very basic microbenchmark level, I use stream, netperf, a | few private VM stress tests, etc. But the majority of my | testing is done using real production traffic. | alberth wrote: | If a "typical" NIC was used, what do you think the throughput | would be? | | I have to imagine considerably less (e.g. 100 Gb/s instead of | 800). | drewg123 wrote: | Back-of-the-envelope guess is ~400Gb/s. Each node has enough | memory BW for about 240Gb/s, then factor in some efficiency | loss for NUMA. | toast0 wrote: | Not the OP, but that's basically in the slides. When it's | kTLS, but not NIC kTLS.
Maybe you could optimize that a bit | more around the edges if NIC kTLS wasn't an option. | amelius wrote: | At what point will it make more sense to use specialized | hardware, e.g. network card that can do encryption? | drewg123 wrote: | We already do. The Mellanox ConnectX6-Dx with crypto | support.. It does inline crypto on TLS records as they are | transmitted. This saves memory bandwidth, as compared to a | traditional lookaside card. | MichaelZuo wrote: | What's the error rate, or uptime ratio, of those cards? | drewg123 wrote: | Were you assuming they were giant FPGA based NICs..? They | are production server NICs, using asics with a reasonable | power budget. I don't recall any failures. | MichaelZuo wrote: | Well I wasn't, though I was expecting some non-zero | amount of failures. | | That's pretty impressive if it's literally zero. | | How many machines are deployed with NICs? | drewg123 wrote: | I don't have any visibility into how many DOA NICs we | have, so I can't say that Mellanox is better or worse at | that point. But I do see most NIC related tickets for NIC | failures once machines are in production. In general, | we've found Mellanox NICs to be very reliable. | PYTHONDJANGO wrote: | * How is the DRM applied? * Is the software, that does DRM open | source, too? | alberth wrote: | How much "U's" of space do ISP typically give you (e.g. 4U, 8U, | etc)? | nixgeek wrote: | This is going to be a "How long is a piece of string?". Each | ASN will be unique, and even within any large ISP, there may | be many OCA deployment sites (there won't just be one for | Virgin Media in UK) and each site will likely have subtly | different traffic patterns and content consumption patterns, | meaning the OCA deployment may be customized to suit, and the | content pushed out (particularly to these NVME-based nodes) | will be tailored accordingly. 
| | Since the alternative for an ISP is to be carrying the bits | for Netflix further, the likelihood is they'll devote | whatever space is required, because that's much cheaper than | backhauling the traffic and ingressing over either a | settlement-free PNI or IXP link to a Netflix-operated cache | site, or worse, ingressing the traffic over a paid transit | link. | | Meanwhile, on the flipside, since Netflix funds the OCA | deployments they have a strong interest in not "oversizing" | the sites. That said, I'm sure there is an element of growth | forecasting involved once a site has been operational for a | period of time. | vkaku wrote: | Read the presentation. Had super newbie-level questions. | | Is the RAM mostly used by page content read by the NICs due to | kTLS? | | If there was better DMA/offload, could this be done with a | fraction of the RAM? (NVMe->NIC) | | If there was no need for TLS, would the RAM usage drop | dramatically? | drewg123 wrote: | These are actually fantastic questions. | | Yes, the RAM is mostly used by content sitting in the VM page | cache. | | Yes, you could go NVMe->NIC with P2P DMA. The problem is that | NICs want to read data at one TCP MSS (~1448 bytes) and NVMe | really wants to speak in 4K-sized chunks. So there need to | be some buffers somewhere. It might eventually be CXL-based | memory, but for now it is host memory. | | EDIT: missed the last question. No, with NIC kTLS, the host | RAM usage is about the same as it would be without TLS at | all. E.g., connection data sitting in the socket buffers refers | to pages in the host VM page cache which can be shared among | multiple connections. With software kTLS, data in the socket | buffers must refer to private, per-connection encrypted data, | which increases RAM requirements. | hzhou321 wrote: | What prevents Linux from achieving the same bandwidth? | _trackno5 wrote: | Not sure about all other optimisations, but Linux doesn't | have support for async sendfile.
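For comparison, Linux's synchronous sendfile(2) is exposed in Python as os.sendfile. A minimal sketch of the zero-copy path over a local socket pair: the kernel moves pages from the page cache straight into the socket, with no userspace read()/write() copy. (The async, completion-based variant discussed above is FreeBSD-specific.)

```python
import os
import socket
import tempfile

def serve_file_zero_copy(path, sock):
    """Push a whole file into a socket with sendfile(2).

    Loops because sendfile may transfer fewer bytes than requested;
    returns the total number of bytes sent.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset

# Demo: send a 64KiB file across a local socket pair and read it back.
payload = b"x" * 65536
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

left, right = socket.socketpair()
sent = serve_file_zero_copy(path, left)
left.close()  # closing signals EOF to the reader

received = b""
while True:
    chunk = right.recv(65536)
    if not chunk:
        break
    received += chunk
right.close()
os.unlink(path)
```

Note this call blocks until the socket accepts the data; the Netflix async sendfile discussed in the thread instead returns immediately and completes the send when the disk read finishes.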
| [deleted] | erk__ wrote: | Do you know if there is any documentation regarding interfacing | with kTLS, e.g. to implement support for a new library? | sanxiyn wrote: | For Linux, there is documentation at kernel.org: | https://docs.kernel.org/networking/tls.html | drewg123 wrote: | The ktls(4) man page is a start. The reference implementation | is OpenSSL right now. I did support for an internal Netflix | library a while ago; I probably should have documented it at | the time. For now feel free to contact me via email with | questions (the username in the URL, but @netflix.com) | kloch wrote: | What filesystem(s) are you using for root and content? | | And if ZFS, what options are you using? | drewg123 wrote: | We use ZFS for root, but not content. For content we use UFS. | This is because ZFS is not compatible with "zero-copy" | sendfile, since it uses its own ARC cache rather than the | kernel page cache, meaning sending data stored on ZFS | requires an extra data copy out of the ARC. It's also not | compatible with async sendfile, as it does not have the | methods required to call the sendfile completion handler | after data is read from disk into memory. | deltarholamda wrote: | >For content we use UFS | | I found this extremely interesting. ZFS is almost a cure- | all for what ails you WRT storage, but there is always | something that even Superman can't do. Sometimes old-school | is best-school. | | Thanks for the presentation and Q&A! | [deleted] | nicholasjarnold wrote: | I come from the time when the first internet connection my house | had was a 56k modem...just before cable modems/DOCSIS started | rolling out in the Midwest. These speeds are somewhat mind- | boggling to me. (Yeah, yeah, datacenter vs home, but it's still | somewhat hard to imagine saturating pipes like those.) | | While standing in a state of mild awe at 800Gb/s I read reviews | and consider upgrading my house to 2.5Gb/s equipment...
Should I | just wait for 10Gbit to get a bit cheaper? Should I ditch copper | and go fiber like that guy who was on the front page here | recently (probably not, but that was cool)? Maybe raw single-core | CPU performance is starting to level off a bit, but it seems that | networking technologies are still advancing at a rapid clip! | seized wrote: | Fiber 10Gb is very cheap. NICs and SFPs from eBay, fibre from | FS.com in whatever length you want. I got a plenum-rated 100 ft | 4-pair cable from FS.com for $100 or so, and it was only that | expensive for the plenum rating, as it runs through my cold air | returns. | ksec wrote: | Just some napkin maths. ( Correct me if I am wrong ) | | Looking at the 800Gbps config: Dell R7525 with dual 64C / 128T | and 4x ConnectX-6 Dx for 800Gbps in 2U. | | With Zen 4c, 128C and PCIe 5.0, ConnectX-7, two nodes could fit | into 2U, i.e. doubling to 1.6Tbps per 2U. | | That is going from 16Tbps to 32Tbps per rack. ( Using 40U only ) | | To put things in perspective, if every user were to use a 20Mbps | stream at the same time, ( not going to happen due to time zone | differences ), the 250M Netflix subscribers worldwide would need | 5000M Mbps or 5000 Tbps. That is less than 200 racks to serve | every single one of their customers on planet earth. ( Ignoring | storage. ) You could ship a rack to every region, state, nation, | jurisdiction or local ISP and exchange and be done with it. | | I hope Lisa Su sends drewg123 and his team at Netflix some Zen 4c | parts ASAP to play with, _cough_ , I mean help them test it. | | Note: We have PCIe 6.0 ( and 7.0 ) and DDR6 on the roadmap. The 200 | racks could be down to 50 racks by the end of this decade, | assuming Netflix is still streaming at the same bitrate. | loeg wrote: | Netflix is more likely to use a single box of this kind of | throughput at any given POP than a rack of them. For bigger | installations they can use cheaper, less throughput-dense | hardware (although I don't know if they do).
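ksec's napkin math above can be checked mechanically; a quick sketch using the thread's own assumptions (250M subscribers, 20 Mbps per concurrent stream, 800Gbps per 2U box today, a projected 1.6Tbps per 2U, 40 usable U per rack -- none of these are Netflix's actual figures):

```python
# Worst-case demand: every subscriber streaming at once.
subscribers = 250_000_000
stream_mbps = 20
total_tbps = subscribers * stream_mbps / 1_000_000   # Mbps -> Tbps
print(total_tbps)                                    # 5000.0 Tbps

# Rack capacity: 2U boxes in a 40U rack.
boxes_per_rack = 40 // 2
rack_tbps_today = 0.8 * boxes_per_rack               # 16.0 Tbps/rack at 800Gbps/box
rack_tbps_next = 1.6 * boxes_per_rack                # 32.0 Tbps/rack at 1.6Tbps/box
print(total_tbps / rack_tbps_today)                  # 312.5 racks today
print(total_tbps / rack_tbps_next)                   # 156.25 racks -> "less than 200"
```

The "less than 200 racks" claim only holds for the projected doubled config; with today's 800Gbps boxes the same demand needs roughly 312 racks.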
| carlhjerpe wrote: | Take a look at the hardware, it isn't particularly expensive | stuff. | loeg wrote: | Aside from GPUs, I'm not sure how you would increase the | cost density much. Those NICs doing hundreds of Gbps and | TLS aren't cheap, nor are the fast SSDs needed to sustain | the load, nor is the RAM or top-end AMD server CPUs. Of course, | the cost is absolutely worth it to Netflix! | carlhjerpe wrote: | Yes, but it's still just one box; if you're building a | cluster of cheaper characteristics you need more of | everything. A high-end server vs a cluster of 10 | machines: 10 machines wouldn't be cheaper to get to the | same throughput. It's not alien specialized supertech, | it's just top-of-the-line commodity hardware. (10 is just | an example number here.) | loeg wrote: | I mean, I guess I disagree with your stipulation that you | couldn't lower total costs somewhat using slightly more, | slightly lower-end hardware, if rack space was cheap. | | > top of the line commodity hardware | | Yeah -- cost in commodity hardware scales super-linearly | with performance. | alberth wrote: | > That is going from 16Tbps to 32Tbps per Rack ... only need | 200 racks | | I doubt ISPs give an entire rack to Netflix. I wouldn't be | surprised if they only get like 4U total (hence why throughput | per server is so important to Netflix). | BonoboIO wrote: | The minimum requirements: | | https://openconnect.zendesk.com/hc/en- | us/articles/3600345383... | | I think it depends on the size of the ISP; probably a rack would | be too much even for the biggest ISPs, but a single 4U too little. | TFortunato wrote: | Looking at the banner pic on their main page, they seem to | have at least one ISP install of multiple racks in the | wild.
Also, doing a little reading on how "fill" of the | devices works, they talk about doing peer-to-peer filling | of appliances located at the same site, which leads me to | believe that, even if not deploying a full rack, deploying | multiple appliances to an ISP site is a relatively normal | occurrence. | | https://openconnect.netflix.com/en/peering/ | meltedcapacitor wrote: | Why not? It's the top bandwidth consumer for a retail ISP, and | surely any reasonable amount of rack space is worth the | savings in interconnect bandwidth. | loeg wrote: | They are frequently rack-space constrained, hence this | super-dense hardware. | jedberg wrote: | Some ISPs give a full rack, some don't. It depends on how | much traffic they have and how willing they are. | | But a lot of the racks sit at internet exchange points, where | Netflix rents one or more racks at a time. | recuter wrote: | There is the rather intriguing prospect of NVM Express over | Fabrics (NVMe-oF): | https://en.wikipedia.org/wiki/NVM_Express#NVMe-oF | | Marvell Octeon 10 DPU (with an integrated 1 Terabit switch): | https://www.marvell.com/content/dam/marvell/en/company/media... | | Probably pretty soon you'll be able to chuck in a few | hot-swappable 100 TB Nimbus ExaDrives | (https://nimbusdata.com/products/exadrive/) in there and call | it a day. 1T in 1U. :) | Melatonic wrote: | Interesting to see that Infiniband is still kicking | coherentpony wrote: | Not really. Ethernet and Infiniband are both perfectly | capable from a bandwidth perspective. Streaming video isn't | remotely close to latency-bound, which is where Infiniband | would be better suited. | Melatonic wrote: | The people doing this might also be doing infra-as-code for the | virtualization layer on the hardware itself - which this might | not be able to satisfy. At minimum they surely have a ton of | this stuff deployed already, so changing hardware specs big time | might not be worth the cost.
| | Also, are you taking into account encryption for those specs? | reaperducer wrote: | _That is less than 200 Racks to serve every single of their | customer on planet earth. ( Ignoring Storage. )_ | | If you're going to ignore storage, Netflix could just ship a | low-end video server to every one of its customers and be done | with it. | | Every problem is an easy problem if you pretend the hard parts | don't exist. | OliverGuy wrote: | How much storage does Netflix actually need for its whole | library? | | It's got about 17,000 titles globally [1]. If they have | copies in SD, 720p, HD and 4K, that would be 68,000 versions | (plus some extra audio tracks for stuff dubbed in multiple | languages, but I suspect this is fairly minimal in terms of | storage). | | Let's assume that the resolutions have bitrates of 5, 10, | 15 and 20 Mbps. | | The average length of a Netflix original movie is ~90mins [2]. | | So that would require about 575TB of storage if I have done | my maths correctly. | | You would need about 20x30TB Kioxia CD6 SSDs for all that. | Very expensive, but definitely technically possible. | | I could totally see it being possible to fit those drives in | a single node to push the 800Gbps required, not increasing | the overall rack requirement at all. (Not sure if the bandwidth | from that many drives is enough; might have to cache some of | the most-watched stuff in RAM.) | | Not gonna see any in-home boxes with all the titles preloaded | any time soon though. As a hard drive array that's | still 30x20TB drives. | | [1] https://www.comparitech.com/blog/vpn-privacy/netflix- | statist... | | [2] https://stephenfollows.com/netflix-original-movies- | shows/#:~...) | AdrianB1 wrote: | Do they keep the global library on every server? I guess | they partition it geographically. | tecleandor wrote: | In their OpenConnect network they keep the most demanded | titles and the latest releases.
And IIRC that refreshes | nightly (with new releases and whatever is hot that day). | | https://openconnect.zendesk.com/hc/en- | us/articles/3600356180... | virtuallynathan wrote: | Back-of-the-napkin: Zen4 / Genoa gets you to ~500GB/s PCIe and | ~500GB/s of DRAM bandwidth -- nearly 4Tbps! Zen3/Rome is | ~300GB/s PCIe and ~300GB/s DRAM -- about 2.4Tbps. A single 2U | box with Genoa might scale to 1.25Tbps+ of useful Netflix | traffic. We'll have to see what magic Drew can pull :) | aeyes wrote: | You are probably overestimating Netflix traffic by a lot. | | IX.br peak traffic is 20Tb/s, DE-CIX peak traffic is 14Tb/s, | AMS-IX is around 11Tb/s. | | The 800Gbps machine is probably enough for a country. | | Netflix traffic stats at PIT Chile, this is their only peering | connection in Chile: https://www.pitchile.cl/wp/graficos-con- | indicadores/streamin... | srmn wrote: | This assumption misses out on all the private interconnect | links and deployed OpenConnect appliances within ISP networks | - a majority of Netflix's traffic today. IXes are only a | small portion of overall internet traffic. | lostlogin wrote: | I notice people streaming in very low resolution without | realising it, and sometimes intervene when the pain gets too | great. | | I'd be very surprised if the average bitrate was anywhere near | that approximation. | | However, that wasn't the point of the calculation; it was | looking for a maximum. | orangepurple wrote: | Agree, and 20 Mbps is a reasonable rate for modern codecs | for resolutions up to 4K for 99% of viewers | ocbyc wrote: | "In networking units" | BonoboIO wrote: | It amazes me that Netflix is capable of such top-of-the-line | engineering (really mindblowing stuff, one machine that | streams nearly 1 terabit per second), but is for the love of god | unable to stream HD content to my iPhone (newest firmware, all | up to date). Tried everything: gigabit wifi, cellular, multiple ISPs | ...
| | It is better for me to pirate their content, play it with Plex, | and be happy. I pay for Netflix, but still have to download it | to see it in acceptable quality. Absurd. The support couldn't | help. It doesn't affect me, because I have my Torrent/Plex setup, | but for 99.9% of people it is a subpar experience. | | I think the best years are over for Netflix. The hard awakening | is here: making content that the users want. They are a | movie/TV content company, not primarily a "tech company". | staringback wrote: | leetharris wrote: | You live in a bubble. The vast majority of the world likely | cannot even tell the difference between HD and 4K. Netflix | continues to grow its content and retain subscribers. | BonoboIO wrote: | Netflix is a media company, as I said. | | Well, 4K vs HD, you are right, but 480p on a Retina display | right in front of me? Really obvious. | selfhoster69 wrote: | > unable to stream HD Content to my iPhone | | Yeah, this has been the case since forever. It prioritizes | instant playback vs forcing 1080p or similar. | | Can't speak for iPhone, but on iPad, I've moved to using the | website, which goes to 1080p immediately. | | > still have to download it, to see it an acceptable quality | | Downloaded content does have a whole lot more compression than | streaming at max phone-supported quality, so just a tiny FYI. | this15testing wrote: | related to slide 4... | | how much does netflix donate to the FreeBSD foundation relative | to their profits? | hnarn wrote: | "Netflix does contribute financially to the FreeBSD Foundation | and has done so since 2012. Last year they engaged at the | "platinum" level with contributing more than $50,000+ USD to | the foundation." (2019) | | Took about five seconds to Google; it's the first result for | "netflix donations to freebsd". | | NFLX Q3 2019 earnings were about $5.2B. | | So about 0.001%, I guess.
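OliverGuy's ~575TB catalog estimate a few comments up is easy to sanity-check; a quick sketch using that comment's assumed figures (17,000 titles, a 5/10/15/20 Mbps bitrate ladder, ~90-minute average runtime):

```python
titles = 17_000
ladder_mbps = [5, 10, 15, 20]   # assumed SD / 720p / HD / 4K bitrates
runtime_s = 90 * 60             # ~90-minute average title

# Megabits per title across all four renditions, then Mb -> MB -> TB.
megabits_per_title = sum(ladder_mbps) * runtime_s
tb_total = titles * megabits_per_title / 8 / 1_000_000
print(tb_total)  # 573.75 -> matches the "about 575TB" figure
```

Per-title that works out to roughly 34GB for all four renditions combined, which is why a couple dozen 30TB SSDs can in principle hold the whole catalog.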
| this15testing wrote: | haha ___________________________________________________________________ (page generated 2022-11-03 23:01 UTC)