[HN Gopher] Serving Netflix Video Traffic at 800Gb/s and Beyond ... ___________________________________________________________________ Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf] Author : ksec Score : 434 points Date : 2022-08-19 11:53 UTC (11 hours ago) (HTM) web link (nabstreamingsummit.com) (TXT) w3m dump (nabstreamingsummit.com) | walrus01 wrote: | Everything old is new again: Anyone remember seeing a 32-bit/33 | MHz PCI (not pci-x, not pci-e) card for SSL acceleration in the | late 1990s? It was totally a thing at one point in time when your | typical 1U rackmount server was single-core CPUs and quite weak | in overall math processing power. | | OpenBSD had support for them like 22 years ago. | | https://www.google.com/search?client=firefox-b-d&q=SSL+accel... | | Now we have TLS1.2/TLS1.3 offload getting built into the PCI-E | 4.0 100/200/400GbE (whatever speed) NIC. | lprd wrote: | I wonder if they are using something like truenas or just | interfacing directly with OpenZFS (assuming they use ZFS). | BonoboIO wrote: | It amazes me, that Netflix is capable of such top of the line | engineering things, but is for the love of god unable to stream | HD Content to my iPhone. Tried everything gigabit wifi, cellular | ... | | It is better for me to pirate their content, play it with plex | and be happy. I pay for Netflix and it is absurd. | | I think the best years are over for Netflix. The hard awakening | is here to make content that the users want and they are a | movie/tv content company, not primarily a ,,tech company". | Infinitesimus wrote: | Some ISPs throttle Netflix. Not sure your background but it | might be helpful to have more details about the type phone (I'd | expect a difference in 13 pro max vs an old 7) and ISP to see | if others have similar problems. | BonoboIO wrote: | iPhone XS iOS 15.4 Netflix App Updated A1 Telekom both mobile | and wifi. It does not even work at work with a different ISP. | | Netflix has all the bandwidth data and metrics, but this is | not working since ages. Maybe a more basic setup on their end | would bring better results. Focus more and delivery, not 10 | different UI versions, AB Tests, Batch Job Workflows and so | on. They Post on their engineering blog how they test | multiple TVs, multiple Profiles in encoding, great things, | but if the basics don't work ... well what is it good for. | | I think they lost their focus. | ksec wrote: | And this is still on ConnectX-6 Dx, with PCI-Gen 5 and | ConnectX-7, Netflix should be able to push for _1.6Tbps_ per box. | This will hopefully keep drewg123 and his team busy for another | year :P | dragontamer wrote: | At that point, RAM itself would likely be the bottleneck. | | But maybe DDR5 will come out by then and get this team busy | again lol. | wmf wrote: | Genoa does indeed have roughly double the memory bandwidth. | Moral_ wrote: | A lot of the reasons they've had to build most of this stuff | themselfs is because they decided for some reason to use freeBSD. | | The NUMA work they did, I remember being in a meeting with them | as a Linux Developer at Intel at the time. They bought NVMe | drives or were saying they were going to buy NVMe drives from | Intel which got them access to "on the ground" kernel developers | and CPU people from Intel. Instead of talking about NVMe they | spent the entire meeting asking us about howt the Linux kernel | handles NUMA and corner cases around memory and scheudling. 
If I | recall correctly I think they asked if we could help them | upstream BSD code for NVMe and NUMA. I think in that meeting | there was even some L9 or super high up NUMA CPU guy from | Hillsborough they some how convinced to join. | | The conversation and technical discussion was quite fun, but it | was sort of funny to us at the time they were having to do all | this work on the BSD kernel that was solved years ago for linux. | | Technical debt I guess. | ksec wrote: | Is NUMA a solved issue on Linux? Correct me if I am wrong but I | was under the impression it may be better handled under certain | conditions, but NUMA, the problem in itself is hardly solved. | cperciva wrote: | Netflix tried Linux. FreeBSD worked better. | dboreham wrote: | By some definition of better. | trasz wrote: | It worked faster. It's a common misconception among newbies | that "Linux has NUMA" automatically means it will use NUMA | properly in a given workload. What it actually means is you | _should_ be able to use existing functionality. Sometimes | you'll only need to configure it, sometimes you'll need to | reimplement it from the scratch, and doing that in FreeBSD | is easier because there's less bloat. | throw0101c wrote: | *At the time when they created the OCA project. | | If someone was going to do a similar comparison now the | results _could_ be different. | jeffbee wrote: | I still don't get the NUMA obsession here. It seems like they | could have saved a lot of effort and a huge number of | powerpoint slides by building a box with half of these | resources and no NUMA: one CPU socket with all the memory and | one PCIe root complex and all the disks and NICs attached | thereto. It would be half the size, draw half the power, and be | way easier to program. | Bluecobra wrote: | If you are buying servers at scale the costs will certainly | add up vs. buying two processors. If you buy single proc | servers, that is double the amount of chassis, rail kits, | power supplies, power cables, drives, iLO/iDRAC licenses, | etc. | dboreham wrote: | You can build motherboards with two or more completely | isolated sets of CPU and memory, that are physically | compatible with standard racks etc. | Bluecobra wrote: | Good point, I forgot about those. It would be interesting | to see if 1x PowerEdge C6525 with four single processor | nodes is cheaper than 2x Dell R7525 servers. The C6525 | does support dual processor, so it does seem a bit | wasteful to me. | muststopmyths wrote: | Can you buy non NUMA mainstream CPUs though? Honest question | because I'd love to be rid of that BS too | jeffbee wrote: | NUMA is an outcome of system configuration. You can make a | non-NUMA platform using any CPU. You just limit yourself to | 1 CPU socket. | | Here's a Facebook engineering blog post about how they left | NUMA behind. https://engineering.fb.com/2016/03/09/data- | center-engineerin... | Dylan16807 wrote: | > You can make a non-NUMA platform using any CPU. You | just limit yourself to 1 CPU socket. | | Well, not on Epyc generation 1. Those have four NUMA | segments in each socket. | | Also those Xeon Platinum 9200 processors Intel made as an | attention grab. | jeffbee wrote: | EPYC Naples wasn't good for much of anything though, so I | am trying to forget it. | drewg123 wrote: | This is a testbed to see what breaks at higher speed. Our | normal production platforms are indeed single socket and run | at 1/2 this speed. I've identified all kinds of unexpected | bottlenecks on this testbed, so it has been worth it. 
| | We invested in NUMA back when Intel was the only game in | town, and they refused to give enough IO and memory bandwidth | per-socket to scale to 200Gb/s. Then AMD EPYC came along. And | even though Naples was single-socket, you had to treat it as | NUMA to get performance out of it. With Rome and Milan, you | can run them in 1NPS mode and still get good performance, so | NUMA is used mainly for forward looking performance testbeds. | pclmulqdq wrote: | This is amazing work from the Netflix team. I'm looking forward | to 1.6 Tb/s in 4 years. | | It is interesting that this work is happening on FreeBSD, and | potentially with diverging implementations than Linux. Linux | programs seem to be moving towards userspace getting more power, | with things like io_uring and increasing use of frameworks like | DPDK/SPDK. This work is all about getting userspace out of the | way, with things like async sendfile and kernel TLS. That's | pretty neat! | mgerdts wrote: | PCIe Gen 5 drives look poised for wide availability next year | and NVIDIA has been demoing CX7 [1] which is also PCIe Gen 5. | Intel already has some Gen 5 chips and AMD looks like they will | follow soon [2]. Surely there will be other bumps, but I bet | they pull it off in way less than 4 years. | | 1. https://www.servethehome.com/nvidia-connectx-7-shown-at- | isc-... | | 2. https://wccftech.com/amd-epyc-7004-genoa-32-zen-4-core- | cpu-s... | the8472 wrote: | kTLS has been added to linux too including offload. It also has | p2p-dma, so in principle you can shovel the file directly from | NVMe to the NIC and have the NIC encrypt it, so it'll never | touch the CPU or main memory. But that only works on specific | hardware. | [deleted] | robocat wrote: | Memory is the cache for popular content. You couldn't serve | fast enough directly from NVMe. | | "~200GB/sec of memory bandwidth is needed to serve 800Gb/s" | and "16x Intel Gen4 x4 14TB NVME". So each NVMe drive would | need to serve 12.5GB/s which is more than the 8GB/s limit for | PCIe 4.0 x4. Also popular content would need to be on every | drive, drastically lowering the total content stored. | | Also see drewg's comment on this for a different reason: | https://news.ycombinator.com/item?id=32523509 | pclmulqdq wrote: | With HBM2 sapphire rapids chips, I assume you can actually | get there. There is probably an insane price premium for | them, though, so I wouldn't hold my breath. | mschuster91 wrote: | They... serve 800 gigabytes a second on _one single content | server_ , do I get that right? | plucas wrote: | Gigabits, I presume, so 100 GB/s. | Aissen wrote: | Almost, it's 800 gigabits. Still _a lot_. | la64710 wrote: | Great engineering but how does this 800Gb/s throughput achieved | translate downstream all the way to the consumers? I suspect | there may be switches and routers from ISPs and others that | Netflix do not control in between that will reduce the effective | throughout to the end user. | kkielhofner wrote: | ISP routers have been more-or-less indistinguishable from | switches for decades at this point. They're all "line rate" | which is to say that regardless of features, packet size, etc | they'll push traffic between interfaces at whatever the | physical link is capable of without breaking a sweat. | | In the case of Netflix it is in the ISPs best interest to let | them push as much traffic to their customer eyeballs as | possible. 
After all, it's much "easier" and cheaper to build | out your internal fabric and network (which you have to do | anyway for the traffic) than it is to buy and/or acquire | transit to "the internet" for this level of traffic. | vbernat wrote: | Modern routers are unable to do line-rate regardless of | packet size. See for example the Q100 ASIC from Cisco. Rated | for 10.8 Tbps, it is only able to achieve 6 Bpps [1]. So it | needs 200-byte packets to hit line-rate. However, as for | Netflix, this is not problem since they only push big | packets. | | [1]: https://xrdocs.io/8000/tutorials/8201-architecture- | performan... | kkielhofner wrote: | Wow, I've been out of this space for a while! Last I was | paying close attention to any of this 10G ports were new. | Glad I learned something from my old life today! | | I stand corrected on "always line rate all the time in any | circumstance" but by your math and my general point < 1 | Tbps from one of these appliances across multiple 100G | ports isn't problematic in the least from a hardware | standpoint - especially for the Netflix traffic pattern | with relatively full (if not max MTU) packets. | [deleted] | Cyph0n wrote: | On the contrary, even older routers can handle this load with | no sweat. Service provider-grade routers can handle 10 to 200 | Tbps depending on size. | Aeolun wrote: | But then it gets to my home and it's trashed down to | 100Mbit/s | rayiner wrote: | Of course--the fat backbone pipe is progressively split | into smaller pipes as it gets to your house. The internet | isn't a big truck. It's a series of tubes. | jeffbee wrote: | That would be more than enough to watch half a dozen | Netflix streams at the same time. | paxys wrote: | The 800 Gb/s isn't going to a single user. There are switches | and routers in the middle, sure, but they are all doing their | job, which is to split up traffic. The end user only needs ~8 | Mb/s for a 4K stream. | jiripospisil wrote: | Here's the accompanying video: | https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa... | antonio-ramadas wrote: | I found the same video on the website of the summit: | https://nabstreamingsummit.com/videos/2022vegas/ | | I'm on mobile and there does not seem to exist a direct link. | Search for: "Case Study: Serving Netflix Video Traffic at | 400Gb/s and Beyond" | Aissen wrote: | How do you deal with the higher power density of these servers | that needs to be put at the ISP locations ? Don't they have some | constraints for the open connect machines ? | Bluecobra wrote: | Delivering power is not the problem, cooling is. You can load | up a cabinet with four 60A PDU's (~50kW) but the challenge is | to cool all that hardware you packed in the cabinet. | Aissen wrote: | Yeah, I was including that in the budget (server fans), but | technically you're correct, DC cooling is powered separately. | kkielhofner wrote: | The Dell R7525 chassis is available with dual 800w power | supplies. General thinking for power supplies is that each | power supply is connected to completely separate power | distribution - independent cabling, battery backup, generators, | and points of entry to the facility. In many cases it's also | two different power grids. This is so that if one power source | fails anywhere the load can move over to the other power supply | without exceeding the power that can be delivered through a | single power supply or trip a breaker anywhere. Under normal | operating conditions each power supply is doing half the load. 
| | Additionally, the National Electric Code in the US specifies | that continuous load should not exceed 80% of given | circuit/breaker capacity. | | So with dual 800 power supplies at "max" 80% load that's "only" | 640 watts for one of these 2U servers. For 208V power that's | only 3 amps. High density (for sure) compared to the old days | but not as ridiculous as it may seem. | Aissen wrote: | You're right, it's not that much for 2U. But for this config, | I think they'd probably go for the 1400W power supplies: | | - 16 x 25W SSDs | | - 2 x 225W CPUs | | - On top of that, add RAM, cooling, etc. | | Honestly, it's still manageable. I doubt they'd put 10 of | those in a single rack (you'd need an ISP that would want | serve 2.2M subscribers in peak from a single location, not | necessarily desirable on their side); but if the site is | getting full, you'd feel the (power) pressure (slowly) | mounting. | kkielhofner wrote: | I didn't dig into the CPU config, etc but you're right | they'd probably go for the 1400W power supplies which is | 5.4 amps max at 208V. It's for an older config (based on | the other specs) but the current Netflix OpenConnect docs | call for 750w[0] which is more reasonable even for this | hardware configuration because no one really wants to | consistently run their power supplies at 80% (even in | branch loss) for obvious reasons. | | They absolutely wouldn't want to concentrate them. The | entire purpose is to reduce ISP network load and get as | close to the customer eyeballs as possible. I don't have | any experience with these but I imagine ISPs would install | them at their peering "hubs" in major cities - in my | experience the usual suspects like Chicago, NYC, Miami, | etc. | | [0] - https://openconnect.zendesk.com/hc/en- | us/articles/3600345383... | gmm1990 wrote: | Why use/make processors with numa if you have to go to all this | trouble not to use it? | 0x457 wrote: | Well, the point of NUMA is to allow you to do things like in | slides rather than - everyone suffers equally talking to North | Bridge. Fabric between NUMA nodes isn't the selling point - | fast and direct connection between CPU and other components is. | | Plus, not every workload is: read from disk -> encrypt -> send | to nic | 0x500x79 wrote: | Awesome feats of engineering here taking hardware and software | into account when designing the system for a holistic approach to | serving content as quickly as possible! | | The slide deck background though: At least half of the products | in the slide deck template are no longer on Netflix... | Joel_Mckay wrote: | How would this compare with 42 server slots running 100Gbps DRBD | in RAID 0? If I recall, it can pre-shard the data based on a | round-robin balancer. ;) | drewg123 wrote: | We don't consider solutions like DRBD that introduce inter- | dependencies between servers. Any CDN server has to be able to | fail and not take down our service. | ajross wrote: | So... the driver and device level seems happy here, but is anyone | else creeped out by "asynchronous sendfile()"? I mean, how do you | even specify that? You have a giant file you want dumped down the | pipe, so you call it, and... just walk away? How do you report | errors? What happens to all the data buffered if the other side | resets the connection? What happens if the connection just | stalls? | | In synchronous IO paradigms, this is all managed by the | application with whatever logic the app author wants to | implement. You can report the errors, ping monitoring, whatever. 
| | But with this async thing, what's the API for that? Do you have | to write kernel code that lives above the driver to implement | devops logic? How would one even document that? | | +1 for the technical wizardry, but seems like it's going to be a | long road from here to an "OS" feature that can be documented. | 0x457 wrote: | Here is announcement for the feature: | https://www.nginx.com/blog/nginx-and-netflix-contribute-new-... | | And here are the slides explaining it: | https://www.slideshare.net/facepalmtarbz2/new-sendfile-in-en... | | There are video of various talks by Gleb Smirnoff explaining | all this magic on YouTube. | | The feature is fully documented in `man 2 sendfile`, it was | part of the patch that did the work. | zackmorris wrote: | This was my thought too. I've been struggling with the concept | of "if you don't have anything nice to say, don't say anything | at all" lately, because I've been programming too long and just | see poison pills and better alternatives everywhere I look. | | But I believe that async is an anti-pattern. From the article: | * When an nginx worker is blocked, it cannot service other | requests * Solutions to prevent nginx from blocking like | aio or thread pools scale poorly | | Nothing against nginx (I use it all the time, it's great) but I | probably would have used a synchronous blocking approach. The | bottleneck there would be artificial limits on stuff like I/O | and the number of available sockets or processes. | | So.. why isn't anyone addressing these contrived limits of sync | blocking I/O at a fundamental level? We pretend that context | switching overhead is real, but it's not. It's an artifact of | poorly written kernels from 30+ years ago (especially in | Windows) where too many registers and too much thread state | must be saved while swapping threads. We're basically all | working around the fact that the big players have traditionally | dragged their feet on refactoring that latency. | | And that some of the more performant approaches like atomic | operations using compare and swap (CAS) on thread-safe queues | beat locks/mutexes/semaphores. And that content-addressable | memory with multiple busses or even network storage beats | vertical scaling optimizations. | | So I dunno, once again this feels like kind of a drink-the- | kool-aid article. If we had a better sync blocking foundation, | then a simple blocking shell script could serve video and this | whole PDF basically goes away. Rinse, repeat with most web | programming too, where miles-long async code becomes a single | deterministic blocking function that anyone can understand. | | I'm kind of reaching the point where I expect more from big | companies to fix the actual root causes that force these async | workarounds. I kind of gave up on stuff like that over the last | 10 years, so am behind the times on improvements to sync | blocking kernel code. I'd love to hear if anyone knows of an OS | that excels at that. | 0x457 wrote: | Slide 25 shows benchmark between "old" sendfile and "new" | sendfile: | | https://www.slideshare.net/facepalmtarbz2/new-sendfile-in- | en... | | > but I probably would have used a synchronous blocking | approach. | | Well, send a patch, then. | wmf wrote: | _I probably would have used a synchronous blocking approach_ | | Then Varnish is probably more your style. (A discussion | between phk and drewg would be fascinating to watch.) 
| | _We pretend that context switching overhead is real, but it | 's not._ | | This sounds crackpot to be honest. Linux has put a lot of | effort into optimizing context switching (that's why they | have NPTL instead of M:N) and I assume FreeBSD has as well. | | _...this whole PDF basically goes away_ | | Sync vs. async doesn't solve any of the NUMA or TLS issues | that this whole PDF is about. | drewg123 wrote: | This has been a feature upstream in FreeBSD for roughly 6 | years. | | If there is a connection RST, then the data buffered in the | kernel is released (either freed immediately, or put into the | page cache, depending on SF_NOCACHE). | | sendfile_iodone() is called for completion. If there is no | error, it marks the mbufs on the socket buffer holding the | pages that were recently brought in as ready, and pokes the TCP | stack to send them. If there was an error, it calls TCP's | pru_abort() function to tear down the connection and release | what's sitting on the socket buffer. See | https://github.com/freebsd/freebsd-src/blob/main/sys/kern/ke... | mrbonner wrote: | My work requires to deal with political crap day long to get | promoted to a staff role. I miss this kind of work. | ryanianian wrote: | > deal with political crap day long to get promoted to a staff | role | | That's largely what many staff+ engineers have to do, even in | otherwise healthy organizations. "Staff" isn't a glorified, | autonomous, and stress-free version of senior at most | companies. There's nothing wrong with staying at the senior | level indefinitely provided (1) the pay and other factors are | keeping up with your contributions and (2) the staff+ and | management folks are being effective umbrellas for the politics | and messy uninteresting details behind interesting problems | like this. | touisteur wrote: | Will be fun to see what can be done with pcie5 stuff and new 400g | NICs. Really amazed by the recent increase in bandwidth. Sfp56 | recently becoming 'mainstream' in datacenters with 200G | controlers at <1500 each, you can stuff 8 or 10 of those in your | server. And you get immediate x2 with next gen. If you can | offload some of the heavy work to one (or several) GPUs or these | FPGA accelerator boards (Alveo or more niche but also crazy | ReflexCES with eth-800G capability) you're really starting to get | a 'datacenter in a box' system. If compacity is important, the | next years are going to be very interesting. | jwmoz wrote: | I've wondered how they achieve it and it's so far beyond my | knowledge and skills, truly astounding. The level of expertise | and costs must be so high. | Aeolun wrote: | Spend a few years just thinking of how to optimize video | delivery and you'd be a lot closer to understanding :) | throw0101c wrote: | Previous discussions on 400Gb/s: | | * https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-... | | * https://news.ycombinator.com/item?id=28584738 | skynetv2 wrote: | This is amazing work. I cant help but state that we have been | doing these in HPC environments for at least 15 years - User | space networking, offloads, NUMA domains aware scheduling, jitter | reduction ... great to see it being put to good use in more | mainstream workloads. Goes to show - software is eating the | world. | drewg123 wrote: | I worked in HPC as well, and I have to point out emphatically | that this _IS NOT USERSPACE NETWORKING_. The TCP stack resides | in the kernel. The vast majority of our CPU time is system + | interrupt time. 
| phantomathkg wrote: | > Serve only static media files This part is weird to me. My | understanding is DRM lock the file at a per user level, so the | DRM encrypted chunk would be different from yours. Unless for all | bitrate, and for all streaming format, Netflix has already pre- | computed everything. Otherwise, there must be some sort of pre- | computation before it can be served over TLS. | wmf wrote: | That's not how DRM works. The content is encrypted once and | that key is sent to the client. The content key is probably | wrapped in some per-session key (which may be wrapped in a per- | user key wrapped in a per-device key or something). | paxys wrote: | I love technical content like this. Not only is it incredibly | interesting and informative, it also serves as a perfect | counterpoint to the popular " _why does Netflix need X thousand | engineers, I could build it in a weekend_ " sentiment that is | frequently brought up on forums like this one. | | Building software and productionalizing/scaling it are two very | different problems, and the latter is far more difficult. Running | a successful company always requires an unlimited number of very | smart people who are willing to get their hands dirty optimizing | every aspect of the product and business. Too many people today | think that programming starts and ends at pulling a dozen popular | libraries and making some API calls. | Aaronmacaron wrote: | I think the problem is that the "easy" parts of netflix such as | the UI or the recommendation engine seem like they were hacked | together over the weekend. Of course deploying and maintaining | something of the scale of netflix is incredibly hard. But if | they can afford thousands of engineers who optimize the | performance why can't they hire a few UI/UX engineers to fix | the godawful interface which is slightly different on every | device? I think this is where this sentiment stems from. | stackbutterflow wrote: | That's what puzzles me about Uber. I believe that behind the | scenes it does pretty complex things as explained many times | on HN, but it's the worst app I've ever used. UI and UX wise | it's so bad that if you told me it was a bootcamp graduation | project I'd have no problem believing you. | bradstewart wrote: | I honestly find Netflix's the easiest to navigate, by far. | | Hulu did that big redesign, and it's extremely pretty to look | at, but even after a few years of trying to use it, I _still_ | struggle to do anything other than "resume episode". Finding | the previous episode, list episodes, etc is always an | exercise in randomly clicking, swiping, long pressing, | waiting for loading bars, etc. | | One thing Netflix _really_ got right as well: the "Watch It | Again" section. So many times I want to rewatch the episode I | just "finished" (because either my wife finished a show when | I leave the room, the kids fell off the table, I fell asleep | or wasn't paying attention, etc), and every other platform | makes this extremely difficult to find. | | Back to Hulu--the only way I know how is the search feature, | which is a PITA with a remote. | Shaanie wrote: | I'm surprised you think Netflix's UI and UX is that poor. | Which streaming service do you think does a better job? | hugey010 wrote: | None of them, since they basically all copied Netflix! The | grid view limits users to slowly looking over limited | categories of content. Any list based tree structure would | be better in my opinion. 
| zasdffaa wrote: | Why do you say "UI and UX"; how are they different in your | view? | | Jargon BS is invading people's heads and it has to stop. | brabel wrote: | Not OP, but I think the Swedish TV streaming service has a | simpler, while nicer UX (hope you can at least see this | from your country, if not play the content): | https://www.svtplay.se/ | | Admitedly, it follows the same pattern as Netflix, but I | like how it's more responsive and feels way | simpler/lighter. | paxys wrote: | You just linked to an exact copy of Netflix. | NavinF wrote: | > godawful interface which is slightly different on every | device | | Which devices are you referring to? I've only used the PC and | mobile interfaces both of which are quite pleasant. | paxys wrote: | Technically speaking I think Netflix's UX blows every other | streaming app out of the water. It loads instantly, scrolling | is smooth, search is instant. Buttons are where you'd expect | and do what you expect. They have well-performing and up-to- | date apps for every conceivable device and appliance. They | support all the latest audio and video codecs. | | This is all in stark contrast to services like HBO Max and | Disney+ which still stutter and crash multiple times a day. | Amazon for some reason treats every season of a TV show and | HD/SD versions of movies as independent items in their | library. I still haven't been able to download a HBO Max | video for offline viewing on iOS without the app crashing on | me at 99%. | | The problems you mention with Netflix are real, but they have | more to do with the business side of things. Netflix | recommendations seem crap because they don't have a lot of | third party content to recommend in the first place. Their | front page layout is optimized to maximize repetition and | make their library seem larger. They steer viewers to their | own shows because that's what the business team wants. None | of these are problems you can fix by reassigning engineers. | P5fRxh5kUvp2th wrote: | My complaint about netflix UI/UX aren't technical in | nature, I agree with you their player is the best out | there, hands down. | | The issue is the business polices surrounding it. The UI | itself is user-hostile. | rurp wrote: | > Buttons are where you'd expect and do what you expect. | | Wait, what? Netflix is the absolute worst at this. Every | time I log in the interface is different! Netflix could not | care less about users having a consistent seamless | experience. | | But as far as performance goes, I totally agree with you. | The performance is impressively good and noticeably better | than the other streaming apps I use. | | The UX is just so bad in so many ways (UI churn, autoplay, | useless ratings, useless categories, recaps that can be | watched exactly once, and so on...) it mostly ruins the app | for me. The actual video quality is great though. | 0x457 wrote: | Interface is the same, order of rows is different. Yes, | it sucks. However, other streaming apps are much worse: | | - by the time HBO Max finish loading, I've already lost | interest | | - Amazon Prime constantly gives me errors, and it's often | hard to find what you paid for and what you have to pay | for | | - Paramount+ often restart episode from beginning instead | of resuming. | | - Many leave shit in your queue with a few seconds left | for you to "Continue Watching". I still have shows in | Paramount+ that I've finished months ago in the queue, | and there is no way to delete them without watching end | credits. 
- HBO Max only allows you FF in small fixed | intervals | | - Plex...used to be okay, now it's pushing its streaming | services and works very bad offline | | - Apple TV has awful offline experience compared to | netflix in terms of UX | | Nah, I will take netflix constantly changing rows over | shit others do. | xnx wrote: | > Building software and productionalizing/scaling it are two | very different problems, and the latter is far more difficult. | | Is this claim based on some example I should know? Countless | companies never achieve product/market fit, but very few I can | think of fail because they weren't able to handle all their | customers. | ternaryoperator wrote: | This! I am frustrated at how often devs will not accept that | simple things become incredibly complicated at scale. That | favorite coding technique? That container you wrote? Those | tests you added? All good, but until you've tested them at | scale, don't assert that everyone should use them. This dynamic | is true in the other direction too: that techniques often taken | for granted simply are not feasible in highly resource- | constrained environments. With rare exception, the best we can | say with accuracy is that "I find X works well enough in the | typical situations I code for." | geodel wrote: | Seems to be mixing too many things here. Many scaling/ hardware | challenges need a lot of people but it can still be true that | Netflix has choke full of engineers making half-assed turd Java | frameworks day in and day out. I know this because we are | forced to use these crappy tools as they are made by Netflix so | supposed to be best. | | It's just that they succeeded in streaming market with low | competition and great success bring in lot of post facto | justifications on how outrageously great Netflix tech infra is. | | I mean it may be excellent for their purpose but to think their | solution can be industry wide replicated seems not true to me. | paxys wrote: | So Netflix published a framework which seemingly isn't | suitable for your use case, your managers forced you to use | it, and your response is to blame...Netflix? | tankenmate wrote: | Que? You don't seem to have much justification for your | points; it seems more like a rant as you have had a bad | experience using software provided by Netflix. It would be | great if you could provide more details about what was wrong | with it rather than just "we are forced to use these crappy | tools". I'm genuinely interested. | | In my personal experience lots of companies (admittedly all | large companies, but many of which sell their services / | software / hardware to smaller companies) have a use for | serving hundreds of Gbps of static file traffic as cheaply as | possible. And the slides for this talk seem exactly on the | money (again from my experience slinging lots of static data | to lots of users). | AtNightWeCode wrote: | Scaling streaming for a company at the size of Netflix is very | easy. You can use any edge cache solution, even homemade. The | complexity at N seems to stem from other things. | yibg wrote: | This is exactly the type of comment OP is referring to. Have | you build a steaming service at this scale? Do you actually | know what's involved? Or are you just looking at the surface | level, making a bunch of assumptions and reaching a gut feel | conclusion? | n0tth3dro1ds wrote: | >You can use any edge cache solution | | Umm, those solutions exist (from places like AWS and Azure) | _because_ Netflix was able to do it without them. 
The cloud | platforms recognized that others would want to build their | own streaming services, so they built video streaming | offerings. | | You have the cart in front of the horse. The out-of-the-box | solutions of today don't exist without Netflix (and YouTube) | building a planet scale video solution first. | AtNightWeCode wrote: | N had problems in US because they served data from CA. | Today, N uses edge caching and the data for me in Europe is | sent less than 10km to my home. And it should be cheap. We | are talking about serving static content here. It is not | very difficult. | jedberg wrote: | Why do you think Netflix served out of California? They | only did that for the first few months, until they | adopted Akamai, Limelight, and L3 CDNs. That was long | before Netflix launched in Europe. | AtNightWeCode wrote: | Well they use to, they tried to bully various ISPs into | increasing their throughput before they jumped the edge | cache wagon, long time after competitors. Akamai is a | stellar company. Don't think N uses A services today. At | the end of the day. N mostly serves static content to | users and I highly doubt that hardware costs is a very | relevant parameter. | jedberg wrote: | With all due respect, you have no idea what you're | talking about. I worked there during the transition from | 3rd party CDNs to OpenConnect. We got off 3rd party CDNs | in 2013/4 and operated solely out of OpenConnect, in | large part because no 3rd party CDN was capable of | serving our amount of video at any price, including | Akamai. We weren't even streaming out of our own | datacenter anymore by the time I started, and that was | when streaming was still free with your DVD plan. | | And your timeline is all wrong too. Netflix didn't even | engage with the ISPs about bandwidth until long after | moving out of our own datacenter. We started the | OpenConnect program specifically to make it easier for | ISPs, there was no bullying. The spat you're thinking of | is that Comcast didn't want to adopt the OpenConnect but | also didn't want to appropriately peer with other | networks to give their customers the advertised speeds. | | And hardware cost is a hugely relevant parameter. Being | efficient with hardware is the difference between | profitable streaming at that scale and not profitable. | AtNightWeCode wrote: | You mean all the heat maps provided by Comcast and so on | from 2014(?) are incorrect? That they lied about all the | traffic from CA caused by N? | jedberg wrote: | Please link those heat maps. I think you're reading them | wrong. | 0x457 wrote: | > any edge cache solution | | Someone still has to do the R&D for edge cache? These slides | are about Open Connect - their own edge cache solution that | gets installed in partners racks (i.e. ISPs and Exchanges). | Before things that Netflix and Nginx implemented in FreeBSD, | hardware compute power was wasted on various things they | discuss in slides. | | Yes, you can throw money at the problem and buy more | hardware. | AtNightWeCode wrote: | Fair. Point taken. I answered the comment not the article. | seydor wrote: | I dont see the point. A centralized data hose that is replacing | what internet was designed to be : a decentralized, multi | routed network. The problem may be useful to them, but unlikely | to be useful to anyone who doesn't already work there. I dunno, | if it was possible to monetize decentralized or bittorrent | video hosting, i think it would solve the problem in a more | interesting and resilient way. 
With fewer engineers. | | But it's like, every discussion today must end with something | about the pay and head count of engineers. | paxys wrote: | While we are at it let's just put video streaming on the | blockchain! Who needs all these engineers and servers. | jedberg wrote: | But only seven people can stream at once! | RexM wrote: | Once you download the chain you can watch anything you | want! You'll have a local copy of _everything_ | oleganza wrote: | I understand and even share a little bit of your sentiment, | but I'm tired of stretched "X is now not what X was supposed | to be". | | Strictly speaking, the Internet was supposed to help some | servers survive and continue working together despite some | others being destroyed by a nuke. That is more-or-less the | case today: we see how people use VPNs to route around | censorship. Whether you were supposed to stream TikTok videos | directly from the phones of their authors or through a | centralized data hose - i'm not sure that was ever the grand | idea. | | Also "decentralized" and "monetize" don't go well together | because innovation is stimulated by profit margins and rent- | free decentralized solutions by definition have those margins | equal to zero (otherwise the solution is not decentralized | enough). | jedberg wrote: | It's funny you mention this. When I worked at Netflix, we | looked at making streaming peer to peer. There were a lot of | problems with it though. Privacy issues, most people have | terrible upload bandwidth from home, people didn't like the | idea of their hardware serving other customers, home hardware | is flakey so you'd constantly be doing client selection, and | other problems. | | So it turns out decentralized multi routed is not a good | solution for video streaming. | gizajob wrote: | Works great for storing pirated content though | jedberg wrote: | Usually you aren't live streaming your pirated content | right off other people's boxes. You download it first and | then view it. So you don't need every chunk available at | just the right time. | monocasa wrote: | Popcorn Time worked pretty well with just that model; | watching more or less immediately is it's downloaded in | order from the swarm. | jedberg wrote: | Did you actually use Popcorn time? It got stuck all the | time waiting for a chunk. Also, again, people sharing | pirated content don't care about privacy and are happy to | share their home hardware for other people to use. Paying | customers care about that stuff. | monocasa wrote: | I have; it worked flawlessly for content that was | decently seeded. And that's without the sorts of table | stakes you'd expect for a streaming platform like the | same content encoded at different bit rates, but chunked | on the same boundaries so you can dynamically change | bitrate as your buffer depletes. | | And I'm not sure most people actually care if their home | hardware is being used for whatever by the service | they're using, or else there'd be pushback on electron | apps from more than just HN. | | The sense I always got from Netflix's P2P work was that | it was heavily tied into the political battles wrt the BS | arguments that Netflix should pay for peering with tier 2 | ISPs. Did this work there continue much after that | problem went quieter? | PaywallBuster wrote: | Used it dozens of times, usually works fine for the | popular content. | | Good quality, barely any buffering. | | The niche content may be too difficult for a "live" | streaming experience. 
| naet wrote: | Somebody I know (cough) starts torrent downloads in | sequential order after downloading the first and last | chunk, and then opens the file in VLC while it is | downloading. | | Works amazingly well for watching something front to back | if your download speed is fast enough; you'd never know | it wasn't being streamed. The hardest part is finding a | good torrent for what you want to watch. Ironically the | Netflix catalog is one of the most easily available to | pirate since people rip it directly from web. | seydor wrote: | recently i see a lot of people with very high upload | speeds. Nobody is using them though, but nominally they are | there. | jedberg wrote: | Sure, very recently. But all the other issues still | apply. A real time feed from random people's machines is | very difficult at best. | seydor wrote: | I ve watched a lot of HBO (not available here) on popcorn | time | chasd00 wrote: | wouldn't a peer-to-peer setup be a non-starter legally? | ..or at least incredibly high risk. I could see major ISPs | complaining if Netflix is using the upstream side of the | ISP's customers for profit. | jedberg wrote: | Yes. :) | tinus_hn wrote: | No, Microsoft is doing the same thing and nobody cares. | Just mention it the small print in the agreement and | offer a way to turn it off. | onlyrealcuzzo wrote: | To be pedantic, scaling by itself isn't _that_ difficult. | | Scaling cost-effectively is. | sllabres wrote: | Tell that e.g. Tesla | | What I've read they burned a lot of money and hat large | problems scaling nevertheless. Which I don't find too | surprising, not because they are unable, but because it isn't | easy to scale. | | From my experience and from what I read scaling people | roughly a power of ten is a larger change in an organisation | and therefor likely a challenge. For _any_ technical process | the boundaries might not be strictly a power of ten but i | would say that scaling a power of a hundred is a challenge if | this value is not already reached on any process in your | organisation. | onlyrealcuzzo wrote: | True. | | Scaling to - say - Paramount+ size should not be difficult | if you're willing to pay AWS / Azure / GCP 10-100x what it | would cost to serve it yourself (which in many cases | actually makes sense). | | It's possible at Netflix's size, they couldn't just run on | AWS anymore. Though, given enough lead time and a realistic | growth curve - I'm sure it's feasible. | | Obviously scaling manufacturing is not a solved problem | like (realistically) scaling network and compute usage. | yibg wrote: | Serving Netflix streaming traffic from AWS would be... | unwise. One the bandwidth cost would be enormous even if | they can handle it. And two I doubt they can handle that | much traffic. | eru wrote: | Yes, and No. At some point, even scaling at all would be | hard. | | (Just like sending a human to Alpha Centauri is hard, even if | you had unlimited funds.) | Dylan16807 wrote: | Like it how? Accomplishing a grand feat is nearly the | opposite of scaling. | | If Netflix built out more slower servers, that would be | acceptable scaling. I don't see any plausible scenario | where that becomes too difficult. Even if they had billions | of subscribers. | toast0 wrote: | Eh, sending a human to Alpha Centauri wouldn't be that | hard... Although it would be difficult to know for sure if | they arrived, and for ease of transport, you may want to | send a dead human. | kaba0 wrote: | It depends entirely on the problem domain. 
Sure, it is more | of a devops problem when the problem is trivially | parallelizable, but often you have a bottleneck service (e.g. | the database) that has to run on a single machine. No matter | how many instance serves the frontend * if every call will | have to pass through that single machine. | | * after a certain scale | le-mark wrote: | > Too many people today think that programming starts and ends | at pulling a dozen popular libraries and making some API calls. | | The needle keeps moving doesn't it? A tremendous breadth of | difficult problems can be effectively addressed by pulling | together libraries and calling APIs today that weren't possible | before. Today's hard problems are yesterday impossibilities. | The challenge for those seeking to make an impact is to dream | big enough. | ezconnect wrote: | The basic problem is the same, pushing the hardware to its | limits. | whatshisface wrote: | The basic problem is delivering value to someone. | ezconnect wrote: | Programmers are not passionate to deliver value to | someone, that's the businessman problem. | bcrosby95 wrote: | Not every programmer is passionate about the same thing. | I got into this field because I love building things that | make people's lives easier. | echelon wrote: | Sure. | | Anecdotal, but most of the people I've worked with as ICs | couldn't give a damn about that. They want dollarydoos. | | One of the 10X-ers I know (they exist and are real), told | me repeatedly how he'd much rather be doing his own | thing. He hates the business needs. But income is | important and that's why he's dedicated to doing it. I'm | surprised at how focused and good he is given his | disposition, and I want to hire him when I scale my | business more. Drive and passion are sometimes just | spontaneous. | | An old CEO of mine even quipped that we were not family | and that we were there to do a job. All true. Most of the | people doing that job were only there for the money. | | Most jobs that drive sales and revenue simply aren't fun | or rewarding. There's lots of infrastructural glue and | scaling. Tiring, boring, monotonous work. 24/7 oncall | work. The money is good, though. | [deleted] | [deleted] | nwallin wrote: | > the popular "why does Netflix need X thousand engineers, I | could build it in a weekend" sentiment that is frequently | brought up on forums like this one. | | I don't think that's a popular sentiment about Netflix. | Twitter, Reddit, Facebook, yes, but Netflix, YouTube, Zoom, not | so much. | mihaic wrote: | I don't think this actually answers why Netflix needs to many | engineers. This seems like the sort of thing that one or two | experienced engineers would spend a year refining, and it would | turn out like this. | | This is the sort of impressive work that I've never seen scale. | drewg123 wrote: | Author here... Yes, most of this work was done by me, with | help from a handful of us on the OCA kernel team at Netflix | (and external FreeBSD developers), and our vendor partners | (Mellanox/NVIDIA). | | With that said, we are standing on the shoulders of giants. | There are tons of other optimizations not mentioned in this | talk where removing any one of them could tank performance. | I'm giving a talk about that at EuroBSDCon next month. | tetha wrote: | The way I've been putting it to people lately is: Never | underestimate how hard a problem can grow by making it big. 
And | also, at times, it is hard to appreciate how difficult | something becomes if you haven't walked the path at least | partially. | | Like, from work, hosting postgres. At this point, I very much | understand why a consultant once said - "You cannot make | mistakes in a postgres 10GB or 100GB and a dozen transactions | per second in size". And he's right, give it some hardware, | don't touch knobs except for 1 or 2 and that's it. The average | application accessing our postgres clusters is just too small | to cause problems. | | And then we have 2 postgres clusters with a dataset size of 1TB | or 2TB peaking at like 300 - 400 transactions per second. | That's not necessarily big or busy for what postgres can do, | but it becomes noticeable that you have to do some things right | at this point and some patterns just stop working hard. | | And then there are people dealing with postgres instances 100 - | 1000x bigger than this. And that's becoming tangibly awesome | and frightening by now, using awesome in a more oldschool way | there. | mlrtime wrote: | Not only make it big, engineer it in a way that makes it | profitable for the business. | | I'm sure there are many teams that could design such a | network with nearly unlimited resources, but it is entirely | different when you have profit margins. | victor106 wrote: | As someone once said "Big is different" | Sytten wrote: | I think a fair criticism would be how many engineers they have | compared to their competitors. Disney+ is on a similar scale, | can they do the same/similar job with less people? And | considering netflix pays top of market, how much does Disney | spends for their engineering effort to get their result. Would | netflix benefit from just throwing more hardware at the problem | vs paying more engineers 400-500k/y to optimize? | paxys wrote: | Disney (the company) has 20x the number of employees as | Netflix, and just 2x the market cap (in fact they were | briefly worth the same last year), ~2x the revenue and 2/5 | the net income. So Netflix is clearly doing something right. | eru wrote: | Perhaps they are just running different business models? | | Walmart's market cap per employee is probably much, much | lower than Disney or Netflix, too. That doesn't mean | Walmart is doing anything wrong. | ziddoap wrote: | > _Disney (the company) has 20x the number of employees_ | | Is that all of Disney or just Disney+? | | It doesn't seem like that would be a useful statistic if | that includes completely unrelated positions (e.g. does | that 20x statistic include Disney employees working at | Disney Land/World serving up hotdogs? Because they probably | don't contribute much to the streaming service) | briffle wrote: | Netflix also has production studios they now own making | content. | thfuran wrote: | Content like hotdogs at an amusement park? | diab0lic wrote: | https://bridgertonexperience.com/san-francisco/ | | https://strangerthings-experience.com/ | scrlk wrote: | Disney Streaming had 850 employees as of 2017 [0] (can't | find any newer figures); LinkedIn is suggesting 1k-5k. | | [0] https://en.wikipedia.org/wiki/Disney_Streaming | rybosworld wrote: | That seems like a fair point if you just consider the video | streaming. I know that Netflix wants to break into gaming. | I'd imagine the bandwidth required for that is higher than | streaming videos. | jon-wood wrote: | It's really not, especially if you look at their current | model for doing so. 
Netflix at the moment are breaking into | mobile gaming, which means the bandwidth requirements are | placed on Apple/Google's app store infrastructure. I'd be | surprised if Netflix don't have any sort of metrics | gathering infrastructure to judge how much people are | playing those games, but they're also likely reusing the | same infrastructure used by Netflix video streaming for | that, so the incremental increase in load may well be | negligible. | rybosworld wrote: | I was referring to their plans for a game streaming | service. | jwmoz wrote: | I watch Disney content sometimes and it constantly drops or | freezes, you can see the difference in quality compared to | Netflix. | bmurphy1976 wrote: | Yeah, you can totally see the difference. Netflix encoding | looks like shit. | | I've done a lot of video processing professionally (the | server side stuff, exactly what Netflix does) and Netflix | is by far the worst of all the streaming providers. They | absolutely sacrifice the quality of the video to save | bandwidth costs in aggregate and it shows (or more | accurately it doesn't show, all the fidelity is lost). | mkmk wrote: | Do you think it's worth it to pay the extra $5-10/month | for premium quality? | https://help.netflix.com/en/node/24926 | bradstewart wrote: | Even the Premium 4k streams have surprisingly low | bitrates and, occasionally, framerates. I dug out the | blu-ray player the other day and was absolutely shocked | how good things looked and, even more so, _sounded_ --the | audio quality from Netflix (and most streaming services, | really) is simply atrocious. | jedberg wrote: | Are you getting the best Netflix encodings? You might be | getting worse quality because your ISP throttles Netflix. | bagels wrote: | Your isp may be throttling bandwidth for Netflix, leading | to lower quality encodings being served to you. Comcast | does this, for instance. | nicce wrote: | You can't make such conclusions from your own experience. | It is one form of bias. There are many variables. For me it | is the opposite, for example. | iamricks wrote: | Standing on the shoulders of giants, Netflix engineers didn't | have blog posts from other companies on how to handle the | scale they started facing. Facebook didn't have blog posts to | reference when they scaled to 1B users. They pay for talent | that have built systems that had not been built before and | they have seen a return on it so they continue to do it. | wowokay wrote: | Hulu was around before netflix | gavin_gee wrote: | yeah and have you see the awful performance of Hulu? its | basically unusable. poster child for under investing in | the streaming platform. | paxys wrote: | Huh? Netflix predates Hulu by over a decade. | msh wrote: | Hulu was never Netflix scale. YouTube is a better | example. | birdyrooster wrote: | Not even close. YouTube has orders of magnitude more | content and vastly more users. Google Global Cache was | the inspiration for Open Connect. | jedberg wrote: | Youtube is very different than Netflix from a technical | problem perspective. They serve free videos to anyone | around the world that are uploaded by users. | | It's closer to a live streaming problem than pre-encoded | video like Netflix. | | Having worked at Netflix I can say that the YouTube | problem is much more complex. | why_only_15 wrote: | I wonder what portion of Youtube's request traffic can be | served with cache servers at the edge with a few hundred | terabytes of storage. 
There's a very long tail but i | would guess a significant portion of their traffic is the | top ~10,000 videos at any given moment. | spockz wrote: | There was a Google organised hackathon on this topic. | Given a set of resources, locations, and (estimated) | popularity, Optimise for video load time by determining | what should be moved to the cache when and where. | Cerium wrote: | Sure? "After an early beta test in Oct. of that year, | Hulu was made available to the public on March 12, 2008-- | a year after Netflix launched its own streaming service." | | [1] https://www.foxbusiness.com/technology/5-things-to- | know-abou... | esotericimpl wrote: | pclmulqdq wrote: | The engineers are definitely cost-effective at this scale. | They may be the highest-leverage engineers at the company in | terms of $ earned from their efforts compared to $ spent. The | improvements that come from performance engineers at large | companies are frequently worth $10M/year/person or more. | | Most companies maintain internal calculations of these sorts | of things, and make rational decisions. | gregsadetsky wrote: | Sorry for the tangent, but really curious to ask: | | When you say that companies maintain internal calculations | of the benefits, would you say that it's (extremely | roughly) something like: $10M benefit, need 5 core | engineers + benefits + PM + testing lab etc etc -> we can | spend up to $500k per eng give or take. | | Or is the $10M one number (that would be held somewhat | secretly internally at the company) and the salaries mostly | represents where the market is? Does the (salary) market | take into account the down-the-line $10M value? | | Basically, could those engs negotiate to be paid more, or | are they already sort of paid close to exactly what the | group they're part of generates in terms of revenue? | | Thanks! | | -- | | I see that you said $10M per person, not for the "network | optimization group". Hmm. So it would be fair to say that | the engs are definitely not paid according to the value | they generate..? I wouldn't be surprised by that but just | to confirm. | pclmulqdq wrote: | The simple fact is that you are not paid for the value | you create. You are paid based on the salary you can | demand. For performance engineers, $10 | million/year/person opportunities are kind of rare, | meaning that you can't demand close to that. Your | alternatives to big tech are things like wall street, | which pay very well, so you can demand a higher salary | (and/or higher level) than a normal engineer of your | skill would get. However, this is nowhere near the value | of the work. | Hermitian909 wrote: | Not OP, but 1 engineer -> 10M of benefit sounds right for | my company. | | In terms of negotiation, it really depends on how | differentiated your skills are. Short answer is that if | you can convince management that it would be difficult to | find other engineers who could deliver the optimizations | you're delivering, yes, you have leverage. | pclmulqdq wrote: | This is exactly right about negotiation and your | skillset. I have seen performance engineers in the right | place at the right time get 10-20% of their benefit to | the company (I have seen both $1 million/year | compensation for line workers and $10+ million/year for | very senior folks). | | Very highly skilled engineers in specific niches can | basically price themselves like monopolists, because the | company can easily figure out how much money they are | leaving on the table by not hiring them. 
This is not like "feature work" engineers, whose value is very nebulous and unknown.
| donavanm wrote:
| If you are an employee there is little to no relationship between your output and your compensation. Employer-employee relationships are based on the _cost to the employer_ to secure equivalent or better output.
|
| Secondly, yes, $10M per employee of revenue or cash flow is pretty reasonable for similar companies. The prioritization is NOT "how many employees per $MM." The allocation is "what opportunity is the highest $MM return per available employee."
| toast0 wrote:
| > Would netflix benefit from just throwing more hardware at the problem vs paying more engineers 400-500k/y to optimize?
|
| Where the CDN boxes go, you can't always just throw more hardware. There's a limited amount of space, it's not controlled by Netflix, and other people want to throw hardware into that same space. Pushing 800Gb/s in the same amount of space that others do 80Gb/s (or less) is a big deal.
| [deleted]
| slillibri wrote:
| Disney bought a majority ownership in BAMTECH to build Disney+.
| entropie wrote:
| I wasn't able to watch Disney+ in 4K via Chromecast for like a year. Stuttering every 10 seconds or so. I never had problems like this with Netflix.
| criddell wrote:
| I guess you weren't a Comcast customer in 2014 trying to watch Netflix and getting low quality, stuttering video. At the time lots of people tried to frame it as a net neutrality issue, but in the end I think it was a peering dispute that involved a third party.
|
| https://www.wsj.com/articles/SB10001424052702304834704579401...
| loopercal wrote:
| I think this just validates their points. Netflix has more engineers and 8 years of them building and fixing things, so they have fewer issues.
| rakoo wrote:
| Sure, if you place yourself in an arbitrarily hard problem, it takes a lot to solve it. "How we dug a 100m pit without using machines in 2 days" is an incredible feat, but the constraints only serve those who set them.
|
| Serving large content has been a solved problem for decades already. It's much easier and more reliable to serve from multiple sources, each at their maximum speed. Want more speed? Add another source. Any client can be a source.
|
| Netflix artificially restrains itself by only serving from its own machines. It is a very nice engineering feat, but a completely artificial one. As a user it feels weird to think of them highly when they could just have gone the easier road.
| zinclozenge wrote:
| How would you do it if you had much more modest scale requirements? Say a few thousand simultaneous viewers. I'm kicking around an idea for a niche-content video streaming service, but I don't know much about the tech stacks for it.
| vagrantJin wrote:
| A few thousand?
|
| Just use Nginx and a backend lang of your choosing.
| zinclozenge wrote:
| Not even bother using a CDN?
| ev1 wrote:
| For low-traffic niche content that might not be a cache hit in the first place in every region?
|
| I wouldn't bother. Unless you use storage at the CDN, which is probably not cost-effective for you.
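A quick back-of-the-envelope check puts "a few thousand simultaneous viewers" in perspective (assuming a ~5 Mb/s average HD bitrate, an illustrative figure rather than one from this thread):

    \[ 3000 \text{ viewers} \times 5\ \text{Mb/s} \approx 15\ \text{Gb/s} \]

That fits on a single commodity server with a 25GbE NIC, roughly fifty times less than the 800Gb/s per box discussed in the talk, which is why "just use nginx" is reasonable advice at that scale.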
| rakoo wrote:
| Use BitTorrent. Every viewer is also a source. The more people watch, the less your servers are loaded.
|
| BitTorrent is built towards "offline" viewing. Try PeerTube for a stack that is built more for streaming and has BitTorrent sharing built in (actually WebTorrent, because the browser doesn't speak raw TCP or UDP, but the idea is the same).
| jedberg wrote:
| The constraint is profit. Sure, with unlimited money you can just keep getting more and more servers. But that costs money. It would end up swamping any profit to be made.
|
| By creating this optimized system, it makes serving that much video _profitable_.
| rakoo wrote:
| No, the constraint is that _only you serve content_. But once the content is distributed, anyone else can also distribute it.
| jedberg wrote:
| I'm curious as to who you think would pay for the video if anyone could distribute it and watch it.
| yibg wrote:
| And break the profit and probably also the legal constraints. Good job, now you don't have a company anymore.
| dmikalova wrote:
| This just isn't true though. I worked at a relatively minor video streaming company and we overloaded and took down AWS CloudFront for an entire region. They refused to work with us or increase capacity because the datacenter (one of the ones in APAC) was already full. This was on top of already spreading the load across 3 regions. We only had a few million viewers.
|
| We ended up switching to Fastly for CDN. There's something hidden here though that becomes a problem at Netflix size. We were willing to pay the cloud provider tax, and we didn't dig down into kernel-level or storage optimizations because off the shelf was good enough. At Netflix's scale, that adds up to millions of extra server hours you have to pay for if you don't do the 5% optimizations outlined in the article.
| rakoo wrote:
| You still have the same constraint: only you can serve content.
|
| The solution I'm talking about is BitTorrent. The more people watch your content, the less load your servers bear. That is using the internet to its best potential, instead of reverting back to the centralized model of the big shopping mall and its individual users.
| rvnx wrote:
| I think nobody said Netflix's infrastructure can be built in a weekend. However, the scale doesn't matter that much after a certain point, once the scaling "wall" has been pierced. If you are a biscuit factory that produces 100'000'000 biscuits per year or 500'000'000 biscuits per year, the gap between 100M and 500M isn't that impressive anymore, as it's mostly about scaling existing processes. However, if you turn a 1'000-biscuit shop into a 1'000'000-biscuit company, that's very impressive.
| bmurphy1976 wrote:
| Nonsense.
|
| It's still impressive. A 5x increase at that scale can be a phenomenal challenge. Where do you source the ingredients? Where do you build the factories (plural, because at that scale you almost certainly have multiple locations in different geographic locales subject to different regulatory structures)? Where do you hire the people? How do you manage it? What about the storage and shipping and maintenance of all the equipment, and on and on? How much do you do in house, and how much do you outsource to partners? What happens when a partner goes belly up or can't meet your ever-increasing needs?
|
| Your comment is a great example of what the OP pointed out.
| jon-wood wrote:
| My favourite example of this sort of extreme scaling issue is the fact that McDonald's apparently declined to sell products with blueberries in them because modelling showed they'd have to buy the world's entire supply of blueberries in order to do so.
| notamy wrote:
| I thought this was hyperbolic, so I looked into it:
|
| > _The menu team comes up with interesting ideas like including kale in salads. The procurement team and suppliers then try to get the menu team to understand the challenges. How do you bring kale to 14,000 restaurants? As one example, when they introduced Blueberry Smoothies in the U.S., McDonald's ended up consuming one third of the blueberry market overnight._
|
| https://www.forbes.com/sites/stevebanker/2015/10/14/mcdonald...
|
| I couldn't find any other source to back it up, but still, wow! That's an absurd number.
| menzoic wrote:
| McDonald's sells blueberry muffins
| indigodaddy wrote:
| So there is an extreme dearth of blueberries, I guess, compared to other food goods? I mean, McD's isn't taking over the entire supply of potatoes or chickens, for example, correct?
| zaroth wrote:
| I think the point is that the supply chains probably need upwards of years of time to adapt in some cases; you can't just turn on a recipe that needs a full cup of blueberries per serving on Monday and expect a spare million cups of blueberries to be lying around the supply chain on Tuesday.
|
| In the case of animal products, there are almost certainly major operations worldwide that have been built and financed purely to serve McDonald's demand. They probably even have to build these out well before entering some markets.
| UncleEntity wrote:
| They grow a lot of potatoes in the US. Last week I hauled a load of tater tots destined for McDonald's. I've hauled potato products for McDonald's quite often.
|
| They raise a lot of chickens in the US. I've hauled chicken nuggets or chicken breasts for McDonald's in the past quite often.
|
| I can't even tell you where they grow blueberries.
| belinder wrote:
| It's a good point, and I think it's an interesting comparison. Obviously improving by a factor of 1000 is better than improving by a factor of 5. But the absolute improvement is still hundreds of times larger: 400'000'000 extra biscuits are going to bring a lot more revenue than 999'000 extra biscuits.
| paxys wrote:
| It's the exact opposite.
|
| Taking the software example, you can easily scale from 1 to 100 users on your own machine. You can handle thousands by moving to a shared host. Using off-the-shelf web servers and load balancers will help you serve a million+. From there on you'll have to spend a lot more effort optimizing and fixing bottlenecks to get to tens, maybe hundreds of millions. What if you want to handle a billion users? Five billion? Ten billion? It always gets harder, not easier.
|
| Pushing the established limits of a problem takes exponentially more effort than reusing existing solutions, even though the marginal improvement may be a lot smaller. Getting from 99.9% to 99.99% efficiency takes _more_ effort than getting from 90% to 99%, which takes more effort than getting from 50% to 90%.
|
| You never pierce the scaling wall. It only keeps getting higher.
| xuhu wrote:
| If you can serve 1K users with 10 employees, you can probably serve 1M users with 10k employees.
| kaba0 wrote:
| And you can birth one baby in 3 months with 3 women, right?
|
| To add something useful besides the snark: first of all, there are hard physical limits, which are sometimes very relevant in context (you really shouldn't try to outrace the speed of light, for example, which matters in some high-frequency trading infrastructure projects). And you can try to increase headcount to any number, but you won't produce, for example, a better compiler that way. There are simply jobs that are more "serial" - the only way to win at those is to employ the very best of the field in a small team.
| xuhu wrote:
| No, just 3 babies in 9 months.
| sllabres wrote:
| That won't help your customer who's expecting their 'baby' after three months due to the increased mother-workforce ;)
| kaba0 wrote:
| You can deliver DVDs to Netflix subscribers as well to achieve a much bigger throughput, but I doubt they would be as popular as they are right now :D
| beckingz wrote:
| Sneakernet!
|
| "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." -Andrew Tanenbaum
| bmurphy1976 wrote:
| That's too simplistic. What about the doctors and medical facilities and other supporting infrastructure? What about the baby food and medicine and clothing and supplies, and what about the people to take care of the children? You think you can just keep throwing more women at a hospital having babies to infinity and not have any problems?
| Dylan16807 wrote:
| There's a limit on those things, but it might as well be infinity when you're trying to have 1 baby or 3 babies or 100 babies.
|
| 1M users and 10k employees is not in the range where you have crushingly impactful logistics.
| Dylan16807 wrote:
| But the goal wasn't better or faster. It was giving more customers the same service. You're talking about a completely different problem.
| beckingz wrote:
| Remember that global productivity usually does not scale with headcount!
|
| Each employee adds some overhead, which requires more employees... which requires more employees.
| Mavvie wrote:
| Sounds like the rocket equation! Perhaps big companies are rocket science?
| loopercal wrote:
| If you told McDonald's to double the number of McRibs produced next year, that would be an incredible challenge to meet. They already sell enough that it affects the global pork market; it'd be insane for them to double their demand for pork. What about other supplies? Would this result in reduced burger demand? How can they ensure they can respond appropriately either way? They probably run near fridge/storage capacity; does increasing this mean they also need to increase storage at restaurants?
|
| That's a 2x increase. Now do it again, and then some, for a 5x. It's crazy to say there's a "scaling wall" that, once you "pierce" it, makes it easy to scale up. It's the opposite: McDonald's already knows how to supply and sell X McRibs a year, but no company has ever sold 5X those McRibs, so they'd have to figure it out themselves.
| rkagerer wrote:
| There's an old rule of thumb that each order-of-magnitude increase (10x) brings a whole new set of challenges.
|
| Anecdotally I experienced this when scaling my software product from 1 --> 10 --> 100's --> 1000's etc. of users.
|
| That's not to say 2x can't be a substantial challenge, as you pointed out. It gets harder (and IMO more fun) when you're at the bleeding edge of your industry.
| bombcar wrote:
| Part of it depends on whether "build it five more times, again" is a viable strategy.
|
| Building five "Netflixes" with identical content is possible; the amount of content wouldn't change (it would decrease, the cynic says); you just need parallel copies of everything (servers, bandwidth, etc).
|
| The fun would come in syncing usernames, etc. through the system.
|
| It's an entirely different class of problem compared to "acquire resource, convert it, sell it".
| zeroxfe wrote:
| > the gap between 100M and 500M isn't that impressive
|
| This is absolutely not true. The closer you are to peak performance, the harder it is to scale, and the returns diminish heavily. At many major tech companies, there's a huge amount of effort put into just 1-5% optimizations -- these efforts really require creative thinking and complex engineering (not just "scaling existing processes"). At the volumes these companies operate, even a 1% optimization is quite significant.
| carlhjerpe wrote:
| Aren't you contradicting yourself?
|
| If you're at 100M users you're probably scaling vertically, so adding 5x more hardware shouldn't be a problem.
|
| But when you're at 500M, all of a sudden it makes sense to optimize further, since the capital saved will be the same percentage(ish) but now the money is worth people's time.
|
| I know that we don't particularly care about power savings in the DCs I've worked in, because they're relatively small, while big tech will do all kinds of shenanigans to save a couple of watts here and there, because it's worth it across hundreds of thousands of servers.
| beoberha wrote:
| Seeing scale issues as purely hardware-bound is incredibly naive. Even in a case like streaming, if you're pushing more bits through the wire, it's likely that the growth in usage behind the traffic increase also causes the software systems supporting your service to start degrading, and you need to rearchitect them. Very few problems at that scale can be solved by throwing more hardware at them.
| Quarrelsome wrote:
| > why does Netflix need X thousand engineers, I could build it in a weekend
|
| I would like to hope nobody asks that. Video is one of the, if not the, hardest data-plumbing use cases on the internet.
| dragontamer wrote:
| I'd say realtime communications are harder.
|
| A lot of the tricks being discussed here cannot be applied to Skype calls.
| OJFord wrote:
| Surely GP would agree, unless you mean even audio-only calls? Otherwise it's just an extra requirement (or several) on top of 'video'.
| dragontamer wrote:
| The amount of transcoding needed to get a conference call up hurts my brain. If 20 people are talking on Skype, the server needs to receive those 20 streams, decode them, mix the audio together, recode the streams, and then broadcast it back out to all 20 people.
|
| I'm not a telecommunications guy, but I had some professors back in college explain how difficult and fundamental the research of "Ma Bell" was from the 60s through the 80s. I'm talking Erlang, C++, Clos circuits, etc. The innovations from Bell Labs are nearly endless.
|
| Telephone communications is one of the biggest sources of fundamental comp-sci research from the 1950s through the 2000s.
| nordsieck wrote:
| > I'm talking Erlang, C++, Clos circuits, etc. The innovations from Bell Labs are nearly endless.
|
| A lot of innovations did come out of Bell Labs. But I'm pretty sure Erlang wasn't one of them.
| dragontamer wrote:
| Oh, you're right.
There seems to have been a glitch in my memory somehow. Still, it's Ericsson, which is telecommunications nonetheless.
| eru wrote:
| > Otherwise it's just an extra requirement(s) on top of 'video'.
|
| I'm not sure what you mean? Real-time communication, both video and audio-only, has much tighter latency requirements. You can't just buffer ahead when you have some spare bandwidth, like Netflix or YouTube can.
| OJFord wrote:
| Yeah, that's what I'm kind of facetiously calling 'just an extra requirement'.
|
| My point was intended to be that there are the same challenges and more. But it's not something I've thought about in depth (and certainly not had to work with), and it maybe wasn't a very good characterisation, because it's not the same on the other side either: there's no large file to serve, for example, because at the start of the call it doesn't exist yet. So perhaps I take it back.
| eru wrote:
| Yeah, jon-wood really did a great write-up of the challenges involved.
|
| In any case, it's hard to say what the 'greatest' engineering challenge is. You can make almost any kind of engineering really challenging if you (or the market...) set yourself a very low cost ceiling.
| bombcar wrote:
| Video calling is "easier" in that the p2p option is workable for some variant of "works".
|
| It is _much much harder_ because you can't do the "cache everything on the edge" solution. If storage were infinitely cheap and small, Netflix could run their entire business by sending you an encrypted USB stick with _every single movie/TV show they have_ on it, and everything would play locally. This is basically what they do with their edge servers/CDNs.
|
| You can't do that with video calls, because the video/audio didn't exist 1 millisecond ago.
| jon-wood wrote:
| Streaming pre-recorded video and streaming realtime video are almost entirely different use cases.
|
| Pre-recorded video streaming is, under the hood, really just a high-volume variant of serving up static web pages. You have a few gigabytes of file to send from the server it's stored on to the device that wants to play back the video. As this presentation demonstrates, that isn't trivial at scale, but the core functionality of sending files over the internet is what it was designed to do from day one. Because you can generally download video across the internet faster than it can be played back, it's possible to build up a decent-sized buffer, which allows you to paper over temporary variance in network performance without the customer noticing.
|
| Realtime video streaming has two variants. One-to-many, Twitch-style video streaming is relatively simple, since you can encode video into files and upload them to a server for viewers to download. This is how HLS streaming works, and most of the techniques Netflix uses to optimise video delivery can also be applied here, at the cost of adding latency between the event being streamed and people consuming it. That latency will often sit at about 30 seconds, and people generally find that acceptable.
|
| Skype-style realtime video streaming is much harder. You're taking video from one person's camera, and then sending it over the internet to one or more people's devices.
You can't do any sort of pre-processing on that, or stage the video on servers closer to the consuming users, because you have no way of generating that video until the point your users decide to start talking to each other. Because you can't pre-stage that video, you need to be able to establish a network route between the people on a call, potentially in an environment where none of the participants have any open connection from the internet directly to the device they're streaming from. Slight fluctuations in network performance can potentially degrade video delivery to the point of it being unusable. The most common route to deal with that is systems that attempt to establish a direct connection (ideally over a local network) between participants and, if that doesn't work, go via relay servers operated by the software provider. These servers provide a single point on the internet all parties can connect to, and then allow passing packets as if they were all on the same network.
| jdyyc wrote:
| I work on a very technically trivial service at a large company.
|
| It's the kind of thing that people run at home on a Raspberry Pi, Docker container or Linux server, and it consumes almost no resources.
|
| But at our organization this needs to scale up to millions of users in an extremely reliable way. It turns out this is incredibly hard and expensive, and takes a team of people and a bucket of money to pull it off correctly.
|
| When I tell people what I work on they only think about their tiny implementation of it, not the difficulty of doing it at an extreme scale.
| RektBoy wrote:
| csmpltn wrote:
| At this point, they should've just gone for an in-house bare-bones operating system that supports the bare minimum: reading chunks from disk, encrypting them, and forwarding them to the NIC.
|
| Besides that, it seems like all of the heavy lifting here is done by Mellanox hardware...
| wmf wrote:
| FreeBSD is their "in-house" operating system since they modify it to do what they want.
| csmpltn wrote:
| But do they really need an entire operating system for what amounts to simply copying around chunks of data? I think they could've gone for some slim RTOS-ish solution instead: no user mode, no drivers, bare minimum.
| wmf wrote:
| They're using the FreeBSD filesystem and network stack, both of which are significant amounts of code. I guess they could have tried the rump kernel concept, but it sounds like a lot of work.
| drewg123 wrote:
| I worked on an OS like that once. The problem is with "all the other stuff" that you need to support that's outside the core mission of your OS. You wind up bogged down in each additional feature that you need to implement from scratch (or port from another OS with a compatible license). With FreeBSD, all this comes for free.
|
| We chose to use FreeBSD, and have contributed our code back to the FreeBSD upstream to make the world a better place for everyone.
| drewg123 wrote:
| It's only doing the crypto. The VM system and the TCP stack are doing most of the heavy lifting, and are both stock FreeBSD.
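As a concrete illustration of "the NIC only does the crypto": with kernel TLS, the userspace server completes the TLS handshake, hands the session keys to the kernel, and from then on just calls sendfile(2). A minimal sketch, assuming FreeBSD 13+'s kTLS interface (struct tls_enable and the constants below reflect my reading of sys/ktls.h and may differ across versions; handshake, key derivation, and error handling are all elided):

    /* Hand a TLS session's transmit keys to the kernel so that
     * sendfile(2)/write(2) payloads are framed and encrypted as TLS
     * records by the kernel -- or by the NIC when inline TLS offload
     * is enabled. Illustrative only. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ktls.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <crypto/cryptodev.h>   /* CRYPTO_AES_NIST_GCM_16 */
    #include <string.h>

    static int enable_tx_ktls(int sock, const uint8_t *key, int keylen,
                              const uint8_t *iv, int ivlen)
    {
        struct tls_enable en;

        memset(&en, 0, sizeof(en));
        en.cipher_algorithm = CRYPTO_AES_NIST_GCM_16;  /* AES-GCM */
        en.cipher_key = key;
        en.cipher_key_len = keylen;
        en.iv = iv;                /* implicit IV from the handshake */
        en.iv_len = ivlen;
        en.tls_vmajor = TLS_MAJOR_VER_ONE;
        en.tls_vminor = TLS_MINOR_VER_THREE;           /* TLS 1.3 */
        return setsockopt(sock, IPPROTO_TCP, TCP_TXTLS_ENABLE,
                          &en, sizeof(en));
    }

Whether records are then encrypted in software or on the NIC is invisible at this layer, which is what lets the rest of the stack stay stock FreeBSD.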
| yrgulation wrote:
| This is innovation and proper engineering. They chose FreeBSD, which shows they are not afraid of solving actual hard problems that yield impressive results. These are the types of engineers I'd hire in a heartbeat, if I ever were to own a successful company.
|
| Simply following trends and doing what everyone else does leads to mediocre results and the assembly-line type of work that most software development has become.
| faizshah wrote:
| I have a bit of a naive question. If TLS has this much overhead, how do HFT and other finance firms secure their connections?
|
| I know they use a number of techniques like kernel bypass to get the lowest latency possible, but maybe they have explored some solution to this problem as well.
| shiftpgdn wrote:
| Mellanox cards or private links
| wyager wrote:
| TLS doesn't really add latency on top of TCP after you make the initial connection; it mostly adds a bit of extra processing overhead for encryption. HFT firms aren't usually encryption-bandwidth-constrained. I'm not actually sure if most exchange FIX connections actually run over TLS, but that would be reasonable.
| drewg123 wrote:
| TLS has the highest overhead when you're serving data at rest, like static files that are not already in the CPU cache. For serving dynamic data that is in the CPU cache, TLS offload matters a lot less. Our workload is basically the showcase for TLS offload.
| naikrovek wrote:
| I love (love) how everyone else who answered this question alongside you made what appears to be a complete stab-in-the-dark guess, while only you knew the answer.
|
| Never be afraid to admit that you don't know something. Guessing wrong is a much worse look than not answering at all.
| ddmitriev wrote:
| Trading connections that go over private links, such as cross-connects between the firm's and the exchange's equipment within a colocation facility, are not encrypted.
| AtlasBarfed wrote:
| HFT needs to be outlawed.
|
| No exchange should allow trades to complete in any time less than 15 minutes, I would argue, and each trade should have a random 1-15 minute delay pad on top of that.
|
| HFT access only serves the larger financial firms, and is used for front-running and other basically-illegal tricks. It provides an anti-competitive advantage to large firms in markets that are supposed to be open-access and fair-trading. And of course it leads to AI autotrading madness.
|
| I get that it keeps a lot of tech people very well compensated, but it is in the service of unregulated fraud at worst and unfair advantage at best.
| jonahhorowitz wrote:
| Not really germane to the topic, but a financial transactions tax would effectively kill HFT without the complexity that you're suggesting.
| [deleted]
| theideaofcoffee wrote:
| This is a remarkable technical achievement that builds on all of its past work, as are the other updates from Netflix on serving ever more traffic from a single box. That said, I still find it terrifying that so many users would be affected by a single machine going down; that blast radius is so huge!
|
| Do we know if the rates that these hosts serve actually make it into production? Or do they derate the amount they serve from a single host and add others?
| lanstin wrote:
| I think they buffer, and if the stream has issues the client connects to another host. They have been doing Chaos Monkey for a long, long time.
| drewg123 wrote:
| As I said in a parallel comment, this is a testbed platform to see what problems we'll encounter running at these speeds. Production hosts are single socket, and can run at roughly 1/2 this speed.
|
| I regret that I've crashed boxes doing hundreds of Gb/s.
| Thankfully our stack is resilient enough that customers didn't notice.
| qwertox wrote:
| Discussion of the same presentation 11 months ago, when the title was 400Gb/s:
|
| https://news.ycombinator.com/item?id=28584738
|
| This was the video which was posted back then alongside the slides: https://www.youtube.com/watch?v=_o-HcG8QxPc
| ndom91 wrote:
| Video of this presentation available here: https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa...
| forgot_old_user wrote:
| thank you!
| haunter wrote:
| As a total outsider it looks like FreeBSD is the "silent" OS behind a lot of the big-money projects. Not just this one; I recently learned it's the base OS of the PlayStation 4 and 5 systems too. Is there a reason why FreeBSD is so popular? Just general reliability? And why not the other BSD projects? Also, one (like me) would assume Linux is behind all of these, but alas, no.
| keewee7 wrote:
| The sysadmin experience on FreeBSD used to be more opinionated than on Linux. This was before most Linux distros adopted systemd.
|
| The reason companies like Sony and Apple pick FreeBSD is that they get an open-source, POSIX-compliant OS they can drastically modify down to the kernel level without having to open source their modifications.
| Thev00d00 wrote:
| Sony used it because they got an entire OS for free with no obligation to release the source.
| naikrovek wrote:
| Blame the GPL for this. The GPL is directly responsible for the livelihood of many/all BSD developers, and I could not be happier about that. Linux is overrated in a lot of ways.
| robocat wrote:
| However, the GPL is irrelevant in this case because the Netflix Open Connect appliance is not sold to ISPs. The GPL is only relevant if you are distributing GPL software, e.g. the Sony PlayStation.
| trunnell wrote:
| The OpenConnect team at Netflix is truly amazing and lots of fun to work with. My team at Netflix partnered closely with them for many years.
|
| Incidentally, I saw some of their job posts yesterday. If you think this presentation was cool, and you want to work with some competent yet humble colleagues, check these out:
|
| CDN Site Reliability Engineer https://jobs.netflix.com/jobs/223403454
|
| Senior Software Engineer - Low Latency Transport Design https://jobs.netflix.com/jobs/196504134
|
| The client-side team is hiring, too! (This is my old team.) Again, it's full of amazing people, fascinating problems, and huge impact:
|
| Senior Software Engineer, Streaming Algorithms https://jobs.netflix.com/jobs/224538050
|
| That last job post has a link to another very deep-dive tech talk showing the client-side perspective.
| bagels wrote:
| Slide says "we don't transcode on the server"
|
| Surely they transcode on some server? Maybe they just mean they don't do it on the same server that is serving bits to customers?
| naikrovek wrote:
| It seemed clear to me: they don't transcode on the server that is sending data to the viewer. Transcoding is done once per piece of media and target-format combination, instead of on the fly as it is viewed.
| a-dub wrote:
| I haven't looked yet, but I'm going to guess: edge caching running on custom hardware, with smart predictions and congestion-control algorithms for determining what gets cached where and when.
| paxys wrote:
| Does anyone know where these servers are hosted? Certainly not AWS, I imagine?
| kkielhofner wrote:
| As close to the eyeballs as possible.
With OpenConnect[0] they are located in ISP facilities and/or carrier-neutral facilities with access to a peering fabric (in a sense, the Open Connect Appliance is "hosted" by the ISP).
|
| It's a win-win. ISPs don't have to use their peering and/or transit bandwidth to upstream peers, and users get a much better experience, with lower latency, higher reliability, less opportunity for packet loss, etc.
|
| [0] - https://openconnect.netflix.com/en/
| amelius wrote:
| How many customers does that serve?
| Aissen wrote:
| At 15Mb/s for a start-quality 4K stream (5 times higher than the average ISP speed measured by Netflix), that serves 53k simultaneous customers.
|
| In the US, the fastest ISP for Netflix usage seems to be Comcast (https://ispspeedindex.netflix.net/country/us ), with an average speed of 3.6Mbps. That would serve an average of 222k simultaneous customers on a single server.
| samcrawford wrote:
| That 15Mb/s figure for 4K is out of date by a couple of years. They previously targeted a fixed average bitrate of 15.6Mb/s. They now target a quality level, as scored by VMAF. This makes their average bitrate for 4K variable, but they say it has an upper bound of about 8Mb/s. See https://netflixtechblog.com/optimized-shot-based-encodes-for...
| Aissen wrote:
| Yep, that's correct. It looks like Netflix forgot to update their support pages for this: https://help.netflix.com/en/node/306 .
| umanwizard wrote:
| What does start-quality mean?
| Aissen wrote:
| Not much, see sibling comment. It used to be the minimum quality for enjoyable 4K (4K Blu-ray discs have much higher bitrates with HEVC). Since then, Netflix has heavily optimized their encoding, greatly reducing the bandwidth needed.
| danielheath wrote:
| Video formats require more data for the first frame of each scene; subsequent frames can be encoded as transformations of the previous frame.
| [deleted]
| ksec wrote:
| That depends on the content's bitrate. Netflix serves video at bitrates anywhere from 2-18Mbps. If the average were 10Mbps, that is roughly 80K customers per box.
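Making the arithmetic in these replies explicit: sustained streams per box is just line rate divided by average per-stream bitrate,

    \[ \frac{800\ \text{Gb/s}}{15\ \text{Mb/s}} \approx 53{,}000 \qquad \frac{800\ \text{Gb/s}}{3.6\ \text{Mb/s}} \approx 222{,}000 \qquad \frac{800\ \text{Gb/s}}{10\ \text{Mb/s}} = 80{,}000 \]

so the 53k, 222k, and 80K figures above are the same calculation under different assumed bitrates.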
| daper wrote:
| I have some experience serving static content and working with CDNs. Here is what I find interesting / unique here:
|
| - They are not using the OS page cache or any memory caching for this; every request is served directly from disks. This seems possible only when requests are spread between many NVMe disks, since a single high-end NVMe drive like the Micron 9300 PRO has a max read speed of 3.5GB/s (or 28Gbps), far less than 800Gbps. It looks like this works OK for long-tail content, but what about new hot content everybody wants to watch on the day of release? Do they spread the same content over multiple disks for this purpose?
|
| - Async I/O resolves the issue of nginx processes stalling on disk read operations, but only after you've already opened the file. Depending on the FS / number of files / other FS activity / directory structure, opening the file can block for a significant time, and there is no async open() AFAIK. How do they resolve that? Are we assuming the inode cache contains all inodes and open() time is insignificant? Or are they configuring nginx with a large open-file cache?
|
| - TLS for streamed media became necessary because browsers started to complain about non-TLS content. But that makes things sooo complicated, as we see in the presentation (kTLS is 50% of CPU usage before moving to encryption offloaded to the NIC). One has to remember that the content is most probably already encrypted (DRM); we just add another layer of encryption / authentication. TLS for media segments makes so little sense IMO.
|
| - When you rely on encryption or TCP offloading by the NIC, you are stuck with what is possible on your NIC. I guess no HTTP/3 over UDP or fancy congestion-control optimizations in TCP until the vendor somehow implements them in the hardware.
| mgerdts wrote:
| A Micron 9300 Pro is getting rather long in the tooth. They are using PCIe Gen 4 drives that are twice as fast as the Micron 9300.
|
| My own testing on single-socket systems that look rather similar to the ones they are using suggests it is much easier to push many 100Gbit interfaces to their maximum throughput without caching. If your working set fits in cache, that may be different. If you have a legit need for sixteen 14 TiB (15.36 TB) drives, you won't be able to fit that amount of RAM into the system. (Edit: I saw a response saying they do use the cache for the most popular content. They seem to explicitly choose what goes into cache, not allowing a bunch of random stuff to keep knocking the most important content out of cache. That makes perfect sense, and is not inconsistent with my assertion that you can't just hope a half-TiB cache will do the right thing with 224 TiB of content.)
|
| TLS is probably also there to keep the cable company from snooping on the Netflix traffic, which would allow the cable company to more effectively market rival products and services. If there's a vulnerability in the decoders of encrypted media formats, putting the content in TLS prevents a MITM from exploiting that.
|
| From the slides, you will see that they started working with Mellanox on this in 2016 and got the first capable hardware in 2020, with iterations since then. Maybe they see value in the engineering relationship to get the HW acceleration that they value into the hardware components they buy.
|
| Disclaimer: I work for NVIDIA, which bought Mellanox a while back. I have no inside knowledge of the NVIDIA/Netflix relationship.
| ShroudedNight wrote:
| Just from reading the specs (i.e. real-world details might derail all of this):
|
| https://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2
|
| Given one can specify arbitrary offsets for sendfile(), it's not clear to me that there must be any kind of O(k > 1) relationship between open() and sendfile() calls: as long as you can map requested content to a sub-interval of a file, you can co-mingle the catalogue into an arbitrarily small number of files, or potentially even stream directly off raw block devices.
| drewg123 wrote:
| Responding to a few points. We do indeed use the OS page cache. The hottest files remain in cache and are not served from disk. We manage what is cached in the page cache and what is directly released using the SF_NOCACHE flag.
|
| I believe our TLS initiative was started before browsers started to complain, and was done to protect our customers' privacy.
|
| We have lots of fancy congestion optimizations in TCP. We offload TLS to the NIC, *NOT* TCP.
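To make the serving path in this subthread concrete, here is a hedged sketch of FreeBSD's sendfile(2) with the SF_NOCACHE flag drewg123 mentions (the helper, its name, and the long-tail heuristic are illustrative, not Netflix's code):

    /* Serve [off, off+len) of a static file over a connected socket.
     * SF_NOCACHE hints that the kernel should release the pages once
     * sent, so one-off reads of long-tail titles don't evict genuinely
     * hot content from the page cache. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <fcntl.h>
    #include <unistd.h>

    static off_t serve_range(int sock, const char *path, off_t off,
                             size_t len, int long_tail)
    {
        off_t sent = 0;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        /* Zero-copy: no read() into userspace. The offset argument also
         * illustrates ShroudedNight's point: one large file can hold
         * many independently addressable chunks of content. */
        if (sendfile(fd, sock, off, len, NULL, &sent,
                     long_tail ? SF_NOCACHE : 0) == -1)
            sent = -1;
        close(fd);
        return sent;   /* bytes queued, via the sbytes out-parameter */
    }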
| daper wrote:
| Can I ask whether your whole content catalog can be stored on a single server, so that content is simply replicated everywhere, or is there some layer above that directs requests to the specific group of servers storing the requested content? I assume the described machine is not just part of a tiered cache setup, since I don't think nginx is capable of complex caching scenarios.
| drewg123 wrote:
| No, the entire catalog cannot fit on a single server.
|
| There is a Netflix Tech Blog post from a few years ago that talks about this better than I could: https://netflixtechblog.com/content-popularity-for-open-conn...
| eru wrote:
| Does the encryption in DRM protect the metadata?
| daper wrote:
| AFAIK no. The point of DRM is to prevent recording / playing the media on a device without the decryption key (authorization). So the goal is different from TLS, which is used by the client to ensure the content is authentic, unaltered in transit, and not readable by a man-in-the-middle.
|
| But do we really need such protection for a TV show?
|
| "Metadata" in HLS / DASH is a separate HTTP request, which can be served over HTTPS if you wish. It can then refer to media segments served over HTTP (unless your browser / client doesn't like "mixed content").
| throw0101c wrote:
| > _But do we really need such protection for a TV show?_
|
| DRM may be mandated by the content owners. TLS gives Netflix customers privacy against their ISP snooping on what they're watching.
| sam0x17 wrote:
| > But do we really need such protection for a TV show?
|
| What you watch can be a very private thing, especially for famous people.
| nextgens wrote:
| No, and it doesn't protect the privacy of the viewer either!
| saurik wrote:
| FWIW, neither does the TLS layer: because the video is all chunked into fixed-time-length segments, each video causes a unique signature of variable-byte-size segments, making it possible to determine which Netflix movie someone is watching based simply on their (encrypted) traffic pattern. Someone built this for YouTube a while back and managed to get it up to something like 98% accuracy.
|
| https://www.blackhat.com/docs/eu-16/materials/eu-16-Dubin-I-...
|
| https://americansforbgu.org/hackers-can-see-what-youtube-vid...
| nightpool wrote:
| Did TLS 1.3 fix this with content-length hiding? Doesn't it add support for variable-length padding that could prevent the attacker from measuring the plaintext content length? Do any major servers support it?
| drewg123 wrote:
| Author here. AMA
| sam0x17 wrote:
| > Sendfile
|
| Ah, so this is why everything stutters / falls apart when you switch subtitles on or off -- it has to access a whole different file and resume at the same place in that file, I assume? I would think you would want the (verbal) audio separated out in a different file so it can be swapped out on the fly without re-initializing the video stream, and the same thing with subtitle files? I'm just making some assumptions based on the behavior I've seen, but it would be cool to know how this works.
| drewg123 wrote:
| No, video and subtitles are separate files.
|
| I've never seen this bad behavior myself. Do you mind sharing the client you're using?
| ManWith2Plans wrote:
| Do you have a link to video or audio for this presentation? I probably don't speak for just myself when I say I would love to see it.
| quux wrote:
| Someone else linked the video here: https://news.ycombinator.com/item?id=32520750
| w10-1 wrote:
| Thank you very much "drewg123"!
|
| Future technology advances increasingly look like this: complex work integrating hardware, OS fixes, and team collaboration.
| People and teams and companies working together, contributing to shared resources like FreeBSD, tolerating mistakes at scale, giving credit where credit is due, and all the other things that make respect real, which creates a space to get things done.
|
| Most of us will never get close to these opportunities or contexts, but it still helps us advance our own technique/culture to observe and model your story. And perhaps you'll help new collaborators find you. All the best.
| Bluecobra wrote:
| What kind of tuning is done in the BIOS? Is that profile available for everyone to view? Are you using a custom BIOS from Dell?
| drewg123 wrote:
| Not much tuning needed to be done. The little that was is mentioned in the talk, and was basically to set NPS=1 and to disable DLWM, in order to be able to access the full xGMI interconnect bandwidth at all times, even when the CPU is not heavily loaded.
| nh2 wrote:
| You mention AIO in nginx.
|
| In 2021 somebody submitted a patch for io_uring support in nginx:
|
| https://mailman.nginx.org/pipermail/nginx-devel/2021-Februar...
|
| I'm not sure if there has been further progress on it so far. In one comment the feedback was "it doesn't seem to make the typical nginx use case much faster" [at that time].
|
| But I find this interesting, because io_uring can make async almost all of the things that can't be done asynchronously on Linux so far (open(), stat(), etc.), and thus in nginx.
|
| Would io_uring integration in nginx be relevant for you?
| [deleted]
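For context on nh2's question, this is the submission/completion pattern io_uring brings, sketched with liburing on Linux (illustrative only: the path is made up, and a real server would keep many operations in flight per ring instead of waiting synchronously):

    /* Queue one asynchronous read and reap its completion. */
    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        static char buf[4096];

        if (io_uring_queue_init(8, &ring, 0) < 0)
            return 1;
        int fd = open("/var/www/segment.m4s", O_RDONLY);
        if (fd < 0)
            return 1;

        sqe = io_uring_get_sqe(&ring);    /* grab a submission slot */
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);           /* never blocks on the disk */

        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            printf("read %d bytes\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }

The part relevant to nginx is that IORING_OP_OPENAT and IORING_OP_STATX requests can be queued the same way, so the open()/stat() calls that remain blocking under POSIX AIO can also be made asynchronous.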
| throw0101c wrote:
| You're using the ConnectX-6 Dx here. Any technical reason for that particular NIC, or have you just not gotten around to the ConnectX-7 yet?
|
| Have you examined other NIC vendors? (Chelsio?)
| drewg123 wrote:
| This talk was given roughly 4 months ago. The CX7 was not available yet. I'm looking forward to testing on them when we get some.
|
| We looked at Chelsio (as the T6 was available well before the CX6-DX). However, the CX6-DX offers a killer feature not available on the T6. The CX6-DX can remember the crypto state of any in-order stream, while the T6 cannot. That means that the TCP stack can send, say, 4K of a TLS record, wait for acks, come back 40ms later and send the next 4K, _and DMA just the requested 4K from the host_. The T6 cannot remember the state, and would need to DMA the first 4K (which was already sent) in order to re-establish the crypto state, and then DMA the requested 4K. This could run the PCIe bus out of bandwidth. The alternative is to make TCP always chunk sends at the TLS record size, but this was horrible for streaming quality.
| phantomathkg wrote:
| > Serve only static media files
|
| This part I don't get. How about DRM? Unless Netflix pre-DRMs all content for all users?
| Bluecobra wrote:
| I would think that the media files are already encrypted and get decrypted by the Netflix client. Otherwise the DRM could easily be defeated by using something like Wireshark.
| drewg123 wrote:
| Yes, all our content is also DRMed. Else somebody could easily pirate content.
| onedr0p wrote:
| To be fair, it already seems easily pirated. DRM is useless; if content can be viewed on some personal device, it can be ripped and shared. I'd be curious how much effort/money companies dump into DRM measures; it seems like a lost cause. Maybe it just makes the execs sleep better at night.
| bri3d wrote:
| Encrypting assets on the fly using a per-consumer symmetric key would be prohibitively expensive, so I'm sure the media is stored pre-encrypted using a shared symmetric key.
|
| It only really matters that this key is unique per package, not per user, because once even a single user can compromise the trusted execution environment and extract either the key or the plain video stream, that piece of content is pirated anyway. So key reuse against the same content probably isn't a major part of the threat model: this attacker could share the key with others, but they might as well share the decrypted content instead.
| sophacles wrote:
| Every iteration of this prezzo I've seen over the years has made for a fascinating morning read, thanks!
|
| As much as I enjoy the results of the work, I'm always a bit curious how the sausage is made. Is pushing the hardware limits your primary job or something you do periodically? How do you go about selecting the gear you use? How much do you work with the vendors? (etc.) I'd really enjoy a behind-the-scenes blog post or something wrt serving absurd amounts of traffic from a single box.
| drewg123 wrote:
| My role is to make our CDN servers more efficient. One of the easiest and most fun ways to do that is to push servers as hard as I can and see what breaks and what doesn't scale. I also work with our hardware team and their vendors to evaluate new hardware and how it can fit into our system.
|
| But I do plenty of other things as well, including fixing random kernel bugs. You can read the git log of the FreeBSD main branch to see some of the things I've been working on.
| gopaz wrote:
| What about the storage? Is it using RAID? Does block size matter? What filesystem is used?
| drewg123 wrote:
| Every storage device is independent (no RAID), and runs UFS. We use UFS because, unlike ZFS, it integrates directly with the kernel page cache.
| pyrolistical wrote:
| When are you going to cut the CPU/main memory out completely?
|
| The bottleneck is at your NIC anyway, so it seems like there would be a market for a NIC that can read directly from disk into the NIC's working memory.
| drewg123 wrote:
| We've looked at this. The problem is that NICs want to read in TCP-MSS-sized chunks (1448 bytes, for example), while storage devices are highly optimized for block-aligned (4K) chunks. So you need to buffer the storage reads someplace, and for now the only practical answer is host memory. There are NVMe technologies that could help, but they are either too small or come at too large of a price premium. CXL memory looks promising, but it's not ready yet.
| Matthias247 wrote:
| Does it? I thought with segmentation offloads the NIC basically gets TCP stream data in more or less arbitrary sizes, and then segments it into MTU-sized pieces on its own?
| drewg123 wrote:
| We do fairly sophisticated TCP pacing, which requires sending down some small multiple of the MSS to the NIC, so it doesn't always have the freedom to pull 4K at a time.
| [deleted]
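A rough worst-case illustration (my framing, not a figure from the talk) of why the MSS/block mismatch forces buffering in host memory: if the NIC pulled each paced, single-MSS send directly from block-aligned storage, a 1448-byte segment could cost a full 4 KiB block read,

    \[ \frac{4096\ \text{B}}{1448\ \text{B}} \approx 2.8 \]

nearly tripling disk and PCIe traffic relative to the bytes actually sent.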
| bri3d wrote:
| At what point does it make sense to replace the CPU and OS with custom hardware and software? At this point the CPU is basically doing TCP state maintenance and DMA supervision, but not much else, right?
|
| I totally get the cost, convenience, and supply-chain-risk value in commodity stuff that you can just go out and buy, but once you're bound to a single network card this advantage starts to go away, and it seems like you're fighting with the entire system topology when it comes to NUMA, no? Why not a "TCP file send accelerator" instead of a whole computer?
| wmf wrote:
| I suppose you could attach NVMe drives directly to BlueField and cut out x86.
| jalino23 wrote:
| I was specifically looking for what their tech stack is for playback. They pretty much have to use HLS for iOS Safari, right? Where do those manifest servers fit in? What about non-iOS browser playback?
| wly_cdgr wrote:
| alpb wrote:
| What's the benefit of going from 100Gb/s to 800Gb/s through kernel/hardware optimizations, as opposed to adding more machines to meet the same throughput in this case? I'd be curious at what point the returns on the engineering effort start diminishing in this problem.
| seabrookmx wrote:
| IIRC a lot of these boxes are deployed at actual ISPs so they're closer to customers. I'd imagine the rack space is therefore limited, and the more you can push from a single machine, the better.
| quotehelp1829 wrote:
| I think it's quite obvious: instead of 8 machines you then only need 1. This results in reduced costs for machinery, storage (as each machine would have its own storage) and probably power consumption too. Also, the same room of servers can push 8 times more content.
|
| Edit: Whoops, apparently this tab has been open for four hours and of course someone has already responded to you, lol.
| wistlo wrote:
| This could be an answer as to why Netflix comes up reliably when all the other streaming services in my experience (Hulu, Disney, HBO Max, Amazon Prime) can take many times longer to initialize and deliver a stable stream.
| drewg123 wrote:
| To be honest, this has much more to do with Randall Stewart's RACK TCP, and his team's obsession with improving our members' QoE. Ironically, this costs a lot of CPU compared to normal TCP (since it is doing pacing, limiting TSO burst sizes, etc). https://github.com/freebsd/freebsd-src/blob/main/sys/netinet...
| OJFord wrote:
| Of those I only have Prime, and I really agree. It was never as good, I don't think, but lately in particular it's been _so_ slow to start (and then it's an advert! It'll do it again for the actual content once I click 'skip'!) and occasionally pauses to buffer mid-stream.
|
| I don't get that with Netflix. I've occasionally had it crash out with 'sorry, this could not be played right now' (which is a weird bug in itself, because it always loads fast & fine when I immediately press play on it again), but never such slow loading or pausing.
| _gabe_ wrote:
| This is incredible. I really like how you're able to trace the evolution of the systems as well.
|
| It makes me wonder what the next hardware revolution will be. It seems like most resource-intensive applications are bottlenecking on memory transfers. UE5's Nanite tech hinges on the ability to transfer memory directly from disk to GPU, Netflix built a system around specific hardware to avoid copying memory between userspace and the NIC, and I wonder how much other performance we're missing out on because we can't transfer memory fast enough.
|
| How much faster could AI training be if we could get memory directly from disk to the GPU and avoid the CPU orchestrating it all? What about video streaming?
| I have a feeling these processes already use some clever tricks to avoid unnecessary trips through the CPU, but it will be interesting to see which direction hardware goes with this in mind.
| aftbit wrote:
| This is definitely the direction that things are going. In the GPU space, see things like GPUDirect[1]. In networking and storage, especially for hyperscale stuff, see the rise of DPUs[2] replacing CPUs.
|
| 1: https://developer.nvidia.com/gpudirect
|
| 2: https://www.servethehome.com/what-is-a-dpu-a-data-processing...
___________________________________________________________________
(page generated 2022-08-19 23:00 UTC)