[HN Gopher] Pingora, the proxy that connects Cloudflare to the I...
       ___________________________________________________________________
        
       Pingora, the proxy that connects Cloudflare to the Internet
        
       Author : HieronymusBosch
       Score  : 271 points
       Date   : 2022-09-14 13:11 UTC (9 hours ago)
        
 (HTM) web link (blog.cloudflare.com)
 (TXT) w3m dump (blog.cloudflare.com)
        
       | totallyunknown wrote:
       | We did the same. We've replaced nginx/lua with a cache server
       | (for video) written in Golang - now serving up to 100 Gbit/s per
       | node. It's more CPU and memory efficient and completely tailored
       | to our needs. We are happy that we moved away from nginx.
        
         | BonoboIO wrote:
         | Wow ... 100 Gbit/s. Where do you work? That's some serious
         | traffic.
        
           | totallyunknown wrote:
            | A German company building an app for watching linear TV.
            | Netflix is actually serving 400 Gbit/s per node and already
            | has 800 Gbit/s ready.
           | 
           | I think we can scale our setup up to 200 Gbit/s but we are
           | too small. Total traffic is ~2 Tbit/s.
           | 
            | The most challenging part is the missing support for
            | QUIC/HTTP3 and kTLS in Golang. The 100G NIC supply chain
            | is also difficult. We use NVIDIA ConnectX-6, but it's
            | impossible to get a version with TLS offloading.
        
             | lossolo wrote:
              | Interesting, do you do a lot of processing in Golang, or
              | do you basically just use it as a wrapper around
              | sendfile [1]?
             | 
             | 1. https://man7.org/linux/man-pages/man2/sendfile.2.html
        
             | BonoboIO wrote:
              | It's on the tip of my tongue which company that is.
              | 
              | I think it starts with a Wa ... you don't have to say. I
              | vaguely remember stumbling on a Twitter engineering IPv6
              | tweet. Maybe I'm wrong.
              | 
              | For me it's impressive to get so much data through a
              | computer. But I have one question: what counts as a node?
              | Is a node one machine with dual sockets, a lot of RAM and
              | a lot of NICs, or is it multiple machines combined that
              | act as one node, like a whole 19-inch rack?
        
           | trillic wrote:
           | 100 Gbit/s is only like 3000 concurrent viewers at 5000
           | KiB/s.
        
             | totallyunknown wrote:
             | 100 Gbit/s / 5000 KiB/s is 20000.
        
               | ImprobableTruth wrote:
                | KiB = Kibi _byte_, not Kibi _bit_
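                | 
                | Working through the arithmetic with bytes vs. bits:
                | 5000 KiB/s = 5000 x 1024 x 8 bit/s, about 41 Mbit/s,
                | so 100 Gbit/s supports roughly 2,400 viewers. The
                | 20,000 figure assumes ~5 Mbit/s per viewer, i.e.
                | kilobits rather than kibibytes.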
        
             | mplewis wrote:
             | "Only" :)
        
       | marune wrote:
       | In the 3rd party section, no mention of HAProxy as a candidate,
       | any specific reason for that?
        
       | VWWHFSfQ wrote:
       | Should have waited to post this until it was actually ready to be
       | open sourced. Otherwise this is just kinda like "huh, neat"
       | without anything else to do with it.
        
         | qwertox wrote:
         | In some cases it can be enough to know that it could be worth
         | waiting for the release instead of putting more resources into
         | a stack you're currently using. You might replace it entirely
         | in a few months if the release turns out to be a product which
         | you can and want to switch to, so it's ok to get a heads-up.
        
           | nextaccountic wrote:
           | Unfortunately, without being able to run the code yourself or
           | at least seeing a benchmark, it's hard to commit to
           | unreleased code like this
        
       | nicoburns wrote:
       | > When crashes do occur an engineer needs to spend time to
       | diagnose how it happened and what caused it. Since Pingora's
       | inception we've served a few hundred trillion requests and have
       | yet to crash due to our service code.
       | 
       | > In fact, Pingora crashes are so rare we usually find unrelated
       | issues when we do encounter one. Recently we discovered a kernel
       | bug soon after our service started crashing. We've also
        | discovered hardware issues on a few machines; in the past,
        | ruling out rare memory bugs caused by our software, even after
        | significant debugging, was nearly impossible.
       | 
        | That's quite the endorsement of Rust. A lot of people focus on
        | the fact that Rust can't absolutely guarantee freedom from
        | crashes and memory safety issues, which I think misses the point
        | that this kind of experience of running high-traffic Rust
        | services in production for months almost without a single issue
        | is _common_ in practice.
        
         | brink wrote:
         | I had the same experience when I wrote a camera capture /
         | motion detection / video logging service for a commercial smart
         | refrigerator in Rust. The Swift component crashed at least
         | twice a week, the Rust component ran for months and months
         | without issue.
        
         | drogus wrote:
         | I had a very similar experience. Much smaller scale, but the
         | service was keeping internal state and clients were connecting
         | with a WebSocket. It could handle up to a million clients on
         | one server and it practically never crashed. While I was
         | writing it I had only hobby-level experience with Rust and I
         | was also mentoring a colleague, so he wrote a big chunk of code
         | as a total Rust noob.
        
           | victor106 wrote:
           | Is this using Async Rust?
        
           | Aperocky wrote:
           | Curses! now I need to learn yet another language!
        
             | nicoburns wrote:
             | Luckily Rust is also fun :)
        
         | rkagerer wrote:
         | Which aspect(s) of Rust do you think are most responsible for
         | this? (e.g. borrow checker, memory safety, culture that
         | attracts devs who care about reliability, etc)
        
           | gpm wrote:
           | Not the person you're asking, but the culture around data
           | representation strikes me as the biggest factor:
           | 
            | 1. Only making valid states representable, to the greatest
            | extent reasonably possible.
            | 
            | 2. Treating errors as regular data, not an afterthought -
            | with language features to make that not too painful.
           | 
           | 3. Not returning placeholder values (i.e. if you get back a
           | parsed value from a parse function, it means it parsed
           | correctly, not that either there's an error somewhere else
           | _or_ it parsed correctly).
           | 
            | Language features, in particular "enums" (aka algebraic data
            | types, aka tagged unions), make this approach possible. You
            | couldn't do it in Go, for instance, even if there were a
            | cultural decision to.
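            | 
            | A minimal sketch of that style (the names here are
            | illustrative, not from any real codebase):
            | 
            |     enum Unit { Celsius, Fahrenheit }
            | 
            |     enum ParseError { Empty, BadNumber, UnknownUnit }
            | 
            |     struct Temperature { value: f64, unit: Unit }
            | 
            |     // A returned Temperature is always a valid one;
            |     // there is no placeholder value (like -1) that a
            |     // caller could accidentally use on the error path.
            |     fn parse_temp(s: &str)
            |         -> Result<Temperature, ParseError> {
            |         let s = s.trim();
            |         if s.is_empty() {
            |             return Err(ParseError::Empty);
            |         }
            |         // Assumes an ASCII unit suffix, e.g. "21.5C".
            |         let (num, unit) = s.split_at(s.len() - 1);
            |         let value = num.trim().parse()
            |             .map_err(|_| ParseError::BadNumber)?;
            |         let unit = match unit {
            |             "C" => Unit::Celsius,
            |             "F" => Unit::Fahrenheit,
            |             _ => return Err(ParseError::UnknownUnit),
            |         };
            |         Ok(Temperature { value, unit })
            |     }
            | 
            | The caller has to match on the Result, so a value that
            | failed to parse can't silently flow downstream.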
        
           | pornel wrote:
           | It is of course a combination of all these aspects.
           | 
           | A type system that can express thread safety (Send/Sync
           | traits) is incredibly valuable when building multi-threaded
           | systems.
           | 
            | A universal definition of what is safe, plus standard traits
            | and borrowing rules, makes APIs more predictable. Just from a
            | function's signature you know a lot about its behavior,
            | without having to look for gotchas in the manual.
           | 
           | Mandatory error handling prevents cutting corners. Unit
           | testing is built-in.
           | 
           | Generics, good inlining, and Cargo help split code into
           | libraries without a performance or usability hit, which helps
           | make focused, well-tested components.
           | 
            | Most of these things aren't groundbreaking, but Rust, being
            | new, had the luxury of picking current best practices and
            | sensible defaults.
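            | 
            | As a tiny, hypothetical illustration of what the Send/Sync
            | checks catch at compile time:
            | 
            |     use std::sync::Arc;
            |     use std::thread;
            | 
            |     fn main() {
            |         // An std::rc::Rc is not Send, so moving one into
            |         // another thread is a compile error:
            |         // let rc = std::rc::Rc::new(5);
            |         // thread::spawn(move || println!("{}", rc));
            | 
            |         // Arc is Send + Sync, so this version is accepted
            |         // and runs.
            |         let arc = Arc::new(5);
            |         thread::spawn(move || println!("{}", arc))
            |             .join()
            |             .unwrap();
            |     }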
        
           | nicoburns wrote:
           | A few things:
           | 
           | - I think memory safety is a baseline. You'll note that
           | memory safe languages already tend to be much more reliable
           | than non-memory-safe languages in general.
           | 
           | - Then you have the error handling. A lot of unreliability in
           | my code in other languages comes from unhandled exceptions
           | that only occur rarely. Rust generally puts all possible
           | error conditions in the type signature of the function.
           | Meaning it's actually feasible to handle every failure case.
           | 
           | - Speaking of unhandled exceptions, a lot of those in typed
           | languages tend to be caused by null. Rust does not have null.
           | Instead it has Option, and it is impossible to access the
           | contents of an option without doing the equivalent of a null
           | check. So that entire class of errors is gone.
           | 
           | - Both Result (used for error handling) and Option (used
           | instead of null) are what Rust calls enums, and what are more
           | generally called Sum Types. I think these are a huge deal.
           | They allow you to safely represent data that may be one thing
           | or another with very strict type checking. These are broadly
           | very useful in API design, and in my experience lead to much
           | more robust code than the class hierarchies you need in OOP
           | languages or unions which lack the safety checks. (Aside: sum
           | types would be quite simple to add to other languages. I have
           | no idea why they haven't been added yet).
           | 
           | - Speaking of classes, inheritance is not supported. So
           | that's a bunch of confusing code that just isn't possible to
           | write. This can add a bit of boilerplate to Rust code, but it
           | makes it more straightforward and less bug prone.
           | 
           | - You mentioned the borrow checker. That definitely helps.
           | It's yet another tool that allows you to write APIs that
           | cannot be misused. A great example would be Rust's Mutex
           | type. It can prove at compile time that code does not hold on
           | to references to the protected data beyond the duration that
           | the lock is held.
           | 
            | - Speaking of Mutex, Rust's Send and Sync traits provide
            | _very_ good thread safety. You almost don't need to worry
            | about thread safety at all in Rust. Most concurrency bugs are
            | prevented by the compiler (though you can still do things
            | like deadlock or write logical race conditions).
           | 
            | - Newtypes allow you to check invariants once and then have
            | the type system enforce that they remain satisfied.
           | 
           | - All type casts are explicit.
           | 
           | - Lots of other little things
           | 
            | One final thing that I think is often overlooked: Rust is
            | strict, and all of these checks apply not only to the code
            | you write, but to all of your dependencies. That means that
            | Rust libraries tend to be much more reliable than libraries
            | from other ecosystems. That is probably partly because of a
            | culture of reliability, but it's also because the language
            | itself makes it hard to write sloppy code. And when the code
            | you are building on is likely to be reliable, it becomes
            | both less effort and more worthwhile to make your own code
            | reliable (including for library authors), leading to a
            | virtuous circle of reliable code.
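            | 
            | For the Mutex point, a small illustrative sketch of how
            | the guard ties the borrow to the lock:
            | 
            |     use std::sync::Mutex;
            | 
            |     fn main() {
            |         let counter = Mutex::new(0u32);
            |         {
            |             // The data is only reachable through the
            |             // guard returned by lock().
            |             let mut guard = counter.lock().unwrap();
            |             *guard += 1;
            |         } // Guard dropped: lock released, borrow ends.
            | 
            |         // A reference obtained through the guard cannot
            |         // outlive it, so "use after unlock" is a compile
            |         // error rather than a data race at runtime.
            |         println!("{}", *counter.lock().unwrap());
            |     }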
        
         | jhgg wrote:
          | We had the same experience at work deploying Rust services
          | that serve many billions of requests a day as well.
        
           | petr_tik wrote:
           | does your company have any public information about this?
           | Blogs, job descriptions with numbers, twitter threads?
        
             | remram wrote:
             | Not GP but I believe this is it:
             | https://discord.com/category/engineering
        
       | swlkr wrote:
        | Wow, this is just what I was looking for: a proxy written in a
        | memory-safe language like Rust, with no GC, as an alternative to
        | nginx. Looking forward to the open source version!
        
       | kronololo wrote:
       | What are HTTP status codes greater than 599 used for in practice?
       | 
       | It'd be interesting to see another Cloudflare blog post that just
       | goes into detail on the weird protocol behaviour they've had to
       | work around over the years. I imagine they have more insight into
       | this than pretty much any other organisation on the planet.
        
         | jenny91 wrote:
         | Presumably custom statuses for app-to-app traffic or someone's
         | weird API, etc.
        
       | TimTheTinker wrote:
       | Did you guys consider HAProxy? I've only ever heard good things
       | about it - particularly stability (though it probably can't beat
       | Rust), performance, and configurability.
        
       | noncoml wrote:
       | Really curious, are they using async/await?
        
         | jhgg wrote:
         | They mentioned they were using tokio, so naturally, yes.
        
         | jgrahamc wrote:
         | I see quite a bit of                   async fn
         | 
         | and                   .await();
         | 
         | in the source code. What did you want to know?
        
           | nemothekid wrote:
           | .await();
           | 
           | Hmm does the code compile at all?
        
       | rowin wrote:
       | Was Go considered as the language to write Pingora in? If so, why
       | was Rust chosen?
        
         | TheFlyingFish wrote:
         | Not from Cloudflare, but at a guess:
         | 
         | * They already have some pretty deep Rust experience on staff
         | 
         | * They were already dissatisfied with the performance penalty
         | from Lua's GC, so Go's GC was presumably unattractive as well
         | 
         | * Rust is worth more internet points than Go (just kidding,
         | mostly)
        
       | AtNightWeCode wrote:
       | Sounds good. I never encountered any performance issues with
       | Cloudflare.
       | 
       | If you have the time for enhancement, then:
       | 
       | 1. Option to hit the cache before workers. (Why we never use
       | workers).
       | 
       | 2. Rules for blocking traffic during nights (time-based rules).
       | 
        | 3. Make sure every product is a drop-in replacement. If you
        | offer the same thing as a cloud provider, don't make us write a
        | lot of custom code.
        
         | ehPReth wrote:
         | Could you expand on 3?
        
           | AtNightWeCode wrote:
           | Why would you swap Azure blob storage or S3 for anything in
           | Cloudflare if it comes with running custom code in workers?
        
         | zwily wrote:
         | I agree with #1... Workers before the cache is crazy powerful
         | for the original purpose of Workers (modifying incoming
         | requests). But now that people are starting to use Workers as
          | their origin (for Remix, etc.) it would be nice to be able to
         | have the cache before Workers. As it is right now, having the
         | CDN do full content caching of rendered Remix pages is
         | difficult.
        
         | RL_Quine wrote:
         | When is night on the internet?
        
           | AtNightWeCode wrote:
            | Before the introduction of edge services it was not uncommon
            | to block logins on sites outside working hours or during the
            | night, or at least to do some rate-limiting. We see many
            | brute-force attempts during the night. Most sites are not
            | global.
        
           | yamtaddle wrote:
           | Most--maybe damn near all--sites see significant dips in
           | traffic for at least a few hours a day. Which part of the
            | day depends on the site. More often than not, it's while the
           | team is asleep and staffing, if any, is at its lowest point,
           | since teams tend to live roughly in the same ~half of the
           | world that their products are most-used in. Plus there's
           | practically no-one in the Pacific until you reach Japan, and
           | not a ton of "Western" sites see much use in Asia, and vice
           | versa, with a few notable exceptions.
           | 
           | It's not unusual for e.g. ecommerce sites to crank up
           | automated fraud prevention "at night" because staffing is so
           | much lower.
           | 
           | TL;DR Most sites' usage patterns exhibit a pronounced
           | day/night cycle that's not too far off from natural day/night
           | cycles where the bulk of the team lives.
        
             | karambahh wrote:
             | Fraud prevention or even straight up order blocking between
             | 1:30 & 4AM because the downstream order management system
             | has only so much buffer capacity
             | 
             | (it's getting rarer, but it does still happen. Fraud
             | prevention cranked up is definitely a thing on any large
             | enough ecommerce website)
        
       | Sytten wrote:
        | The post mentions tokio, but I would be curious to see if it uses
        | tower or something similar built in-house. For our product
        | (caido.io) we also built a custom HTTP parser, so if you open
        | source the tool it would be nice to split the parsing into its
        | own crate, so we have an alternative to hyper that can understand
        | malformed requests.
        
       | aliljet wrote:
       | I'm mildly blown away to read, 'And the NGINX community is not
       | very active, and development tends to be "behind closed doors".'
        | Is this a reflection of the company (nginx, now owned by F5)
        | going the way of Oracle's takeover of WebLogic in another era?
        
         | moderation wrote:
         | Dropbox wrote about their migration from NGINX to Envoy in July
         | 2020 and highlighted a lot of the same concerns [0]. As per
         | this thread [1], NGINX have posted very similar blog posts for
         | the last two years saying they are 'returning to our open
         | source roots', but without much tangible change. And the
         | Cloudflare CEO forecasted this move away from NGINX back in
         | 2018 [2]
         | 
         | 0. https://dropbox.tech/infrastructure/how-we-migrated-
         | dropbox-...
         | 
         | 1. https://news.ycombinator.com/item?id=32572153
         | 
         | 2. https://twitter.com/eastdakota/status/1024515150546493440
        
         | schmichael wrote:
         | IMHO nginx has never been a particularly "open" or friendly
         | open source project. I don't mean to sound rude. I don't think
         | open source contributors "owe" anyone anything in this regard.
         | If you want to throw code over a wall and run away, that's your
         | prerogative. However I do think Cloudflare's assessment is
         | accurate and a real liability for them.
         | 
         | Some of the OSS papercuts with nginx:
         | 
          | - nginx has always used a "submit a patch to a mailing list"
          | style of contribution. Many contributions, including my own
          | attempt a decade ago, just get ghosted:
          | https://mailman.nginx.org/pipermail/nginx-devel/2010-Decembe...
         | 
         | - Neither the contributing page
         | (http://nginx.org/en/docs/contributing_changes.html) nor the
         | Mercurial repo (http://hg.nginx.org/) redirect to HTTPS!
         | 
         | - Tests were a later addition and in a distinct repo with a
         | bespoke harness. I'm sure it has advantages, but it also takes
         | extra work for contributors to figure out.
         | 
          | - They use Trac?! I loved Trac circa 2008 but had no idea it
          | was still a thing. I can't even log in to it without it timing
          | out.
         | 
         | I don't want to nitpick an excellent project like nginx, but I
         | think it's clear that easing third party contributions has
         | never been a high priority.
        
       | sullivanmatt wrote:
       | For any of the Cloudflare team that frequents HN, curious if you
       | have an eventual plan to open-source Pingora? I recognize it may
       | stay proprietary if you consider it to be a differentiator and
       | competitive advantage, but this blog post almost has a tone of
       | "introducing this new technology!" as if it's in the cards for
       | the future.
        
         | eastdakota wrote:
         | We are planning on open sourcing it. That's mentioned in the
         | post near the end.
        
           | sullivanmatt wrote:
           | Thanks Matt, not sure how I missed that. Glad to hear it!
        
           | peterhadlaw wrote:
        
           | Dowwie wrote:
           | Do you think that it would be beneficial during analyst
           | conference calls to highlight that Cloudflare is using Rust
           | to build its next-gen critical systems? It shows a strong
           | commitment to building best-in-class technology.
        
           | latchkey wrote:
           | It is kind of weird to point out nginx doing closed door
           | development as a negative, and then do exactly the same thing
           | yourself.
        
       | xfalcox wrote:
        | I share many of the feelings towards NGINX that Cloudflare
        | mentions in this blog post. New features like 103 Early Hints
        | and HTTP/3 exist in HAProxy and Caddy, but there is nothing
        | coming in NGINX.
        
         | datalopers wrote:
         | nginx was good for a decade or two. they were acquired and
         | doomed to irrelevancy since.
        
           | qwertox wrote:
            | F5 is not to blame; they didn't change anything for the
            | worse. The Plus license is the problem, where essential
            | things like monitoring are behind a paywall. Back then this
            | wasn't so important because you basically only had Apache
            | and nginx.
            | 
            | I think I read two weeks ago that F5 was going to focus more
            | on improving the open source version. Probably because the
            | competition is getting harder and they're noticing it in a
            | market share decline, but whatever the reason is, this was
            | good to hear.
            | 
            | Also it was good to see it no longer being part of a Russian
            | company, even though the devs and owners are good people. You
            | never know how a government can enforce some problematic
            | behavior, especially one which is known for liking to throw
            | people out of high-rise windows.
        
       | fasteo wrote:
        | Great write-up!
       | 
        | Would any Cloudflarer involved in this project mind sharing some
        | basic metrics, like LOC, team size, and how long it took from
        | design to first deployment?
       | 
       | Just curious.
        
       | arberx wrote:
       | Is it open source?
        
         | jgrahamc wrote:
         | It will be. There will be a follow up blog post about the open
         | sourcing with all the gory details of how it was built and how
         | it works.
        
           | network2592 wrote:
           | It might be a couple of months. The open sourcing of
           | Cloudflare workers was announced in May and still has not
           | been released. [1]
           | 
           | [1] https://blog.cloudflare.com/workers-open-source-
           | announcement...
        
           | Dowwie wrote:
           | You may be interested in knowing that about two years ago, a
           | team of engineers at Dropbox wrote gory details about their
           | use of Rust and it was inspiring. The passion about their
           | work really came through. The team also held an AMA on
           | /r/rust that went well. See here: https://www.reddit.com/r/ru
           | st/comments/fjt4q3/rewriting_the_...
        
           | latchkey wrote:
           | I think you should remove the part about closed door
           | development as a negative for nginx given the way that this
           | has been developed.
        
       | stjohnswarts wrote:
        | Anyone else immediately do a Ctrl-F for "open source"? That's
        | all I wanted to read, but I bookmarked the article and put it on
        | my list of things to peruse later.
        
         | mcherm wrote:
         | > That's all I wanted to read
         | 
         | While I agree that in selecting a proxy to use whether it is
         | open source is one of the most important considerations, if
         | that's all you look for in this article, you may be missing
         | something.
         | 
         | I thought the article did a good job of describing how they
         | went about making the choice of whether to continue
         | contributing to an existing (nominally open source) system or
         | to build a new one. And of course, it did a good job of
         | showcasing the strengths of Rust (reliability guarantees strong
         | enough that they could identify when problems were due to
         | hardware.)
        
       | boris wrote:
        | I wonder how this is deployed to a presumably large number of
        | hosts? Do you build a distribution package out of your Rust build
        | and ship that? If so, what about the Rust standard library? I
        | believe some distributions do provide a package for the Rust
        | standard library, but that means one also has to use the packaged
        | rustc/cargo, which tends to lag behind quite a bit.
        
         | pornel wrote:
         | Yes, the distribution package is built with `cargo deb`, which
         | automatically makes a suitable binary package. It doesn't need
         | Rust in production. Rust's standard library is compiled into
          | the executable. Its size is negligible, especially with link-
          | time optimization.
        
         | nicoburns wrote:
         | Rust's standard library is statically linked. Rust binaries
         | typically only require libc (and can be compiled with musl to
         | avoid that dependency too).
        
       | pornel wrote:
       | Huge congratulations to the tokio.rs team -- the async runtime
        | has proven to work well even in such a demanding project.
        
       | tothrowaway wrote:
       | Does anyone know why nginx used separate processes for workers,
       | instead of threads? This post makes it sound like threads are the
       | way to go, but presumably nginx had a reason for using processes
       | back in the day.
        
         | vbernat wrote:
          | A share-nothing architecture was deemed more scalable, as you
          | don't need synchronization. But then you can't share stuff,
          | like a connection pool. Also, the architecture was simpler this
          | way. Nginx is also an application server, and it was "easy" to
          | develop applications on top of it because of this architecture.
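          | 
          | As a rough Rust sketch of the kind of sharing a threaded
          | design allows but separate worker processes would not
          | (types here are purely illustrative):
          | 
          |     use std::sync::Arc;
          |     use std::thread;
          | 
          |     // Hypothetical pool shared by all worker threads;
          |     // per-process workers would each need their own.
          |     struct Pool;
          | 
          |     fn main() {
          |         let pool = Arc::new(Pool);
          |         let workers: Vec<_> = (0..4)
          |             .map(|_| {
          |                 let pool = Arc::clone(&pool);
          |                 thread::spawn(move || {
          |                     // Each worker reuses the shared pool.
          |                     let _pool = pool;
          |                 })
          |             })
          |             .collect();
          |         for w in workers {
          |             w.join().unwrap();
          |         }
          |     }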
        
         | [deleted]
        
         | AgentME wrote:
         | Nginx was written in C. Multithreaded code in a language that
         | doesn't provide any safety rails is hard to get right, and so
         | is async code. They probably figured that the complexity of
         | doing both async and multithreading outweighed the benefits
         | that were predicted to be small. Rust's type system checks for
         | and prohibits many kinds of mistakes that are possible in
         | multithreaded code and in async code, so it's much easier to
         | combine them safely.
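          | 
          | A small sketch of the async-plus-threads mistake the
          | compiler catches (tokio is only mentioned in a comment;
          | the snippet itself is plain std Rust):
          | 
          |     use std::sync::Mutex;
          | 
          |     async fn bump(counter: &Mutex<u32>) {
          |         let mut guard = counter.lock().unwrap();
          |         *guard += 1;
          |         // If `guard` were still alive across an .await
          |         // here, this future would not be Send, and
          |         // spawning it onto a multi-threaded runtime
          |         // (e.g. with tokio::spawn) would be a compile
          |         // error instead of a data race.
          |     }
          | 
          |     fn main() {
          |         let counter = Mutex::new(0);
          |         // Only the compile-time property matters here,
          |         // so just construct the future.
          |         let _fut = bump(&counter);
          |     }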
        
       ___________________________________________________________________
       (page generated 2022-09-14 23:00 UTC)