[HN Gopher] Cloudflare servers don't own IPs anymore so how do t... ___________________________________________________________________ Cloudflare servers don't own IPs anymore so how do they connect to the internet? Author : jgrahamc Score : 213 points Date : 2022-11-25 14:16 UTC (8 hours ago) (HTM) web link (blog.cloudflare.com) (TXT) w3m dump (blog.cloudflare.com) | danrl wrote: | As an industry, we are bad at deprecating old protocols like | IPv4. This is a genius hack for a problem we have due to IPv6 not | being adopted widely enough so that serving legacy IP users | becomes a droppable liability to the business. The ROI is still | high enough for us to "innovate" here. I applaud the solution but | mourn the fact that we still need this. | | I guess ingress is next, then? Two layers of Unimog to achieve | stability before TCP/TLS termination maybe. | dopylitty wrote: | I've been thinking a lot about this in my own enterprise and | I've increasingly come to the conclusion that IP itself is the | wrong abstraction for how the majority of modern networked | compute works. IPv6, as a (quite old itself) iteration on top | of IPv4 with a bunch of byzantine processes and acronyms tacked | on, is solving the wrong problem. | | Originally IP was a way to allow discrete physical computers in | different locations owned by different organizations to find | each other and exchange information autonomously. | | These days most compute actually doesn't look like that. All my | compute is in AWS. Rather than being autonomous it is | controlled by a single global control plane and uniquely | identified within that control plane. | | So when I want my services to connect to each other within AWS | why am I still dealing with these complex routing algorithms | and obtuse numbering schemes? | | AWS knows exactly which physical hosts my processes are running | on and could at a control plane level connect them directly.
| And I, as someone running a business, could focus on the higher | level problem of 'service X is allowed to connect to service Y' | rather than figuring out how to send IP packets across | subnets/TGWs and where to configure which ports in NACLs and | security groups to allow the connection. | | Similarly my ISP knows exactly where Amazon and CloudFlare's | nearest front doors are so instead of 15 hops and DNS | resolutions my laptop could just make a request to Service X on | AWS. My ISP could drop the message in AWS' nearest front door | and AWS could figure out how to drop the message on the right | host however they want to. | | I know there's a lot of legacy cruft and also that there are | benefits of the autonomous/decentralized model vs central | control for the internet as a whole but given the centralized | reality we're in, especially within the enterprise, I think | it's worth reevaluating how we approach networking and whether | the continuing focus on IP is the best use of our time. | ec109685 wrote: | The IP addresses you see as an AWS customer aren't the same | used to route packets between hosts. That said, there's a | huge amount of commodity infrastructure built up that | understands IP addresses and routing layers, so unless a new | scheme offers tremendous benefits, it won't get adoption. | | At least from a security perspective though, IP ACLs are | giving way to service-based identities, which is a | good thing. | | You can see how AWS internally does networking here: | https://m.youtube.com/watch?v=ii5XWpcYYnI | wpietri wrote: | > my laptop could just make a request to Service X on AWS | | I was looking for the "just" that handwaves away the | complexity and I was not disappointed. | | How do you imagine your laptop expressing a request in a way | that it makes it through to the right machine? Doing a | traceroute to amazon.com, I count 26 devices between me and | it.
How will those devices know which physical connection to | pass the request over? Remember that some of them will be | handling absurd amounts of traffic, so your scheme will need | to work with custom silicon for routing as well as doing ok | on the $40 Linksys home unit. What are you imagining that | would be so much more efficient that it's worth the enormous | switching costs? | | I also have questions about your notion of "centralization". | Are you saying that Google, Microsoft, and other cloud | vendors should just... give up and hand their business to | AWS? Is that also true for anybody who does hosting, | including me running a server at home? If so, I invite you to | read up on the history of antitrust law, as there are good | reasons to avoid a small number of people having total | control over key economic sectors. | dopylitty wrote: | > How do you imagine your laptop expressing a request in a | way that it makes it through to the right machine? Doing a | traceroute to amazon.com, I count 26 devices between me and | it. How will those devices know which physical connection | to pass the request over? | | That's my whole point. You're thinking of it from an IP | perspective where there are individual devices in some | chain and they all need to autonomously figure out a path | from my laptop to AWS. The reality is every device between | me and AWS is owned by my ISP. They know exactly which | physical path ahead of time will get a message from my | laptop to AWS. So why waste all the time on the IP | abstraction? | | > I also have questions about your notion of | "centralization". Are you saying that Google, Microsoft, | and other cloud vendors should just... give up and hand | their business to AWS? | | AWS is just an example. Realistically a huge amount of | traffic on the internet is going to 6 places and my ISP | already has direct physical connections to those places. 
| Maintaining this complex and byzantine abstraction to | figure out how to get a message from my laptop to compute | in those companies' infrastructure should not be necessary. | | And in general the more important part is within AWS' (or | Microsoft's or enterprise X's) network why waste time on IP | when the network owner knows exactly which host every | compute process is running on? | | Instead of thinking of an enterprise network as a set of | autonomous hosts that need to figure out a path between | each other, think of it as a set of processes running on the | same OS (the virtual infrastructure). Linux doesn't need to | do BGP to figure out how to connect two processes so why | does your network? | scarmig wrote: | > The reality is every device between me and AWS is owned | by my ISP. They know exactly which physical path ahead of | time will get a message from my laptop to AWS. | | None of these are true. | akira2501 wrote: | > Rather than being autonomous it is controlled by a single | global control plane and uniquely identified within that | control plane. | | By default, sure. You can easily bring your own IPs into AWS | and use them instead, and I don't think it's hard to imagine | the pertinent use cases and risk management this brings. | mike256 wrote: | Wouldn't it be better if all those big CDNs just switched off | IPv4 and forced the sleeping ISPs to enable IPv6? Maybe we should | introduce some IPv6-only days as a first step... | subarctic wrote: | Pretty interesting article. TLDR: they're now using anycast for | egress, not just ingress. | | Each data center has a single IP for each country code (so that | they can make outgoing requests that are geolocated in any | country). In order to achieve that, they have a /24 or larger | range for each country, and announce it from all their data | centers, and then they route the traffic over their backbone to | the appropriate data center for that IP.
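The arrangement summarized above — the same prefix announced from every data center via anycast, with each /32 egress IP "owned" by one site and return traffic steered there over the backbone — can be sketched as a simple lookup. All prefixes and site names below are illustrative assumptions, not Cloudflare's actual layout:

```python
import ipaddress

# Sketch: every data center announces the whole /24 via anycast, but
# each /32 egress IP belongs to exactly one colo. A return packet that
# lands at the "wrong" colo is forwarded over the backbone to the owner.

ANYCAST_PREFIX = ipaddress.ip_network("203.0.113.0/24")

# /32 -> owning data center; this table is identical fleet-wide.
OWNER = {
    ipaddress.ip_address("203.0.113.10"): "fra",  # Frankfurt (made up)
    ipaddress.ip_address("203.0.113.20"): "sin",  # Singapore (made up)
}

def handle_return_packet(dst_ip: str, local_colo: str) -> str:
    dst = ipaddress.ip_address(dst_ip)
    if dst not in ANYCAST_PREFIX:
        return "not ours: forward normally"
    owner = OWNER.get(dst)
    if owner == local_colo:
        return "deliver locally"
    return f"tunnel over backbone to {owner}"

# A reply for Singapore's egress IP that lands in Frankfurt gets tunneled.
print(handle_return_packet("203.0.113.20", "fra"))  # tunnel over backbone to sin
```

The point of the sketch is that the decision is stateless: nothing about individual connections is stored, only the static IP-to-colo ownership map.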
| | Then in the data center, they share the single IP across all | their servers by giving each server a range of TCP/UDP port space | (instead of doing stateful NAT). | ec109685 wrote: | It's not a single IP address per data center. Otherwise they'd | only be able to make 64k simultaneous egress connections, nor | would their scheme of different IP addresses per "geo" and | product work. | Terretta wrote: | I quite like what CloudFlare has done here. | | There's a fourth way to resolve this, that works for the core use | case, is less engineering, and was in production 20 years ago, | but I can't fit it within the margins of this comment box. | | // CF's approach has additional feature advantages though. | [deleted] | dfawcus wrote: | What they describe sounds a lot like a distributed static RSIP | scheme. | | https://en.wikipedia.org/wiki/Realm-Specific_IP | | With port ranges, rather than being 'leased', allocated | statically per server within a locale. | | So the IP goes to the locale, and the port range is the static | RSIP assignment to the server within that locale. | martinohansen wrote: | Am I missing something here or did they just reinvent a NAT | gateway with static rules? | | I understand that they started using anycast for the egress IPs | as well, but that's unrelated to the NAT problem. | [deleted] | xg15 wrote: | > _However, while anycast works well in the ingress direction, it | can't operate on egress. Establishing an outgoing connection | from an anycast IP won't work. Consider the response packet. It's | likely to be routed back to a wrong place - a data center | geographically closest to the sender, not necessarily the source | data center!_ | | Slightly OT question, but why wouldn't this be a problem with | ingress, too? | | E.g. suppose I want to send a request to https://1.2.3.4. What I | don't know is that 1.2.3.4 is an anycast address. | | So my client sends a SYN packet to 1.2.3.4:443 to open the | connection.
The packet is routed to data center #1. The data | center duly replies with a SYN/ACK packet, which my client | answers with an ACK packet. | | However, due to some bad luck, the ACK packet is routed to data | center #2 which is also a destination for the anycast address. | | Of course, data center #2 doesn't know anything about my | connection, so it just drops the ACK or replies with a RST. In | the best case, I can eventually resend my ACK and reach the right | data center (with multi-second delay), in the worst case, the | connection setup will fail. | | Why does this not happen on ingress, but is a problem for egress? | | Even if the handshake uses SYN cookies and got through on data | center #2, what would keep subsequent packets that I send on that | connection from being routed to random data centers that don't | know anything about the connection? | matsur wrote: | This is a problem in theory. In practice (and through | experience) we see very little routing instability in the way | you describe. | xg15 wrote: | You mean, it's just luck? | Brian_K_White wrote: | right? also seems like load should or at least could be | changing all the time. geo or hops proximity is really the | only things that decide a route? not load also? | | But although I would be surprised if load were not also | part of the route picker, I would also be surprised if the | routers didn't have some association or state tracking to | actively ensure related packets get the same route. | | But I guess this is saying exactly that, that it's relying | on luck and happenstance. | | It may be doing the job well enough that not enough people | complain, but I wouldn't be proud of it myself. | remram wrote: | Anycast is implemented by BGP and doesn't take load into | account in any way. You will reach the closest location | announcing that address (well, prefix). | ignoramous wrote: | TFA claims that _Anycast_ is an advantage when dealing | with DDoS because it helps spread the load? 
A regional | DDoS (where it consistently hits a small set of DCs) is | not a common scenario, I guess? | csande17 wrote: | Basically yes. Large-scale DDoS attacks rely on | compromising random servers and devices, either directly | with malware or indirectly with reflection attacks. Those | hosts aren't all going to be located in the same place. | | An attacker could choose to only compromise devices | located near a particular data center, but that would | really reduce the amount of traffic they could generate, | and also other data centers would stay online and serve | requests from users in other places. | toast0 wrote: | Your intuition is more or less all wrong here, sorry. | | Most routers with multiple viable paths pass way too much | traffic to do state tracking of individual flows. Most | typically, the default metric is BGP path length: for a | given destination, send packets through the route that has the | most specific prefix; if there's a tie, use the route | that transits the fewest networks to get there; if | there's still a tie, use the route that has been up the | longest (which maybe counts as state tracking). Routing | like this doesn't take into account any sort of load | metric, although people managing the routers might do | traffic engineering to try to avoid overloaded routes | (but it's difficult to see what's overloaded a few hops | beyond your own router). | | For the most part, an anycast operation is going to work | best if all sites can handle all the foreseeable load, | because it's easy to move all the traffic, but it's not | easy to only move some. Everything you can do to try to | move some traffic is likely to either not be effective or | move too much. | richieartoul wrote: | Why shouldn't they be proud of a massive system like | Cloudflare that works extremely well? As a commenter | below described, it's not luck or happenstance, it's a | natural consequence of how BGP works. Seems pretty | elegant to me.
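The tie-breaking order toast0 describes (most specific prefix, then shortest AS path, then oldest route) can be sketched as a single comparison. This is a simplified model, not a real BGP implementation — actual best-path selection has more steps (local-pref, MED, etc.) — and the routes below are made-up examples:

```python
import ipaddress

def best_route(dest, routes):
    """Pick a route for `dest`.

    routes: list of (prefix_str, as_path_len, learned_at_timestamp).
    Order: 1) longest (most specific) matching prefix wins,
           2) ties broken by shortest AS path,
           3) remaining ties broken by oldest (earliest-learned) route.
    """
    dest = ipaddress.ip_address(dest)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[1], r[2]),
    )

routes = [
    ("1.2.0.0/16", 2, 100),  # shorter AS path, but less specific
    ("1.2.3.0/24", 4, 200),  # more specific prefix wins despite longer path
    ("1.2.3.0/24", 4, 150),  # same prefix and path length, learned earlier
]
print(best_route("1.2.3.4", routes))  # -> ('1.2.3.0/24', 4, 150)
```

Note there is no load term anywhere in the key function — which is the thread's point: anycast "stability" falls out of deterministic route selection, not luck.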
| rizky05 wrote: | [deleted] | tonyb wrote: | It works because the route to 1.2.3.4 is relatively stable. The | routes would only change and end up at data center #2 if data | center #1 stopped announcing the routes. In that case the | connection would just re-negotiate to data center #2. | xg15 wrote: | Ah, ok, that makes sense. So for a given point of origin, | anycast generally routes to the same server? | majke wrote: | Correct. From a single place, you're likely to BGP-reach | one Cloudflare location, and it doesn't change often. | ratorx wrote: | As others have mentioned, this is not often a problem because | routing is normally fairly stable (at least compared to the | lifetime of a typical connection). For longer lived connections | (e.g. video uploads), it's more of a problem. | | Also, there are a fair number of ASes that attempt to load | balance traffic between multiple peering points, without | hashing (or only using the src/dst address and not the port). | This will also cause the problem you described. | | In practice it's possible to handle this by keeping track of | where the connections for an IP address typically ingress and | sending packets there instead of handling them locally. Again, | since it's a few ASes that cause problems for typical | connections, it's also possible to figure out which IP prefixes | experience the most instability and only turn on this overlay | for them. | grogers wrote: | Yep, it can happen that your packet gets routed to a different | DC from a prior packet. But the routers in between the client | and the anycast destination will do the same thing if the | environment is the same. So to get routed to a new location, | you would usually need either: | | * A new (usually closer) DC comes online. That will probably be | your destination from now on. | | * The prior DC (or a critical link on the path to it) goes | down. | | The crucial thing is that the client will typically be routed | to the closest destination to it.
In the egress case the | current DC may not be the closest DC to the server it is trying | to reach so the return traffic would go to the wrong place. | This system of identifying a server with unique IP/port(s) | means that CF's network can forward the return traffic to the | correct place. | ignoramous wrote: | Yes, as others have mentioned, route flapping is a problem. | But, in practice, not as big a problem as DNS-based routing. | | - See: https://news.ycombinator.com/item?id=10636547 | | - And: https://news.ycombinator.com/item?id=17904663 | | Besides, SCTP / QUIC aware load balancers (or proxies) are | detached from IPs and should continue to hum along just fine | regardless of which server IP the packet ends up at. | Thorentis wrote: | The fact that we haven't yet adopted IPv6 tells me that IPv6 | isn't actually that great of a solution. We need an Internet | Protocol that solves modern problems and that has a good | migration path. | wpietri wrote: | 40% of Google's traffic comes via IPv6. Up from 1% a decade | ago. https://www.google.com/intl/en/ipv6/statistics.html | | If you think you can do better than that, I look forward to | hearing your plan. Personally, I think that's huge progress. | eastdakota wrote: | Fun fact: the first product we announced to celebrate | Cloudflare's launch day anniversary was an IPv4<->IPv6 gateway: | | https://blog.cloudflare.com/introducing-cloudflares-automati... | | The success of that convinced us we should do something to | improve the Internet every year to celebrate our "birthday." | Over time we ended up with more than one product that met those | criteria and timing, so it went from a day of celebration to a | week. That became our Birthday Week. Then we saw how well | bundling a set of announcements into a week worked, so we decided | to do it at other times of the year. And that's how Cloudflare | Innovation Weeks got started, explicitly with us delivering | IPv6 support back in 2011.
| growse wrote: | You need an IPv4 src address to connect out to an IPv4 origin. | zekica wrote: | Where do they say that they haven't adopted IPv6? All their | offerings support IPv6. | inopinatus wrote: | TLDR: Cloudflare is using five bits from the port number as a | subnetting & routing scheme, with optional content policy | semantics, for hosts behind anycast addressing and inside their | network boundary. | Ptchd wrote: | If you don't need an IP to be connected to the internet, sign me | up... I think they are full of it though... Even if you only have | one IP.... you still have an IP | | > PING cloudflare.com (104.16.133.229) 56(84) bytes of data. | | > 64 bytes from 104.16.133.229 (104.16.133.229): icmp_seq=1 | ttl=52 time=10.6 ms | | With a ping like this, you know that I am not using Musk's | Internet.... | cesarb wrote: | All this wonderful complexity, just because a few servers insist | on behaving as if the location of the IP address and the location | of the user should always match. | jesuspiece wrote: | ronnier wrote: | Spammers are exploiting cloudflare by creating thousands of new | domains on the free tld (like .ml) and hosting the sites behind | cloudflare and spamming social media apps with links to scam | dating sites. CPA scammers. | | If anyone from CF sees this, I can work with you and give you | data on this. I'm dealing with this at one of the large social | media companies. | | Here's an example, this is NSFW - https://atragcara.ga | elorant wrote: | So why aren't social media platforms blocking the domains? | ronnier wrote: | We do. But with free TLD's, spammers and scammers can create | an unlimited number of new domains at zero cost. That's the | problem. They can send a single spam URL to a single person | and scale that out, each person gets a unique domain and URL. | elorant wrote: | So how about blocking the users then? Or limit their | ability to post links. | ronnier wrote: | That's done too. 
But it's not just a few, it's literally | 10s of thousands of individuals from places like | Bangladesh who do this as their source of income. They | are smart, on real devices, will solve any puzzle you | throw at them, and will adapt to any blocks or locking. | It's not an easy problem to solve which is why no | platform has solved it (oddly, spam is pretty much | nonexistent on HN) | elorant wrote: | I don't think there's any benefit in spamming HN. There | aren't that many users in here, and it could lead to a | backlash considering the technical expertise of most people. | gnfargbl wrote: | OK, but why don't you block Freenom domains entirely? | | Apart from perhaps a couple of sites like gob.gq, there's | essentially nothing of any value on those TLDs. Allow-list | the handful of good sites, if you must, and default block | the rest. | ronnier wrote: | I could. But we are talking about one of the world's | largest social media platforms used by hundreds of | millions of people daily. There's legit websites hosted | on these free domains and I don't want to kill those | along with the scam sites. I've mostly got the scam sites | blocked at this point though. Just took me a week or so | to adapt. | gnfargbl wrote: | > There's legit websites hosted on these free domains | | Are there though, really? Can you give some examples? | | To a first approximation, I contend that essentially | everything on Freenom is bad. There are maybe a _handful_ | of good sites (the one I listed, https://koulouba.ml/, | etc) but you can find those on Google in a few minutes | with some _site:_ searches. | | I commend your efforts in blocking the scam sites, but | also honestly believe that it would be better for you, | your customers and the internet at large to default block | Freenom. Freenom sites are junk, wherever they are | hosted. | ronnier wrote: | Here's NSFW scam sites behind CF that use free TLDs. I | could post 10s of thousands of these.
| | * https://atragcara.ga | | * https://donaga.tk | | * https://snatemhatzemerbedc.tk | gnfargbl wrote: | Yep, I know. I monitor these as they appear in | Certificate Transparency logs and DNS PTR records. | | Freenom TLDs are just junk. Save yourself the hassle and | default block :-). | ronnier wrote: | Seems these sites should be blocked on CF, at the root, | rather than at all the leaf node apps. It's pretty easy for me to | automate it at my company. Seems CF could? | sschueller wrote: | Same goes for DDoS attacks. I am not sure how they do it but we | get hit by CF IPs with synfloods etc. | gnfargbl wrote: | Anyone can set the source IP on their packets to be anything. | I can send you TCP SYNs which are apparently from Cloudflare. | | There was a proposal (BCP38) which said that networks should | not allow outbound packets with source IPs which could not | originate from that network, but it didn't really get a lot | of traction -- mainly due to BGP multihoming, I think. | toast0 wrote: | BCP38 has gotten some traction, but it's not super | effective until all the major tier-1 ISPs enforce it | against their customers. But it's hard to pressure tier-1 | ISPs; you can't drop connections with them, because they're | too useful, and anyway, if you did, the traffic would just flow | through another tier-1 ISP, because it's not really | realistic for tier-1s to prefix filter peerings between | themselves. Anyway, the customer that's spoofing could be | spoofing sources their ISP legitimately handles, and | there's a lot of those. | | Some tier-1s do follow BCP38 though, so one day maybe? | Still, there's plenty of abuse to be done without spoofing, | so while it would be an improvement, it wouldn't usher in | an era of no abuse. | slothsarecool wrote: | You do not get attacked from Cloudflare with TCP attacks. | Somebody is spoofing the IP header and making it seem like | Cloudflare is DDoSing you.
| | The only way for somebody to DDoS from Cloudflare would be | using Workers; however, this isn't practical as Workers have | a very limited IP range. | fncivivue7 wrote: | cmeacham98 wrote: | The reason people do this, by the way, is because it's | common if you're hosting via CF to whitelist their IPs and | block the rest. This allows their SYN flood to bypass that. | [deleted] | uvdn7 wrote: | This is a wonderful article. Thanks for sharing. As always, | Cloudflare blog posts do not disappoint. | | It's very interesting that they are essentially treating IP | addresses as "data". Once you look at the problem from a | distributed-system lens, the solution here can be mapped to | distributed systems almost perfectly. | | - Replicating a piece of data on every host in the fleet is | expensive, but fast and reliable. The compromise is usually to | keep one replica in a region; same as how they share a single /32 | IP address in a region. | | - "sending datagram to IP X" is no different than "fetching data | X from a distributed system". This is essentially the underlying | philosophy of the soft-unicast. Just like data lives in a | distributed system/cloud, you no longer know where an IP | address is located. | | It's ingenious. | | They said they don't like stateful NAT, which is understandable. | But the load balancer has to be stateful still to perform the | routing correctly. It would be an interesting follow-up blog post | talking about how they coordinate port/data movements (moving a | port from server A to server B), as it's state management (not | very different from moving data in a distributed system again). | remram wrote: | I have a lot of trouble mapping your comment to the content of | the article. It is about the _egress addresses_, the ones | CloudFlare uses as source when fetching from origin servers.
| Those addresses need to be separated by the region of the end- | user ("eyeball"/browser) and the CloudFlare service they are | using (CDN or WARP). | | The cost they are working around is the cost of IPv4 addresses, | versus the combinatorial explosion in their allocation scheme | (they need number of services * number of regions * whatever | dimension they add next, because IP addresses are nothing like | data). | | I am not sure where you see data replication in this scheme? | uvdn7 wrote: | It's not meant to be a perfect analogy. The replication | analogy is mostly talking about the tradeoff between | performance and cost. So it's less about "replicating" the ip | addresses (which is not happening). On that front, maybe | distribution would be a better term. Instead of storing a | single piece of data on a single host (unicast), they are | distributing it to a set of hosts. | | Overall, it seems like they are treating ip addresses as data | essentially, which becomes most obvious when they talk about | soft-unicast. | | Anyway, I just found it interesting to look at this through | this lens. | majke wrote: | "Overall, it seems like they are treating ip addresses as | data essentially" | | Spot on! | | In past: | | * /24 per datacenter (BGP), /32 per server (local network) | (all 64K ports) | | New: | | * /24 per continent (group of colos), /32 per colo, port- | slice per server | | This is totally hierarchical. All we did is build a tech to | change the "assignment granularity". Now with this tech we | can do... anything we want. We're not tied to BGP, or IP's | belonging to servers, or adjacent IP's needing to be | nearby. | | The cost is the memory cost of global topology. We don't | want a global shared-state NAT (each 2 or 4-tuple being | replicated globally on all servers). We don't want zero- | state (a machine knowing nothing about routing, just BGP | does the job). We want to select a reasonable mix. Right | now it's /32 per datacenter.... 
but we can change it if we | want and be more, or less, specific than that. | superkuh wrote: | Yikes. More cloudflare breakage of the internet model. Pretty | soon we might as well all just live within cloudflare's WAN | entirely. | eastdakota wrote: | ¯\\_(ツ)_/¯ | | Another perspective is that the connection of an IP to specific | content or individuals was a bug of the Internet's original | design and thankfully we're finally finding ways to | disassociate them. | AlphaSite wrote: | The internet's a set of abstractions; as long as they still | implement some common protocols and don't create a walled | garden, is there any real social or technical issue with them | doing unusual things in their network? | | I can totally see an argument against their CDN being too | pervasive and problematic for TOR users, but this seems fine | IMO. | wrs wrote: | What's breaking the internet model is the internet becoming too | popular and running out of addresses. There's nothing specific | to Cloudflare here. You're free to do the same thing to | conserve your own address space. It's sort of a super-fancy | NAT. | majke wrote: | Author here, I know this is a dismissive comment, but I'll bite | anyway. | | As far as I understand the history of the IP protocol, | initially an IP address pointed to a host. (/etc/hosts file | seems that way) | | Then it was realized a single entity might have multiple | network interfaces, and an IP started to point to a network | card on a host. (a host can have many IP's). Then all the VRF, | dummy devices, tuntaps, VETH and containers. I guess an IP is | now pointing to a container or VM. But there is more. For | performance you can (almost should!) have a unique IP address | per NUMA node. Or even logical CPU. | | On the modern internet a server IP points to a single CPU on a | container in a VM on a host. | | Then consider Anycast, like 1.1.1.1 or 8.8.8.8. An IP means | something else... it means a resource.
| | On the "client" side we have customer NATs, CG NATs and | VPNs. An IP means similarly little. | | The IP's are really expensive, so in some cases there is a | strong advantage to save them. Take a look at | https://blog.cloudflare.com/addressing-agility/ | | "So, test we did. From a /20 address set, to a /24 and then, | from June 2021, to an address set of one /32, and equivalently | a /128 (Ao1). It doesn't just work. It really works" | | We're able to serve "all cloudflare" from /32. | | There is this whole trend of getting denser and denser IP | usage. It's not avoidable. It's not "breaking the Internet" in | any way more than "NATs are breaking the Internet". The | network evolves, because it has to. And for one, I don't think | this is inherently bad. | superkuh wrote: | >It's not avoidable. It's not "breaking the Internet" in any | way more than "NATs are breaking the Internet". | | I agree. NATs, particularly the Carrier NAT that smartphone | users are behind, have broken the internet. It's made it so | most people do not have ports and cannot participate in the | internet. So now software developers cannot write software | that uses the internet (without depending on third parties). | This is bad. So is what you've done. | | Someday ipv6 will save us. | remram wrote: | TLDR: | | > To avoid geofencing issues, we need to choose specific egress | addresses tagged with an appropriate country, depending on WARP | user location. (...) Instead of having one or two egress IP | addresses for each server, now we require dozens, and IPv4 | addresses aren't cheap. | | > Instead of assigning one /32 IPv4 address for each server, we | devised a method of assigning a /32 IP per data center, and then | sharing it among physical servers (...) splitting an egress IP | across servers by a port range. | majke wrote: | Ha, I guess this is one way of summarizing it :) Author here. I | wanted to share more subtleties of the design, but maybe I | failed.
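The port-splitting idea in the TLDR above can be sketched as a stateless lookup: given the destination port of a return packet arriving at the shared egress IP, arithmetic alone tells you which server owns that port slice, with no per-connection NAT table. The slice size and reserved range here are illustrative assumptions, not Cloudflare's actual numbers:

```python
# Sketch of "soft-unicast": one egress IP per data center, with the
# 16-bit port space statically sliced across servers. Return traffic
# is routed by arithmetic on the destination port -- no NAT state.

SLICE_SIZE = 2048   # ports per server -> 31 slices per IP (illustrative)
RESERVED = 2048     # skip the low/system ports 0-2047 (illustrative)

def server_for_port(port: int) -> int:
    """Which server owns this destination port on the shared egress IP?"""
    if not RESERVED <= port <= 65535:
        raise ValueError("port outside the shared egress range")
    return (port - RESERVED) // SLICE_SIZE

def port_range_for_server(server: int) -> range:
    """The egress source ports a given server may use."""
    start = RESERVED + server * SLICE_SIZE
    return range(start, start + SLICE_SIZE)

# A connection from server 3 egresses with a source port in its slice;
# the reply's destination port maps straight back to server 3.
print(port_range_for_server(3))   # range(8192, 10240)
print(server_for_port(9000))      # 3
```

Because the mapping is pure arithmetic over a static assignment, any router in the data center can compute it, which is what lets the design avoid a shared, replicated NAT table.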
| | Indeed, the starting point is sharing IP's across servers with | port-ranges. | | But there is more: | | * awesome performance allowed by anycast. | | * ability to route /32 instead of /24 per datacenter. | | Generally, with this tech we can have _much_ better IP usage | density, without sacrificing reliability or performance. You | can call it "global anycast-based stateless NAT" but that | often implies some magic router configuration, which we don't | have. | | Here's one example of problems we run into - the lack of | connectx() syscall on Linux - makes it hard to actually select | port range to originate connections from: | | https://blog.cloudflare.com/how-to-stop-running-out-of-ephem... | chatmasta wrote: | I was surprised IPv6 was only briefly mentioned! Is that | something you're looking at next, or are you already running | an IPv6 egress network? | | Of course not every destination is an IPv6 host, so IPv4 | remains necessary, but at least IPv6 can avoid the need for | port slicing, since you can encode the same bucketing | information in the IP address itself. | | I've seen this idea used as a cool trick [0] to implement a | SOCKS proxy that randomizes outbound IPv6 address to be | within a publicly routed prefix for the host (commonly a | /64). | | I guess as long as you need to support IPv4, then port | slicing is a requirement and IPv6 won't confer much benefit. | (Maybe it could help alleviate port exhaustion if IPv6 | addresses can use dynamic ports from any slice?) | | Either way, thanks for the blog post, I enjoyed it! | | [0] https://github.com/blacklanternsecurity/TREVORproxy | miyuru wrote: | I was also interested to know how this was handled for | IPv6, but it was only briefly mentioned. | | Probably they didn't need to do much work with IPv6, since | half of the post is solving IPv4 exhaustion problems. | chriscappuccio wrote: | Cloudflare wants to make money. The IPv6 features can | come second as v6 usage increases. 
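The connectx() gap majke mentions above — plain connect() on Linux won't pick a source port from a caller-chosen range — is commonly worked around with bind()-before-connect: bind the socket to an explicit source port from your slice, then connect. This is a rough sketch under that assumption, with simplified error handling; the addresses and port range are made up:

```python
import errno
import socket

def connect_from_slice(dst, src_ip, port_range):
    """Open a TCP connection whose source port falls inside port_range.

    Works around the lack of a connectx()-style syscall by trying
    bind()-before-connect for each port in the slice, skipping ports
    that are already in use.
    """
    for port in port_range:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((src_ip, port))   # pin the egress source IP and port
            s.connect(dst)
            return s
        except OSError as e:
            s.close()
            if e.errno != errno.EADDRINUSE:
                raise
    raise OSError("port slice exhausted")
```

Usage would look like `connect_from_slice(("203.0.113.1", 443), "198.51.100.7", range(8192, 10240))`. The linked Cloudflare post on ephemeral ports goes into the real-world refinements (and the pitfalls of this naive retry loop) in much more depth.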
| dknecht wrote: | All of Cloudflare's services ship with IPv6 from day 1. IPv6 is not | an issue, as we have enough IPv6 space for each machine to have | its own IPs. | pencilcode wrote: | Is the geofencing country level only? So if, using warp, I | use trip advisor and go and see nearby restaurants it will | have no idea of what city I'm in? Guessing that's not so but | wondering how it works | aeyes wrote: | This blog post has some info: | https://blog.cloudflare.com/geoexit-improving-warp-user- | expe... | | Warp uses its own set of egress IPs and their geolocation | is close to your real location. | remram wrote: | From your article it seemed that your use of anycast was more | accident than feature, due to the limit of BGP prefix sizes. | If you could route those IPs to their correct destination, | you would; you only go to the closest data center and route | again because you have no choice. | | Maybe this ends up reducing cost on customers though, because | the international transit happens in your backbone network | rather than on the internet (customer-side). | elp wrote: | In English: they now do carrier-grade NAT. | cm2187 wrote: | well, vanilla NAT really. | [deleted] | immibis wrote: | This is a horrible way to avoid upgrading the world to IPv6. | xnyan wrote: | The industry will not transition to v6 unless: 1) The cost of | not doing so is higher than the cost of sticking with v4. | Because of all the numerous clever tricks and products designed | to mitigate v4's limitations, the cost argument still favors v4 | for most people in most situations. | | or | | 2) We admit that v6 needs to be rethought and rethink it. I | understand why v6 does not just increase IP address bits from | 32 to 128, but at this point I think everyone has admitted that | v6 is simply too difficult for most IT departments to | implement. In particular, the complexity of the new assignment | schemes like prefix delegation and SLAAC needs to be pared | back.
Offer a minimum set | of features and spin off everything | else. | Animats wrote: | I'm surprised that Cloudflare isn't all IPv6 when Cloudflare is | the client. That would solve their address problems. Maybe | charge more if your servers can't talk IPv6. Or require it for | the free tier. | | It's useful that they use client-side certificates. (They call | this "authenticated origin pull", but it seems to be client- | side certs.) | ec109685 wrote: | They also have to egress to third-party servers since they are | a CDN and support things like serverless functions. ___________________________________________________________________ (page generated 2022-11-25 23:00 UTC)