[HN Gopher] What happens when you update your DNS ___________________________________________________________________ What happens when you update your DNS Author : kiyanwang Score : 83 points Date : 2020-06-22 06:15 UTC (16 hours ago) (HTM) web link (jvns.ca) (TXT) w3m dump (jvns.ca) | jrockway wrote: | This reminds me that I wish DNS had some way to define a load | balancing algorithm for clients to use, so browsers could make | load balancing decisions. This would eliminate the need for | virtual IP addresses, having to pass originating subnet | information up recursive queries, having to remove faulty VIPs | (or hosts) from DNS, etc. | | It is baffling to me that inside the datacenter, I can control | the balancing strategy for every service-to-service transaction, | but for the end user's browser, all I can do is some L3 hacks to | make two routers appear as one (for failover purposes). L3 | balancing would be completely unnecessary if I could just program | the user agent to go to the right host, after all. The end result | is unnecessary cost and complexity multiplied over a billion | websites. | m3047 wrote: | > [...] I wish DNS had some way to define a load balancing | algorithm for clients to use, so browsers could make load | balancing decisions. | | There's actually the germ of an interesting idea in that | statement. If I'm going to go to the trouble, let's say, of | running a local TCP forwarder (good for the whole device), can | I run a packet sniffer at the same time and watch netflows and | edit the responses I return to the device based on what I see | performance-wise concerning those flows? | | Expert me says that web sites are loaded with too much cruft | and since the far end terminations are spread far and wide, | there's not enough opportunity to apply that learning in any | practical sense. But I could be wrong. 
| (https://github.com/m3047/shodohflo)
| LinuxBender wrote:
| That is a use case for SRV records [1]; however, it was not
| accepted into the HTTP protocol specification. I bring it up
| every time there is a new protocol version, but I am too lazy to
| write an RFC addendum for it and hope that someone else will.
| Existing protocols may not be modified in this manner once
| ratified. Maybe HTTP/4.0? /s
|
| Some applications use SRV records for load balancing. Many VoIP
| and video conferencing apps do this. There is a better list on
| Wikipedia. The record format is:
|     _service._proto.name. TTL class SRV priority weight port target
|
| [1] - https://en.wikipedia.org/wiki/SRV_record
| jrockway wrote:
| Yeah, I always liked SRV records. It seems that they proved
| inadequate for gRPC balancing, so there are new experiments
| in progress (mostly xDS).
| 1996 wrote:
| How would it be better than round-robin DNS with a low TTL?
| jrockway wrote:
| Basically, it affords you the ability to cache for longer and
| still end up with users able to reach your website.
|
| Right now, you can try resolving common hosts, and you will
| see that they often provide several addresses in response to a
| lookup. What the browser does with those IPs is up to the
| browser; the standard does not define what to do. What the
| administrator that sets up that record wants is "send to
| whichever one of these seems healthy", and some browsers do
| do that. Other browsers just pick one at random and report
| failure, so your redundancy makes the system more likely to
| break.
|
| What I want is a way to define what to do in this case. Maybe
| you want to try them all in parallel and pick the first to
| respond (at the TCP connection level). Maybe you want to try
| them sequentially. Maybe you want to open a connection to all
| of them and send 1/n requests to each. Right now, there is no
| way to know what the service intends, so the browser has to
| guess. And each one guesses differently.
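The "try them sequentially" option could look something like this sketch in Python (function name and defaults are illustrative; it leans on the OS stub resolver via `socket.getaddrinfo` rather than raw DNS):

```python
import socket

def connect_first_working(host, port, timeout=3.0):
    """Try each address the stub resolver returns, in order, and
    return the first socket that connects.

    A sketch of the "try them sequentially" strategy described
    above; real browsers each implement their own variant.
    """
    last_err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout)
        try:
            s.connect(addr)
            return s
        except OSError as e:
            last_err = e
            s.close()
    raise last_err or OSError("no addresses returned for %s" % host)
```

Note that this only moves the guess into one place: nothing in the DNS answer itself told us sequential fallback was what the operator wanted.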
|
| (You will notice that people like Google and Cloudflare
| skillfully respond with only one record with a 5 minute TTL.
| That is so the behavior of the browser is well defined, but
| it also eats their entire year of 99.999% uptime with one bad
| reply. Your systems had better be very reliable if DNS issues
| can eat a year's worth of error budget.)
| aeden wrote:
| FWIW, there is an IETF draft that may be suitable for
| addressing this: https://datatracker.ietf.org/doc/draft-ietf-
| dnsop-svcb-https...
| jrockway wrote:
| Aha, this sounds like exactly what I was looking for!
| m3047 wrote:
| DNS has its own load balancing at several levels (and of
| several different kinds):
|
| Nameserver records (NS records) used to locate a resource are
| served by other nameservers. NS records are chosen from among
| those offered in response to a query (RRs), and should all be
| tried if necessary to elicit a response. The algorithm isn't
| strictly specified, and some nameservers will shuffle the order
| in which they return RRs in their answers; some won't, assuming
| the stub resolver or app will do it. The foregoing also applies
| to A and AAAA records (returning IP addresses for names), and
| this has long been used as a quick and easy form of load
| balancing/failover, except that it doesn't really fail over very
| well unless your app is coded to try all of the different
| answers (and the stub resolver returns them to your app).
|
| Nameservers querying other nameservers (caching/recursive
| resolvers) are supposed to compile metrics on response times
| when they make upstream requests and pick the fastest upstreams
| once they learn them.
|
| Stub resolvers (running on your device) typically query
| nameservers in the order you specified them in your network
| config, but not always.
|
| From the foregoing, you can probably see that running a
| caching/recursive resolver close to your devices is supposed to
| be desirable, by design.
|
| So far, so far.
;-)
|
| As specified, and it's never been changed, DNS tries UDP first.
| "Ok", you think, "that must mean it will try TCP", but that's
| not actually true: it only tries TCP if it receives a UDP
| response with TC=1 (flagged as truncated). But if there's a UDP
| frag and it doesn't get all the frags, or it never gets a UDP
| response at all, it /never/ tries TCP.
|
| You're mixing two very different environments above: 1) a
| datacenter with (let's just assume) VPCs and 2) a web browser.
|
| In case #2 I'll match your ante and raise you an overloaded
| segment which is dropping UDP packets, in which case stuff may
| fail to resolve at all. Oh look, I drew a wildcard:
| traditionally browsers have utilized the device's stub resolver,
| but since they've pushed ahead with DoH they've had to
| implement their own. People think I'm a DNS expert (what do
| they know?) and I guess the conventional wisdom amongst myself
| and my peers is that UDP should perform better than TCP, but
| anecdotally people are claiming that DoH and DoT perform better
| for them than their stub resolver. "Must be your ISP messing
| with you" says someone; "yeah right, that's gotta be it". Me:
| "did you try running your own local resolver?" Them: "wuut?"
|
| So here's where I confess that the experts aren't always right,
| because I run my own local resolver and I have the same
| problem: when the streaming media devices are running, DNS
| resolution on the wifi-connected laptop sucks, and if I run a
| TCP forwarder it starts working!
| (https://github.com/m3047/tcp_only_forwarder)
|
| Now to case #1, the datacenter. I hope you're running your own
| authoritative and caching server, and you should read about
| views in your server config guide; using EDNS to pass subnet
| info is a kludge.
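The TC-bit rule m3047 describes is visible right in the DNS wire format. A minimal sketch, hand-rolled with the Python stdlib (not a real resolver; field values follow RFC 1035): a classic client sends this packet over UDP and retries over TCP only when the response flags carry TC=1.

```python
import random
import struct

def build_query(name, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record)."""
    header = struct.pack(
        ">HHHHHH",
        random.randint(0, 0xFFFF),  # query ID
        0x0100,                     # flags: RD=1 (recursion desired)
        1, 0, 0, 0)                 # 1 question, no other sections
    qname = b"".join(
        bytes([len(label)]) + label.encode()
        for label in name.rstrip(".").split("."))
    return header + qname + b"\x00" + struct.pack(">HH", qtype, 1)

def is_truncated(response):
    """True if the server set TC=1 in the flags word -- the only
    case in which a classic resolver retries the query over TCP."""
    flags = struct.unpack(">H", response[2:4])[0]
    return bool(flags & 0x0200)
```

If the UDP response never arrives (dropped fragments, filtered port), `is_truncated` never runs at all, which is exactly the failure mode described above.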
| If you're writing datacenter apps, you should consider doing
| your own resolution and using TCP (try the forwarder, I dare
| you), and provisioning accordingly (because DNS servers assume
| most requests will come in via UDP).
|
| If you want load balancing "you know, like nginx" I've got news
| for you: BIND comes with instructions for configuring nginx as
| a reverse TCP proxy. Oh! Looks like I've got a straight in a
| single suit: nginx provides SSL termination, so I've got DoT
| for free!
| jrockway wrote:
| I am not really talking about load balancing the DNS traffic;
| I'm talking about interpreting the response of the DNS query.
| (The reliability at the network level seems to be handled by
| moving everything to DNS-over-HTTPS or something, and is a
| debate for another day.)
|
| For example, consider the case where you resolve
| ycombinator.com. You get:
|
|     ycombinator.com.  59  IN  A  13.225.214.21
|     ycombinator.com.  59  IN  A  13.225.214.51
|     ycombinator.com.  59  IN  A  13.225.214.81
|     ycombinator.com.  59  IN  A  13.225.214.73
|
| Which of those hosts should I open a TCP connection to in order
| to begin speaking TLS/ALPN/HTTP2? The standard doesn't say. I
| would like a standard that says what to do. (The more
| interesting case is, say I pick 13.225.214.21 at random. It
| doesn't respond. What do I do now? Tell the user
| ycombinator.com is down? Try another one? All of this could
| be defined by a standard ;)
| JoshMcguigan wrote:
| DNS infrastructure is really interesting. I did a bit of a deep
| dive on it a few months ago, culminating in running my own
| authoritative name servers [0] for a while.
|
| [0]: https://www.joshmcguigan.com/blog/run-your-own-dns-servers/
| rhizome wrote:
| One neat way of retaining that control is running your own
| SOA(s), but getting robust secondaries and listing _those_ in
| WHOIS so that they take all of the wild queries. Then you just
| work with your little SOA, and everything propagates as
| necessary, and you don't get hammered.
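jrockway's "try them all in parallel, pick the first to respond" option from upthread can be sketched as a connection race (illustrative Python; real clients implement Happy Eyeballs, RFC 8305, which staggers attempts rather than launching them all at once):

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def connect_fastest(addresses, timeout=3.0):
    """Race TCP connections to every (host, port) pair; keep the
    first that succeeds and close the slower winners.

    A sketch of the parallel strategy discussed above, assuming a
    non-empty address list.
    """
    def attempt(addr):
        return addr, socket.create_connection(addr, timeout=timeout)

    errors = []
    winner = None
    with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
        futures = [pool.submit(attempt, a) for a in addresses]
        for fut in as_completed(futures):
            try:
                addr, sock = fut.result()
            except OSError as e:
                errors.append(e)
                continue
            if winner is None:
                winner = (addr, sock)
            else:
                sock.close()  # a slower connection also completed
    if winner:
        return winner
    raise errors[0] if errors else OSError("no addresses given")
```

The cost of this strategy is extra connection load on every backend, which is one reason a standard signal from the operator ("race these" vs. "fall back in order") would be useful.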
| ricardo81 wrote:
| Recursive DNS servers can also throw you off the scent a bit by
| giving you an answer that is not the same as the authoritative
| server's.
|
| I've seen 8.8.8.8 return something other than NXDOMAIN for some
| domains that do not exist.
|
| Cloudflare will not honour DNS ANY requests.
|
| Knowing how to query the authoritative nameservers is a handy
| tool for debugging.
| eat_veggies wrote:
| One of my favorite revelations about the network tracing tools
| (things like `traceroute` and `dig +trace`), which might not be
| obvious for people like me who work higher up in the stack, is
| that the data they provide isn't usually made available during
| "normal" usage. Packets don't just phone home and tell you where
| they've been. Something else is going on.
|
| When you send a DNS query to a recursive server like your ISP's
| or something like 1.1.1.1, you make a single DNS query and get
| back a single response, because the recursive DNS server handles
| all the different queries that Julia outlines in the post. As
| the client, we have no idea what steps just happened in the
| background.
|
| But when you run `dig +trace`, dig is actually _pretending to be
| a recursive name server_, and making all those queries _itself_
| instead of letting the real recursive name servers do their
| work. It's a fun hack, but that means it's not always 100%
| accurate to what's going on in the real world [0]
|
| [0] https://serverfault.com/questions/482913/is-dig-trace-
| always...
| nijave wrote:
| Yup, and to complicate matters more, those resolvers you're
| talking to may be talking to more caching resolvers.
|
| For a given application server it might be:
|
| - check the local DNS caching resolver
| - check the local network's caching resolver
| - if it's an internal domain, check the local authoritative
|   resolver
| - if it's a public domain, check the ISP's resolver
| - recursively resolve from there
| LogicX wrote:
| Just to add to the discussion -- "what's happening in the
| background" -- more specifically, it is your operating system's
| stub resolver.
|
| So when you ask for www.amazon.com, it ends up making multiple
| DNS lookups, as www.amazon.com is a CNAME record.
|
| Nothing about this CNAME lookup gets passed back up the stack
| to your application; you just get the end result: the IP
| address.
|
|     $ host www.amazon.com
|     www.amazon.com is an alias for tp.47cf2c8c9-frontier.amazon.com.
|     tp.47cf2c8c9-frontier.amazon.com is an alias for www.amazon.com.edgekey.net.
|     www.amazon.com.edgekey.net is an alias for e15316.e22.akamaiedge.net.
|     e15316.e22.akamaiedge.net has address 23.204.68.114
| dgl wrote:
| dig +trace takes one path; there are also tools like dnstrace
| that attempt to show all the paths:
| https://github.com/rs/dnstrace
|
| Still, there can be caches that don't quite agree, as the other
| comment mentions.
| fragmede wrote:
| In particular, one of your ISP's/their ISP's DNS servers may be
| caching a record for longer than it's supposed to and will
| return incorrect, expired data.
|
| The other possibility is different IPs being returned by a DNS
| server based on where a query is coming from, e.g. a CDN. If
| you're in location A and your ISP's DNS server is in location
| B, the CDN's DNS server may return a different IP based on
| whether the request is coming from A or B. ECS [0] is supposed
| to mitigate this, but may or may not be used.
|
| [0] https://en.wikipedia.org/wiki/EDNS_Client_Subnet
| qes wrote:
| > In particular, one of your ISP's/their ISP's DNS servers may
| > be caching a record for longer than it's supposed to and will
| > return incorrect, expired data.
|
| It's disturbing how many clients we'll see hitting an old IP
| address for 30 days after a change.
| muppetman wrote:
| Glad to see this. One of my (stupid) pet peeves is people that
| say "You have to wait for the DNS to propagate". DNS _does not_
| propagate. What you're actually waiting for is the cache TTL to
| expire, so those name servers that have cached it have to query
| for the real answer again, thus getting the newly pushed
| information. Of course, it appears exactly like it "takes time
| to propagate", which is why it's actually a pretty sound
| description of what's happening, and thus why it's a stupid pet
| peeve. Pointless rant ends.
| gerdesj wrote:
| Don't forget negative caching. Windows famously fucks up here.
| A DNS lookup these days is minute in the grand scheme of
| things, and yet Windows still insists on caching a failed
| lookup for five minutes.
|
| So you fire up cmd.exe and issue ifconfig /releasedns, ...,
| ipconfig /?, ipconfig /flushdns, and then you go back to
| pinging the bloody address instead of using nslookup, because
| you learned from another idle/jaded sysadmin to use ping as a
| shortcut to querying DNS, instead of actually querying what the
| DNS servers respond with.
|
| Obviously, a better thing to do when checking your DNS entries
| is to dig out ... dig.
|
| DNS _changes_ _do_ propagate: from the one you edited to the
| others via zone transfers and the like (primary to secondary,
| etc.) and thence to caching resolvers.
| rovr138 wrote:
| The change is propagating through the network, but it's not a
| push like most would assume based on the wording.
| HenryBemis wrote:
| Old guy here: maybe people were confusing DNS with the WINS
| service that was helping to propagate ("replicate") name/server
| changes 20 years ago?
| tialaramex wrote:
| Yes, I'm annoyed about this too.
|
| The most egregious case I've seen was an Amiga site.
| The site went down, and for several _days_ it reported that
| users would need to wait for the updated records to propagate,
| and lots of loyal fans were insisting anybody who couldn't read
| the site was just being too impatient.
|
| What was actually wrong? They wrote their new IP address as a
| DNS name in their DNS configuration rather than as an IP
| address. Once they fixed that, it began working, and they acted
| as though that was just because now it had successfully
| propagated.
|
| On the other hand, propagation _is_ a thing when it comes to
| distributing modified DNS records to multiple notionally
| authoritative DNS servers.
|
| This can be a problem when using Let's Encrypt dns-01
| challenges, for example, especially with a third-party DNS
| provider.
|
| Suppose you write a TXT record to pass dns-01 and get a
| wildcard certificate for your domain example.com. You submit
| it to your provider's weird custom API and it says OK.
| Unfortunately, when you do this, all it really did was write
| the updated TXT record to a text file on an SFTP server. Each
| of the provider's (say) three authoritative DNS servers (mango,
| lime, kiwi) checks this server every five minutes, downloads
| any updated files, and begins serving the new answers.
|
| Still, they said OK, so you call Let's Encrypt and say you're
| ready to pass the challenge. Let's Encrypt calls authoritative
| server kiwi, which has never seen this TXT record, and you fail
| the challenge.
|
| So you check DNS - your cache infrastructure calls lime, which
| has updated and gives the correct answer; it seems like
| everything is fine, so you report a bug with Let's Encrypt. But
| nothing was wrong on their side.
|
| Now, unlike typical "DNS propagation" myths, the times for
| authoritative servers are usually minutes, and can be only
| seconds for a sensible design (SFTP servers are not a sensible
| design), so you can just add a nice generous allowance of time
| and it'll usually work.
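A client-side guard against that failure mode is to poll every authoritative server yourself and only trigger validation once all of them serve the new record. A minimal sketch (the server names echo the hypothetical mango/lime/kiwi above; `lookup` is a caller-supplied stand-in for a real per-server DNS query):

```python
import time

def wait_for_record(servers, lookup, expected, timeout=300, interval=5):
    """Poll each authoritative server until all of them return the
    expected value, or until the timeout passes.

    `lookup(server)` queries one server directly and returns the
    record value it currently serves (or None). Illustrative only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        stale = [s for s in servers if lookup(s) != expected]
        if not stale:
            return True   # every authoritative server has the record
        time.sleep(interval)
    return False          # timed out; some server is still stale
```

Only after this returns True would you tell Let's Encrypt you are ready, instead of trusting the provider's premature OK.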
But clearly the Right Thing(tm) is to | have an API that actually confirms the authoritative servers | are updated before returning OK. | asciimike wrote: | https://howdns.works is one of my favorite educational booklets | on the subject. Not as in depth as many other resources, but | highly amusing and fairly sticky. | logikblok wrote: | This is brilliant thanks. ___________________________________________________________________ (page generated 2020-06-22 23:00 UTC)