[HN Gopher] What developers should know about TCP ___________________________________________________________________ Author : todsacerdoti Score : 218 points Date : 2020-05-14 10:03 UTC web link (robertovitillo.com) | citrin_ru wrote: | A very important point every developer should know: a successful | write(2) syscall doesn't guarantee that the data was received by the | remote application. TCP is described as a protocol which guarantees | packet delivery, and this is often misleading. | | A write(2) syscall that returns without an error only means that the data has | been placed in the OS kernel buffer. The kernel will then try to send | it to the remote host. If a couple of packets are lost, it's not a | problem: the kernel will retry a few times. But if power is | lost shortly after a write, the data may never hit the wire. There | is also the possibility that the network link will be broken for a long | time. The OS will retry, but only for a limited time, and then it will give | up. And the remote host can crash at any time before the remote | application actually reads the data. | | So if you need reliable delivery, you need acknowledgements at the | application protocol level, despite the fact that TCP already has | acknowledgements. | commandlinefan wrote: | When I first started working with computer networks, I just | thought of TCP/IP as "low-level stuff" and I focused instead on | the higher level stuff. After I kept running into | incomprehensible errors seemingly over and over again, I finally | broke down and picked up a copy of Richard Stevens' "TCP/IP | Illustrated". Hands down, the best investment in time I've ever | made. If you deal with distributed systems (hint, you do), you | _need_ to understand how they actually work. | c0nsumer wrote: | A lot of benefit that I add in my day job is bridging the gap | between high level folks (OS people) and what's-actually- | happening-on-the-wire.
| | While so, so, so much of this turns out not to be the network, knowing | how to look under the covers and see what's actually hitting | the wire (versus what the API call asked for) leads to far, far | faster resolution of problems. | | It's frustrating to me that so many people see this as a | mystery of "knowing networking" when it's really just basic | protocol analysis. | non-entity wrote: | Is this still a recommended book? I was looking for a good TCP/IP | reference book, but many seemed rather old. Of course, I | imagine protocols like that don't get modified too much. | travmatt wrote: | I just finished Kurose's "Computer Networking: A Top-Down | Approach" and I'd recommend it. | commandlinefan wrote: | Well, it's definitely out of date: the first edition predates | even IPv6 (and the second edition is awful, don't buy it). | Still, the way it's laid out is so well done that once you | understand how TCP/IP worked in the mid-90's, you'll easily | be able to work out how it has evolved since then on your own. | It's a shame there's no better up-to-date book, but Stevens | was one-of-a-kind. The Comer book isn't bad (but it's not | really good, either), and the Kurose & Ross book is less not | bad (and more not good), but even though both are more | modern, I'd still recommend TCP/IP Illustrated to really | understand what's going on in the network stack. | rb808 wrote: | Great, I thought I'd take a look. Three volumes, each over 1000 | pages? Any other suggestions? Did you mean all 3 books? | commandlinefan wrote: | Hehe - I did end up reading all 3, and enjoyed them all, but | I'd say I got 90% of the value from volume 1. Volume 2 walks | through the BSD implementation of TCP/IP, which is | fascinating, but way more detail than you'd ever need to | know, and volume 3 goes off into some esoteric topics that | seemed promising at the time but mostly ended up being | abandoned (along with a brief discussion of HTTP as it was | around the 90's).
| | If you're going to read it, though, find a used copy of | Stevens' original first edition, not that terrible desecrated | second edition. | Bootvis wrote: | What is wrong with the second edition? | commandlinefan wrote: | It was rewritten by a different author (the original | author, Richard Stevens, died in a car accident in the | late 90's). I guess the new guy tried his best, but he | just doesn't have the writing skill that Stevens had. | tenant wrote: | Depressing, isn't it? There are so many books that I probably | should read about almost any number of topics that, in my | work, I "touch on". | irrational wrote: | I have books like this that I purchased decades ago that | are still languishing on my bookshelf. | rb808 wrote: | Lol, I'll probably buy it and put it on my bookshelf with all | the others. | hilem wrote: | Not OP, but the first volume is the one that's cited | frequently, and I believe it's the only one in the series to have a | second edition. | kevstev wrote: | Just read the first one - it reads more like a novel than a | textbook IMHO, though I may be biased - I have always been | fascinated by networks, and when I was coming of age this was | the "high tech" of the time. I used to read RFCs for fun (I | highly recommend this as well if you want to dig a little | deeper - Jon Postel's are great reads). | | This is one of the best written textbooks, if not the best, I | have ever read. | outworlder wrote: | > I focused instead on the higher level stuff. | | That's fine. But every developer should have a basic | understanding of networking. But that can also be dangerous. | | I still have people in the company who swear you can't have | more than 65k incoming connections to a machine, because | "that's how many ports there are". Don't get me started on all | the misconceptions about TCP_TW_REUSE and TCP_TW_RECYCLE. Lengthy | discussions because apparently "TIME_WAIT is bad and uses up | ports!" (see also, 65k).
For context, these are servers with multiple clients from different source IPs. | crazygringo wrote: | > _But what about large files, such as videos? Surely there is a | latency penalty for receiving the first byte, but shouldn't it be | smooth sailing after that?_ | | So the article's (unstated) conclusion seems to be that, as long | as there isn't network congestion, it _is_ smooth sailing after | that. | | But congestion reduces bandwidth, and of course that | applies just as much to a national backbone as to the last mile. | | So I'm curious: where _does_ most packet loss occur? Is it in the last | mile, at your ISP, or along major backbones? Because that has | major implications as to whether caching video content closer to | users actually results in higher-quality video (e.g. supporting | 1080p instead of 720p) or not. | boryas wrote: | > where does most packet loss occur | | Here's an interesting paper from SIGCOMM (it won best paper at | the conference in 2018, FWIW) that attempts to figure out which | links are congested without direct access to ISP networks: | https://www.caida.org/publications/papers/2018/inferring_per... | z3t4 wrote: | I've been debugging packet loss issues lately and they all | occurred in the datacenter. Backbones and network exchanges | move so much traffic already that things like everyone | working remotely only increase traffic by a few percent, and | they have _a lot_ of overcapacity in order to handle spikes, or | when a new game is released and everyone downloads it at the | same time. | | So yes, it would really help to have more decentralisation, like | putting the content closer to the user. | jeffbee wrote: | Just enough information to be dangerous? The article attributes | behaviors of loss-based congestion control schemes like Reno and | Cubic to TCP itself. In practice, the congestion control scheme | is not really part of the protocol (there is, for example, BBR).
| There's also ECN, showing that loss is not the only way to | discover congestion. | convolvatron wrote: | The RTT discussion was a little misleading. It's true that slow | start rates are entirely dependent on RTT... but eventually the | sawtooth should reach the same steady state. | | There is work that shows that higher-RTT connections do | statistically suffer a smaller fair share, but that's a subtler, | if related, issue. Actually, I really wish the author had | shown the sawtooth. | toast0 wrote: | At some point, with increasing bandwidth and increasing RTT, | you end up with your effective bandwidth capped by receive | windows and/or send buffers. Cross-country high def video | might not be quite enough to hit that, but intercontinental | high def video would be. | | Being closer means faster initial 'slow start', but also | faster 'slow start' on congestion, which is why you get a | bigger share. | convolvatron wrote: | Sure. But that's really just a window being under the | bandwidth-delay product. The discussion makes it seem like | you suffer an outright linear performance hit. | 29athrowaway wrote: | The RFC is useful as well. | | https://tools.ietf.org/html/rfc793 | | TCP state machine diagrams can be useful too. | freefriedrice wrote: | EDIT: Sure wish I could delete this post. | | Wait, this isn't TCP, this is a protocol level above TCP, right? | TCP doesn't shape traffic by itself through rate limiting and | congestion analysis, does it? I thought the layer above it used | TCP to send/receive the buffer size, and that has nothing to do | with TCP. | | Am I wrong? | zwkrt wrote: | You are wrong! Obviously the application layer on top of TCP | could be the bottleneck, but TCP itself has mechanisms to | ensure traffic is flowing as fast and as smoothly as possible.
| Look up "TCP Flow control" and "TCP Congestion Control" | scott_s wrote: | TCP definitely does congestion control itself: | https://en.wikipedia.org/wiki/TCP_congestion_control | duxup wrote: | When I used to do networking tech support for some networking | equipment, the guys who sat next to me supported the load | balancer product. | | I swear a high percentage of their calls were questions about how | the load balancer wasn't working and was sending all the traffic to | one server, and then after some investigation we'd discover all | traffic was in fact directed to that lone server... because the | client code had the IP of that server hard-coded. A tedious | discussion would then ensue about how that is not how to do it. | | The next week? Same angry call... | | Partly that is what inspired my decision to change careers. "Man, | if these developers can't figure out basic networking, maybe I | could be a developer...?" | Matthias247 wrote: | What they mostly should know: TCP provides a bidirectional stream | of bytes on the application level. It does NOT provide a stream | of packets. | | That means whatever you pass to a send() call is not necessarily | the same amount of data the receiver will observe in a single | read() call. You might get more or fewer bytes, since the | transport layer is free to buffer and to fragment data. | | I have seen the assumption of TCP having packet boundaries on | the application level being made too often - typically in | Stack Overflow questions like: "I don't receive all data. Is my | OS/library broken?" | nicolaslem wrote: | One way to stop falling into this trap is by knowing what | happens behind the send syscall: the application is not sending | bytes down the wire, it just fills a buffer in the OS. Once in | the buffer there is no boundary between bytes from different | send calls. Same thing for receiving, in reverse.
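A minimal Python sketch of the point Matthias247 and nicolaslem make: since one recv() may return only part of what the peer passed to send() (or parts of several sends), code that needs a fixed number of bytes has to keep reading until it has them. The helper name recv_exact is hypothetical, not a standard-library function, and the demo uses a local socketpair rather than a real network connection.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket (hypothetical helper).

    A single recv() may return anywhere from 1 to n bytes, no matter how
    the peer split its send() calls, so keep reading until n bytes arrived.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # recv() returns b"" only once the peer has closed its side.
            raise ConnectionError(f"peer closed after {n - remaining} of {n} bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    import threading
    import time

    a, b = socket.socketpair()

    def writer() -> None:
        # Three separate send calls; the reader does not depend on them.
        for piece in (b"hel", b"lo wo", b"rld"):
            a.sendall(piece)
            time.sleep(0.01)

    t = threading.Thread(target=writer)
    t.start()
    print(recv_exact(b, 11))  # b'hello world'
    t.join()
    a.close()
    b.close()
```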
| anilakar wrote: | In the fall of 2016 I had a lengthy email exchange with an | industrial automation vendor who didn't understand this issue. | I even mailed them a short Python proof-of-concept snippet that | slept a few milliseconds between the write() calls and in | response got back my code "fixed" with the sleep removed. | | In between the emails I googled a bit and found the changelogs | for the RTOS they were using. Turned out that it was a bug in | the upstream HTTP server. This also meant that the platform | they were using had all the security holes from those five-plus | years. The bug was later silently fixed when they acquired a | newer release from upstream. | | Currently I'm having a similar issue with the very same vendor. | This time they don't understand why client-side authentication | means no authentication at all and why passwords must not be | stored in plain text in the database that can be remotely | backed up from the device. | irrational wrote: | Why don't you tell us the vendor's name? It seems like the | responsible thing to do. | anilakar wrote: | Even after the bug gets fixed, it'll probably take years | for all the embedded devices on the public internet to get | patched, so no. | laughinghan wrote: | But in the meantime, won't the vendor keep adding more | broken devices to the public internet, making the problem | worse? | | And the longer it takes for this problem to become public, | the more harm will be caused when it does, won't it? | outworlder wrote: | > This time they don't understand why client-side | authentication means no authentication at all | | I've seen this... with an intern! I can't imagine dealing | with a whole team like that. | throwaway_pdp09 wrote: | How do you not kill these people? How do you put up with it? | How do vendors like this survive? | maartenh wrote: | Just like in nature, they survive because they are good | enough, and don't experience enough competition to be | eliminated by selection.
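anilakar's original proof of concept was not posted, so the following is only a hypothetical reconstruction of the idea: split one request into several small writes with a short pause between them, which makes it likely (not certain) that a server assuming "one read() per request" sees a partial request. The function name send_in_pieces and its default sizes are assumptions.

```python
import socket
import time


def send_in_pieces(sock: socket.socket, data: bytes, piece_size: int = 1,
                   delay: float = 0.005) -> None:
    """Send data as many small writes with a pause between them (hypothetical).

    The peer will probably observe the bytes over several reads, which
    exposes servers that expect a whole request from a single read().
    """
    for start in range(0, len(data), piece_size):
        sock.sendall(data[start:start + piece_size])
        time.sleep(delay)
```

Against a real device this would be used after something like `socket.create_connection((host, port))`, passing the full HTTP request; a correctly written server should respond exactly as if the request had arrived in one write.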
| the8472 wrote: | full disclosure could put some selective pressure on | them. | jolmg wrote: | Depending on what kind of vendor we're talking about, it | might be that such aspects aren't even part of what makes | them competitive. The average user is not going to know | about these types of issues, and so they're not even | going to consider such issues when evaluating the vendor. | ink_13 wrote: | Just about every industrial automation vendor is like this | in my experience. They never upgrade because they don't | want to break anything. | richardwhiuk wrote: | If you do want that, then SCTP will provide it. | [deleted] | jes5199 wrote: | if you turn off Nagle's algorithm, it gets closer to this | though | jfkebwjsbx wrote: | No, it has nothing to do with that. | wahern wrote: | A version of Microsoft Exchange had a bug in its SMTP | implementation that was tickled when lines crossed packet | boundaries. (EDIT: The issue was more likely a bug in | Exchange's TLS record processing, breaking when a logical line | crossed TLS records.) My async SMTP library used a simple fifo | for buffering outbound data which didn't realign the write | pointer to 0 except when it was completely drained, so when | reading slices (iovec's) from the fifo for write-out it would | occasionally call write/send with an incomplete line (i.e. part | of a line that wrapped around from the end of the fifo buffer | array to the front) even if the application had only written | full lines. (At the time it didn't support writev/sendmsg, | though I'm not sure it would have helped as the TLS record | layer might still have been prone to splitting logical lines | across packets.) There was no bug here on my end--everything | would be sent correctly--but you can't tell the customer that | he can't send e-mail to some third-party because that third- | party is using a broken version of Exchange. 
| | The first quick fix was to unconditionally realign the fifo | contents after every write (the fifo had a realign method), but | that ran into a computational complexity problem when you had | lots of small lines (e.g. the application caller dumped a huge | message into the buffer and then flushed it out in one go) and | a high-latency connection that resulted in many short writes; | you were constantly memmove'ing the megabytes of remaining | contents in the buffer for every tiny write you did. So then I | ended up having to add a new interface to the fifo that | returned a slice up to a limit but always ending with a | specified delimiter (e.g. "\n") if the delimiter was within the | maximum chunk size. | | Of course, none of these fixes would have completely remedied | the issue as lower layers (the TLS stack, the kernel TCP stack) | could have still potentially split logical lines, and I'm sure | did on occasion. But it at least seemed to put us on equal | footing with everybody else in terms of how often it happened, | which is really the best anybody could have done. Complaints | did die down. | outworlder wrote: | > What they mostly should know: TCP provides a bidirectional | stream of bytes on the application level. It does NOT provide a | stream of packets. | | > That means whatever you pass to a send() call is not | necessarily the same amount of data the receiver will observe | in a single read() call. | | Yes, this. For god's sake, listen to them. | | I had to fight a coworker on this. I had quickly created some | client code just to validate that the server was working. Due | to some quirk, all the messages were arriving in full in every | read call. He told me to ship it. | | I said no! "I need to check if there's more data and if so add | a loop to read again" "But it is working, release it". That | went on for a while, to no avail. Wouldn't look at | documentation either. 
| | Eventually he had to leave for the day, and I took the time to | implement it correctly. | | I started including basic TCP questions in interviews. Not many | people even get past the TCP handshake (if they even know about | that). | scott_s wrote: | The problem here was not a lack of knowledge of a particular | subject. The problem is that this person was unwilling to | learn about a thing they thought they knew. | draw_down wrote: | That's correct. | Ididntdothis wrote: | "But it is working, release it". | | Famous last words :-) | austincheney wrote: | Sounds like how most software handles security until it's | audited. | SilasX wrote: | Stupid question: why would you be writing code that works at | the level of TCP? Don't you usually want to use the OS's (or | some popular library's) TCP software stack? | jfkebwjsbx wrote: | It seems to me GP is talking about using TCP, not | implementing it. | austincheney wrote: | Your terminology is a little off. TCP does not provide anything | for the application layer, as it is a transport layer protocol. The | application layer rides on top of that. Examples of transport | protocols are TCP and UDP, while application protocols are | things like HTTP, SSH, IRC, and all those things your | applications use. | | The network layer on which the transport layer rides is packet | switched. TCP uses segments, with each segment having its | own header and sequence numbers. Streams are just a series of | segments sent over a single established connection | without a predefined termination segment. | Matthias247 wrote: | I didn't mean to talk about OSI terminology. It was more | about: [user-space] applications which use the TCP/IP stack | do not observe packet boundaries, whereas the kernel | certainly does. Obviously this is a bit ambiguous, and you | can even get packet boundaries in user-space by running a TCP | stack there. But for most TCP/IP usages it holds true.
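Because user-space code only sees a byte stream, applications have to rebuild message boundaries themselves. One common approach is newline-delimited messages (the JSON Lines style mentioned further down the thread). The following is a minimal Python sketch, assuming messages never contain a raw newline; iter_lines and its max_line limit are hypothetical names.

```python
import socket
from typing import Iterator


def iter_lines(sock: socket.socket, max_line: int = 64 * 1024) -> Iterator[bytes]:
    """Yield newline-terminated messages from a stream socket (hypothetical).

    Bytes are buffered across recv() calls, so a message split over several
    reads, or several messages arriving in one read, are both handled.
    max_line stops a peer from growing the buffer forever without a newline.
    """
    buf = bytearray()
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            # Peer closed: leftover bytes mean the last message was cut off.
            if buf:
                raise ConnectionError("peer closed in the middle of a line")
            return
        buf += chunk
        # Emit every complete line currently in the buffer.
        while True:
            nl = buf.find(b"\n")
            if nl < 0:
                break
            yield bytes(buf[:nl])
            del buf[:nl + 1]
        if len(buf) > max_line:
            raise ValueError("line too long")
```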
| austincheney wrote: | > It was more about: [user-space] applications which use | the TCP/IP stack do not observe packet boundaries | | That is still a bit imprecise. Userland applications won't | directly see TCP, as they are just looking at an application | protocol. Typically it's the OS that packs and unpacks | the application protocol data into TCP segments, so of | course the userland application won't see them, since it's not | managing that part of the communication. | | https://en.wikipedia.org/wiki/Transmission_Control_Protocol | #... | | There are some exceptions where some application platforms | allow developers to write custom TCP protocols, such as | Node.js, but these exceptions generally apply to network | services and don't commonly apply to the end user | application experience. | | https://nodejs.org/dist/latest-v14.x/docs/api/net.html#net_ | n... | twotwotwo wrote: | Yeah. Fun problem for beginners, because 1) your incorrect code | may work for a while when reads/writes are small or it's only | run on a local network or such, 2) you might design a broken | _protocol_ if you don't understand fragmentation, etc., which | will tend to be harder to fix than (say) an isolated client bug, 3) the | implementation-dependent nature of fragmentation | can make it look like you hit a language/library/OS issue, 4) | your language/library may or may not offer tools to help a | beginner implement a delimited or framed wire format | properly (ideally with things like record-size limits and | timeouts). | | Not sure it says anything you haven't, but a StackOverflow | answer on fragmentation (framed by the asker as Go not behaving | like C) is one of the more-read ones I've written: | https://stackoverflow.com/questions/26999615/go-tcp-read-is-... | Unklejoe wrote: | > stream of bytes | | I've always wondered: What's the best/de facto way to delimit | this back into packets at the application level on the | receiving end?
| | I would think the obvious approach would be to insert some | magic word into the stream so that you can re-sync. | | Or is this not an issue since you know that once you're | connected, you'll never drop a single byte, and therefore the only | way to get out of sync would be a program error? | mytailorisrich wrote: | The standard way is to include explicit information on the | length of the message that follows. | | For example, if the message is x bytes long then you first | send 'x' and then you send the x bytes of the message. | | Or your messages have a defined header that contains the | length of the message payload. | jstanley wrote: | You will never drop a single byte. | | If you need some packet-oriented messaging, you could use | something like http://jsonlines.org/ (i.e. JSON messages | separated by newline characters), or | https://github.com/protocolbuffers/protobuf if it's more | performance-critical. | timeinput wrote: | Protobuf isn't self-delimiting, so you still have to have | some extra packet wrapper around it to say the length. | | I like ZeroMQ to get to a packet-based system. | genpfault wrote: | Netstrings[1] :) | | [1]: https://en.wikipedia.org/wiki/Netstring | vasilvv wrote: | It will never get out of sync because TCP guarantees that the | bytes will be delivered in the same order they were sent. | | The best approach is typically to put a length in front of every | message. The good things about that approach are: | | 1. The receiver can allocate a buffer that is exactly the size | it needs to fit the message. | 2. The receiver can check | whether the message is too long before seeing the entire | message. | | The only disadvantage is that you have to know the length of | each message in advance. | fenwick67 wrote: | This probably bites lots of newbies, since when you're just | sending traffic over localhost, the send()s and read()s tend to | line up.
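The length-prefix scheme mytailorisrich and vasilvv describe can be sketched in Python roughly as follows. The wire format (a 4-byte big-endian unsigned length followed by the payload) and the 1 MiB MAX_MESSAGE cap are assumptions chosen for illustration, not something the thread specifies. The sketch shows both advantages vasilvv lists: the body buffer is allocated once at exactly the announced size, and an oversized length is rejected before any of the body is read.

```python
import socket
import struct

# Wire format assumed for this sketch: a 4-byte big-endian unsigned length,
# then exactly that many payload bytes.
HEADER = struct.Struct("!I")
MAX_MESSAGE = 1 << 20  # hypothetical 1 MiB cap


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray(n)  # allocated once, at exactly the needed size
    view = memoryview(buf)
    got = 0
    while got < n:
        k = sock.recv_into(view[got:], n - got)
        if k == 0:
            raise ConnectionError(f"peer closed after {got} of {n} bytes")
        got += k
    return bytes(buf)


def send_msg(sock: socket.socket, payload: bytes) -> None:
    if len(payload) > MAX_MESSAGE:
        raise ValueError("message too long")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock: socket.socket) -> bytes:
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    # The length arrives before the body, so an oversized message is
    # rejected before any of it is read or buffered.
    if length > MAX_MESSAGE:
        raise ValueError(f"announced length {length} exceeds {MAX_MESSAGE}")
    return _recv_exact(sock, length)
```

Netstrings, also mentioned above, follow the same idea with an ASCII decimal length and a trailing comma instead of a binary header.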
| yjftsjthsd-h wrote: | I have often wished for an "unhelpful testing environment" of | sorts, to deal with these things before they get out of hand. | It would feature a compiler that had creatively different | interpretations of undefined behaviors, randomly compile | against glibc and musl, have a base OS lovingly crafted from | Ubuntu, but with most coreutils replaced with busybox and/or | BSD versions. And, now, I suppose, it would have a customized | network stack (kernel module?) that would randomly | reorder/drop/duplicate packets, randomly reselect the MTU on | every boot, or maybe just randomly fragment things | regardless of MTU. Ideally it would come with a FAQ of "my | program broke on X; what did I do wrong?". | | The idea being that if your software is actually written to | relevant standards, and actually handles things properly | outside the golden path, then it should still work fine. If, | however, you accidentally did something implementation- | defined, or that only worked by coincidence, this system | _will_ break it. | jeroenhd wrote: | There are tools that intentionally insert failures into the | network streams of applications. A few of them are | described here: https://medium.com/@docler/network-issues- | simulation-how-to-... | | The other linking/OS problems can probably be automated | with some simple integration tests and a bunch of different | docker containers to compile the code in. Should be | possible to squeeze it into a CI/CD flow somewhere with | some clever tricks. | Matthias247 wrote: | I created such an environment for my unit tests: wrapping | TCP sockets in a stream which only accepts 1 byte at a time | in both directions and returns EAGAIN on every second read | provides an easy way to make sure the code on top of the | socket performs all the correct retries. | | That will most likely not help newcomers who directly | write their code against the OS socket.
But once you get a | better understanding of the topic and start adding tests to | your codebase it's rather easy to add. | brlewis wrote: | For me, at least in this decade, it would have been better if I | didn't know that. I put off learning websockets longer than I | should have because I don't find packet boundaries fun to deal | with, and my interest in websockets was mainly for fun. Then | when I finally picked websockets up I was pleasantly surprised | that message framing is built in. | ex3ndr wrote: | The biggest issue with TCP is that it can randomly freeze and you | have to restart it, in pretty much any network. You CAN NOT rely | on the socket closing on either side; you have to maintain the connection | yourself. | | I am super puzzled why something like websockets doesn't solve this | problem. A simple heartbeat could solve it, but no one | implements it. | gsich wrote: | You can use keepalives at the protocol (TCP) level. | dblohm7 wrote: | This reminds me of an issue I had to debug over a decade ago. Our | product had its own protocol written atop TCP, but its handshake | was written in a way such that it was much slower than it should | have been due to delays caused by the Nagle algorithm. | | Turning on TCP_NODELAY was a quick-n-dirty fix, but the real fix | was to rewrite the handshake to be more compatible with the inner | workings of TCP. | resca79 wrote: | I loved this area when I was at university. At the end of my | Computer Networking course I did a project based on | https://www.isi.edu/nsnam/ns/ | | It was really fun, especially because it helps you better | understand all the networking layers.
| | I did some tests on network topologies to minimize lost TCP | packets as much as possible, given different network traffic. | vinay_ys wrote: | The single biggest TCP issue I have had to debug and fix numerous | times is not doing connection reuse properly, leading to TCP | port exhaustion and seemingly random delays that cause timeout | failures at higher-level protocols, usually HTTP. This one single | issue has taken down multi-billion dollar production systems. | | So, I hope people learn to check their HTTP client/server | implementations for proper connection handling. The client should | have a thoughtfully sized, bounded connection pool with a reasonably | large idle timeout. It shouldn't close the connection after every | application request (say, HTTP request). There shouldn't be | sockets in TIME_WAIT state accumulating at the client end. | | The server should accept a thoughtfully limited number of connections | per client. The server should never close the connection except when | it is shutting down. | | There should be TCP keepalive messages to keep the connection | alive through intermediate stateful firewalls (connection | tracking table entries in firewalls expire when the connection is | idle for too long) and to detect stale connections and re- | establish them. | | All of these things can be verified by analyzing a packet | capture. You can get a manageable-sized pcap file by filtering on | client/server IP/port-range pairs and capturing for at least 330 seconds. | | Knowing the tools to understand/debug TCP issues is an essential | skill. The socket statistics command (ss) and wireshark/tshark with Lua | scripting are super useful. Knowing higher-level application | protocols like TLS and HTTP is essential too. | bsamuels wrote: | Why doesn't the congestion control part of TCP prevent buffer | bloat[1]? Is it because ISP throttling of the internet connection | doesn't touch the TCP packets themselves?
| | I recently started doing off-site backups, which requires my | entire internet uplink to be used for uploading said backups for | about a week at a time. The internet basically becomes unusable | because all the packets end up in a buffer on the router and | latency spikes to 5000ms. | | [1] | https://www.bufferbloat.net/projects/bloat/wiki/What_can_I_d... | the8472 wrote: | > Why doesn't the congestion control part of TCP prevent buffer | bloat[1]? | | It can. Enable BBR + fq/fq_codel on the box in question and | CAKE on your router. | milesvp wrote: | This is a fundamental problem on the internet. RAM is so cheap | that every device has buffers too big to allow for | proper TCP back pressure. Eric Raymond gave a talk on this a | few years ago. He was going to distribute a lot of small | embedded devices around the world to measure this to try to | address it. I'm curious what happened to that effort. | jeffbee wrote: | If there is a huge FIFO queue on your router, the rate-finding | algorithms associated with TCP will be forced to conclude that | the RTT to your site is enormous. They may try to open the | window to compensate, but here's a fun fact: most operating | system default settings are insufficient to utilize very high | bandwidth-delay products. If you want to send a 1 Gbps flow | across an 80 ms distance on Linux, you'll need to change some | parameters with sysctl before it will work. If your apparent | RTT is 5000 ms, the flow rate you can get will be reduced in | proportion. | | In any case, the solution to bufferbloat is queue discipline, | not congestion control. | jfkebwjsbx wrote: | Up to what speeds/latencies are the default sysctl parameters | all right? Is there any easy way to know whether you are | getting hit by this? Nowadays many people are getting 1 Gbps | links at home! | | What do you mean by queue discipline?
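The arithmetic behind jeffbee's point can be sketched in a few lines of Python. To fill a path, the window (how many bytes may be unacknowledged) has to be at least the bandwidth-delay product; with a fixed window, throughput is capped at window / RTT. The 6 MiB window below is an illustrative assumption, and the sketch ignores protocol and buffer overhead.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps / 8 * rtt_s


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes may be unacknowledged."""
    return window_bytes * 8 / rtt_s


window = 6 * 1024 * 1024  # 6 MiB, an illustrative window size (assumption)

print(bdp_bytes(1e9, 0.080))                                # about 10 MB needed for 1 Gbps at 80 ms
print(window_limited_throughput_bps(window, 0.080) / 1e6)   # about 629 Mbit/s
print(window_limited_throughput_bps(window, 5.0) / 1e6)     # about 10 Mbit/s at 5000 ms RTT
```

This is why a bloated router queue hurts twice: the inflated RTT both adds latency for everyone and shrinks what a fixed window can deliver. On Linux, the relevant limits are the `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem` sysctls.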
| jeffbee wrote: | You know, the worst part is that Linux sets the maximum | receive window size at boot time depending on how much | memory the system contains, ensuring that it's never quite | right. On this machine, with 32GB of main memory, it | defaults to 6291456 bytes. | jfkebwjsbx wrote: | I see, thanks! | | What about the queue discipline? | jeffbee wrote: | If you face a choice of what frame to put on the wire at | any moment, the queue discipline makes that choice. The | easiest policy is to simply send the oldest frame, but | this is also the worst policy. | jfkebwjsbx wrote: | Ah, so the eviction/priority algorithm. Thanks! | vasilvv wrote: | Most of the common TCP congestion control algorithms (Reno, | Cubic) are loss-based: they try to send more and more data | until the link can no longer buffer all of the packets and | drops some of them. Naturally, this approach requires the | buffer to fill up, causing latency to spike. | | There are algorithms that try to use increased delay as a | signal that the link is full. This approach has multiple | problems, one of which is that delay can be really noisy on | wireless networks; another is that if you have a loss-based and | a delay-based connection sharing the same link, the delay-based | one will get much less than its fair share of the bandwidth. | People have been trying to make an algorithm that both coexists | with Reno/CUBIC and does not induce bufferbloat for the last 25 | years or so, and there's been some progress, but none of it has | reached the point where it could be used as the default | congestion control for all operating systems. | | The problem of "I have files to transfer in the background, but I | want my connection to yield to more important traffic" can | actually be solved using a special congestion control algorithm | called LEDBAT [1]; it's used by Apple for things like software | updates, and BitTorrent uses it too. Unfortunately, I think
Unfortunately, I think | only Apple implements it in its TCP stack, so anyone who wants | to do that would have to roll their own thing using UDP. | | [1] https://en.wikipedia.org/wiki/LEDBAT | kqr wrote: | Big buffers that can be filled fast trick congestion control | algorithms into thinking your wire is really fast. The point of | the buffer is to be transparent to the transmitting ends, so | they see the packets going out at lightning speed and assume | it's because they're actually going that fast, and not just | piled into a buffer that fast. | toast0 wrote: | > Why doesn't the congestion control part of TCP prevent buffer | bloat[1]? Is it because ISP throttling of the internet | connection doesn't touch the TCP packets themselves? | | Most of the congestion control algorithms use packet loss as | the only indicator of congestion. In a network with oversized | buffers, congestion will result in delay and not packet loss. | If the delay gets large enough, recieve and congestion windows | will restrict the effective bandwidth, but the latency at that | point is terrible. | | There are some alternate congestion control algorithms which do | use latency as a signal, but they aren't universally available, | and may not be a good fit for all flows. | | For your backup use case, probably the simplest thing is to | reduce your sendbuffers for the backup sender process. Although | allowing packets to drop instead of queue at your router/modem | would really be best, often that's difficult to acheive. | api wrote: | A major reason explicit congestion notification is not used | is firewalls that block anything that isn't bog standard TCP | or UDP. Some even ban odd combinations of flags. There are | enough of these to make ECN useless. | toast0 wrote: | A router that is willing to buffer 5 seconds worth of | packets probably wasn't going to mark for congestion and | drop either. | | Note also, Apple is using MP-TCP and ECN in iOS, and the | world didn't stop. 
It might not work everywhere, and I | don't praise Apple lightly, but there's a pretty clear path | to using things like this. Send a SYN with it enabled, wait | a bit, and send one with it disabled. Keep track of | networks where it doesn't work and stop trying it there. If | you have leverage, yell at people to not do dumb things; | otherwise, let them figure out why expensive things work | better on their competitors' networks. You can't rely on | being able to use these things, but you can use them for | progressive enhancement. ___________________________________________________________________ (page generated 2020-05-15 23:00 UTC)
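The fallback strategy toast0 describes (try ECN first, race a plain SYN shortly after, and remember networks where ECN fails) comes down to a small piece of bookkeeping. The following is a hypothetical Python sketch of only that bookkeeping. It does not send packets. The network key, the 0.25 s race delay and the 24-hour retry interval are illustrative assumptions, not Apple's or any OS's actual values or API.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EcnFallbackPolicy:
    """Per-network memory of where ECN-negotiating handshakes failed.

    Hypothetical sketch: the network key (an SSID, gateway address, etc.),
    race delay and retry interval are illustrative choices, not an OS API.
    """
    retry_after_s: float = 24 * 3600
    race_delay_s: float = 0.25
    _blocked_until: Dict[str, float] = field(default_factory=dict, init=False)

    def should_try_ecn(self, network: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now >= self._blocked_until.get(network, 0.0)

    def attempt_plan(self, network: str,
                     now: Optional[float] = None) -> List[Tuple[str, float]]:
        """Return (kind, start_delay_s) pairs for the connection attempts."""
        if self.should_try_ecn(network, now):
            # ECN SYN first; a plain SYN follows a little later as a backup.
            return [("ecn", 0.0), ("plain", self.race_delay_s)]
        return [("plain", 0.0)]

    def record_result(self, network: str, ecn_handshake_ok: bool,
                      now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        if ecn_handshake_ok:
            self._blocked_until.pop(network, None)
        else:
            # Stop trying ECN on this network for a while, then probe again.
            self._blocked_until[network] = now + self.retry_after_s
```

Expiring the "blocked" mark, rather than keeping it forever, is what makes this progressive enhancement: a network whose firewall gets fixed is eventually probed again.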