[HN Gopher] Achieving reliable UDP transmission at 10 Gb/s [pdf]
___________________________________________________________________
 
Achieving reliable UDP transmission at 10 Gb/s [pdf] (2017)
 
Author : pmoriarty
Score  : 92 points
Date   : 2020-04-19 17:48 UTC (5 hours ago)
 
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
 
| Matthias247 wrote:
| Reading through the paper I can't see what the authors mean by
| "reliable transmission" there, or how they achieve it.
|
| I only see them referencing increased socket buffers, which then
| led - in combination with the available (and non-congested)
| network bandwidth and their app's sending behavior - to no
| transmission errors. As soon as you change any of those
| parameters it seems like the system would break down, and they
| have absolutely no measures in place to "make it reliable".
|
| The right answer still seems to be: implement a congestion
| controller, retransmits, etc. - which essentially ends up being
| an implementation of TCP/SCTP/QUIC/etc.
|
| rubatuga wrote:
| They want reliable UDP, not TCP. They state that very clearly.
|
| zamadatix wrote:
| Yes, but they didn't do anything to make UDP reliable. They just
| said that in their test scenario they didn't notice any loss at
| the application layer after increasing the socket receive
| buffer, and called it a day, because elsewhere in the paper they
| noted: "For some detector readout it is not even evident that
| guaranteed delivery is necessary. In one detector prototype we
| discarded around 24% of the data due to threshold suppression,
| so spending extra time making an occasional retransmission may
| not be worth the added complexity."
| I think the paper meant "reliable" in a different way than most
| would take "reliable" to mean in a paper about networking -
| similar to if someone wrote a paper about "Achieving an
| asynchronous database for timekeeping" and spent a lot of time
| talking about databases, but it turned out that by
| "asynchronous" they meant you could enter your hours at the end
| of the week rather than the moment you walked in/out of the
| door.
|
| bcoates wrote:
| Having end-to-end control of their topology in production is the
| measure they're using to make it reliable. Since they're
| saturating the link, the receiver parameters are reasonably
| robust; the sender physically cannot burst any faster and
| overrun the receiver.
|
| Retransmit-based systems are probably unusable in this
| application; even over the short hop, the bandwidth-delay
| product is probably much bigger than the buffer on the sensor.
| The only case where a retransmit would happen is receiver buffer
| overflow, which is catastrophic: the retransmit would cause even
| more overflow.
|
| If you had to fix random packet loss in a system like this you
| wouldn't want to use retransmission; you'd need to do FEC.
|
| aDfbrtVt wrote:
| EPON already includes an RS(255,223) ECC scheme as part of the
| standard.
|
| tomohawk wrote:
| If you have a very low error rate line, the main point at which
| packet loss will occur for UDP is on the receiving system. If
| the receive buffer is not large enough, it can fill up while the
| receiving app is doing other things, and then packets will be
| dropped.
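To make the receive-buffer point concrete, here is a minimal sketch (not from the paper) of requesting a larger UDP receive buffer from Python. The size the kernel actually grants is capped by net.core.rmem_max, so reading the value back with getsockopt is the only way to know what you got; on Linux the kernel also reports double the requested size to account for its bookkeeping overhead.

```python
import socket

# Sketch: enlarge the UDP receive buffer so a busy receiver can
# absorb bursts without the kernel dropping datagrams.  A modest
# request is used here; real deployments ask for megabytes, which
# requires raising net.core.rmem_max first.
REQUESTED = 64 * 1024

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)

# Read back what the kernel actually granted (clamped at rmem_max;
# Linux reports twice the requested size when not clamped).
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(granted >= REQUESTED)
sock.close()
```

If the granted size comes back smaller than requested, the fix is the sysctl tuning mentioned elsewhere in the thread, not a bigger setsockopt argument.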
| ignoramous wrote:
| > (abstract) _Optimizations for throughput are: MTU, packet
| sizes, tuning Linux kernel parameters, thread affinity, core
| locality and efficient timers._
|
| Cloudflare's u/majke shared a series of articles on a similar
| topic [0][1][2] (with a focus on achieving line rate with higher
| packets-per-second and lower latency instead of throughput) that
| I found super helpful, especially since they are so thorough
| [3].
|
| Speaking of throughput, u/drewg123 wrote an article on how
| Netflix does 100gbps _with_ FreeBSD's network stack [4], and
| here's BBC on how they do so by _bypassing_ Linux's network
| stack [5].
|
| ---
|
| [0] https://news.ycombinator.com/item?id=10763323
| [1] https://news.ycombinator.com/item?id=12404137
| [2] https://news.ycombinator.com/item?id=17063816
| [3] https://news.ycombinator.com/item?id=12408672
| [4] https://news.ycombinator.com/item?id=15367421
| [5] https://news.ycombinator.com/item?id=16986100
|
| a_t48 wrote:
| I wish I had seen this at my last job. This is something I had
| to set up, and it was painful - lots of trial and error.
|
| snisarenko wrote:
| Optimizing UDP transmission over the internet is an interesting
| topic.
|
| I remember reading a paper a while ago that showed that if you
| send two consecutive UDP packets with the exact same data over
| the internet, at least one of them will arrive at the
| destination with a pretty high success rate (something like
| 99.99%).
|
| I wonder if this still works with current internet
| infrastructure, and if this trick is still used in real-time
| streaming protocols.
|
| wmf wrote:
| So basically rate-1/2 FEC.
|
| [deleted]
|
| zamadatix wrote:
| 99.99% for two tries would be a 1% drop chance, which I'd say is
| pretty lenient - we average better than that on our sites
| running off 4G (jitter is horrible, though, and that will kill
| any real-time protocol without huge added delays).
| Generally you'd just implement a more generic FEC algorithm,
| though, unless you had 2 separate paths you wanted to try (e.g.
| race a cable modem and 4G with every packet, and if one side
| drops it, hope the other side still finishes the race), as there
| are FEC options that allow non-integer redundancy levels and can
| reduce header overhead compared to sending multiple copies of
| small packets.
|
| syrrim wrote:
| > 99.99% for two tries would be a 1% drop chance
|
| Not per se. The drop chance for consecutive packets is likely
| correlated, such that if you know the first one was dropped you
| should increase your prior that the second one will also be
| dropped.
|
| zamadatix wrote:
| Depends on the cause and the root question. For instance, in the
| most common scenario of congestion, routers do intelligent
| random drops with increasing probability as the buffer gets more
| full: https://en.wikipedia.org/wiki/Random_early_detection. The
| internet actually relies on this random low drop chance to make
| things work smoothly, rather than waiting until things are
| falling apart to signal to all streams at once to slow down
| while it catches up. The same randomness applies to transmission
| bit errors, which also cause drops, but there the randomness
| comes from noise rather than from design.
|
| On the other hand, if the root question is whether there is an
| outage-style issue, then yeah: if the path to the destination is
| hard down, no number of packets is going to help, because they
| are all going to drop. Likewise, if the question is "on a short
| enough time scale, is the reliability of delivering a single
| packet somewhere on the internet ever less than 99%?" then yeah,
| somewhere there is a failure scenario, and if you look at a
| short enough time scale any failure scenario can be made to say
| there is 0% reliability.
|
| mcguire wrote:
| Odds are, at least one of the links between the source and
| destination will be shared.
If so, sending two packets is an
| expensive attempt at reliability; it will cut the bandwidth in
| half. Further, one data packet will already arrive with a
| highish success rate.
|
| tomohawk wrote:
| It depends on the characteristics of the transmission line. If
| loss is purely random, that is one thing, but often if one
| packet is dropped or smashed, there is a higher probability that
| the following ones will meet the same fate. For example, if the
| transmission is over a microwave link, it is easy to see how
| something could cause a few thousand packets in a row to go
| missing.
|
| ignoramous wrote:
| u/noselasd:
|
| > _Also keep in mind this note:
| http://technet.microsoft.com/en-us/library/cc940021.aspx _
|
| > _Basically, if you send() 2 or more UDP datagrams in quick
| succession, and the OS has to resolve the destination with ARP,
| all but one packet are dropped until you get an ARP reply (this
| behavior isn't entirely unique to Windows, btw)._
|
| https://news.ycombinator.com/item?id=8468313
|
| zamadatix wrote:
| "In a readout system such as ours the network only consists of a
| data sender and a data receiver with an optional switch
| connecting them. Thus the only places where congestion occurs
| are at the sender or receiver. The readout system will typically
| produce data at near constant rates during measurements so
| congestion at the receiver will result in reduced data rates by
| the transmitter when using TCP."
|
| At that point a better paper title would have been "Increasing
| buffers or optimizing application syscalls to receive 10 Gb/s of
| data", as it has nothing to do with achieving reliable UDP
| transmission - which it doesn't even seem they needed:
|
| "For some detector readout it is not even evident that
| guaranteed delivery is necessary.
In one detector prototype we
| discarded around 24% of the data due to threshold suppression,
| so spending extra time making an occasional retransmission may
| not be worth the added complexity."
|
| As far as actual reliable UDP testing at high speeds goes, one
| might also want to consider the test scenario, as not all
| Ethernet connections are equal. The 2 meter passive DACs used
| here probably achieve a ~10^-18 bit error rate (BER), or 1 bit
| error in every ~100 petabytes transferred. On the other hand, go
| optical and, even with forward error correction (FEC), it's not
| uncommon to expect transmission loss in the real world. E.g.
| something a little more current,
| https://blogs.cisco.com/sp/transforming-enterprise-applicati...,
| is happy to call 10^-12 with FEC "traditionally considered to be
| 'error free'", which would likely have resulted in lost packets
| even in this 400 GB transfer test (though again, they were fine
| with up to 24% loss in some cases, so I don't think they were as
| worried about "reliable" as the paper title would suggest).
|
| Generally, if you have any of these: 1) unknown congestion, 2)
| unknown speed, 3) unknown tolerance for error - you'll have to
| do something that eats CPU time and massive amounts of buffers
| for reliability. If you need the best reliability you can get
| but don't have the luxury of retransmitting, for whatever
| reason, then as much error correction in the upper-level
| protocol as you can afford from a CPU perspective is your best
| bet.
|
| If you want to see a modern take on achieving reliable
| transmission over UDP, check out HTTP/3.
|
| aDfbrtVt wrote:
| Traditional error-free transmission in optical comms is 1E-15
| BER. I can't access the EPON standard right now, but my
| experience with other IEEE standards would tell me they're
| probably guaranteeing 1E-15 for the worst-case optical link.
This link is pretty close
| to optimal, so 400 GB of data is nowhere near the amount needed
| to say anything with certainty about the BER of the channel.
|
| zamadatix wrote:
| IEEE only guarantees 10^-12, which is almost certainly why 1st
| gen 25G products were released exactly when they were able to
| hit that. My estimate that a 2m 10G DAC from 2017 would have a
| BER of ~10^-18 comes from personal experience (as unlikely as it
| sounds, I have actually done extensive testing of 7 of the exact
| model of server and NIC in our lab, purchased around the same
| time - different switch, though), not from the 400 GB transfers
| in the paper.
|
| ignoramous wrote:
| > _Generally if you have any of these: 1) unknown congestion 2)
| unknown speed 3) unknown tolerance for error_
|
| > ... _If you want to see a modern take on achieving reliable
| transmission over UDP check out HTTP/3._
|
| Not an expert, but I have seen folks here complain that
| QUIC/HTTP3 doesn't have proper congestion control like uTP
| (BitTorrent over UDP) does with LEDBAT:
| https://news.ycombinator.com/item?id=10546651
|
| wmf wrote:
| LEDBAT-style congestion control is not appropriate for
| "foreground" Web traffic, and it would result in lower
| performance than TCP-based HTTP. Fixing bufferbloat is an
| ongoing project, and it isn't fair to blame QUIC for being no
| worse than TCP.
|
| mynegation wrote:
| Relevant discussion on HN from 4 months ago of IBM's proprietary
| large data transfer tool:
| https://news.ycombinator.com/item?id=21898072
|
| [deleted]
|
| exdsq wrote:
| Can you do something similar with TCP and increase the packet
| size such that the "TCP overhead" is reduced compared to 64 byte
| payloads, but with increased reliability over UDP?
|
| zamadatix wrote:
| MTU is the maximum transmission unit, so increasing it does
| nothing to make 64 byte packets more efficient. You should try
| to send as much data as you can in one go, and the socket will
| automatically figure out how to split that up as best it can.
Most systems default to a 1500
| byte MTU, so the OS will chunk a large send up to fit in
| multiple 1500 byte packets. The OS will also usually try to
| coalesce a bunch of small payloads into one larger packet, via
| e.g. https://en.wikipedia.org/wiki/Nagle%27s_algorithm, but
| that's not guaranteed, and it's much less CPU efficient even
| when it does work.
|
| 99% of the time you are transferring data you don't need to
| think this deeply about networking, though. E.g. I have the
| exact same DL360 Gen9 servers with the same 10G NICs in my lab,
| and 10G TCP streams run just fine on them without manual
| tweaking. Setting the MTU to 9000 does make things more
| efficient, but that's about as far as I'd go without a
| particularly strong driver to optimize (e.g. "we've got 2,000 of
| these servers, and if we could get by with 5% fewer it'd save
| your yearly salary" kind of things).
|
| toast0 wrote:
| In the system proposed, not really.
|
| To use TCP instead of UDP there are two big problems:
|
| 1) the sensor device would need to keep unacknowledged data in
| memory, but it may not have enough memory for that
|
| 2) if they're running at line rate (max bandwidth in this case)
| in UDP, there's no bandwidth left to retransmit data
|
| All of the buffer manipulation is going to be more CPU intensive
| on both sides as well, and you'd also run into congestion
| control limiting the data rate in the early part of the capture.
|
| For a system like this, while UDP doesn't guarantee reliability,
| careful network setup (either sensor direct to recorder, or a
| dedicated network with sufficient capacity and no outside
| traffic) in combination with careful software setup allows for a
| very low probability of lost packets, despite no ability to
| retransmit.
|
| fulafel wrote:
| This would be interesting to try at today's faster ethernet
| speeds; wonder how it goes at 100G.
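The "send every datagram twice" trick discussed upthread (wmf's rate-1/2 FEC) is easy to sketch. In this toy example the loopback addresses and the 4-byte big-endian sequence header are illustrative choices, not anything from the paper: each datagram is sent twice, and the receiver keeps the first copy of each sequence number it sees.

```python
import socket
import struct

# Receiver: bind to an ephemeral loopback port.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))          # OS picks a free port
addr = recv.getsockname()

# Sender: transmit every payload twice with the same sequence number.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq, payload in enumerate([b"alpha", b"beta"]):
    pkt = struct.pack("!I", seq) + payload
    send.sendto(pkt, addr)           # first copy
    send.sendto(pkt, addr)           # duplicate; either arriving suffices

seen, data = set(), []
recv.settimeout(1.0)
for _ in range(4):                   # 2 payloads x 2 copies
    try:
        pkt, _ = recv.recvfrom(2048)
    except socket.timeout:
        break                        # a real receiver would keep looping
    (seq,) = struct.unpack("!I", pkt[:4])
    if seq not in seen:              # deduplicate by sequence number
        seen.add(seq)
        data.append(pkt[4:])

send.close()
recv.close()
print(data)
```

As the thread notes, a generic FEC scheme with non-integer redundancy is usually a better spend of bandwidth than full duplication; this only illustrates the dedup-by-sequence-number mechanics.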
| otterley wrote:
| (2017)
|
| rubatuga wrote:
| TLDR:
|
|     sysctl -w net.core.rmem_max=12582912
|     sysctl -w net.core.wmem_max=12582912
|     sysctl -w net.core.netdev_max_backlog=5000
|     ifconfig eno49 mtu 9000 txqueuelen 10000 up
___________________________________________________________________
(page generated 2020-04-19 23:00 UTC)