[HN Gopher] How to build a faster file transfer protocol
       ___________________________________________________________________
        
       How to build a faster file transfer protocol
        
       Author : mcharawi
       Score  : 62 points
       Date   : 2022-03-26 16:44 UTC
        
 (HTM) web link (www.trytachyon.com)
 (TXT) w3m dump (www.trytachyon.com)
        
       | kevinherron wrote:
       | > If your internet connection is 1Gps and you are transferring a
       | 10Gb file, it should theoretically take 10 seconds to transfer
       | 
       | Err what? I don't know what this "Gps" unit is, but if it's 1Gbps
       | (gigabit per second), and a 10GB (gigabyte) file, that's not how
       | it works... it would be 80 seconds.
        
         | madsbuch wrote:
         | It should be OK. It says "10Gb", not "10GB", i.e. it is a 10 giga
         | _bit_ file. (While it is unconventional to measure file size in
         | bits, it should be perfectly fine.)
        
         | mcharawi wrote:
         | Sorry about the typo; you are right, it should be Gbps. As for
         | the transfer time, we are just using bits for the file size to
         | make the mental math easier.
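The arithmetic in this subthread is just file size in bits divided by link rate. A minimal sketch, assuming decimal units (1 GB = 8 Gb) and ignoring protocol overhead, latency and loss; `transfer_seconds` is a hypothetical helper, not part of any tool mentioned here:

```python
def transfer_seconds(size: float, size_unit: str, link_gbps: float) -> float:
    """Idealized transfer time: size in bits divided by link rate in bits/s."""
    # Decimal units assumed; a byte is 8 bits.
    bits_per_unit = {"Gb": 1e9, "GB": 8e9}
    if size_unit not in bits_per_unit:
        raise ValueError(f"unsupported unit: {size_unit}")
    return size * bits_per_unit[size_unit] / (link_gbps * 1e9)


print(transfer_seconds(10, "Gb", 1.0))  # 10.0  (10 gigabits over 1 Gbps)
print(transfer_seconds(10, "GB", 1.0))  # 80.0  (10 gigabytes over 1 Gbps)
```

Real transfers take longer than either figure, since headers, handshakes and congestion control all eat into the nominal rate.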
        
       | mypalmike wrote:
       | I worked at a tier 2 ISP about 15 years ago that developed
       | multiple products trying to sell accelerated transfers as a
       | service. They worked similarly to what this article describes.
       | The problem was that there were very few buyers. It's easier to
       | sell transparent acceleration boxes as an appliance, and even
       | then it's very niche.
        
       | metadat wrote:
       | > It took us a little while to build UDT..
       | 
       | > Building this infrastructure took a substantial amount of
       | time..
       | 
       | > If anyone is interested in trying the Tachyon Transfer
       | Algorithm we offer a storage transfer acceleration API like AWS
       | does. Our SDK includes node, c++ and objc and could be used in a
       | wide variety of applications
       | 
       | So it was a lot of effort, and now they're inviting Big-G and
       | Cloudflare to contact them to possibly achieve a paltry 30%-ish
       | speed increase for certain scenarios? Or are they inviting app
       | devs who want faster video uploads to reach out? What is the
       | actual use case where the sometimes 30% improvement matters and
       | actually moves the needle?!
       | 
       | Why hasn't Tachyon been working with their prospective customers
       | and warming them up the whole time, or at least working the
       | social and investor nets and reaching out proactively already?
       | 
       | This strategy is kind of like being a dweeb at a poorly lit
       | school dance and hoping the most popular girl at the dance
       | somehow notices you're wearing shoes that let you float a
       | centimeter in the air. Cool trick, bud.
       | 
       | Presumably it's not a $10/mo service contract. Is this really an
       | effective strategy when building and selling to enterprise these
       | days? To me it sounds like a risky and hard way to make less
       | money than what is possible using tried and true product
       | development strategies. To be fair, I have also made this mistake
       | before. It was embarrassing enough as a solo-founder, and seems
       | less forgivable with larger founding group sizes, because it
       | means more folks agreed to support and follow such a sub-optimal
       | harebrained scheme :)
       | 
       | You all sound like very capable software engineers, and I know
       | it's both fun and satisfying to build and make The Thing.
       | 
       | Good luck, sincerely.
       | 
       | P.s. You may also consider pursuing some of the medium sized
       | targets like Backblaze, Rackspace, Larry Ellison's Oracle OCI, or
       | Microshaft Azure.
       | 
       | (sorry, I couldn't resist having some fun at the end, though the
       | suggestion is real!)
        
       | kkfx wrote:
       | IMVHO the main issue in file transfer today is that in 2022 most
       | people still do not have a public IP (like a global IPv6 one),
       | so most people still have NAT traversal issues and need to rely
       | on third parties or not-so-performant, more or less distributed
       | networks...
       | 
       | The second main issue is that most do not own a personal domain
       | name with a subdomain per personal host (like
       | {desktop,craphone,laptop}.mydomain.tld etc).
       | 
       | Those two issues are so big IMVHO that push all others aside...
        
       | dochtman wrote:
       | Does UDT come with encryption? If so, how does it compare to
       | QUIC?
        
         | mcharawi wrote:
         | The canonical UDT implementation does not come with encryption,
         | however there are some older open source GitHub repos that have
         | attempted to add TLS to UDT. The original author of UDT,
         | Yunhong Gu, has a project called Sector/Sphere that adds some
         | application-level encryption to file transfer if you want to
         | check it out: http://sector.sourceforge.net/. We've added
         | encryption for our algorithm though!
         | 
         | With respect to QUIC, I believe it was designed specifically to
         | reduce the latency of HTTP connections by using multiple UDP
         | flows and building the reliability/ordering guarantees at the
         | application layer.
         | 
         | The problem with getting performance increases out of multiple,
         | distinct traffic flows is that you become more and more unfair
         | to other packet traffic as you increase the number of flows you
         | are using. For example, if you use 9 TCP (or any other AIMD)
         | flows to send a file over some link, and a tenth connection is
         | started, you now are taking up to 90% of the available
         | bandwidth (because AIMD flows are designed to be fair amongst
         | themselves).
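The 90% figure follows from assuming every AIMD flow converges to an equal share of the bottleneck, so a sender holding n of N flows gets roughly n/N. A sketch of that idealized calculation; the helper name is hypothetical, and real flows with different RTTs do not split a link this evenly:

```python
from fractions import Fraction


def aggregate_share(my_flows: int, other_flows: int) -> Fraction:
    """Idealized bottleneck share of one sender, assuming equal per-flow shares."""
    total = my_flows + other_flows
    if my_flows < 0 or other_flows < 0 or total == 0:
        raise ValueError("flow counts must be non-negative and not both zero")
    return Fraction(my_flows, total)


print(float(aggregate_share(9, 1)))  # 0.9  (nine parallel flows vs. one other)
print(float(aggregate_share(1, 1)))  # 0.5  (one flow each: an even split)
```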
        
           | moreati wrote:
           | > AIMD
           | 
           | Additive Increase Multiplicative Decrease (for others
           | wondering)
           | 
           | > a feedback control algorithm best known for its use in TCP
           | congestion control. AIMD combines linear growth of the
           | congestion window when there is no congestion with an
           | exponential reduction when congestion is detected.
           | 
           | -- https://en.wikipedia.org/wiki/Additive_increase/multiplicati...
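A minimal sketch of the AIMD rule quoted above, assuming Reno-style constants (add 1 segment per RTT, halve on congestion) and a congestion signal that both flows see at the same time; `aimd_step` is a hypothetical name. The second part illustrates why AIMD flows end up fair amongst themselves: additive increase leaves the gap between two windows unchanged, while each multiplicative cut halves it.

```python
def aimd_step(cwnd: float, congested: bool, alpha: float = 1.0,
              beta: float = 0.5, min_cwnd: float = 1.0) -> float:
    """One AIMD update per RTT: add alpha when clear, multiply by beta on congestion."""
    if congested:
        return max(min_cwnd, cwnd * beta)
    return cwnd + alpha


# Single flow: linear growth, then a multiplicative cut.
cwnd, trace = 10.0, []
for congested in [False, False, True, False]:
    cwnd = aimd_step(cwnd, congested)
    trace.append(cwnd)
print(trace)  # [11.0, 12.0, 6.0, 7.0]

# Two flows sharing a bottleneck of 30 segments, starting very unequal.
a, b, capacity = 1.0, 20.0, 30.0
for _ in range(200):
    congested = a + b > capacity  # both flows see the same congestion signal
    a, b = aimd_step(a, congested), aimd_step(b, congested)
print(round(a, 3), round(b, 3))  # the two windows end up nearly equal
```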
        
       | Scaevolus wrote:
       | Always good to see more in this space! Long fat networks (LFNs or
       | "elephants") are everywhere, especially once you start moving
       | data between continents.
       | 
       | I've had success personally with UFTP, but you explicitly set the
       | transmit rate. Don't forget to enable encryption/authentication
       | if you want the downloads to be verified! You'll get silent UDP
       | corruption otherwise: http://uftp-multicast.sourceforge.net/
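Why the integrity check matters: UDP's 16-bit checksum is weak and, over IPv4, even optional, so corruption can slip through. A minimal sketch of end-to-end chunk authentication with a shared key using Python's standard `hmac` module. This is not UFTP's own mechanism; the chunk framing (a sequence number prepended to the payload), the key, and the helper names are assumptions for illustration:

```python
import hashlib
import hmac


def tag_chunk(key: bytes, seq: int, payload: bytes) -> bytes:
    """HMAC-SHA256 over the sequence number and payload of one chunk."""
    # Including the sequence number means a valid chunk can't be accepted
    # at a different position in the file.
    msg = seq.to_bytes(8, "big") + payload
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_chunk(key: bytes, seq: int, payload: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected and received tags."""
    return hmac.compare_digest(tag_chunk(key, seq, payload), tag)


key = b"demo-shared-key"          # hypothetical pre-shared key
data = b"file chunk over UDP"
tag = tag_chunk(key, 0, data)
print(verify_chunk(key, 0, data, tag))        # True
flipped = bytes([data[0] ^ 0x01]) + data[1:]  # simulate a single bit flip
print(verify_chunk(key, 0, flipped, tag))     # False
print(verify_chunk(key, 1, data, tag))        # False: wrong position
```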
        
       | ac130kz wrote:
       | Some basic transfer based on UDP with forward error correction is
       | a really good solution to tackle packet loss and avoid TCP
       | congestion entirely.
        
         | mcharawi wrote:
         | So congestion and packet loss are different problems; it is
         | true that forward error correction could be a good way to avoid
         | retransmitting lost packets, but the only way to avoid
         | congestion is to adjust the congestion window (for window based
         | congestion control) or packet sending rate (for rate based
         | congestion control) based on some indicator of congestion.
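To make the distinction concrete, here is the simplest form of forward error correction: one XOR parity packet per group of equal-length packets, which lets the receiver rebuild any single lost packet without a retransmission. It is a sketch under that assumption (production systems use stronger codes such as Reed-Solomon), and nothing in it changes how fast the sender transmits, which is the congestion-control half of the problem. Helper names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """XOR of all packets in a group; sent alongside them as one extra packet."""
    if not packets:
        raise ValueError("empty group")
    if any(len(p) != len(packets[0]) for p in packets):
        raise ValueError("pad packets to equal length first")
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (None) from the others plus parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one loss per group")
    out = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # parity ^ (all surviving packets) leaves exactly the lost packet.
        out[missing[0]] = reduce(xor_bytes, present, parity)
    return out


group = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(group)
print(recover([b"AAAA", None, b"CCCC"], parity))  # [b'AAAA', b'BBBB', b'CCCC']
```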
        
       | mcharawi wrote:
       | Hey HN! I'm Mahamad, co-founder of Tachyon Transfer, where we're
       | building faster file transfer tools for developers. We've spent
       | the last year building an ultra-fast FTP replacement, and we
       | thought we'd show you guys what our technical process was like.
       | Let me know if you have any questions!
        
         | KennyBlanken wrote:
         | Please show performance tests versus hpn-ssh, GridFTP (aka the
         | de facto tool of the particle physics and genetics research
         | communities) and simpler systems like wget2's multi-threaded
         | mode.
        
           | eps wrote:
           | Would also be nice to compare against different standard TCP
           | congestion avoidance algs, of which there's plenty.
           | 
           | It is, after all, a _very_ well researched area.
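On Linux, the congestion control algorithm can be chosen per socket, which makes this kind of comparison fairly cheap to run. A sketch using Python's `socket.TCP_CONGESTION` constant (Linux-only); the helper name is hypothetical, and which algorithms are available depends on the kernel (see `/proc/sys/net/ipv4/tcp_available_congestion_control`):

```python
import socket
from typing import Optional


def congestion_control(sock: socket.socket, algo: Optional[str] = None) -> Optional[str]:
    """Optionally set, then return, a TCP socket's congestion control algorithm.

    Returns None where the option is unavailable (e.g. non-Linux platforms).
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    if algo is not None:
        # Raises OSError if the kernel doesn't offer `algo` (e.g. bbr not loaded)
        # or the caller isn't permitted to use it.
        sock.setsockopt(socket.IPPROTO_TCP, opt, algo.encode())
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)  # 16 = kernel name limit
    except OSError:
        return None
    return raw.split(b"\0", 1)[0].decode()


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(congestion_control(s))  # the kernel default, or None if unsupported
```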
        
         | Bancakes wrote:
         | Can I tunnel this over SSH and use it as a faster drop-in
         | replacement for SFTP? (Why not?)
        
           | mcharawi wrote:
           | Standard SSH uses TCP over port 22 by default, so it wouldn't
           | be possible without modifying SSH to use a different
           | protocol. That being said, however, our protocol uses TLS
           | over UDP via the OpenSSL libraries so it is secure by
           | default. We also offer a BSD-style socket interface that you
           | can use if you want a drop in replacement for TCP sockets.
           | Shoot me a note at mahamad _at_ trytachyon _dot_ com if you
           | want to chat!
        
         | rsync wrote:
         | Is this software that one licenses and uses on any arbitrary
         | network or do you run a network of some kind that users pay to
         | access?
         | 
         | Or both?
         | 
         | I think this is a software package but the tl;dr doesn't make
         | that clear to me...
        
           | mcharawi wrote:
           | At the moment we offer both options. We offer our own network
           | with a pricing plan similar to massive.io (though 10c per GB
           | vs. 25c). Our licensing is cheaper but requires large volumes.
        
         | tener wrote:
         | Can you share some actual performance numbers across whatever
         | are the key metrics that you observe?
        
       | AitchEmArsey wrote:
       | Interesting, but somewhat misses the point; the reason people
       | want an alternative to Aspera is that no-one wants to pay for
       | file transfer tools.
        
         | mcharawi wrote:
           | Thanks for the feedback; we're actually planning to open-source
         | a version of our work that significantly improves on the
         | original UDT project: https://udt.sourceforge.io/.
        
           | AitchEmArsey wrote:
           | Look forward to it. I'd be interested to hear how your tool
           | compares with Facebook WDT[1], as that would be my go-to
           | right now if someone asked me for a fast point-to-point data
           | transfer solution.
           | 
           | [1] https://github.com/facebook/wdt
        
       | amaccuish wrote:
       | Never understood this: once SMB gets going it's pretty fast, but it
       | takes agessss to list a directory. Why can't it just pipe the
       | output of dir() (or ls, on Samba) over the network?
        
       | rsync wrote:
       | I actually read the entire article and was specifically looking
       | for a reference to hpn-ssh which I think is the most standard way
       | to approach this ... can OP comment here on that tooling and how
       | it compares and contrasts?
        
         | mcharawi wrote:
         | Thanks for reading!
         | 
         | I haven't seen hpn-ssh before, but from a cursory look at the
         | project page it looks like the main improvements are targeted
         | at improving the speed of the encryption using multi-threading,
         | and increasing ssh/scp buffer sizes. These are certainly good
         | improvements over standard ssh/scp (and setting TCP buffers to
         | the value of the bandwidth delay product for a particular
         | network path is a well known way to squeeze some perf out of
         | TCP) but do not address the root cause of slowdown in window-
         | based, loss-based congestion control.
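The bandwidth-delay product is the amount of data that must be in flight to keep a path full: bandwidth times round-trip time. A sketch assuming an illustrative 1 Gbps path with a 200 ms RTT, which needs about 25 MB of buffer; helper names are hypothetical. On Linux, the kernel caps requests at `net.core.wmem_max`/`net.core.rmem_max`, reports back double the requested value, and stops autotuning a socket whose buffers were set explicitly, so raising the `net.ipv4.tcp_wmem`/`tcp_rmem` maximums is often the gentler route.

```python
import socket


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bits in flight on the path, converted to bytes."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def request_buffers(sock: socket.socket, size: int) -> tuple:
    """Ask for send/receive buffers of `size` bytes; return what was granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


size = bdp_bytes(1_000_000_000, 200)  # 1 Gbps path, 200 ms RTT (illustrative)
print(size)  # 25000000
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        print(request_buffers(s, size))  # granted sizes depend on OS limits
    except OSError as exc:  # some BSD-derived systems may reject oversized requests
        print("buffer request rejected:", exc)
```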
         | 
         | In order to be fair to other flows, exponential back-off is
         | required on detection of congestion, and packet loss as an
         | indicator of congestion is both a lagging indicator of
         | congestion and has a very low signal to noise ratio on high
         | throughput, lossy networks.
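One way to quantify the lossy-network point is the well-known Mathis et al. approximation for a loss-based, Reno-style TCP flow in steady state: throughput ≈ (MSS / RTT) · (C / √p), with C = √(3/2). Link capacity does not appear at all, so random loss alone caps the rate. A sketch with illustrative assumptions (1460-byte MSS, 200 ms RTT, 0.1% loss); it models classic AIMD, not CUBIC or BBR, and the function name is hypothetical:

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state rate of a Reno-style loss-based TCP flow, in bit/s."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be in (0, 1)")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Illustrative long-haul path: 1460-byte MSS, 200 ms RTT, 0.1% random loss.
print(f"{mathis_throughput_bps(1460, 0.200, 0.001) / 1e6:.2f} Mbit/s")  # 2.26 Mbit/s
```

Because of the 1/√p term, quadrupling the loss rate halves the achievable rate, whether or not the losses were actually caused by congestion.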
        
           | KennyBlanken wrote:
           | hpn-ssh is specifically designed for high latency, high
           | bandwidth file transfer and is more than just "big buffers
           | and multi-threaded." And the question remains: how does your
           | solution compare in simulated and real-world testing?
           | 
           | It's a little strange that you "conducted an extensive
           | literature review" of congestion algorithms but you aren't
           | aware of basic common tools like hpn-ssh, wget2's
           | multithreading mode, or GridFTP which is used extensively in
           | particle physics and genetics research communities.
        
             | mcharawi wrote:
             | Thanks for the feedback. The file transfer ecosystem is
             | very large and conducting a thorough review of the
             | application level tools was not the goal of this project,
             | as the overwhelming majority of them focus on differences
             | at the application layer, not the transport layer.
             | 
             | We are specifically focusing on rebuilding a congestion
             | control algorithm from the ground up that can better
             | tolerate modern network conditions, including things like
             | high bandwidth, high packet loss, and high latency.
             | 
             | With respect to GridFTP, wget2 multi-threading, and other
             | multi-flow approaches: the problem with getting performance
             | increases out of multiple, distinct traffic flows is that
             | you become more and more unfair to other packet traffic as
             | you increase the number of flows you are using. For
             | example, if you use 9 TCP (or any other AIMD) flows to send
             | a file over some link, and a tenth connection is started,
             | you now are taking up to 90% of the available bandwidth
             | (because AIMD flows are designed to be fair amongst
             | themselves).
        
       | fn-mote wrote:
       | This article was interesting and also frustrating to read.
       | 
       | 1. There are very few numbers. In particular, improvement in
       | performance under various circumstances is _not_ given! If you
       | dig around you can find their transfer time application [1], but
       | there is no discussion on that page.
       | 
       | 2. The basis for the improvement is not spelled out. (References
       | are given, but you have to know the field - "acronyms only".) If
       | I understand correctly, their contribution is the improved
       | measures of congestion used. Their landing page just touts "don't
       | use TCP"... which sounds like Step 0 of a very long process.
       | 
       | I admit, the title is basically accurate: "how to build" not "the
       | performance of".
       | 
       | tl;dr: Start with existing work, simulate and improve
       | incrementally.
       | 
       | I don't know anything about the field, but this article didn't
       | lead me to understand any better. I'd love to know the real
       | numbers they observed, which approaches didn't pan out, and
       | whether they are effectively using an error-correcting code.
       | 
       | Anyway, it's certainly not an academic paper - just an
       | advertisement.
       | 
       | [1] https://www.trytachyon.com/file-transfer-calculator
        
         | mcharawi wrote:
         | Thanks for taking the time to read it! To address your
         | concerns:
         | 
         | 1. To give you an idea of the speed improvements, we
         | transferred a 2GB file between Ohio and Singapore on AWS and
         | were able to transfer it in 0:26 (seconds) using our protocol,
         | vs 2:15 for SCP.
         | 
         | 2. The basis for improvement is taking into account the changes
         | in round-trip-time for a particular network path; these
         | temporary increases are used as the primary congestion signal.
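For scale, the two times in point 1 convert to average throughput as below. This assumes decimal units (2 GB = 16 Gb) and reads both times as minutes:seconds; it uses only the numbers given in the comment and says nothing about how SCP or the TCP buffers were tuned. Helper names are hypothetical.

```python
def to_seconds(mmss: str) -> int:
    """Parse an 'm:ss' duration such as '2:15' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def avg_gbps(size_gigabytes: float, seconds: float) -> float:
    """Average throughput in Gbit/s, assuming decimal units (1 GB = 8 Gb)."""
    return size_gigabytes * 8 / seconds


t_fast, t_scp = to_seconds("0:26"), to_seconds("2:15")
print(f"{avg_gbps(2, t_fast):.3f} Gbps vs {avg_gbps(2, t_scp):.3f} Gbps, "
      f"{t_scp / t_fast:.1f}x faster")
# 0.615 Gbps vs 0.119 Gbps, 5.2x faster
```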
         | 
         | We are not using error correcting codes, which are good for
         | preventing the retransmission of packets but do not address the
         | underlying problem of avoiding congestion in a network.
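A generic delay-based rate controller shows the idea in point 2: keep the smallest RTT seen as the uncongested baseline, treat the excess as queuing delay, and back off only when that delay grows. This is a sketch in the spirit of TCP Vegas or LEDBAT, not Tachyon's algorithm (which is not described in detail here); the class name, threshold, and gains are all assumptions. Unlike a loss-based rule, a random packet loss does not trigger a back-off here.

```python
from dataclasses import dataclass


@dataclass
class DelayBasedController:
    """Hypothetical rate controller driven by queuing delay, not packet loss."""
    rate_mbps: float = 10.0
    target_queue_ms: float = 5.0    # assumed tolerance for extra delay
    increase_mbps: float = 1.0      # additive probe per RTT sample
    decrease_factor: float = 0.75   # multiplicative back-off on rising delay
    min_rate_mbps: float = 1.0
    base_rtt_ms: float = float("inf")

    def on_rtt_sample(self, rtt_ms: float) -> float:
        # The smallest RTT observed approximates the path's propagation delay.
        self.base_rtt_ms = min(self.base_rtt_ms, rtt_ms)
        queuing_delay_ms = rtt_ms - self.base_rtt_ms
        if queuing_delay_ms > self.target_queue_ms:
            # RTT is inflating: queues are building, so slow down.
            self.rate_mbps = max(self.min_rate_mbps,
                                 self.rate_mbps * self.decrease_factor)
        else:
            # Little or no queuing: probe for more bandwidth.
            self.rate_mbps += self.increase_mbps
        return self.rate_mbps


ctl = DelayBasedController()
rates = [ctl.on_rtt_sample(rtt) for rtt in [100.0, 101.0, 112.0, 102.0]]
print(rates)  # [11.0, 12.0, 9.0, 10.0]
```

A real design would also need to refresh the baseline, since a stale minimum RTT after a route change would make the controller misjudge queuing delay.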
        
           | koprulusector wrote:
           | Can I ask a dumb question? Why SCP and not rsync?
        
           | Straw wrote:
           | How much was SCP affected by TCP buffer size tuning?
        
         | amelius wrote:
         | I just tried the calculator. It seems that if you're in "US
         | Metro" or "Europe", then the transfer protocol is just as fast
         | as TCP, is this correct? I wonder why this is the case. Is it
         | because the routers play more fairly?
        
           | jandrese wrote:
           | I would expect it means your service provider isn't dropping
           | packets. Their protocol seems to just be more aggressive
           | about not backing off in the face of packet loss, which is
           | helpful if one of your links is a marginal radio connection.
           | 
           | The cynic in me thinks they achieve better throughput because
           | they don't play nice with TCP and monopolize the link while
           | everybody else gets backed off.
        
       ___________________________________________________________________
       (page generated 2022-03-26 23:01 UTC)