[HN Gopher] Unix Domain Sockets vs Loopback TCP Sockets (2014)
       ___________________________________________________________________
        
       Unix Domain Sockets vs Loopback TCP Sockets (2014)
        
       Author : e12e
       Score  : 112 points
       Date   : 2023-09-11 12:51 UTC (9 hours ago)
        
 (HTM) web link (nicisdigital.wordpress.com)
 (TXT) w3m dump (nicisdigital.wordpress.com)
        
       | c7DJTLrn wrote:
       | A lot of modern software disregards the existence of unix
       | sockets, probably because TCP sockets are an OS agnostic concept
       | and perform well enough. You'd need to write Windows-specific
       | code to handle named pipes if you didn't want to use TCP sockets.
        
         | giovannibonetti wrote:
         | I imagine there should be some OS-agnostic libraries somewhere
         | that handle it and provide the developer a unified interface.
        
         | eptcyka wrote:
          | Yes, but there are named pipes and they can be used the same
          | way on Windows. And Windows also supports UDS today. It's no
          | excuse.
        
         | nsteel wrote:
         | Going forward, hopefully modern software will use the modern
         | approach of AF_UNIX sockets in Windows 10 and above:
         | https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
         | 
         | EDIT: And it would be interesting for someone to reproduce a
         | benchmark like this on Windows to compare TCP loopback and the
         | new(ish) unix socket support.
        
           | [deleted]
        
         | zamadatix wrote:
         | A couple of years after this article came out Windows added
          | support for SOCK_STREAM Unix sockets.
        
         | rnmmrnm wrote:
          | Windows is exactly the reason they didn't prevail, imo. Windows
          | named pipes have weird security caveats and are not really
          | supported in high-level languages. I think this led everyone
          | to just use loopback TCP as the portable IPC API instead of
          | going with unix sockets.
        
           | duped wrote:
           | IME a lot of developers have never even heard of address
           | families and treat "socket" as synonymous with TCP (or
           | possibly, but rarely, UDP).
        
           | [deleted]
        
         | jjice wrote:
         | Windows actually added Unix sockets about six years ago, and
          | with how aggressively Microsoft EOLs older versions of their OS
         | (relative to something like enterprise linux at least), it's
         | probably a pretty safe bet to use at this point.
         | 
         | https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
        
           | c7DJTLrn wrote:
           | Interesting, thanks.
        
           | Aachen wrote:
           | With how aggressively Microsoft EOLs older versions of their
           | OS, we're still finding decades-old server and client systems
           | at clients.
           | 
            | While Server 2003 is getting rarer and the last sighting of
            | Windows 98/2000 was a while ago, these systems all keep
            | running at the very least a few months after the last free
            | security support is gone. But whether that's something you
            | want to support as a developer is your choice to make.
        
             | marcosdumay wrote:
             | That's not very relevant.
             | 
              | If you start developing new software today, it won't need
              | to run on those computers. And if it's old enough that it
              | needs to, you can bet all of those architectural decisions
              | were already made and set in stone all over the place.
        
               | johnmaguire wrote:
                | > If you start developing new software today, it won't
                | need to run on those computers.
               | 
               | This is a weird argument to make.
               | 
               | For context, I work on mesh overlay VPNs at Defined.net.
               | We initially used Unix domain sockets for our daemon-
               | client control model. This supported Windows 10 / Server
               | 2019+.
               | 
               | We very quickly found our users needed support for Server
               | 2016. Some are even still running 2012.
               | 
               | Ultimately, as a software vendor, we can't just force
               | customers to upgrade their datacenters.
        
               | foobiekr wrote:
                | On the server side it's actually the opposite of Microsoft
                | quickly EOLing things. Server 2012 was EVERYWHERE as late
                | as 2018-2019. They were still issuing service packs in 2018.
        
       | rollcat wrote:
       | I'd be more interested in the security and usability aspect.
       | Loopback sockets (assuming you don't accidentally bind to
       | 0.0.0.0, which would make it even worse) are effectively rwx to
       | any process on the same machine that has the permission to open
       | network connections, unless you bother with setting up a local
       | firewall (which requires admin privileges). On top of that you
       | need to figure out which port is free to bind to, and have a
       | backup plan in case the port isn't free.
       | 
       | Domain sockets are simpler in both aspects: you can create one in
       | any suitable directory, give it an arbitrary name, chmod it to
       | control access, etc.
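        | 
        | A minimal sketch of that setup (my illustration, assuming Linux
        | and C; error handling omitted, and the path and mode are made
        | up):
        | 
        |   #include <string.h>
        |   #include <sys/socket.h>
        |   #include <sys/stat.h>
        |   #include <sys/un.h>
        |   #include <unistd.h>
        | 
        |   /* Bind a Unix socket at a chosen path and restrict access with
        |    * ordinary filesystem permissions (no ports, no firewall). */
        |   int listen_uds(const char *path)
        |   {
        |       int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        |       struct sockaddr_un addr = { .sun_family = AF_UNIX };
        |       strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
        |       unlink(path);              /* remove a stale socket file */
        |       bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        |       chmod(path, 0660);         /* owner and group only */
        |       listen(fd, 16);
        |       return fd;
        |   }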
        
       | spacechild1 wrote:
       | > Two communicating processes on a single machine have a few
       | options
       | 
       | Curiously, the article does not even mention pipes, which I would
       | assume to be the most obvious solution for this task (but not
       | necessarily the best, of course!)
       | 
       | In particular, I am wondering how Unix domain sockets compare to
       | (a pair of) pipes. At first glance, they appear to be very
       | similar. What are the trade-offs?
        
         | tptacek wrote:
         | The pipe vs. socket perf debate is a very old one. Sockets are
         | more flexible and tunable, which may net you better performance
         | (for instance, by tweaking buffer sizes), but my guess is that
         | the high order bit of how a UDS and a pipe perform are the
         | same.
         | 
         | Using pipes instead of a UDS:
         | 
         | * Requires managing an extra set of file descriptors to get
         | bidirectionality
         | 
         | * Requires processes to be related
         | 
         | * Surrenders socket features like file descriptor passing
         | 
         | * Is more fiddly than the socket code, which can often be
          | interchangeable with TCP sockets (see, for instance, the Go
         | standard library)
         | 
         | If you're sticking with Linux, I can't personally see a reason
         | ever to prefer pipes. A UDS is probably the best default answer
         | for generic IPC on Linux.
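          | 
          | As a rough illustration of the "extra descriptors for
          | bidirectionality" point (my sketch, assuming Linux and C; error
          | handling omitted): a single socketpair() gives one
          | bidirectional AF_UNIX channel, where pipes would need two
          | pipe() calls and four descriptors.
          | 
          |   #include <sys/socket.h>
          |   #include <unistd.h>
          | 
          |   int main(void)
          |   {
          |       int sv[2];
          |       socketpair(AF_UNIX, SOCK_STREAM, 0, sv); /* sv[0]<->sv[1] */
          |       if (fork() == 0) {                       /* child */
          |           char buf[4];
          |           close(sv[0]);
          |           write(sv[1], "ping", 4);       /* send... */
          |           read(sv[1], buf, sizeof(buf)); /* ...and receive on
          |                                             the same fd */
          |           _exit(0);
          |       }
          |       char buf[4];
          |       close(sv[1]);
          |       read(sv[0], buf, sizeof(buf));
          |       write(sv[0], "pong", 4);
          |       return 0;
          |   }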
        
         | xuhu wrote:
          | With pipes, the sender has to add a SIGPIPE handler, which is
          | not trivial to do if it's a library doing the send/recv. With
          | sockets it can use send(fd, buf, len, MSG_NOSIGNAL) instead.
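          | 
          | For reference, a tiny sketch of that call (assuming Linux and
          | C; the wrapper name is made up):
          | 
          |   #include <sys/socket.h>
          |   #include <sys/types.h>
          | 
          |   /* Fails with errno == EPIPE instead of raising SIGPIPE, so a
          |    * library doesn't have to touch the process-wide signal
          |    * disposition. */
          |   ssize_t send_nosig(int fd, const void *buf, size_t len)
          |   {
          |       return send(fd, buf, len, MSG_NOSIGNAL);
          |   }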
        
       | badrabbit wrote:
        | Why not UDP? Less overhead, and you can use multicast to expand
        | messaging to machines on a LAN. TCP on localhost makes little
        | sense, especially when simple ACKs can be implemented in UDP.
       | 
       | But even then, I wonder how the segmentation in TCP is affecting
       | performance in addition to windowing.
       | 
       | Another thing I always wanted to try was using raw IP packets,
       | why not? Just sequence requests and let the sender close a send
       | transaction only when it gets an ack packet with the sequence #
       | for each send. Even better, a raw AF_PACKET socket on the
       | loopback interface! That might beat UDS!
        
         | sophacles wrote:
         | Give it a try and find out! I'd give that blog post a read.
         | 
         | I suspect you'd run into all sorts of interesting issues...
         | particularly if the server is one process but there are N>1
         | clients and you're using AF_PACKET.
        
       | svanwaa wrote:
       | Would TCP_NODELAY make any difference (good or bad)?
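        | 
        | For anyone wanting to test that, disabling Nagle looks roughly
        | like this (a sketch, assuming Linux and C); whether it helps on
        | loopback would need benchmarking:
        | 
        |   #include <netinet/in.h>
        |   #include <netinet/tcp.h>
        |   #include <sys/socket.h>
        | 
        |   /* Disable Nagle's algorithm so small writes are sent
        |    * immediately instead of being coalesced while waiting for
        |    * ACKs. */
        |   int set_nodelay(int fd)
        |   {
        |       int one = 1;
        |       return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
        |                         &one, sizeof(one));
        |   }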
        
       | inv2004 wrote:
        | It would be better to retest.
        | 
        | If I remember correctly, we had the same results described in the
        | article in 2014, but I also remember that Linux loopback was
        | optimized afterwards and the difference was much smaller, if
        | visible at all.
        
       | duped wrote:
        | What's in the way of TCP hitting the same performance as unix
        | sockets? Is it just netfilter?
        
         | woodruffw wrote:
         | I believe the conventional wisdom here is that UDS performs
         | better because of fewer context switches and copies between
         | userspace and kernelspace.
        
           | foobiekr wrote:
            | No. This is exactly the same. Think about the life of a
            | datagram or of stream bytes at the syscall edge for each.
        
             | woodruffw wrote:
              | I'm not sure I understand. This isn't something I've
              | thought about in a while, but it's pretty intuitive to me
              | that a loopback TCP connection would pretty much always be
              | slower: each transmission unit goes through the entire TCP
              | stack, feeds into the TCP state machine, etc. That's more
              | time spent in the kernel.
        
         | foobiekr wrote:
          | The IP stack.
        
         | noselasd wrote:
         | TCP has a lot of rules nailed down in numerous RFCs -
         | everything from how to handle sequence numbers, the 3-way
         | handshake, congestion control, and much more.
         | 
          | That translates into a whole lot of code that needs to run,
          | while unix sockets are not much more than a kernel buffer plus
          | the code to copy data in and out of that buffer.
        
       | majke wrote:
       | Always use Unix Domain sockets if you can. There are at least
       | three concerns with TCP.
       | 
       | First, local port numbers are a limited resource.
       | 
       | https://blog.cloudflare.com/how-to-stop-running-out-of-ephem...
       | https://blog.cloudflare.com/the-quantum-state-of-a-tcp-port/
       | https://blog.cloudflare.com/this-is-strictly-a-violation-of-...
       | 
       | Then the TCP buffer autotune can go berserk:
       | https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-fo...
       | https://blog.cloudflare.com/when-the-window-is-not-fully-ope...
       | 
       | Finally, conntrack. https://blog.cloudflare.com/conntrack-tales-
       | one-thousand-and... https://blog.cloudflare.com/conntrack-turns-
       | a-blind-eye-to-d...
       | 
       | These issues don't exist in Unix Sockets land.
        
         | alexvitkov wrote:
         | If different components of your system are talking over a
          | pretend network you've already architected yourself face
         | first into a pile of shit. There's no argument for quality
         | either way so I'll just use TCP sockets and save myself 2 hours
         | when I inevitably have to get it running on Windows.
        
           | mhuffman wrote:
           | >If different components of your system are talking over a
            | pretend network you've already architected yourself face
           | first into a pile of shit.
           | 
           | How do you have your file delivery, database, and business
           | logic "talk" to each other? Everything on the same computer
           | is a "pretend network" to some extent, right? Do you always
           | architect your own database right into your business logic
            | along with a web server as a single monolith? One-off SPAs
            | must take 2-3 months!
        
           | johnmaguire wrote:
           | FYI, Windows supports Unix domain sockets since Windows 10 /
           | Server 2019.
        
             | alexvitkov wrote:
             | Good thing to mention, thanks.
             | 
              | That's mostly why I said 2 hours and not a day, as you
              | still have to deal with paths (there's no /run) and you may
              | have to fiddle with UAC or, God save us, NTFS permissions.
        
             | adzm wrote:
              | I had not heard of this! Long story short, AF_UNIX now
             | exists for Windows development.
             | 
             | https://devblogs.microsoft.com/commandline/af_unix-comes-
             | to-... https://visualrecode.com/blog/unix-
             | sockets/#:~:text=Unix%20d....
        
         | Karrot_Kream wrote:
          | These matter if you need to bind to multiple ports, but if
          | you're only running a handful of services that need to bind a
          | socket, then port number allocation isn't a big issue. TCP
          | buffer autotune having problems also matters at a certain
          | scale, but in my experience requires a tipping point. TCP sockets also
         | have configurable buffer sizes while Unix sockets have a fixed
         | buffer size, so TCP socket buffers can get much deeper.
         | 
         | At my last role we benchmarked TCP sockets vs Unix sockets in a
         | variety of scenarios. In our benchmarks, only certain cases
         | benefited from Unix sockets and generally the complexity of
         | using them in containerized environments made them less
         | attractive than TCP unless we needed to talk to a high
         | throughput cache or we were doing things like farming requests
         | out to a FastCGI process manager. Generally speaking, using
         | less chatty protocols than REST (involving a lot less serde
         | overhead and making it easier to allocate ingest structures)
         | made a much bigger difference.
         | 
         | I was actually a huge believer in deferring to Unix sockets
         | where possible, due to blog posts like these and my
         | understanding of the implementation details (I've implemented
         | toy IPC in a toy kernel before), but a coworker challenged me
          | to benchmark my belief. Sure enough, on benchmarking it turned out
         | that in most cases TCP sockets were fine and simplified a
         | containerized architecture enough that Unix sockets just
         | weren't worth it.
        
           | kelnos wrote:
           | > _the complexity of using [UNIX sockets] in containerized
           | environments made them less attractive than TCP_
           | 
           | Huh, I would think UNIX sockets would be easier; since
           | sharing the socket between the host and a container (or
           | between containers) is as simple as mounting a volume in the
           | container and setting permissions on the socket
           | appropriately.
           | 
           | Using TCP means dealing with iptables and seems... less fun.
           | I easily run into cases where the host's iptables firewall
           | interferes with what Docker wants to do with iptables such
           | that it takes hours just to get simple things working
           | properly.
        
         | lokar wrote:
         | Also UDS have more features, for example you can get the remote
         | peer UID and pass FDs
        
           | the8472 wrote:
           | And SOCK_SEQPACKET which greatly simplifies fd-passing
        
             | chadaustin wrote:
             | How does SOCK_SEQPACKET simplify fd-passing? Writing a
             | streaming IPC crate as we speak and wondering if there are
             | land mines beyond https://gist.github.com/kentonv/bc7592af9
             | 8c68ba2738f44369208...
        
               | the8472 wrote:
                | Well, the kernel does create an implicit packetization
                | boundary when you attach FDs to a byte stream... but this
                | is underdocumented and there's an impedance mismatch
                | between byte streams and discrete application-level
                | messages. You can also send zero-sized messages to pass
                | an FD, whereas with byte streams you must send at least
                | one byte. That means you can send the FDs separately,
                | after sending the bytes, which makes it easier to notify
                | the application that it should expect FDs (in case it's
                | not always using recvmsg with a cmsg allocation prepared).
                | SEQPACKET just makes it more straightforward because one
                | message (+ancillary data) is always one sendmsg/recvmsg
                | pair.
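                | 
                | A rough sketch of the SEQPACKET case (my code, assuming
                | Linux and C; error handling omitted): one sendmsg() with
                | only ancillary data arrives as one whole message on the
                | receiving side.
                | 
                |   #include <string.h>
                |   #include <sys/socket.h>
                | 
                |   /* Pass a single fd over an AF_UNIX SOCK_SEQPACKET
                |    * socket with a zero-length payload; the fd rides in
                |    * the SCM_RIGHTS cmsg. */
                |   int send_fd(int sock, int fd_to_pass)
                |   {
                |       union {
                |           char buf[CMSG_SPACE(sizeof(int))];
                |           struct cmsghdr align;
                |       } u;
                |       struct msghdr msg = { .msg_control = u.buf,
                |                             .msg_controllen = sizeof(u.buf) };
                |       struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
                |       c->cmsg_level = SOL_SOCKET;
                |       c->cmsg_type = SCM_RIGHTS;
                |       c->cmsg_len = CMSG_LEN(sizeof(int));
                |       memcpy(CMSG_DATA(c), &fd_to_pass, sizeof(int));
                |       return sendmsg(sock, &msg, 0);
                |   }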
        
               | chadaustin wrote:
               | I appreciate your reply!
               | 
               | My approach has been to send a header with the number of
               | fds and bytes the next packet will contain, and the
               | number of payload bytes is naturally never 0 in my case.
        
             | etaham wrote:
             | +1
        
         | booleanbetrayal wrote:
         | We've seen observable performance increases in migrating to
         | unix domain sockets wherever possible, as some TCP stack
         | overhead is bypassed.
        
         | LinAGKar wrote:
          | One problem I've run into when trying to use Unix sockets,
          | though, is that they can only buffer a fairly small number of
          | messages, so if you have a lot of messages in flight at once
          | you can easily end up with sends failing. TCP sockets can
          | handle a lot more messages.
        
           | count wrote:
           | Can't you tune this with sysctl?
        
         | kevincox wrote:
         | The biggest reason for me is that you can use filesystem
         | permissions to control access. Often I want to run a service
          | locally and do auth at the reverse proxy, but if the service
          | binds to localhost then all local processes can access it
          | without auth. If I only grant the reverse proxy permissions on
          | the filesystem socket then they can't access it without going
          | through the auth.
        
           | piperswe wrote:
           | And with `SO_PEERCRED`, you can even implement more complex
           | transparent authorization & logging based on the uid of the
           | connecting process.
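            | 
            | A short sketch of reading those credentials (assuming Linux
            | and glibc; error handling trimmed):
            | 
            |   #define _GNU_SOURCE         /* for struct ucred */
            |   #include <sys/socket.h>
            |   #include <sys/types.h>
            | 
            |   /* Look up the pid/uid/gid of the peer on an accepted
            |    * AF_UNIX connection and return its uid. */
            |   int peer_uid(int conn_fd, uid_t *uid_out)
            |   {
            |       struct ucred cred;
            |       socklen_t len = sizeof(cred);
            |       if (getsockopt(conn_fd, SOL_SOCKET, SO_PEERCRED,
            |                      &cred, &len) < 0)
            |           return -1;
            |       *uid_out = cred.uid;  /* cred.pid and cred.gid too */
            |       return 0;
            |   }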
        
             | kevincox wrote:
              | This is true, but to me it mostly negates the benefit for
              | this use case. The goal is to offload the auth work to the
              | reverse proxy, not to add more rules.
              | 
              | Although I guess you could have the reverse proxy listen on
              | both IP and UNIX sockets. It can then do different auth
              | depending on how the connection came in. So you could auth
              | with a TLS cert or password over IP, or using your PID/UNIX
              | account over the UNIX socket.
        
         | o11c wrote:
         | Adjacently, remember that with TCP sockets you _can_ vary the
          | address anywhere within 127.0.0.0/8
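          | 
          | E.g. (a sketch, assuming Linux and C; the alias address and
          | helper name are arbitrary):
          | 
          |   #include <arpa/inet.h>
          |   #include <netinet/in.h>
          |   #include <stdint.h>
          |   #include <sys/socket.h>
          | 
          |   /* Any 127.x.y.z address is a distinct loopback endpoint on
          |    * Linux, not just 127.0.0.1, so services can avoid port
          |    * collisions. */
          |   int bind_loopback_alias(uint16_t port)
          |   {
          |       int fd = socket(AF_INET, SOCK_STREAM, 0);
          |       struct sockaddr_in a = { .sin_family = AF_INET,
          |                                .sin_port = htons(port) };
          |       inet_pton(AF_INET, "127.1.2.3", &a.sin_addr);
          |       bind(fd, (struct sockaddr *)&a, sizeof(a));
          |       return fd;
          |   }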
        
           | majke wrote:
            | However, this is not the case for IPv6. Technically you can
            | use only ::1, unless you do IPv6 FREEBIND.
        
             | nine_k wrote:
             | You usually have a whole bunch of link-local IPv6
             | addresses. Can't you use them?
        
         | cout wrote:
         | I agree. Always choose unix domain sockets over local TCP if it
         | is an option. There are some valid reasons though to choose
         | TCP.
         | 
         | In the past, I've chosen local TCP sockets because I can
         | configure the receive buffer size to avoid burdening the sender
         | (ideally both TCP and unix domain sockets should correctly
         | handle EAGAIN, but I haven't always had control over the code
         | that does the write). IIRC the max buffer size for unix domain
         | sockets is lower than for TCP.
         | 
         | Another limitation of unix domain sockets is that the size of
         | the path string must be less than PATH_MAX. I've run into this
         | when the only directory I had write access to was already close
         | to the limit. Local TCP sockets obviously do not have this
         | limitation.
         | 
         | Local TCP sockets can also bypass the kernel if you have a
         | user-space TCP stack. I don't know if you can do this with unix
         | domain sockets (I've never tried).
         | 
         | I can also use local tcp for websockets. I have no idea if
         | that's possible with unix domain sockets.
         | 
         | In general, I choose a shared memory queue for local-only
         | inter-process communication.
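          | 
          | The receive-buffer tuning mentioned above is a one-liner (a
          | sketch, assuming Linux and C; the kernel doubles the requested
          | value and caps it at net.core.rmem_max):
          | 
          |   #include <sys/socket.h>
          | 
          |   /* Ask for a deeper receive buffer on a local TCP socket so
          |    * a slow reader pushes back on the sender later. */
          |   int grow_rcvbuf(int fd, int bytes)
          |   {
          |       return setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
          |                         &bytes, sizeof(bytes));
          |   }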
        
           | duped wrote:
           | > Local TCP sockets can also bypass the kernel if you have a
           | user-space TCP stack. I don't know if you can do this with
           | unix domain sockets (I've never tried).
           | 
           | Kernel bypass exists because hardware can handle more packets
           | than the kernel can read or write, and all the tricks
           | employed are clever workarounds (read: kinda hacks) to get
           | the packets managed in user space.
           | 
           | This is kind of an orthogonal problem to IPC, and there's
           | already a well defined interface for multiple processes to
           | communicate without buffering through the kernel - and that's
           | shared memory. You could employ some of the tricks (like
           | LD_PRELOAD to hijack socket/accept/bind/send/recv) and
           | implement it in terms of shared memory, but at that point why
           | not just use it directly?
           | 
           | If speed is your concern, shared memory is always the fastest
           | IPC. The tradeoff is that you now have to manage the
           | messaging across that channel.
        
             | bheadmaster wrote:
             | In my experience, for small unbatchable messages, UNIX
             | sockets are fast enough not to warrant the complexity of
             | dealing with shared memory.
             | 
             | However, for bigger and/or batchable messages, shared
             | memory ringbuffer + UNIX socket for synchronization is the
             | most convenient yet fast IPC I've used.
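              | 
              | The shared-memory side of that setup can be fairly small (a
              | rough sketch, assuming Linux, C and POSIX shm; the size is
              | arbitrary, and the ring/index logic plus the UNIX-socket
              | wakeups are left out):
              | 
              |   #include <fcntl.h>
              |   #include <sys/mman.h>
              |   #include <unistd.h>
              | 
              |   #define RING_BYTES (1 << 20)
              | 
              |   /* Map a shared region that both processes open by name;
              |    * a Unix socket is then only used to tell the consumer
              |    * "data is ready". */
              |   void *map_ring(const char *name, int create)
              |   {
              |       int flags = create ? (O_CREAT | O_RDWR) : O_RDWR;
              |       int fd = shm_open(name, flags, 0600);
              |       if (create)
              |           ftruncate(fd, RING_BYTES);
              |       void *p = mmap(NULL, RING_BYTES,
              |                      PROT_READ | PROT_WRITE,
              |                      MAP_SHARED, fd, 0);
              |       close(fd);        /* the mapping stays valid */
              |       return p;
              |   }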
        
           | Agingcoder wrote:
           | On Linux you can use abstract names, prefixed with a null
           | byte. They disappear automatically when your process dies,
           | and afaik don't require rw access to a directory.
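            | 
            | Binding one looks roughly like this (a sketch, assuming Linux
            | and C): the leading NUL selects the abstract namespace, and
            | the address length has to count the name explicitly because
            | it is not NUL-terminated.
            | 
            |   #include <stddef.h>
            |   #include <string.h>
            |   #include <sys/socket.h>
            |   #include <sys/un.h>
            | 
            |   int bind_abstract(const char *name)
            |   {
            |       int fd = socket(AF_UNIX, SOCK_STREAM, 0);
            |       struct sockaddr_un addr = { .sun_family = AF_UNIX };
            |       size_t n = strlen(name);
            |       /* sun_path[0] stays '\0': abstract namespace, no
            |        * filesystem entry is created */
            |       memcpy(addr.sun_path + 1, name, n);
            |       socklen_t len =
            |           offsetof(struct sockaddr_un, sun_path) + 1 + n;
            |       bind(fd, (struct sockaddr *)&addr, len);
            |       return fd;
            |   }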
        
           | throwway120385 wrote:
           | > I can also use local tcp for websockets. I have no idea if
           | that's possible with unix domain sockets.
           | 
           | The thing that makes this possible or impossible is how your
           | library implements the protocol, at least in C/C++. The
           | really bad protocol libraries I've seen like for MQTT, AMQP,
           | et. al. all insist on controlling both the connection stream
           | and the protocol state machine and commingle all of the code
           | for both. They often also insist on owning your main loop
           | which is a bad practice for library authors.
           | 
            | A much better approach is to implement the protocol as a
            | separate "chunk" of code with well-defined interfaces for
            | receiving inputs and generating outputs on a stream, and with
            | hooks for protocol configuration as-needed. This allows me to
            | do three things that are good:
            | 
            |   * Choose how I want to do I/O with the remote end of the
            |     connection.
            | 
            |   * Write my own main loop or integrate with any third-party
            |     main loop that I want.
            | 
            |   * Test the protocol code without standing up an entire TLS
            |     connection.
           | 
           | I've seen a LOT of libraries that don't allow these things.
           | Apache's QPID Proton is a big offender for me, although they
           | were refactoring in this direction. libmosquitto provides
            | some facilities to access the file descriptor but otherwise
           | tries to own the entire connection. So on and so forth.
           | 
           | Edit: I get how you end up there because it's the easiest way
           | to figure out the libraries. Also, if I had spare time on my
           | hands I would go through and work with maintainers to fix
           | these libraries because having generic open-source protocol
           | implementations would be really useful and would probably
           | solve a lot of problems in the embedded space with ad-hoc
           | messaging implementations.
           | 
           | If the protocol library allows you to control the connection
           | and provides a connection-agnostic protocol implementation
           | then you could replace a TLS connection over TCP local
           | sockets from OpenSSL with SPI transfers or CAN transfers to
           | another device if you really wanted to. Or Unix Domain
           | Sockets, because you own the file descriptor and you manage
           | the transfers yourself.
        
           | chrsig wrote:
           | > Another limitation of unix domain sockets is that the size
           | of the path string must be less than PATH_MAX. I've run into
           | this when the only directory I had write access to was
           | already close to the limit. Local TCP sockets obviously do
           | not have this limitation.
           | 
            | This drove me nuts for a _long_ time, trying to hunt down why
            | the socket couldn't be created. It's a really subtle
            | limitation, and there's not a good error message or anything.
            | 
            | In my use case, it was for testing the server creating the
            | socket, and each test would create its own temp dir to house
            | the socket file and various other resources.
           | 
           | > In general, I choose a shared memory queue for local-only
           | inter-process communication.
           | 
           | Do you mean the sysv message queues, or some user space
           | system? I've never actually seen sysv queues in the wild, so
           | I'm curious to hear more.
        
           | pixl97 wrote:
            | Isn't PATH_MAX 4k characters these days? You'd have to have
            | some pretty intense directory structures to hit that.
        
             | rascul wrote:
             | For unix domain sockets on Linux the max is 108 including a
             | null terminator.
             | 
             | https://www.man7.org/linux/man-pages/man7/unix.7.html
             | 
             | https://unix.stackexchange.com/questions/367008/why-is-
             | socke...
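              | 
              | A quick way to see (and guard against) the limit (a sketch,
              | assuming Linux and C; the example path is made up):
              | 
              |   #include <stdio.h>
              |   #include <string.h>
              |   #include <sys/un.h>
              | 
              |   int main(void)
              |   {
              |       struct sockaddr_un addr;
              |       /* prints 108 on Linux */
              |       printf("sun_path capacity: %zu\n",
              |              sizeof(addr.sun_path));
              |       const char *path =
              |           "/tmp/some/deeply/nested/dir/app.sock";
              |       if (strlen(path) >= sizeof(addr.sun_path))
              |           fprintf(stderr, "socket path too long\n");
              |       return 0;
              |   }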
        
       | rwmj wrote:
        | AF_VSOCK is another one to consider these days. It's a kind of
        | hybrid of loopback and Unix sockets. Although they are designed for
       | communicating between virtual machines, vsock sockets work just
       | as well between regular processes. Also supported on Windows.
       | 
       | https://www.man7.org/linux/man-pages/man7/vsock.7.html
       | https://wiki.qemu.org/Features/VirtioVsock
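        | 
        | Connecting looks much like any other socket family (a sketch,
        | assuming Linux with vsock support and C; the port is arbitrary,
        | and CID 2 means "the host" when called from a guest):
        | 
        |   #include <linux/vm_sockets.h>
        |   #include <string.h>
        |   #include <sys/socket.h>
        | 
        |   int connect_vsock(unsigned int port)
        |   {
        |       int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
        |       struct sockaddr_vm addr;
        |       memset(&addr, 0, sizeof(addr));
        |       addr.svm_family = AF_VSOCK;
        |       addr.svm_cid = VMADDR_CID_HOST; /* CID 2: the host */
        |       addr.svm_port = port;
        |       if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        |           return -1;
        |       return fd;
        |   }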
        
         | touisteur wrote:
          | With some luck and love, hopefully in the future we'll also be
          | able to use them in containers
          | https://patchwork.kernel.org/project/kvm/cover/2020011617242...
          | which would simplify a lot of little things.
        
         | tptacek wrote:
         | What's the advantage to vsocks over Unix domain sockets? UDS's
         | are very fast, and much easier to use.
        
           | rwmj wrote:
           | I didn't mean to imply any advantage, just that they are
           | another socket-based method for two processes to communicate.
           | Since vsocks use a distinct implementation they should
           | probably be benchmarked alongside Unix domain sockets and
           | loopback sockets in any comparisons. My expectation is they
           | would be somewhere in the middle - not as well optimized as
           | Unix domain sockets, but with less general overhead than TCP
           | loopback.
           | 
           | If you are using vsocks between two VMs as intended then they
           | have the advantage that they allow communication without
           | involving the network stack. This is used by VMs to implement
           | guest agent communications (screen resizing, copy and paste
           | and so on) where the comms don't require the network to have
           | been set up at all or be routable to the host.
        
         | cout wrote:
         | I did not know about this. Thanks for the tip!
        
         | coppsilgold wrote:
          | VMMs such as firecracker and cloud-hypervisor translate
         | between vsock and UDS. [1]
         | 
         | In recent kernel versions, sockmap also has vsock translation: 
         | <https://github.com/torvalds/linux/commit/5a8c8b72f65f6b80b52..
         | .>
         | 
         | This allows for a sort of UDS "transparency" between guest and
         | host. When the host is connecting to a guest, the use of a
         | multiplexer UDS is required. [1]
         | 
         | [1] <https://github.com/firecracker-
         | microvm/firecracker/blob/main...>
        
       ___________________________________________________________________
       (page generated 2023-09-11 22:00 UTC)