[HN Gopher] Unix Domain Sockets vs Loopback TCP Sockets (2014)
___________________________________________________________________
 
Unix Domain Sockets vs Loopback TCP Sockets (2014)
 
Author : e12e
Score  : 112 points
Date   : 2023-09-11 12:51 UTC (9 hours ago)
 
(HTM) web link (nicisdigital.wordpress.com)
(TXT) w3m dump (nicisdigital.wordpress.com)
 
| c7DJTLrn wrote:
| A lot of modern software disregards the existence of unix
| sockets, probably because TCP sockets are an OS-agnostic concept
| and perform well enough. You'd need to write Windows-specific
| code to handle named pipes if you didn't want to use TCP sockets.
| giovannibonetti wrote:
| I imagine there should be some OS-agnostic libraries somewhere
| that handle it and provide the developer a unified interface.
| eptcyka wrote:
| Yes, but there are named pipes, and they can be used the same
| way on Windows. And Windows also supports UDS today. It's no
| excuse.
| nsteel wrote:
| Going forward, hopefully modern software will use the modern
| approach of AF_UNIX sockets in Windows 10 and above:
| https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
|
| EDIT: And it would be interesting for someone to reproduce a
| benchmark like this on Windows to compare TCP loopback and the
| new(ish) unix socket support.
| [deleted]
| zamadatix wrote:
| A couple of years after this article came out, Windows added
| support for SOCK_STREAM Unix sockets.
| rnmmrnm wrote:
| Windows is exactly the reason they didn't prevail, imo. Windows
| named pipes have weird security caveats and are not really
| supported in high-level languages. I think this led everyone
| to just using loopback TCP as the portable IPC communication
| API instead of going with unix sockets.
| duped wrote:
| IME a lot of developers have never even heard of address
| families and treat "socket" as synonymous with TCP (or
| possibly, but rarely, UDP).
| [deleted]
| jjice wrote:
| Windows actually added Unix sockets about six years ago, and
| with how aggressively Microsoft EOLs older versions of their OS
| (relative to something like enterprise Linux, at least), it's
| probably a pretty safe bet to use at this point.
|
| https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
| c7DJTLrn wrote:
| Interesting, thanks.
| Aachen wrote:
| With how aggressively Microsoft EOLs older versions of their
| OS, we're still finding decades-old server and client systems
| at clients.
|
| While Server 2003 is getting more rare and the last sighting
| of Windows 98/2000 has been a while, they're all running at
| the very least a few months after the last free security
| support is gone. But whether that's something you want to
| support as a developer is your choice to make.
| marcosdumay wrote:
| That's not very relevant.
|
| If you start developing new software today, it won't need
| to run on those computers. And if it's old enough that it
| needs to, you can bet all of those architectural decisions
| were already made and written in stone all over the place.
| johnmaguire wrote:
| > If you start developing new software today, it won't
| need to run on those computers.
|
| This is a weird argument to make.
|
| For context, I work on mesh overlay VPNs at Defined.net.
| We initially used Unix domain sockets for our daemon-client
| control model. This supported Windows 10 / Server 2019+.
|
| We very quickly found our users needed support for Server
| 2016. Some are even still running 2012.
|
| Ultimately, as a software vendor, we can't just force
| customers to upgrade their datacenters.
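For reference, with the AF_UNIX support discussed above, essentially the
same client code builds on Linux and on recent Windows, with only the
headers and Winsock startup differing. A minimal sketch, assuming a
Windows SDK new enough to ship afunix.h; the socket path is illustrative:

    /* Minimal AF_UNIX client for Linux and for Windows 10 (1803+). */
    #ifdef _WIN32
    #include <winsock2.h>
    #include <afunix.h>          /* struct sockaddr_un / AF_UNIX on Windows */
    #else
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>
    #endif
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
    #ifdef _WIN32
        WSADATA wsa;                           /* Winsock still needs init */
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
            return 1;
    #endif
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "demo.sock", sizeof addr.sun_path - 1);

        /* On Windows socket() returns a SOCKET handle; the int cast is
           sloppy but fine for a sketch. */
        int fd = (int)socket(AF_UNIX, SOCK_STREAM, 0);
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            puts("connected over AF_UNIX");
        return 0;
    }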
| foobiekr wrote:
| It's actually the opposite of Microsoft quickly EOLing on
| the server side. Server 2012 was EVERYWHERE as late as
| 2018-2019. They were still issuing service packs in 2018.
| rollcat wrote:
| I'd be more interested in the security and usability aspects.
| Loopback sockets (assuming you don't accidentally bind to
| 0.0.0.0, which would make it even worse) are effectively rwx to
| any process on the same machine that has the permission to open
| network connections, unless you bother with setting up a local
| firewall (which requires admin privileges). On top of that you
| need to figure out which port is free to bind to, and have a
| backup plan in case the port isn't free.
|
| Domain sockets are simpler in both aspects: you can create one in
| any suitable directory, give it an arbitrary name, chmod it to
| control access, etc.
| spacechild1 wrote:
| > Two communicating processes on a single machine have a few
| options
|
| Curiously, the article does not even mention pipes, which I would
| assume to be the most obvious solution for this task (but not
| necessarily the best, of course!)
|
| In particular, I am wondering how Unix domain sockets compare to
| (a pair of) pipes. At first glance, they appear to be very
| similar. What are the trade-offs?
| tptacek wrote:
| The pipe vs. socket perf debate is a very old one. Sockets are
| more flexible and tunable, which may net you better performance
| (for instance, by tweaking buffer sizes), but my guess is that
| the high-order bit of how a UDS and a pipe perform is the same.
|
| Using pipes instead of a UDS:
|
| * Requires managing an extra set of file descriptors to get
| bidirectionality
|
| * Requires processes to be related
|
| * Surrenders socket features like file descriptor passing
|
| * Is more fiddly than the socket code, which can often be
| interchangeable with TCP sockets (see, for instance, the Go
| standard library)
|
| If you're sticking with Linux, I can't personally see a reason
| ever to prefer pipes. A UDS is probably the best default answer
| for generic IPC on Linux.
| xuhu wrote:
| With pipes, the sender has to add a SIGPIPE handler, which is
| not trivial to do if it's a library doing the send/recv. With
| sockets it can use send(fd, buf, len, MSG_NOSIGNAL) instead.
| badrabbit wrote:
| Why not UDP? Less overhead, and you can use multicast to expand
| messaging to machines on a LAN. TCP on localhost makes little
| sense, especially when simple ACKs can be implemented in UDP.
|
| But even then, I wonder how the segmentation in TCP is affecting
| performance in addition to windowing.
|
| Another thing I always wanted to try was using raw IP packets.
| Why not? Just sequence requests and let the sender close a send
| transaction only when it gets an ack packet with the sequence #
| for each send. Even better, a raw AF_PACKET socket on the
| loopback interface! That might beat UDS!
| sophacles wrote:
| Give it a try and find out! I'd give that blog post a read.
|
| I suspect you'd run into all sorts of interesting issues...
| particularly if the server is one process but there are N>1
| clients and you're using AF_PACKET.
| svanwaa wrote:
| Would TCP_NODELAY make any difference (good or bad)?
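For anyone who wants to measure that: TCP_NODELAY disables Nagle's
algorithm, which mostly shows up with small write-write-read
request/response patterns, so a loopback benchmark would want to compare
both settings. Setting it is one standard setsockopt call:

    /* Disable Nagle's algorithm on a connected TCP socket (Linux). */
    #include <netinet/in.h>
    #include <netinet/tcp.h>     /* TCP_NODELAY */
    #include <sys/socket.h>

    static int enable_nodelay(int fd)
    {
        int one = 1;
        /* returns 0 on success, -1 on failure (see errno) */
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
    }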
| inv2004 wrote:
| Would be better to retest.
|
| If I remember correctly, we had the same results described in
| the article in 2014, but I also remember that Linux loopback was
| optimized after that, and the difference was much smaller, if
| visible at all.
| duped wrote:
| What's in the way of TCP hitting the same performance as unix
| sockets, is it just netfilter?
| woodruffw wrote:
| I believe the conventional wisdom here is that UDS performs
| better because of fewer context switches and copies between
| userspace and kernelspace.
| foobiekr wrote:
| No. This is exactly the same. Think about the life of a datagram
| or stream bytes on the syscall edge for each.
| woodruffw wrote:
| I'm not sure I understand. This isn't something I've thought
| about in a while, but it's pretty intuitive to me
| that a loopback TCP connection would pretty much always be
| slower: each transmission unit goes through the entire TCP
| stack, feeds into the TCP state machine, etc. That's more
| time spent in the kernel.
| foobiekr wrote:
| The IP stack.
| noselasd wrote:
| TCP has a lot of rules nailed down in numerous RFCs - everything
| from how to handle sequence numbers to the 3-way handshake to
| congestion control, and much more.
|
| That translates into a whole lot of code that needs to run,
| while unix sockets are not much more than a kernel buffer
| and code to copy data back and forth in that buffer - which
| doesn't need a lot of code to make happen.
| majke wrote:
| Always use Unix Domain sockets if you can. There are at least
| three concerns with TCP.
|
| First, local port numbers are a limited resource.
|
| https://blog.cloudflare.com/how-to-stop-running-out-of-ephem...
| https://blog.cloudflare.com/the-quantum-state-of-a-tcp-port/
| https://blog.cloudflare.com/this-is-strictly-a-violation-of-...
|
| Then the TCP buffer autotune can go berserk:
| https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-fo...
| https://blog.cloudflare.com/when-the-window-is-not-fully-ope...
|
| Finally, conntrack:
| https://blog.cloudflare.com/conntrack-tales-one-thousand-and...
| https://blog.cloudflare.com/conntrack-turns-a-blind-eye-to-d...
|
| These issues don't exist in Unix sockets land.
| alexvitkov wrote:
| If different components of your system are talking over a
| pretend network you've already architected yourself face
| first into a pile of shit. There's no argument for quality
| either way, so I'll just use TCP sockets and save myself 2 hours
| when I inevitably have to get it running on Windows.
| mhuffman wrote:
| > If different components of your system are talking over a
| pretend network you've already architected yourself face
| first into a pile of shit.
|
| How do you have your file delivery, database, and business
| logic "talk" to each other? Everything on the same computer
| is a "pretend network" to some extent, right? Do you always
| architect your own database right into your business logic
| along with a web server as a single monolith? One-off SPAs
| must take 2-3 months!
| johnmaguire wrote:
| FYI, Windows has supported Unix domain sockets since Windows 10
| / Server 2019.
| alexvitkov wrote:
| Good thing to mention, thanks.
|
| That's mostly why I said 2 hours and not a day, as you
| still have to deal with paths (there's no /run) and you may
| have to fiddle with UAC or, god save us, NTFS permissions.
| adzm wrote:
| I had not heard of this! Long story short, AF_UNIX now
| exists for Windows development.
|
| https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
| https://visualrecode.com/blog/unix-sockets/#:~:text=Unix%20d...
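On the Unix side, the path and permission housekeeping mentioned
upthread (rollcat's chmod point, alexvitkov's complaint about paths) is
a few lines of boilerplate. A sketch of typical server-side setup on
Linux; the path, mode, and backlog are illustrative:

    /* Create a listening Unix domain socket with restricted access. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/stat.h>
    #include <sys/un.h>
    #include <unistd.h>

    int listen_uds(const char *path)     /* e.g. "/tmp/app.sock" */
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

        unlink(path);                 /* clear a stale socket from a prior run */
        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            chmod(path, 0660) < 0 ||  /* owner+group only; no firewall needed */
            listen(fd, 16) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }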
| Karrot_Kream wrote:
| These matter if you need to bind many ports, but if
| you're only running a handful of services that need to bind a
| socket, then port number allocation isn't a big issue. TCP
| buffer autotune having problems also matters at a certain scale,
| but in my experience requires a tipping point. TCP sockets also
| have configurable buffer sizes while Unix sockets have a fixed
| buffer size, so TCP socket buffers can get much deeper.
|
| At my last role we benchmarked TCP sockets vs Unix sockets in a
| variety of scenarios. In our benchmarks, only certain cases
| benefited from Unix sockets, and generally the complexity of
| using them in containerized environments made them less
| attractive than TCP unless we needed to talk to a high-
| throughput cache or we were doing things like farming requests
| out to a FastCGI process manager. Generally speaking, using
| less chatty protocols than REST (involving a lot less serde
| overhead and making it easier to allocate ingest structures)
| made a much bigger difference.
|
| I was actually a huge believer in deferring to Unix sockets
| where possible, due to blog posts like these and my
| understanding of the implementation details (I've implemented
| toy IPC in a toy kernel before), but a coworker challenged me
| to benchmark my belief. Sure enough, on benchmark it turned out
| that in most cases TCP sockets were fine and simplified a
| containerized architecture enough that Unix sockets just
| weren't worth it.
| kelnos wrote:
| > _the complexity of using [UNIX sockets] in containerized
| environments made them less attractive than TCP_
|
| Huh, I would think UNIX sockets would be easier, since
| sharing the socket between the host and a container (or
| between containers) is as simple as mounting a volume in the
| container and setting permissions on the socket
| appropriately.
|
| Using TCP means dealing with iptables and seems... less fun.
| I easily run into cases where the host's iptables firewall
| interferes with what Docker wants to do with iptables such
| that it takes hours just to get simple things working
| properly.
| lokar wrote:
| Also UDS have more features; for example, you can get the remote
| peer UID and pass FDs.
| the8472 wrote:
| And SOCK_SEQPACKET, which greatly simplifies fd-passing.
| chadaustin wrote:
| How does SOCK_SEQPACKET simplify fd-passing? Writing a
| streaming IPC crate as we speak and wondering if there are
| land mines beyond
| https://gist.github.com/kentonv/bc7592af98c68ba2738f44369208...
| the8472 wrote:
| Well, the kernel does create an implicit packetization
| boundary when you attach FDs to a byte stream... but this
| is underdocumented, and there's an impedance mismatch
| between byte streams and discrete application-level
| messages. You can also send zero-sized messages to pass
| an FD (with byte streams you must send at least one byte),
| which means you can send the FDs separately after sending
| the bytes; that makes it easier to notify the application
| that it should expect FDs (in case it's not always using
| recvmsg with a cmsg allocation prepared). SEQPACKET just
| makes it more straightforward because 1 message
| (+ancillary data) is always one sendmsg/recvmsg pair.
| chadaustin wrote:
| I appreciate your reply!
|
| My approach has been to send a header with the number of
| fds and bytes the next packet will contain, and the
| number of payload bytes is naturally never 0 in my case.
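A sketch of the basic SCM_RIGHTS mechanics the exchange above assumes:
one control message carrying the descriptor, plus (on a byte-stream
socket) at least one byte of ordinary payload per sendmsg. Error
handling is minimal:

    /* Pass one file descriptor over a Unix stream socket. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int sock, int fd_to_pass)
    {
        char byte = 'F';                          /* dummy payload byte */
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

        union {                                   /* aligned cmsg buffer */
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } u;
        memset(&u, 0, sizeof u);

        struct msghdr msg = { 0 };
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof u.buf;

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;             /* ancillary data holds fds */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }

The receiver mirrors this with recvmsg and reads the fd back out of the
first SCM_RIGHTS control message.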
| etaham wrote:
| +1
| booleanbetrayal wrote:
| We've seen observable performance increases in migrating to
| unix domain sockets wherever possible, as some TCP stack
| overhead is bypassed.
| LinAGKar wrote:
| One problem I've run into when trying to use Unix sockets,
| though, is that they can only buffer a fairly small number of
| messages at once, so if you have a lot of messages in flight you
| can easily end up with sends failing. TCP sockets can handle a
| lot more messages.
| count wrote:
| Can't you tune this with sysctl?
| kevincox wrote:
| The biggest reason for me is that you can use filesystem
| permissions to control access. Often I want to run a service
| locally and do auth at the reverse proxy, but if the service
| binds to localhost then all local processes can access it
| without auth. If I only grant the reverse proxy permissions on
| the filesystem socket then you can't access it without going
| through the auth.
| piperswe wrote:
| And with `SO_PEERCRED`, you can even implement more complex
| transparent authorization & logging based on the uid of the
| connecting process.
| kevincox wrote:
| This is true, but to me it mostly negates the benefit for this
| use case. The goal is to offload the auth work to the
| reverse proxy, not to add more rules.
|
| Although I guess you could have the reverse proxy listen
| both on IP and UNIX sockets. It can then do different auth
| depending on how the connection came in. So you could auth
| with TLS cert or password over IP, or using your PID/UNIX
| account over the UNIX socket.
| o11c wrote:
| Adjacently, remember that with TCP sockets you _can_ vary the
| address anywhere within 127.0.0.0/8
| majke wrote:
| However this is not the case for IPv6. Technically you can
| use only ::1, unless you do IPv6 FREEBIND
| nine_k wrote:
| You usually have a whole bunch of link-local IPv6
| addresses. Can't you use them?
| cout wrote:
| I agree. Always choose unix domain sockets over local TCP if it
| is an option. There are some valid reasons, though, to choose
| TCP.
|
| In the past, I've chosen local TCP sockets because I can
| configure the receive buffer size to avoid burdening the sender
| (ideally both TCP and unix domain sockets should correctly
| handle EAGAIN, but I haven't always had control over the code
| that does the write). IIRC the max buffer size for unix domain
| sockets is lower than for TCP.
|
| Another limitation of unix domain sockets is that the size of
| the path string must be less than PATH_MAX. I've run into this
| when the only directory I had write access to was already close
| to the limit. Local TCP sockets obviously do not have this
| limitation.
|
| Local TCP sockets can also bypass the kernel if you have a
| user-space TCP stack. I don't know if you can do this with unix
| domain sockets (I've never tried).
|
| I can also use local tcp for websockets. I have no idea if
| that's possible with unix domain sockets.
|
| In general, I choose a shared memory queue for local-only
| inter-process communication.
| duped wrote:
| > Local TCP sockets can also bypass the kernel if you have a
| user-space TCP stack. I don't know if you can do this with
| unix domain sockets (I've never tried).
|
| Kernel bypass exists because hardware can handle more packets
| than the kernel can read or write, and all the tricks
| employed are clever workarounds (read: kinda hacks) to get
| the packets managed in user space.
|
| This is kind of an orthogonal problem to IPC, and there's
| already a well-defined interface for multiple processes to
| communicate without buffering through the kernel - and that's
| shared memory. You could employ some of the tricks (like
| LD_PRELOAD to hijack socket/accept/bind/send/recv) and
| implement it in terms of shared memory, but at that point why
| not just use it directly?
|
| If speed is your concern, shared memory is always the fastest
| IPC. The tradeoff is that you now have to manage the
| messaging across that channel.
| bheadmaster wrote:
| In my experience, for small unbatchable messages, UNIX
| sockets are fast enough not to warrant the complexity of
| dealing with shared memory.
|
| However, for bigger and/or batchable messages, a shared
| memory ringbuffer + UNIX socket for synchronization is the
| most convenient yet fast IPC I've used.
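A sketch of just the mapping half of that pattern, using POSIX
shm_open/mmap; the region name and size are illustrative, and the ring
indices plus the Unix-socket "data ready" wakeups are left out:

    /* Map a shared-memory region both processes can open by name.
       Link with -lrt on older glibc. */
    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    void *create_shared_region(const char *name, size_t size)
    {
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600); /* e.g. "/demo-ring" */
        if (fd < 0)
            return NULL;
        if (ftruncate(fd, (off_t)size) < 0) {
            close(fd);
            return NULL;
        }
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                   /* the mapping stays valid after close */
        return p == MAP_FAILED ? NULL : p;
    }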
| Agingcoder wrote:
| On Linux you can use abstract names, prefixed with a null
| byte. They disappear automatically when your process dies,
| and afaik don't require rw access to a directory.
| throwway120385 wrote:
| > I can also use local tcp for websockets. I have no idea if
| that's possible with unix domain sockets.
|
| The thing that makes this possible or impossible is how your
| library implements the protocol, at least in C/C++. The
| really bad protocol libraries I've seen, like for MQTT, AMQP,
| et al., all insist on controlling both the connection stream
| and the protocol state machine, and commingle all of the code
| for both. They often also insist on owning your main loop,
| which is a bad practice for library authors.
|
| A much better approach is to implement the protocol as a
| separate "chunk" of code with well-defined interfaces for
| receiving inputs and generating outputs on a stream, and with
| hooks for protocol configuration as needed. This allows me to
| do three things that are good:
|
| * Choose how I want to do I/O with the remote end of the
| connection.
|
| * Write my own main loop or integrate with any third-party
| main loop that I want.
|
| * Test the protocol code without standing up an entire TLS
| connection.
|
| I've seen a LOT of libraries that don't allow these things.
| Apache's QPID Proton is a big offender for me, although they
| were refactoring in this direction. libmosquitto provides
| some facilities to access the file descriptor but otherwise
| tries to own the entire connection. So on and so forth.
|
| Edit: I get how you end up there, because it's the easiest way
| to write the libraries. Also, if I had spare time on my
| hands I would go through and work with maintainers to fix
| these libraries, because having generic open-source protocol
| implementations would be really useful and would probably
| solve a lot of problems in the embedded space with ad-hoc
| messaging implementations.
|
| If the protocol library allows you to control the connection
| and provides a connection-agnostic protocol implementation,
| then you could replace a TLS connection over local TCP
| sockets from OpenSSL with SPI transfers or CAN transfers to
| another device if you really wanted to. Or Unix domain
| sockets, because you own the file descriptor and you manage
| the transfers yourself.
| chrsig wrote:
| > Another limitation of unix domain sockets is that the size
| of the path string must be less than PATH_MAX. I've run into
| this when the only directory I had write access to was
| already close to the limit. Local TCP sockets obviously do
| not have this limitation.
|
| This drove me nuts for a _long_ time, trying to hunt down why
| the socket couldn't be created. It's a really subtle
| limitation, and there's not a good error message or anything.
|
| In my use case, it was for testing the server creating the
| socket, and each test would create its own temp dir to house
| the socket file and various other resources.
|
| > In general, I choose a shared memory queue for local-only
| inter-process communication.
|
| Do you mean the SysV message queues, or some user-space
| system? I've never actually seen SysV queues in the wild, so
| I'm curious to hear more.
| pixl97 wrote:
| Isn't PATH_MAX 4k characters these days? You'd have to have
| some pretty intense directory structures to hit that.
| rascul wrote:
| For unix domain sockets on Linux the max is 108, including a
| null terminator.
|
| https://www.man7.org/linux/man-pages/man7/unix.7.html
|
| https://unix.stackexchange.com/questions/367008/why-is-socke...
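That 108 bytes is the size of the sun_path field in struct sockaddr_un,
so the length can be checked up front instead of surfacing as an obscure
bind() failure, as chrsig describes. A small sketch; the helper name is
made up:

    /* Fill a sockaddr_un, rejecting paths that won't fit in sun_path. */
    #include <errno.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int fill_uds_addr(struct sockaddr_un *addr, const char *path)
    {
        memset(addr, 0, sizeof *addr);
        addr->sun_family = AF_UNIX;
        if (strlen(path) >= sizeof addr->sun_path) { /* room for the NUL too */
            errno = ENAMETOOLONG;                    /* fail loudly, up front */
            return -1;
        }
        strcpy(addr->sun_path, path);
        return 0;
    }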
| rwmj wrote:
| AF_VSOCK is another one to consider these days. It's a kind of
| hybrid of loopback and Unix. Although they are designed for
| communicating between virtual machines, vsock sockets work just
| as well between regular processes. Also supported on Windows.
|
| https://www.man7.org/linux/man-pages/man7/vsock.7.html
| https://wiki.qemu.org/Features/VirtioVsock
| touisteur wrote:
| With some luck and love, in the future hopefully we'll also be
| able to use them in containers
| https://patchwork.kernel.org/project/kvm/cover/2020011617242...
| which would simplify a lot of little things.
| tptacek wrote:
| What's the advantage of vsocks over Unix domain sockets? UDS's
| are very fast, and much easier to use.
| rwmj wrote:
| I didn't mean to imply any advantage, just that they are
| another socket-based method for two processes to communicate.
| Since vsocks use a distinct implementation, they should
| probably be benchmarked alongside Unix domain sockets and
| loopback sockets in any comparisons. My expectation is they
| would be somewhere in the middle - not as well optimized as
| Unix domain sockets, but with less general overhead than TCP
| loopback.
|
| If you are using vsocks between two VMs as intended, then they
| have the advantage that they allow communication without
| involving the network stack. This is used by VMs to implement
| guest agent communications (screen resizing, copy and paste,
| and so on) where the comms don't require the network to have
| been set up at all or be routable to the host.
| cout wrote:
| I did not know about this. Thanks for the tip!
| coppsilgold wrote:
| VMMs such as firecracker and cloud-hypervisor translate
| between vsock and UDS. [1]
|
| In recent kernel versions, sockmap also has vsock translation:
| <https://github.com/torvalds/linux/commit/5a8c8b72f65f6b80b52...>
|
| This allows for a sort of UDS "transparency" between guest and
| host. When the host is connecting to a guest, the use of a
| multiplexer UDS is required. [1]
|
| [1] <https://github.com/firecracker-microvm/firecracker/blob/main...>
___________________________________________________________________
(page generated 2023-09-11 22:00 UTC)