[HN Gopher] Ghosts of Unix past, part 3: Unfixable designs (2010) ___________________________________________________________________ Ghosts of Unix past, part 3: Unfixable designs (2010) Author : wmanley Score : 135 points Date : 2021-05-17 14:25 UTC (8 hours ago) (HTM) web link (lwn.net) (TXT) w3m dump (lwn.net) | kazinator wrote: | Forgets threads! | | Speaking of signals, the mess of when POSIX threads collided with | signals. And with fork. And chdir being process wide ... | Juntu wrote: | Biometrical specimens can be another 5 to 20 years. No threads | old now tomorrow I can create data to show easy tasks the year is | decade plus one passed part 2 and 1. No threads | jude- wrote: | I think "unfixable" is in the eye of the beholder. A lot of these | design choices make a lot more sense if you believe that complex | Unix applications are supposed to be built from multiple loosely- | coupled programs that perform exactly one task each, and | communicate by piping data to one another. I think the two design | decisions discussed in this article only create problems for | developers who write programs that do "too much," such that the | programs can no longer make best use of the OS facilities. | | > Unix signals | | Unix signals are an asynchronous best-effort form of out-of-band | IPC. Because the programs that make up your Unix application | _already use_ pipes for IPC (which are synchronous and reliable), | the role of the signal handler in a program would be to either | absorb the signal by taking some localized action, or translate | the signal into some piped IPC message to other program(s) in the | application to consume and handle. | | It's been pointed out elsewhere that threads and signals don't | play nicely. 
But that shouldn't be a problem for a multi-program | Unix application -- you'd keep the multi-threaded logic in a | separate program(s) from the signal-handling logic, and have the | signal-handling program forward the multi-threaded program the | signal data in-band, via a pipe. For example, you might factor | the application into a supervisor program and one or more | subordinate programs (which can be multi-threaded), and have the | supervisor intercept signals and route the relevant IPC | notification to subordinates via pipes. | | > Unix permissions | | The "one user and one group" model for files stops being so | limiting if you can make it so the different programs that make | up your application run as different users and groups. For | example, a "logger" program in your application would have a | separate user/group ID than a "database" program, and in doing | so, ensure that the "logger" program can only access log state, | and the "database" program can only access database state. | zxzax wrote: | But that's more working around the problems than fixing them, | no? You can't really move all signal handling logic to a | dedicated process, because every process can receive signals. | And as for databases: every major production database I've seen | implements its own authentication and permission scheme, so it | can do things like provide ACLs on a more granular (per-table, | sometimes per-row) basis. | jude- wrote: | > But that's more working around the problems than fixing | them, no? | | I don't think Unix signal behavior is the problem. The | problems outlined in the article stem from people using them | inappropriately. The new signal syscalls introduced in Linux | over the years haven't stopped people from misusing them. | | > You can't really move all signal handling logic to a | dedicated process, because every process can receive signals. | | Processes are not obliged to take action in response to | signals. 
But, they _could_ simply propagate the signal data | to the parent process via a pipe file descriptor it inherits. | Then, you _could_ place all the signal-handling logic into a | supervisor -- the supervisor would get notified via a pipe | when one of its descendants receives a signal, and take | appropriate action. | | > And as for databases: every major production database I've | seen implements its own authentication and permission scheme, | so it can do things like provide ACLs on a more granular | (per-table, sometimes per-row) basis. | | No one said an application can't have its own authentication | and permission scheme. All I am saying is that if you factor | your application into multiple processes running under | different system-level user accounts, you can get more | mileage out of the Unix permission system than you could | otherwise, because the kernel would be able to distinguish | individual pieces of your application as having different | sets of permissions. | bombcar wrote: | Isn't the case with something like signals is that it needs to | simply be left as is and instead a new API for interprocess | communication be developed alongside? | | It seems pretty clear that they're used for way more than | originally expected (did threads even exist when signals began?) | - and I suspect a number of systems use other communication paths | already. | Jasper_ wrote: | A proper inter-process communication mechanism that supports | multicast has been proposed to the kernel multiple times and | denied every time. So it's probably not going to happen. | nine_k wrote: | Why was it denied? | cbsmith wrote: | Unix has a plethora of IPC APIs, almost all of which were | invented after signals (e.g. sockets). | | Signals themselves got new APIs long before signalfd: sigaction | and posix real-time signals were already a thing, as were posix | threads, when Linux was invented. 
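As an illustration of jude-'s suggestion earlier in the thread (a handler that absorbs a signal and re-emits it as an in-band pipe message), here is a minimal Python sketch. The function names `forward_signals_to_pipe` and `read_forwarded_signal` are hypothetical, and the one-byte message format is an assumption made for brevity. Both ends of the pipe live in one process here; in the scheme described above, the read end would belong to the supervisor that the worker inherited the pipe from.

```python
import os
import signal


def forward_signals_to_pipe(write_fd, signums):
    """Install handlers that turn each listed signal into a one-byte pipe message."""

    def handler(signum, _frame):
        # Absorb the signal and re-emit it in-band: one byte holding the signal number.
        os.write(write_fd, bytes([signum]))

    for signum in signums:
        signal.signal(signum, handler)


def read_forwarded_signal(read_fd):
    """Block until one forwarded signal number arrives on the pipe."""
    data = os.read(read_fd, 1)
    return signal.Signals(data[0])


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    forward_signals_to_pipe(write_fd, [signal.SIGUSR1, signal.SIGHUP])
    # Stand-in for an external sender such as `kill -USR1 <pid>`.
    os.kill(os.getpid(), signal.SIGUSR1)
    print(read_forwarded_signal(read_fd).name)
```

The reader sees signals as ordinary, ordered pipe data and can handle them in a normal event loop. Python's `signal.set_wakeup_fd` provides a similar mechanism at the C level, if a non-blocking descriptor is acceptable.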
| User23 wrote: | What's really sad is that multi user system interrupts were | long since a solved problem when Unix was developed. I don't | know why that existing body of knowledge wasn't applied. | pjc50 wrote: | Which current system uses this? Do you have a shorter | explanation than the Dijkstra link below, like API docs? | coldtea wrote: | So, like warts in C and Go that were not present or fixed in | languages a decade or more earlier? | pjmlp wrote: | Yep, somehow there is a common line going on there. | linschn wrote: | Do you have a reference on pre-1969 multi user interrupts | being solved? Also, Unix was developed on a ~16kB RAM machine | IIRC... Maybe that's the reason? | User23 wrote: | Sure, here's Dijkstra on it[1]. The X1[2] was a | significantly more limited machine than the PDP-11. I | believe that work predates Unix by nearly a decade. | | [1] https://www.cs.utexas.edu/users/EWD/transcriptions/EWD1 | 3xx/E... | | [2] https://ub.fnwi.uva.nl/computermuseum//X1.html | linschn wrote: | Thanks | pm215 wrote: | The impression I got from reading the v6 kernel code in the | Lions book was that signal handling had been added in as a | solution to a few specific problems (like "we're going to | kill this process but maybe it should get a chance to clean | up first"). If you're thinking about them from that viewpoint | then the (now) well-known problems like interruption-of- | syscalls and the initial "when you take a signal the handler | gets unregistered" don't seem like such a big deal -- after | all, the process is going to exit anyway. | pjmlp wrote: | Just like safer systems programming languages precede C for | about 10 years. | | It is not as if the UNIX culture was to pay attention to best | practices being done on other systems. | icedchai wrote: | Threads arrived relatively late to Unix, long after signals. I | think Solaris 2.x was the first mainstream Unix to have | threads. 
| usr1106 wrote: | Are you talking about kernel threads or user space threads? I | believe none of them were really in Unix very early, but the | time of introduction varied. | icedchai wrote: | I was referring to kernel threads. | usr1106 wrote: | I vaguely remember that kernel threads were something new | in HP-UX in the end of the 1990s. | | The question is, was that late? Windows NT had kernel | threads from the beginning, so maybe a few years earlier. | But then it took NT years to become stable enough to be | used in servers, so saying they were generally ahead | would not be a correct description. | | So if Unix is considered late (according to the GGP) and | NT not a comparable competitor, who was really early? If | anybody. | QuesnayJr wrote: | Didn't OS/2 have threads? | icedchai wrote: | BeOS was heavily threaded. | pjmlp wrote: | Xerox PARC workstations, Solo OS, Topaz are some early | examples. | pavon wrote: | Title should have (2010) | bediger4000 wrote: | Lots of difference between how unfixable design problems are | treated in open source vs closed source operating systems. Is | this difference good or bad? | [deleted] | surajrmal wrote: | I would argue being open source vs closed source doesn't | matter. The governance model does as do the priorities of the | project. This isn't to say you need a BDFL or single company | running the project to address "unfixable" problems, but they | certainly do seem to help. | | On a more meta note, open source means a lot of different | things. There is actually a lot of nuance in the different | styles of open source. Linux vs chromium vs that project that | just does source dumps. Whether they accept contributions, | accept bug reports/feature requests, allow you to build from | source (source dumps often don't include a working build | system), have open communication channels, etc can all vary. I | hope we have more specific terms for the different styles of | open source in the future. 
| bombcar wrote: | I agree - though open source does allow for a major fork if | the users don't agree with the developers (or project | leadership) on how it should be fixed. | | Project leadership is the most significant aspect - look at | Linus's absolute declaration that the kernel can "never break | userspace" meaning that once an API is exported to userspace | it never gets removed. | | This is actually similar to Microsoft's philosophy though | theirs is more "business oriented" (nobody will buy Win95 if | their DOS and Win 3.1 programs won't work). Another example | of this is Knuth's TeX code. | | Open source developments seem to lean (in general) more | toward "rip it out and replace everything" (see for example | internal kernel APIs not exported to userspace) because | access to the source means they can fix the things that touch | it. Closed source programs more likely just die and get | entirely replaced, otherwise they roughly try to keep working | as is. | cbsmith wrote: | Open source isn't a governance model. | zxzax wrote: | Something else that doesn't often get brought up here is that | kill(2) itself is an unfixable race condition waiting to happen. | It's only safe to use that to signal direct child processes. In | Linux, programs should be using the newer pidfd_send_signal | syscall in almost every case where they would otherwise use | kill(2). | | Edit: waitpid is also similarly broken and unfixable for a lot of | the same reasons as signals, pidfds and waitid(P_PIDFD, ...) | should be replacing most uses of that as well. | colonwqbang wrote: | Could you explain what the problem is, or provide a link? | [deleted] | taviso wrote: | The primary problem is that a process could exit, and the pid | might be recycled, so you kill() the wrong process. | MaxBarraclough wrote: | The classic ABA problem, then. | | https://en.wikipedia.org/wiki/ABA_problem | orthonormal wrote: | Had no idea about this! Thank you. 
Now I'm starting to | understand undefined behavior safety is such a walled | garden. All sorts of snakes might be lurking beneath. | asveikau wrote: | This is a (rare?) instance where I would say Win32 gives | you some remedy over POSIX. On Windows, you can open a | handle and deal with that, rather than a pid. Once the | handle is open, it isn't subject to this recycling problem. | | However if opening a handle based on pid you may want to | double-check that the handle matches your expectation | before using it, since that would be prone to the same | race. | simcop2387 wrote: | This is actually what pidfd does on linux, you get a | handle to the process that lets you interact with it in | that same manner. Once the process exits the handle gets | closed and all the operations will report an error even | if you have a recycled pid | hnarn wrote: | This might be naive of me, but isn't this in a way fixed by | systemd service handling? Assuming the process in question | is, in fact, handled as a service of course. | zxzax wrote: | The issue is solved in any service manager as long as the | service doesn't fork, when you are the parent process you | can ensure that you don't reap the child before sending a | signal. | | Once the service forks then it becomes a problem. If you | use cgroups it can be solved separately with the cgroup | freezer, but there are still some open issues with this | in systemd: | https://github.com/systemd/systemd/issues/13101 | Quekid5 wrote: | Correct. | Jasper_ wrote: | I've always felt like it should be the case that as long as | a pidfd for that process is open, the pid doesn't get | recycled, so you could open the pidfd and then use kill | safely, then close it later. Means you wouldn't need a | whole bunch of new syscalls. | | Unfortunately, it seems like this idea was rejected during | the introduction of pidfd. 
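The pidfd approach discussed above can be sketched in Python, which exposes the Linux syscalls as `os.pidfd_open` and `signal.pidfd_send_signal` (Python 3.9+, Linux only). The helper name `signal_via_pidfd` and the fallback to a plain kill are assumptions for illustration, not something proposed in the thread. The demo signals a direct child that has not been reaped yet, so its PID cannot have been recycled when the pidfd is opened; for an arbitrary PID, opening the pidfd is itself subject to the race asveikau mentions for Windows handles.

```python
import os
import signal
import subprocess
import sys


def signal_via_pidfd(proc, sig=signal.SIGTERM):
    """Send `sig` to a child through a pidfd when possible; return the method used."""
    if hasattr(os, "pidfd_open") and hasattr(signal, "pidfd_send_signal"):
        try:
            # The pidfd refers to this exact process, even if the PID number
            # is reused by a different process later on.
            pidfd = os.pidfd_open(proc.pid)
            try:
                signal.pidfd_send_signal(pidfd, sig)
                return "pidfd"
            finally:
                os.close(pidfd)
        except OSError:
            pass  # old kernel or sandboxed syscall: fall back below
    # Fallback (assumption): a plain kill, acceptable here only because the
    # target is our own unreaped child.
    proc.send_signal(sig)
    return "kill"


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    method = signal_via_pidfd(child)
    child.wait(timeout=10)
    print(method, child.returncode)
```

Holding the descriptor open for the lifetime of the relationship, instead of reopening it per signal, is what removes the check-then-use window for processes that are not direct children.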
| zxzax wrote: | I think the idea with that was it would lead to denial- | of-service type situations where some process could leak | a bunch of pidfds and then that would cause exhaustion of | pids everywhere else. | Jasper_ wrote: | A process could also do that by spawning a bunch of | processes if it wanted to. | zxzax wrote: | Not really, most systems should set RLIMIT_NPROC to | prevent that. If pidfds held onto the pid, it would | create a new denial-of-service that allowed random other | processes to keep zombie processes open, and the fix for | it would actually allow you to circumvent that limit! | Jasper_ wrote: | You can also set RLIMIT_NOFILE to limit the number of | FDs the app can open if you're worried about it. | zxzax wrote: | I don't think that would necessarily solve it, since the | maximum number you can have open is still RLIMIT_NPROC * | RLIMIT_NOFILE, right? It seems it would still be a | problem as long as it's greater than RLIMIT_NPROC. Edit: | I suppose you could fix it as long as you could guarantee | that NPROC * NOFILE * maxlogins < kernel.pid_max... but | to me this is piling on more workarounds. | zxzax wrote: | The problem was discussed quite a lot around the introduction | of pidfds: | | https://lwn.net/Articles/773459/ | | https://lwn.net/Articles/784831/ | | In essence it's a classic TOCTTOU. | SavantIdiot wrote: | Regarding file permissions: This kind of archaeology (or | forensics) is very important. Why? Because it exposes the trial- | and-error over a multi-decade evolution. Sometimes trial-and- | error is used as a pejorative (brute force hacking), but over the | course of decades as technology advances, it is inevitable. The | author is careful to point out that much of the issue here was due | to scalability, but I think there is something else at work: | unknown unknowns.
It is impossible to be a 100% defensive | software architecture team, and "room to grow" is usually | jettisoned because it can lead to sloppy code, or worse, attack | vectors. It's such a hard problem and analyses like these papers | are a first step in what I believe will become a full-blown | historic discipline of software meta-thought. I say "become" | because you can't really do this kind of analysis with 5, 10 or | 20 years of history: you need multiple decades, and that is just | now upon us. | | I think there are applications for this kind of study. It can | very clearly feed back into current practices, and possibly even | more formalized language syntax that can be defensive and | extensible. I would also love to see if this kind of analysis | bears out which aspects of various languages (and architectural | OS decisions) proved to be the most robust. Like with hemoglobin: | it is one of the largest and oldest genes, it is hard to break | via mutation, and is shared by every animal with oxygenated | blood cells. Something was done right with that design! | nooyurrsdey wrote: | Really enjoyed reading about the struggles of implementing file | permissions. | | It seems like something that should be so simple, but once you | sit down and try to build it you'll realize you have to support | so many use cases. I bet if you asked everyone on HN how they'd | do it, you'd end up with so many confident answers that also had | shortcomings themselves. | williesleg wrote: | Well kids, get off your ass and write a new operating system. We | invented it, you all sit on your ass and do nothing but complain. | With ads. | primis wrote: | One of the complaints brought up - the 16 bit group/uid - seems to | have been fixed quite nicely in modern Linux systems by adding an | additional 16 bits to each.
It seems these problems aren't | "unfixable" after all | mmcgaha wrote: | The idea that having ownership and permission bits at the file | level being a problem fixed by moving the permissions to the | directory level completely hand waves over the fact that hard | links exist in unix file systems. They need to think a little | harder about that Mencken quote. | wmanley wrote: | See the next article in the series "Ghosts of Unix past, part | 4: High-maintenance designs"[1]. This specifically addresses | how the existence of hard links is elegant in itself, but | exports complexity to other parts of the system. | | [1]: https://lwn.net/Articles/416494/ | Macha wrote: | Proper ACLs exist on Linux these days as an alternative to | user/group permissioning as well for use cases which call for a | more powerful system. | bombcar wrote: | The "systematic" part of things is relatively easy to handle | (as the code can be made to handle anything complex) - it's | the "user" interface that is harder. A user with root access | wants to give access to a given file/directory to a user - | this needs to be made easy to do successfully and securely. | Too many times I've seen entire web directories 777 because | they just wanted it to work. | | Commands providing "why user X can't access Y" and | recommended solutions can help. | samf wrote: | Yes, this is a major problem when you introduce ACLs into | unix-like systems. A comment in the article mentions the | "Richacl" work. A key problem with this work was that even | "chmod 777" might not get you out of a situation where an | ACL was denying access. It's been over ten years since I've | been involved in this; it might have changed. | | The POSIX draft ACLs had the same problem, where a chmod | might not grant you the permission that you're asking for. 
| Back when Solaris implemented POSIX draft ACLs, they needed | to change many user-level interfaces (e.g., the chmod | command and the ftp daemon) to have a chmod request work | the way end users expected. | devchix wrote: | Have you worked with setfacl(1), getfacl(1) recently? The | agony they inflict makes me want to die. Do you need log dirs | read by a non-root logreader? Are there nested subdirs? What | are the defaults? Extra crispy boss-mode: is SELinux on? I | think the extended ACLs have taken us further into the weeds, | and I think the permission architecture needs to be rethought | entirely. It was designed for shared university-type | computing resources at a time when 30 profs and researchers | shared dirs and commingled a set of files, and daemons were | users with their own places to keep things. No longer. The RBAC and | inheritance model, I dunno, they may work correctly but they | are so fiddly with so many knobs and intersections that you | end up front-loading a huge amount of work; nobody wants to | do that, nor have I seen it done correctly, with design and | intent. | GauntletWizard wrote: | I'm actually fully behind the POSIX permissions model as a | solution for this: If you have a group that all needs | read/write, no big deal. If you have a group that needs to | write and the world reads, no big deal. If you have a group | that writes and another group that reads: No big deal, so | long as you have a third group that's the union of both | groups and can have a multi-level subdirectory (where a/ | has 750 and a/b/ has 775). If you have groups that need to | read and groups that need to write in a more complicated | (or somehow path-specific) problem, you probably need a | daemon or setuid program to moderate access, and that's | okay.
| | Happy to argue it or simply be told I'm wrong, but I've yet | to encounter a not-insane permissions model that I couldn't | solve with some "simple" nested groups (that in and of | itself is a tooling problem, but a solvable one) and POSIX. | trasz wrote: | Linux is probably the last major system not supporting NFSv4 | ACLs. Windows (obviously), MacOS X, Solaris, FreeBSD - all | those support them - for at least a decade now. ___________________________________________________________________ (page generated 2021-05-17 23:00 UTC)