[HN Gopher] Ghosts of Unix past, part 3: Unfixable designs (2010)
       ___________________________________________________________________
        
       Ghosts of Unix past, part 3: Unfixable designs (2010)
        
       Author : wmanley
       Score  : 135 points
       Date   : 2021-05-17 14:25 UTC (8 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | kazinator wrote:
       | Forgets threads!
       | 
       | Speaking of signals, the mess of when POSIX threads collided with
       | signals. And with fork. And chdir being process wide ...
        
       | Juntu wrote:
       | Biometrical specimens can be another 5 to 20 years. No threads
       | old now tomorrow I can create data to show easy tasks the year is
       | decade plus one passed part 2 and 1. No threads
        
       | jude- wrote:
       | I think "unfixable" is in the eye of the beholder. A lot of these
       | design choices make a lot more sense if you believe that complex
       | Unix applications are supposed to be built from multiple loosely-
       | coupled programs that perform exactly one task each, and
       | communicate by piping data to one another. I think the two design
       | decisions discussed in this article only create problems for
       | developers who write programs that do "too much," such that the
       | programs can no longer make best use of the OS facilities.
       | 
       | > Unix signals
       | 
       | Unix signals are an asynchronous best-effort form of out-of-band
       | IPC. Because the programs that make up your Unix application
       | _already use_ pipes for IPC (which are synchronous and reliable),
       | the role of the signal handler in a program would be to either
       | absorb the signal by taking some localized action, or translate
       | the signal into some piped IPC message to other program(s) in the
       | application to consume and handle.
       | 
       | It's been pointed out elsewhere that threads and signals don't
       | play nicely. But that shouldn't be a problem for a multi-program
       | Unix application -- you'd keep the multi-threaded logic in a
       | separate program(s) from the signal-handling logic, and have the
       | signal-handling program forward the multi-threaded program the
       | signal data in-band, via a pipe. For example, you might factor
       | the application into a supervisor program and one or more
       | subordinate programs (which can be multi-threaded), and have the
       | supervisor intercept signals and route the relevant IPC
       | notification to subordinates via pipes.
       | 
       | > Unix permissions
       | 
       | The "one user and one group" model for files stops being so
       | limiting if you can make it so the different programs that make
       | up your application run as different users and groups. For
       | example, a "logger" program in your application would have a
       | separate user/group ID than a "database" program, and in doing
       | so, ensure that the "logger" program can only access log state,
       | and the "database" program can only access database state.
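
A minimal Python sketch of the pattern described in the comment above, in which a signal handler does nothing except translate the signal into an in-band message on a pipe. The helper names install_forwarder and next_forwarded_signal are made up for illustration, and the read end could just as well be handed to a subordinate process or thread:

```python
import os
import signal


def install_forwarder(signums):
    """Route the given signals into a pipe and return the pipe's read end."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # the handler must never block on a full pipe

    def handler(signum, frame):
        # Translate the out-of-band signal into one in-band byte.
        try:
            os.write(write_fd, bytes([signum]))
        except BlockingIOError:
            pass  # pipe full: drop the message, in keeping with best-effort signals

    for signum in signums:
        signal.signal(signum, handler)
    return read_fd


def next_forwarded_signal(read_fd):
    """Block until a forwarded signal arrives and return it as a signal.Signals value."""
    return signal.Signals(os.read(read_fd, 1)[0])
```

A supervisor built this way could treat the read end like any other descriptor in its select/poll loop, so the rest of the program only ever sees ordinary pipe traffic.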
        
         | zxzax wrote:
         | But that's more working around the problems than fixing them,
         | no? You can't really move all signal handling logic to a
         | dedicated process, because every process can receive signals.
         | And as for databases: every major production database I've seen
         | implements its own authentication and permission scheme, so it
         | can do things like provide ACLs on a more granular (per-table,
         | sometimes per-row) basis.
        
           | jude- wrote:
           | > But that's more working around the problems than fixing
           | them, no?
           | 
           | I don't think Unix signal behavior is the problem. The
           | problems outlined in the article stem from people using them
           | inappropriately. The new signal syscalls introduced in Linux
           | over the years haven't stopped people from misusing them.
           | 
           | > You can't really move all signal handling logic to a
           | dedicated process, because every process can receive signals.
           | 
           | Processes are not obliged to take action in response to
           | signals. But, they _could_ simply propagate the signal data
           | to the parent process via a pipe file descriptor it inherits.
           | Then, you _could_ place all the signal-handling logic into a
           | supervisor -- the supervisor would get notified via a pipe
           | when one of its descendants receives a signal, and take
           | appropriate action.
           | 
           | > And as for databases: every major production database I've
           | seen implements its own authentication and permission scheme,
           | so it can do things like provide ACLs on a more granular
           | (per-table, sometimes per-row) basis.
           | 
           | No one said an application can't have its own authentication
           | and permission scheme. All I am saying is that if you factor
           | your application into multiple processes running under
           | different system-level user accounts, you can get more
           | mileage out of the Unix permission system than you could
           | otherwise, because the kernel would be able to distinguish
           | individual pieces of your application as having different
           | sets of permissions.
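
One way the "propagate the signal data to the parent via an inherited pipe" idea from this comment might look, as a hedged Python sketch. The record layout and the names run_reporting_child and supervise_once are assumptions made for the example; the polling loop merely stands in for the child's real work:

```python
import os
import signal
import struct
import time

RECORD = struct.Struct("ii")  # (pid that received the signal, signal number)


def run_reporting_child(write_fd, signum):
    """Child side: push signal data up the inherited pipe instead of acting on it."""
    reported = False

    def report(received, frame):
        nonlocal reported
        os.write(write_fd, RECORD.pack(os.getpid(), received))
        reported = True

    signal.signal(signum, report)
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signum})  # deliver anything pending
    while not reported:
        time.sleep(0.01)  # stand-in for the child's real work
    os._exit(0)


def supervise_once(signum=signal.SIGUSR1):
    """Supervisor side: returns (child_pid, reporting_pid, reported_signal)."""
    read_fd, write_fd = os.pipe()
    # Block the signal across fork() so it stays pending until the child has a handler.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        try:
            run_reporting_child(write_fd, signum)
        finally:
            os._exit(1)  # only reached if the child code raised
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    os.close(write_fd)
    os.kill(pid, signum)  # stands in for a signal arriving from anywhere
    reporter, received = RECORD.unpack(os.read(read_fd, RECORD.size))
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, reporter, received
```

All policy then lives in the supervisor, which learns both which descendant was signalled and with what, through a synchronous and reliable channel.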
        
       | bombcar wrote:
       | Isn't it the case with something like signals that they simply
       | need to be left as is, and a new API for interprocess
       | communication developed alongside instead?
       | 
       | It seems pretty clear that they're used for way more than
       | originally expected (did threads even exist when signals began?)
       | - and I suspect a number of systems use other communication paths
       | already.
        
         | Jasper_ wrote:
         | A proper inter-process communication mechanism that supports
         | multicast has been proposed to the kernel multiple times and
         | denied every time. So it's probably not going to happen.
        
           | nine_k wrote:
           | Why was it denied?
        
         | cbsmith wrote:
         | Unix has a plethora of IPC APIs, almost all of which were
         | invented after signals (e.g. sockets).
         | 
         | Signals themselves got new APIs long before signalfd: sigaction
         | and POSIX real-time signals were already a thing, as were POSIX
         | threads, when Linux was invented.
        
         | User23 wrote:
         | What's really sad is that multi user system interrupts were
         | long since a solved problem when Unix was developed. I don't
         | know why that existing body of knowledge wasn't applied.
        
           | pjc50 wrote:
           | Which current system uses this? Do you have a shorter
           | explanation than the Dijkstra link below, like API docs?
        
           | coldtea wrote:
           | So, like warts in C and Go that were not present or fixed in
           | languages a decade or more earlier?
        
             | pjmlp wrote:
             | Yep, somehow there is a common line going on there.
        
           | linschn wrote:
           | Do you have a reference on pre-1969 multi user interrupts
           | being solved? Also, Unix was developed on a ~16kB RAM machine
           | IIRC... Maybe that's the reason?
        
             | User23 wrote:
             | Sure, here's Dijkstra on it[1]. The X1[2] was a
             | significantly more limited machine than the PDP-11. I
             | believe that work predates Unix by nearly a decade.
             | 
             | [1] https://www.cs.utexas.edu/users/EWD/transcriptions/EWD1
             | 3xx/E...
             | 
             | [2] https://ub.fnwi.uva.nl/computermuseum//X1.html
        
               | linschn wrote:
               | Thanks
        
           | pm215 wrote:
           | The impression I got from reading the v6 kernel code in the
           | Lions book was that signal handling had been added in as a
           | solution to a few specific problems (like "we're going to
           | kill this process but maybe it should get a chance to clean
           | up first"). If you're thinking about them from that viewpoint
           | then the (now) well-known problems like interruption-of-
           | syscalls and the initial "when you take a signal the handler
           | gets unregistered" don't seem like such a big deal -- after
           | all, the process is going to exit anyway.
        
           | pjmlp wrote:
           | Just like safer systems programming languages preceded C by
           | about 10 years.
           | 
           | It is not as if the UNIX culture was to pay attention to best
           | practices being done on other systems.
        
         | icedchai wrote:
         | Threads arrived relatively late to Unix, long after signals. I
         | think Solaris 2.x was the first mainstream Unix to have
         | threads.
        
           | usr1106 wrote:
           | Are you talking about kernel threads or user space threads? I
           | believe none of them were really in Unix very early, but the
           | time of introduction varied.
        
             | icedchai wrote:
             | I was referring to kernel threads.
        
               | usr1106 wrote:
               | I vaguely remember that kernel threads were something new
               | in HP-UX at the end of the 1990s.
               | 
               | The question is, was that late? Windows NT had kernel
               | threads from the beginning, so maybe a few years earlier.
               | But then it took NT years to become stable enough to be
               | used in servers, so saying they were generally ahead
               | would not be a correct description.
               | 
               | So if Unix is considered late (according to the GGP) and
               | NT not a comparable competitor, who was really early? If
               | anybody.
        
               | QuesnayJr wrote:
               | Didn't OS/2 have threads?
        
               | icedchai wrote:
               | BeOS was heavily threaded.
        
               | pjmlp wrote:
               | Xerox PARC workstations, Solo OS, Topaz are some early
               | examples.
        
       | pavon wrote:
       | Title should have (2010)
        
       | bediger4000 wrote:
       | Lots of difference between how unfixable design problems are
       | treated in open source vs closed source operating systems. Is
       | this difference good or bad?
        
         | surajrmal wrote:
         | I would argue being open source vs closed source doesn't
         | matter. The governance model does as do the priorities of the
         | project. This isn't to say you need a BDFL or single company
         | running the project to address "unfixable" problems, but they
         | certainly do seem to help.
         | 
         | On a more meta note, open source means a lot of different
         | things. There is actually a lot of nuance in the different
         | styles of open source. Linux vs chromium vs that project that
         | just does source dumps. Whether they accept contributions,
         | accept bug reports/feature requests, allow you to build from
         | source (source dumps often don't include a working build
         | system), have open communication channels, etc can all vary. I
         | hope we have more specific terms for the different styles of
         | open source in the future.
        
           | bombcar wrote:
           | I agree - though open source does allow for a major fork if
           | the users don't agree with the developers (or project
           | leadership) on how it should be fixed.
           | 
           | Project leadership is the most significant aspect - look at
           | Linus's absolute declaration that the kernel can "never break
           | userspace" meaning that once an API is exported to userspace
           | it never gets removed.
           | 
           | This is actually similar to Microsoft's philosophy though
           | theirs is more "business oriented" (nobody will buy Win95 if
           | their DOS and Win 3.1 programs won't work). Another example
           | of this is Knuth's TeX code.
           | 
           | Open source developments seem to lean (in general) more
           | toward "rip it out and replace everything" (see for example
           | internal kernel APIs not exported to userspace) because
           | access to the source means they can fix the things that touch
           | it. Closed source programs more likely just die and get
           | entirely replaced, otherwise they roughly try to keep working
           | as is.
        
           | cbsmith wrote:
           | Open source isn't a governance model.
        
       | zxzax wrote:
       | Something else that doesn't often get brought up here is that
       | kill(2) itself is an unfixable race condition waiting to happen.
       | It's only safe to use that to signal direct child processes. In
       | Linux, programs should be using the newer pidfd_send_signal
       | syscall in almost every case where they would otherwise use
       | kill(2).
       | 
       | Edit: waitpid is similarly broken and unfixable for a lot of the
       | same reasons as signals; pidfds and waitid(P_PIDFD, ...) should
       | be replacing most uses of that as well.
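
For readers who have not used these calls, a short Python sketch of signalling and reaping a child through a pidfd rather than through its pid number. It assumes Linux 5.4 or later and Python 3.9 or later (where os.pidfd_open, signal.pidfd_send_signal and os.P_PIDFD are available); spawn_idle_child and terminate_via_pidfd are illustrative names only:

```python
import os
import signal
import time


def spawn_idle_child():
    """Fork a child that just sleeps; returns its pid to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(30)
        finally:
            os._exit(0)
    return pid


def terminate_via_pidfd(pid, sig=signal.SIGTERM):
    """Signal and reap one specific child through a pidfd; returns the fatal signal number."""
    # Opening by number is safe here: `pid` is our own unreaped child,
    # so the number cannot have been recycled yet.
    pidfd = os.pidfd_open(pid)
    try:
        signal.pidfd_send_signal(pidfd, sig)
        # Reap via the fd, so the wait targets this process rather than a pid number.
        info = os.waitid(os.P_PIDFD, pidfd, os.WEXITED)
    finally:
        os.close(pidfd)
    return info.si_status  # for a child killed by a signal, this is that signal
```

The only moment a raw pid is trusted is when the descriptor is opened; every later operation goes through the descriptor.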
        
         | colonwqbang wrote:
         | Could you explain what the problem is, or provide a link?
        
           | taviso wrote:
           | The primary problem is that a process could exit, and the pid
           | might be recycled, so you kill() the wrong process.
        
             | MaxBarraclough wrote:
             | The classic ABA problem, then.
             | 
             | https://en.wikipedia.org/wiki/ABA_problem
        
               | orthonormal wrote:
               | Had no idea about this! Thank you. Now I'm starting to
               | understand why undefined behavior safety is such a walled
               | garden. All sorts of snakes might be lurking beneath.
        
             | asveikau wrote:
             | This is a (rare?) instance where I would say Win32 gives
             | you some remedy over POSIX. On Windows, you can open a
             | handle and deal with that, rather than a pid. Once the
             | handle is open, it isn't subject to this recycling problem.
             | 
             | However if opening a handle based on pid you may want to
             | double-check that the handle matches your expectation
             | before using it, since that would be prone to the same
             | race.
        
               | simcop2387 wrote:
               | This is actually what pidfd does on Linux: you get a
               | handle to the process that lets you interact with it in
               | that same manner. Once the process exits the handle gets
               | closed and all the operations will report an error even
               | if you have a recycled pid.
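
A small Python sketch of the behaviour described here, again assuming Linux with pidfd support and Python 3.9 or later. Strictly speaking the descriptor stays open until its holder closes it; what changes after the process is reaped is that operations on it fail. The function name is hypothetical:

```python
import os
import signal


def stale_pidfd_is_rejected():
    """Return True if signalling through a pidfd fails once its process has been reaped."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately
    pidfd = os.pidfd_open(pid)  # opened before reaping, so it names exactly this child
    os.waitpid(pid, 0)          # from here on the numeric pid may go to a new process
    try:
        signal.pidfd_send_signal(pidfd, 0)  # signal 0 is a probe; nothing is delivered
    except ProcessLookupError:
        return True  # ESRCH: the kernel refuses rather than hitting whoever reuses the pid
    finally:
        os.close(pidfd)
    return False
```

With plain kill(2) the same sequence would either fail or, if the number had been reused in the meantime, silently signal an unrelated process.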
        
             | hnarn wrote:
             | This might be naive of me, but isn't this in a way fixed by
             | systemd service handling? Assuming the process in question
             | is, in fact, handled as a service of course.
        
               | zxzax wrote:
               | The issue is solved in any service manager as long as the
               | service doesn't fork: when you are the parent process you
               | can ensure that you don't reap the child before sending a
               | signal.
               | 
               | Once the service forks then it becomes a problem. If you
               | use cgroups it can be solved separately with the cgroup
               | freezer, but there are still some open issues with this
               | in systemd:
               | https://github.com/systemd/systemd/issues/13101
        
               | Quekid5 wrote:
               | Correct.
        
             | Jasper_ wrote:
             | I've always felt like it should be the case that as long as
             | a pidfd for that process is open, the pid doesn't get
             | recycled, so you could open the pidfd and then use kill
             | safely, then close it later. Means you wouldn't need a
             | whole bunch of new syscalls.
             | 
             | Unfortunately, it seems like this idea was rejected during
             | the introduction of pidfd.
        
               | zxzax wrote:
               | I think the idea with that was it would lead to denial-
               | of-service type situations where some process could leak
               | a bunch of pidfds and then that would cause exhaustion of
               | pids everywhere else.
        
               | Jasper_ wrote:
               | A process could also do that by spawning a bunch of
               | processes if it wanted to.
        
               | zxzax wrote:
               | Not really, most systems should set RLIMIT_NPROC to
               | prevent that. If pidfds held onto the pid, it would
               | create a new denial-of-service that allowed random other
               | processes to keep zombie processes open, and the fix for
               | it would actually allow you to circumvent that limit!
        
               | Jasper_ wrote:
               | You can also set RLIMIT_NOFILE to limit the number of
               | FDs the app can open if you're worried about it.
        
               | zxzax wrote:
               | I don't think that would necessarily solve it, since the
               | maximum number you can have open is still RLIMIT_NPROC *
               | RLIMIT_NOFILE, right? It seems it would still be a
               | problem as long as it's greater than RLIMIT_NPROC. Edit:
               | I suppose you could fix it as long as you could guarantee
               | that NPROC * NOFILE * maxlogins < kernel.pid_max... but
               | to me this is piling on more workarounds.
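
The bound in the last comment can be checked with a few lines of Python. This is only a sketch of the arithmetic for the hypothetical (rejected) design in which an open pidfd would pin its pid; the function names and the numbers used below are illustrative and are not defaults of any particular system:

```python
import resource


def worst_case_pinned_pids(nproc, nofile, maxlogins):
    """Upper bound on pids one user could pin if every open pidfd held its pid."""
    return nproc * nofile * maxlogins


def pinning_is_bounded(nproc, nofile, maxlogins, pid_max):
    """True when the worst case stays below kernel.pid_max, as the comment requires."""
    return worst_case_pinned_pids(nproc, nofile, maxlogins) < pid_max


def current_soft_limits():
    """Soft RLIMIT_NPROC and RLIMIT_NOFILE of this process.

    Either value may be resource.RLIM_INFINITY, in which case no bound exists.
    """
    nproc, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return nproc, nofile
```

For example, 100 processes with 1024 descriptors each and a single login give 102400 pinnable pids, which already exceeds a pid_max of 32768, so both limits would have to be set quite low for the guarantee to hold.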
        
           | zxzax wrote:
           | The problem was discussed quite a lot around the introduction
           | of pidfds:
           | 
           | https://lwn.net/Articles/773459/
           | 
           | https://lwn.net/Articles/784831/
           | 
           | In essence it's a classic TOCTTOU.
        
       | SavantIdiot wrote:
       | Regarding file permissions: This kind of archaeology (or
       | forensics) is very important. Why? Because it exposes the trial-
       | and-error over a multi-decade evolution. Sometimes trial-and-
       | error is used as a pejorative (brute force hacking), but over the
       | course of decades as technology advances, it is inevitable. The
       | author is clear to point out that much of the issue here was due
       | to scalability, but I think there is something else at work:
       | unknown unknowns. It is impossible to be a 100% defensive
       | software architecture team, and "room to grow" is usually
       | jettisoned because it can lead to sloppy code, or worse, attack
       | vectors. It's such a hard problem and analyses like these papers
       | are a first step in what I believe will become a full-blown
       | historic discipline of software meta-thought. I say "become"
       | because you can't really do this kind of analysis with 5, 10 or
       | 20 years of history: you need multiple decades, and that is just
       | now upon us.
       | 
       | I think there are applications for this kind of study. It can
       | very clearly feed back into current practices, and possibly even
       | more formalized language syntax that can be defensive and
       | extensible. I would also love to see if this kind of analysis
       | bears out which aspects of various languages (and architectural
       | OS decisions) proved to be the most robust. Like with hemoglobin:
       | it is one of the largest and oldest genes, it is hard to break
       | via mutation, and is shared by every animal with oxygenated
       | blood cells. Something was done right with that design!
        
       | nooyurrsdey wrote:
       | Really enjoyed reading about the struggles of implementing file
       | permissions.
       | 
       | It seems like something that should be so simple, but once you
       | sit down and try to build it you'll realize you have to support
       | so many use cases. I bet if you asked everyone on HN how they'd
       | do it, you'd end up with so many confident answers that also had
       | shortcomings themselves.
        
       | williesleg wrote:
       | Well kids, get off your ass and write a new operating system. We
       | invented it, you all sit on your ass and do nothing but complain.
       | With ads.
        
       | primis wrote:
       | One of the complaints brought up - the 16 bit group/uid seems to
       | have been fixed quite nicely in modern Linux systems by adding an
       | additional 16 bits to each. It seems these problems aren't
       | "unfixable" after all.
        
         | mmcgaha wrote:
         | The idea that ownership and permission bits at the file level
         | are a problem fixed by moving the permissions to the directory
         | level completely hand-waves over the fact that hard links exist
         | in Unix file systems. They need to think a little harder about
         | that Mencken quote.
        
           | wmanley wrote:
           | See the next article in the series "Ghosts of Unix past, part
           | 4: High-maintenance designs"[1]. This specifically addresses
           | how the existence of hard links is elegant in itself, but
           | exports complexity to other parts of the system.
           | 
           | [1]: https://lwn.net/Articles/416494/
        
         | Macha wrote:
         | Proper ACLs exist on Linux these days as an alternative to
         | user/group permissioning as well for use cases which call for a
         | more powerful system.
        
           | bombcar wrote:
           | The "systematic" part of things is relatively easy to handle
           | (as the code can be made to handle anything complex) - it's
           | the "user" interface that is harder. A user with root access
           | wants to give access to a given file/directory to a user -
           | this needs to be made easy to do successfully and securely.
           | Too many times I've seen entire web directories 777 because
           | they just wanted it to work.
           | 
           | Commands providing "why user X can't access Y" and
           | recommended solutions can help.
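
A rough idea of what such a "why can't user X access Y" command might do, sketched in Python. It models only the classic owner/group/other mode bits; root's bypass, ACLs, capabilities and SELinux are deliberately ignored, and class_bits and explain_denial are hypothetical names:

```python
import os
import stat


def class_bits(st, uid, gids):
    """Pick the owner, group, or other rwx triplet (0-7) that the classic rules apply."""
    if uid == st.st_uid:
        return (st.st_mode >> 6) & 7
    if st.st_gid in gids:
        return (st.st_mode >> 3) & 7
    return st.st_mode & 7


def explain_denial(path, uid, gids, want=4):
    """Return a message naming the first path component that blocks access, or None.

    `want` is a mask of 4 (read), 2 (write) and 1 (execute/search).
    """
    chain = [os.path.abspath(path)]
    while os.path.dirname(chain[-1]) != chain[-1]:
        chain.append(os.path.dirname(chain[-1]))
    chain.reverse()  # root first, target last

    for directory in chain[:-1]:
        st = os.stat(directory)
        if not (class_bits(st, uid, gids) & 1):
            return f"{directory} ({stat.filemode(st.st_mode)}): no search (x) permission"
    st = os.stat(chain[-1])
    if (class_bits(st, uid, gids) & want) != want:
        return f"{chain[-1]} ({stat.filemode(st.st_mode)}): requested access not granted"
    return None
```

Pointing at the single component that blocks access makes a narrow fix (one chmod or chgrp on one directory) more obvious than a blanket 777.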
        
             | samf wrote:
             | Yes, this is a major problem when you introduce ACLs into
             | unix-like systems. A comment in the article mentions the
             | "Richacl" work. A key problem with this work was that even
             | "chmod 777" might not get you out of a situation where an
             | ACL was denying access. It's been over ten years since I've
             | been involved in this; it might have changed.
             | 
             | The POSIX draft ACLs had the same problem, where a chmod
             | might not grant you the permission that you're asking for.
             | Back when Solaris implemented POSIX draft ACLs, they needed
             | to change many user-level interfaces (e.g., the chmod
             | command and the ftp daemon) to have a chmod request work
             | the way end users expected.
        
           | devchix wrote:
           | Have you worked with setfacl(1), getfacl(1) recently? The
           | agony they inflict makes me want to die. Do you need log dirs
           | read by a non-root logreader? Are there nested subdirs? What
           | are the defaults? Extra crispy boss-mode: is SELinux on? I
           | think the extended ACLs have taken us further into the weeds,
           | and I think the permission architecture needs to be rethought
           | entirely. It was designed for shared university-type
           | computing resources at a time when 30 profs and researchers
           | shared dirs and commingled a set of files, and daemons were
           | users with their own places to keep things. No longer. The
           | RBAC and inheritance model, I dunno, they may work correctly
           | but they are so fiddly with so many knobs and intersections
           | that you end up front-loading a huge amount of work; nobody
           | wants to do that, nor have I seen it done correctly, with
           | design and intent.
        
             | GauntletWizard wrote:
             | I'm actually fully behind the POSIX permissions model as a
             | solution for this: If you have a group that all needs
             | read/write, no big deal. If you have a group that needs to
             | write and the world reads, no big deal. If you have a group
             | that writes and another group that reads: No big deal, so
             | long as you have a third group that's the union of both
             | groups and can have a multi-level subdirectory (where a/
             | has 750 and a/b/ has 775). If you have groups that need to
             | read and groups that need to write in a more complicated
             | (or somehow path-specific) problem, you probably need a
             | daemon or setuid program to moderate access, and that's
             | okay.
             | 
             | Happy to argue it or simply be told I'm wrong, but I've yet
             | to encounter a not-insane permissions model that I couldn't
             | solve with some "simple" nested groups (that in and of
             | itself is a tooling problem, but a solvable one) and POSIX.
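
The two-level layout in the comment above can be worked through with a small in-memory model in Python. The owner and group names (admin, project, writers) are invented for the example, and the assignment of the union group to a/ and the writers group to a/b/ is one plausible reading of the comment rather than something it spells out:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    owner: str
    group: str
    mode: int


def triplet(node, user, groups):
    """Classic rule: exactly one of owner, group, other applies, checked in that order."""
    if user == node.owner:
        return (node.mode >> 6) & 7
    if node.group in groups:
        return (node.mode >> 3) & 7
    return node.mode & 7


def rights_in_b(user, groups):
    """Effective rights on a/b/ for a user, given a/ at 750 and a/b/ at 775."""
    a = Node("admin", "project", 0o750)  # "project" is the union of readers and writers
    b = Node("admin", "writers", 0o775)
    if not (triplet(a, user, groups) & 1):
        return ""  # cannot even traverse a/
    bits = triplet(b, user, groups)
    return "".join(ch for ch, bit in (("r", 4), ("w", 2), ("x", 1)) if bits & bit)
```

In this model a member of both groups gets "rwx" on a/b/, a member of only the union group gets "rx" through the "other" bits, and everyone else is stopped at a/.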
        
           | trasz wrote:
           | Linux is probably the last major system not supporting NFSv4
           | ACLs. Windows (obviously), MacOS X, Solaris, FreeBSD - all
           | those support them - for at least a decade now.
        
       ___________________________________________________________________
       (page generated 2021-05-17 23:00 UTC)